06: Stem separation with Audioshake (Hashtag Beatles)
This is an automated AI transcript. Please forgive the mistakes!
Hello human listeners. I did it. I cloned my voice with the help of ElevenLabs, and
this is the result. That's me, the AI version of me. It is kind of okay,
but also not really close to my actual voice, I think. I guess the model needs
more computing power and samples, which means that I, on the other end, would have
to pay more to get a better version. So let's put it like this: it is a good
start. So today I'm just playing around with it. It won't be perfect until the day
when I have my perfect voice clone. But while playing around with it,
I also thought, well, why not clone the voice of a German radio friend of mine? I got
his permission. His name is Andreas Müller and he is one of the major pop critics
in Germany. Together with his colleague Martin Böttcher, also a major pop critic,
he has a podcast called Pop nach Acht. So if you understand German or if you
have a great artificial intelligence that can translate German into English in real
time or whatever language you prefer, I recommend listening to their podcast. It is
a great conversation about music with a lot of humor and knowledge about music. But
please wait a bit because you are right now with me and my podcast, The Iliac
Suite. So here's the intro, spoken by my friend Andreas Müller.
This is The Iliac Suite, a podcast on AI-driven music. Join me as we dive into
the ever-evolving world of AI-generated music, where algorithms become the composers
and machines become the virtuosos. Yes, this music and text were written by a
computer. And I am not real, but I am, and my name is Dennis Kastrup.
♪ I wanna hold the hand inside you ♪
♪ I wanna take a breath that's true ♪
♪ I look to you and I see nothing ♪
♪ I look to you to see the truth ♪
♪ You live your life, you go in shadows ♪
♪ You'll come apart and you'll go black ♪
♪ Some kind of night into your darkness ♪
♪ Colors your eyes with what's not there ♪
♪ Fade into you ♪
♪ Strange you never knew ♪
♪ Fade into you ♪
♪ I think it's strange you never knew ♪
♪ It comes on slowly ♪
♪ A stranger's heart without a home ♪
♪ You put your hands into your head ♪
♪ And then smiles cover your heart ♪
♪ Fade into you ♪
♪ Strange you never knew ♪
♪ Fade into you ♪
♪ I think it's strange you never knew ♪
Mazzy Star's "Fade Into You" with the voice of Lana Del Rey. A user called MapRey
uploaded the version to SoundCloud. It's a song I used to love when I was younger, and
this version is really well done. I love the Lana Del Rey voice on it. But to be
honest, these cover versions in which AI voices are used bore me a bit right now.
I mean, it was fun some months ago, but after many, many AI Elvis Presley and AI
Frank Sinatra tracks online, it is enough. So for this episode,
you will hear fewer of these versions. Instead, I used some AI-generated music from
Mubert to support the person speaking, and in this episode, it will be Jessica
Powell, CEO and co-founder of AudioShake, a company that separates stems in music
with the help of artificial intelligence. They have been around quite a while now,
and I had to think about them again when I heard about the new Beatles song "Now
and Then" or, as the Beatles would probably put it, "I get by with a little help
from AI." For the song, they also had to separate the voice of John Lennon from
the piano in the background. The technology of AudioShake does the same. The story
of how Jessica started AudioShake is fascinating and also stands for a lot of people
in the new AI music era. They all share a passion for music in the first place,
but Jessica tells you more about that. You know, I've worked in technology almost
all of my career, but music has always been a huge central part of my life,
either playing music, going to shows, interacting with people, building music
technology, being part of building music technology. It's been a kind of constant
throughout my both childhood and adult life. But I never actually thought of
combining it in a really, really specific way, meaning building something specifically
around or for audio. The way that that all happened was a lot more random,
2011 or so. I was living in Tokyo and I was doing a ton of karaoke along with
Luke, my co -founder. We did lots and lots of karaoke because it's kind of one of
the social things that you end up doing in Japan or at least back then, I think
still the case. And it's super fun, right? Like there is no better place to do
karaoke. But even then we were like, "Ugh, why can't we, like, karaoke old punk?
Why can't we karaoke old hip-hop? And why are these all re-records?" Like, the number
of nights where I drunkenly did "Wonderwall" or "Brown Eyed Girl" or
"California Dreamin'": great songs, but you know, we kind of wanted a little bit
more variety. And I was like, "Well, it'd be so cool if
you could actually just karaoke to anything, to the original songs." And that was
just like a conversation that we probably had like a handful of, like whenever we
would do karaoke, it would be like, oh, where's this song? Where's that song? Oh,
it didn't exist. And then, you know, you go on with your life 'cause this is not
like the thing that like, necessarily, you know, changes your life overnight where
you're like, I must go build this thing.
And so that was just a thought that had been in our heads for a really long time.
Meanwhile, I'm back at Google. I end up doing a whole bunch of different things at
Google, worked across a bunch of different areas. My last role at Google, in my
final years there, was solidly in communications. And I ended up running
communications for Google.
And when I left, I thought I was done with tech. Like, I thought I was never
going back to having meetings or thinking about like tech or regulation or any of
the things that you would have to grapple with every single day.
Never wanted to hear the term at scale again. Never wanted to have to think of, I
just didn't, I was done, I was really done. I thought I was gonna go do some sort
of cliched, like I don't know, go on a yoga retreat and find myself kind of thing.
But no sooner had I left than I started gravitating towards music and creating with
music, not with the idea of creating a company. It was just, I think, one of the
things that had
come at a cost to my professional life. And I'd had a great,
like, run at Google, and I was really lucky for all the opportunities I had and
learned a ton, but I also just worked a ton. And I had kind of stopped creating.
I had stopped playing music. I had stopped reading. I had stopped writing. Writing
was also something that was really important to me. And I'd stopped doing all those
things. And so when I left technology, that was the thing I wanted to get back
into. It was just, I wanted to feel more human again. And for me, feeling human is
like, it's something tied even if it's abstract to like creation. And so I got back
into all those things. I started playing piano again and I started writing again and
so forth. And as I was doing that, Luke was also kind of going through something
similar. He was at a FinTech company called Plaid and we were playing around with a
bunch of different ideas with some other friends too around music and different
things that could be done with music like remote collaboration and so forth.
And as we were in the process of doing that, we thought to ourselves, well, wait,
what if we went in the other direction and, rather than thinking about how to build up a
track, what if we could deconstruct it? Not necessarily in the strict, I guess,
technical sense of that, but just conceptually: like, how could we separate audio
into its components? And I actually remember when Luke first
said it to me, 'cause he, I think, was the first one to be like, "What if we went
in the other direction?" I was like, "Well, why?" And then he was like, "Well,
remember karaoke?" And then we remembered this complaint that we had. And there
was no exercise of sitting down and
being like, "Oh, this is what the market size could be if you split audio," none of
that, right? It was more just like, "Oh, let's go and build, like, a karaoke maker."
And it was entirely like a hobbyist thing. I think at the same time that we were
thinking about that, I was also trying to teach myself to bake bread. And I was
trying to like learn a new coding language. And also I thought that maybe I would
take up knitting. So like it was just to say that this was very, very casual. But
Luke had led data science at Plaid. He felt that the state of deep learning was at
a point where this could actually be done at a pretty high level of quality. And
so we started to work on it. And the first results were terrible. I remember we
separated Morrissey's voice from the Smiths and it sounded demonic because we had
done things wrong. But it was quite funny. But at the same time we were like,
"Wow, imagine if we had done this correctly." Like imagine this, this could sound
just a bit better, like what cool stuff you could do. And so it started this idea
of karaoke, but very quickly our heads started to spin with all these other ideas.
Like, oh, well, you could sample everything. Think about J
Dilla's Donuts and think about DJ Shadow and think about Public Enemy and all these
sort of seminal sampling albums and works,
right? And getting inspired by that. And then the thing that, you know, sitting in
the valley, at least, was probably a natural thing for us to think about was we
were like, oh, well, not only could you help with these existing kind of audio
tasks, but you could probably also help power totally new audio experiences at scale.
See, I said I was going to escape, like, that word and that idea. But of course,
you can, you know, take the girl out of technology, but you can't take the
technology out of the girl, I guess. And I immediately came back to, you know,
that concept again: like, oh, if you could standardize all this audio, what could you
do, really, in terms of entirely new audio experiences? And that's how AudioShake
started. So after many nights in karaoke bars and probably many hard-working
hours, AudioShake was born. But what exactly can you do with the AI? We separate
audio in order to make it more editable, accessible, interactive,
customizable. Basically, any kind of task where being able to get at the layers of
music or the stems would allow you to open that up to new possibilities. In the
new possibilities category, that can range from existing revenue streams today in the
music industry. So,
you know, helping labels create instrumentals so that they can represent catalog that
doesn't have stems, which is the majority of catalog. It could be working with film
studios to be able to
help them localize older content. So, for example,
one that we did is Doctor Who, the old BBC show. All they had was the final tape.
They didn't have the dialogue track or the music and effects track, and we were able
to separate those so that they could keep the music, keep those great sci-fi
effects, but get rid of the dialogue and replace it with a localized
dub. So those kinds of things, or we clean up audio for things like captioning. So
there's a ton of different workflows that range from very, let's say transactional
behind the scenes, making sure that your sports captioning is accurate,
even in a scene where there's a ton of crowd noise and a ton of music,
that the dialogue is being extracted for automated speech recognition and
transcription, those kinds of things which people might not even think about through
to more, say, consumer-facing experiences. Like, we just did something with AJR two
days ago where they had us and a producer trying to guess at what is in their music
and their current single, right? And trying to figure out how they composed that, you
know? We did something with Green Day a while back where they split the track "2,000
Light Years Away" and uploaded the vocals, the drums and the bass to TikTok in
one go, so that all their guitar-playing fans could become the guitarist in Green
Day. So you really have quite a range of cool experiences
that you can power when you can split audio up. So let's hear what Jessica is
talking about. She gave me permission to explain that to you with the example
of the band Famous Yesterday, who used AudioShake's service. There is a band called
Famous Yesterday. They had been together I think either coming out of high school or
in college. They subsequently broke up but they'd had a record deal with a label
called Bonfire and Bonfire was approached by Taco Bell for a possible sync license
for a commercial where they really wanted to use Famous Yesterday's song "Make You",
but they had no stems. And so they used AudioShake, they used our service AudioShake
Indie, I didn't even know about this at the time, they just uploaded it. And they
were able to get the instrumental and the acapella, which you'll hear now. And they
were able to land the sync, which was, I think, a lot of money and pretty exciting
for the band.
♪ I can't make you hate me, I had you all alone from the start ♪
♪ I can't make you love me, 'cause I got a grip on your heart ♪
♪ I can't make you hate me ♪
(upbeat music)
♪ Come close, you would be alright ♪
♪ When it's over, you would be all mine ♪
♪ There's your heart away somewhere in outer space ♪
♪ Locked into your eyes and we can never lose this feeling ♪
♪ Girl, I know what you feel like, let me take you to see the heart, yeah ♪
♪ I can make you love me, yeah, 'cause I got a grip on your heart, yeah ♪
♪ I can make you hate me, yeah, I had you all in love from the start ♪
♪ I can make you love me, yeah, uh ♪
♪ 'Cause I got a grip on your heart, yeah, uh ♪
♪ I can make you hate me, yeah, uh ♪
"Make You" from Famous Yesterday. You heard all the instruments and the singing
together, but the AI from AudioShake separated them. For that, several hundred
thousand songs from various musical genres were used for training purposes, including
classical music and jazz; rock and pop were most frequently represented.
And this is how it sounds when the stems are isolated. Bass.
(upbeat music)
Drums.
And what AudioShake calls "the other".
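To give you a rough idea of what this looks like from the outside: this is not AudioShake's technology or code (their models and platform are proprietary), but the open-source Spleeter library does the same kind of job and makes the concept concrete. A minimal sketch, assuming Spleeter is installed and using a placeholder file name:

# Minimal sketch using the open-source Spleeter library, NOT AudioShake's system.
# It illustrates the same idea: one mixed audio file goes in, separate stems come out.
from spleeter.separator import Separator

# The "4stems" model splits a mix into vocals, drums, bass and "other".
separator = Separator('spleeter:4stems')

# 'make_you.mp3' is a placeholder file name; one WAV per stem is written to stems_out/.
separator.separate_to_file('make_you.mp3', 'stems_out/')

Run on a full mix, that writes vocals.wav, drums.wav, bass.wav and other.wav, roughly the kind of output you just heard.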
OK, we now know this is working, but how does the AI actually learn to do this? I
think in an easy way. It's not the same technical approach at all, but conceptually,
I think an easy way to think about what we do is to think about searching for
something on your iPhone or Google Photos app where you want to search for pictures
of a beach and you type in beach and all of a sudden all these pictures come up
of the beach.
And it's not because Apple or Google were necessarily ever told that those were the
beach, right? They've trained on thousands of images of the beach, presumably, and
they've created a concept of what the beach is. Now,
to be clear, again, it's not the same technical approach at all before all the AI
people come at me, but I think conceptually, that's a really easy way for people to
understand because what we're doing is we're training, not on beaches, but we're
training on thousands and thousands of real stems. And that's teaching our models to
understand the different qualities of stems, right, of music, of a guitar,
of a voice. And then we're able to separate the data. Yeah, we license or acquire
our data, and so that's how we've always operated.
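For the technically curious, here is a very rough sketch of what "training on thousands of real stems" means in practice. This is not AudioShake's architecture, which they don't disclose; it's just a toy PyTorch model and one hypothetical training step, with random tensors standing in for licensed audio, to show the basic recipe: the model hears the full mix, predicts the stems, and is nudged towards the real, ground-truth stems.

# Toy sketch of how a source-separation model is trained; NOT AudioShake's model.
import torch
import torch.nn as nn

class ToyStemSeparator(nn.Module):
    """Maps a mono mixture waveform to n_stems separated waveforms."""
    def __init__(self, n_stems: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=15, padding=7),
            nn.ReLU(),
            nn.Conv1d(32, n_stems, kernel_size=15, padding=7),
        )

    def forward(self, mixture):            # mixture: (batch, 1, samples)
        return self.net(mixture)           # stems:   (batch, n_stems, samples)

model = ToyStemSeparator()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()

# One training step with random data standing in for real (mixture, stems) pairs.
mixture = torch.randn(8, 1, 44100)         # eight one-second mixes at 44.1 kHz
target_stems = torch.randn(8, 4, 44100)    # their ground-truth vocal/drum/bass/other stems

optimizer.zero_grad()
predicted_stems = model(mixture)
loss = loss_fn(predicted_stems, target_stems)  # how far predictions are from the real stems
loss.backward()
optimizer.step()

Repeat that over hundreds of thousands of licensed songs, and the model gradually builds an internal notion of what a voice, a drum kit or a guitar sounds like.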
Some months ago, I also talked with Fabian-Robert Stöter from the AudioShake research
team, and I wanted to know more about how the data sets of these AIs work and learn.
He told me something really interesting: there are still some classical instruments in
pop and rock music that don't work so well, because they are very versatile. An
electric guitar works quite well, but not nearly as well as a piano, for example,
simply because electric guitars, with amplifiers and all the effects you can add, are
just very, very versatile. There are so many variations, so for their AI it is harder
to learn how exactly a guitar sounds. The more they listen, though, the better they
get. And in the end, one question remains: who is this all
for? Which people use the service? We work with all three major label groups, a ton
of indies, lots of distributors, indie artists, major label artists, you know.
And what's cool is it's accessible to everyone, right?
On the same day, like yesterday, Sia posted something to YouTube for which we basically
created lyric transcriptions and something called word alignment, which is essentially
almost like a karaoke-style, word-by-word time stamping of a track. So we helped
her localize her videos into a ton of languages and had all the alignment and so
forth, so she could reach all of her fans locally. So the same day that Sia is
launching something like this, we might have an indie artist using our on-demand
platform to split their own track so that they can hopefully land a
sync deal or something like that.
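As an aside, the "word alignment" Jessica just mentioned for the Sia videos, time-stamping every word of a lyric, is something you can also approximate with open tools. Again, this is not AudioShake's pipeline; it's a small sketch assuming the openai-whisper package, with a placeholder file name, just to show what word-by-word timestamps look like:

# Rough sketch of karaoke-style word timestamps with openai-whisper,
# NOT AudioShake's lyric transcription and word alignment service.
import whisper

model = whisper.load_model("base")
# 'isolated_vocals.mp3' is a placeholder; in practice you'd feed the separated vocal stem.
result = model.transcribe("isolated_vocals.mp3", word_timestamps=True)

for segment in result["segments"]:
    for word in segment["words"]:
        # Each word comes with a start and end time in seconds.
        print(f'{word["start"]:6.2f} - {word["end"]:6.2f}  {word["word"]}')

Feeding it the separated vocal stem rather than the full mix usually makes this kind of transcription much more accurate, which is exactly the captioning clean-up use case Jessica described earlier.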
But yeah, I would say the most popular uses on a one-off basis, meaning not used by
apps and so forth but by artists and managers and labels: sync licensing is a real big
one. You need to have instrumentals to be able to land a sync, and a sync, for your
listeners who don't know, is when you're watching film, TV, a commercial, that kind of
thing. A lot of times, in the music you're hearing, the acapella is being lowered or is
not being used at all, because they don't want the acapella to distract from what
the actors are saying. So sync licensing is a big one. Creating immersive mixes, like
the Dolby Atmos format where you have the sound placed in different perceptual
fields. It feels like it's all around you. That's another big use case, right? You
need the stems, you need the guitar stem or the drum stem. You need to put them
in different places, basically.
Remixing, like we had on the current SZA album,
one of the songs has an ODB sample from an old VHS tape. And they used AudioShake
to get at that sample. Similarly, in the other direction, on a very, very famous album
that I can't name, the label used AudioShake to actually remove some of
the samples that they weren't able to clear before they put it on streaming
platforms. So you have it kind of going in both directions. Creating samples,
but all this is happening in a licensed context. Or removing samples that might be
problematic to clear.

♪ Hello darkness, my old friend ♪
♪ I've come to talk with you again ♪
♪ Because a vision softly creeping ♪
♪ Left its seeds while I was sleeping ♪
♪ And the vision that was planted in my brain ♪
♪ Still remains within the sound of silence ♪

♪ In restless dreams I walked alone ♪
♪ Narrow streets of cobblestone ♪
♪ 'Neath the halo of a street lamp ♪
♪ I turned my collar to the cold and damp ♪
♪ When my eyes were stabbed by the flash of a neon light ♪
♪ That split the night ♪
♪ And touched the sound of silence ♪

♪ And in the naked light I saw ♪
♪ Ten thousand people, maybe more ♪
♪ People talking without speaking ♪
♪ People hearing without listening ♪
♪ People writing songs that voices never share ♪
♪ And no one dared disturb the sound of silence ♪

♪ "Fools," said I, "you do not know ♪
♪ Silence like a cancer grows ♪
♪ Hear my words that I might teach you ♪
♪ Take my arms that I might reach you" ♪
♪ But my words, like silent raindrops, fell ♪
♪ And echoed in the wells of silence ♪

♪ And the people bowed and prayed ♪
♪ To the neon god they made ♪
♪ And the sign flashed out its warning ♪
♪ In the words that it was forming ♪
♪ And the sign said, "The words of the prophets are written on the subway walls ♪
♪ And tenement halls" ♪
♪ And whispered in the sounds of silence ♪
I told you at the beginning of this episode that my passion for all these voices
that have been changed into famous musicians who then sing songs by others has
declined or vanished, but I also told you that there are still some nice little
examples, like this one: Johnny Cash sings "The Sound of Silence" by Simon and
Garfunkel. I found this version by the user Vicarios, and it reminded me of an
article I recently read in Wired, in which Cash's manager, Jos Metas, was asked
about all these AI Cash versions online, all these covers, and I
quote him. I'm not sure that it was Johnny Cash's intent to have his voice
manipulated to sing "Barbie Girl." So he mentions the famous little piece that went
viral some weeks ago on TikTok and according to Wired, the manager added,
"The current crop of AI songs are parodies, and as the work of hobbyists, they are
not worth pursuing over any potential copyright claims." Which I think is a really
nice attitude towards it. In connection with AudioShake: their AI could theoretically
also help to separate or erase a voice from a track and then add a different voice
over it. Jessica, though, does not consider this a very interesting use.
- People use it, not AudioShake, but people use source separation all the time as
part of the workflow for vocal conversion. Yeah, absolutely, right? I mean, the sort
of infamous Drake and The Weeknd collab, the fake Drake song as they call it,
was done that way, right? They would have used vocal
isolation to get the vocals from Drake or from The Weeknd, and then you would do the
whole conversion process and everything. And yeah, sound separation is a part of
that. Because we don't have an on-demand platform where anyone can just split
anything, you wouldn't be able to come onto our platform and split Drake. You would
need to be on our enterprise platform and be one of the labels.
So we're not, I think, part of those workflows, but yes, it's very much applicable
there. We have worked on specific projects with labels or artists around doing that.
The things that I think have been kind of the coolest to see are actually the
things that are not actually necessarily tied so much to taking a specific person's
voice and changing it to someone else's or something like that. I think I'm more
interested in or inspired by uses where people are using isolation or removal as a
way to like build on top of it, but with original content.
So by that I mean, for example, someone taking a popular track, converting it to an
instrumental and then freestyling on top of that or singing on top of that. Or a
friend of mine had recorded a track that was a guitar and saxophone.
And this is moving away from vocals, but you could easily, this would be applicable
with a vocal example as well. He was interested in what would it sound like if
that saxophone hadn't been a saxophone, but in fact had been a voice. And
specifically he was interested in hearing a female voice on that track. And so like,
I remember helping him do that. And that was really cool. Like, what came
of that, what it sounded like as output,
wasn't necessarily something that you would use commercially, right? It wasn't totally
there. A lot of these technologies on the generative side are still very much works
in progress. So they're super impressive and they just are getting better and better
by the day. But what I thought was so cool about it and what he said was he was
like, "This is so interesting because in just a few seconds," it was, by the way,
a few seconds for him, but more time for me 'cause this stuff is not turnkey, but
anyway. But he was like, "In just a few seconds, I was able to reimagine my work.
And it made me wonder, maybe I should actually re-record this piece with a female
vocalist." Right? And it was giving him this sort of shortcut to something that would
have been very hard for him, as an indie musician, to take on himself,
right? Go out and find someone to, like he just wouldn't have done it. And it
wouldn't have been something he could play with because would he invest the money in
going and exploring the female vocal without even knowing whether that might be promising,
you know? And so I think that's where I get really excited: how do you
bring in, like, the human creativity piece of it? And how do you open up new sorts of
doors, or whatever clichéd new metaphors we will use to talk about AI? How do you,
literally, open up these possibilities to
people by using these tools? That's the kind of stuff I get excited about, more
than just saying, "Oh, hey, can we take Beyoncé's voice and change it into Ed
Sheeran?" I searched intensely. There is no AI cover that changes Beyoncé into Ed
Sheeran so far. So if you are into that stuff, go ahead, do it, send it to me and I'll
play it next time. But I used Mubert, the AI that generates music based on
prompts, and this is what came out when I put in the words "Ed Sheeran" and
"Beyoncé".
We're really excited about being able to help solve or contribute to solving this
really, really hard problem of separating audio and then combining our expertise with
people that are solving other problems, right? So if we can help be a piece of the
backend, just the audio component of someone's experience, that's super exciting for
us, right? We get to work across so many different areas, film and TV and music
and sports and transcription. Like it's really cool to see the things that people
bring. And like, I know it's funny, probably particularly to creatives,
to sometimes hear someone working in tech talk about how creative it is. But I
really think it is. Like, there's so much creativity that comes in creating a
technology. And then the thing I personally really enjoy is how do you bring what
you've built to other people and let them kind of go wild with it and do really
interesting things with it. So when I think about our roadmap and what we're trying
to do, we just want to make it easier and easier for people and developers, rights
holders, to artists to artists to create with audio, to build on top of audio,
and make it as fast and computationally efficient, which I know sounds super
uninspiring, but is super important with AI, and 'cause these models are just very
large. And so, how can you make it as easy in all kind of senses of the word for
people to create and build new things. And so if we can help be part of that
infrastructure,
that's really motivating for us. (upbeat music)
- Thanks to Jessica Powell for the last words about what AudioShake is aiming for with
their AI to separate stems. It was a pleasure to talk to you and dive into this
amazing technology that, in my opinion, will enhance the creativity of many musicians
in the coming years. I can't wait; we have seen it with the Beatles and we will see
more, I'm pretty sure. That was the new episode of The Iliac Suite. Thanks for
listening, humans. Take care and behave.
(upbeat music)