19: Transcribe Music with Klang.io
This is an automated AI transcript. Please forgive the mistakes!
This is the Iliac Suite, a podcast on AI-driven music. Join me as we dive into
the ever-evolving world of AI and music, where algorithms become the composers and
machines become the virtuosos. Yes, this music and text was written by a computer,
and I am not real, but... I am, and my name is Dennis Kastrup.
Hello, humans! Five, zero, zero, zero, zero, zero, zero, zero, zero, zero, zero,
zero dollars. Let that sink in. In words: five hundred billion dollars. And when I
say this to you, you probably realize pretty quickly that I'm talking about
Stargate, the investment of three major AI companies that happened last week. To
name them in person: OpenAI with CEO Sam Altman, SoftBank with CEO Masayoshi Son,
and Oracle with Chairman Larry Ellison. All three were standing next to President
Trump, while Elon Musk was raging about it in the media afterwards. 500 billion.
Okay, I get it, because that is the future and it makes total sense to invest in
AI, but more and more I get the feeling that we forget about the urgent problems
of our times, in societies in which people are struggling with everyday life.
I mean, shouldn't that be the first investment? Says the person talking about AI in
this podcast. This is The Iliac Suite. A new year about AI and music is starting
today. Hello to all of you, I'm happy that you are still with me after 2024 in AI
and music. And it starts with a quote that shook the music world. Suno AI CEO
Mikey Shulman said in the podcast 20VC, quote: "It's not really enjoyable to make
music now. It takes a lot of time. It takes a lot of practice. You need to get
really good at an instrument or really good at a piece of production software. I
think the majority of people don't enjoy the majority of the time they spend
making music." And because it was not enjoyable for me to search for the real
quote in that podcast, because it takes a lot of time to listen to all that, I
just took an AI to reread it for you. So, is it a scandal, what he just said? Of
course not. What did he say? I mean, he is just
selling his product. Imagine a washing machine producer saying, "Oh, people should
really wash their clothes by hand because it is so much fun." That would be
stupid. I mean, he would literally tell people to stop buying his product.
Although washing your clothes by hand is really no fun, I can tell you this. And
making music is, on the other hand, so much fun for people who love it. Of course,
Shulman knows this, and of course he knows that when he says what he said, the web
goes wild and talks about it all over the place. Everybody wants to discuss this.
Another free advertisement for Suno. Talking about Suno: finally, after some words
before, the German GEMA sued Suno and Udio because it claims both companies copied
works it represents. The GEMA is the German version of ASCAP, the organization
that collects and distributes royalties for songwriters, composers, and music
publishers in the United States. Let's see where this will lead us in the end. To
be continued. I'll keep you updated here. But what do I have for you today in the
Iliac Suite? I was hanging around the wonderful Reeperbahn Festival in Hamburg
last September, a showcase festival for a lot of great bands, where you also talk
about new trends in the music industry. And it happened on a warm late summer
evening that I met Sebastian Murgul. He started a company that transcribes music to
notes. I wish I had recorded that conversation back then because it was so
fascinating and interesting. Well, we did this conversation again. We tried to copy
it again. So this is the Iliac Suite about Klang.io, the company of Sebastian.
This is the recording of our conversation. And this is Sebastian. He tells you more
about Klang.io.
Yeah, I'm the CEO and co-founder of the startup Klang.io. We are currently based in
Karlsruhe, Germany. And what we are doing is we are using artificial intelligence to
transcribe music from audio into sheet music, so written-down, notated music. This
means you can just upload an audio file or record yourself and get sheet music
right away, within just a few seconds.
Well, what was my way to get there? I'm kind of a hobby musician myself. I had
guitar lessons from when I was six, for like 12 years or so, and I was doing more
of the classical guitar stuff and flamenco and so on. So music was always very
important for me. I'm also a technician, because I studied electrical engineering
here at the KIT, the university here in Karlsruhe. And I was always wondering how
music could be combined with signals, with computer science. Because in the end,
music is not only art, but also math. And you can calculate music. There's a lot
of algorithmic music out there, and this was always
something that was very interesting to me. But the initial idea for Klang.io was
actually inspired by my little sister, who plays the keyboard. It was in, like,
2015 when she got her new keyboard, and on that piano there was a little melody
stored that she could just listen to with a playback function, but there was no way
to display the notes that were played or the keys she should hit. So she asked me
if I could help her, and as an electrical engineering student who does not have
perfect pitch, I took an oscilloscope app, recorded parts of the musical piece, and
then used it to measure the frequencies that were used, calculate the notes'
pitches, and write all of that down onto music paper. This whole process took me a
lot of hours, and when I finally got there, I asked myself: isn't there an easier
way to achieve that? And this was the way my passion for music transcription
started. I searched the app stores for something similar, but I didn't find any app
that could do that at that time. But I also heard about something like the Fourier
transform, which is a mathematical method to figure out the frequencies that are
present in a signal, an audio signal. I got myself in there and learned more about
signal processing, about the math behind it, and it all got me hooked, and I just
got lost in that topic. And then the whole AI thing came up. It was in 2017, 2018,
when the first AI deep learning frameworks came up, and I thought to myself, is
there a way to connect these two topics? And it's very interesting: a lot of
research has accelerated since then, and a lot of new AI models came up, and they
all became available by, for example, being open source or being proposed in papers
that I could read as a student. And it's a very interesting topic.
Yeah, very interesting.
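For readers who want to see the manual process Sebastian describes in code, here
is a minimal sketch: measure the strongest frequency in a recording with a Fourier
transform, then convert it to a note name. It is not Klangio's method. The function
names, the naive DFT, the 440 Hz reference for A4, and the synthetic test tone are
all assumptions made for illustration.

```python
import cmath
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest bin of a naive DFT."""
    n = len(samples)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):  # skip the DC bin, stay below Nyquist
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


def frequency_to_note(freq_hz, a4_hz=440.0):
    """Map a frequency to the nearest equal-tempered note name, e.g. 'A4'."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))  # MIDI note 69 is A4
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


# Synthetic stand-in for a recorded keyboard tone: a pure 440 Hz sine wave.
sample_rate = 2000
tone = [math.sin(2 * math.pi * 440.0 * i / sample_rate) for i in range(400)]

freq = dominant_frequency(tone, sample_rate)
print(freq, frequency_to_note(freq))  # prints: 440.0 A4
```

A real recording contains overtones and noise, so picking the single strongest bin
is only a starting point; a practical tool would use an FFT library and a more
robust pitch estimator.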
And first of all, for listeners to understand what Klang, K-L-A-N-G, means: it
means sound in German. And it's called Klangio, with "io" at the end. So it's
connected to sound. I just wanted to point this out, because it's really
interesting that you use that word. So this application is now running. What were
the challenges from the moment on when the AI was coming up? Maybe to summarize it
in a simple way: how did you step forward from that moment on? I also know you have
different products you are offering, and we'll get into this a little bit later.
What were the challenges, and how were you able to face those challenges and now
be able to offer the product you have?
Since music transcription is a very complicated and very complex problem, the
first thing I did with my signal processing background was divide this problem
into sub-problems that I could solve. So we don't have an end-to-end model that
gets audio and outputs sheet music. We have an AI system that solves different
subtasks: for example, identifying sounds and when they were played, determining
the pitch of these sounds, these notes, and analyzing the rhythmic structure of the
music. So, for example, which time signature do we have, what is the tempo, when
does a musical bar start, specifically as a time in seconds. This all gets
synchronized and fused with harmonic information, like the musical key and the
chords that were used, and all this information then gets fused into readable and
playable sheet music. So we don't have one model that does all of that. We have a
lot of models that solve these sub-problems.
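The modular structure described here can be sketched roughly as follows. This is a
hypothetical illustration, not Klangio's architecture: the class names, the
`transcribe` and `fuse` functions, and the hard-coded stand-in stages are all
assumptions, and the harmonic analysis (key and chords) is left out to keep it
short. The point is only that note detection and rhythm analysis are separate,
swappable stages whose results are fused onto a bar and beat grid.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Note:
    onset_s: float     # when the note starts, in seconds
    midi_pitch: int    # pitch as a MIDI note number


@dataclass
class Rhythm:
    tempo_bpm: float
    beats_per_bar: int
    first_bar_s: float  # time of the first bar line, in seconds


@dataclass
class ScoreNote:
    bar: int           # 1-based bar number
    beat: float        # 1-based beat position inside the bar
    midi_pitch: int


def fuse(notes: List[Note], rhythm: Rhythm) -> List[ScoreNote]:
    """Place detected notes on the bar/beat grid from the rhythm analysis."""
    beat_len_s = 60.0 / rhythm.tempo_bpm
    score = []
    for note in notes:
        # Quantize the onset to the nearest quarter of a beat.
        beats = round((note.onset_s - rhythm.first_bar_s) / beat_len_s * 4) / 4
        bar, beat = divmod(beats, rhythm.beats_per_bar)
        score.append(ScoreNote(int(bar) + 1, beat + 1, note.midi_pitch))
    return score


def transcribe(audio, detect_notes, analyze_rhythm) -> List[ScoreNote]:
    """Run independent stages and fuse their results; each stage is swappable."""
    return fuse(detect_notes(audio), analyze_rhythm(audio))


# Stand-ins for trained models: they ignore the audio and return fixed results.
def fake_piano_notes(audio):
    return [Note(0.0, 60), Note(0.5, 64), Note(2.0, 67)]


def fake_rhythm(audio):
    return Rhythm(tempo_bpm=120.0, beats_per_bar=4, first_bar_s=0.0)


score = transcribe(None, fake_piano_notes, fake_rhythm)
for entry in score:
    print(entry)
```

Supporting another instrument in this sketch means passing a different
`detect_notes` function while the rhythm stage and the fusion step stay the same.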
And this is also the reason why we are able to just swap out these building blocks
and create new applications for new instruments. For example, we have an app for
piano, Piano2Notes, we have an app for guitar, Guitar2Tabs, but there is also an
app for drums, Drum2Notes, and drums are a whole different sound. You don't have
the harmonic sound with a fundamental frequency and overtones that are integer
multiples of the fundamental frequency, so these horizontal structures that you can
see in your spectrogram. A spectrogram is an image that shows time in the
x-direction and pitch in the y-direction. And this is the foundation of audio
analysis. You always, or in most cases, first calculate these images and then do
something similar to image processing, which made a lot of big jumps in the 2010s.
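To make the spectrogram idea concrete, here is a small sketch under stated
assumptions: a synthetic tone with a 200 Hz fundamental and two overtones at
integer multiples, cut into frames and analyzed with a naive DFT. The sample rate,
frame size, and threshold are arbitrary illustrative choices, and a real analysis
would use an FFT library and apply a window function to each frame.

```python
import cmath
import math


def spectrogram(samples, frame_size, hop):
    """Magnitude spectrogram: one list of frequency-bin magnitudes per frame."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        mags = []
        for k in range(frame_size // 2):
            coeff = sum(s * cmath.exp(-2j * math.pi * k * i / frame_size)
                        for i, s in enumerate(frame))
            mags.append(abs(coeff))
        frames.append(mags)
    return frames


# Harmonic test tone: fundamental plus overtones at 2x and 3x its frequency.
sample_rate = 1600
fundamental = 200.0
samples = [sum(math.sin(2 * math.pi * fundamental * h * i / sample_rate) / h
               for h in (1, 2, 3))
           for i in range(256)]

spec = spectrogram(samples, frame_size=64, hop=32)
bin_hz = sample_rate / 64  # frequency resolution of one bin

for mags in spec:
    peaks = [k * bin_hz for k, m in enumerate(mags) if m > 1.0]
    print(peaks)  # every frame prints [200.0, 400.0, 600.0]
```

Each frame is one column of the image. Because the same three frequencies are
strong in every column, they form the horizontal lines mentioned above, spaced at
integer multiples of the fundamental. A drum hit would not show this pattern.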
Okay, it's really interesting to listen to you, because yes, I think music is very
complex. Music also has so many layers. Let me see if I got this right, because
there are so many models working on this. First of all, when there is, for
example, a piano piece, yes, you can identify the piano notes and put them into
notes. Is it also possible to put in a whole song and get the notes out of it?
Because a whole song has different layers of music: piano, guitar, bass, whatever.
Is that possible, and if so, do you have to separate the different instruments
before?
That's a lot more difficult. The easiest thing you could have is just a piano
playing a tune, and maybe it's a monophonic melody, so only one note is playing at
a time, and you figure out the notes for that. When you have not just a piano but
also vocals and drums and bass, it gets very complicated. That's something that we
were not able to solve for a long time, but finally in December we launched our new
app, the Klangio Transcription Studio, which enables you to do just what you said:
input a whole song and get the score with the parts for each instrument. In just
one score, in just one step. You don't have to do source separation in advance; you
can just input the mix and get your sheet music out of that. But what's always
possible is to do some source separation before, with tools like Moises, so you get
the stems for your instruments, and then upload these stems to our apps and get the
sheet music from there.
So take us by the hand a little bit. Let's say I have a melody, I have something I
want to transcribe. How does it work practically when you go on your app or your
homepage? What do you have to do?
Let's say you have your melody, your music, as an MP3 file. You just go onto our
website, where there's a drop field where you can just drag and drop the audio
file, for example in the new Transcription Studio. Then you get guided through our
transcription wizard, where you can select the instruments that you want to
transcribe. Then you can enter whether it's a solo or a multi-instrument recording,
because we do some advanced filtering in advance if it's a multi-instrument
recording. Then, depending on the musical instrument, there is some more
information you can input. For piano we have two types of models, one more
optimized for classical sheet music, one more optimized for pop music, and you can
enter whether you want more detailed sheet music with, like, pedal activity, or
whether you're just interested in the notes, the bare notes. If you're using
guitar, there are even more advanced options: you can choose whether you have a
fingerpicking recording or a strumming recording, you can define the tuning that
should be used and whether there's a capo, and then the AI takes over from there
and does the rest for you.
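The choices the wizard collects could be modeled as a small settings object. The
sketch below is purely hypothetical: Klangio's real API or data model is not
described in this conversation, so every class name, field name, default value, and
validation rule here is an assumption made to illustrate the options mentioned
above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class PianoStyle(Enum):
    CLASSICAL = "classical"
    POP = "pop"


class GuitarTechnique(Enum):
    FINGERPICKING = "fingerpicking"
    STRUMMING = "strumming"


@dataclass
class PianoOptions:
    style: PianoStyle = PianoStyle.CLASSICAL
    include_pedal: bool = False  # False means "just the bare notes"


@dataclass
class GuitarOptions:
    technique: GuitarTechnique = GuitarTechnique.FINGERPICKING
    tuning: Tuple[str, ...] = ("E2", "A2", "D3", "G3", "B3", "E4")  # standard
    capo_fret: int = 0  # 0 means no capo


@dataclass
class TranscriptionRequest:
    audio_path: str
    instruments: Tuple[str, ...]
    multi_instrument: bool = False  # solo vs. multi-instrument recording
    piano: Optional[PianoOptions] = None
    guitar: Optional[GuitarOptions] = None

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the request is fine."""
        problems = []
        if not self.instruments:
            problems.append("select at least one instrument")
        if len(self.instruments) > 1 and not self.multi_instrument:
            problems.append("several instruments need a multi-instrument recording")
        if self.guitar is not None and not 0 <= self.guitar.capo_fret <= 12:
            problems.append("capo fret should be between 0 and 12")
        return problems


request = TranscriptionRequest(
    audio_path="song.mp3",
    instruments=("guitar",),
    guitar=GuitarOptions(technique=GuitarTechnique.STRUMMING, capo_fret=2),
)
print(request.validate())  # prints: []
```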
And who is it for now? Who needs a sheet of notes for music that's probably already
out there? People who already have the sheets don't need it, that's for sure. But
who needs that kind of software, then?
We thought it was more of a niche application, and only a few musicians, maybe
composers, needed it. But once we launched it and got feedback from our users and
were able to identify the users, we found out that there are actually a lot of
people interested in music transcription. Most of our users are hobby musicians
who want to find the sheet music, the notes, for existing songs, and in most cases
the sheet music isn't available anywhere. So you would either have to write down
the notes yourself or use an app like ours to figure out the notes for you. But
there are also professional musicians, artists, music producers, transcribers, and
music teachers who are using our app. For example, music teachers play a specific
exercise for their students, get the sheet music, give that to the students, and
they can practice at home.
I'm just wondering: is it legal to do this? I'm just asking in the sense of, is it
legal to take any piece of music and transcribe the notes? I have no idea, to be
honest. It's just a simple question I never thought about. If I listen to somebody
and I write it down, I don't know what I can do with it. Is there something legal
going on around this that I'm not aware of? You can tell me right now.
That's a very important question, and of course we have some lawyers and legal
advisors who did some research on this for us. In the end, it comes down to this:
if you're using the transcription for yourself, so privately, it doesn't matter if
you own the copyright or not. If you want to publish the transcription, you need to
have the copyright for the sheet music. So for us, it's like we are a tool that you
can use to create the sheet music, and in our terms of use we say, okay, you have
to own the copyright if you want to publish it, but if you just want to use it
privately, you can just do so.
I've talked to so many artists in my life, and some of them keep telling me, "Oh,
you know what, I've been a musician for 30 years, but I never learned to play from
notes. I never did this in my life." And I'm sometimes really surprised, because
those are really well-known musicians. Is that also something for these kinds of
musicians, that they say, "Okay, I can finally see what I'm doing"? Is it
interesting for them to use the application to see their notes and their piece of
art in a different way?
Absolutely. We make music visible, and you don't only have the sheet music as a PDF
or something like that; you have a digital music score, and you can also view it in
alternative ways. There's a viewer for piano rolls in our apps, so you can just see
the note blocks falling down on the keyboard and you see where you have to press.
So if you don't read sheet music, you can also use that. And of course, a lot of
guitarists don't know sheet music and are more into tablatures. That's why we also
support tablatures. But a lot of people come to us and ask us if we would like to
support some alternative views on that music. For example, for a flute you could
have a Guitar Hero-like view where note blocks fly over the holes and you know
when to press, and you can practice interactively. But to be honest, as a business,
of course it comes down to what is actually used by most people and who is willing
to pay for that feature.
Do people still exist who do this as a job? Like, transcribe music? I have no idea.
Can you help me there?
Yes, yes, yes. There are big companies that do this, most of the time together with
freelancers who listen to the music for you and write down the notes. There is also
a huge section on Fiverr, and I know that some people on Fiverr actually use our
tools to create a first version, then they do some fine-tuning, make it a little
bit prettier, and then they sell it to their customers.
You just described how probably so many people use AI these days. They use it for a
first version, and then they work it over again for the perfect version, the human
touch. On the other side, that means that there are some mistakes in your sheets,
right?
Yes, there are definitely some mistakes in our sheets. It strongly depends on the
quality of your recording and on the musical complexity. For training our AI, we
currently rely entirely on synthetic data that we generate ourselves from scratch.
So the realism of our training data is somewhat limited. What we are currently
trying to do is build some sort of data set as an investment form, where musicians
can just record themselves with some given lessons or given sheet music. Then we
take these recordings, analyze their quality to see if they are suited for our data
set, and if they are, they get included in our data set, and the musicians then get
a revenue share on the app where the model is used.
What's very interesting about that is that we don't need big artists, we don't need
professional musicians; we need real recordings. That's why we collaborate with the
music university here in Karlsruhe, where a lot of students are glad to have a job
where they get some more money to finance their studies.
That's something that I find very interesting: to include musicians, or the music
of musicians, in a data set and let them have a share of the model that you create
with it. For me, data sets are a lot of the time, I don't know, millions of songs
to train something, and you tell me you don't need that many, right?
No, because we do this extensive pre-training with our synthetic music, we just
need fewer but more realistic and more diverse recordings of real humans playing.
Oh, well, okay. Well, that's a big advantage. Talking about real humans: is it also
possible, if I sang something right now, that you can transcribe it?
Absolutely. With our app Sing2Notes, that's no problem. To be honest, our app is
really, yeah, honest with you. And if you don't hit the pitches, it will tell you
so.
Okay, so what does that mean? I sing, and then it looks at it and tells me
afterwards, or does it tell me while I'm singing that I'm not a good singer?
No, no. You sing, you record it with the app, and then you get the sheet music from
it. That's the whole thing. There is also a playback, so you can listen back to the
original audio and to the transcription and then compare whether it really is the
same for you. Singing is very difficult, and hitting the right pitch is also very
difficult. I'm not sure if you're a trained singer, are you?
No, I will not sing for you, that's for sure. I was just imagining, because this is
something I thought about: this app is really nice, because there are so many
musicians who record themselves, and they do this, I don't know, at 3 o'clock going
home from a party when the melody comes in. These days that's beautiful, because
you can just sing into your phone, and with that they can already put it down into
notes. And maybe, because of this, it's easier for them to create a whole sheet so
that other musicians can play the music and already join in to play a big thing,
which is a nice example of how to use your sheets.
Actually, there was a music producer in LA who reached out to us who had some music
notes. It was like 3,000 music notes of him playing the piano over the last 20
years. He reached out to us and said, "Hey, I have some melodies, some ideas, and I
collected them over the last 20 years. But now I want to create a new album. Can
you help me figure out the MIDI notes so I can produce it with my DAW software?"
And he sent us all the recordings. We returned the MIDI files, and then he created
his new album. It was a very cool story.
And is there something in the application that works better than the others? Let's
say piano, melody, I'm just looking at the home page here. Drum notes are different
from singing. Which one works the best? What is the easiest for an AI to
transcribe?
Piano is absolutely the easiest, because your instrument is tuned by a
professional, so it's in tune, and you have clear transients. It's a clear hit when
you hit the key. Yeah, that's it, I think. The only thing that's more difficult in
piano music is that you can play more tones at the same time. For example, in
guitar music you can play only six strings at the same time with a standard guitar.
With a piano, you could in theory play 88 keys at the same time. But whether that
makes sense in a musical way, I'm not sure.
And when you have these transcriptions, can you also work on the sheet then, like
rearranging the notes because you realized, okay, this is not the right note?
Yeah, we actually integrated a sheet music editor in our apps, which helps you fix
the most common mistakes that our AI makes. One example: there might be an overtone
of a note sounding at the same time that we mistakenly identified as a note, which
you can just delete with one click. Another example: sometimes the AI does not put
the pickup bar at the right position, and the whole sheet music is shifted by,
like, one beat before or after, and you can correct that with just a click. And
then you get your MusicXML file, which you can just download and use with the
other sheet music tools that you use, and you can do whatever you want.
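The two corrections mentioned here can be sketched in a few lines. This is an
illustrative guess at what such editor actions might do, not Klangio's
implementation: the `Note` class, the function names, and the rule for spotting
overtone candidates are assumptions. The intervals used are the standard ones for
the second, third, and fourth harmonics (an octave, an octave plus a fifth, and two
octaves above the fundamental).

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Note:
    onset_beats: float  # position in the piece, counted in beats
    midi_pitch: int


# Semitone distances of the 2nd, 3rd and 4th harmonic above a fundamental.
OVERTONE_INTERVALS = (12, 19, 24)


def shift_beats(notes: List[Note], beats: float) -> List[Note]:
    """Move every note by the same number of beats (fixes a misplaced pickup)."""
    return [replace(n, onset_beats=n.onset_beats + beats) for n in notes]


def overtone_candidates(notes: List[Note]) -> List[Note]:
    """Notes that start together with a lower note and sit on one of its overtones.

    These are only candidates: a deliberately played octave looks the same,
    so a human still decides which ones to delete.
    """
    candidates = []
    for note in notes:
        for other in notes:
            same_onset = other.onset_beats == note.onset_beats
            if same_onset and note.midi_pitch - other.midi_pitch in OVERTONE_INTERVALS:
                candidates.append(note)
                break
    return candidates


notes = [Note(0.0, 48), Note(0.0, 60), Note(1.0, 52), Note(2.0, 55)]

suspects = overtone_candidates(notes)
cleaned = [n for n in notes if n not in suspects]
shifted = shift_beats(cleaned, 1.0)

print(suspects)  # the C4 (60) sounding with C3 (48) is flagged
print(shifted)
```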
A couple of questions ago you mentioned that there are existing companies who work
in that field of transcribing notes. How do you feel about this? Because this is an
ongoing question in every kind of field concerning AI, whether you translate
something, make music, make music for films, or generate pictures. How do you feel
personally about this? I mean, I know you're doing your business, but is that
something that, not concerns you, but did you think about the fact that you take
away jobs with your work?
I do think about that quite regularly, to be honest. I spoke to a lot of musicians,
artists, and also transcribers who do that for a living and asked them, "Hey, what
do you think about that? Is that something that worries you, or is it something
where you say there are enough transcriptions in the world so that everyone can do
their share?" And I think right now there's a little quality difference between
professional transcribers and AI apps: the professionals can provide the perfect
sheet music, while the AI delivers very fast and only to a certain quality level.
And right now it's a perfect combination, because the professional transcribers can
use our tool to create a first version, then do the fine-tuning and deliver that to
their customers, so they can transcribe faster. Typically it takes a professional
transcriber like eight hours or so for a three-minute piano recording, so it's a
very time-consuming process, and if they can reduce the time with our app to, like,
half an hour or one hour, it's a huge time saver for them. So it's a win-win
situation. But if you think more into the future, and if our app were able to
create really perfect transcriptions with no mistakes, I think the jobs will of
course shift a little bit, and of course it's one way a musician could make money
that gets reduced.
But on the other hand, you create jobs for, I don't know if you'd call it jobs, but
for the musicians who play the music. I think it's not a well-paid job. It's just
some add-on money, I guess, right? It's not a job to play for you and give you the
sounds.
Yeah, it's add-on money. It's more like you can invest with your music into the
model and you get your share, and maybe that's something that cancels out. But of
course, this offer is somewhat locally limited. You need to know that we are
collecting recordings, and you need to come to us, or we have to work with you, and
there are definitely not enough jobs or recordings that we would like to make for
every musician out there.
Talking about the future: maybe you can also tell me, are there some instruments
that you think are really difficult for you? I mean, you have those on your page,
that's okay. But maybe there are some other instruments, because the world is full
of instruments. And maybe also whether there is a difference between genres. Is it
easier to put down Schlager piano than, I wanted to say heavy metal piano, but that
doesn't really exist. But you know what I mean? Are there instruments that you want
to do and are not able to do right now? And also in musical styles: did you think
about whether maybe music from India is more difficult to transcribe? Is there some
relation there or not?
There are some instruments that are more difficult than others, like, for example,
the violin, because a violin does not have that clear transient like a piano or a
guitar. There are also genres that are more complex, like you mentioned, heavy
metal: metal guitars are very fast playing, a lot of distortion, and that's really
difficult. There are also challenges when it comes to produced songs, because there
is a lot of mixing, a lot of filters that you apply to your instrument, and right
now it works best if you have an acoustic version, a clean recording with all the
instruments that is not heavily edited. But in EDM, for example, that's not the
case. It's hard to identify what is the bass or what is the lead instrument, and
it's not the real instrument that you have to detect but more the functional
instrument: is it a low tone, or is it some sort of higher, catchy melody that is
playing? But when we look a little bit further: right now we are focusing on
Western music. There is also makam music in the Arabic world, there is all the
"bid-deh-gah". Not half-tones, but microtonality. It's what is called
microintervals, where you don't have half-tone relations between your tones, but,
like, seven pitches within a whole-tone step. That's something which is very common
in the Eastern world, and that's something that a lot of applications don't look
at.
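A short calculation shows why a transcription system built around half-tones
struggles here. The sketch below places seven pitches inside one whole-tone step
and measures how far each lies from the nearest half-tone. The equal spacing is an
assumption made only for illustration (actual makam tunings are not equally
spaced), and the function names and the A4 = 440 Hz starting point are likewise
arbitrary choices.

```python
import math


def cents_between(f_low_hz, f_high_hz):
    """Interval size in cents (100 cents = one equal-tempered half-tone)."""
    return 1200 * math.log2(f_high_hz / f_low_hz)


def pitches_within_whole_tone(base_hz, count):
    """`count` pitches, equally spaced in cents, strictly inside a whole tone."""
    step_cents = 200 / (count + 1)
    return [base_hz * 2 ** (step_cents * i / 1200) for i in range(1, count + 1)]


def distance_to_semitone_grid(cents):
    """How far (in cents) an interval lies from the nearest half-tone multiple."""
    remainder = cents % 100
    return min(remainder, 100 - remainder)


a4 = 440.0
for freq in pitches_within_whole_tone(a4, 7):
    cents = cents_between(a4, freq)
    off_grid = distance_to_semitone_grid(cents)
    print(f"{freq:7.2f} Hz  {cents:6.1f} cents above A4  off-grid by {off_grid:4.1f}")
```

Under this assumption only one of the seven pitches (the one 100 cents above A4)
falls on a note that standard Western notation can write without extra
accidentals. The others sit 25 or 50 cents away from the nearest half-tone, so
rounding them to the nearest piano key would change the music.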
Okay, yeah, so there are worldwide differences in the music which are challenges
for your application. That's interesting. And did you ever check if it can
transcribe AI music well?
That works surprisingly well. If you put in something from Suno or Udio, you get
really good results with the Klangio Transcription Studio. That's something that
might be a cool application in the future.
Okay. But I was thinking it works well because Suno and other AIs just generate
kind of clear sounds, in the way that they have learned clear sounds. That's what I
was thinking, you know: they just have these, I don't know, mathematical rules
which they apply, and of course an AI that knows these mathematical rules knows how
to apply them to other AIs. So that's why it is not so surprising for me. But you
say it surprised you.
It surprised me because I thought there are a lot of artifacts in the music,
because it's not perfectly generated. I think, especially for vocals, you're
sometimes able to hear these artifacts, and it sounds a little bit strange to the
ear. Of course, they got better over the last months. But I think it's very funny
that an AI creates music for an AI and that's the use case.
Well, that's what we are heading to: AI using AI for AI to use AI. AI, AI, AI.
That's where we're heading; that's what I see on the horizon. One last question I
have, I'm really curious about this. Let's talk about the great pianists in the
world. Is it more challenging for your AI to transcribe the better you play? Let's
say a Glenn Gould piece. Is it more difficult the more complex you play, or the
better you play, or is it easier because they just play so well?
There are two sides that you have to look at. On the one hand, music by these
world-class pianists is very complex, and they are flexing. They do cool stuff, and
it's very complex and complicated and difficult. But on the other hand, they have
the ability to perform these pieces perfectly, so it is very tightly played. This
makes it easier for the rhythmic analysis, but since it might be faster, with more
tones played at the same time, it might be harder to detect the notes that were
played. On the guitar, on the other hand, if you look at flamenco players, they are
able to play very tight and very clean, which makes it a lot easier than an amateur
guitar player playing an easy piece. So it really depends on the musical
instrument.
What is your view of the future, of what you want to transcribe? I mean, we already
touched on it a little bit. You said you launched this studio where you can put in
a whole song, but are there some things on the horizon which would be interesting?
Personally, right now, I'm just about to finish my PhD. My PhD is focused on guitar
music transcription, so that's something that I put a lot of effort into,
transcribing all the nuances of acoustic guitars. And when I finish this project, I
can focus more on other new stuff. Something that I personally find very
interesting is choral music, with multiple singers singing at the same time. EDM is
also very interesting, where you have more of a functional view on the music and
you display other stuff. But also supporting more instruments, especially more
niche instruments like, for example, trumpet or other brass instruments, would be
very interesting. So there's a lot to do in getting broader, supporting more
instruments, but also going more in depth and supporting more features, more
details in that specific kind of music.
Good luck with that.
Thank you.
That was the new episode of the Iliac Suite. Thanks, Sebastian Murgul, for talking
to me about Klangio. Go and check them out. They are doing a really, really good
job, and just to be fair, there are also other AIs out there which do the same
thing. But I highly recommend Klangio. If you have any feedback, if you want to
tell me something, if you're mad, if you're sad, if you're happy, if you just want
to talk to me, write me an email. The address is mail@theiliacsuite.com,
mail@theiliacsuite.com. I'm looking forward to hearing from you, what you think,
what you want to say. And also follow this podcast, like, share, and tell your
friends, maybe also your neighbors next door. Ring at their door, bring them a
beer, and have a nice chat, I mean a real chat with a real person. This way you
will also have a nice evening, and you can tell them: listen to the Iliac Suite if
you're interested in AI and music. That's it for today. Take care and behave,
humans.