09: "Fairly Trained" datasets!
This is an automated AI transcript. Please forgive the mistakes!
This is The Iliac Suite, a podcast on AI-driven music. Join me as we dive into the ever-evolving world of AI-generated music, where algorithms become the composers and machines become the virtuosos. Yes, this music and text was written by a computer, and I am not real. But... I am, and my name is Dennis Kastrup.
Hello humans, here we are again. Welcome to another episode of the Iliac Suite. How are you all doing? Still keeping pace with what's going on in the world of AI? It has been a wild ride since 2022, when Midjourney shocked us all and ChatGPT came and said: hello, I am your best buddy now. This speed is crazy, also in the world of music. I'm honest with you: I'm currently working on a big article about AI and music for a German organization. There are so many subjects to cover that I always have the feeling that once I open one door, another door appears on the horizon. It is a never-ending story. Sing it, Freddy!
Sorry, I'm a kid of the 80s. I love this film with Bastian Balthazar Bux and his white luckdragon Falkor. Music like this brings back memories, even if it was the AI version of Freddie Mercury and not the original one by Limahl. Talking about never-ending stories: one thing that has crossed my path in the last years, every time I talk about AI and music, is the question: is it an infringement of copyright when an AI learns with data that comes from music? The big companies say no. The artists, of course, say yes. And as the governments and courtrooms will probably take years to answer this question, we stay in the grey zone. One man did not want to continue like this: Ed Newton-Rex. Last year, he left Stability AI, whose AI can be used to generate music, among other things. And he left it with the argument that these companies steal too much from artists by using their data without paying them. Now he has launched Fairly Trained to counter this, a platform on which companies can get certified for training only on fairly licensed data sets. We will dive into this today here in the Iliac Suite. I talked with Ed about that a couple of weeks ago and wanted to know from him how the idea for Fairly Trained came up.
- This idea came about pretty soon after I left my previous job, which was as VP of Audio at Stability AI. I left that job because I disagreed with the generative AI industry's stance on training on copyrighted work without consent. And I got a little bit of press around the fact that I left, which was really very surprising and a little bit weird. But what it meant was that I ended up having a lot of conversations about this with a lot of people. And in particular, I spoke to a number of journalists who were asking me, "If you're saying there are ways of training models that are more ethical, where are they? Which are they?" On the other side of things, I was talking to a lot of AI companies who got in touch with me and said, "We're really glad you're saying what you're saying, because this is the approach we take. We take a fairer approach to training models." And it really struck me that it would be quite easy and hopefully helpful to essentially put in place a simple certification that just showed people which those more fairly trained models would be. So yeah, it's really those conversations that led to this coming about.
And one of these companies is Soundful, which generates songs for you in the genre you want. You can adjust the harmonies, also major or minor, and the speed, and seconds later there's your song. We know this from other generative AI companies. Later in the show you will also hear the founder, Diaa El All, whom I had the chance to ask about their participation in Fairly Trained. Until then, a lot of the music will come from their AI.
So, let's talk a little bit more about Fairly Trained with Ed. What is it exactly?
Fairly Trained is a non-profit that certifies generative AI companies for fairer practices around how they train their generative models. In particular, our first certification, which is called the Licensed Model certification, can be acquired by any generative AI company that has an AI model that is not trained on copyrighted work taken without consent. So we give a certification essentially to companies who go and license their training data, instead of doing what has become the common practice among some of the bigger tech companies, which is scraping data, using as much training data as they can find, and doing so without any compensation to the rights holders behind that training data, or even any consent from the rights holders. So what we do is we certify those AI companies if they get consent. And the reason we focus on consent is because we believe that as long as you have that stage of consent, you as a rights holder, let's say you're a large music rights holder, you have the chance to then negotiate the terms that work for you, which is how we think this should work.
I do hope the same thing, but I fear this will not happen anytime soon, unfortunately. And because of this, Fairly Trained is, in my opinion, a good step towards a solution. But once we have maybe decided it is not okay to use music material without consent, the question will come up: how do we pay these people? How much should an artist get if his or her, let's say, 100 songs are in a data set and a generative AI creates a song? It is impossible to say how much influence a portion of music has in a data set of millions of songs. In this context, the term "fixed rates" is something interesting to consider, I think. I have discussed this lately quite often with my friends. Let's say AI companies have to pay a yearly amount. This money is then distributed to the artists whose work they use in the dataset. But as said, who gets what? My suggestion, my idea?
As every song is tagged with something like a description, a mood, a feeling, a genre, maybe it would make sense to distribute the money at the end of the year to the artists whose hashtags have been used the most to generate songs. We would have a top 100 of hashtags; each rank would get some percentage of the whole, which is then distributed to the musicians attached to it. Does that make sense? I know this is a very simple idea for a very complex problem, and I'm sure once you dive into this more there will be other problems, but we have to start somewhere. What do you think about this? Do you think that's a good idea? Do you think it's a bad idea? Do you think this will never happen? I'm always curious to know what you think. I'm interested in your ideas. You can reach out to me via the mail address that you will find on my homepage. Write me a mail if you have any ideas. Search for Dennis Kastrup. You will find me there.
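To make the hashtag idea a bit more concrete, here is a minimal, purely hypothetical Python sketch of how such a yearly payout could be split. The function name, the even per-tag split among artists, and the toy numbers are all my own assumptions for illustration; nothing like this is actually proposed by Fairly Trained or anyone in this episode.

```python
from collections import Counter

def distribute_payout(pool, generations, tag_to_artists, top_n=100):
    """Split a yearly payout pool across artists, weighted by how often
    their tags were used to generate songs (the hashtag idea from the show)."""
    # Count how often each tag appeared across all generated songs this year.
    tag_counts = Counter(tag for tags in generations for tag in tags)
    top = tag_counts.most_common(top_n)           # the "top 100 hashtags"
    total = sum(count for _, count in top)
    payouts = {}
    for tag, count in top:
        tag_share = pool * count / total          # this tag's slice of the pool
        artists = tag_to_artists.get(tag, [])
        for artist in artists:                    # split evenly among tagged artists
            payouts[artist] = payouts.get(artist, 0.0) + tag_share / len(artists)
    return payouts

# Toy example: two generations used #pop, one used #ambient.
gens = [["pop"], ["pop"], ["ambient"]]
artists = {"pop": ["Alice", "Bob"], "ambient": ["Carol"]}
print(distribute_payout(3000.0, gens, artists))
# → {'Alice': 1000.0, 'Bob': 1000.0, 'Carol': 1000.0}
```

Even this toy version shows where the open questions are: how to weight ranks, and how to split a tag's share among the many musicians attached to it.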
Let's come back to our problem with the use of music in data sets. Fairly Trained is offering a certification that gives the promise that your AI is trained only on data that is there with the consent of the musicians. And then you can put this as a logo on your homepage. But what is the process to get this logo? The certification process involves a written submission; that's the core of the process. We give you as an organization a bunch of questions about your model and your training data to answer, and you answer them. And that includes listing your sources of training data. It includes sharing information on any licenses to that training data that you have, any links to public domain or Creative Commons data that you might be using as well. And on top of that, we also ask for information about your processes to ensure that your own internal guidelines around training data are actually followed by your researchers, your engineers, and other people at your company. So essentially, this is a trust-based exercise. We ask for a bunch of information from you. We take it confidentially. We don't share that with anyone. We think that a trust-based system for certification works at this stage of where the generative AI industry is. In general, people are pretty open when they go and scrape data. Mostly they're saying publicly that it should be an acceptable thing, in their minds. So in general, you do have quite a big divide between the companies who go and scrape data and think it's fair use, in the States at least that's what it's referred to as, and the companies who don't and go and license. And so we think the trust-based model works. We may update that in the future, but right now it seems to be working pretty well. We have rejected some applications, and I think the mark of any good certification is that not everyone passes, so that's a good sign. There are many good things about Fairly Trained, but one is that
they also have fair prices if you want to be part of it. There's a certification fee. It's there to cover our costs. We're not a profit-making organisation. We're a non-profit. We try to keep the fees pretty low. There's a submission fee for everyone who applies. All companies who apply pay somewhere between $150 and $350 to submit. And then if you successfully get certified, there's an annual certification fee that ranges from $3,000 to $6,000. You're then re-certified annually; we just check you haven't changed the way you use training data in a way that would mean you could no longer get certified. So there's that annual fee. We think it's pretty reasonable. And hopefully it just covers our costs, which are relatively limited at the moment. I'm the only person working on this full-time right now. But, you know, we hope to expand, and we hope this becomes a more important part of the ecosystem. That was Ed Newton-Rex about the pay model for Fairly Trained, which is clearly not aiming to make a lot of money. If it is not the money, what drives him to do this project? What is it? I did it because, honestly,
I just think it's really important. I think right now is a very, very critical time for creators. And I think we need, you know, I'd like more people to understand that there is a difference of approach here. It's not that everyone takes the approach that some of the larger AI companies do, you know, in exploiting creators' work without consent like that. Not everyone does that. And I think it's really important that consumers understand that. I think it's really important that companies who are using generative AI understand that. And I also think it's really, really important that regulators, people writing the laws and making sure they're enforced, I think it's really important that they understand this as well. It's tempting to believe large AI companies when they say licensing is impossible, we can't license training data, but it's not true. There are companies already doing this. I myself did this when I was at Stability AI. All these certified companies are doing this as well. Really, the point of this is to show that it is possible. There is another way to do this. I hope that we make as much impact as we can. My hopes for the future of this space are that we settle on a middle ground where generative AI, which can be a very powerful technology and which I'm a massive supporter of, where generative AI manages to have a very powerful future, but it does so in a way where licensing training data is the norm. That's what we're working towards, but we won't achieve that on our own. It takes lots and lots of different efforts, from lobbying to court cases, to people building great products. It takes all kinds of things, but hopefully we can contribute to that in some small way. Talking about these companies, the ones taking part in Fairly Trained are currently Beatoven.ai, Boomy, Bria.ai, LifeScore, Rightsify, Somms.ai, Tuney and Soundful. So if you are thinking about generating music the
next time with an AI, and you also want to think about the artists, use the companies just mentioned. No more remorse, just pure joy playing with the tool. The logo, by the way, looks like an F and another F turned upside down next to each other, a quality seal, maybe a bit like the ones you know from supermarkets when you buy coffee. You see it and you know that you are supporting good working conditions. One of the companies I just mentioned is Soundful, an AI music generator which I tried out and from which some of the music in this episode comes, like this one. As I'm a kid of pop, I used the prompt "emotional pop in F minor". Pretty good, I think. I was impressed. I wanted to know more from Diaa El All, co-founder and CEO of Soundful. So how exactly does Soundful use its data? - So we took a little bit of a
different approach. My background: I grew up a classically trained pianist, studied sound engineering and music production. I was into sound design and was a sound designer. And then after that, I was a touring artist, a DJ and producer, signed to some major labels and other labels out in Europe in the dance music world. And then I went the entrepreneur route, and that resulted in Soundful about seven years later; that's where Soundful came about. And we took a little bit of a different approach: we took an approach of training our models on music theory rules. So the same way you teach an artist or a musician how to play an instrument, we build different models mimicking the human way of playing an instrument, you know, guitar, drums, et cetera. And second, our sounds we've developed in-house. We've acquired libraries, as well as hired contractor sound designers, to sample them in a very specific way to hit certain thresholds that we would require for our models. So everything that we've done, we've built in-house. And the other side of the company is when we're working with the artists or producers or labels, rights holders: we build the models that will mimic or bottle the artist's sonic DNA. We work with them hand in hand building that model. Even though Soundful builds the model, we took a little bit of a different approach: the artist or the rights holder owns it, even though we build it, and we have the rights to use it for them to monetize on it, and actually they share the majority of the revenue from it.
Of course I also asked Diaa what he thinks about Fairly Trained. I think it's the absolute best step moving forward. The problem right now with the world in general is that every part of the world has their own rules and views on AI. Some here in the US, they're saying, "Hey, if it's fully generative, you cannot copyright it." And there are rights issues, setting aside the whole scraping of data and all of that stuff. That's a whole different, you know, discussion. But just the views: in some places in Europe, they're saying, no, if it's generated by AI and I as a user created it, then I can own the copyright for it. So I think Fairly Trained really did a really nice job of taking the first leap forward: let's just first solve the problem of, is this ethically trained, is it doing it the right way and protecting the rights holder, rather than just looking at what is copyrightable or not, or what the output is, if the output is good or not; that doesn't matter. Let's just look at the core of the foundation of all of these generative companies, and are they doing it ethically or not. And by doing that first step, that actually opens up doors and makes it easier when you're a company talking with the major labels or social platforms or even CPG brands or gaming companies. It really brings in saying, "Hey, there are some people that already did the work and looked into whether what is really in your build has been done fairly or not." And that should be the first real check mark for working with any companies: you've done this right, you have the right data sets. Okay, let's proceed forward with the next step of the discussion; not necessarily getting a deal done, but just the next step of the discussion. And actually, before talking with Ed and Fairly Trained, when working with the labels and working with the artists, the first question was always: what data sets did you use to train the sounds? Do you have the rights for these sounds? That was always the first step. So it's really nice to see somebody taking the leap, you know, the first leap of faith, and saying, hey guys, it's a nonprofit, I'm going to do this to make sure people are doing it the right way of protecting the rights holder. Which is why, right away when Ed reached out and presented that to me, I was like, I'll be the first person to raise my hand and be a part of this.
So if you want to be part of this as a user, you can of course take a look now, each time you use a generative AI, if that logo of Fairly Trained is in there. But on the other hand, one important thing is also that we talk about it, just like now. It is so easy to just click online on some buttons and then let music magically appear. But this magic is based on the artworks of artists and musicians, and we have to talk about this. And I'm not sure if everybody out there knows this. The majority of the people now, as users, they just don't care. And it's actually our job, anybody that is building that technology, or somebody like Fairly Trained, it's our job to educate the market and educate the users that you should care, for the people that don't care. And that's why having something like Fairly Trained and saying, hey guys, we're, you know, certified by Fairly Trained. Well, what does that mean? And write about it and read about it: why is it very important to you to protect what you are creating and using moving forward? There's a lot of people that don't think of it that way, and it's our responsibility first to protect the rights holders. And second is to educate the users also that it's coming from the right place and you should be looking out for that. And that has just been a testament, given, you know, ChatGPT or DALL-E and, you know, Midjourney and all of them.
But it does not only lie in the hands of the users. Of course, the players in the field who also work with AI companies have to be aware of what is going on there. My hope is that when companies are vetting other companies to collaborate and partner with, that should be the first thing, really the stamp of approval: that these guys are doing it right. And then, by doing that, that forces other companies that maybe are not doing it the right way to maybe rethink the way that they're doing it, and redevelop their, you know, their process, or give the rights holders their rights and their share of what they're using as, you know, openly trained or, like, you know, fair use. So that's really my hope: that this is a step forward, that companies should really think long and hard, that there is a way for technology advancements and the rights holders, it is possible and it is doable for them to meet kind of at an intersection, and it's not just always black or white.
Perfect last words for this episode of the Iliac Suite. There is hope, but we should not let it die. Which means: use artificial intelligence responsibly, and change will follow. In other words, we all have the power to change things if we take action. And right now is the time, because everything is so new and fresh that we can still form the future. If you like this episode, I would be happy if you start following the Iliac Suite on any streaming platform, so you know when a new episode will be out. Feel free to reach out to me; you will find all the information online on where to contact me. I am heading out for South by Southwest in some days; drop me a line if you want to meet me there. That's it. Thanks again, Ed and Diaa, for talking to me. Take care and behave.
[Music]
(upbeat music)