MusicLM: Google's new AI tool can turn text, hisses and hums into real music

Estimated read time: 3 min

Google researchers have revealed a music synthesis AI that creates songs up to five minutes long.

In a paper setting out their work and findings so far, the team has presented MusicLM to the world, along with a number of examples that sound strikingly faithful to their text prompts.

The researchers claim their model “outperforms previous systems both in audio quality and adherence to the text description”.

The examples are 30-second excerpts from the generated songs, presented alongside the captions that produced them, such as:

  • “The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds, like cymbal crashes or drum rolls”.
  • “A fusion of reggaeton and electronic dance music, with a spacey, otherworldly sound. Induces the experience of being lost in space, and the music would be designed to evoke a sense of wonder and awe, while being danceable”.
  • “A rising synth is playing an arpeggio with a lot of reverb. It is backed by pads, sub bass line and soft drums. This song is full of synth sounds creating a soothing and adventurous atmosphere. It may be playing at a festival during two songs for a buildup”.

Using AI to generate music is nothing new – but a tool capable of generating passable music from a simple text prompt had yet to be introduced. Until now, that is, according to the team behind MusicLM.

In their paper, the researchers explain the various challenges facing AI music generation. First, there is a shortage of paired audio and text data – unlike in text-to-image machine learning, where they say huge datasets have “significantly contributed” to recent advances.

For example, OpenAI’s DALL-E tool and Stable Diffusion have both generated renewed public interest in the field, as well as immediate use cases.

An additional challenge in AI music generation is that music is structured “along a time dimension” – a music track unfolds over a period of time. It is therefore much harder to capture the intent of a music track with a basic text caption than it is to caption a still image.

MusicLM is a step towards overcoming these challenges, the team says.

It is a “hierarchical sequence-to-sequence” model for music generation, using machine learning to generate token sequences at different levels of a song, covering aspects such as structure, melody, and individual sounds.
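To make that staged pipeline concrete, here is a minimal, runnable Python sketch of the idea. Every component is a toy stand-in: the paper’s actual stages (a MuLan text embedding feeding semantic tokens, then acoustic tokens decoded by the SoundStream codec) are replaced with random placeholders, and the token rates are assumed for the demo only – MusicLM itself has no public API.

```python
import numpy as np

SEMANTIC_RATE = 25    # semantic tokens per second (assumed for this demo)
ACOUSTIC_RATE = 600   # acoustic tokens per second (assumed for this demo)
SAMPLE_RATE = 24_000  # output audio sample rate

rng = np.random.default_rng(0)

def embed_text(prompt: str) -> np.ndarray:
    """Stand-in for a joint music/text embedding (MuLan in the paper)."""
    return rng.normal(size=128)

def generate_semantic(cond: np.ndarray, n: int) -> np.ndarray:
    """Stage 1 stand-in: coarse tokens for long-term structure and melody."""
    return rng.integers(0, 1024, size=n)

def generate_acoustic(cond: np.ndarray, semantic: np.ndarray, n: int) -> np.ndarray:
    """Stage 2 stand-in: fine tokens for timbre and acoustic detail."""
    return rng.integers(0, 1024, size=n)

def decode_audio(acoustic: np.ndarray) -> np.ndarray:
    """Stand-in for the neural audio codec decoder (SoundStream in the paper)."""
    return rng.normal(size=len(acoustic) * (SAMPLE_RATE // ACOUSTIC_RATE))

def generate_music(prompt: str, seconds: int = 30) -> np.ndarray:
    cond = embed_text(prompt)                                    # condition on text
    semantic = generate_semantic(cond, seconds * SEMANTIC_RATE)  # song structure
    acoustic = generate_acoustic(cond, semantic, seconds * ACOUSTIC_RATE)
    return decode_audio(acoustic)                                # tokens -> waveform

waveform = generate_music("a calming violin melody backed by a distorted guitar")
print(waveform.shape)  # (720000,) samples, i.e. 30 s at 24 kHz
```

The design choice the sketch mirrors is that each stage only has to model one timescale: long-range structure is settled before any fine-grained audio detail is produced.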

To learn how to do this, the model is trained on a large dataset of unlabeled music, as well as MusicCaps, a dataset of more than 5,500 music-caption pairs prepared by musicians. The captions dataset has been made public to support future research.
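For those who want to browse the released captions, the short sketch below loads them with the Hugging Face `datasets` library. The mirror name “google/MusicCaps” and the field names are assumptions based on the published CSV schema; the release contains captions only, with each row pointing at a ten-second YouTube clip rather than bundled audio.

```python
# Peek at the public MusicCaps captions (pip install datasets). The Hub
# mirror name and column names below are assumptions based on the
# published CSV; audio is not included and must be fetched separately
# from the referenced YouTube clips.
from datasets import load_dataset

caps = load_dataset("google/MusicCaps", split="train")
print(len(caps))             # roughly 5,500 expert-written examples

row = caps[0]
print(row["ytid"])           # YouTube ID of the ten-second source clip
print(row["caption"][:160])  # free-text description written by a musician
```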

The model also accepts audio input, such as a whistle or hum, to help inform the melody of the song, which is then “rendered in the style described by the text prompt”.
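As a toy illustration of that melody conditioning, the fragment below extends the earlier sketch: a hummed waveform is mapped to a melody embedding and joined with the text embedding to form the conditioning signal. Both embedding functions remain random stand-ins rather than the paper’s actual components.

```python
import numpy as np

rng = np.random.default_rng(1)

def embed_text(prompt: str) -> np.ndarray:
    """Stand-in for the joint music/text embedding used earlier."""
    return rng.normal(size=128)

def embed_melody(waveform: np.ndarray) -> np.ndarray:
    """Stand-in for a melody embedding that ignores timbre (e.g. humming)."""
    return rng.normal(size=128)

hummed = rng.normal(size=24_000 * 10)  # 10 s of hummed or whistled input
cond = np.concatenate([embed_text("played by a string quartet"),
                       embed_melody(hummed)])
# `cond` would replace the text-only conditioning in the earlier sketch,
# so the generated song follows the hummed melody in the requested style.
print(cond.shape)  # (256,)
```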

It has not yet been released to the public, with the authors acknowledging the potential risks of “creative content misappropriation” if a generated song did not differ sufficiently from the source material the model learned from.
