Basic Pitch can turn your vocals into sheet music


Before electronic music became an umbrella category for a distinct genre of modern music, the term referred to a production technique that involved transferring audio created by real instruments into waveforms that could be recorded on tape or played through amps and speakers. In the early to mid-1900s, electronic instruments and music synthesizers (computer-connected machines that can electronically generate and modify the sounds of a variety of instruments) began to become popular.

But there was a problem: almost every company used its own computer programming language to control its digital instruments, which made it difficult for musicians to combine instruments made by different manufacturers. So in 1983 the industry came together and created a communication protocol called the musical instrument digital interface, or MIDI, to standardize how external audio sources transmit messages to computers, and vice versa.

MIDI works like a set of instructions that tells the computer what instrument was played, what notes were played on the instrument, how hard and how long each note was played, and with what effects, if any. The instructions cover the individual notes of the individual instruments and allow the sound to be reproduced accurately. When songs are stored as MIDI files instead of as ordinary audio files (like an mp3 or a CD), musicians can easily change the track’s tempo, key, and instrumentation. They can also remove individual notes or entire instrument sections, change the instrument type, or duplicate a lead vocal track and turn it into a harmony. Since MIDI keeps track of which notes are played when and by which instruments, it’s essentially a digital score, and notation software can easily transcribe MIDI files into sheet music.
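To make that concrete, here is a minimal sketch, using the open-source Python library mido (our choice for illustration; it is not mentioned in the article), of what a MIDI "score" looks like as data: each message names the note, how hard it was struck, and when it starts and stops.

```python
# Minimal sketch: write a one-note MIDI file with the mido library.
import mido

mid = mido.MidiFile()
track = mido.MidiTrack()
mid.tracks.append(track)

track.append(mido.Message('program_change', program=0, time=0))       # instrument: acoustic piano
track.append(mido.Message('note_on', note=60, velocity=80, time=0))   # middle C, struck moderately hard
track.append(mido.Message('note_off', note=60, velocity=0, time=480)) # released after one beat of ticks

mid.save('one_note.mid')  # the resulting file stores instructions, not audio
```

Because the file stores these instructions rather than a waveform, editing a note later is as simple as changing the numbers in a message.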

[Related: Interface The Music: An Introduction to Electronic Instrument Control]

Although MIDI is convenient for many reasons, it generally requires musicians to have some sort of interface, such as a MIDI controller keyboard, or knowledge of how to program notes by hand. But a tool made available to the public this summer by engineers at Spotify and Soundtrap, called Basic Pitch, promises to simplify this process and open it up to musicians who lack specialized equipment or coding experience.

“Similar to how you ask your voice assistant to identify the words you say and make sense of the meaning of those words, we use neural networks to understand and process audio in music and podcasts,” Rachel Bittner, a Spotify research scientist who worked on the project, said in a September blog post. “This work combines our ML research and practices with domain knowledge about audio – understanding the fundamentals of how music works, like pitch, tone, tempo, frequencies of different instruments, and more.”

Bittner envisions the tool as a “starting point” transcription that artists can create in the moment, saving them the hassle of writing down notes and melodies by hand.

The open-source tool uses machine learning to convert any audio into MIDI format. You can see it in action here.
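For readers comfortable with Python, here is a hedged sketch of how the open-source package can be called; the function names and return values reflect the project's documentation as we understand it and may differ between releases, and the file names are hypothetical.

```python
# Sketch: convert an audio recording to MIDI with Spotify's basic-pitch package.
from basic_pitch.inference import predict

# Hypothetical input file for illustration.
model_output, midi_data, note_events = predict("my_vocal_take.wav")

# midi_data is a PrettyMIDI object; writing it out yields a standard .mid file
# that notation software can open as sheet music.
midi_data.write("my_vocal_take.mid")
```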

[Related: Why Spotify’s music recommendations always seem so spot on]

Previous research in this space has made the process of building this model easier, to some extent. There are devices called Disklaviers that record piano performances in real time and store them as MIDI files, and there are many paired audio recordings and MIDI files that researchers can use to train algorithms. “There are other tools that do many parts of what Basic Pitch does,” Bittner said on a podcast. “I think what makes Basic Pitch special is that it does a lot in one tool, rather than having to use different tools for different types of audio.”

Additionally, an advantage it offers over other note-detection systems is that it can simultaneously track multiple notes from more than one instrument. That means it can transcribe, for example, guitar and vocals at the same time (here’s a paper the team published this year on the technology behind it). Basic Pitch can also handle effects such as vibrato (a trembling on a sustained note), glissando (sliding between two notes), and bends (fluctuations in pitch), thanks to a pitch bend detection mechanism.

To understand the components of the model, here are some basics about music: perceived pitch corresponds to the fundamental frequency, the lowest frequency at which a vibrating object (like a violin string or a vocal cord) oscillates. Music can be represented as a collection of sine waves, each with its own particular frequency. Most sounds we hear as a single pitch also contain other, harmonically spaced tones above the fundamental. The difficult thing pitch-tracking algorithms have to do is lump all of those extra tones back into one pitch, Bittner noted. The team used a constant-Q transform with harmonic stacking to model the structure of tonal sound by harmonic, frequency, and time.
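A small sketch (not from the article) illustrates the problem: a “single” note contains energy at integer multiples of its fundamental frequency, and a pitch tracker has to fold those harmonics back into one perceived pitch. The frequencies and amplitudes below are arbitrary choices for demonstration.

```python
# Sketch: one note, many harmonics.
import numpy as np

sample_rate = 22050
t = np.linspace(0, 1.0, sample_rate, endpoint=False)
f0 = 220.0  # fundamental frequency of an A3, in Hz

# Sum the fundamental plus weaker harmonics at 2*f0, 3*f0, and 4*f0.
note = sum((1.0 / k) * np.sin(2 * np.pi * k * f0 * t) for k in range(1, 5))

# A naive peak picker sees strong energy at 220, 440, 660, and 880 Hz,
# but a listener perceives a single 220 Hz tone.
spectrum = np.abs(np.fft.rfft(note))
freqs = np.fft.rfftfreq(note.size, d=1.0 / sample_rate)
print(freqs[np.argsort(spectrum)[-4:]])  # the four harmonic peaks
```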

The Spotify team wanted to make the model fast and low-power, so it needed to be computationally cheap and do more with fewer inputs. That means the machine learning model itself had to have relatively few parameters and layers. Basic Pitch is based on a convolutional neural network (CNN) that uses less than 20MB of peak memory and has fewer than 17,000 parameters. Interestingly, CNNs were one of the first models shown to be effective at recognizing images. For this product, Spotify trained and tested its CNN on a variety of open datasets covering vocals, acoustic guitar, piano, synthesizers, and orchestra, across many musical genres. “In order to allow for a small model, Basic Pitch was built with a harmonic stacking layer and three types of outputs: onsets, notes, and pitch bends,” Spotify engineers wrote in a blog post.
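As a rough illustration of what such a compact network might look like, here is a hedged sketch in TensorFlow with three output heads for onsets, notes, and pitch bends. The layer sizes, names, and input shape are our assumptions for demonstration, not Spotify's actual architecture.

```python
# Sketch: a tiny CNN with three per-pitch, per-frame output heads.
import tensorflow as tf

def tiny_transcriber(n_freq_bins=264, n_frames=None):
    # Input: a harmonically stacked, constant-Q-like spectrogram (shape assumed).
    spec = tf.keras.Input(shape=(n_frames, n_freq_bins, 1), name="harmonic_cqt")

    x = tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu")(spec)
    x = tf.keras.layers.Conv2D(8, (3, 3), padding="same", activation="relu")(x)

    # Three outputs in the spirit of the blog post's description.
    onsets = tf.keras.layers.Conv2D(1, (3, 3), padding="same",
                                    activation="sigmoid", name="onsets")(x)
    notes = tf.keras.layers.Conv2D(1, (3, 3), padding="same",
                                   activation="sigmoid", name="notes")(x)
    bends = tf.keras.layers.Conv2D(1, (3, 3), padding="same",
                                   activation="sigmoid", name="pitch_bends")(x)

    return tf.keras.Model(spec, [onsets, notes, bends])

model = tiny_transcriber()
model.summary()  # only a few thousand parameters, well under the ~17,000 cited above
```

Keeping the convolutions small and sharing them across all three heads is one way a model can stay under tight memory and parameter budgets.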

[Related: Birders behold: Cornell’s Merlin app is now a one-stop shop for bird identification]

So what’s the benefit of using machine learning for a task like this? Bittner explained on the podcast that they could create a simple representation of pitch using audio clips of an instrument being played in a room over a microphone. But machine learning allows them to discern the same underlying patterns even when working with different instruments, microphones, and rooms.

Compared to a 2020 automatic multi-instrument music transcription model trained on data from MusicNet, Basic Pitch had greater precision when it came to detecting notes. However, it performed less well than models trained to detect notes from specific instruments, such as guitar or piano. Spotify engineers recognize that the tool isn’t perfect, and they’re eager to hear community feedback and see how musicians use it.

Curious to see how it works? Try it here—you can record sounds directly from the web portal or upload an audio file.
