Lukas Tuggener draws a couple of curves on a sheet of paper. They look a bit like altitude lines on a map of Switzerland. However, the research he wants to explain with the sketch does not revolve around the Eiger, Mönch, and Jungfrau mountains, but rather around Mozart, Bach, and Beethoven. The Zurich University of Applied Sciences specialist for object recognition is working together with the start-up ScorePad on a project designed to revolutionize the digitization of sheet music. Their objective: To transform sheet music into a machine-readable format, publish it as an app, and open up completely new possibilities for professional musicians.
Florian Seibold, Chief Technology Officer at ScorePad, explains: “Just imagine: The conductor only has to tap on a bar in the musical score – and each tablet computer in the orchestra pit immediately displays the correct position, tailored to the individual musician: The cellist, for example, sees only the parts of the score and the comments that are relevant to her.”
If a piece of music needs to be transposed – for example because the vocalist cannot reach the highest notes – all it takes is a tap on the screen. Thanks to machine-readable scores, the app could ultimately even be programmed so that it can play or listen along to music. “Then, the trumpeter will no longer have to turn the pages by hand. The app will recognize which part he is playing and automatically display the corresponding position in the musical score.”
Tablet computer instead of reams of paper
Digital sheet music is also handy for professional musicians in a more general sense: Instead of hauling reams of paper around, all they need is a tablet. This is already possible today, but so far it has relied on digital copies of scores, i.e. image files or PDF files. Lukas Tuggener and Florian Seibold’s project, in contrast, is about machine-readable music. This is the basis that allows software to understand the sheet music at the semantic level.
Lukas Tuggener compares the principle to text documents: “You can print a text, take a photo of it, and then send the photo to someone via WhatsApp. The person who receives the message may be able to read the text, but it is simply a cluster of pixels. Alternatively, you can send it as a Word document, which makes it possible for the recipient to edit the text.”
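To make the contrast concrete, here is a minimal Python sketch of what a machine-readable note might look like and why transposition then becomes a one-step operation. The `Note` class, the use of MIDI pitch numbers, and the `transpose` helper are illustrative assumptions, not ScorePad's actual data format.

```python
from dataclasses import dataclass

# Hypothetical symbolic representation: pitch as a MIDI note number
# (60 = middle C) and duration in quarter-note beats.
@dataclass(frozen=True)
class Note:
    pitch: int
    beats: float

def transpose(notes, semitones):
    """Return a new list with every pitch shifted by the given number of semitones."""
    return [Note(n.pitch + semitones, n.beats) for n in notes]

# C4, E4, G4 as quarter notes, then a half-note C5.
phrase = [Note(60, 1.0), Note(64, 1.0), Note(67, 1.0), Note(72, 2.0)]

# Lower the phrase by a whole tone (two semitones), e.g. for a singer
# who cannot reach the highest notes.
lowered = transpose(phrase, -2)
print([n.pitch for n in lowered])  # [58, 62, 65, 70]
```

With an image or PDF, the same change would require re-engraving the page, because the file contains only pixels and no pitches.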
Automated digitization thanks to deep learning
Just as it is possible to type up a text from a physical manuscript, sheet music can also be digitized by hand: By painstakingly entering each note, each rest, and each clef into music notation software. For an A4 page of moderately difficult music, this process takes an experienced musician approximately half an hour. This soon adds up: “Spring” from Vivaldi’s “Four Seasons” alone, for example, consists of almost 30 pages. This is why the researchers and app developers are working on massively accelerating this process by automating large parts of it.
The key is an artificial neural network that uses deep learning methods to teach itself how to read sheet music and determine its content. But this is not as simple as it might sound. “Conventional deep learning approaches to object recognition were essentially developed to analyze your holiday photos. These usually contain one to five dominant elements: A palm tree, a ship, a cocktail, a face, and a deck chair, for example,” Lukas Tuggener explains. The deep learning algorithm overlays the image with a kind of grid and determines for each grid cell whether it contains an object. “However, since each sheet of music contains hundreds of very small objects, the grid would have to be extremely fine-meshed to locate each one. This would require vast computing capacities and would not speed up the digitization process at all.”
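A rough back-of-the-envelope calculation illustrates the scaling problem. The page size (an A4 scan at 300 dpi, about 2480 × 3508 pixels) and the two cell sizes are assumptions chosen for illustration; the article does not give the actual figures.

```python
import math

def grid_cells(width_px, height_px, cell_px):
    """Number of square grid cells needed to tile an image."""
    return math.ceil(width_px / cell_px) * math.ceil(height_px / cell_px)

# Assumed scan resolution: A4 at 300 dpi.
WIDTH, HEIGHT = 2480, 3508

# Assumed cell sizes: a coarse grid for a few large objects, and a
# fine grid meant to keep small neighbouring symbols in separate cells.
coarse = grid_cells(WIDTH, HEIGHT, 256)
fine = grid_cells(WIDTH, HEIGHT, 16)

print(coarse, fine, fine // coarse)  # 140 34100 243
```

Under these assumptions, the fine grid has more than two hundred times as many cells to evaluate as the coarse one.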
Completely new approach towards object recognition
Here the altitude lines from Lukas Tuggener’s sketch come back into play. He explains:
“Instead of using a grid, the first step our neural network takes is to draw an object map. The musical score is thus transformed into a kind of relief map: The higher the ‘mountain’, the closer you are to the center of an object.”
“The second step is the ‘cut’: Wherever the ‘mountain’ exceeds a certain height x, there is an object,” he continues. In this way, you achieve a binary result: object or no object.
The combination of map and vertical cut is a completely new approach to object recognition. It is called Deep Watershed Detection and its great advantage is that it makes no difference whether there are two objects on one page or two hundred. Finally, the neural network assigns each detected object to one of the object classes it has previously learned – a sixteenth rest, quarter note, or F clef, for example.
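The map and the cut can be sketched in a few lines of Python. This is a simplified illustration, not the researchers' implementation: the energy map is a small hand-written array standing in for the neural network's output, the threshold is arbitrary, and the `find_objects` helper with its flood-fill grouping is an assumption about how neighbouring above-threshold points could be merged into one detection. The final classification step is left out.

```python
def find_objects(energy, threshold):
    """Cut the energy map at `threshold` and return one center per connected region."""
    rows, cols = len(energy), len(energy[0])
    seen = [[False] * cols for _ in range(rows)]
    centers = []
    for r in range(rows):
        for c in range(cols):
            if energy[r][c] <= threshold or seen[r][c]:
                continue
            # Flood fill (4-connected) to collect one "mountain top".
            stack, region = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and energy[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            # Report the mean position of the region as the object center.
            cy = sum(p[0] for p in region) / len(region)
            cx = sum(p[1] for p in region) / len(region)
            centers.append((cy, cx))
    return centers

# Toy energy map: higher values mean "closer to an object center".
energy_map = [
    [0, 1, 1, 0, 0, 0, 0],
    [1, 3, 2, 0, 0, 1, 1],
    [0, 1, 1, 0, 1, 3, 3],
    [0, 0, 0, 0, 0, 1, 1],
]

print(find_objects(energy_map, threshold=2))  # [(1.0, 1.0), (2.0, 5.5)]
```

The work done here depends on the size of the map, not on a grid resolution chosen in advance, which is why the number of objects on the page does not change the procedure.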
Good results for common symbols
The researchers trained the neural network with two training datasets: one containing around 200,000 printed sheets of music and one containing 110 handwritten sheets. In the subsequent tests, the neural network delivered remarkably good results – particularly for those elements of the roughly 150-symbol “music alphabet” that occur very frequently.
“Our method is more than twice as accurate as others,” Lukas Tuggener emphasizes.
Moreover, in terms of detection and classification precision, there were hardly any differences between the printed and handwritten music. “Until now, however, we have only applied our method to very high-quality sheet music. In the next step, we want to attempt to also digitize old music manuscripts or even poor photos of sheet music.” The federal government acknowledges the potential of the project: Just like the first phase of the research project, this second phase will receive funding from Innosuisse, the Swiss Innovation Agency.
Post-processing remains very extensive
So far, the researchers have only partially succeeded in drastically accelerating the digitization of sheet music. Post-processing – the checking and revising of the results supplied by the neural network – is still very time-consuming and has to be performed manually. “Currently it takes us 20 minutes per A4 page,” says Florian Seibold from ScorePad. Thanks to a variety of screen views, however, the most common errors can already be eliminated quite quickly.
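The figures quoted in the article allow a quick comparison. The sketch below uses the stated half hour per page for manual entry and the stated 20 minutes per page for post-processing, and rounds the "almost 30 pages" of "Spring" up to 30. The helper name `hours_for` is purely illustrative, and the time the network itself needs is ignored.

```python
def hours_for(pages, minutes_per_page):
    """Total working time in hours for a score of the given length."""
    return pages * minutes_per_page / 60

PAGES = 30  # "almost 30 pages", rounded up

manual = hours_for(PAGES, 30)          # entering every symbol by hand
post_processed = hours_for(PAGES, 20)  # checking the network's output

print(manual, post_processed)  # 15.0 10.0
```

On these numbers the saving is about a third, which matches the article's assessment that the acceleration has so far only partially succeeded.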
To eliminate the remaining errors more efficiently, a capability that researchers in most other fields of deep learning are also interested in would be very helpful: “It would be ideal if we could somehow get the neural network to tell us how confident it is in its decisions,” says Lukas Tuggener. But this is not a simple task, because the strength of artificial neural networks – that you do not have to tell them how to learn – is also their weakness: “In a way, they are a black box and we never know quite how they reach their results.”
Access to information about how reliable the classifications are would considerably simplify post-processing. Lukas Tuggener says: “Of course, four errors are better than five. But having five errors and a rough idea of where they might occur is also better than having four errors but no idea about where they are.” After all, ScorePad will never be able to do without human post-processing entirely.
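If the network could report such confidence values, post-processing could concentrate on the doubtful symbols. The following sketch assumes a hypothetical list of detections, each with a confidence score between 0 and 1. The scores, the symbol names, the bar numbers, and the 0.8 cut-off are invented for illustration and do not come from the project.

```python
# Hypothetical detections: (symbol class, bar number, confidence).
detections = [
    ("quarter note", 1, 0.99),
    ("sixteenth rest", 1, 0.62),
    ("F clef", 2, 0.97),
    ("sharp", 3, 0.41),
    ("half note", 3, 0.93),
]

def needs_review(items, cutoff=0.8):
    """Return detections below the cut-off, least confident first."""
    doubtful = [d for d in items if d[2] < cutoff]
    return sorted(doubtful, key=lambda d: d[2])

for symbol, bar, score in needs_review(detections):
    print(f"bar {bar}: check '{symbol}' (confidence {score:.2f})")
```

A human editor would then start with the symbols at the top of this list instead of proofreading every bar with equal attention.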
Spotify for professional musicians
“Our goal is that within a year or two, we will be able to reduce the post-processing time to three minutes per A4 page,” says Florian Seibold. This increase in efficiency is key to the creation of a large sheet music database, such as ScorePad has in mind. Florian Seibold:
“We want to establish ourselves as the equivalent of Spotify for professional musicians: a library containing all imaginable pieces of music, where, thanks to machine readability, it is even possible to search for individual phrases of notes.”
The app currently offers approximately 200 scores, and some 100 users have downloaded it to date.
Florian Seibold is convinced that for the app to become more popular, the music scene needs a profound cultural change. “Just a few years ago, the transition from paper scores to PDF scores was almost unthinkable. Now the realization has to establish itself that scores are not simply something static, but a dynamic tool that allows us to work with music.”