Ornamenting with Neural Networks

Evaluating AIs' taste in diminution

Most chapters in this research have approached music making in simple note-against-note scenarios. In such an environment, discrete procedures for safe voice-leading can be devised by treating errors as absolute prohibitions and by using rules to guide the choice of the next note of the counterpoint.

Such approaches do not address the crucial topic of diminution.

Rules?

In Il Dolcimelo, the author Aurelio Virgiliano provides a short list of rules for generating diminution.

However, a detailed look quickly reveals inconsistencies:


"1. La diminutione caminar deve per grado il più che sia possibile"

 

"1. Diminution must proceed stepwise as much as possible"

What does "as much as possible" mean? Are we not allowed to make any kind of jump? A significant portion of his examples -if not the majority- does contain leaps; even wide jumps can be found. Does his rule imply such examples are less preferable than fully stepwise diminutions? Most likely not.

"8. Non deve la diminuzione discostarsi mai dal soggetto più di una quinta sotto, o sopra."


"8. Diminution must never depart from the main note by more than a fifth -up or down."

 

What about this, then?

There is a huge -perhaps infinite- variety of ways his rules could be coded into algorithmic procedures. It is possible to interpret this page in a stochastic fashion and assign likelihood percentages to the many parameters.

For example, we might decide to apply rule 1 randomly, on average once every two cases (the flip of a coin), or perhaps once every five, and so on. Alternatively, we could make rule 1 the very last operation the algorithm follows, almost as an "emergency backup procedure" for when other, higher-priority parameters fail to generate an effective diminution.
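As a minimal sketch of one such stochastic interpretation (the note names, the probability value, and the fallback logic are my own assumptions, not Virgiliano's), one could write:

```python
import random

# Diatonic gamut used purely for illustration.
GAMUT = ["C", "D", "E", "F", "G", "A", "B"]

def next_note(current: int, p_stepwise: float = 0.5) -> int:
    """Choose the next note index: with probability p_stepwise apply
    rule 1 (move by step); otherwise allow any leap within a fifth,
    a loose reading of rule 8."""
    if random.random() < p_stepwise:
        candidates = [current - 1, current + 1]          # stepwise motion
    else:
        candidates = [current + i for i in range(-4, 5)  # within a fifth
                      if i != 0]
    candidates = [c for c in candidates if 0 <= c < len(GAMUT)]
    return random.choice(candidates)

def diminution(start: int, length: int = 4, p_stepwise: float = 0.5):
    """Generate a diminution beginning on the original note (rule 4)."""
    notes = [start]
    for _ in range(length - 1):
        notes.append(next_note(notes[-1], p_stepwise))
    return [GAMUT[i] for i in notes]
```

Changing `p_stepwise` (or reordering which rule acts as the fallback) yields a different "interpretation" of the same page, which is precisely the ambiguity discussed above.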

 

With the help of external computational power and brute force (although not necessarily), the user could in theory determine which interpretation of his rules would be the most permissive. For example, all possible sequences of 4 notes within the span of a fixed diatonic octave number 2401 (7⁴). Adding the condition that the first note of the diminution must be the same as the original (one of the points of his rule 4) already cuts the results down to 343 possibilities (7³). Here are some of those solutions:
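The counting above can be verified by brute-force enumeration; a short sketch (the degree numbering and the chosen original note are arbitrary assumptions):

```python
from itertools import product

# Seven diatonic degrees in a fixed octave, numbered 0..6 for convenience.
DEGREES = range(7)

# All sequences of 4 notes drawn from the seven degrees: 7**4 = 2401.
all_sequences = list(product(DEGREES, repeat=4))

# Rule 4 (in part): the diminution must begin on the original note.
ORIGINAL = 0  # say, the first degree
constrained = [s for s in all_sequences if s[0] == ORIGINAL]
# Fixing the first note leaves 7**3 = 343 possibilities.
```

Each further rule (stepwise motion, the fifth boundary, and so on) would be another filter on `all_sequences`, making it easy to measure how restrictive each reading of the rules really is.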

Instead of developing algorithms for creating effective diminutions -especially since I could not find historical examples- I considered the use of modern neural networks, given their renowned ability to "learn" independently by "studying" (or analyzing) a certain set of data and then producing similar but unique new outputs. To many, this peculiar and still partially mysterious feature recalls how apprentices in any field learn their skills: by watching dozens, hundreds, if not thousands of examples and somehow "acquiring a sensitivity", sometimes creating the illusion that different ideas are blended into compound and possibly fresh, new solutions. Like us, AIs now need to practice as well!

Ornamenting with LLMs:

Notes, words or binary digits?

With the recent rise of experimentation by big tech companies in the field of AI -particularly LLMs (Large Language Models) and neural networks- there is ongoing debate about which model performs best at specific tasks: basically, who has the "smartest" of them all. This inspired me to run the following test.
 
 
After transcribing page 4 of Aurelio Virgiliano's Dolcimelo into text, I asked two popular LLM chatbots to fill in the rest of the page with their own solutions. I explicitly asked them not to copy the input examples, and I transcribed their answers onto the manuscript for better readability. Here are the results.

Dolcimelo's page 4 filled in by DeepSeek

Dolcimelo's page 4 filled in by ChatGPT

Hopefully, at this point the reader will grasp the substantial lack of scientific rigor of such an "experiment". Repeating the same test would in fact most likely produce different results each time. The tables above can therefore serve only a demonstrative and suggestive purpose.

 

Still, it is fascinating to contemplate: what are we actually looking at here?

 

The LLM does not contain a clear note gamut -like the one we learnt at school- that it accesses when my request to create a diminution is sent. It therefore does not pick the notes for me; instead, it searches through its available data (presumably a large amount of written, verbal material) and makes a text prediction of an output likely to be accepted by me.

It could then be argued that the notes you see are not actual notes but rather lucky combinations of letters which, by chance, looked coherent enough to me to be accepted as the "final answer from the bot".

To better understand the intricate design of neural networks, the explanatory series on the YouTube channel 3Blue1Brown is highly recommended.

For instance, notice the awful performance at the easiest of counterpoint tasks:

Although grammatically impeccable, the chatbot's reply contains so many mistakes and contradictions that it is hard to keep track of them.

"I wish such a robot could be a bit more diligent and follow rules...for once!" the reader will perhaps think.

Indeed, what we are looking at is an example of a computer not computing counterpoint, but words.

 

By chatting further, it should become quite easy to convince the bot that the diatonic octave is -say- 15 notes wide, or that the notes are named in a different order from our standard -mi, sol, fa, re, etc.

Perhaps, because of this linguistic imprinting, such a tool could be more helpful or reliable for the user willing to explore or invent concepts of music theory.

The Bach Doodle

A more successful implementation of neural networks in music than the previous one is the Doodle Celebrating Johann Sebastian Bach, released in 2019. Its neural network was designed to read actual music data (in MIDI format, I assume) and had access to the entirety of the Bach chorales contained in his Cantatas.

By examining the outputs, it can be assumed there are no further "categorical filters" based on counterpoint rules, as it is possible -if not common- to find well-known mistakes such as parallel fifths and octaves, unprepared dissonances, and much more. Unfortunately, those mistakes are likely unsolvable due to the very nature of the machine-learning process. In fact, instances of such "mistakes" can be found in the dataset too (see here). The neural network therefore sees no reason why it should avoid them. "If Bach is doing so, so will I!"
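A "categorical filter" for one such mistake could be sketched as follows (the MIDI pitch lists and the "both voices must actually move" condition are my own assumptions; checking a real chorale would need considerably more nuance):

```python
def interval(upper: int, lower: int) -> int:
    """Interval class in semitones between two simultaneous MIDI pitches
    (mod 12, so compound fifths count as fifths too)."""
    return (upper - lower) % 12

def parallel_fifths(voice_a, voice_b):
    """Return the indices at which two voices move in parallel perfect
    fifths: an interval of 7 semitones on two consecutive beats, with
    the voices actually moving (repeated notes are not 'parallel')."""
    hits = []
    for i in range(len(voice_a) - 1):
        if (interval(voice_a[i], voice_b[i]) == 7
                and interval(voice_a[i + 1], voice_b[i + 1]) == 7
                and voice_a[i] != voice_a[i + 1]):
            hits.append(i)
    return hits
```

A post-processing stage of this kind could reject or patch offending outputs, but -as argued above- it would be correcting the model against its own training data.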

 

Apart from generating large numbers of diverse chorales on the same input melody, it can happen -it did to me, at least- that the bassline of a generated SATB chorale on a given melody (Nun komm, der Heiden Heiland, in my case) precisely matches the bassline Bach used, not in the same chorale, but in the Basso Continuo of other movements of the same cantata whenever that melody appears. I remember the spooky feeling when I realized the database contained only SATB chorales and not a single chorale with instrumental parts. The Doodle could not know Bach had done that somewhere else.

How to compare AIs?

I find that the above reflections lead to more questions:

To what extent is it possible to compare different AIs' outputs, especially when their functioning relies on complex stochastic processes such as those in neural networks? Without understandable processes behind them, how could a human describe or comment on their general functioning?

Could perhaps a statistical analysis made on a large body of artificial works by a music-generating-machine reveal truths about the machine as a whole entity?

More specifically, how fair would it be to deduce "music features" simply because they appear in the outputs? Would recurring instances of a particular event be intrinsic to the machine, and not mere coincidences?

In other words, could a neural network possess an innate inclination, something close to what we call taste -perhaps unknown even to its inventor?

 

Another way to look at it may be to accept our inability to see, understand, or perceive such a taste or inclination -determined by the complexity of the neural network, the scale and quality of the dataset, etc.- which would certainly change from machine to machine.

 

...Anyway, has anyone ever successfully described what their (human) music taste precisely is?


 
