Pythagoras’ quote is often interpreted as a reflection on the limitations of human understanding, suggesting that what we call "reality" is merely a filtered perception shaped by our senses and minds. For instance, humans can only hear frequencies between roughly 20 Hz and 20 kHz, while the visible spectrum ranges from about 400 to 790 THz. When we consider the specificity of our senses and the way our minds interpret them, we become aware of how little we can truly grasp about reality. This very realization has long inspired artists to explore ways of bridging different sensory experiences. It is fascinating to consider the possibility of manifesting a single object through multiple phenomena—or at least creating the illusion of such a connection—within the limits of human perception. For me, this has the power to momentarily distance us from the constraints of daily life, fostering a heightened awareness and contemplation of perspectives beyond our reach. These ideas are a constant source of motivation and identity in my artistic practice, guiding the investigations presented in this exposition (see also 14. Personal perspective).
Cross-modality in art refers to the interaction and integration of multiple sensory modalities—such as sight, hearing, touch, and even taste or smell—to create a unified artistic experience. It delves into how stimuli from different senses can complement and influence one another, often blurring the boundaries between them to produce cohesive, immersive effects. For instance, in cross-modal art, a sound might alter our perception of a visual texture, or a specific color might deepen the emotional resonance of a piece of music. While the terms cross-modality, multi-sensory, and multimedia have slightly different meanings, each emphasizing distinct aspects of this creative field, they often overlap. For the purposes of this research, I will treat them as synonyms. Coming from a jazz background, I found myself confronting a new dimension governed by parameters and laws that demanded exploration and understanding. It quickly became apparent that navigating this territory would require not only curiosity but also a meticulous approach. I began to ask myself fundamental questions: How can I navigate and find order in overlapping layers? How will audiences perceive and respond? Which parameters deserve the most attention? I identified the first crucial task as maintaining a balance between the mediums to avoid overstimulating the audience. Achieving this balance necessitates establishing a hierarchy among the elements, and I was fortunate to attend a lesson by Yiannis Kyriakides, who spoke directly to these concerns1.
In cross-modal composition, the coexistence and hierarchy of different mediums are influenced by several interrelated factors2. Context plays a significant role in shaping audience expectations and focus, with the social or historical setting often pre-determining the dominant medium. For example, in an exhibition, viewers are more inclined to focus on visual elements, whereas in a concert, the sonic or musical aspects typically take precedence. Structure is another determinant; a medium can establish itself as primary when it demonstrates formal consistency or self-sufficiency, possessing an inherent logic that allows it to exist independently and draw attention without external meaning. The rate of information within a medium also impacts its prominence; a dense musical score or an active visual field naturally commands more focus, shaping how it interacts with other elements. Similarly, scale plays a crucial role, as larger, closer, or louder elements tend to dominate cognitive space. This principle is evident in immersive forms like cinema or video games, where overwhelming sensory input eclipses awareness of the external environment. Additionally, concreteness can draw attention, as recognizable elements—such as concrete sounds in music or real-world visual objects—often overshadow abstract textures and structures. Finally, junctures occur when a dominant medium is disrupted or replaced. These moments of transition redefine the relationship between mediums, prompting shifts in audience perception and focus. Together, these factors create dynamic hierarchies within cross-modal compositions, shaping how different elements coexist and interact.
Before delving into specific parameters, I would like to linger on the concept of harmony and its various meanings across different contexts, drawing inspiration from a recent article by Charles Spence and Nicola Di Stefano3. In the Greek tradition, the term “harmony” originally referred to the physical unification of diverse elements4, as well as a broader sense of agreement or peace5. Over time, deeply embedded in metaphysical and cosmological thought, the concept evolved to signify a more abstract and qualitative form of agreement, often carrying a positive rather than neutral connotation6. In a unisensory context, harmony functions as an organizational principle, arranging stimuli within the same modality without necessarily conferring any specific processing advantage. For instance, musical harmony organizes sound materials, assigning distinct roles based on their functions within musical language7. However, in a multisensory context, harmony takes on a different dimension. It becomes a mapping criterion, bridging sensory stimuli across different modalities. Here, it is often assumed that stimuli combined harmoniously will not only align aesthetically but also facilitate more efficient and effective processing8. Central to the concept of harmony are the parameters of consonance and dissonance, which play a pivotal role in defining the relationships between elements. I will explore these parameters in greater detail in the next few paragraphs.
Let us now explore the parameters within the audible realm, which I will refer to as sonic parameters. In music, terms such as harmony, consonance, and dissonance are often associated with pitches and intervals. However, this association is far more complex than how it sounds—pun intended. Consonance is a value that is both partially objective and partially subjective, as it varies significantly across different musical genres and traditions. For instance, dissonance in Classical-period music is treated with strict rules and resolutions, while modern pop music generally minimizes its use. Conversely, genres like jazz and contemporary music actively seek out dissonances, treating them as valuable aesthetic elements. This research does not aim to define a universal standard for consonance but rather to emphasize its inherent relativity. For example, if we consider pure Pythagorean intervals as consonant, even equal temperament—the standard in Western music—can be perceived as dissonant. This point is particularly relevant because the oscilloscope, central to my work, is highly sensitive to intonation. As a result, Just Intonation plays a pivotal role in my compositions. Beyond pitch, rhythm emerges as another fundamental parameter, so closely related to pitch that they can, in some cases, be regarded as one and the same, as Stockhausen famously argued9. Below 16 Hertz, we perceive sound as rhythm rather than pitch10. Similarly, just as an interval is defined by the ratio between two pitches, slowing it down enough transforms it into a polyrhythm. For instance, the major third interval, with its 5:4 ratio, corresponds to a polyrhythm of 5 against 4—a phenomenon that can be both heard and visualized; likewise, a 3-against-2 polyrhythm, sped up, is perceived as a perfect fifth (audio 1). A few other parameters to keep at hand are amplitude, timbre, density (understood as the number of simultaneous voices), and tempo.
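To make this ratio arithmetic concrete, here is a minimal NumPy sketch (my own illustration, not one of the audio examples): the same 5:4 ratio rendered once as a just major third and once, slowed to pulse rates, as a 5-against-4 polyrhythm. Writing either array to a WAV file, for instance with scipy.io.wavfile.write, lets you hear the correspondence.

```python
import numpy as np

SR = 44100
t = np.arange(SR * 4) / SR                    # four seconds of time

def two_sines(f1, f2):
    """Sum of two sines; with f2/f1 = 5/4 this is a just major third."""
    return np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

interval = 0.5 * two_sines(440.0, 550.0)      # 5:4 heard as a single interval

# The same 5:4 ratio, slowed below the ~16 Hz threshold, becomes rhythm:
# two click tracks at 5 Hz and 4 Hz gating a 440 Hz carrier.
clicks_5 = ((t * 5.0) % 1.0) < 0.05           # short gate, five times per second
clicks_4 = ((t * 4.0) % 1.0) < 0.05           # short gate, four times per second
carrier = np.sin(2 * np.pi * 440.0 * t)
polyrhythm = 0.5 * carrier * (clicks_5.astype(float) + clicks_4.astype(float))
```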
In the visual realm, the principal parameters can be identified as shape, size, color, proportions, density, and visual harmony. But what is visual harmony? It shares striking similarities with musical harmony. Just as pitches in music are organized according to ratios and functions, so too are objects and colors in visual art. In my work, I establish a direct correspondence between sonic and visual ratios. Bill Alves eloquently captures this connection: “The fact that whole number proportions create these arresting patterns of visual resonance suggests a correspondence, or complementarity, to consonant musical sonorities created by whole number frequency ratios, that is, Just Intonation.”11 He further notes, “I created the music entirely in Just Intonation, using harmonies which were often direct analogues of the patterns of visual symmetry.”12 This parallel underscores how ratios, whether sonic or visual, create a unified sense of balance and resonance across modalities. Looking back at the historical references (see 5. Visual music), architects and painters have long employed principles of harmony in their work, both in relation to space (shape, proportions, position) and to colour.
The medieval cathedral is a prime example of a multisensory spatial experience. Its acoustic properties, the massiveness of its structure, the dramatic interplay of light and shadow, and the tactile qualities of its materials collectively evoke a profound sense of spirituality through harmonic manipulation of sensory input13. The phenomenon of color harmony has also fascinated thinkers and creators for centuries. Aristotle, inspired by Pythagorean musical consonances, hypothesized that pleasing color combinations might depend on the same numerical proportions as musical intervals14. While he lacked the tools to test his theory, it influenced artists and scientists, including Isaac Newton, who later linked color and pitch through their wave properties15. From Newton’s theories to Aleksandr Scriabin’s use of color in Prométhée, scientists and artists have explored the interplay of sound and vision, often seeking perceptual unity or synesthetic expression. However, subsequent analysis has suggested that Scriabin’s intended use of color had been to disambiguate the music itself, rather than being an expression of his possible synaesthesia16,17,18, and attempts to create a direct mapping of color relationships to the immediacy of musical consonance and dissonance have largely failed19. Other parameters, such as size and density, can be more straightforwardly mapped between sound and image. Size corresponds to amplitude, while density aligns with complexity or texture. In my practice, I associate color with musical character, though this choice is somewhat subjective and arbitrary.
These direct mappings contribute to what is known as perceptual coherence—a property of stimuli that form a unified whole, making them easier to perceive as a singular entity20. Gestalt psychologists discussed this extensively in relation to the visual domain, where coherent patterns like Kanizsa figures evoke a sense of unity21. Another relevant concept is processing fluency, which suggests that the easier it is for a perceiver to process a stimulus, the more aesthetically pleasing it becomes. This concept helps explain why consonance and dissonance evoke such different responses. Consonant intervals are more easily processed by the auditory cortex, requiring fewer neural resources compared to dissonant chords22,23. Dissonance, by contrast, demands more from the sensory system due to its lack of periodicity and increased complexity at various levels of auditory processing. In simple terms, the sound wave of a consonant interval—such as those based on perfect ratios—is more regular, with a shorter and more predictable period, while dissonant sounds are irregular and harder for the brain to decode24. This underlying neurological reality reinforces the idea that harmony, whether sonic or visual, resonates deeply within our perceptual systems.
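The periodicity claim can be checked with simple arithmetic: the summed waveform of two sines repeats at the greatest common divisor of their frequencies. A short sketch, assuming the frequencies are exact decimal values:

```python
from fractions import Fraction
from math import gcd

def composite_period_ms(f1_hz, f2_hz):
    """Repetition period, in ms, of the sum of two sines."""
    a, b = Fraction(str(f1_hz)), Fraction(str(f2_hz))
    # gcd of two rationals: gcd(a/b, c/d) = gcd(a*d, c*b) / (b*d)
    f0 = Fraction(gcd(a.numerator * b.denominator,
                      b.numerator * a.denominator),
                  a.denominator * b.denominator)
    return float(1000 / f0)

print(composite_period_ms(220, 330))    # just fifth (3:2): ~9.09 ms
print(composite_period_ms(220, 330.5))  # mistuned fifth: 2000 ms
```

A just fifth repeats roughly every 9 ms, while detuning the upper note by half a hertz stretches the period to two full seconds, which is effectively aperiodic for the auditory system.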
Harmony in cross-modality has sparked considerable debate among experts, with some questioning whether it is even possible to achieve harmony across different senses. A central issue is whether harmony can genuinely be experienced cross-modally or if it remains a phenomenon exclusive to individual sensory modalities, such as auditory perception. This raises the question of whether the term “harmony” holds any substantive meaning when applied outside its traditional acoustic or musical context, or whether its use in other sensory domains is purely metaphorical25. Some argue that harmony is fundamentally an auditory concept, while others suggest it may exist within and across senses, though it remains unclear which specific sensory pairs—such as vision and sound—are capable of achieving this interaction. Over the years, psychologists have explored plausible equivalences between sensory modalities26 (Julesz & Hirsh, 1972; Kubovy, 1981; Marks, 1978), suggesting potential pathways for understanding cross-modal harmony. For instance, John Whitney (1984) proposed an approach in which animation complements music not through direct representation but by aligning with a higher-level aesthetic intention, which he termed “complementarity.”27 However, while this method allowed for moments of musical articulation within the animation, it did not fully demonstrate the potential for a truly harmonious integration of visual and musical composition28, leaving the concept of cross-modal harmony a topic of ongoing exploration. This debate serves as an invitation to approach the following chapters of this research with extra awareness, especially as we move from conceptual discussions to more technical approaches. For now, I can anticipate that cross-modal harmony, in terms of consonance and dissonance, can be extremely straightforward on the oscilloscope.
After an intensive—though not exhaustive—exploration of the historical and theoretical topics underpinning my research, we now turn to the practical techniques required to understand and operate the oscilloscope. The first key point is that most oscilloscopes feature two input channels, enabling them to receive and process two distinct signals simultaneously. This functionality is essential for my practice, as the techniques discussed in the following sections cannot be executed with a single-channel oscilloscope. To understand why, let us first examine how a single-channel oscilloscope operates. In this mode, the oscilloscope screen displays a graph where the x-axis represents time, and the y-axis corresponds to voltage. The time frame can be adjusted using the Time/DIV knob (E2), which allows you to "zoom in and out" along the x-axis. Similarly, the y-axis can be fine-tuned using the Volt/DIV knob (D4 or D10) and the Position knob (D1 or D7) for the respective channel. Refer to Figure 17 and Video 1 for a visual guide. It is worth noting that control placements and labels may vary slightly depending on the oscilloscope model or brand, so you may need to identify the corresponding knobs on your specific device.
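For readers without an instrument at hand, the single-channel view is easy to imitate in software. A minimal Matplotlib sketch (the knob names follow the description above; all numeric settings are arbitrary examples):

```python
import numpy as np
import matplotlib.pyplot as plt

SR = 96000
t = np.arange(SR) / SR
signal = np.sin(2 * np.pi * 440 * t)   # any input signal

time_per_div = 0.001                   # the Time/DIV knob: seconds per division
volts_per_div = 0.5                    # the Volt/DIV knob
n_div = 10                             # horizontal divisions on the graticule

window = t < time_per_div * n_div      # how much of the x axis is visible
plt.plot(t[window] * 1000, signal[window] / volts_per_div)
plt.xlabel("time (ms)")
plt.ylabel("vertical divisions")
plt.show()
```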
The real magic happens with the Trigger Mode (or Source) selector (B3). Setting this selector to X/Y mode allows the two input signals to define the x- and y-axes, effectively transforming the oscilloscope into a visual synthesizer, or an electronic Harmonograph (see 2. Harmonograph). Some devices may require activating an additional switch, such as button B5, to enable this feature. (Refer to Video 2 for a demonstration.) While I haven't extensively used one-channel mode, its simplicity should not be underestimated. It can yield fascinating results, particularly when shaping waveforms. But what truly unlocks the potential of this system is the two-channel mode, where we can finally draw figures, such as Lissajous curves (see 3. Lissajous figures). Figure 18 demonstrates why these shapes emerge, showing how two sine waves—one phase-shifted by a quarter cycle—generate a perfect circle. As we shift the phase of one sine wave, the shape distorts, as illustrated in Figure 19. Similarly, the other animated figures showcase the effects of modifying frequency relationships using just intonation ratios, leading to intricate visual patterns. Educational online tools are available to generate these figures on the go29; try them yourself!
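The X/Y behaviour itself can be reproduced in a few lines of Python. A minimal sketch, with illustrative frequencies:

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 1, 100000)

def lissajous(f_x, f_y, phase):
    """X/Y mode: one signal drives each axis instead of a time base."""
    return np.sin(2 * np.pi * f_x * t), np.sin(2 * np.pi * f_y * t + phase)

x, y = lissajous(4, 4, np.pi / 2)   # equal frequencies, quarter-cycle shift: circle
# x, y = lissajous(4, 4, np.pi / 3) # other phase shifts distort the circle
# x, y = lissajous(5, 4, np.pi / 2) # a 5:4 just major third
plt.plot(x, y)
plt.axis("equal")
plt.show()
```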
Generating Lissajous figures on an oscilloscope is a straightforward process. As shown in Diagram 01, this can be achieved by feeding two sine waves into the x and y inputs. The resulting curves depend entirely on the frequency ratio between the two oscillators: when the frequencies are equal (and offset by a quarter cycle, as shown above), a perfect circle appears, while other ratios produce more complex shapes. Specific examples of harmonic intervals will be introduced later. This technique closely resembles the operation of a lateral harmonograph (see 2. Harmonograph). Reproducing the patterns of a rotary harmonograph, however, requires a slightly more intricate approach. As illustrated in Diagram 02, the signals from both oscillators are first summed and then sent to the oscilloscope’s two input channels. To visualize a shape, one of these channels must be phase-shifted, effectively introducing a delay. Video 03 demonstrates this process, gradually shifting the phase of the second signal. In both lateral and rotary techniques, the figures remain stationary when the interval ratio is pure (i.e., a just intonation interval). However, as the interval becomes less consonant, the figure begins to rotate. For instance, a 5:3 ratio (a major sixth) will stay still but will gradually start spinning when one of the frequencies is detuned, as shown in video 04.
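A companion sketch for the rotary technique. On the analog side the two oscillators are summed and one scope channel is phase-shifted; in this software approximation I instead shift each summed partial by a quarter of its own cycle, with opposite signs, so that the two partials counter-rotate on screen. This is my reading of the setup, not a schematic of the circuit, but it reproduces the behaviour described above: pure ratios stand still, detuned ones spin.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 2, 200000)

def rotary(f1, f2, detune=0.0):
    """Both oscillators on each axis; each partial is quarter-cycle
    shifted with opposite signs, making the two partials counter-rotate."""
    f2 = f2 + detune
    x = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)
    y = np.cos(2 * np.pi * f1 * t) - np.cos(2 * np.pi * f2 * t)
    return x, y

x, y = rotary(5, 3)          # pure 5:3 (major sixth): the figure stands still
# x, y = rotary(5, 3, 0.02)  # slight detuning: on a scope the figure spins
                             # (in a static plot the rotation smears the trace)
plt.plot(x, y, linewidth=0.3)
plt.axis("equal")
plt.show()
```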
In both the lateral and rotary techniques, the curves maintain a direct one-to-one relationship with the musical interval. The examples above illustrate this connection. A recurring pattern emerges when analyzing the 5:4 interval (figs. 20, 21). In the lateral (Lissajous) figure, the 5:4 relationship is clearly visible. However, in the rotary technique, the pattern initially appears to reflect only the sum of the two numbers (5+4=9). Yet, when we adjust the amplitude of one of the signals—whereas in the previous examples, both amplitudes were equal—a new detail emerges: the number 4 corresponds to the gaps between the rounded angles (Figure 22). To further clarify this phenomenon, let’s examine the simpler 3:2 interval (Figure 23). This relationship directly connects to star polygons, as shown in Figure 24. In this case, the 3:2 interval forms a shape related to the {5/2} star polygon, where 5 represents the sum of the ratio’s numerator and denominator (3+2), and 2 corresponds to the denominator.
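Under the same counter-rotating approximation used in the sketch above, the lobe counts and the star-polygon connection follow from a short calculation. Writing the rotary figure for a ratio $p{:}q$ as two counter-rotating phasors,

$$z(t) = e^{-ipt} + e^{iqt},$$

advancing time by $2\pi/(p+q)$ multiplies both terms by the same factor:

$$z\!\left(t + \frac{2\pi}{p+q}\right) = e^{-ipt}\,e^{-\frac{2\pi i p}{p+q}} + e^{iqt}\,e^{\frac{2\pi i q}{p+q}} = e^{\frac{2\pi i q}{p+q}}\,z(t),$$

since $e^{-\frac{2\pi i p}{p+q}} = e^{\frac{2\pi i q}{p+q}}e^{-2\pi i}$. The whole curve therefore maps onto itself rotated by $\frac{2\pi q}{p+q}$ of a full turn times $q$: for 3:2 this is two fifths of a turn, exactly the step of the $\{5/2\}$ star polygon, and for 5:4 the figure closes after $5+4=9$ such steps, matching the nine rounded angles observed above.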
As demonstrated in the previous example, amplitude has a directly proportional effect on the image. In general, amplitude (or volume) of sound translates to the size of the visual representation on the screen (video 5). Since the two channels correspond to the x- and y-axes, it is also possible to stretch the image along a single axis, as shown in video 6. In both cases, the effect is as perceptible visually as it is audibly. Notably, in video 6, you can hear the single note rising and falling, along with changes in the volume of the left and right channels—especially if using headphones or proper speakers. In some techniques, such as the rotary method, amplitude modulation can influence more than just size. When both signals are modulated together, the result is a uniform size variation (video 7). However, when modulated independently, the shape itself changes (video 8 - synced, video 9 - unsynced)30. Let’s now explore a few more examples of simple parameter changes. In video 10, a gradual phase shift produces increasingly intricate and diverse shapes. Up until now, we have primarily used sinusoidal waves—let’s examine how triangular waves behave in rotary motion (video 11) and lateral motion (video 12). Videos 13 and 14, on the other hand, demonstrate the interaction between a triangular wave and a sine wave. Unlike these, sawtooth and pulse waves have a different use in oscilloscope visualization, as their instantaneous voltage jumps prevent the formation of continuous lines or curves.
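In software, the difference between joint and independent amplitude modulation is a one-line change. A sketch with arbitrary frequencies and a slow 2 Hz envelope:

```python
import numpy as np

SR = 48000
t = np.arange(SR) / SR
x = np.sin(2 * np.pi * 440 * t)          # left channel  -> x axis
y = np.sin(2 * np.pi * 550 * t)          # right channel -> y axis (5:4)

env = 0.5 * (1 + np.sin(2 * np.pi * 2 * t))   # slow 2 Hz amplitude envelope

together = (env * x, env * y)   # both channels modulated: the figure breathes
one_axis = (env * x, y)         # one channel modulated: the figure stretches
```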
So far, the examples have focused on synthesized basic sound waves—regular signals with no harmonics (or nearly so). The oscilloscope, with its uncompromising representation of sound, vividly displays both the purity of sine waves and the raw, chaotic complexity of real-world sounds. Even a single acoustic instrument can produce unexpected, uncontrollable, and often messy results (videos 15 and 16). Now, imagine what an entire orchestra would look like on the screen—or better yet, see it for yourself in video 17. As you’ll notice, the rich overtone content of a single instrument alone can visually "overload" the oscilloscope. This starkly contrasts with the strict harmonic principles of consonance and dissonance from the classical period, as heard in the Haydn Trumpet Concerto, which remain irrelevant to the oscilloscope’s impartial rendering of sound. As a general rule, the purer the sound (that is, the fewer its overtones and the more regular its wave), the better it looks on the oscilloscope. Complex timbres and irregular, noisy waves look chaotic on the screen and are thus unable to produce recognisable shapes. This fundamental question of dealing with the complexity of reality lies at the heart of my research: exploring the possibilities, limitations, and potential solutions in bridging music and visual representation.
The figures shown so far may appear three-dimensional, especially when in motion, but this is merely an optical illusion. Analog oscilloscopes operate with two inputs—managing the x- and y-axes—resulting in strictly two-dimensional representations. But is it possible to introduce a z-axis and truly enter the third dimension? With software, this becomes easily achievable, and the following examples demonstrate how a three-dimensional oscilloscope functions. By using three inputs, it’s possible to visualize more complex structures, such as triads. In video 18, you can observe the shape of a major triad, while video 19 shows the same triad with slight detuning and amplitude modulation, causing the form to shift and evolve dynamically. To enhance clarity, the line is colorized, helping to distinguish its structure since we’re observing a three-dimensional shape on a flat screen.
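A three-input oscilloscope is straightforward to imitate in software. Here is a Matplotlib sketch of a just major triad (4:5:6), one sine per axis; the frequencies, phase offsets, and time-based coloring are my own illustrative choices, not the settings used in the videos:

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 1, 40000)
# a just major triad, 4:5:6, with one sine driving each axis
x = np.sin(2 * np.pi * 4 * t)
y = np.sin(2 * np.pi * 5 * t + np.pi / 2)
z = np.sin(2 * np.pi * 6 * t + np.pi / 4)

ax = plt.figure().add_subplot(projection="3d")
# coloring the line by time keeps the structure readable on a flat screen
ax.scatter(x, y, z, c=t, s=0.2)
plt.show()
```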
As I stated in a previous chapter, my aim is to bind sound and image through the universal laws that govern different mediums. This approach carries a spiritual, almost religious, care—whether drawing with pure waves and just intervals, or directly shaping visuals using the timbre of an instrument or voice. However, this is my personal perspective, and for the sake of completeness and curiosity, I must acknowledge other possibilities. With mathematical functions—and especially digital tools—you can render virtually any shape on the screen. For example, it is not difficult to create the shape of a mushroom34. We begin with a simple circle (as shown above), resizing it to match the diameter of the mushroom’s stem (video 20). Next, we introduce a sawtooth wave on the right channel (y-axis), which causes the circle to move vertically at its frequency, forming a cylinder—or more precisely, a spiral (video 21). To create the cap, we add a sine wave of the same frequency to the left channel (x-axis), shaping the mushroom’s top but extending it throughout the entire stem (video 22). Adjusting the phase aligns it correctly at the top. To finalize the shape, we limit the modulation to the upper portion of the y-axis (video 23). We can modify the cap’s shape by changing the wave that generates it—for instance, using a sawtooth wave multiplied by itself (video 24). Adding another sine wave to the left channel, slightly detuned from the first sawtooth, introduces a wobbling effect, producing a psychedelic visual (video 25). Dividing its frequency by whole numbers splits the mushroom into multiple rotating copies, and finally, adding square waves to the channels multiplies the shapes, as shown in video 26 (a code sketch of this construction follows below). I will demonstrate how to achieve this using the software I am working with (Max-MSP) in a later chapter (see 11. Max-MSP).

Beyond these examples, the oscilloscope can visualize all kinds of complex patterns, from the chaotic beauty of the Lorenz Attractor (video 27) to the algorithmic unpredictability of Xenakis' Gendy (video 28). Artist Jerobeam Fenderson even developed software that translates 3D models into sound signals, enabling the visualization of any three-dimensional scene35—or even playing a video game—on the oscilloscope screen.
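Returning to the mushroom recipe, here is a loose Python sketch of the construction: a fast circle steered by a slow sawtooth (the stem sweep) and a windowed sine (the cap). It follows my reading of the steps above rather than the exact patch in the videos, and every frequency and amplitude is a guess:

```python
import numpy as np
import matplotlib.pyplot as plt

SR = 400000
t = np.arange(SR) / SR
phase = (t * 50.0) % 1.0                 # one full figure sweep, 50 times/second

# step 1: a fast circle, sized to the stem's diameter
x = 0.15 * np.sin(2 * np.pi * 4000 * t)
y = 0.15 * np.cos(2 * np.pi * 4000 * t)

# step 2: a sawtooth on the y axis sweeps the circle upward, drawing the stem
y = y + 2.0 * phase - 1.0

# steps 3-4: a sine on the x axis, windowed to the top quarter of the sweep,
# swings the circle out and back to suggest the cap
u = np.clip((phase - 0.75) / 0.25, 0.0, 1.0)
x = x + 0.6 * np.sin(2 * np.pi * u)

plt.plot(x, y, linewidth=0.2)
plt.axis("equal")
plt.show()
```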