Sonic Information Design for the Display of Proteomic Data

William L. Martens



1. Introduction


The idea of creating sound from data derived from protein analysis is not a new one, and many attempts have had interesting results, particularly when artists and biomedical scientists have collaborated in efforts to produce artefacts that are both aesthetically pleasing and potentially informative. These efforts can be thought of as naturally distributed along a continuum that ranges from a dominant emphasis on aesthetics to an emphasis on utility. In an extreme case, a dominant emphasis on utility may sacrifice aesthetics in favor of communicating the meaning of the data, often in hopes of revealing patterns in the data that are hidden from view. In a contrasting extreme case, the emphasis on aesthetics may sacrifice the chance that a listener could apprehend patterns or relations in the data. While an approach that integrates aesthetics and utility might seem to be an admirable goal for data sonification, a perfect balance between them need not be required for success. Among the various goals that might be valued, a distinct goal for an approach dominantly focused upon aesthetics could be to make science appealing to a wider audience, as was claimed by Takahashi and Miller (2007) for their conversion of genome-encoded protein sequences into musical notes “without compromising musicality.” Such attempts at protein data sonification certainly have produced pleasing results, such as that presented on the “Life Music” audio compact disc produced by Dunn and Clark (1999). However, examination of the literature describing the results of these and related efforts reveals a recurring theme that can be described as the rejection of sonification outputs that are not pleasing to the ear, or results that were described simply as “horrible” (Carey 2016).


Nonetheless, the music that resulted from the algorithms developed by Takahashi and Miller (2007) has been reported to demonstrate clearly audible differences between protein datasets that were distinguished in terms of issues related to health, such as the differences introduced in their sonifications by the mutated huntingtin protein that causes Huntington’s disease. So, choosing to emphasize musical aesthetics does not necessarily doom a data sonification to fail as a means for communicating the meaning of the data. The claim that Takahashi and Miller (2007) make for the value of their conversion of data to sound is the following:


The conversion will help make genomic coding sequences more approachable for the general public, young children, and vision-impaired scientists. (Takahashi and Miller 2007: 1)


Indeed, it could be argued, as Barrass (2012) has, that increasing emphasis on aesthetics in data sonification could transform the field from a scientific curiosity and engineering instrument into a popular mass medium. He went further to propose a sonification design approach that would integrate functionality and aesthetics by dissolving divisions between scientific and artistic methods. With respect to such an effort, readers might well expect the bulk of this essay to be dedicated to the description of a wide variety of data sonifications, positioned along the above-mentioned continuum, with approaches ranging between the poles of aesthetics and utility. But it is not the goal of the current essay to explore avenues for musical expression offered by data sonification in terms of the popular appeal of the sonic results. Instead, this essay is explicitly dedicated to an examination of sonic information design for the most effective display of complex data, and in particular, for the effective auditory display of proteomic datasets that call for direct and immediate human apprehension. This is the first and most important distinction that serves to clarify the scope of the current essay. Another distinction that must be stressed at the outset is that the essay is focused upon the design of the sound material itself and not on the design of an overall sonification structure, the analysis of which might be likened to the analysis of musical structures. This distinction is not unlike that made by Wierzbicki (2014) in his treatment of the imagined sounds of outer space within a discussion of “space music” and the music of sci-fi cinema.


Whereas Wierzbicki (2014) framed his thoughts regarding the imagined sounds of space as realized in music alone, the current essay, as indicated above, will focus on an examination of sonic information design without reference to any analysis of musical structure. Thus, music-theoretic considerations are largely outside the scope of this essay, in contrast to published work on installations that are explicitly dedicated to the “musification” of scientific data, such as that described by Visi, Dothel, Williams and Miranda (2014). This is not to say that music-theoretic considerations are irrelevant to the design of sound material for effective sonification; rather, it is to say that the focus of the current essay is upon which particular elementary variations in sound material can be demonstrated to carry the meaning of sonified data.


There has been a great deal of related work on the sonification of biomedical data in general over the past twenty years. A substantial amount of attention has been paid to the sonification of rhythms in human electroencephalogram (EEG) data, with a notable early contribution by Baier and Hermann (2004). This work is well reviewed in Hermann and Baier (2013). Mihalas et al. (2012) have completed related work on the sonification of electrocardiogram (ECG) data, with applications for heart rate analysis. They compared a straightforward approach (akin to listening to the ECG data itself) with an approach that they called “true sonification,” as it utilized a more abstract correspondence (or mapping) between the data and the sound synthesis parameters. Mihalas et al. concluded that their pitch- and loudness-based sonification enabled better inspection of compressed long series of ECG data, improved detection of arrhythmic events, and increased potential for detecting differences when comparing normal and abnormal signals. Most closely related to the current work, however, are the investigations of Visi et al. (2014) regarding musical and visual modeling of data related to the pathophysiology of amyotrophic lateral sclerosis (ALS). The paper on their work goes into greater detail about the protein structure and biochemistry than is relevant to the current essay; however, their paper also describes a multimedia installation entitled “Unfolding | Clusters” that was intended to make the process of ALS-related degradation more accessible to the greater public. Similar to the proteomic data sonification approach taken for the project described in the current essay, the installation developed by Visi et al. utilized “sonic timbres and melodic patterns in a video-synchronous spatial speaker array.” They concluded that the installation was effective in raising awareness about the disease and its workings, likely due to the “musicality” of the final result that made the piece “accessible to a wider audience while, at the same time, remaining faithful to the source data.”


An important question to be raised in this context concerns how best to select the design criteria to be applied in order to identify a successful result. The criteria for adopting particular sonification algorithms in the current work were based more upon whether the output seemed to resonate with the listener’s natural capacities for auditory pattern recognition than upon whether it addressed the listener’s capacities for apprehending musical structure. That being said, the sonic parameters to which data were mapped here were those that are most commonly used as form-bearing elements in music, such as pitch, duration, and timbre (see McAdams 2000). Thus, some emphasis on aesthetics remains even when the primary motivation is to uncover useful information through the sonic exploration of complex datasets. The sonic information design itself was a central component of a research project targeting problems in the display of the results of a particular proteomic data analysis, which is described in the next section.
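
To make the notion of mapping data to sonic parameters concrete, a generic parameter-mapping scheme might be sketched as follows. This is an illustrative sketch only and not the algorithm used in the project described in this essay. The function names, the input format (pairs of numeric values), the pitch range expressed in MIDI note numbers, and the duration range in seconds are all assumptions introduced for the example. Only pitch and duration are mapped; timbre is omitted because it has no comparably simple one-dimensional scale.

```python
def linear_map(value, in_lo, in_hi, out_lo, out_hi):
    """Rescale value from [in_lo, in_hi] to [out_lo, out_hi], clamping at both ends."""
    if in_hi == in_lo:  # degenerate input range: fall back to the low output
        return float(out_lo)
    t = (value - in_lo) / (in_hi - in_lo)
    t = min(1.0, max(0.0, t))
    return out_lo + t * (out_hi - out_lo)


def midi_to_hz(note):
    """Convert a (possibly fractional) MIDI note number to Hz, with note 69 = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def map_to_sound(records, pitch_range=(57, 81), dur_range=(0.1, 0.5)):
    """Map hypothetical (x, y) data pairs to sound events.

    x controls pitch (via MIDI note number) and y controls duration in seconds.
    Both ranges are arbitrary defaults chosen for illustration.
    """
    xs = [x for x, _ in records]
    ys = [y for _, y in records]
    events = []
    for x, y in records:
        note = linear_map(x, min(xs), max(xs), *pitch_range)
        dur = linear_map(y, min(ys), max(ys), *dur_range)
        events.append({"freq_hz": midi_to_hz(note), "duration_s": dur})
    return events


if __name__ == "__main__":
    # Made-up values, used only to show the shape of the output.
    for event in map_to_sound([(0.2, 1.0), (0.9, 3.0), (0.5, 2.0)]):
        print(event)
```

Each resulting event could then be passed to any synthesizer. Mapping through note numbers rather than directly to frequency keeps equal steps in the data corresponding to equal musical intervals, which is one common design choice among several.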