Using Participatory Visualization of Soundscapes to Compare Designers’ and Listeners’ Experiences of Sound Designs

Iain McGregor, Phil Turner and David Benyon

There are numerous rules and well-established guidelines to help designers with the visual appearance of interactive technologies. In contrast, when it comes to the use of sound, there is a paucity of practical information on designing for euphony, except in musical composition. This paper addresses this gap by describing a theoretically based, practical method for evaluating the design of the auditory components of interactive technologies and media. Specifically, the method elicits the auditory experiences of users of these technologies and media and compares them with what the sound designers intended. The method has been comprehensively tested in trials involving 100 users (listeners), and the results have been described as “useful” and “invaluable” by a group of 10 professional sound designers.

Introduction

As human-computer interaction (HCI) and interaction design abandon the desktop in favor of our pockets and purses, the hands of children, and the wider environment, issues of non-visual modalities of interaction are being foregrounded. We currently rely on a sound or vibration to inform us that a text message has arrived, or that we are late for the dentist. Beyond these simple alerts, sound is used to great effect in the design of virtual environments (Finney and Janer 2010; Nordahl 2010; Serafin and Serafin 2004), video games (Collins 2013; Murphy and Neff 2011; Nacke, Grimshaw and Lindley 2010), and artistic installations that embrace an existing site in order to enhance and empower spaces (Batchelor 2013; LaBelle 2006; Torehammar and Hellström 2012). The use of sound in these technical systems mirrors, to some degree, the place of sound in the real world, where it provides a broader “picture” of our surroundings which, in turn, orients us within the complex “information” spaces in which we find ourselves every day.

The issue of how interaction designers should design sound for use within HCI remains unresolved, as even talking about it, much less reasoning about it, is difficult. Barrass (2005) refers to the novelty of his 1994 approach to sonification and how it laid the foundations for a comprehensive framework for auditory display design. Brazil (2010) stated that there was still no systematic approach to auditory display and sonic interaction design, although work was ongoing. MacDonald and Stockman (2013) highlight that auditory display design techniques are still neither unified nor easily understood. The expert knowledge upon which such design depends remains locked up in the professional practice of sound designers. It ranges from what Anderson (1974) refers to as propositional knowledge, which is available and observable, through to what Polanyi (1951) describes as tacit knowledge, which is unarticulated or yet to be formally systematized.

Notwithstanding these problems, there have been a number of attempts at systematizing the design of sound within interactive technologies. Perhaps the best known is Brewster (1994), who proposed guidelines for the design of earcons. Earcons are abstract sounds used to represent information, first introduced by Blattner, Sumikawa, and Greenberg (1989). Mynatt (1994) concentrated on the usability of auditory icons, proposing a method of design that addressed identifiability, conceptual mapping, physical parameters, and user preference. Auditory icons are everyday sounds that correspond to computer events, as first developed by Gaver (1986). Dombois and Eckel (2011) have developed guidelines for audification, the process of transforming data waveforms into sound first discussed by Frysinger (1990); they are most concerned with the suitability of the source material (the data set). Although attempts have been made to formalize the process (Kramer et al. 1999), there is still no complete set of design guidelines for the many approaches to sonification, such as audification, that have been developed since the 1980s (Walker and Nees 2011).

Frauenberger and Stockman (2009) have evaluated the work of Barrass (2003), suggesting that patterns of auditory design can be developed and tagged with keywords to enable specialist and non-specialist designers to access this knowledge. In the style of Alexandrian patterns (Alexander 1979), the approach is to describe typical design problems and solutions. They note that identifying patterns from individual sound design solutions is difficult and that this problem could be lessened by increasing the size of the community contributing patterns as well as allowing sufficient time for patterns to be generated and shared. Despite these efforts, sound design for auditory displays remains something of a “black art”, being confined to a gifted few (Alves and Roque 2011).

While sound design within interaction design is poorly understood, this is not the case for sound in other design disciplines. Sound designers for games typically specialize early in their career to learn the craft and the professional tools that are required for their trade. Sound designers of soundscape installations in public buildings are often musicians who have a deep knowledge of how to affect people’s feelings and behavior through sound (Hellström, Dyrssen, Hultqvist, Mossenmark and Sjösten 2011). Sound designers for radio and theatre predominantly train as recording studio or live music engineers first, and of course film schools provide another specialist route for film sound designers (Touzeau 2008). Within film, Walter Murch was the first to describe his work as sound design, when he moved the sound between mono, stereo and quadraphonic formats for the film Apocalypse Now (LoBrutto 1994). Ben Burtt was the first to work in what is now the recognizable role of a sound designer, on Star Wars Episode IV: A New Hope, where he was credited for special dialogue and sound effects (Whittington 2007).

However, as sound becomes an increasingly mainstream part of interaction design, mainstream interaction designers need ways of accessing the design knowledge of these specialists. Sounds rarely exist in isolation, so designers need to be able to represent the whole “soundscape” of an auditory display, including the direction and loudness of the different sound events and how the display changes over time. There is a simple equivalent for the quick sketch of a visual designer: vocal sketching is an effective approach for creating monophonic sounds using the human voice. Its limitations are that the length of a sound is constrained by the breath, complex sounds are hard to produce, and harmonies require multiple contributors (Ekman and Rinott 2010; Tahiroğlu and Ahmaniemi 2010a; Tahiroğlu and Ahmaniemi 2010b). There is a small number of professional practitioners who can successfully create complex sounds and who are less limited by breath length, although they predominantly specialize in animation for film, games and television (F. Newman 2004). Giordano, Susini and Bresin (2013) point out that if continuous evaluation is required, then it is advisable for participants to listen to recordings as passive listeners rather than vocally sketch the sounds themselves. While a visual sketch remains fixed on the page, sound is temporal by nature, so it can be difficult to prototype a soundscape for user evaluation. It is, of course, easy to create and play a tune or a sound effect, but even with current software it remains difficult to present a design in a way that lets designers understand what it will be like when it is fully orchestrated and deployed as part of an overall user experience.

Our way of dealing with these issues is to visualize the soundscape, so that complex temporal data can be captured and analyzed in order to highlight similarities and differences in listening experiences. Visualizations can allow designers to view data quickly, identify problems and provide a consistent form of interpretation. Just as a website designer will provide a wireframe of the design to get reactions from clients, we seek to provide a visualization of a proposed soundscape design. This needs to capture the foreground and background sounds, the different types of sound, and of course the change of the soundscape over time. Allowing designers to use a visual form to represent a sound design lets them validate their designs quickly and with confidence.

Listening, soundscapes, and sound design

Listening and hearing are different (Handel 1989), and Szendy (2008) tells us that we can choose to listen. Madell and Flexer (2008) define hearing as the acoustic mechanism of sound being transmitted to the brain, whereas listening is the process of focusing and attending to what can be heard. Thus, listening is an active process comprising conscious choice and subjective interpretation of what is heard (Blesser and Salter 2007).

A soundscape can be defined as the surrounding auditory environment that a listener inhabits (Porteous and Mastin 1985; Rodaway 1994; Schafer 1977). The soundscape surrounds the listener and is an anthropocentric experience (Ohlson 1976). The definition has not been standardized, but there is ongoing work to create an ISO standard that establishes its definition and conceptual framework, as well as methods and measurements for its study (Brown, Kang and Gjestland 2011; Davies et al. 2013). There is no complete model of the soundscape, as interpretation is affected by the sounds that can be heard, the acoustic space that shapes those sounds, and listeners’ interpretations based upon what they attend to and how (Davies 2013).

Luigi Russolo, as part of his 1913 Futurist manifesto, encouraged musicians to analyze noise in order to expand their sensibilities (Russolo, Filliou, Pratella and Press 1967). In 1929, Granö differentiated between the study of “sound” and “noise”. He mapped auditory phenomena with reference to the “field of hearing” rather than “things that exist”. Granö (1997) did not use the term soundscape; instead, he applied the concept of proximity, which represented the area immediately surrounding an inhabitant. The concept was revisited in 1969 when Southworth tried to establish how people perceived the sounds of Boston and how this might affect the way they experienced the city (Southworth 1969). Schafer (1977) and Truax (2001) attempted to formalize the concept using terms adapted from existing ones, such as soundmark (from landmark). Schafer (1993) argued that all soundscapes should be designed or regulated to display what he terms high-fidelity (distinct, easily interpreted sounds), rather than low-fidelity (indistinct, difficult-to-interpret sounds). Soundscapes and the individual sounds that make up a soundscape have been shown to have a physiological and psychological impact upon listeners (Cain, Jennings and Poxon 2013). Sounds that are considered unpleasant cause a reduction in heart rate, and pleasant sounds lead to an increase in respiratory rate (Hume and Ahtamad 2013).

The work of the sound designer is to create an aesthetic combination of sound events, to produce a soundscape that is informative and/or evokes an emotional response in the listener. For example, in film and other linear media, sound may be used as a sleight-of-hand, making the audience believe that something has happened (Chion 1994). Video game sound designers have adopted many of the techniques associated with film sound (R. Newman 2009), but have added interactivity so that some of the sound events are directly controlled by gamers’ actions, whilst other sounds remain passively experienced within non-interactive sequences (Collins 2008).

Sound designers routinely manipulate the attributes of sound as part of their everyday practice. These include the sound’s pitch, loudness, timbre (or overall quality of the sound), duration, and direction. For example, the length of a sound can be used to convey a character’s emotions, such as a longer doorbell ring suggesting impatience (Kaye and Lebrecht 2000). The length of a silence (or lack of sound) can be useful to convey the passage of time or a change of location (Beaman 2006). Changing a sound’s pitch can make objects seem larger or smaller or alter the age or gender of a character (Beauchamp 2005; Collins 2008). Spatial cues, such as panning, can provide insight into what a character is attending to (Beck and Grajeda 2008; Kerins 2010).

In interaction design, designers of auditory displays are concerned both with making sounds informative and with creating appropriate acoustical properties (Brewster 2008; Buxton 1989). For example, Gaver’s SonicFinder (Gaver 1989) used auditory icons such as a scraping sound for objects being dragged across a computer desktop and a scrunching sound for putting a file in the wastebasket. Similar sounds are still used in Apple’s operating system. Microsoft’s Outlook email client, in contrast, uses abstract earcons, such as a soft tinkling when an email arrives in the user’s inbox.

Classifying listening experiences

Before designers can present and evaluate their designs for sound events and soundscapes, they need to establish what characteristics of sounds are most important – in short, they require a vocabulary. Each researcher describes sounds from their own perspective: some focus on the spatial characteristics of sounds, others on the dynamics or the aesthetics, and others may include additional qualities such as whether a sound is a background noise. The following brief treatment of some key writers in this field offers a flavor of the resulting incertitude.

Schafer (1977), in one of the definitive treatments of sound, was concerned with a sound’s estimated distance and environmental factors such as reverberation. Gabrielsson and Sjögren (1979) identified the feeling of space and nearness associated with sound events, while Amphoux (1997) added orientation and reverberation. Hellström (1998) extended this by proposing enclosure, extension, center, distance and direction, and Mason (2002) highlighted width, diffuseness and envelopment.

Attributes concerned with the dynamics of sound have also been highlighted: Schafer (1977) focuses on the intensity of a sound, Gabrielsson and Sjögren (1979) identified loudness, Amphoux (1997) was concerned with scale, and Hellström (1998) specified strong and weak dynamics.

Temporal attributes included duration (Schafer 1977), atemporality (Amphoux 1997), and rhythm (Hellström 1998). Spectral attributes related to both frequency and timbre, with Schafer (1977) identifying the brightness or darkness and fullness or thinness of timbre: a full sound has a broad spectrum, while a thin sound has a much narrower one. Hellström (1998) focused on both pitch and timbre, and Mason (2002) referred to timbral frequency.

Aesthetics were considered by Gabrielsson and Sjögren (1979) and Amphoux (1997). Clarity was specified in terms of “hi-fi” or “lo-fi” environments by Schafer (1977) and as clearness and distinctness by Gabrielsson and Sjögren (1979).

This is by no means exhaustive. However, our own recent work in this area aims to simplify and clarify this diversity.

What listeners hear

While sound designers can guide listeners by providing clues about what they should be attending to (Kerins 2010; Sonnenschein 2001), there has been relatively little work on directly comparing listener and sound designer experiences. There has, however, been much work distinguishing between musicians’ and less experienced listeners’ experiences in the field of psychoacoustics (Bharucha, Curtis and Paroo 2006; Marie, Kujala and Besson 2012; P. M. Paul 2009). Listening tests have been conducted within product design for 50 years or more and involve experienced (trained) listeners (Engelen 1998; Frank, Sontacchi and Höldrich 2010; Soderholm 1998).

Rumsey (1998) tells us that there are high levels of agreement when participants are experts, whereas non-experts’ responses are likely to vary more. Bech (1992) suggests that increasing the number of participants can improve the level of confidence in the findings. Yang and Kang (2005) highlight how much measurements and evaluations can differ, especially for different types of sound source and levels of pleasantness. Listener testing has been limited to products such as audio reproduction equipment and vacuum cleaners; it has not migrated into mainstream media and has only partially migrated into computing (Bech and Zacharov 2006). Tardieu, Susini, Poisson, Kawakami and McAdams (2009) found that laboratory tests of sound signals (earcons) do not fully correspond with tests conducted under real-world conditions.

In a previous study, the authors attempted to establish whether listeners have the same listening experience as the person who designed the sound (McGregor and Turner 2012). Surprisingly, there had been little prior evidence as to whether what is designed to be heard is what is actually heard. A repertory grid technique was adopted using listener- and designer-generated constructs. One designer and 20 listeners rated 25 elements within a surround sound recording created by a generative soundscape system, using the same attributes (descriptors) as those used in this study. The listeners’ modal response was compared to the designer’s. The results suggested that it is feasible to compare designers’ and listeners’ experiences and to establish points of agreement and disagreement. The authors demonstrated an ontology of sound based on user experience rather than a designer’s training, with an approach grounded in long-term experiences and listeners’ conceptualization of sound.

Visualizing Soundscapes

We are, of course, not the first to propose visualizing sound and its attributes. The painter Wassily Kandinsky translated atonal music into canvases (Brougher and Zilczer 2005). In 1919, another artist, Roy de Maistre, curated the exhibition Color in Art, in which the musical notes A to F were converted to colors (A = red), paintings were accompanied by music, and color charts were made available to the audience (Alderton 2011). Gibson (2005) displayed different frequencies using color, as did Matthews, Fong and Mankoff (2005). Circles have been regularly used to represent sound and can be found in a variety of visualization schemes (Azar, Saleh and Al-Alaoui 2007; Frecon, Stahl, Soderberg and Wallberg 2004; Helyer, Woo and Veronesi 2009). Servigne, Kang and Laurini (2000) proposed that the varying intensities of noise could be visualized by altering the radius of circles. Gibson also adopted this approach, indicating the volume of a sound in a mix by the object’s size, with louder sounds drawn larger than quieter ones. Shape has been used to designate the articulation of musical notes: legato = rounded, staccato = polygon (Friberg 2004).

Abstract shapes have also been applied to visualize phonemes that are not recognized by a phoneme recognition system, with high-frequency sounds having spiky, irregular forms (Levin and Lieberman 2004). Opacity has been used to a limited extent to communicate the volume or loudness of a sound event (Mathur 2009; Radojevic and Turner 2002; Thalmann and Mazzola 2008). Servigne, Laurini, Kang and Li (1999) suggested that graphic semiology would be appropriate for displaying sounds, proposing that smiling faces overlaid onto a map could display participants’ preferences: a smile represented “nice”, a neutral expression “neutral”, and a frown “not so good”. Bertin’s 1967 theory of cartographic communication was used to create the visualization. Bertin proposed that the visual variables of shape, size, value, orientation, hue, texture, and x and y coordinates could be applied to point, line, and area symbols. Monmonier (1993) argued that Bertin’s variables were also suitable for text, which could act as symbols within visualizations.

Figure 1 shows the set of symbols that we developed iteratively, based on the literature highlighted above, to represent the components of a soundscape (McGregor, Crerar, Benyon and Leplatre 2008; McGregor, Leplatre, Turner and Flint 2010). Each attribute of a classified sound event is visualized using these symbols, and the resulting symbol is then placed on a grid according to the sound event’s perceived spatial location.
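To make the encoding concrete, the following is a minimal sketch of how one classified sound event might be turned into a drawable symbol. It is a hypothetical illustration, not the authors’ tool (the visualizations in this study were produced manually). Only the encodings mentioned in the results are taken from the text: informative = square, clear = opaque, a musical-notes icon for music, a loudspeaker icon for sound effects, magenta, cyan and yellow borders for gas, liquid and solid materials, and border width for aesthetics. The other shapes, the opacity and width values, the attribute keys, and the names `Glyph` and `to_glyph` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Encodings taken from the results text: informative = square, clear = opaque,
# music = musical notes, sound effect = loudspeaker, gas/liquid/solid =
# magenta/cyan/yellow border, aesthetics = border width.
# Every value marked "hypothetical" is an assumption for illustration only.
CONTENT_SHAPE = {
    "informative": "square",
    "neutral": "circle",          # hypothetical
    "uninformative": "triangle",  # hypothetical
}
CLARITY_OPACITY = {"clear": 1.0, "unclear": 0.4}  # 0.4 is hypothetical
TYPE_ICON = {
    "music": "musical notes",
    "sound effect": "loudspeaker",
    "speech": "speech bubble",  # hypothetical
}
MATERIAL_BORDER = {"gas": "magenta", "liquid": "cyan", "solid": "yellow"}
AESTHETICS_BORDER_WIDTH = {"pleasing": 1, "neutral": 2, "displeasing": 4}  # hypothetical widths


@dataclass
class Glyph:
    x: float                      # perceived left-right position on the grid
    y: float                      # perceived depth (distance) on the grid
    shape: str
    opacity: float
    icon: str
    border_colour: Optional[str]  # None when no material was recorded
    border_width: int


def to_glyph(event: dict) -> Glyph:
    """Convert one classified sound event (attribute -> value) into a symbol."""
    return Glyph(
        x=event["x"],
        y=event["y"],
        shape=CONTENT_SHAPE[event["content"]],
        opacity=CLARITY_OPACITY[event["clarity"]],
        icon=TYPE_ICON[event["type"]],
        # material is optional: designers did not always record it
        border_colour=MATERIAL_BORDER.get(event.get("material")),
        border_width=AESTHETICS_BORDER_WIDTH[event["aesthetics"]],
    )
```

The remaining attributes (temporal, spectral, dynamics, interaction and emotions) would be added as further fields in the same way; keeping each attribute on its own visual variable, in the spirit of Bertin, lets a viewer read one attribute across all symbols at a glance.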


Iain McGregor, Phil Turner and David Benyon

Figure 1: Symbols used for visualizing sound events within a sound design

Method

The same sound design can be experienced differently by listeners depending on their personal interests and training. Visualizing soundscapes provides insight into listeners’ experiences so that they can be easily compared. To be effective for designers, soundscape visualizations need to be applicable to a wide range of soundscapes, such as auditory displays, games, and films. Accordingly, ten sound designers working in different media were asked to design a soundscape that they would be interested in having visualized (see Table 1). Listeners listened to a sound design and classified all of the sound events that they were aware of using the attributes shown in Table 2; the results were then collated and visualized to illustrate both the designer’s and the listeners’ experiences.

Participants

Ten professional sound designers were recruited via email. They worked in a variety of fields, from interface design through to games, film, television and radio. The 100 listeners were either staff or students at Edinburgh Napier University. None of the listeners had previously taken part in a listening study. All participants considered themselves to have no hearing difficulties, and they ranged in age from their early twenties to their late fifties. Both male and female participants took part, in a ratio of approximately 3:2.

Materials

The ten sound designers were asked to supply a sound design that they would like to have visualized. The choice of design was left to the sound designer, and no guidance was given about length or complexity. The tests were conducted in a quiet office with stereo loudspeaker reproduction, except for design 9, which required a surround sound system located in an isolated, acoustically untreated room.

Design

For six of the designs, participants were asked first to listen to the complete design and then to classify the sound events. For the other four sound designs, participants were played short sections and asked to rate specified sound events based upon what they had just heard. The decision as to which approach to adopt was left to the designers. Listeners were questioned verbally about the attributes of each sound event, with access to the grid (for identifying spatial attributes) and the list of attributes (see Table 2). The classification itself was based on the principle of a common language. It was derived from a lexicon generated from the descriptions participants used for what they were listening to (McGregor, Leplatre, Crerar and Benyon 2006) and from a questionnaire in which audio professionals were asked for the terms they use to describe sounds (McGregor, Crerar, Benyon and Leplatre 2007). The resulting terms should therefore be meaningful to both groups.

Procedure

The procedure involved classification, visualization, and a survey. The designers rated their designs based on the specified attributes and forwarded them for visualization. Listeners then classified the designs using the same attributes, and the results were visualized.

Listeners were randomly assigned to sound designs until 10 participants had experienced each design. All of the participants were able to complete all of the tasks without prompting. The listeners’ responses were collated, and the mode of each attribute was calculated for each sound event. The results were translated into two visualizations: the first represented the designer’s intentions, the second the combined listeners’ experiences. For this iteration, the visualizations were generated manually; however, an automated version has been proposed so that listeners and designers could create their own visualizations in the future. The results for all ten designs are reported in the following section, along with brief discussions.
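The collation step, taking the mode of each attribute for each sound event across the ten listeners, can be sketched as follows. The flat `(listener, event, attribute, value)` record format and the function name `modal_classification` are assumptions for illustration. Tied values are kept together, which is one plausible reading of combined results such as “both liquid and solid” reported later; the paper does not state how ties were handled.

```python
from collections import Counter, defaultdict


def modal_classification(responses):
    """Return the modal value(s) of every attribute for every sound event.

    responses: iterable of (listener_id, event_id, attribute, value) tuples,
    one per listener per attribute of each sound event they classified.
    Returns {(event_id, attribute): tuple_of_modal_values}; when several values
    share the highest count, all of them are kept (sorted) rather than one
    being picked arbitrarily.
    """
    counts = defaultdict(Counter)
    for _listener, event, attribute, value in responses:
        counts[(event, attribute)][value] += 1

    modes = {}
    for key, counter in counts.items():
        top = max(counter.values())
        modes[key] = tuple(sorted(v for v, n in counter.items() if n == top))
    return modes
```

For example, with hypothetical data in which 7 of 10 listeners rate event AA as informative and 3 as neutral, the mode for `("AA", "content")` is `("informative",)`, while a 5/5 split between liquid and solid yields `("liquid", "solid")`.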

Table 1: Summary of sound designs provided by the designers for visualization

Table 2: Attributes for classification

Results

Four designs were chosen as case studies to illustrate the procedure. The remaining six designs are briefly reported in section 3.5, and the experts’ evaluation can be found in section 4.1.

Design 01: Auditory Display

Auditory displays have been defined by Kramer (1994) as an interface between users and computer systems using sound, and are considered a natural extension of the way in which sound is used in the physical world. Auditory displays differ from auditory interfaces in that they operate unidirectionally: an interface allows audio to be used as input as well as output (although it does not require audio input), whereas a display provides only output (McGookin and Brewster 2004). Speech interfaces are a specialist type of auditory interface, confined predominantly to speech (Raman 2012). Auditory displays can be split into user interface audio and audio used in visualization. User interface audio includes earcons, auditory icons, sound-enhanced word processors (text to speech), and other applications, whilst sound in visualization includes audification, sonification, and auralization (Vickers 1999).

The sound events for the auditory display had been designed for a large manufacturer of electrical appliances for a variety of their products (Audio 1). The designer recorded no spatial cues, as the sound events were tested in isolation rather than within products (see Table 3). The designer made limited use of the material and interaction attributes, recording this information for only 9 of the 32 sound events. However, all of the other attributes were applied (see Figure 2). The majority of the sound events were considered by the listeners to be informative (square) and clear (opaque) (see Figure 3). Three of the sound events that were considered to be pleasing by the designer were found to be displeasing by the listeners (border width). There is a clear difference between the designer’s and listeners’ classification of music (musical notes symbol) and sound effects (loudspeaker symbol), with the designer considering the majority of sound events to be music, possibly due to the prominent use of earcons, which are often considered to be musical in nature by designers. The listeners predominantly classified these sound events as sound effects. They considered the aesthetics of the sound events to be more evenly distributed than the designer, who considered more to be either pleasing or displeasing. The listeners also classified more sound events as neutral than the designer, who considered the majority to be positive (emoticons).

As an auditory display, the sound design might be regarded as successful, as 26 of the 27 sound events that the sound designer classified as informative were also classified as informative by the listeners (see Table 4). Similarly, the listeners classified 31 sound events as clear. The 3 sound events that the listeners rated as displeasing (AR, AT and BD), along with the 2 that they found to be uninformative (AT and AX), might benefit from further review. The similarities were far more prevalent than the differences, especially in terms of the Temporal, Spectral, Dynamics, Content and Clarity attributes. The major difference between the listeners’ and designer’s ratings was that the listeners classified the sound events as sound effects rather than music. This might mean that listeners did not fully appreciate the hierarchical associations that are inherent in earcon design. Listeners may perceive each earcon as a separate sound effect, which would require them to learn each earcon individually rather than recognize musical similarities.
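Counts such as “26 of the 27 sound events” come from comparing the designer’s classification with the listeners’ modal values event by event. The following is a minimal sketch of that comparison, assuming the designer’s ratings are stored per event and the listeners’ modes are keyed by `(event, attribute)` with ties kept as tuples. The function names and the agreement rules (an exact match in `compare_attribute`, membership among tied modes in `confirmed_count`) are hypothetical choices, not the authors’ procedure.

```python
def compare_attribute(designer, listener_modes, attribute):
    """Split sound events into agreed, differed and unheard for one attribute.

    designer: {event_id: {attribute: value}} (the designer's own ratings)
    listener_modes: {(event_id, attribute): tuple_of_modal_values}
    An event agrees only when the listeners' mode is exactly the designer's value;
    events with no listener entry for the attribute are treated as not heard.
    """
    agreed, differed, unheard = [], [], []
    for event in sorted(designer):
        value = designer[event].get(attribute)
        if value is None:
            continue  # designer left this attribute blank for this event
        modes = listener_modes.get((event, attribute))
        if modes is None:
            unheard.append(event)
        elif modes == (value,):
            agreed.append(event)
        else:
            differed.append(event)
    return agreed, differed, unheard


def confirmed_count(designer, listener_modes, attribute, value):
    """Of the events the designer gave `value`, count those whose listener
    modes include `value` (a tie counts as confirmation).

    Returns (confirmed, intended).
    """
    intended = [e for e, attrs in designer.items() if attrs.get(attribute) == value]
    confirmed = [e for e in intended if value in listener_modes.get((e, attribute), ())]
    return len(confirmed), len(intended)
```

Separating “differed” from “unheard” matters because, as in the simulation design, listeners may not notice some sound events at all, which is a different finding from hearing them differently.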

Design 02: Sonification

Sonification refers to a technique for transforming data into an audible stream, analogous to data visualization (Kramer et al. 1999). It can be argued that a sonification method must be objective, systematic and reproducible, as well as suitable for use with different data (Hermann 2008). Data can be split into auditory streams, where each stream is linked to a specific audio variable such as pitch, volume, note duration, fundamental wave shape, attack (onset) envelope, or overtone (harmonic) wave shape. This can not only make the data more informative but also potentially increase the amount of information that can be transmitted concurrently (Bly 1982).
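As an illustration of mapping a data stream to pitch, the following sketch maps each data value to a frequency and renders a continuous sine tone whose pitch follows the data. It is not the implementation used by Schaffert, Mattes and Effenberg (2010); the 220 Hz to 880 Hz range, the exponential (octave-linear) mapping, the sample rate, and the function names are all assumptions.

```python
import math


def value_to_frequency(value, vmin, vmax, f_low=220.0, f_high=880.0):
    """Map a data value onto a pitch in Hz.

    The value is clamped to [vmin, vmax], normalised to 0..1 and mapped
    exponentially, so equal data steps give equal musical intervals.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = (min(max(value, vmin), vmax) - vmin) / (vmax - vmin)
    return f_low * (f_high / f_low) ** t


def sonify(values, vmin, vmax, sample_rate=8000, samples_per_value=80):
    """Render a continuous sine tone whose pitch follows `values`.

    A running phase is carried across value changes so the waveform has no
    discontinuities (clicks) when the pitch steps to a new frequency.
    """
    samples = []
    phase = 0.0
    for v in values:
        step = 2.0 * math.pi * value_to_frequency(v, vmin, vmax) / sample_rate
        for _ in range(samples_per_value):
            samples.append(math.sin(phase))
            phase = (phase + step) % (2.0 * math.pi)
    return samples
```

Mapping in log frequency rather than linear frequency is a design choice: with the assumed range, the midpoint of the data lands on 440 Hz, exactly one octave above the bottom, so listeners hear equal data changes as equal pitch steps. Further streams (volume, duration, timbre) could be driven from other data columns in the same way.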

This soundscape consisted of a 56-second video of an acceleration trace from a four-man coxless rowing crew, sonified using a continuous tone that varied in pitch (Video 1) (Schaffert, Mattes and Effenberg 2010). The sonification was designed to help athletes improve their performance (see Table 5).

The designer classified all of these sound events as sound effects, with values of gas for the material, informative for the content, and clear for the clarity attribute. Interactions varied from impulsive to intermittent, with a single instance of continuous (see Figure 4). The listeners were aware of all 8 sound events, and considered all but 1 to be informative. The listeners grouped the materials of the sound events into 1 gas (magenta border), 5 liquid (cyan border), 1 solid (yellow border), and 2 as both liquid and solid (see Figure 5). Listeners experienced a greater range of spectral attributes than the designer.

The sonification could be considered successful, as almost all of the sound events were considered informative and listeners were able to distinguish between the differences in pitch (see Table 6). The range of pitch variation could be increased so that it extends into the low range, and some form of panning might be considered, if only to move the sound events into the center of the stereo field. The designer’s and the listeners’ ratings for Type, Dynamics, Clarity and Emotions were identical. The Spectral, Content and Aesthetics attributes differed only slightly. The main differences between the designer’s and listeners’ responses were in the Y axis (depth), Material, Interaction and Temporal attributes. Whilst listeners found almost all of the sound events informative, they experienced them as being further away, as sounding more liquid-like than gaseous, and as being more continuous than impulsive in their Interaction.

Design 03: Simulation

A variety of systems exist for simulating soundscapes and/or acoustical environments. The simplest is to record an auditory environment using a multichannel microphone or multiple microphones and then to reproduce the recording through multiple loudspeakers (Bertet, Daniel and Moreau 2006; Holman 2000). More complex interactive systems are available, where a sound designer records all of the original samples and composes, or creates, a set of rules and parameters for real time soundscape generation (Schirosa, Janer, Kersten and Roma 2010; Valle, Lombardo and Schirosa 2009). Procedural audio systems, where all of the sounds are generated artificially, are also available and are commonly found in video games (Farnell 2011; L. J. Paul 2010).

A 7-minute 25-second simulation of the soundfield of a multimedia laboratory and its immediate environment was created for this soundscape (Audio 2). A soundfield can be defined as the auditory environment surrounding a particular sound source; it represents the quantifiable characteristics of a sound source or event (Ohlson 1976). The simulation was created using a non-linear sequencing model called GeoGraphy (Valle et al. 2009). GeoGraphy had previously been tested by comparing simulations with recordings from real environments and asking listeners to identify which was which. Each sound event is a single zone, and the descriptions represent the sounds that were used to create the zone (see Table 7).

The designer considered the content of 9 of the sound events to be informative, 5 neutral and 5 uninformative. These were visualized as in Figure 6, with different shapes representing the different content ratings. Sound events such as the photocopier and the film were informative, the sounds of people’s actions, such as drying their hands or footsteps, were neutral, and room tones were uninformative. Listeners were unaware of two of the sounds associated with hand washing (AJ and AK) (see Figure 7). The listeners rated 14 sound events as informative, 1 as neutral and 2 as both informative and neutral. Listeners might have been trying to make sense of what they were hearing by constructing a narrative to explain the sequence of sound events. The higher number of informative ratings could also be attributed to the number of sound events that the listeners found to be clear (15), which contrasts with the designer, who rated 9 of them as clear and the remaining 10 as unclear.
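As a worked illustration of the counts above, the following sketch converts the designer’s and listeners’ Content ratings for Simulation into percentages. The denominators differ (19 events for the designer, 17 for the listeners, who were unaware of two events), and keeping the mixed “informative/neutral” responses as a separate category is our own choice.

```python
from collections import Counter

# Content ratings for Simulation, as reported in the text.
designer = Counter(informative=9, neutral=5, uninformative=5)
listeners = Counter({"informative": 14, "neutral": 1, "informative/neutral": 2})


def shares(counts: Counter) -> dict:
    """Percentage of rated sound events in each category, rounded to whole numbers."""
    total = sum(counts.values())
    return {category: round(100 * n / total) for category, n in counts.items()}


print(shares(designer))   # {'informative': 47, 'neutral': 26, 'uninformative': 26}
print(shares(listeners))  # {'informative': 82, 'neutral': 6, 'informative/neutral': 12}
```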

The sound types were consistent (see Table 8): every sound event was categorized identically by the listeners and the designer. Four of the sound events were considered to be speech and the remainder (15) sound effects; there were no instances of music. The designer intended a greater range of dynamics: 7 sound events were loud, 7 medium and 5 soft. The listeners rated 3 as loud/medium, 1 as soft, and the remainder (16) as medium. This suggests that the variation in dynamics may be too subtle, and that greater differences need to be applied to convey the range intended by the designer. More sound events were considered clear and informative by the listeners than by the designer, which is probably due to the artificial nature of listening out of context. The Aesthetics and Emotions of the sound design were not communicated effectively, with almost all of the sound events being rated neutral.

Design 04: Game Sound Effects

The fourth design used sound effects from a commercially released console video game. All of the sound events were taken from a company’s sound library, intended for designers to use when constructing games. Eight separate audio files were included; the shortest was less than 1 second long and the longest 1 minute 19 seconds (Audio 3). Half of the files, all recordings of a female voice speaking single words, were single sound events; the remaining 4 were atmospheric constructs containing between 3 and 5 sound events each (see Table 9).

The designer considered all 18 sound events to be informative (see Figure 8), and classified each as either speech or a sound effect. Full use was made of the range of the remaining attributes. For the Material attribute, gas was predominantly used to classify the voices, most of the “birds”, and some of the dogs. Liquid was consistently chosen for “water”, and solid was applied to “kiss”, “hit”, and some of the dog sounds. The Interaction attribute was applied more consistently: the designer used continuous only for the water sounds, all of the birds were intermittent, and all of the voices were impulsive. Only the dog sounds were inconsistent, being either impulsive or intermittent. The majority (10) of the sound events were temporally short, 3 were medium, and 5 were long. Atmospheric sound effects, such as the waterfall, tended to be temporally long, whereas speech was either short or medium.

The listeners rated only 12 of the 18 sound events as informative (see Figure 9). Four were rated uninformative, 1 neutral, and 1 both informative and uninformative, reflecting contradictory responses. All of the sound events that the listeners classified as uninformative, as well as the single neutral one, were speech. Three of these were also unclear, whilst the remaining 2 were clear. The designer regarded only 1 of the sound events as unclear.

Considering the sound design as a whole, sound effects can be regarded as successful when they are informative and convey the required emotions accurately. The two groups differed with regard to speech (see Table 10): its emotions were not conveyed, being consistently rated as neutral by the listeners, who also found it predominantly uninformative. The designer, however, judged the speech to be informative and to convey either positive or negative emotions. This is perhaps due to a problem with the dialogue delivery rather than with the sound design. More sound events were considered clear by the listeners than by the designer, which may be due to the artificial nature of the task, in which sound events were heard in isolation, without reference to a game.
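The success criterion stated above can be expressed as a small predicate. This is a sketch under our own assumptions: the `Rating` type and `is_successful` function are hypothetical, and each listener rating is taken to be a single aggregated value rather than a distribution of responses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    content: str  # "informative", "neutral" or "uninformative"
    emotion: str  # "positive", "neutral" or "negative"


def is_successful(designer: Rating, listeners: Rating) -> bool:
    """Treat a sound event as successful when listeners found it informative
    and perceived the emotion that the designer intended."""
    return listeners.content == "informative" and listeners.emotion == designer.emotion


# Hypothetical ratings echoing the pattern described for speech and effects.
speech = is_successful(Rating("informative", "positive"), Rating("uninformative", "neutral"))
effect = is_successful(Rating("informative", "negative"), Rating("informative", "negative"))
print(speech, effect)  # False True
```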

Other designs

Design 05 was a short film with music and sound effects but no dialogue (Video 2). The designer identified 45 sound events, but the listeners recalled only 23 of these (see Table 11). Twelve of the events that listeners were unaware of had been classified as uninformative by the designer, whereas all the events that the listeners were aware of were classified as informative (18) or neutral (5) (see Figures 10 and 11).

Design 06 was a 30 second soundscape composition, consisting of a piece of music and sound samples of a person playing the flute by a stream (Audio 4). The designer identified 15 sound events and made full use of the width and depth codes (see Table 12, Figures 12 and 13). The listeners were aware of 9 of the events and did not perceive such a wide spatial distribution. They combined the two pieces of flute music into a single sound event.

Design 07, a 42 second section from a radio drama, consisted of 14 sound events: 5 speaking characters and 9 sound effects (Audio 5). One event that the designer had rated as unclear was not noticed by the listeners, but otherwise they were aware of all the events (see Table 13, Figures 14 and 15). The characters were also classified in emotional terms, while their Aesthetics were rated as neutral rather than pleasing or displeasing.

Design 08 was a set of audio logos (audio branding), which often form part of an advert; here the aesthetics, clarity, and emotional response were most important (Audio 6). The designer considered all but 4 of the sounds to be pleasing, whereas the listeners classified only 5 as pleasing (see Table 14, Figures 16 and 17). Feeding the outcome of such an evaluation back to the designers could show them what people actually thought of their designs.

Design 09 was an abstract composition, included to see if the visualizations could be used for representing complex soundscapes (Audio 7). It was presented in surround sound to the listeners. There were 26 sound events, all classified by listeners and the designer as sound effects, and the listeners were aware of all of these (see Table 15, Figures 18 and 19).

Design 10 consisted of a 30 second audio sequence of film sound effects (Audio 8). The listeners were aware of only 18 of the 32 sound events (see Table 16, Figures 20 and 21). Listeners were unaware of every sound event that the designer classified as soft; however, it did not follow that they were aware of every loud sound event.

Table 17 shows that some attributes, such as Type, Temporal, Spectral and Emotions, were rated similarly. There were small differences in Material with respect to the ratings of liquid and gas, and pronounced differences in Interaction, Dynamics, Content, Aesthetics and Clarity. For Interaction, listeners rated more sound events as continuous than the designers did. For Dynamics, listeners tended towards the middle value, whereas for Content they rated sound events as informative more often than the designers did. Finally, listeners found a greater percentage of the sound events to be pleasing and clear than the designers did.

Figure 12: Designer’s soundscape composition visualization

Figure 13: Listeners’ soundscape composition visualization

Figure 18: Designer’s Abstract composition visualization

Figure 19: Listeners’ Abstract composition visualization

Figure 8: Designer’s game sound effects visualization

Figure 9: Listeners’ game sound effects visualization

Table 17: Summary of classification for designs 5-10

Figure 3: Listeners’ auditory display visualization

Figure 2: Designer’s auditory display visualization

Figure 16: Designer’s Audio logos visualization

Figure 20: Designer’s film sound effects visualization

Figure 14: Designer’s radio drama visualization

Figure 15: Listeners’ radio drama visualization

Figure 17: Listeners’ Audio logos visualization

Figure 21: Listeners’ film sound effects visualization

Figure 4: Designer’s sonification visualization

Figure 5: Listeners’ sonification visualization

Figure 10: Designer’s short film visualization

Figure 6: Designer’s simulation visualization

Figure 7: Listeners’ simulation visualization

Figure 11: Listeners’ short film visualization

Audio 7: Abstract composition (stereo mix)

Table 15: Key for figures 18 and 19

Table 10: Summary of classifications

Audio 1: Auditory display selection

Table 4: Summary of classifications

Table 6: Summary of classifications

Table 8: Summary of classifications

Table 11: Key for figures 10 and 11

Audio 4: Soundscape composition

Audio 6: Audio logos (Echo, 2013)

Table 14: Key for figures 16 and 17

Table 12: Key for figures 12 and 13

Table 13: Key for figures 14 and 15

Table 3: Key for figures 2 and 3

Table 5: Key for figures 4 and 5

Table 7: Key for figures 6 and 7

Table 9: Key for figures 8 and 9

Audio 3: Game sound effects

Table 16: Key for figures 20 and 21

Audio 8: Film sound effects

Audio 5: Radio drama

Video 1: Sonification

Audio 2: Simulation

Video 2: Short film

Discussion

Designers chose whether listeners could listen only once or multiple times. When listeners were able to listen repeatedly, they were aware of every sound event, giving an awareness score of 100%. The 6 designs that listeners could not listen to repeatedly contained a total of 144 sound events, of which listeners were aware of 98, an awareness level of 68%. Listeners were unaware of sound events that did not have a recognizable source, such as “synth ambience” in Short Film or “stretching in and out synth transition” in Film Sound Effects. Sound events that the designers considered displeasing, such as “bathroom sounds” in Simulation or “weird branches coming out of the mouth” in Short Film, also went unnoticed by the listeners. Lack of awareness might also be related to the designer having regarded a sound event as uninformative, for example “girl’s voice” in Radio Drama and “rock first hit” in Soundscape Composition.
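The awareness figures above are simple proportions. A minimal sketch, applied to counts reported in this paper (the function name is ours):

```python
def awareness_pct(aware: int, total: int) -> int:
    """Percentage of a design's sound events that listeners reported noticing."""
    if total <= 0 or not 0 <= aware <= total:
        raise ValueError("need 0 <= aware <= total and total > 0")
    return round(100 * aware / total)


print(awareness_pct(98, 144))  # 68  (six single-listening designs combined)
print(awareness_pct(23, 45))   # 51  (Short Film)
print(awareness_pct(18, 32))   # 56  (Film Sound Effects)
```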

Spatial cues were used by 9 of the 10 designers and are well reflected in the visualizations, which show the differences between the designer’s intention and the listeners’ experience. Listeners perceived motion in only 2 of the designs (Composition and Film Sound Effects) and judged Composition to have the most, with 69% of its sound events regarded as moving; its designer considered 42% of the sound events to have motion. For stationary sound events, the designers used almost the entire X axis (panning) and the entire Y axis (depth), whereas the listeners experienced slightly less panning and depth. For sound events with motion, the designers used the entire X and Y axes; the listeners, in contrast, experienced the entire range of panning but a smaller range of depth.
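One way to record spatial cues is as positions on the X (panning) and Y (depth) axes, with an optional end position for moving events, from which the share of moving events and the extent of the grid used can be derived. The sketch assumes a -1.0 to 1.0 panning scale and a 0.0 to 1.0 depth scale; these scales, the `Placement` class and the example placements are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Placement:
    x: float  # panning, assumed scale -1.0 (hard left) to 1.0 (hard right)
    y: float  # depth, assumed scale 0.0 (near) to 1.0 (far)
    end: Optional[Tuple[float, float]] = None  # final (x, y) if the event moves

    @property
    def moves(self) -> bool:
        return self.end is not None and self.end != (self.x, self.y)


def motion_pct(placements) -> int:
    """Percentage of sound events that were placed with motion."""
    if not placements:
        return 0
    return round(100 * sum(p.moves for p in placements) / len(placements))


def static_extent(placements):
    """((min_x, max_x), (min_y, max_y)) over the stationary events only."""
    static = [p for p in placements if not p.moves]
    if not static:
        raise ValueError("no stationary events")
    xs = [p.x for p in static]
    ys = [p.y for p in static]
    return (min(xs), max(xs)), (min(ys), max(ys))


placements = [Placement(-1.0, 0.2), Placement(0.5, 0.8, end=(0.9, 0.3)), Placement(0.0, 0.0)]
print(motion_pct(placements), static_extent(placements))
```

Computing the same measures separately for the designer’s and the listeners’ placements gives the kind of comparison reported above (for example, a narrower depth range for the listeners).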

The Type attribute (music, sound effect, speech) is quite intuitive. Speech was predominantly used to classify identifiable words or phrases by both the designers and the listeners, such as “I’m calling you (Man)” in Auditory Display or “Butler’s voice” in Radio Drama. Music was chosen for the most part when there was a clearly identifiable melody such as “dub music” in Short Film or “flute music A” in Soundscape Composition. Sound effect was used for a wide range of sound events; examples include “birds” in Audio Logos and “recovery phase” in Sonification.

When classifying the Material attribute of sound events, gas was often chosen for sound events that involved the movement of air, as in “the tonic (keynote)” of Laboratory, “the wheels of personal computers” in Simulation, or the “jet-entry” in Film Sound Effects. Liquid was predominantly selected for sound events such as the “waterfall” in Game Sound Effects and “water trickling” in Soundscape Composition.

For the Interaction attribute, impulsive was primarily used for percussive sound events, such as “message knocking” in Auditory Display or “drumming fingers on a desk” in Simulation. Intermittent was chosen when sounds had a percussive element over an underlying sustained element, as in the “dog growl” in Game Sound Effects or the “juddering anacrusis” in Composition. Continuous was applied to sustained sound events without any obvious percussive elements; examples include the “room ambience” in Short Film and the “background ambience” in Composition.

Within the Temporal attribute, short was chosen for brief non-repeating sound events, such as the “poof flash” in Film Sound Effects or the “voice ‘xxxxx’” in Audio Logos. Medium was used when a sound event was of indeterminate length, neither short nor long; examples include “Sid’s voice” in Radio Drama and “sounds emitted by a key-holder while someone is walking in the passage, other noises, some steps, aeration ducts” in Simulation. Long was applied to extended uninterrupted sound events, such as “theme music” in Auditory Display or “water” in Game Sound Effects.

The Spectral attribute high was commonly applied to bright percussive sound events both natural and man-made; these included “loud chirp” in Soundscape Composition and the “front catch” in Sonification. Mid was chosen for sound events that fell between high and low as well as for sound events that had broadband spectral content. Examples of broadband content include the “distorted evolving pad” in Short Film and “jet engine fires” in Film Sound Effects. Low was selected for obvious bass content, as in “bass rumble” in Composition and “leather bass drum” in Audio Logos.

When Dynamics attributes were applied, loud was often used for short prominent sound events, such as “gunshot” in Short Film or “mid catch” in Sonification. Medium was chosen for moderate intensity sound events that provided context for a further action; examples include “gun loading” in Short Film and “safe door jiggled” in Radio Drama. Soft was used to classify gentle sound events that formed an auditory backdrop; examples include “background ambience” in Composition and “birds” in Audio Logos.

The most obvious examples of informative sound events for the Content attribute were those associated with warnings, such as “low battery alert” in Auditory Display or “Ringing bell (doorbell)” in Radio Drama. Neutral was applied to sound events that were regarded as neither necessary nor unnecessary for comprehending the sound design; examples include “chirping beeps 1” in Film Sound Effects and “big leaf crunch” in Soundscape Composition. Uninformative was reserved for sound events that were considered unnecessary, as in “leather bass drum” in Auditory Display or “voices” in Game Sound Effects, the latter being uninformative only from the listeners’ perspective.

Within the Aesthetics attribute, pleasing was predominantly applied to positive sound events that came from an acoustic source; examples include “birds high and loud” in Game Sound Effects and “classical guitar” in Audio Logos. Displeasing was often chosen for sound events with negative associations, such as “dog growl” in Game Sound Effects or “distorted scream” in Short Film. Neutral was used for abstract sound events that had no physical analogue, such as “back reversal” in Sonification or “ripping detritus-drop” in Composition. The aesthetic ratings appear to be more closely related to emotional responses than to whether a sound was considered beautiful.

Within the Clarity attribute, clear was often applied to explicit sound events in the foreground of the designs; examples include “the emission sounds of a television: a woman’s voice” from Simulation and “woman’s voice” from Radio Drama. Unclear was used for sound events that, whilst still audible, were difficult to discern, as in “female voice ‘Tomorrow’” from Game Sound Effects and “background ambience” in Composition. Neutral was used for sound events regarded as neither clear nor unclear; examples include “rock bounce” from Soundscape Composition for the designers and “warning spearcon” from Auditory Display for the listeners.

In terms of the Emotions attribute, positive was applied to sound events with obvious affirmative associations, such as “kiss” in Game Sound Effects or “success” in Auditory Display. Neutral was used for abstract sound events; examples include “drive phase” from Sonification and “building transitional whoosh” in Film Sound Effects. Negative denoted sound events that were designed to have an unpleasant effect; these included “door” in Audio Logos and “Chetwood’s voice” in Radio Drama.
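The classification scheme described above can be captured as a simple data structure, which makes it possible to check that each response uses a permitted value. This is a sketch under our own assumptions: `SCHEME` and `validate` are hypothetical names, and accepting combined listener responses such as “loud/medium” reflects how some results are reported in this paper rather than a documented rule.

```python
# The ten classification attributes and their three values, as described
# above. Awareness (noticed or not) and Spatial Cues (grid position) are
# recorded separately and are not included here.
SCHEME = {
    "Type": ("music", "sound effect", "speech"),
    "Material": ("solid", "liquid", "gas"),
    "Interaction": ("impulsive", "intermittent", "continuous"),
    "Temporal": ("short", "medium", "long"),
    "Spectral": ("high", "mid", "low"),
    "Dynamics": ("loud", "medium", "soft"),
    "Content": ("informative", "neutral", "uninformative"),
    "Aesthetics": ("pleasing", "neutral", "displeasing"),
    "Clarity": ("clear", "neutral", "unclear"),
    "Emotions": ("positive", "neutral", "negative"),
}


def validate(ratings: dict) -> list:
    """Return a list of problems with one sound event's ratings.

    A combined listener response such as "loud/medium" is accepted when
    every part of it is a valid value for that attribute.
    """
    problems = []
    for attribute, allowed in SCHEME.items():
        value = ratings.get(attribute)
        if value is None:
            problems.append(f"{attribute}: missing")
            continue
        unknown = [part for part in value.split("/") if part not in allowed]
        if unknown:
            problems.append(f"{attribute}: invalid {unknown}")
    return problems


# Hypothetical ratings for a single sound event.
event = {"Type": "speech", "Material": "gas", "Interaction": "impulsive",
         "Temporal": "short", "Spectral": "mid", "Dynamics": "loud/medium",
         "Content": "informative", "Aesthetics": "neutral",
         "Clarity": "clear", "Emotions": "positive"}
print(validate(event))  # []
```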

In general terms, the designers’ responses were weighted towards the middle value in several attributes, including Spectral, Dynamics, Aesthetics and Emotions (see Table 18). According to the designers, only one attribute value fell below 10% (Material: liquid, 7%). For the listeners, 4 attribute values fell below 10%: music (9%), uninformative (6%), neutral clarity (7%) and unclear (5%). For 6 of the attributes, the internal ranking of responses was consistent between the designers and listeners, although the percentages differed. The 4 remaining attributes, whose ranking was not consistent between the two groups, were Type, Interaction, Aesthetics and Emotions, although the majority response was always the same. Listeners rarely rated a sound event as music unless it was exclusively musical, whereas the designers rated sounds that had musical elements as music. The contrast between the interpretations of Material and Interaction may be a matter of degrees of differentiation: both solid and impulsive were interpreted reliably, but there was obvious variation between gas and liquid. This could be an area where training is required to produce consistent differentiation. Some of the variation in responses for Clarity and Content may be due to listeners being asked to consider sound events in isolation and being provided with descriptions, rather than having to interpret what they were hearing without guidance. The differences in Emotions and Aesthetics might arise because the designers applied more subtlety in their sound designs than the listeners could interpret. The central weighting of Dynamics for listeners could be due solely to the reproduction equipment, since the designers had access to equipment with a greater dynamic range. The consistency of the Temporal and Spectral attributes may be due to an inherent familiarity with them, irrespective of training.
In conclusion, all of the attributes were used by both the designers and the listeners, and as such appear to be suitable for describing soundscapes.
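The ranking comparison discussed above (same internal ranking versus same majority response) can be sketched as follows. The percentages in the example are hypothetical, chosen to reproduce the pattern reported for Type; they are not the study’s values.

```python
def ranking(shares: dict) -> list:
    """Values of one attribute, ordered from most to least often chosen.
    Ties keep the order in which the values were listed (sorting is stable)."""
    return sorted(shares, key=shares.get, reverse=True)


def compare(designers: dict, listeners: dict) -> tuple:
    """Return (same internal ranking?, same majority response?)."""
    by_designers, by_listeners = ranking(designers), ranking(listeners)
    return by_designers == by_listeners, by_designers[0] == by_listeners[0]


# Hypothetical percentages for a Type-like attribute.
designers = {"music": 25, "sound effect": 55, "speech": 20}
listeners = {"music": 9, "sound effect": 66, "speech": 25}
print(compare(designers, listeners))  # (False, True)
```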

To address the reliability of the attributes, we examined 120 conditions, of which a small number (21) proved to be of interest (see Table 19). When designers’ and listeners’ responses were compared, none of the attributes could be considered reliable for all of the sound designs, and Interaction was not reliable for any design. However, for three of the sound designs there was a significant level of reliability for a limited number of attributes. Two factors need to be considered. First, the method may not be reliable or valid: listeners and designers agree in some cases but not in others because the method is flawed in some way, the most obvious example being the description of Interaction. Second, expert knowledge may be very different from that of non-experts (Alves and Roque 2010; Cattell, Glascock and Washburn 1918; Kaufman, Baer, Cole and Saxton 2008), and we may be comparing apples with oranges: both fruit, but one of the genus Malus, the other of the genus Citrus. What Table 19 shows is that in certain instances listeners’ and designers’ experiences can be compared with confidence, but that the scope is limited.
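The statistic used to assess reliability is not named here. Purely as an illustration, the sketch below enumerates the 120 conditions (10 designs by 12 attributes) and computes Cohen’s kappa, a standard chance-corrected agreement measure, for one condition. The choice of kappa, and the idea of comparing the designer’s label with an aggregated listener label for each sound event, are our assumptions; the example labels are hypothetical.

```python
from collections import Counter
from itertools import product

DESIGNS = ["Auditory Display", "Sonification", "Simulation", "Game Sound Effects",
           "Short Film", "Soundscape Composition", "Radio Drama", "Audio Logos",
           "Composition", "Film Sound Effects"]
ATTRIBUTES = ["Awareness", "Spatial Cues", "Type", "Material", "Interaction",
              "Temporal", "Spectral", "Dynamics", "Content", "Aesthetics",
              "Clarity", "Emotions"]

# One condition per (design, attribute) pair: 10 x 12 = 120.
CONDITIONS = list(product(DESIGNS, ATTRIBUTES))


def cohens_kappa(a, b) -> float:
    """Chance-corrected agreement between two raters' categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[c] * cb[c] for c in ca.keys() | cb.keys()) / n ** 2
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical Dynamics labels for four sound events in one condition.
designer = ["loud", "loud", "soft", "soft"]
listeners = ["loud", "soft", "soft", "soft"]
print(len(CONDITIONS), cohens_kappa(designer, listeners))  # 120 0.5
```

Here raw agreement is 75%, but half of that would be expected by chance given how often each party used each label, so kappa is 0.5.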

Coleman (2008) highlights the distrust that designers have of non-experts’ descriptions of auditory environments. Audio professionals spend a considerable amount of time learning to shift between critical and natural listening. The visualization allows designers’ and listeners’ listening experiences to be compared.

A simple comparison of the designer’s soundscape map of the pre-existing environment with the listeners’ map shows where similarities and differences lie. Cross-referencing what participants were aware of against all of the recorded sound events highlights what was attended to and what was ignored. The classification provides information about what the perceived events sound like, how relevant they are, whether they are pleasing and clear, and what emotional impact, if any, they have. This tells the designer which elements are favorable and which are considered neutral or unfavorable.

Expert Evaluation

A questionnaire was sent to all ten designers, together with the soundscape visualizations of their designs. The questionnaire addressed classification, visualization, and applications. Designers were asked to rate the importance of each attribute used in the soundscape visualizations for comparing sound designs with listeners’ experiences. They were then invited to choose the most appropriate way to display each audio attribute used in the classification, using an adapted visual questionnaire in which each visualization option was represented pictorially, with a check box to indicate choice. Next, they indicated their level of agreement with the statement that the “soundscape visualization allowed me to compare a sound design with the experience of listeners”. The questionnaire concluded with open-ended questions about the methods they currently employed to evaluate sound designs, how they could use this method, and suggestions for changes.

All ten designers completed the questionnaire, and no questions were left unanswered. Seven of the 12 audio attributes were rated important or very important by 6 of the designers, a further 4 by 5 of the designers, and only a single attribute (Interaction) by fewer than half of the designers. Awareness, Spatial Cues, Type, Dynamics, Content, Clarity and Emotions could therefore be chosen as a reduced set of attributes for future visualizations.
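Selecting the reduced set amounts to a threshold on the importance counts. A sketch, with the counts and our assumptions noted in the comments (`IMPORTANCE` and `rated_by_at_least` are hypothetical names):

```python
# Number of designers (out of 10) who rated each attribute "important" or
# "very important". The counts follow the text and are treated as minimums;
# the four attributes rated by 5 designers are inferred by elimination, and
# Interaction is reported only as "fewer than half", so 4 is an assumed value.
IMPORTANCE = {
    "Awareness": 6, "Spatial Cues": 6, "Type": 6, "Dynamics": 6,
    "Content": 6, "Clarity": 6, "Emotions": 6,
    "Material": 5, "Temporal": 5, "Spectral": 5, "Aesthetics": 5,
    "Interaction": 4,
}


def rated_by_at_least(importance: dict, minimum: int) -> list:
    """Attributes rated important or very important by >= minimum designers."""
    return [attribute for attribute, n in importance.items() if n >= minimum]


print(rated_by_at_least(IMPORTANCE, 6))       # the seven attributes of the reduced set
print(len(rated_by_at_least(IMPORTANCE, 5)))  # 11: every attribute except Interaction
```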

The second part of the questionnaire asked designers for their preferred way of displaying each audio attribute. Seven of the 12 attributes had a single method of display chosen by the majority of the sound designers. Two methods were chosen by all ten designers: position on a grid for Spatial Cues and symbols for Type. A further 2 were chosen by nine out of ten designers: inclusion of the object for Awareness and emoticons for Emotions. Opacity for Clarity, border dashes for Interaction and shape for Content were also chosen by more than half of the designers. There was no clear single choice of display for the remaining attributes: Material, Temporal, Spectral, Dynamics, and Aesthetics.

A reduced set of 7 attributes has been suggested by the designers (Awareness, Spatial Cues, Type, Dynamics, Content, Clarity, Emotions), along with appropriate methods of display (see Figure 22). All but one of the attributes (Interaction) were considered to be either “important” or “very important” by at least half of the designers, with 7 attributes being selected by the majority. All of the designers agreed that soundscape visualization allowed them to compare a sound design with the experience of listeners.
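A possible encoding of the reduced set, following the display methods the designers chose, is sketched below. The mapping of attribute to display method follows the text; the concrete symbols, shapes, opacity levels, icons and size factors are hypothetical placeholders, as is the `glyph` function. Dynamics is encoded by size only provisionally, since no display method for it reached a majority.

```python
# Hypothetical concrete values for each display method.
TYPE_SYMBOL = {"music": "note", "sound effect": "star", "speech": "speech bubble"}
CONTENT_SHAPE = {"informative": "circle", "neutral": "square", "uninformative": "triangle"}
CLARITY_OPACITY = {"clear": 1.0, "neutral": 0.6, "unclear": 0.3}
EMOTION_ICON = {"positive": ":)", "neutral": ":|", "negative": ":("}
DYNAMICS_SCALE = {"loud": 1.5, "medium": 1.0, "soft": 0.6}  # provisional


def glyph(event: dict):
    """Visual properties for one sound event, or None if the listeners were
    unaware of it (Awareness is shown by including or omitting the object)."""
    if not event["aware"]:
        return None
    return {
        "position": event["position"],  # Spatial Cues: (x, y) on the grid
        "symbol": TYPE_SYMBOL[event["Type"]],
        "shape": CONTENT_SHAPE[event["Content"]],
        "opacity": CLARITY_OPACITY[event["Clarity"]],
        "emoticon": EMOTION_ICON[event["Emotions"]],
        "scale": DYNAMICS_SCALE[event["Dynamics"]],
    }


# Hypothetical ratings for a doorbell-like sound event.
doorbell = {"aware": True, "position": (0.3, 0.5), "Type": "sound effect",
            "Content": "informative", "Clarity": "clear",
            "Emotions": "neutral", "Dynamics": "loud"}
print(glyph(doorbell)["shape"], glyph(doorbell)["opacity"])  # circle 1.0
```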

Table 18: Summary of designers’ and listeners’ application of attributes

Figure 22: Possible reduced set of attributes

Table 19: Attributes of interest

Further Work

Although the ten sound designers who took part in the final study agreed that visualizing the soundscape allowed them to compare a sound design with the experiences of listeners, several areas for further work have been identified. Research is needed on the internal validity of each attribute: the attributes will be tested individually in order to establish an appropriate scale for each and to provide easily understood descriptors. The attributes associated with the physical properties of sound events, such as Spatial, Temporal, Dynamics, and Spectral, might be expected to be more consistent across listeners than the more subjective attributes of Interaction, Aesthetics, Emotions, or Content. However, this was not always the case, as shown by the significant level of reliability for Content and Emotions in Sonification and Radio Drama.

Of the 7 retained attributes, 6 had a single method of visualization chosen by the majority of the designers. There was only a low level of agreement on how to visualize the Dynamics attribute: varying the dimensions of the shape was chosen by only four of the ten designers. Alternative methods of displaying this attribute will be researched and trialed with designers.

Individual visualizations of each listener’s responses range from almost identical to disparate, with the majority being similar. The designers’ intentions are not always markedly different from the listeners’ experiences: in Radio Drama and Sonification, two thirds of the attributes were rated identically. In contrast, there are obvious differences between the designers’ intentions and the listeners’ experiences for both Soundscape Composition and Auditory Display, where only a third of the attributes were rated identically. To establish whether it is the act of design or the expertise of the designer that introduces the differences in responses, each designer could analyze the sound designs that they did not create themselves, and the results could be compared with the listeners’ experiences.
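How “rated identically” is computed is not specified. One plausible reading, assumed here, compares the designer’s rating for each attribute with the most frequent listener response; the function names and data are hypothetical.

```python
from collections import Counter


def modal(responses: list) -> str:
    """Most frequent listener response; ties go to the first one encountered."""
    return Counter(responses).most_common(1)[0][0]


def identical_fraction(designer: dict, listener_responses: dict) -> float:
    """Share of attributes where the designer's rating equals the listeners'
    modal rating."""
    same = sum(designer[a] == modal(listener_responses[a]) for a in designer)
    return same / len(designer)


# Hypothetical ratings for one sound event heard by three listeners.
designer = {"Type": "speech", "Content": "informative", "Emotions": "negative"}
listener_responses = {
    "Type": ["speech", "speech", "sound effect"],
    "Content": ["informative", "neutral", "informative"],
    "Emotions": ["neutral", "neutral", "negative"],
}
print(round(identical_fraction(designer, listener_responses), 2))  # 0.67
```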

The low level of agreement for Simulation and Auditory Display might suggest that greater agreement could be achieved if non-experts designed for non-experts, and experts for experts. However, the approach could also be used as part of an iterative design process, through which experts become more attuned to designing for non-experts and adapt their designs accordingly. The similarity between the designers’ and listeners’ responses for Radio Drama and Sonification illustrates that responses are not always disparate.

Studies will also be conducted to establish the most appropriate length and level of complexity of the material that listeners experience. There was no easily identifiable relationship between the length and complexity of a sound design and the level of awareness of its sound events. A balance needs to be struck between reliance on listeners’ memories and the level of awareness.

One of the designers suggested a method of highlighting only the differences between a sound designer’s intentions and the listeners’ experiences within a single visualization. One way of doing this might be to overlay the listeners’ responses onto the designer’s, omitting all of the attributes that are identical whilst retaining the code that indicates each sound event’s presence (see Table 20 and Figure 23). In this example, each sound event has been given a different colour to aid identification. The sound events are located on the grid according to the designer’s responses, and the arrows indicate their positions according to the listeners. The designer’s response is on the left-hand side of each object, and the listeners’ on the right.
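The overlay rule described above (omit identical attributes but keep every sound event present) can be sketched as a function over two sets of ratings. The data layout, the `overlay` function and the event codes are hypothetical.

```python
def overlay(designer_events: dict, listener_events: dict) -> dict:
    """Keep only the attributes on which designer and listeners differ.

    Both arguments map a sound-event code to {attribute: value}. Every
    designer event appears in the result so that its presence is still shown;
    an empty dict means full agreement. Events the listeners did not notice
    are reported as an Awareness difference.
    """
    result = {}
    for code, intended in designer_events.items():
        heard = listener_events.get(code)
        if heard is None:
            result[code] = {"Awareness": ("aware", "unaware")}
            continue
        result[code] = {
            attribute: (value, heard.get(attribute))
            for attribute, value in intended.items()
            if heard.get(attribute) != value
        }
    return result


# Hypothetical events; "Position" is an (x, y) grid position, so a difference
# there corresponds to one of the arrows in the overlay.
designer_events = {
    "A": {"Position": (0.0, 0.5), "Dynamics": "loud", "Emotions": "positive"},
    "B": {"Position": (0.2, 0.1), "Dynamics": "soft", "Emotions": "neutral"},
    "C": {"Position": (-0.5, 0.3), "Dynamics": "medium", "Emotions": "neutral"},
}
listener_events = {
    "A": {"Position": (0.0, 0.8), "Dynamics": "loud", "Emotions": "positive"},
    "C": {"Position": (-0.5, 0.3), "Dynamics": "medium", "Emotions": "neutral"},
}
for code, diffs in overlay(designer_events, listener_events).items():
    print(code, diffs)
```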

An alternative approach might be to convey the results through sound, either through a sonification or a sound design. A sonification that conveys agreement for each attribute could run concurrently with the design: where the designer and listeners agree, no additional sound is heard, and where their responses differ, an appropriate tone or alert is played for the attribute concerned. Another solution could be to alter the sound design according to the listeners’ experiences. Attributes that represent physical properties of sound, such as Spatial, Temporal, Spectral, and Dynamics, would be relatively easy to alter, so that a version of the design that more closely matches the listeners’ experiences could be made available for the designer to hear. Awareness would be the easiest to convey, as it would only require the removal of a sound event. Attributes such as Emotions and Content would be more difficult to convey; however, through analysis of successful matches, it might prove possible to alter the sound design so that the designer can experience the listeners’ perspective.

Although the study contributes in general to the design of sound for any application, some specialist applications might also benefit: those for visually impaired users, those that operate mainly out of the user’s view or without a screen, and high-performance environments in which the misinterpretation of audio cues might have serious repercussions.
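The concurrent sonification of agreement described above could be scheduled as follows. This is a hypothetical sketch: the per-attribute pitches, the alert duration, the function names and the event data are our assumptions, and the generated samples would still need to be mixed with the sound design for playback.

```python
import math

# Hypothetical alert pitch (Hz) for each of the seven retained attributes.
ALERT_HZ = {
    "Awareness": 220.0, "Spatial Cues": 262.0, "Type": 294.0, "Dynamics": 330.0,
    "Content": 392.0, "Clarity": 440.0, "Emotions": 523.0,
}


def alert_schedule(events: list) -> list:
    """events: (onset_seconds, {attribute: agrees}) pairs, one per sound event.
    Returns (onset_seconds, frequency) for every disagreement; agreement
    produces no additional sound."""
    return [
        (onset, ALERT_HZ[attribute])
        for onset, agreement in events
        for attribute, agrees in agreement.items()
        if not agrees
    ]


def tone(freq: float, seconds: float = 0.2, rate: int = 44100,
         amplitude: float = 0.3) -> list:
    """Sine-wave samples for one alert, in the range [-amplitude, amplitude]."""
    n = int(seconds * rate)
    return [amplitude * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


events = [
    (0.0, {"Type": True, "Dynamics": True}),
    (2.5, {"Dynamics": False, "Emotions": False}),
]
print(alert_schedule(events))  # [(2.5, 330.0), (2.5, 523.0)]
print(len(tone(440.0)))        # 8820
```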

The aim of this work is to enable sound designers to evaluate whether their designs’ intended meaning is conveyed to their listeners. In this paper we have reported on an evaluation of soundscape visualizations involving ten designers and 100 listeners. The study provided examples of how the visualizations could be used by designers to compare their intentions for a sound design with the experiences of listeners. This paper has not established the need to visualize sound designs; it has merely presented a method, which has not yet been compared with other forms of comparison. The ten designers who took part in the study all agreed that soundscape visualization allowed them to compare a sound design with the experience of listeners. They proposed a number of modifications, the most important being a reduction in the number of attributes needed to describe the salient features of a sound event. Before any assessment of the suitability of visualization for sound design in general is made, it is important to develop the method further and to test it more systematically, as there is as yet little evidence that the approach can describe a general range of sound design practices. Revisiting the work of authors such as Barrass (1996) should help to provide descriptors better suited to the design of sonifications and auditory displays in general.

Research continues to refine the visualizations, based on the 7 attributes favored by the designers. Reducing the number of attributes should lower the cognitive demand on those interpreting the maps, as the number of symbols has been reduced from 13 to 7. We hope that with a smaller set of attributes and clearer notation for the visualizations, we can produce a method for the evaluation of soundscapes that could become a standard part of interaction design. At present, the visualizations must be drawn by a sound design expert, but with the more focused set of attributes, tool support, and a database of sound design patterns, we are confident that the benefits of soundscape visualization could be made readily available to designers. We believe that using a visual form to represent soundscapes could make an interaction designer’s job easier and more efficient.

Figure 23: Visualization for the Audio Logos indicating differences between the sound designer’s and listeners’ responses

Audio 6: Audio Logos (Echo, 2013)

Table 20: Key for Figure 23

References

Alderton, Z. (2011). "Colour, Shape, and Music: The Presence of Thought Forms in Abstract Art." Literature & Aesthetics 21/1: 236-258.

Alexander, Christopher (1979). The Timeless Way of Building. New York: Oxford University Press.

Alves, Valter, and Licinio Roque (2010). A pattern language for sound design in games. Paper presented at the 5th Audio Mostly Conference: A Conference on Interaction with Sound, Piteå.

Alves, Valter, and Licinio Roque (2011). "Guidelines for sound design in computer games." In Mark Grimshaw (ed.), Game sound technology and player interaction: Concepts and developments (pp. 363-383). New York: Information Science Reference.

Amphoux, Pascal (1997). L'identité sonore des villes européennes: Guide méthodologique. Grenoble: Cresson/IREC.

Anderson, John Robert (1974). "Retrieval of propositional information from long-term memory." Cognitive Psychology 6/4: 451-474.

Azar, Jimmy, Hassan Abou Saleh and Mohamad Adnan Al-Alaoui (2007). "Sound Visualization for the Hearing Impaired." International Journal of Emerging Technologies in Learning (iJET) 2/1: 1-7.

Barrass, Stephen (1996). EarBenders: using stories about listening to design auditory interfaces. Paper presented at the First Asia-Pacific Conference on Human Computer Interaction APCHI, Singapore.

Barrass, Stephen (2003). Sonification design patterns. Paper presented at the 2003 International Conference on Auditory Display, Boston.

Barrass, Stephen (2005). "A comprehensive framework for auditory display: Comments on Barrass, ICAD 1994." ACM Transactions on Applied Perception (TAP) 2/4: 403-406.

Batchelor, Peter (2013). "Lowercase Strategies in Public Sound Art: celebrating the transient audience." Organised Sound 18/1: 14-21.

Beaman, Jim (2006). Programme Making for Radio. London: Routledge.

Beauchamp, Robin (2005). Designing Sound for Animation. Oxford: Focal Press.

Bech, Soren (1992). "Selection and Training of Subjects for Listening Tests on Sound-Reproducing Equipment." Journal of the Audio Engineering Society 40/7-8: 590-610.

Bech, Soren, and Nick Zacharov (2006). Perceptual Audio Evaluation - Theory, Method and Application. Chichester: Wiley.

Beck, Jay, and Tony Grajeda (eds.) (2008). Lowering the boom: critical studies in film sound. Champaign, IL: University of Illinois Press.

Bertet, Stephanie, Jerome Daniel and Sebastien Moreau (2006). 3D sound field recording with higher order ambisonics: objective measurements and validation of spherical microphone. Paper presented at the Audio Engineering Society Convention 120, Paris.

Bertin, Jacques (1983). Semiology of graphics: Diagrams, networks, maps (trans. William J. Berg). Madison, WI: University of Wisconsin Press.

Bharucha, Jamshed J., Meagan Curtis and Kaivon Paroo (2006). "Varieties of musical experience." Cognition 100/1: 131-172.

Blattner, Meera M., Denise A. Sumikawa and Robert M. Greenberg (1989). "Earcons and Icons: Their Structure and Common Design Principles." Human-Computer Interaction 4/1: 11-44.

Blesser, Barry and Linda-Ruth Salter (2007). Spaces speak, are you listening? Experiencing aural architecture. London: MIT Press.

Bly, Sara (1982). "Presenting information in sound." In Jean A. Nichols and Michael L. Schneider (eds.), Proceedings of CHI 1982, Conference on Human Factors in Computing Systems (pp. 371-375). Gaithersburg, MD: ACM.

Brazil, Eoin (2010). "A review of methods and frameworks for sonic interaction design: Exploring existing approaches." In Solvi Ystad, Mitsuko Aramaki, Richard Kronland-Martinet and Kristoffer Jensen (eds.), Auditory Display: 6th International Symposium, CMMR/ICAD 2009, Copenhagen, Denmark, May 18-22, 2009. Revised Papers (pp. 41-67). Berlin: Springer.

Brewster, Stephen Anthony (1994). "Providing a Structured Method for Integrating Non-Speech Audio into Human-Computer Interfaces." (Doctoral dissertation). York: University of York.

Brewster, Stephen Anthony (2008). "Nonspeech auditory output." In Andrew Sears and Julie A. Jacko (eds.), The Human Computer Interaction Handbook (pp. 247-264). New York: Taylor and Francis Group.

Brougher, Kerry, Jeremy Strick, Ari Wiseman, and Judith Zilczer (2005). Visual music: synaesthesia in art and music since 1900. New York: Thames and Hudson.

Brown, A. Lex, Jian Kang and Truls Gjestland (2011). "Towards standardization in soundscape preference assessment." Applied Acoustics 72/6: 387-392.

Buxton, William Arthur Stewart (1989). "Introduction to this special issue on nonspeech audio." Human-Computer Interaction 4/1: 1-9.

Cain, Rebecca, Paul Jennings and John Poxon (2013). "The development and application of the emotional dimensions of a soundscape." Applied Acoustics 74/2: 232-239.

Cattell, Judith, Josephine Glascock and M. F. Washburn (1918). "Experiments on a Possible Test of Aesthetic Judgment of Pictures." The American Journal of Psychology 29/3: 333-336.

Chion, Michel (1994). Audio-Vision: Sound on Screen (trans. Claudia Gorbman). New York: Columbia University Press.

Coleman, Graeme W. (2008). "The Sonic Mapping Tool." (Doctoral dissertation). Dundee: University of Dundee.

Collins, Karen (2008). Game Sound: An Introduction to the History, Theory, and Practice of Video Game Music and Sound Design. Cambridge, MA: MIT Press.

Collins, Karen (2013). Playing with Sound: A Theory of Interacting with Sound and Music in Video Games. Cambridge, MA: MIT Press.

Davies, William J. (2013). "Special issue: Applied soundscapes." Applied Acoustics 74/2: 223.

Davies, William J., Mags D. Adams, Neil S. Bruce, Rebecca Cain, Angus Carlyle, Peter Cusack, Deborah A. Hall, Ken I. Hume, Amy Irwin, Paul Jennings, Melissa Marselle, Christopher J. Plack and John Poxon (2013). "Perception of soundscapes: An interdisciplinary approach." Applied Acoustics 74/2: 224-231.

Dombois, Florian and Gerhard Eckel (2011). "Audification." In Thomas Hermann, Andy Hunt and John G. Neuhoff (eds.), The Sonification Handbook (pp. 301-324). Berlin: Logos Publishing House.

Echo (2013). Echo Sound Branding.

Ekman, I. and M. Rinott (2010). Using vocal sketching for designing sonic interactions. Paper presented at the 8th ACM Conference on Designing Interactive Systems, Aarhus.

Engelen, Heleen (1998). "Sounds in Consumer Products." In Henrik Karlsson (ed.), From Awareness to Action: Proceedings from "Stockholm, Hey Listen!": Conference on Acoustic Ecology (pp. 65-66). Stockholm: The Royal Swedish Academy of Music.

Farnell, Andy (2011). "Behaviour, Structure and Causality in Procedural Audio." In Mark Grimshaw (ed.), Game Sound Technology and Player Interaction: Concepts and Development (pp. 313-339). New York: Information Science Reference.

Finney, Nathaniel and Jordi Janer (2010). Soundscape generation for virtual environments using community-provided audio databases. Paper presented at the W3C Workshop - Augmented Reality on the Web, Barcelona.

Frank, Matthias, Alois Sontacchi and Robert Höldrich (2010). Training and guidance tool for listening panels. Paper presented at Fortschritte der Akustik, DAGA, Berlin.

Frauenberger, Christopher and Tony Stockman (2009). "Auditory display design: An investigation of a design pattern approach." International Journal of Human-Computer Studies 67/11: 907-922.

Frecon, Emmanuel, Olov Stahl, Jonas Soderberg and Anders Wallberg (2004). Visualizing Sound Perception in a Submarine: A Museum Installation. Paper presented at the Eighth IEEE International Symposium on Distributed Simulation and Real-Time Applications, Budapest.

Friberg, Anders (2004). A fuzzy analyzer of emotional expression in music performance and body motion. Paper presented at Music and Music Science, Stockholm.

Frysinger, Steven P. (1990). Applied research in auditory data representation. Paper presented at the Electronic Imaging '90, Santa Clara.

Gabrielsson, Alf and Hakan Sjögren (1979). "Perceived sound quality of sound-reproducing systems." Journal of the Acoustical Society of America 65/4: 1019-1033.

Gaver, William W. (1986). "Auditory Icons: Using Sound in Computer Interfaces." Human-Computer Interaction 2/2: 167-177.

Gaver, William W. (1989). "The SonicFinder: An Interface That Uses Auditory Icons." Human-Computer Interaction 4/1: 67-94.

Gibson, David (2005). The Art of Mixing: A Visual Guide to Recording Engineering and Production. Boston: Artist Pro Publishing.

Giordano, Bruno, Patrick Susini and Roberto Bresin (2013). "Perceptual Evaluation of Sound-Producing Objects." In Karmen Franinović and Stefania Serafin (eds.), Sonic Interaction Design (pp. 151-198). Cambridge, MA: MIT Press.

Granö, Johannes Gabriel (1997). Pure Geography (trans. Malcolm Hicks). Baltimore, MD: The Johns Hopkins University Press.

Handel, Stephen (1989). Listening: An Introduction to the Perception of Auditory Events. Cambridge, MA: MIT Press.

Hellström, Björn (1998). "The Voice of Place: A Case-study of the Soundscape of the City Quarter of Klara, Stockholm." In Raymond Murray Schafer and Helmi Järviluoma (eds.), Yearbook of Soundscape Studies “Northern Soundscapes”, Vol. 1 (pp. 25-42). Tampere: University of Tampere.

Hellström, Björn, Per Sjösten, Anders Hultqvist, Catharina Dyrssen and Staffan Mossenmark (2011). "Modelling the shopping soundscape." Journal of Sonic Studies 1.

Helyer, Nigel, Daniel Woo and Francesca Veronesi (2009). "The Sonic Nomadic: Exploring Mobile Surround-Sound Interactions." IEEE Multimedia 16/2: 12-15.

Hermann, Thomas (2008). Taxonomy and Definitions for Sonification and Auditory Display. Paper presented at International Conference on Auditory Display, Paris.

Holman, Tomlinson (2000). 5.1 Surround Sound Up and Running. Oxford: Focal Press.

Hume, Ken and Mujthaba Ahtamad (2013). "Physiological responses to and subjective estimates of soundscape elements." Applied Acoustics 74/2: 275-281.

Ignatius, Eve and Hikmet Senay (1999). "Data, vocabulary, marks, composition rules and visual perception rules."

Kaufman, James C., John Baer, Jason C. Cole and Janel D. Sexton (2008). "A comparison of expert and nonexpert raters using the consensual assessment technique." Creativity Research Journal 20/2: 171-178.

Kaye, Deena C. and James Lebrecht (2000). Sound and Music for the Theatre: The Art and Technique of Design. Oxford: Focal Press.

Kerins, Mark (2010). Beyond Dolby: Cinema in the Digital Sound Age. Bloomington, IN: Indiana University Press.

Kramer, Gregory (1994). "An Introduction to Auditory Display." In Gregory Kramer (ed.), Auditory Display: Sonification, Audification, and Auditory Interfaces (pp. 1-77). Reading, MA: Addison-Wesley.

Kramer, Gregory, Bruce Walker, Terri Bonebright, Perry Cook, John Flowers, Nadine Miner, John Neuhoff, Robin Bargar, Stephen Barrass, Jonathan Berger, Grigori Evreinov, W. Tecumseh Fitch, Matti Grohn, Steve Handel, Hans Kaper, Haim Levkowitz, Suresh Lodha, Barbara Shinn-Cunningham, Mary Simoni and Sever Tipei (1999). "Sonification Report: Status of the Field and Research Agenda: ICAD."

LaBelle, Brandon (2006). Background noise: perspectives on sound art. New York: Continuum.

Levin, Golan and Zachary Lieberman (2004). "In-situ speech visualization in real-time interactive installation and performance." Paper presented at the 3rd international symposium on non-photorealistic animation and rendering, Annecy.

LoBrutto, Vincent (1994). Sound-on-film: Interviews with Creators of Film Sound. Westport, CT: Praeger.

MacDonald, Doon and Tony Stockman (2013). Toward a method and toolkit for the design of auditory displays, based on soundtrack composition. Paper presented at the CHI '13 Human Factors in Computing Systems, Paris.

Madell, Jane R. and Carol Flexer (2008). Pediatric Audiology: Diagnosis, Technology and Management. New York: Thieme.

Marie, Celine, Teija Kujala and Mireille Besson (2012). "Musical and linguistic expertise influence pre-attentive and attentive processing of non-speech sounds." Cortex 48/4: 447-457.

Mason, Russell (2002). "Elicitation and measurement of auditory spatial attributes in reproduced sound" (Doctoral dissertation). Guildford: Surrey University.

Mathur, Pooja (2009). "Visualizing Remote Voice Conversations: Uses from Artifacts to Archival." (Master's thesis). Urbana, IL: University of Illinois.

Matthews, Tara, Janette Fong and Jennifer Mankoff (2005). "Visualizing Non-Speech Sounds for the Deaf." Proceedings of the 7th international ACM SIGACCESS conference on Computers and accessibility (pp. 52-59). Baltimore, MD: ACM.

McGookin, David K. and Stephen Anthony Brewster (2004). "Understanding concurrent earcons: applying auditory scene analysis principles to concurrent earcon recognition." ACM Transactions on Applied Perception 1/2: 130-155.

McGregor, Iain, Alison Crerar, David Benyon and Gregory Leplatre (2007). Establishing Key Dimensions for Reifying Soundfields and Soundscapes from Auditory Professionals. Paper presented at ICAD 2007: Immersed in Organized Sound, Montreal.

McGregor, Iain, Alison Crerar, David Benyon and Gregory Leplatre (2008). Visualizing the Soundfield and Soundscape: Extending Macaulay and Crerar's 1998 Method. Paper presented at ICAD 2008, Paris.

McGregor, Iain, Gregory Leplatre, Alison Crerar and David Benyon (2006). Sound and Soundscape Classification: Establishing Key Auditory Dimensions and their Relative Importance. Paper presented at ICAD 2006, London.

McGregor, Iain, Gregory Leplatre, Phil Turner and Tom Flint (2010). Soundscape Mapping: a tool for evaluating sounds and auditory environments. Paper presented at ICAD-2010: Sonic Discourse - Expression through Sound, Washington, D.C.

McGregor, Iain and Phil Turner (2012). Soundscapes and Repertory Grids: Comparing Listeners’ and a Designer’s Experiences. Paper presented at the European Conference on Cognitive Ergonomics (ECCE 2012), Edinburgh.

Monmonier, Mark (1993). Mapping it out: Expository Cartography for the Humanities and Social Sciences. Chicago: University of Chicago Press.

Murphy, David and Flaithri Neff (2011). "Spatial sound for computer games and virtual reality." In Mark Grimshaw (ed.), Game Sound Technology and Player Interaction: Concepts and Developments (pp. 287-312). New York: Information Science Reference.

Mynatt, Elizabeth D. (1994). Designing with auditory icons: how well do we identify auditory cues? Paper presented at the Conference Companion on Human Factors in Computing Systems, Boston.

Nacke, Lennart E., Mark N. Grimshaw and Craig A. Lindley (2010). "More than a feeling: Measurement of sonic user experience and psychophysiology in a first-person shooter game." Interacting with Computers 22/5: 336-343.

Newman, Fred (2004). MouthSounds: How to Whistle, Pop, Boing, and Honk for All Occasions … and Then Some. New York: Workman Publishing Company.

Newman, Rich (2009). Cinematic game secrets for creative directors and producers: inspired techniques from industry legends. Oxford: Focal Press.

Nordahl, Rolf (2010). "Evaluating environmental sounds from a presence perspective for virtual reality applications." EURASIP Journal on Audio, Speech, and Music Processing 2010.

Ohlson, Birger (1976). "Sound fields and sonic landscapes in rural environments." Fennia 148: 33-45.

Paul, Leonard J. (2010). Procedural sound design. Paper presented at GameSoundCon, San Francisco.

Paul, Phyllis M. (2009). "Aesthetic Experiences With Music: Musicians Versus Children." Update: Applications of Research in Music Education 27/2: 38-43.

Polanyi, Michael (1951). The Logic of Liberty. Chicago, IL: University of Chicago Press.

Porteous, J. Douglas and Jane F. Mastin (1985). "Soundscape." Journal of Architectural and Planning Research 2/3: 169-186.

Radojevic, Mirjana Devetakovic and Raewyn Turner (2002). Spatial Forms Generated by Music – The Case Study. Paper presented at the GA 2002 Generative Art and Design Conference, Milan.

Raman, T. V. (2012). Auditory User Interfaces: Toward the Speaking Computer. New York: Springer Publishing Company.

Rodaway, Paul (1994). Sensuous Geographies: Body, Sense and Place. London: Routledge.

Rumsey, Francis (1998). Subjective Assessment of the Spatial Attributes of Reproduced Sound. Paper presented at the AES 15th International Conference, Copenhagen.

Russolo, Luigi (1967). The Art of Noise: Futurist Manifesto (trans. Francesco Balilla Pratella). New York: Something Else Press.

Schafer, Raymond Murray (1977). The Tuning of the World. Toronto: McClelland and Stewart Limited.

Schafer, Raymond Murray (1993). Voices of Tyranny: Temples of Silence. Indian River: Arcana Editions.

Schaffert, Nina, Klaus Mattes and Alfred Effenberg (2010). "A Sound Design for Acoustic Feedback in Elite Sports." Auditory Display 5954: 143-165.

Schirosa, Mattia, Jordi Janer, Stefan Kersten and Gerard Roma (2010). A system for soundscape generation, composition and streaming. Paper presented at the XVII CIM - Colloquium of Musical Informatics, Turin.

Serafin, Stefania and Giovanni Serafin (2004). Sound design to enhance presence in photorealistic virtual reality. Paper presented at the ICAD 2004, Sydney.

Servigne, Sylvie, Myoung-Ah Kang and Robert Laurini (2000). GIS for Urban Soundscape: From Static Maps to Animated Cartography. Paper presented at the 2nd International Conference on Decision Making in Urban and Civil Engineering, Lyon.

Servigne, Sylvie, Robert Laurini, Myoung-Ah Kang and Ki-Joune Li (1999). First Specifications of an Information System for Urban Soundscape. Paper presented at IEEE International Conference on Multimedia Computing and Systems, Florence.

Soderholm, Monica (1998). "Listening Test as a Tool in Sound Quality Work: Applied to Vacuum Cleaners." The Marcus Wallenberg Laboratory for Sound and Vibration Research, Department of Vehicle Engineering. Stockholm: Royal Institute of Technology.

Sonnenschein, David (2001). Sound Design: The Expressive Power of Music, Voice and Sound Effects in Cinema. Studio City, CA: Michael Wise Productions.

Southworth, Michael (1969). "The Sonic Environment of Cities." Environment and Behavior 1/1: 49-70.

Szendy, Peter (2008). Listen: a History of our Ears (trans. Charlotte Mandell). New York: Fordham University Press.

Tahiroğlu, Koray and Teemu Ahmaniemi (2010a). The Effect of Haptic Feedback in Vocal Sketching Experiments with a Graspable Interface. Paper presented at the 5th International Haptic and Audio Interaction Design Workshop, Copenhagen.

Tahiroğlu, Koray and Teemu Ahmaniemi (2010b). Vocal sketching: a prototype tool for designing multimodal interaction. Paper presented at the International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction, Beijing.

Tardieu, Julien, Patrick Susini, Franck Poisson, Hiroshi Kawakami and Stephen McAdams (2009). "The design and evaluation of an auditory way-finding system in a train station." Applied Acoustics 70/9: 1183-1193.

Thalmann, Florian and Guerino Mazzola (2008). The BigBang Rubette: Gestural Music Composition with Rubato Composer. Paper presented at the ICMC 2008 International Computer Music Conference, Belfast.

Torehammar, C. and Björn Hellström (2012). Nine sound-art installations in public space. Paper presented at INTER-NOISE, New York.

Touzeau, Jeff (2008). Careers in Audio. Boston, MA: Course Technology.

Truax, Barry (2001). Acoustic Communication. Norwood: Ablex Publishing Corporation.

Valle, Andrea, Vincenzo Lombardo and Mattia Schirosa (2009). A Graph-based System for the Dynamic Generation of Soundscapes. Paper presented at the 15th International Conference on Auditory Display, Copenhagen.

Vickers, Paul (1999). "CAITLIN: Implementation of a Musical Program Auralisation System to Study the Effects on Debugging Tasks as Performed by Novice Pascal Programmers" (Doctoral dissertation). Leicestershire: Loughborough University.

Walker, Bruce N. and Michael A. Nees (2011). "Theory of Sonification." In Thomas Hermann, Andy Hunt and John G. Neuhoff (eds.), The Sonification Handbook (pp. 9-39). Berlin: Logos.

Whittington, William (2007). Sound Design and Science Fiction. Austin, TX: University of Texas Press.

Yang, Wei and Jian Kang (2005). "Acoustic comfort evaluation in urban open public spaces." Applied Acoustics 66/2: 211-229.