Informative Sound Assists Timing in a Simple Visual Decision-Making Task


Keith Nesbitt, Paul Williams, Patrick Ng, Karen Blackmore, Ami Eidels

1.   Introduction


The aim of this study is to understand the informative use of sound in a simple decision-making task. In general, we are motivated to better understand interaction in computer games, where a player’s fast decision-making is often critical to performance. The time taken to make even simple decisions can be affected by how much immediate information the interface provides the player as they progress through the game. Assuming that extra information allows players to perform better at such game tasks, this paper investigates the way sound can be used to provide multi-modal support, in terms of timing, for a simple visual decision-making task.


In general, the “informative” use of sound can provide additional feedback related to the players’ own actions as well as key events and states of a game world. That is, the player can gather information about the game environment by relying upon auditory as well as visual cues. This might be advantageous in situations where visual cues are unhelpful because the eyes are already engaged in processing other signals. Alternatively, auditory displays may provide a better-suited modality for information transfer when temporal cues are required.


In terms of computer games, the role of sound has evolved since the iconic laser effects and monotone background music used in classic games such as Space Invaders (Space Invaders 1978). Indeed, sound has become an integral part of the experience provided by modern computer games. Historically, sounds for computer games have been designed much like sound for motion pictures, as an adjunct to the visual experience. In films, music is used to establish the mood of a scene as well as to evoke tension and emotional responses from the audience (Ekman 2005). Sound effects in film tend to enhance the realism of the scene with the intention of creating greater levels of immersion for the viewer.


Rightly or wrongly, computer game designers have tended to focus on visual perceptual cues when designing the game levels (El-Nasr and Yan 2006). Therefore, as with sound in films, most of the time auditory effects were added to enhance the visual experience (Gärdenfors 2003). In accordance with this approach, much research on sound display in video games has focused on how sound enhances players’ experience and immersion (Gärdenfors 2003; Grimshaw, Lindley and Nacke 2008; Parker and Heerema 2008; Röber and Masuch 2005; Wolfson and Case 2000).


Fortunately, while sound is often designed as an adjunct to visual experiences in many computer games, a growing number of games also exploit sound in more informative ways. These include Papa Sangre (Something Else 2013), a horror-themed audio game for the iOS mobile platform that uses sound effects to guide the player through a dark environment. Likewise, the recent Thief series (Square Enix 2014) integrates sound into the gameplay and uses it as the primary feedback for navigation. Informative sound is also present in some online multiplayer games such as World of Warcraft (Blizzard Entertainment 2012), where the sounds provide a more general informative function that supports player orientation and the identification of key situations and states (Jørgensen 2006). Even in early games like Space Invaders (Space Invaders 1978), the tempo of the simple sounds would increase as the alien craft sped up and moved closer to the player’s ship. This served to heighten tension but also to provide feedback about the state of the gameplay.


One unfortunate consequence of using sound solely for visual enhancement is that the design of more informative sound can be overlooked. For some user groups, such as the visually impaired, this excludes them from being able to play the game (Valente and Souza 2008). A further consequence is that the full potential of using sound to convey useful messages is not always exploited in games. This situation persists despite many studies within the field of auditory display (Barrass 2003; Blattner, Sumikawa and Greenberg 1989; Brewster, Wright and Edwards 1993; Gärdenfors 2003; Gaver 1986; Gaver 1989; Nesbitt and Barrass 2004; Tan, Baxa and Spackman 2010; Walker and Kramer 2006) that provide evidence for the value of auditory feedback. It is clear that in many situations, well-designed sounds can provide important, additional feedback for computer users (Adcock and Barrass 2004; Brewster and Walker 2000; Jørgensen 2006; Kramer 1994; Ramos and Folmer 2011; Röber and Masuch 2005; Walker and Kramer 2006).


Many approaches exist that give insight into the design of sound displays. These include a case-based, metaphorical approach for aligning the informative function of sounds to listening encounters from the real world (Barrass 1996) as well as a structured multi-sensory taxonomy with guidelines that consider all modalities for the display (Nesbitt 2004). Figures 1-3 show an alternative framework using spatial, direct, and temporal qualities of perception rather than traditional sensory-specific properties as a basis for multimodal design. Another design approach relies on the use of Auditory Design Patterns (Alves and Roque 2010; Barrass 2003; Ng and Nesbitt 2013). This approach has been considered for both general auditory display (Adcock and Barrass 2003; Barrass 2003) and specifically for use in video games (Alves and Roque 2010; Ng and Nesbitt 2013).

Figure 1: Spatial Metaphors categorize the space-based properties to be considered for multimodal display. These are applicable for all sensory modalities.

Figure 2: Direct Metaphors categorize the sensory-specific properties to be considered for multimodal display.

Figure 3: Temporal Metaphors categorize the time-based properties to be considered for multimodal display. These are applicable for all sensory modalities.

The two best-known techniques for displaying information through sound are probably Auditory Icons (Gaver 1986) and Earcons (Blattner, Sumikawa and Greenberg 1989). The technique known as Auditory Icons was first investigated as a means of extending the use of visual interface icons to the auditory dimension (Gaver 1986). Based on “everyday listening” skills, this approach maps information to recognizable sounds from the real world. By using a recognizable sound, the user can intuitively understand the current action or event suggested by the sound. For example, hitting a tin with a stick is an event that generates a sound. The sound itself conveys information about the size of the tin and whether it is full or hollow, as well as the materials involved, the frequency of the strikes, and the force behind them. This is information we naturally learn to interpret from our everyday experiences.


There are many instances of natural-like sounds used to augment computer interfaces. For example, the SonicFinder integrated running and pouring sounds in the Macintosh interface to represent file manipulations on the desktop (Gaver 1989). The SharedARK application, a virtual physics laboratory for distance education, included sounds such as hums that are mapped to the state of a simulated system (Gaver, Smith and O’Shea 1991). The ARKola bottling system mapped sounds to equipment in a soft drink factory and introduced audio cues for monitoring the bottling process (Gaver, Smith and O’Shea 1991).

The second most common method of designing informational sound is through the use of Earcons (Blattner, Sumikawa and Greenberg 1989). Earcons are abstract, synthetic tones that are structured to create auditory messages. This approach relies on “musical listening” skills as it conveys information using the musical properties of sound, such as rhythm, timbre, and pitch. This can be contrasted with Auditory Icons that use everyday listening skills rather than acquired musical expertise. 


Studies on the effectiveness of Earcons for conveying information have been conducted since the 1990s. The effectiveness of Earcons was experimentally tested in the role of providing navigational cues within a structured menu hierarchy (Brewster, Wright and Edwards 1993). The study found that 81.5% of participants successfully identified their position in the hierarchy, indicating that Earcons can be a powerful tool for conveying structured information. The use of Earcons to map common operating-system functions in a graphical interface was also evaluated (Polotti and Lemaitre 2013). In this study it was found that subjects benefited from additional sound feedback when performing key tasks such as cutting and pasting (Polotti and Lemaitre 2013).


Compared with Auditory Icons, Earcons have the advantage of being able to convey complex information about events to the user without any natural associations with a sound source. On the downside, Earcons require prior understanding of the mapping between the sound and the event before the information can be recognized. By contrast, Auditory Icons are considered to be more intuitive, as they capitalize on the existing listening skills of users.


Currently, only a limited amount of work relating Auditory Icons and Earcons to computer games has been published. However, it has been noted that both these approaches can play a role in terms of enhancing control functions for the player by extending the player’s range of perception during the game (Jørgensen 2006). There are also some taxonomies of sound usage described in the context of games (Friberg 2004; Grimshaw and Schott 2008; Stockburger 2003). The use of Earcons and Auditory Icons and their relationship to player performance in Defense of the Ancients 2 (Valve Corporation 2013), a popular multiplayer online battle arena game, have also been analyzed (Ng, Nesbitt and Blackmore 2015). However, the informative use of sound has a much longer history of study in domains outside of games (Kramer 1994), with applications being reported in diverse domains, ranging from file management (Gaver 1989) to hospital operating rooms and vehicle safety systems (Graham 1999; Patterson 1982; Stanton and Edworthy 1999). Making optimal use of these informative approaches to using sound would potentially allow more critical information to be integrated into game interfaces. The intent would be to improve traditional usability criteria such as effectiveness, utility and efficiency (Gaver 1986; Gaver 1989; Nesbitt and Hoskens 2008; Ng and Nesbitt 2013; Smith and Walker 2005) without impacting on the immersive experience that games strive for. 


Interestingly, the effects of additional sound information, on top of existing visual information, are not always beneficial. The auditory Stroop effect (Morgan and Brandt 1989) demonstrates how performance can deteriorate when visual and auditory information are in conflict. Moreover, given the ultimately limited capacity of the brain to process information (Kahneman 1973; Townsend and Eidels 2011), additional sources of information, though relevant to the task at hand, may overload the system and impair performance. Thus, the potential benefit of adding auditory information to visual displays is not clear-cut and requires careful empirical scrutiny. This is precisely the aim of the current study: to evaluate user performance in a multimodal decision-making task.

1. Introduction
2. A Simple Decision-Making Task
3. Method
4. Results
5. Discussion
6. Conclusion
Biographies
References
