The Sonic Lifeworld:
A Phenomenological Exploration of the Imaginative Potential of Animation Sound
Animation is unique among audiovisual arts in that its stories are created entirely from the imagination. Other audiovisual forms such as film and television, by contrast, always have physical substances, specific worlds, and corporeality as their base components. No matter how much the ideas exist in the mind, the external world, and all its inhabitants, must eventually be shot on location or in a studio. Animation begins without physicality in its creative process.(1) Storytellers may reference reality, but aside from rotoscoping techniques, there are no actual settings, no sound stages or green-screen sets, and no actors. Its stories often take place in completely imagined worlds. The raw materials of a film or TV show recorded on production are captured within actual spaces; the raw materials of animation come from the mind. As such, animation is liberated from the physical and logistical restrictions of developing and reproducing physical objects and events in space. The potential is there to produce the physically impossible and the otherworldly. These imagined settings, characters and plots comprise the backdrop for new stories or re-imaginings of existing fables and socio-cultural themes.
When it comes to developing these ideas, great care is given to the “visualization” of these new worlds. When we think of “imagination,” we think of images, and about how we can create this new world in a visual manner. Tremendous energy and creativity are poured into storyboards and animatics, backgrounds and character design. But there is a critical missing element in this process. Worlds, whether real or imagined, are not only of sight, but also of sound. Animation practice too often regards sound as an outcome of what is visibly evident, rather than embracing the unique modes of presentation that the audible produces from these newly invented environments, contexts, cultures and psychological states of being. In this essay I hope to present a means of injecting the sonic into the creative development of animated worlds and the stories that inhabit them. What I propose is a way of conceiving sound that shifts away from the consideration of image reproductions (inherent in motion photography) and toward phenomenological disclosures. This latter approach, I will argue, is more akin to the world-building that is the strength of the animated form. In the process, I will propose the idea that sound is less about adherence to existing objects, and is better conceived ontologically, as a process of producing and reflecting a sense of being through characters interacting within and with a lifeworld.
It is important to begin with the state of how sonic worlds are formed across all audiovisual content. There is a Platonic notion in storytelling that is often cited in examinations of audiovisual sound theory—the diegesis, or the diegetic world. As Mary Ann Doane has noted, this is the space of the film world; in terms of sound it is any sound that emanates from the story space in which events occur (Doane 1985). Claudia Gorbman reaches back to 1950s French scholars Gerard Genette and Etienne Souriau and comes to define diegesis as “the narratively implied spatiotemporal world of the actions and characters” (Gorbman 1987). All four of these writers use the term “space” (Doane) or “world” (the latter three) to define the diegesis. This marks a modification in the term over time from its Greek etymology.(2) Nevertheless, when one uses the term “diegetic sound” today, the intention is to help us consider sound that inhabits the created world as physically present within the space of events. Anything outside of the interior space of the film world is called nondiegetic. The most common example of nondiegetic sound is a musical score, which the characters do not hear and is not physically present in the space of events. It is therefore not grounded in filmic reality.
The foundation of recorded audiovisual media (film, video) is photographic realism, which establishes the audience’s view of this diegetic framework. Sound is generally regarded as a means of reinforcing this photographic realism. “[S]ound is used to make the image ‘credible’ within a very narrow definition of ‘realism’” (Wayne 1997: 176). Animated imagery, which is not recorded photographically, is different in that it has no connection to any pre-existing, “real” location. However, the audio methodology has transferred from film to animation using the exact same codes of recorded photography. Sounds for animation tend to come from recordings of real-world objects. This applies well to film where there is a match between what is photographically seen and heard. But in animation, applying such anachronistic sounds to imaginary visuals produces what might be called nondiegetic realism, a seeming contradiction in language that reflects the contradiction of applying sounds of modernity and nature to narratives of fantasy. Because animation has no filmed reality to reinforce, this approach takes us out of the animated world by its adherence to a realism which arises from a place outside the animated world, rather than inside it.
There are several reasons for this tendency, going back to the histories of film and of animation:
1. Historical: Film sound from its beginning played two roles: a) to attract attention to the cinematic apparatus and b) to legitimize the image (Lastra 2000). Sound as a craft was also regarded as technical rather than artistic. Animation sound relied heavily upon music for both musical and sound effects moments (Curtis 1992).
2. Technological: Location sound (or “production sound”) has always been a process of isolation: removing the location’s noise and ambience to preserve the primary signal, namely the voice.
3. Methodological: While production sound is restrictive (see No. 2), post-production sound is primarily additive, in the sense that sonic elements are re-built in the studio. Foley (character movement sounds), ADR (dialogue replacement), and sound effects recording and design are created in layers to produce the expectation of audiovisual verisimilitude and rational continuity.
Let us consider the importance of all three sound tendencies by using a simple example: footsteps. If we see someone walking in a (non-animated) film, we expect to hear that sound. Hearing these footsteps satisfies tendency No. 1: to legitimize the image. However, during film production, microphones are pointed at an actor’s voice if he is speaking dialogue in order to capture the clean sound of the voice. This is a reflection of tendency No. 2. As a result of this restriction, Foley performers inside a post-production recording studio will walk in sync to the visual image to produce the desired sound, thereby producing the expectations of tendency No. 3.
This film approach is mirrored in the animated form, except for the fact that animation has no “location sound.” Because there are no actors in existing settings, animation sound is produced from the imagination and applied to invented visuals. However, despite being an entirely different mode of production, animation takes its cues from film in its construction of filmic expectation. Imagine for example a scene similar to the above in an animated film in which a character is walking. Animation students and young animators tend to focus on this element and want to immediately produce sound for this action. This is because it is a synchronous moment that is clearly, visibly evident. There is a strong need to resolve this objective sense of rational closure. It does not matter so much what the sound is—for example, any kind of non-varying “tick tick tick” sound will suffice. The drive to fill the silence with something, anything, is strong. It is a matter of fulfilling the perceptual expectation. Professional-level animation projects will resolve the expectation through the film-based post-production methodology described in No. 3 above.
We’ve seen the historical, technological and methodological reasons for certain tendencies in animation sound. Practice follows a fixed model for fulfilling expectation, one based on and applied from photographic forms. What I wish to propose is that animation practice integrate a more creative model based on ontological concerns rather than physical ones. This incorporates aspects of consciousness, intention, and disclosure, ideas arising from the philosophical tradition of phenomenology. Rather than using sound to connect the audience to objective and visibly present imagery, the aim here is to use sound to connect setting to character, and thereby connect character to the audience. There are two methodological notes to mention before continuing:
- Sound for audiovisual forms is comprised of three elements: dialogue, music and sound effects. This investigation concentrates on sound effects—the nonmusical, non-spoken sounds of the world in which a character exists.
- This essay considers sound effects as connected less to visual objects, and more to elements of narrative structure—character, setting and plot, particularly the first two and the relationship between them.
It is impossible to encompass the philosophical tradition of phenomenology that I wish to consider here in a comprehensive fashion. But I will offer a very brief trajectory of some of its tenets which are relevant to this exploration. The Cartesian epistemological investigation into the subjectivity/objectivity or mind/body problem was to regard matter as the world of extension, plus a mind that thinks it. The world is a world of things out there which I perceive. In order for there to be a world that makes sense to us, it must maintain its consistency in terms of the contents that reside within the extended world. The point, for the purposes of this essay, is that Cartesian thinking implies that there is a rational mind and outside it is a world of external objects. The phenomenological approach, on the other hand, is more inclined to account for a subject who inhabits a world, and the manner in which perception of the outside comes to consciousness. There is, in any object or event, a particular manifestation of disclosure; it is not the entirety of the thing that we come to perceive, nor even its qualities, but rather those aspects of the thing intended by the subject that give him or her a sense of the thing in that moment of experience. In this way, we the audience can perceive sound not only through its objective nature, but in how it is disclosed to a character, through his or her particular mode of being in that moment of disclosure.
A Cartesian view of sound design is closely tied to how people typically practice the art: The world is a world of things; therefore one must record, layer and design sound toward the completion of the objective convention of thingness. It presents sound as purely physical in its knowability. As a related component of Cartesian thinking, the sonic world must establish a permanence of consistency and expectation. If someone walks down the street, one must hear footsteps that provide a sense of psychological closure on that image in a way that satisfies the physical laws of the diegetic world. This is the verisimilitude of the film world, which arises “not from truth but from convention” (Chion 1994). Here we find a limitation in the adherence to the diegetic/nondiegetic distinction. These terms tend to relate to an object’s or event’s physical presence or lack thereof. While the distinction accounts for a hearing character (his or her subjectivity), it does not consider the existential manner in which a sonic event is heard when heard subjectively.
It is curious that this adherence to a physical “real world” is as strong if not stronger in animation compared with film. The typical filmed setting has realistic sounds—realistic by virtue of the photographic element that reproduces it—that are captured by microphones. So it becomes a process of reinforcing this reality through the sound design. In animation, there is no profilmic world to reify. There are physical laws of the particular animated story world, but this world is invented. From this, we can push the idea of “reality” aside somewhat and instead consider the idea of being using a phenomenological approach. Descartes’ other influential notion—his dualism being mentioned previously—is his famous proclamation of philosophical rationality: cogito ergo sum. This paved the way for philosophical thinking over the next 300 years. Martin Heidegger’s response to Descartes in the early 20th Century was that this resulting direction in philosophy mistakenly embraced one aspect of his famous cogito (I think), while ignoring what Heidegger regarded as the more important aspect (I am) (2000). This second half — “I am” equates to “being” — considers the particular self as it is in the world, what Heidegger called one’s Dasein (2010). The world “is essentially disclosed with the being of Dasein” (2010: 203). The two comprise a relationship, including a relationship with other beings, their Daseins, and their relationships with a self and a shared world. This is connected to Heidegger’s notion of “thrownness” into a world that is not of our choosing, but one in which we experience our particular being. This constitutes the being-in-the-world of every Dasein; but in particular it relates to my awareness of my Dasein in relation to others and the surroundings in which one is present.
The phenomenological approach moves us further away from the demands of rational, universal, external things and closer to the particular manner in which sonic events disclose themselves to a particular consciousness. Such a relationship with the world occurs within an inexact and continually fluid dynamic of concealment and unconcealment (Heidegger 2010). For Heidegger, and against the rational/empirical tradition from Descartes up until Kant, phenomena are not given to consciousness. The world as disclosed to a particular “being” of Dasein (in its distinctiveness) is a relationship; in this relationship is an ongoing process of hiding and revealing. And even as revealed, the phenomenon maintains an aspect of concealment which clouds understanding. Dasein is always in a hermeneutical process — or a particular mode of interpretation — in any given moment. As applied to practice, this does several important things. First, it produces a more active role for the subject in the context of his or her surroundings. It also introduces the element of intersubjectivity (our relations with other beings) and multiple perspectives in a way that Descartes and his followers never did. Lastly, the phenomenological mode obviates the inclination to sonify visual objects to rationalize the image. Footsteps, sounds of the synchronous and rational condition, lose their relevance. Instead, we concentrate on the subject in a relationship — for our purposes, a particular character and her mode of being-in-the-world as a hearing individual. We as an audience thereby come to empathize with that character in her world as she hears it. In Descartes we have the world of things extended in space; in Heidegger we have a particular Dasein possessing an (often narratively flawed) awareness of his condition and position in the world, his sense of his past and anticipation of his future, and his particular feeling or “mood” in the present moment. 
In narrative terms, a phenomenological approach to sound involves a relationship between a character and the setting with which she interacts, plus the various plot elements that come to be heard in the way they are heard. In this manner, we as an audience do not hear from some objective distance. We are not an omniscient god who hears from everywhere all at once. Rather we hear for and even as the character with whom we should connect. In the phenomenological approach, we as audience experience the world and other beings as the character does. Sounds, concealing and unconcealing along the horizons of being, do so for the character, as he or she comes to know them.
Let us now put this into the context of a specific audiovisual narrative. Robynn Stilwell (2007) uses a good example of a hearing subject in her analysis of a scene from the Michael Mann film The Insider (1999). Main character Jeffrey Wigand is driving to his court date and, as Stilwell notes, his state of mind is in turmoil. The music does an effective job of presenting his emotional being in that moment. “The character becomes the bridging mechanism between the audience and the diegesis as we enter into his or her subjectivity” (Stilwell 2007: 196). This idea of a character “bridging” story to audience through music hints toward what is possible with a phenomenological approach to sound. However, rather than thinking of bridging audience and diegesis, we should think of it as connecting his evolving modes of being, rather than his mental or physical presence, to the audience. The music we hear is him, not only his subjectivity but his being in this moment, what Heidegger would call his “care” in relation to the world (2010). It is also worth noting that Stilwell, in analyzing the diegetic/nondiegetic gap in regard to music only, stops her analysis when the music fades and the sounds of the objective world emerge. But this shift itself is what is interesting in the scene because it is still Wigand’s consciousness. The music fading isn’t the end of the subjective moment, because we are still hearing as and for him. What has changed is not subjectivity to objectivity, but the subject’s relationship to his world. His moment of hearing the sound effects emerge is his shift in attention from his inward concern out toward the concerns of the world. He comes out of his contemplative state and “refocuses” his mind toward the task at hand, the court appearance that now stands before him in sharp reality.
Here, through sound, the filmmakers have enriched the condition of the main character’s journey and the dual struggle that faces him — his love of his family (his reflectiveness through music) and his duty to his principles (the hard sounds of the world he faces).
To better understand the subject of audiovisual storytelling, let us begin by examining the dialectic at work. If we are to accept the idea of the subjective, we must also consider the objective, which exists outside regardless of consciousness. The term “objective” in filmic terms refers to a perceptual agreement, a sense that what we hear is heard in a way that is available to all characters within the diegesis. This of course has value in the world-building dynamic. But as used exclusively in the process of designing, it places sound into an artistic restraint, distancing the sounds we hear from the characters who interact with them. The distinction between the objective and the subjective is clearer with images than with sound. The camera presents a position and we can usually get a sense of whether we are seeing from a realm of non-corporeal observation or one of embodied character through point-of-view (POV). Sound is not only represented in a different manner, but it is more difficult to structure. Consider the following example: Imagine there is a long shot of a road, shot from a 45-degree angle, from a camera placed high on a crane. In this shot, a car arrives from a distance, comes to its closest point in relation to the camera, then zooms out of frame. There is no assumption of visual subjectivity here because we don’t imagine someone floating high above the ground. The sound, however, assumes an auditory position, or point-of-audition (POA).(3) The car, as it approaches, begins quiet and reaches its loudest point when it is closest to the camera position. While there is no particular subject established here visually, there is one presented (but never established) aurally. This illustrates that the audience is always hearing from somewhere. We the audience become something like a presumptive nondiegetic subject hearing diegetically, which exposes a problem in conceptualizing things from this strictly physical basis. But in fact, we are more in line with a nonexistent (or rather, universal) entity hearing objectively, and we therefore regard such sound as objective. A true subject, on the other hand, has particularity as an individual character hearing her world.
Kristin Thompson and David Bordwell (2008) propose two possible modes of film subjectivity: perceptual and mental. The former works in a visual sense (POV) because the camera position is fixed, and that shot is the only way that perspective can be considered. But their use of perceptual subjectivity with sound (“soft noises suggesting that the source is distant from the character’s ear”) doesn’t work in all cases for reasons mentioned previously. Also, because of the diffuse nature of sound, one or more characters standing anywhere near another character would hear the sound in the same manner, as is common in film, thereby dissolving anything particular (or narratively exceptional) in the notion of subjectivity. There remains a possibility of perceptual subjectivity if it is exclusive to that character, but their notion of perceptual subjectivity is more in line with objective sound (a rational agreement). Mental subjectivity for Thompson and Bordwell (2008) consists of images or sounds entirely inside the mind and not presently outside it: for example, inner thoughts as dialogue, mental images, visual or auditory hallucinations, or flashbacks. In regard to sound, however, there are difficulties here as well. By making a hard distinction between what is exclusively inside or outside the mind, we are back to our Cartesian problem of dualism. A phenomenological approach to sonic subjectivity would be to recognize both the objective and what they term the mental subjective as an ongoing relationship. Rather than an either/or approach, we instead imagine sound in the particular mode that the character hears it: a sonic event’s disclosure to individual being. Heidegger would term this “attunement,” or the particular mood of Dasein in relation to thrownness in a particular moment. “In attunement lies existentially a disclosive submission to world out of which things that matter can be encountered” (2010: 129-130).
This is often done cinematically in highly emotional moments. An extreme example of this is the mode of hearing after a bomb explosion. Sounds become muffled and perhaps there is ringing in the ears, all in an effort to connect story events more strongly to an individual character. But it can be used in any number of ways, not quite so dramatic, when hearing as the character is important, such as moments of attention or realization. What we are hearing is not purely mental nor purely objective; it is the objective world coming to the consciousness of the subject in a particular manner. Therefore, what is heard phenomenologically has the effect of connecting audience with character more strongly than visuals can. It is in this identification with character that we come to empathize with the narrative events for someone.
We can expand on this idea of a “someone” by turning to another well-known phenomenologist, Maurice Merleau-Ponty. Influenced by Edmund Husserl and Heidegger, Merleau-Ponty suggested that perception arises as a relation of our body within a world. But he is not producing a dualistic structure with this idea and is in fact openly critical of Cartesian rationalism and later British empiricists who tended to think in such ways. For him body and consciousness comprise the same phenomenological experience. A perception of the world is not of objects in isolation but always through the distinctiveness of one’s own presence within that world in that moment. “A thing is, therefore, not actually given in perception, it is internally taken up by us, reconstituted and experienced by us in so far as it is bound up with a world, the basic structures of which we carry with us, and of which it is merely one of many possible concrete forms” (Merleau-Ponty 1945: 381). The last part of this statement is important in regard to sound as design in the phenomenological approach. The world in our relation to it is not fixed; rather there are infinite possibilities within the disclosure/nondisclosure dynamic. How one hears or listens to a phenomenon depends in part on how one is negotiating his sense of being within that world. Considered phenomenologically, the primary effort in all sound design should be to develop this shifting relationship between self and world, character and setting, as channeled to the audience. This does not mean that all sound should be presented in this way. We still need sound as our means of reason and common sense. But in terms of design, it is more creative to consider the many ways in which we can hear for someone as a starting point in the design process.
Sonic events come to consciousness under various modes of presentation. As subjective processes, they are related to individual moods: attention, distraction, passivity, anxiety, exhilaration, etc. These can be integrated into the sonic approach by establishing subtle variations in how we hear what a character hears. How is this particular sound coming to have a relationship with the being of the character in this moment? In considering the how of sound, we must think about the distinction between hearing and listening. Roland Barthes, in the first sentence of his essay “Listening,” makes the claim that “Hearing is a physiological phenomenon; listening is a psychological act” (1985). Hearing is passive; it is an unconscious openness to the sonorous surroundings in which one finds oneself. In this mode, the soundscape of the world is uneventful and unworthy of noticing. Listening is what happens when one actively penetrates the membrane of the uneventful. In listening, I take notice toward something whose identity I may or may not know. And I notice it based on a particular state of mind in the world of experience. Regardless of whether the physical source of the sound is known or unknown, my consciousness is intended toward (or directed toward) some particular sonic event in an effort meant to reveal its meaning. As Jean-Luc Nancy says: “[T]o listen is to be straining toward a possible meaning…” (2007: 6). Nancy is pointing to the phenomenological here. Listening is based not on reason (necessarily) nor simply on sensation (empiricism); it is an activity, a directing toward what is revealed in an effort to uncover its (often multifarious) meaning.
The taxonomy of audible perception tends to stop there, in the distinction between hearing and listening. But there is still another mode of audible perception to consider which is also called “hearing.” We can think of this as a second-level hearing. This can be illustrated with the following statement: “Yes, you are listening to me, but are you really hearing what I am saying?” This level of hearing goes beyond listening toward understanding. It has some relation to what phenomenologists call the eidetic intention, or a mode of perception that goes beyond the empirical and into the imaginative (Sokolowski 1999). In this upper echelon of hearing we become creative subjects; we produce meaning and can abstract certain ideas from what we choose to listen to.
What happens with the treatment of sound in this manner, in a design sense, is that we present an auditory event as something either reflecting the objective world of rational continuity, or more toward a subjective manner of unfolding, one which calls for more active listening and perhaps even second-level hearing of sound as a phenomenological event. These are not hard distinctions, but indeed can be far more interesting when presented as a relationship that changes over time. Films that have done this well in a plot-specific sense are The Conversation (the changed meaning of a tape recording) and The Orphanage (two different disclosures of a particular sonic plot event).(4) Animation, however, has not taken advantage of this dynamic even though its form of pure invention makes it the ideal medium for such shifting states of subjective awareness and empathy.
(3) Point-of-audition can be attributed, in its different uses of the term, to Rick Altman (1992) and Michel Chion (1994). More recent writings that illustrate the ongoing degree of difficulty in considering POA include Anahid Kassabian (2008) and Svein Høier (2012), among others. The former illustrates the level of ambiguity that arises, while the latter attempts to encompass and categorize multifarious aspects of the term.
(4) The Conversation (Coppola 1974) features a taped recording which is fixed as an object, but whose significance changes through the interpretation of the lead character. The Orphanage, or El Orfanato (Bayona 2007), includes a sonic event whose meaning changes when the scene is revisited from a different state of awareness. Because these sonic events are plot-driven moments that manifest in different ways over the course of the film, it is best to watch both films in their entirety to understand how sonic disclosures can function in film.
In order to produce phenomenological disclosure with sound, one must present phenomena to be disclosed to consciousness. To accomplish this, a lifeworld of sound must first be considered and developed conceptually. “Lifeworld” is a phenomenological term that takes note of the world of experience as revealed to consciousness (Moran 2000). It is experience prior to reason, in which we are, according to Husserl, always already experiencing and interacting (Moran 2000). Sonically, this lifeworld can be revealed to the audience 1) objectively, 2) through the awareness of the characters within their world, or 3) through the blending over time or intentional ambiguity of the two. I will address two aspects of sound that have the potential to produce a richer lifeworld for character interaction in animated content: ambient sound and offscreen space.
All living things—whether of this world or any other—inhabit some kind of interior or exterior setting. Characters interact not only with each other through dialogue, but also with the space in which they presently exist. It is not simply an issue of space, however. We must also account for the unfolding of time within such settings. Time, connected with our position in space, allows sound to come and go for the hearing character. This time element comprises the audible horizons of perception for the hearer (Ihde 2007). Such time-spaces are rarely if ever silent. There is always a sonic structure pervading any setting. The most interesting environments are those that are alive with culture, atmosphere, and activity. So the first method of creating a sonic lifeworld is to consider this evolving time-space in which events occur. This kind of sound goes by many names in audiovisual production: natural sound, backgrounds, environmental sound, even “noise.” But a better descriptor of this kind of presence is “ambient sound” because it goes beyond the idea of location and into subjective awareness and personal psychology. It may seem simple to have machine hum in a factory or birds chirp in a forest. But ambient sound can do much more. It can reflect character emotion in a given situation, offer commentary, present certain conditions to the characters, and conjure a deep sense of culture.
It is also the means of establishing the subject in relation to his environment. When space and time combine effectively, the combination does not only produce the objective, but establishes “the timespace of the phenomenological subject who performs a reduced listening which does not hear a place but produces its own” (Voegelin 2011: 163). There is enormous creative potential here. If a story takes place in a futuristic city, for example, the designer has total freedom to create that sound culture as it relates to the individual being of a character. Somewhat akin to audible production design, ambient sound moves beyond such static image-based world-building and instead moves dynamically in time in relation to that character. The sonic lifeworld can breathe and change not only in its physical representation, but also as varying states of subjective co-presence and absence. It also presents the possibility of absolute subjectivity. Imagine a moment when a character either notices a particular sound or must listen carefully to one. Another approach is to develop a compelling ambient signature and then gradually remove it to silence. Through this, we can isolate some particular sound for the character as a means of isolating some key narrative moment. “The suppression of ambient sounds can create the sense that we are entering into the mind of a character absorbed by his or her personal story” (Chion 1994).
Unfortunately, it is difficult to find animated films that use ambient sound well. The reason for this goes back to the idea of “location sound” in film. Because film records in a location, there is ambience already produced. It is then a matter of “sweetening” it toward various purposes: to establish sonic continuity or to produce something more creative. Because there is no location sound in animation, there is nothing to manipulate in post-production. This is why there is oftentimes simply nothing: no sound of place whatsoever, merely silence.
There is a tendency in animation to concentrate too much on what we see directly in front of us. However, our eyes have a limited field of vision. We cannot see everywhere. Our ears, on the other hand, are omnidirectional — we hear sounds all around us, beyond our field of vision. That which is not visibly evident on the screen is “offscreen space.” Too often these sounds beyond the frame are called “offscreen sound.” But as Christian Metz (1985) has correctly noted — and Doane (1985) would likely agree — there is no such thing as offscreen sound because sound is never dealt with in regard to the enclosure of the frame. The sound of offscreen space is therefore better considered phenomenologically as “acousmatic sound.” This is a term Pierre Schaeffer lifted from Pythagoras, which was subsequently applied to film studies by Michel Chion (1994). It refers to sound which has not yet revealed its visual identity. Acousmatic sound as a concept exhibits the degree to which the visual is connected with Cartesian rationality in objects. If a listener is denied the visual source of a sound, the mind struggles to come to terms with its specific nature or cause. Denying the visual source thereby has the power of compelling the perceiving subject to imagine such an identity. “A sound or voice that remains acousmatic creates a mystery of the nature of its source…” (Chion 1994). A gradual emergence of some sound into the field of vision can therefore be used as a means of audible-to-visible disclosure. Due to the level of ambiguity inherent in audible perception, the mind must work to try and imagine what the visual embodiment of that sound is. This is something deep in our psychology of survival: We have an instinctual desire to know visually whatever we cannot picture in our minds. It sounds big. Is it big? What is it? What’s coming? Is it threatening? Later, when you finally reveal the visual source of the sound, everything comes together rationally. 
But until then, you can offer a profound sense of the unknown.
Offscreen space is not simply defined as being physically outside the rectangle of the screen. It can also be utilized as a presentation of phenomenological nondisclosure to consciousness. Chion (1994) identifies the phone conversation as a presentation of purposeful absence. If the dialogue of the speaker on the other end is not disclosed, we are denied access to content in a way that makes us wonder what is being revealed to the character. The disclosure/nondisclosure dynamic can be used throughout the storytelling process. For example, maybe we hear the content of that conversation later, in an offscreen presentation in which the character remembers what was said. This extends to any manner of plot-based sound that is either concealed or presented in a process of unconcealment to the listener. It helps to continue the mystery and keep us in step with the evolving existential condition of a character.
Let us explore some examples of popular Asian animated films and identify some of the problems associated with their approach to sound design in the context of the issues mentioned previously. Mamoru Oshii’s Ghost in the Shell (Kôkaku Kidôtai) (1995) presents a nonspecific Asian city of the future. As a fictional location in both time and place, Oshii and his sound designers had an opportunity to create a sonic culture, a lifeworld of sound in which characters interact. But while visually inventive, the film is for the most part culturally mute, with subjectivity only established through inner dialogue. There is a missed opportunity in design: What does this imagined world sound like? Does it hum? Does it beep and murmur? Is it industrial and mechanized, or organic and interactive? Do fruit sellers roll down the street in trucks shouting prices through bullhorns? Apparently no one exists in this world; these are characters abstracted from their setting, chasing each other around a mute and lifeless city. The example scene below reveals a Cartesian approach to sound. In the first third of the clip, there is no sound of the city because there is not much visibly apparent. We only get the sound of objects occurring within the screen. In the second third, when we have a visually active setting, we get accompanying sounds. Then again in the final third, with nothing visual, we hear nothing of the world.
The story is actually quite inventive and philosophical in its ideas of subjective being. We find this presented in what Thompson and Bordwell (2008) would call “mental subjectivity” in regard to inner dialogue. However, we do not hear a relationship between the characters and their world. A key plot element in the movie is that the characters are not human but are conditioned through programming. How might such a character hear “her” (its?) world? We are meant to identify with the main character Kusanagi, but her subjective moments do not reveal a dynamic between the outside world and her being in that world. More often, identification with her condition is done by masking the world of sound through the use of nondiegetic music. While this creates an emotional focusing upon her condition, as music does—think again of The Insider—it also removes the possibility for us to connect with her as she experiences her particular environment. A connection between her being and her being within a lifeworld is never created aurally.
Akira (1988), by director Katsuhiro Otomo, is similar in sonic design. As with the city of Ghost in the Shell, “Neo-Tokyo” stands prominent in the storytelling. But we don’t know what this new city sounds like. As is particularly evident in the early scenes that establish the setting, there is no sound. It is sonically empty. One might suggest that this emptiness reflects the inward paranoia of its citizens in a dystopian condition. But we see throughout the film that the characters are very active in this world: they drive through the city, engage in political protests, and go about their lives. More than that, it is a city on the brink of anarchy. But this again is embodied visually; other than crowds of voices, music and certain onscreen synchronous elements, we don’t hear the world, and certainly not through anyone who inhabits it.
It is difficult to discuss animated works that embrace a phenomenological approach to sound design because not many exist, even though animation is the perfect storytelling apparatus to do so. One example that comes close is Hayao Miyazaki’s Spirited Away. Early in the film, the family enters a tunnel, which becomes a long hall, then emerges into an open field. The sound designers did an interesting thing with the soundtrack: it is through sound, the faint reverberation of an arriving train, that we come to recognize the hall as an abandoned train station. This accomplishes several narrative aims. First, it injects ambiguity into the soundtrack and the story. Is it “real” or is it mystical? Is it a ghost of the past or something else? The mother asks: “Do you hear that?” even though no train actually exists. We might ascertain that it is her subjective moment. But Chihiro, her daughter, responds to it: “It sounds like a train.” (The father makes no reference to it.) We might consider this mother-daughter acknowledgement a maternal act, an alerting toward something that Chihiro must recognize. It is a last gesture of protectiveness before the mother disappears and Chihiro sets out on her own journey. We can therefore consider the sound, and the station itself, as a transition point not merely of mind and body, but of being. It metaphorically establishes Chihiro’s existential “transportation” from the real world into fantasy, and a transition from innocence to adulthood. It also foreshadows her later heroic journey by train to make amends for Haku, in itself a transition into adulthood. This sound is not tied to the physical world of objects, nor to notions of subjectivity; rather it is a phenomenological disclosure which provides narrative meaning beyond the rational world of physical extension.
Applying the tradition of phenomenology toward animation sound is an effort to stretch the narrative possibilities in the form by re-conceptualizing how we hear, listen to and experience invented worlds. Remembering that animation is an art that builds such worlds, let us consider for a moment its predecessor, painting. Merleau-Ponty in The World of Perception illustrated how a painting is meant not to represent the world of rational objects, but to stand as a particular presence of a world of lived experience or one fully created anew. “Suffice it to say that even when painters are working with real objects, their aim is never to evoke the object itself, but to create on the canvas a spectacle which is sufficient unto itself” (2004: 96). We can adopt the same spirit in animation as a whole, but particularly as concerns the sound that the form can produce. The following statement, again given in the context of painting, could just as easily relate to phenomenological approaches to sound design:
[A]s in the perception of things themselves, it is a matter of contemplating, of perceiving the painting by way of the silent signals which come at me from its every part, which emanate from the traces of paint set down on the canvas, until such time as all, in the absence of reason and discourse, come to form a tightly structured arrangement in which one has the distinct feeling that nothing is arbitrary, even if one is unable to give a rational explanation of this. (2004: 97)
In a painting, the artist is the creator of the world. But it also achieves significance in the viewer who absorbs it. To paraphrase Theodor Adorno (1995), what is alive in a painting is not what has been painted, but what is painted in the moment of experience. In the viewing, one gives internal time to the image. We can think of this time as giving our own sound to the painting, not so much as a conjuring of particular sonic environments per se, but rather in the way the mind thinks. In time, thinking fills the mind with language, emotion and imagination, all of which are qualities of the sounding of the world. We are always in sound, regardless of whether or not we hear anything in the world.
In animation, by contrast, time is provided. The sound that develops personally in the viewer of the painting is here offered as experience by a designer. One can choose to either design a world of objects or instead design experience itself, through hearing characters who inhabit it. If we only consider what is diegetic, nondiegetic, or some blending and crossover between the two, we are deciding that the film world is composed of what is physically relevant. If we only consider what is subjective or objective, we are only thinking in terms of the perception of a single character. In this split, between what is (and is not) “the world” and what is (and is not) “the mind,” something is lost, something integral to the aim of storytelling itself. What we identify with in any good story is the way that we come to absorb the world through our experience of it. Being itself is not just self, nor is it just the world; it is an individual relationship of self within a lifeworld that comprises existence and experience. It is in this way of thinking that sound can be free to play with ambiguities of perception, various states of attunement, shifts in modes of listening and hearing, the language that lives within us at nearly every instant, and the music that comes to signify our feelings in a condition of heightened awareness or deep turmoil. In this way, we are not simply designing a world, we are designing ourselves.
Adorno, Theodor W. (1995). “On Some Relationships Between Music and Painting.” The Musical Quarterly 79/1: 66-79.
Altman, Rick (1992). “Sound Space.” In Rick Altman (ed.), Sound Theory Sound Practice (pp. 46-64). New York: Routledge.
Altman, Rick (1994). “Deep-focus Sound: Citizen Kane and the Radio Aesthetic.” Quarterly Review of Film & Video 15/3: 1-33.
Aristotle (1962). Poetics (trans. I. Bywater). Epub version. Oxford: Clarendon Press.
Barthes, Roland (1985). “Listening.” In The Responsibility of Forms (pp. 245-260) (trans. Richard Howard). New York: Hill and Wang.
Bayona, Juan A. (Director). (2007). The Orphanage (El Orfanato) [Motion picture]. Spain: Telecinco Cinema, et al.
Chion, Michel (1994). Audio-Vision: Sound on Screen. New York: Columbia University.
Coppola, Francis F. (Director). (1974). The Conversation [Motion picture]. USA: Paramount Pictures, et al.
Curtis, Scott (1992). “The Sound of Early Warner Bros. Cartoons.” In Rick Altman (ed.), Sound Theory Sound Practice (pp. 191-203). New York: Routledge.
Descartes, René (2010). Meditations on First Philosophy (trans. Jonathan F. Bennett).
Doane, Mary Ann (1985). “The Voice in the Cinema: The Articulation of Body and Space.” In Elizabeth Weis and John Belton (eds.), Film Sound: Theory and Practice (pp. 162-176). New York: Columbia University.
Gorbman, Claudia (1987). “Narratological Perspectives on Film Music.” In Unheard Melodies: Narrative Film Music (pp. 11-30). Indianapolis: Indiana University Press.
Heidegger, Martin (2010). Being and Time (trans. Joan Stambaugh). Albany, NY: State University of New York.
Høier, Svein (2012). “The Relevance of Point of Audition in Television Sound: Rethinking a Problematic Term.” Journal of Sonic Studies 3:1.
Ihde, Don (2007). Listening and Voice: Phenomenologies of Sound. Second edition. Albany, NY: State University of New York.
Kassabian, Anahid (2008). “Rethinking Point of Audition in The Cell.” In Jay Beck and Tony Grajeda (eds.), Lowering the Boom: Critical Studies in Film Sound (pp. 299-305). Indianapolis: University of Illinois.
Lastra, James (2000). Sound Technology and the American Cinema: Perception, Representation, Modernity. New York: Columbia University.
Mann, Michael (Director). (1999). The Insider [Motion picture]. USA: Touchstone Pictures, et al.
Merleau-Ponty, Maurice (1945). Phenomenology of Perception (trans. Colin Smith). London: Routledge.
Merleau-Ponty, Maurice (2004). The World of Perception (trans. Oliver Davis). London: Routledge.
Metz, Christian (1985). “Aural Objects.” In Elizabeth Weis and John Belton (eds.), Film Sound: Theory and Practice (pp. 154-161). New York: Columbia University.
Miyazaki, Hayao (Director). (2001). Spirited Away [Motion picture]. Japan: Studio Ghibli.
Nancy, Jean-Luc (2007). Listening (trans. Charlotte Mandell). New York: Fordham University.
Oshii, Mamoru (Director). (1995). Ghost in the Shell (Kôkaku Kidôtai) [Motion picture]. Japan: Production I.G.
Otomo, Katsuhiro (Director). (1988). Akira [Motion picture]. Japan: TMS Entertainment.
Stilwell, Robynn J. (2007). “The Fantastical Gap Between Diegetic and Non-diegetic.” In Daniel Goldmark, Lawrence Kramer, and Richard Leppert (eds.), Beyond the Soundtrack (pp. 184-202). Berkeley and Los Angeles, CA: University of California Press.
Thompson, Kristin and David Bordwell (2008). “Categorical Coherence: A Closer Look at Character Subjectivity.”
Voegelin, Salome (2011). Listening to Noise and Silence: Towards a Philosophy of Sound Art. New York: Continuum International Group.
Wayne, Mike (1997). Theorising Video Practice. London: Lawrence & Wishart Limited.