The Influence of Music Technology on the Perception of the Performer in Phonographic Space

Mads Walther-Hansen

University of Copenhagen



  This paper focuses on designs of phonographic spaces in modern popular music recordings. I am particularly interested in exploring how the staging of recorded voices influences the listener-performer relation. Spatial effects, such as reverb, obviously alter the perceived acoustic space of recordings, but it seems that we know more about how these effects are applied to represent a given 'physical' structure of space than we know about how spatial effects change the way listeners relate emotionally to the performer.
  We already know that the voice has primary significance when listening to recorded popular music. This has been explored by the French film critic Michel Chion, who emphasises the 'vococentric' qualities of voices. Chion writes, with reference to voices in film, that:

"In every audio mix, the presence of a human voice instantly sets up a hierarchy of perception" (Chion 1999: 5).

  This structuring is in practice a process of foregrounding the voice and backgrounding all other sound sources. The idea of a perceptual division between foreground and background is one of the main principles of gestalt psychology. This principle was later adopted as one of the cornerstones of the philosophy of Merleau-Ponty, who defines space as a form of external experience, rather than as a physical setting in which external objects are arranged.
  In this paper I will look at three examples of recorded voices to explore how different post-production techniques change what I will call the 'directness' of the voice. Furthermore, I will explore how the relation between voices and spaces is connected to the embodiment of the voice. Firstly, I will claim that we experience voices as connected to bodies that are addressing us from a point in phonographic space. Secondly, I want to stress that bodies demand places in which to exist. If there is no experience of space there can be no bodies.


  The approach presented here elaborates on the studies of Serge Lacasse and William Moylan, who have previously explored the staging of sounds in popular music.
 In his PhD thesis from 2000, Lacasse defines vocal staging as:

 "...any deliberate practice whose aim is to enhance a vocal sound, alter its timbre, or present it in a given spatial and/or temporal configuration with the help of any mechanical or electrical process, presumably in order to produce some effect on potential or actual listeners" (Lacasse 2000: 4)

  Lacasse's study interprets staging effects according to a semiotic framework. In this way he often identifies connotative links between effects and emotions, such as the link between distortion and the arousal of anger.

  William Moylan has contributed another useful approach to describing recorded sounds. In his book 'The Art of Recording', Moylan introduces a graphical representation of the sound stage as a tool to describe the spatial placement of sound sources. Moylan describes how sound sources are placed in a specific location, in a specific acoustic environment on the sound stage. Spaces on recordings are, in Moylan's terminology, comparable to real spaces. In this way imaginary acoustic environments on the sound stage are evaluated according to the listener's experience of acoustic environments in the real world.
  The sound stage is in this way a 'God's eye view' of recorded space that does not say much about how the different spatial positions of the vocal affect us. However, it is a useful tool to describe the production side of music.
  In any case, if the sound stage is an attempt to represent what is actually there in the mix, we still need to find out how listeners make sense of this spatial distribution.
  According to Moylan: "The recording represents an illusion of a live performance" (Moylan 2002: 174). If we support this idea, we must also accept the idea of perceived performers and that these performers are perceived bodily beings.
  I will challenge this notion by presenting an example of a voice that appears to transcend Moylan's sound stage. The following example is a voice taken from the track 'Radio #1', performed by the French electronica duo Air:

Excerpt 1: Air (2001): Radio #1, 10.000 Hz Legend, Virgin

  In this example we hear a radio speaker singing along with the track, which is suggested by ducking the rest of the sound sources when the voice appears. The radio speaker is not located in the musical performance, but outside its diegetic level. This type of voice, which is comparable to the filmic voice over, has troubled many film sound theorists. In her essay 'The Voice in the Cinema' Mary Ann Doane writes that:

"The voice over is a radical otherness with respect to the diegesis which endows the voice with a certain authority. As a form of direct address, it speaks without mediation to the audience, bypassing the "characters" and establishing a complicity between itself and the spectator - together they understand and thus place the image (...) it censors the questions "Who is speaking?," "Where?," "In what time?," and "For whom?"" (Doane 1980: 42)

 The voice over enters from outside the story and after the reality of the performance. In this way the voice over is clearly separated from the rest of the performance in time and space.  
  In Radio #1 we are experiencing what we may call a sound stage in a sound stage, i.e., a track with more than one diegetic level. The radio speaker is addressing us from nowhere. It is a disembodied voice in an empty space, somewhere outside the perceived reality of the performance.
  Next, I will move on to two other examples, to discuss the spatial ambiguity we hear in many recorded voices. Spatial ambiguity is not a problem confined to modern recordings. In his book Echo and Reverb, Peter Doyle writes, with reference to the recording of Elvis Presley's 'Mystery Train' from 1955:

"The place of the voice in the mix - reverberant, yet 'up front' - compounds the ambiguities, setting up a simultaneous nearness and remoteness" (Doyle 2005: 2)

Describing the spatiality of recorded voices in physical terms can be difficult. At worst it makes no sense. One approach that might bring us closer to an understanding of the spatial distribution of sound sources is Lakoff and Johnson's idea of metaphorical projections of image schemata.

Metaphors and image-schemata in performer-listener relations

  In their book Metaphors We Live By, Lakoff and Johnson describe how perceptual domains are structured by projecting patterns of experience from one domain to another, characterised as source and target domains.
  Metaphors in this sense do not only apply to linguistic analogues to other phenomena, but are understood as a way of making sense of all human experience. In this way, to feel emotionally involved with someone is often described metaphorically in physical terms, as:

'I am feeling close to someone'.

  This closeness is not necessarily a physical distance, and therefore not a literal representation. Lawrence Zbikowski, who has worked with metaphor-theory in connection with musical analysis, has shown how image schemata work to structure our emotional responses to music.

Metaphors are "a characterisation that uses our knowledge of physical space to structure our understanding of emotions" (Zbikowski 1998).

  To feel close is a conceptual metaphor that maps physical space onto mental states. Closeness is not a physical distance, but an emotion. You cannot multiply the distance of closeness to get remoteness. Saying that a voice appears close or remote, therefore, describes an emotional involvement with the voice we are listening to.
  Closeness belongs to a horizontality image schema. This schema is derived from our bodily experience of horizontal structures, when moving around in everyday space.


Horizontality Image Schema

I want to exemplify this:
  The first example is a song performed, written and produced by the Swedish singer-songwriter Stina Nordenstam. It is taken from the first verse of the song 'Winter Killing', from the album The World is Saved, recorded in 2004.

Excerpt 3: Stina Nordenstam (2004), 'Winter Killing', The World is Saved, Ume Imports

  The second example is performed by Beth Gibbons, better known as the lead singer of the British trip hop group Portishead. The song, called 'Mysteries', is from the album Out of Season from 2001:

Excerpt 4: Beth Gibbons and Rustin Man (2001): 'Mysteries', Out of Season,

  There are several aspects of these two vocal recordings that are comparable.
Both voices are close-miked. The singers are nearly whispering, and we can clearly hear their breath and the sound of their tongues as they sing. In real life we would not be able to hear these details of the voice unless someone were whispering in our ears. Both voices therefore appear to evoke a sense of closeness in many ways.
  However, there are several subtle but important aspects in the staging of their voices that differ. While the voice of Beth Gibbons appears dry and natural, Stina Nordenstam's voice sounds more processed. First of all, there is some reverb added to Nordenstam's voice. The tail of the reverberation is very short but the early reflections are quite loud, which creates the illusion of a very small, almost claustrophobic room with hard surfaces. There is some distortion added, making the vowel sounds slightly harsh. The low frequencies have been cut out, which makes the voice overly bright compared to what we would expect from a close-miked voice.
 These effects tend to add a certain distance to her voice. Or to put it another way, it appears that the nearness of her voice is compromised by something that makes her appearance less 'direct'. The closeness of Beth Gibbons's voice, on the other hand, appears more unambiguous.
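
As a rough illustration of the kind of processing described for Nordenstam's voice, the following Python sketch chains a low cut, a mild distortion and a handful of loud early reflections without a long reverb tail. It is not a reconstruction of how 'Winter Killing' was actually produced, which is not documented here. All function names, the cutoff frequency, the drive value and the reflection times are hypothetical choices made for illustration only.

```python
import math


def high_pass(signal, cutoff_hz, sample_rate):
    """One-pole high-pass filter: attenuates content below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out, prev_in, prev_out = [], 0.0, 0.0
    for x in signal:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def soft_clip(signal, drive):
    """Mild distortion by tanh waveshaping; a higher drive sounds harsher."""
    return [math.tanh(drive * x) for x in signal]


def early_reflections(signal, sample_rate, taps):
    """Mix in a few loud, closely spaced echoes and no long tail.

    taps is a non-empty list of (delay_ms, gain) pairs.
    """
    delays = [(int(round(ms * sample_rate / 1000.0)), gain) for ms, gain in taps]
    longest = max(delay for delay, _ in delays)
    out = list(signal) + [0.0] * longest
    for delay, gain in delays:
        for i, x in enumerate(signal):
            out[i + delay] += gain * x
    return out


# Hypothetical settings (assumptions, not measurements of the recording):
# short delays with high gains suggest a small room with hard surfaces.
SMALL_ROOM_TAPS = [(3.0, 0.7), (7.0, 0.5), (12.0, 0.35)]


def stage_voice(signal, sample_rate):
    """Apply low cut, then distortion, then early reflections."""
    thinned = high_pass(signal, cutoff_hz=200.0, sample_rate=sample_rate)
    roughened = soft_clip(thinned, drive=2.0)
    return early_reflections(roughened, sample_rate, SMALL_ROOM_TAPS)
```

Here a voice is simply a list of sample values between -1 and 1. Leaving out any of the three steps in stage_voice gives a result closer to the 'dry' staging described for Beth Gibbons, which is one way of hearing how each effect contributes to the loss of 'directness'.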


There are many aspects involved in describing the relation between voices and bodies and the felt closeness of voices, and I have only touched upon a few of them. We may not be able to measure exactly how close or intimate a voice appears, but metaphor theory allows us to frame such descriptions in physical concepts without confining us to physical concreteness. What I have tried to illustrate is that the voice has some kind of 'directness' that is connected with the way the voice is staged.


Doane, Mary Ann. 1985. The voice in the cinema - the articulation of body and space. In Film sound - theory and practice, ed. Elizabeth Weis and John Belton. New York: Columbia University Press.
Doyle, Peter. 2005. Echo and reverb - fabricating space in popular music, 1900-1960. Middletown, CT: Wesleyan University Press.
Chion, Michel and Claudia Gorbman. 1999. The voice in cinema. New York: Columbia University Press.
Lacasse, Serge. 2000. Listen to my voice - the evocative power of vocal staging in recorded rock music and other forms of vocal expression. Ph.D. thesis, University of Liverpool.
Lakoff, George and Mark Johnson. 1980. Metaphors we live by. Chicago: The University of Chicago Press.
Merleau-Ponty, Maurice. 1978 (First published 1962). Phenomenology of perception transl. from French by Colin Smith. International library of philosophy and scientific method. London: Routledge & Kegan Paul.
Moylan, William. 2002. The art of recording - understanding and crafting the mix. New York and Oxford: Focal Press.
Zbikowski, Lawrence M. 1998. Metaphor and music theory: Reflections from cognitive science. The Online Journal of the Society for Music Theory 4, no. 1.