Recording in the real world.

By Ted Fletcher (analogue designer, sound recording and acoustics consultant).



When I’m giving talks to younger people about sound and sound recording, inevitably the subject concentrates on the physical properties of sound and how to capture it.  
I am becoming more and more convinced that this is entirely the wrong approach; it is an example of how the ‘scientific method’ can be a hindrance rather than a help;  I constantly worry that a recording engineer is still, even today, like a photographer facing a Grand Prix with a box camera.  
It’s fine and necessary to learn about the available technology for capturing sound and reproducing it again at a later date, but to get anything like a true understanding of what we are really getting into, one needs to understand more about how we hear sound: physically, by knowing a little about ears and the mechanisms within them, and, even more importantly, by knowing how sounds affect us and how we make sense of what we hear.

Once we have some sort of grasp of what and how we hear, we can start to apply that knowledge to the various parts of the recording process, starting in the studio at the microphone, then moving on to the microphone preamplifier and other necessary or unnecessary parts of the recording chain, up to the recording medium itself, and the ways of monitoring what’s going on, and how to use it once we’ve got it.


We are taught, parrot fashion, that sound doesn’t behave in a linear way like string or water; it is logarithmic.  Usually we are told some ‘gee whizz’ facts to try to convey the scale, but sadly our brains don’t work that way and it’s a tough concept to grasp. I think the only way to come to terms with sound levels is to think in terms of dB (decibels) and try to remember that a 1000 watt amplifier is not that much louder than a 10 watt amplifier!
I am being intentionally flippant, but the point holds at both ends of the spectrum: to make a loud sound louder, a lot of extra energy is needed; if a sound is already very quiet, you can take most of its energy away and it won’t seem to get much quieter!
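That amplifier remark is easy to check: a power ratio maps to decibels as 10·log10(P1/P2), and a common rule of thumb is that roughly 10dB corresponds to a doubling of perceived loudness. A minimal sketch (the wattages are just the figures from the text, chosen for illustration):

```python
import math

def power_ratio_db(p1_watts, p2_watts):
    """Decibel difference between two power levels: 10 * log10(P1 / P2)."""
    return 10 * math.log10(p1_watts / p2_watts)

# 100 times the power is only a 20 dB difference...
db = power_ratio_db(1000, 10)

# ...and at roughly 10 dB per doubling of perceived loudness,
# the 1000 W amplifier sounds only about 4 times as loud as the 10 W one.
perceived_doublings = db / 10
```

So a hundredfold increase in electrical power buys only about two doublings of apparent loudness, which is exactly why the 1000 watt amplifier disappoints.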

And what about frequency? We can ‘hear’ a range of frequencies from about 25Hz up to around 14kHz.  That can be measured as the ‘frequency response’ of our ears, but natural and musical sounds extend from as low as 8Hz up to 40 or 50kHz.  Is that relevant? Absolutely!!


From the mid 1930s up to the late 1960s a mass of work was done in audio labs studying the biology and the physical limits of human ears.  I won’t be arrogant enough to dismiss the whole of the research out of hand, but I do insist that all that work needs to be placed in the context of the knowledge that what we think we hear is very much more important than what some figure on a graph tells us we should be hearing…. Or what some pundit in a magazine tells us we should be hearing!

The mechanism of the ear is reasonably well understood and taught; the path of pressure waves from the outside air causes movement of the eardrum, the bone structures act as an ‘impedance converter’ and transfer the vibrations across the middle ear to the inner ear where the pressure waves act on sensory cells in the ‘cochlea’.  But that level of understanding of ‘hearing’ is about on a par with the knowledge that a dog usually has four legs.

Now, I want to scratch the surface just a little…..


Physical hearing tests show that we can discern frequencies as ‘notes’ down to about 30Hz, yet in our daily lives we are subjected to, and are well aware of, lower frequencies (so-called ‘infrasound’). These can come from mechanical things such as heating and air conditioning systems, trains and motors, as well as from nature (storms and wind).
At the other end of the spectrum we are not only aware of frequencies above 15kHz; all musical sounds contain harmonic information at high frequencies that contributes to musical quality.
And all this has very little to do with ears… it’s to do with sound hitting our bodies and skin; not necessarily ‘trouser flappingly’ hard. Our brains interpret subtle sounds from all over our bodies and integrate them with the refined signals they get from our ears.  What we ‘hear’ is a combination of all that.


Any discussion about the range of volumes that can be heard by the human ear is bound to be complicated.  Simplistically, we can hear sounds ranging from a pin dropping onto carpet at a distance of 20 feet (OK, that was a guess!) up to a level where the pressure of sound causes physical pain: in front of the rig at an AC/DC concert. But within those extremes hearing does some amazing things: a trip out into the country on a quiet night can easily show how our hearing (note, I’m using the term ‘hearing’ rather than ‘ears’) changes and becomes very much more sensitive than normal.
Equally, in a noisy environment our hearing ‘desensitises’ as if it is compensating to make things more comfortable; and that’s exactly what it is doing.  
Where these effects actually take place is complex and debatable; some of the compression effects take place in the middle and inner ear, but I suspect that most of it happens in the brain; and there are aspects of this biological compression that have a great bearing on quality and appreciation.


So far we have considered sound and hearing in terms of ranges of perception. It’s like describing a painting as ‘various coloured patches on a flat plane’.  But eventually we want to move towards an understanding of recorded (or created) performance, and so we need to know more about what our hearing considers good and acceptable and if there are any aspects of ‘not so good’ that have to be watched out for.


The simplest ‘musical’ note is a sine wave; this is a sound of a single frequency, devoid of harmonics.  If a sine wave is distorted by compressing or constricting just the top or bottom of the wave, then harmonics appear in the sound.  These harmonics are called ‘even order’ harmonics and they are musically related to the fundamental frequency. The 2nd harmonic is one octave above the fundamental; the 4th is 2 octaves, and so on.
BUT if the sine wave is distorted symmetrically, top and bottom, the resulting harmonics are called ‘odd order’ (3rd, 5th and 7th harmonics), and these frequencies are far less closely related, musically (that is, within our scale structure), to the fundamental frequency; they just sound harsh and unnatural.
I have theories as to why even order distortion sounds acceptable while odd order doesn’t, a part of the answer probably lies in the way the cells respond in the inner ear; they are tiny hairs of different lengths that sway and trigger impulses from their roots.  Another possibility is that it is because almost all harmonics that occur in nature are even order; the whistling of the wind, a human voice, the song of birds, all are rich in 2nd order harmonics, as are the sounds from physical musical instruments, like the violin, the trumpet and the piano.
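The even/odd split is easy to demonstrate numerically: clip a pure sine wave on one side only and a 2nd harmonic appears; clip it symmetrically and only odd harmonics appear. A rough sketch in plain Python (the 0.5 clip level, 1kHz tone and one-second buffer are arbitrary illustrations; one second at 48kHz makes each DFT bin exactly 1Hz wide):

```python
import cmath
import math

FS = 48000           # sample rate (Hz)
F0 = 1000.0          # fundamental frequency (Hz)
N = FS               # one second of samples -> DFT bin spacing is exactly 1 Hz

sine = [math.sin(2 * math.pi * F0 * n / FS) for n in range(N)]

# Asymmetric distortion: flatten only the positive peaks.
asym = [min(s, 0.5) for s in sine]
# Symmetric distortion: flatten both peaks equally.
sym = [max(-0.5, min(s, 0.5)) for s in sine]

def harmonic_level(signal, harmonic):
    """Normalised magnitude of the single DFT bin at harmonic * F0 Hz."""
    k = int(harmonic * F0)
    acc = sum(s * cmath.exp(-2j * math.pi * k * n / N)
              for n, s in enumerate(signal))
    return abs(acc) / N
```

Comparing `harmonic_level(asym, 2)` with `harmonic_level(sym, 2)` shows strong 2nd-harmonic energy in the asymmetrically clipped wave and essentially none in the symmetric one, while the symmetric wave carries its distortion energy in the 3rd harmonic and above.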
Other sorts of ‘distortion’ exist too. There is Amplitude Distortion, which goes back to what our ears and brains do to loud and soft sounds… they apply compression, and that is a form of distortion.
And there is Amplitude Frequency Response Distortion, which is just a posh way of saying that some frequencies of sound are not heard as well as others.
And then there is an even more insidious form of distortion, and that is phase distortion… but that involves things like direction information and I shall come to that later.

Listening to all that stuff, you could be forgiven for thinking that I’m just a prophet of doom and that there’s no point in even trying to record sound faithfully… but that’s not so at all.  I’m merely trying to dispel any feeling that using such and such a microphone in such and such a way will give a perfect recording.  I’m saying that each and every recording is a separate work of art… it is a representation of the original.  It might be what you think is an accurate copy, it might be a truly creative version that enhances the performance, but in every case it just isn’t a simple, neutral copy!


Now what I really always want to talk about is compression!

Our ears are really not very good at handling the extreme range of sound volumes that we are subjected to…. No, that’s not quite right, it’s more true to say that the range of volume of sounds that we want to be able to appreciate is so vast that there has to be a variety of built in compression systems just to stop our heads exploding!  And there are!
There are purely physical compression systems both short term and long term; from non-linearities in the middle ear bone structures preventing large deviations from loud sounds, to inflammation effects in the middle and inner ear that severely reduce sensitivity when loud sounds are continuous. And there are sort of ‘software’ compression effects where our brains mask off some of the stream of impulses from the ears to reduce ‘volume’, and where I think it’s possible that extra processing power is recruited when conditions are extremely quiet to try to differentiate between meaningful sounds and extraneous noises of the body, like your heart, breathing and gut rumblings!

Because these biological compressors are active most of the time in normal daily living, what the brain ‘hears’ is constantly being altered, and these alterations come and go depending on the spectrum of the sounds and the shape of their intensity envelope… that is, there are different natural compression effects for sharp repetitive sounds and for smooth continuous sounds.

What I am eventually getting to is that we can make use of knowledge of these effects; we can apply artificial corrections mimicking the natural ones, and fool the brain into thinking that certain sounds are quieter or louder than they really are.
It’s very useful that the brain is amazingly adept at processing this sort of information, and we can go a whole lot further than the simple concept of applying some gutsy optical compression to an overall mix and make it sound louder.  There are great subtleties waiting to be exploited… or plundered.
Attack and release shapes and times for these biological compressors are massively variable… very sharp transients, like gunfire for example, are attenuated extremely fast, and, if the shock to the system was slight, the recovery time is also quick.  A rock concert can give you 30dB of ‘biological compression’ for up to 24 hours!  Between these extremes there is a world of rapidly changing ‘gains’, and intelligent use of compressors can enhance depth, height, colour and transparency very much more effectively than altering mixing levels or applying that bane of quality… EQ.
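An artificial compressor mimics exactly this behaviour: follow the signal envelope, and above some threshold pull the gain down, with attack and release times deciding how fast the gain reacts. A minimal per-sample sketch (the threshold, ratio and smoothing figures here are arbitrary illustrations, not any particular unit’s values):

```python
import math

def compress(samples, threshold_db=-20.0, ratio=4.0,
             attack=0.9, release=0.9995):
    """Sketch of a peak-tracking compressor.

    attack/release are per-sample smoothing coefficients (0..1):
    a smaller attack value lets the envelope rise faster on transients;
    a release value near 1 makes the gain recover slowly afterwards.
    """
    env = 0.0
    out = []
    for s in samples:
        peak = abs(s)
        # Envelope follower: fast rise (attack), slow fall (release).
        coeff = attack if peak > env else release
        env = coeff * env + (1 - coeff) * peak
        level_db = 20 * math.log10(max(env, 1e-9))
        # Above threshold, pull the level back towards it by the ratio.
        gain_db = min(0.0, (threshold_db - level_db) * (1 - 1 / ratio))
        out.append(s * 10 ** (gain_db / 20))
    return out
```

A sustained loud signal is pulled well below its input level while a quiet one passes untouched; the biological versions described above do much the same, only with wildly different time constants for different kinds of sound.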


But now I would like to change the subject, from talking about how we hear once the sound gets to us, to the monitoring of the sound in the studio.

Years ago, I got involved with the development of monitoring loudspeakers for radio stations under the Independent Broadcasting Authority.  This was in the mid 1970s and the standard loudspeaker for speech studios was the Spendor BC1 which, I believe, had been developed by the BBC.
It’s true that when listened to at fairly low level the sound seemed to be accurate and convincing, but when I was testing lots of different experimental loudspeakers, I realised how extreme the ‘auto correction’ of our ears is!
You can put up a strange loudspeaker and listen to a known source or record, and in a matter of minutes, if you are not concentrating carefully on exact elements of the sound reproduction, you will start to accept the sound as ‘normal’!
For years afterwards, I was niggled by a number of aspects of stereo listening: particularly the received wisdom that we should listen to an identical pair of loudspeakers placed a distance apart, and the seemingly insoluble problem that if you want to reproduce a ‘natural’ bottom end, there are usually inadequacies in the mid ranges. Another terrible anomaly has been the use of the ‘pan pot’ as a means of specifying position; the whole idea is false, and only works because we insist on listening with widely spaced loudspeakers!
(directional information has almost nothing to do with volume difference, it is determined by time difference and in nature, it is entirely sensed by our ear spacing; ‘panned’ information only works successfully with wide-spaced loudspeakers.)
Individually these problems have been addressed; we learned early on that one could give depth to a mono signal by adding reverb that contains multiple reflections.  The problem of stereo placement of individual signals can be overcome with clever time delays… but the conventional monitoring setup still feels like a compromise.


It was only when I started experimenting with Sum and Difference recording techniques that I started to suspect that there might be another practical solution to stereo listening.  Recordings made with M/S mics (Middle and Side) certainly sound beautiful and ‘solid’ when replayed conventionally, but I got to thinking about the possibility of reversing the process…..

Let me talk for just a moment about M/S recording:  I know I’m telling you things that are very simple and obvious, but it’s possible that you may not have heard it quite this way before!

You have a signal source that you want to record… say an acoustic guitar.
So you place a cardioid microphone in front of it to capture the sound at that point.
But, you would also like to record the effect of that guitar in the room in which it is being played, so a way of doing this is to place a second microphone, close to the first one, but picking up sound from the right and the left of the instrument…. This is done by using a ‘figure 8’ response mic set up across the sound field.
So we have a signal with the main ‘sound’ of the guitar, and a second signal containing ‘width’ information.
Now speaking very simplistically, the first signal contains mono information; that is information from the left and the right hand side.
The second signal also contains information from both the left and the right but because one side is the front of the mic, and the other is the back, then the two signals are out of phase…. That is left minus right.
So we have ‘middle’…. Left plus right, and ‘side’… left minus right.
Another way of saying it is ‘Sum and Difference’.
I don’t need a blackboard to show that if you add the two signals together, M + S = (L+R) + (L−R) = 2L, the result is Left only…
and if you electronically subtract the second signal from the first, M − S = (L+R) − (L−R) = 2R, you are left with Right only.  And if you carry out that manoeuvre, you get a very effective stereo image.
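The whole encode/decode round trip fits in a few lines. A sketch (plain lists of samples stand in for whatever audio buffers you actually use; the halving on decode just removes the factor of two from the sums above):

```python
def ms_encode(left, right):
    """Mid/Side (Sum and Difference) encode: M = L + R, S = L - R."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Decode back to stereo: L = (M + S) / 2, R = (M - S) / 2."""
    left = [(m + s) / 2 for m, s in zip(mid, side)]
    right = [(m - s) / 2 for m, s in zip(mid, side)]
    return left, right
```

Note that a signal panned dead centre (identical in Left and Right) ends up entirely in the Mid channel with the Side channel at zero, which is one reason M/S recordings collapse so gracefully to mono.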
Now doesn’t that all sound splendid…. BUT we are fooling ourselves!!
All that manipulation is actually a lie…. We have just CALLED the signals LEFT and RIGHT, they are only simple approximations; the mono signal contains the bulk of the information from the guitar direct, the ‘difference’ signal contains reflections from elsewhere in the room, but arriving back at the microphone pick-up point.
Yet the system fools our ears nicely, and recordings made that way are very effective, and much more natural than those made with a simple ‘stereo pair’, and ‘of course’ infinitely better than anything that has its stereo positioning determined by pan pots!

My ‘Single Point Monitor’ is still only a prototype and is presently the subject of a patent application, so at the time of writing I can only give the barest outline of a description:
I have taken the normal Left and Right signals and processed them, feeding the result to power amplifiers which in turn drive a loudspeaker array designed to re-introduce the ‘width’ information to the listening environment, while at the same time producing a high quality reproduction of the ‘information’ channel.

As soon as the patent application is properly under way, I shall be publishing more detail of the monitor on my website.

While there are still months of experimenting to do, the results are such that I think it’s worthwhile trying to re-educate the recording fraternity to throw away their ‘nearfields’!