New Directions in Spatial Recognition
Ted Fletcher
TFPro, airSOUND, orbitSOUND
A few days ago I was listening to a speaker at a conference, where the PA system was suffering from a low level hum.... Actually a nasty, typically raspy sort of ground loop problem, but very low level, in fact, not enough to interfere with the performance of the speaker. And yet I found that there was a problem with intelligibility; listening carefully and thinking about it, the low level buzz was making it difficult to interpret the words.
Now 50 years ago Ray Dolby wrote lots about masking effects, and how coherent sounds were capable of masking background noise.... This, of course, is the foundation of the Dolby noise reduction system, and here was an example of something quite the opposite, and yet potentially just as important in sound reproduction.
It is definitely true that the presence of a continuous yet non-contiguous and non-coherent sound can interfere with recognition in the brain, and that interfering sound does not have to be very loud at all!
This set me thinking about related events and how this affects recording and reproduction, and one of the 'truisms' of successful record production became much clearer..... 'Music with holes in it is more exciting'.
Putting all the thinking together it becomes obvious: the more cluttered a sound gets, the less 'pleasant' it is; it's the presence of 'noises' where there could be gaps that muddles and produces listening fatigue in a minor way, but enough to make us 'not like' the music.
I believe that there is a great connection between intelligibility and musical acceptance; if something is unintelligible or garbled, then it is not pleasant, and the converse is true.
It's fascinating that this effect was recognised by Joe Meek in 1962. He produced numerous successful records and a common factor was extreme dynamics in the intros; enough 'gaps' to make the brain see it as momentary silences, and so in some way 'clearing' the brain. The records became instantly likeable no matter what followed!!
(OK, Telstar is the exception that proves the rule! That one works on another level).
We used to think that spectral balance was the most important factor; now I'm sure that it's less to do with spectra and more to do with dynamics.
And you could argue that Joe Meek was a real lunatic in his overuse of the compressor... so how could there be 'holes' in his music? Now that's interesting too; by overusing the compressor the way he did, the effect was to allow a transient through (the attack times were very slow), then the overloaded sidechain would act with typical over-compression, momentarily shutting everything down... hence huge dynamics, created by the device that in theory reduced dynamics!
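The mechanism described above, a slow attack that lets the transient through before heavy gain reduction clamps down, can be sketched in a few lines of Python. This is only a minimal illustration, assuming a generic feed-forward compressor with a one-pole level detector; the function name and every parameter value (threshold, ratio, attack and release times) are hypothetical and are not taken from any Joe Meek equipment.

```python
import math

def compress(samples, sample_rate=48000, threshold=0.1, ratio=20.0,
             attack_ms=20.0, release_ms=200.0):
    """Hard-knee compressor with a deliberately slow attack (illustrative values)."""
    # One-pole smoothing coefficients for the level detector.
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    env = 0.0
    out, gains = [], []
    for x in samples:
        level = abs(x)
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        if env > threshold:
            # Above threshold the detected level is squeezed by the ratio.
            gain = threshold * (env / threshold) ** (1.0 / ratio) / env
        else:
            gain = 1.0  # detector has not caught up yet: the transient passes untouched
        gains.append(gain)
        out.append(x * gain)
    return out, gains

# A sudden full-level burst lasting 200 ms at 48 kHz.
burst = [1.0] * 9600
_, gains = compress(burst)
print("gain on first sample:", gains[0])
print("gain at end of burst:", round(gains[-1], 3))
```

With these assumed settings the first couple of milliseconds pass at full level and the gain then falls to a small fraction of that, which is the 'transient, then shut-down' behaviour in miniature.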
Now, it was about that time, in the mid 60s, that so-called stereo recording became the norm, and everyone here who is not as old as me will have grown up in a world where 2-speaker listening is the only way.
The original thinking behind it is simple and seems logical; we have 2 ears so if we make recordings from 2 places and play them back so that they come from 2 speakers, then the result will be spatial sound... or 'stereo'.
If sound worked simply then this might be fine, but sound is not simple, and neither is our perception of it.
Some simple physical facts: low frequencies are basically non-directional; as frequency rises the sound becomes more directional; but above, say, 4-6 kHz, sounds interact with their own reflections and become less defined again.
Just to add to the confusion, our ears are ferociously complex...... we are taught about hearing sensitivity, the frequency range of about 40 Hz to 14 kHz, the Fletcher-Munson hearing curves, and that's all good stuff, but only a small part of the story.
Sound appearing at our ears is first of all modified by tiny reflections around the 'pinna', the outer ear. It then goes through a sort of 'impedance changer', the tiny bone structures in the middle ear, which converts the sound from operating in air to operating in a fluid. Then the sound excites tiny hairs in the 'cochlea', the inner ear. The hairs are of differing lengths depending on position and are fixed to nerve cells at one end; they 'fire' impulses to the brain, which has learned to make sense of this biological data-stream.... And even that is a highly simplistic description of what's going on!
But the point I'm making is that the physical 'ear' is actually just a pick-up device, and 'hearing' is more to do with what happens inside our heads; it's the interpretation of those nerve signals, not just from our ears, but also from all over our bodies.
In a perfectly quiet environment we really can hear a pin drop at ten paces, and equally, we can enjoy sounds more than a million times louder than that, and even if the multiple is approaching a thousand million, our ears will still work (for a short while!)
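To put those multiples on the familiar decibel scale, here is a small arithmetic sketch. The text does not say whether 'a million times louder' is a ratio of sound power or of sound pressure, so both readings are shown; the function names are illustrative only.

```python
import math

def power_ratio_to_db(ratio):
    """Decibels for a ratio of sound power (or intensity)."""
    return 10.0 * math.log10(ratio)

def pressure_ratio_to_db(ratio):
    """Decibels for a ratio of sound pressure (amplitude)."""
    return 20.0 * math.log10(ratio)

for ratio in (1e6, 1e9):
    print(ratio,
          "as power:", round(power_ratio_to_db(ratio)), "dB;",
          "as pressure:", round(pressure_ratio_to_db(ratio)), "dB")
```

Read as power ratios the two figures come to 60 dB and 90 dB; read as pressure ratios they come to 120 dB and 180 dB.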
But even more extraordinary is the ability of this 'ear-brain' combination to sort, recognise and appreciate incredibly fine differences in sound, position and direction, distance, and even height. These are abilities that encompass so many variables that we are out of the scientific field completely and we are into art and emotion.
Since those days in the mid 1960s, all our ears have been conditioned to listening to various types of 'stereo' and in professional audio we have been developing and improving loudspeaker systems, all based around producing the sound from two (or more) locations.
The assumption all the time has been that multi-speaker arrays are the right way to listen to 'spatial' and 'stereo' sound.
WHAT IS 'STEREO' REALLY?
It's quite clear that listeners have diverse expectations about what they consider as 'stereo', and surprisingly, diverse views as to what 'stereo' really is.
The real answer, as is so often the case in audio, is that there are as many answers as there are listeners!
The whole subject has become distorted by the vast number of 'stereo' recordings that succeed in so many different ways, from the extremes of what I prefer to call '2-speaker mono', where location and direction are all important, to the other extreme where direction is totally subservient to 'space' and depth.
BLUMLEIN.
Alan Blumlein came to the subject of spatial sound reproduction from an odd direction; he thought that more realism could be achieved in the cinema if the sound followed the action. His initial ideas involved separate recordings and multiple loudspeakers; he experimented with microphones set up at 90 degrees to one another, and the technique was so successful that to this day it's called the 'Blumlein pair'.
But to backtrack just a little, Blumlein was working for HMV in the late 1920s as a junior engineer, and he quickly made a name for himself by introducing improvements in disk cutting technology. (HMV became EMI in 1931.) But in the very early 1930s, his interest turned to directional audio and his experiments in 'binaural'.
(play 'Blumlein1')
His experiments quickly led him to the simplistic theories of 'sum and difference', and listening to the results of recordings made that way on two or three loudspeakers.
The recording technique should be familiar...... A cardioid microphone faces the performing ensemble. A 'figure-8' microphone is placed close to the cardioid with its axis at right-angles.
The cardioid microphone picks up the whole ensemble, that is left and right (L + R).
The figure-8 microphone picks up left (L) from the left and an inverted right (-R) from the right, because the other side of a figure-8 microphone is in opposite polarity. So the signal from the mic is (L-R).
If the signals from the two mics are added together you get (L+R) + (L-R), which ends up as 2L, that is, L alone (the rights cancel out), and if you subtract the figure-8 mic from the cardioid you get (L+R) - (L-R), which results in 2R, that is, R alone (the lefts cancel out), and that's all there is to 'sum and difference'......... well.... Almost, but a little more of that later!
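The textbook arithmetic can be checked with a few sample values. This is a minimal sketch of the idealised case only, assuming the cardioid delivers exactly L+R and the figure-8 exactly L-R; the function names are illustrative. Note that the sum and difference come out at twice the level of L and R, so the decoder halves them.

```python
def ms_encode(left, right):
    """Idealised microphone signals: mid = L+R (cardioid), side = L-R (figure-8)."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Sum and difference: (M+S)/2 gives L, (M-S)/2 gives R."""
    left = [(m + s) / 2 for m, s in zip(mid, side)]
    right = [(m - s) / 2 for m, s in zip(mid, side)]
    return left, right

left_in = [0.5, 0.25, -0.5]
right_in = [0.25, -0.75, 0.5]
mid, side = ms_encode(left_in, right_in)
left_out, right_out = ms_decode(mid, side)
print(left_out == left_in, right_out == right_in)
```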
STAND BACK AND LISTEN
It's interesting and slightly sad that after Blumlein's death in 1942, EMI considered that 'binaural' sound was not worth pursuing, and a great mass of work by Blumlein and Trott was shelved, only to be revived by commercial pressures from the Americans in the very late 1950s.
When stereo recording started seriously in the 1960s, we used 5-way switches on the channels in the mixer so that we had the ability to 'place' the audio signal L, L-centre, Centre, R-centre, and R. Listening to the results on studio monitors set up 1.6 metres apart, the results were impressive, and little thought was given to anything beyond that simple directional location information.
However, at the same time we engineers were experimenting with time delays and phasing and often produced unpredictable directional effects, but the equipment was primitive and frankly, we did not have the time to look deeper into it.
I carried out some more serious work on directional sensing in 1975 at my brother's studio. We had a newly invented 'digital delay' module and I set up some experiments using recorded speech and introducing time delays in one side.... Of course this is 'old hat' nowadays, and obvious, but in 1975 it was novel. We found that to a person sitting in the classic 'sweet spot', we could manipulate the voice to come from almost anywhere, even from beyond the loudspeakers.
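The kind of experiment described above, the same speech on both sides with a time delay introduced in one of them, can be sketched as follows. This is a hedged illustration only: the delay value and sample rate are hypothetical (the talk gives no figures), the function names are invented for the example, and a real digital delay of the period would of course have been hardware.

```python
def delay_channel(samples, delay_ms, sample_rate=48000):
    """Delay a signal by prepending silence; delay is rounded to whole samples."""
    delay_samples = int(round(delay_ms * sample_rate / 1000.0))
    return [0.0] * delay_samples + list(samples)

def delayed_stereo(mono, delay_ms, delayed_side="right", sample_rate=48000):
    """Feed one mono source to both sides, with one side delayed."""
    delayed = delay_channel(mono, delay_ms, sample_rate)
    # Pad the undelayed side with trailing silence so both sides have equal length.
    direct = list(mono) + [0.0] * (len(delayed) - len(mono))
    if delayed_side == "right":
        return direct, delayed
    return delayed, direct

speech = [1.0, 0.5, -0.25]          # stand-in for recorded speech
left, right = delayed_stereo(speech, delay_ms=0.5, delayed_side="right")
print(len(left), len(right), right.index(1.0))
```

Both sides carry identical level; only the arrival time differs, which is what separates this from level panning.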
Later more careful work showed that timing directional clues were most effective in the mid range, no surprise there, and that under the right conditions they were much more pervasive than the simple 'pan-pot' (that I had invented... along with others I'm sure, back in 1965/6).
But recording techniques had started to become fixed by this time; the big developments were in tape machines where up to 32 tracks became common. Mixing consoles were sophisticated, but still catered for 'stereo' by resistive panning between buses.
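For comparison with timing cues, level panning between two buses can be sketched like this. The talk does not say what law the consoles or the original pan-pot followed, so the constant-power sine/cosine law used here is an assumption, as are the function name and the mapping of the five switch positions onto evenly spaced pan values.

```python
import math

def pan(sample, position):
    """Constant-power pan: position runs from -1.0 (hard left) to +1.0 (hard right)."""
    angle = (position + 1.0) * math.pi / 4.0   # 0 at hard left, pi/2 at hard right
    return sample * math.cos(angle), sample * math.sin(angle)

# Hypothetical mapping of a 5-way switch onto the same law.
switch_positions = {"L": -1.0, "L-centre": -0.5, "Centre": 0.0,
                    "R-centre": 0.5, "R": 1.0}

for name, position in switch_positions.items():
    left, right = pan(1.0, position)
    print(name, round(left, 3), round(right, 3))
```

Whatever the position, the two bus gains here satisfy left squared plus right squared equals one, so only the level split changes and no timing information is introduced at all.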
The thinking classical recordists had always looked warily at the 'pan-pot', and I had it pointed out to me quite forcibly that such techniques were simplistic and false.....
A lot of engineers, particularly in the big studios, notably Decca and EMI, experimented with 'stereo' mic arrangements using crossed pairs and variations on 'sum and difference' and 'Blumlein' pairs. But of course they were all forced to work with the conventional 2-speaker 'stereo' monitoring.... with varying success.
Over the years, recording techniques have become less polarised and many 'popular' orchestral recordings are produced using a combination of 'spot' microphones and 'stereo' pairs, and to a very large extent, the results are big and spacious when listened to on a properly set up replay system.... And if your head is nailed in the exact sweet spot!
But to return to 'sum and difference'....
That simplistic maths mentioned before is trotted out to engineers time after time, and the reality is that it's a poor approximation......
Figure 1
And it's easy to see why; in this simple diagram the cardioid microphone is picking up the whole of the 'choir'. Previously, in that over-simplified maths, this mic is picking up left plus right; so far so good. But while it's true that the 'figure-8' microphone is picking up L and R the same as the cardioid mic, it's picking up a whole lot of other sounds as well, and even considering this very simple sketch, the arithmetic doesn't stand up.... The derived 'L' (Mic1 + Mic2) would be (L+R) + (fL + SPLeft - (fR + SPRight)), and that's just for starters with an over-simplified diagram. In the real world we are dealing with a much more complex difference between the 'L' received by Mic1 and the 'L' received by Mic2.... And here we are looking at it in 2 dimensions only!
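A few invented numbers show how quickly the derived 'L' stops being a clean L. The sketch below simply mirrors the symbols in the expression above; since the figure is not reproduced here, it assumes that fL and fR are the figure-8 microphone's own versions of left and right and that SPLeft and SPRight are extra pickup from each side. All values and names are hypothetical.

```python
def derived_left(L, R, fL, fR, sp_left, sp_right):
    """Mic1 + Mic2, using the expression from the text."""
    mic1 = L + R                                # cardioid: the whole ensemble
    mic2 = (fL + sp_left) - (fR + sp_right)     # figure-8: left lobe minus right lobe
    return mic1 + mic2

def residual(L, R, fL, fR, sp_left, sp_right):
    """Whatever is left over once the hoped-for 2L has been taken away."""
    return derived_left(L, R, fL, fR, sp_left, sp_right) - 2 * L

# Textbook case: the figure-8 hears exactly the same L and R as the cardioid.
print(residual(L=1.0, R=0.5, fL=1.0, fR=0.5, sp_left=0.0, sp_right=0.0))

# Slightly less tidy case: different levels at Mic2, plus some side pickup.
print(residual(L=1.0, R=0.5, fL=0.75, fR=0.25, sp_left=0.25, sp_right=0.125))
```

In the first case the residual is zero; in the second it is not, so the 'L' feed contains something other than left.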
What I'm getting at is that while M/S or 'Sum and Difference' recording is very elegant, and it works, its conversion to L/R and its replay on a 2-speaker system is a poor approximation.
However, let's not lose sight of the idea of M/S recording because these simple manipulations can be extremely useful even if they are not perfect..... after all they are the backbone of all 'ambisonic', 'tetrahedral', 'surround sound' and other wonder systems.
AIRSOUND
The original idea for airSOUND came from a time in 2003 when I was planning a recording of some voices, and in one of those 'small-hours' moments, dreamed of hearing the playbacks on a loudspeaker system that mirrored the M/S microphones that I was planning to use.
The idea felt feasible, and the next day I made a loudspeaker array out of cardboard tubes and connected it up via a 'birds nest' of cables and amplifiers.... And it produced a sort of stereo from a single array.
To cut a long story short, Eric (my business partner) and I experimented and researched for many months and came up with a range of prototype loudspeakers varying from those as small as a mobile phone, up to a system that would serve as a serious PA.
And what is an airSOUND system?
It is a combination of loudspeakers consisting of a monopole loudspeaker reproducing a 'main' sound directly towards the listening area, and a dipole loudspeaker, or pair of loudspeakers, reproducing 'spatial' sound to the right and left. It is effectively a single 'point' source producing sound that, to the ears of the listener, is spatial and stereophonic, with both depth and positioning.
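One way to picture the signal routing for such an array is as the M/S microphone arrangement run in reverse. The sketch below is only a reading of the description above, not the actual airSOUND processing, which the talk does not specify: it assumes the 'main' feed is the sum of a conventional L/R pair, the 'spatial' feed is the difference, and the two sides of the dipole are driven in opposite polarity. All names are hypothetical.

```python
def stereo_to_array_feeds(left, right):
    """Derive hypothetical monopole and dipole feeds from a conventional L/R pair."""
    main = [(l + r) / 2 for l, r in zip(left, right)]      # front-facing monopole
    spatial = [(l - r) / 2 for l, r in zip(left, right)]   # difference signal
    dipole_left = spatial                                   # one side of the dipole
    dipole_right = [-s for s in spatial]                    # other side, opposite polarity
    return main, dipole_left, dipole_right

# First sample: a centred voice (identical in L and R).
# Second sample: purely 'wide' content (equal and opposite in L and R).
main, dipole_left, dipole_right = stereo_to_array_feeds([1.0, 0.5], [1.0, -0.5])
print(main, dipole_left, dipole_right)
```

Under these assumptions a centred voice goes only to the monopole and nothing at all to the dipole, while out-of-phase content goes only to the dipole.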
Now for an heretical statement:......
I am convinced that pure directional information in sound reproduction is of very little value at all.
My reason for making that statement is the result of 40+ years of listening to sound critically as a part of the process of both making recordings, and designing sound recording and reproducing equipment. Many times over those years I have listened to performances and demonstrations of stereo and all the derivatives, including tetrahedral, quad, 7.1 and lots of others, and apart from the 'gee-whizz' demos where a train runs across the stage or an obvious soloist jumps up on the right, the actual 'directional' information is hardly relevant, and in most cases its accuracy is very much 'Emperor's new clothes'!
In any case we must be pragmatic; we must be honest about the times we live in..... most entertainment has a video dominant content. Even pure concerts are made to be visually beautiful and exciting.
We sit and watch a large LED or plasma screen and expect to hear beautiful spacious sound while we watch about 4 degrees in front of us........ think about that and the whole concept of surround sound with multiple loudspeakers becomes, in my opinion, laughable.
The ideas behind surround sound are all involved with the concept of immersion in the experience; we need to be involved with the performance, so therefore the sound should come from all around..... WHY? We can get intellectually and emotionally involved in something on television where all the sensory input comes from the front, and doesn't sensory input come from the front in real life?
My view is that most directional input comes from visual stimulus, reinforced by audio clues..... more important than pure directional clues is the sense of 'space', the environment in which the sound sits, and this is where airSOUND comes into its own.
An airSOUND reproduction system has the ability to allow listeners to hear the same, and correct spatial environment from anywhere in the room.
Any form of multi-speaker array suffers from an insurmountable problem; while momentary 'involvement' is possible for a listener bolted to the 'sweet spot', ANY movement away from the spot will destroy the image, and the more loudspeakers, the worse the problem!
This is a truism; it is an uncomfortable fact that cannot be faced by designers.
FATIGUE
An equally important superiority of airSOUND over any other system is the ease of listening.....
When a listener sits listening to a conventional 2-speaker system, unless his head is bolted in place, he will move slightly off axis. This movement upsets the image created in the brain making the brain work harder, leading to 'listening fatigue'; in the case of surround and 5.1 systems the situation is far worse, with fatigue being normal as the brain fights to maintain a spatial image.
With airSOUND the image is naturally produced in the environment and fatigue is minimised. This is particularly noticeable when watching a feature film.... The images and sounds remain fresh and vibrant to the end!
Early in the development of airSOUND, I was asked several times if the system could be expanded to provide surround sound. Theoretically the answer is yes, and I built systems making use of existing 5.1 recordings to demonstrate the fact, and I was surprised at first how listeners to the demos failed to get excited over the extended systems. The reason was that the basic single airSOUND array provided a very satisfactory sound system for feature films, and there was no significant advantage to adding a further spatial image at the rear!
AIRSOUND AND RECORD PRODUCTION
But this talk is about aspects of record production, and immediately we had some prototype airSOUND systems working, I was sure that this way of listening to sound would become important to the way records are made. More recently I realised that this is even more true than I thought....
At least 95% of loudspeaker-based systems at the moment are incapable of reproducing stereo sound; in fact, at least 70% of those are systems with loudspeakers so close together that no listener can resolve the image. Just the odd few are set up in people's sitting rooms with a chair strategically placed equidistant between the loudspeakers, and the owner can enjoy his music. Everyone else hears some sort of mashed mono, complete with timing distortions and cancellations.
So today's record producer has a thankless job; he works hard to produce a product that is commercially and aesthetically right, and will sound right on the studio speakers, the nearfield monitors and on a car radio, but at the back of his mind he knows that most people will hear it on inferior headphones, ghastly hand-held devices or ghetto-blasters.
Well, we have made a start to change that!
Now, this is not meant to be a sales pitch by any means.... But my Orbitsound Company has products on the market now that demonstrate airSOUND and are major improvements over conventional systems of similar price.
The commercial 'sell' is quite a soft one, but we are finding that a large percentage of buyers realise the level of improvement quickly, and appreciate the clarity and 'space' of the sound.
Now that we can see the direction and success of the sales, I can foresee a new dimension for record producers; there will be significant numbers of airSOUND systems in the next year or so, in fact I'm certain that these will become the majority over 2-speaker systems.
Why?
Listen to a modern pop record on a T12 and you can hear why..... The first notable thing is the way a central voice sounds.... It sounds clean and undistorted, and positioned well forward... and it sounds exactly the same from any listening position.
Then notice the balance and depth of the rest of the sound (of course this is dependent on the producer!) and in many cases there will be subtleties that were inaudible on a 2-speaker system.
In the studio, 'research has shown' :) that when monitoring on a powerful airSOUND system, the engineer can achieve a good voice sound very much quicker and with less listening fatigue than with conventional systems..... I'm not advocating the complete replacement of 2-speaker monitoring; after all, 2-speaker monitoring works fine if the conditions are perfect; it's just that airSOUND systems sound right all the time!