Will 3D audio make remote simultaneous interpreting a pleasure?

Now THAT’S what I want: Virtual Reality for my ears!

Apparently, simultaneous interpreters are not the only ones suffering from Zoom fatigue, i.e. the confusion and inability to physically locate a speaker using our ears can lead to a condition that’s sometimes referred to as “Zoom Fatigue”. But it looks like there is still reason to hope that, acoustically, video conferences become more of a pleasure and less of an ordeal in years to come. At least that’s what current discussions around this topic make me hope … This (four year old!) video by BBC Click illustrates quite nicely what 3D sound for headphones is about:

Binaural recording is a method of recording sound with the intent to create an immersive 3D stereo sound sensation for the listener of actually being in the room with the performers. A dummy head, aka „Kunstkopf“, is used for recording, which has two dummy ears in the shape of human ears with microphones in them so as to simulate the perception of the human ear. Sound recorded via binaural recording is intended for replay using headphones. And the good thing is: What was originally made for the gaming and movie industry is now also bound to conquer the video conferencing market.

Mathias Johannson, the CEO of Dirac, is rather confident In less than a decade, 3D audio over headsets with head-tracking capabilities will allow us to have remote meetings in which you can move about an actual room, having sidebar discussions with one colleague or another as you huddle close or step away. What is more, Johannson reckons that spatial audio could be made available to videoconference users early in 2021. Dirac wants to offer 3D audio technology to video chat platforms via an off-the-shelf software solution, so no expensive hardware would be required. Google, on the other hand, already advertises its Google Meet hardware for immersive sound. But from what we have learned six months into Covid-19, it is difficult even to persuade participants to wear a cabled headset (not to mention using an ethernet cable instead of a wifi connection), I am personally not too optimistic that expensive hardware is the way forward to high-quality remote simultaneous interpretation.

So, will such a software-based solution possibly not only provide a more immersive meeting experience but also be able to provide decent sound even without remote participants connected from their home offices having to use special equipment, i.e. headsets? I asked Prof. Jochen Steffens, who is a sound and video engineer, for his opinion. The answer was rather sobering regarding equipment: For 3D audio, a dry and clean recording is required, which at the moment is not possible using built-in microphones. Equipment made for binaural recording, however, would not really serve the purpose of simultaneous interpreting either, as the room sound would actually be more of a disturbance for interpreting. Binaural recording is rather made for recording the sound of real threedimensional sound impressions like in concert halls and the like. For video conferencing, rather than headsets, Steffens recommends using unidirectional microphones; he suggests, for example, an inexpensive cardioid large-diaphragm microphone, mounted on a table stand. And the good news: If you are too vain to wear a headset in video conferences, with decent sound input being delivered by a good microphone, any odd wireless in-ear earphones can be used for listening, or even the built-in speakers of your laptop as long as you turn them off while speaking.

But what about the spatial, immersive experience? And how will a spatial distribution of participants happen if, in fact, there is no real room to match the virtual distribution to? As Prof. Steffens explained to me, once you have a good quality sound input, people can indeed be mapped into a virtual space, e.g. around a table, rather easily. The next question would be if, in contrast to the conference participants, we as interpreters would really appreciate such a being-in-the-room experience. While this immersion could indeed allow for more situational awareness, we might prefer to always be acoustically positioned right in front of the person who is speaking instead of having a „round table“ experience. After all, speakers are best understood when they are placed in front of you and both ears get an equally strong input (the so-called cocktail party effect of selectively hearing only one voice works best with binaural input). And this would, by the way, nicely match a close front view of the person speaking.

And then, if ever video conferencing can offer us a useful immersive experience, couldn’t it even end up being more convenient than a „normal“ on-site simultaneous interpreting setting? More often than not, we are rather isolated in our booths with no more than a poor view of the meeting room from a faraway/high above/hidden-next-door position. So much so that I am starting to wonder if 3D audio (and video, for that matter) could also be used in on-site conference rooms. According to Prof. Steffens, this would be perfectly feasible by „simply“ using sound engineering software.

But then the next question arises: While simultaneous interpreters used to be „the voice in your ear“, they might now be assigned a position in the meeting space … the voice from above, from behind (like in chuchotage), or our voices could even come from where the speaker is sitting who is being interpreted at the moment. Although for this to happen, the speaker’s voice would have to be muted completely, which might not be what we want. Two voices coming from the same position would be hard to process for the brain. So the interpreter’s voice would need to find its own „place in space“ after all – suggestions are welcome!

About the author

Anja Rütten is a freelance conference interpreter for German (A), Spanish (B), English (C) and French (C) based in Düsseldorf, Germany. She has specialised in knowledge management since the mid-1990s.