Project Overview

This is the final project for my MSc in Audio and Music Technology at the University of York, with Google as the industry partner. I started thinking about it roughly a month ago, when I was assigned the topic. Now I am starting to research full-time, and I will be working on it until the end of August.

If I tell you that the formal title is “Analysis of Context-Dependent OPUS Compression for Ambisonics”, your reply will probably be something like: “Sounds cool, but… what does it mean?” Let’s start from the beginning, then:


Background

“Analysis of Context-Dependent

In my project, I will analyse spatial audio quality across a variety of media contexts (immersive music, virtual reality gaming, cinematic, and teleconference-style presentations).

[1]

YouTube’s spatial audio format only supports First Order Ambisonics (no head-locked stereo). What if we used Third Order Ambisonics with low-bit-rate Opus instead? Would it give the audience a different experience?

OPUS

Opus [2] is a lossy audio coding format developed by the Xiph.Org Foundation and later standardized by the Internet Engineering Task Force (IETF).

[3]

The goal is to replace Speex and Vorbis with a single format that handles both general audio and speech and is suitable for low-latency, real-time transmission over the network. The format is defined in RFC 6716. Opus is an open, royalty-free format, so there are no patent fees or restrictions on its use.

Opus combines two audio coding technologies: the speech-oriented SILK and the low-latency CELT. It can switch seamlessly between low and high bit rates. Internally, the encoder uses linear predictive coding at lower bit rates and transform coding at higher bit rates, with a hybrid of the two in the transition region between them. Opus has a very low algorithmic delay (26.5 ms with the default 20 ms frames), which makes it well suited to low-latency applications such as real-time voice calls, live audio streaming over the network, and real-time synchronised narration.
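To see where a delay figure like this comes from, here is a minimal sketch of the arithmetic: the algorithmic delay is one frame of buffering plus the encoder lookahead. The lookahead values (6.5 ms in the default mode, 2.5 ms in the restricted low-delay mode) are the commonly quoted figures and should be treated as assumptions; the function and dictionary names are my own, not part of any Opus API.

```python
# Assumed lookahead figures in milliseconds (illustrative, not normative).
LOOKAHEAD_MS = {"default": 6.5, "restricted_low_delay": 2.5}


def algorithmic_delay_ms(frame_ms: float, mode: str = "default") -> float:
    """Return frame duration plus encoder lookahead, in milliseconds."""
    return frame_ms + LOOKAHEAD_MS[mode]


# Compare the two modes for a few frame sizes.
for frame_ms in (2.5, 10.0, 20.0, 60.0):
    print(
        frame_ms,
        algorithmic_delay_ms(frame_ms),
        algorithmic_delay_ms(frame_ms, "restricted_low_delay"),
    )
```

Under these assumptions, a 20 ms frame gives 26.5 ms in the default mode and 22.5 ms in the restricted low-delay mode.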

Compression

Compression [4], or “data compression,” is used to reduce the size of one or more files. When a file is compressed, it takes up less disk space than an uncompressed version and can be transferred to other systems more quickly.
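To get a feeling for the numbers in an audio setting, the sketch below compares the bit rate of uncompressed PCM audio with a coded bit rate. The 48 kHz sample rate, 16-bit depth, and the 512 kbps target are hypothetical values chosen only for illustration, and the function names are my own.

```python
def pcm_bitrate_kbps(channels: int, sample_rate_hz: int = 48_000, bit_depth: int = 16) -> float:
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return channels * sample_rate_hz * bit_depth / 1000


def compression_ratio(channels: int, coded_kbps: float) -> float:
    """How many times smaller the coded stream is than the raw PCM stream."""
    return pcm_bitrate_kbps(channels) / coded_kbps


# A 16-channel signal (the channel count of Third Order Ambisonics)
# coded at a hypothetical total of 512 kbps.
print(pcm_bitrate_kbps(16))        # 12288.0
print(compression_ratio(16, 512))  # 24.0
```

Because the raw bit rate grows with the number of channels, higher Ambisonic orders make the choice of compression settings more important.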

for Ambisonics”

Ambisonics [5] is a method for recording, mixing and playing back three-dimensional 360-degree audio. It was invented in the 1970s but was never commercially adopted until recently, when the growth of the VR industry created a need for 360° audio solutions.

[6]

Why do I want to research Ambisonics?

First, because of the development of virtual reality. The most popular Ambisonics format today, widely used in VR and 360 video, is called Ambisonics B-format, which uses as few as four channels to reproduce a complete sphere of sound.
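As a rough illustration of what those four channels carry, the sketch below computes the channel count for a given Ambisonic order, (order + 1) squared, and encodes a single mono sample into first-order channels. It assumes the AmbiX convention (channel order W, Y, Z, X with SN3D gains, azimuth measured counter-clockwise from the front); traditional B-format scales W differently. The function names are hypothetical.

```python
import math


def num_channels(order: int) -> int:
    """Number of channels in a full-sphere Ambisonic signal of the given order."""
    return (order + 1) ** 2


def encode_first_order(sample: float, azimuth_deg: float, elevation_deg: float) -> dict:
    """Encode one mono sample into first-order channels (assumed SN3D gains)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return {
        "W": sample,                                # omnidirectional
        "Y": sample * math.sin(az) * math.cos(el),  # left-right
        "Z": sample * math.sin(el),                 # up-down
        "X": sample * math.cos(az) * math.cos(el),  # front-back
    }


print([num_channels(n) for n in (1, 2, 3)])  # [4, 9, 16]
print(encode_first_order(1.0, 0.0, 0.0))     # a source straight ahead
```

A source straight ahead ends up only in W and X, while a source directly to the left would end up in W and Y.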

Second, Ambisonics is different from binaural and surround sound.

Traditional surround technologies are more immersive than simple two-channel stereo, but the principle behind them is the same: they all create an audio image by sending audio to a specific, pre-determined array of speakers. Stereo sends audio to two speakers; 5.1 surround to six; 7.1 to eight; and so on.

By contrast, Ambisonics does not send the audio signal to any particular number of speakers; it is “speaker-agnostic.” Instead, Ambisonics can be decoded to any speaker array. Ambisonic audio represents a full, uninterrupted sphere of sound, without being restricted by the limitations of any specific playback system.

Moreover, traditional surround formats can provide good imaging when static; but as the sound field rotates, the sound tends to ‘jump’ from one speaker to another. By contrast, Ambisonics can create a smooth, stable and continuous sphere of sound, even when the audio scene rotates (as, for example, when a gamer wearing a VR headset moves her head around). This is because Ambisonics is not pre-limited to any particular speaker array.
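One way to see why rotation is smooth: for a first-order signal, turning the whole scene around the vertical axis is just a small matrix operation on the X and Y channels, while W and Z stay untouched. The sketch below assumes azimuth is measured counter-clockwise and that the same gain convention is used for X and Y; `rotate_yaw` is a hypothetical helper name.

```python
import math


def rotate_yaw(w: float, x: float, y: float, z: float, angle_deg: float) -> tuple:
    """Rotate a first-order sample counter-clockwise around the vertical axis."""
    a = math.radians(angle_deg)
    x_rot = x * math.cos(a) - y * math.sin(a)
    y_rot = x * math.sin(a) + y * math.cos(a)
    # W (omnidirectional) and Z (height) are unaffected by a yaw rotation.
    return (w, x_rot, y_rot, z)


# A source straight ahead (energy in X only), with the scene turned by 90 degrees.
print(rotate_yaw(1.0, 1.0, 0.0, 0.0, 90.0))
```

To compensate for a listener turning their head by some angle, the scene would be rotated by the opposite angle before decoding. Any angle works, so the source can move continuously instead of jumping between speakers.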

Aim and subjects

The aim is to investigate the optimal codec parameters for immersive music, virtual reality gaming, cinematic and teleconference style presentations.

What will I do in my project?

I will do a listening test!

The listening test may include Google Cardboard: I will use it to bring VR into the test and assess spatial audio quality through subjective studies of timbre and localisation accuracy with different Ambisonic orders, compression rates, and channel mappings. The evaluation will consist of headphone listening using both generic head-related impulse responses (HRIRs) and measured, individualised HRIRs. The results of the listening tests will be valuable in ensuring optimal compression strategies for spatial audio quality on YouTube and other Google services.
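To get an idea of how quickly the number of test conditions grows, here is a minimal sketch that builds every combination of the factors above and shuffles them into a presentation order. The bit rates and channel mapping labels are placeholders that I have not decided on yet, and the function names are hypothetical.

```python
import itertools
import random

CONTEXTS = ["immersive music", "VR gaming", "cinematic", "teleconference"]
ORDERS = [1, 2, 3]                   # Ambisonic orders
BITRATES_KBPS = [64, 128, 256]       # placeholder bit rates
MAPPINGS = ["family_2", "family_3"]  # placeholder channel mapping labels
HRIR_SETS = ["generic", "individualised"]


def build_conditions() -> list:
    """Return every combination of the factors as a list of dictionaries."""
    keys = ("context", "order", "bitrate_kbps", "mapping", "hrir")
    combos = itertools.product(CONTEXTS, ORDERS, BITRATES_KBPS, MAPPINGS, HRIR_SETS)
    return [dict(zip(keys, combo)) for combo in combos]


def presentation_order(conditions: list, seed: int) -> list:
    """Return a shuffled copy, reproducible for a given participant seed."""
    rng = random.Random(seed)
    shuffled = list(conditions)
    rng.shuffle(shuffled)
    return shuffled


conditions = build_conditions()
print(len(conditions))  # 144
print(presentation_order(conditions, seed=1)[0])
```

A full grid like this is probably too long for one listener, so in practice I would expect to test only a subset of the combinations.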

Here is a video introducing the plastic version of Google Cardboard.

However, before starting the listening test, there are still several questions I need to think about. I may be able to answer them after this first blog post.

  • How do I combine Ambisonics with the different contexts?
  • How do I change the Ambisonic order?
  • How do I design the listening test?
  • Do I need to use higher-order Ambisonics?
  • How do I judge the audio quality?

Conclusion

I will evaluate Ambisonic audio quality and look for the optimal codec parameters in different contexts (immersive music, virtual reality gaming, cinematic, and teleconference-style presentations).


Reference list

[1] YouTube, n.d. YouTube Image. [image] Available at: <https://icons8.cn/icons/set/youtube> [Accessed 16 May 2020].

[2] Ietf.org. (2013). Ogg Encapsulation for the Opus Audio Codec. [online] Available at: https://tools.ietf.org/html/draft-terriberry-oggopus-01 [Accessed 20 May 2020].

[3] Dr. Matt Ternoway (n.d.). OPUS Image. [Accessed 14 May 2020].

[4] “Streaming VR for Immersion: Quality Aspects of Compressed Spatial Audio.” IEEE Xplore, 1 Oct. 2017, ieeexplore.ieee.org/abstract/document/8346301?casa_token=mz7iyI5MDfkAAAAA:ilw7036rwF3fYpvT1FNt2o3XyryuWPPhHDHH58YdNeB4Pb6PwEwudKr–SQow-HkyE8KvP-x-Q. Accessed 14 May 2020.

[5] Nachbar, Christian, et al. “AMBIX - A SUGGESTED AMBISONICS FORMAT.” 2011. ieeexplore.ieee.org/abstract/document/8346301?casa_token=mz7iyI5MDfkAAAAA:ilw7036rwF3fYpvT1FNt2o3XyryuWPPhHDHH58YdNeB4Pb6PwEwudKr–SQow-HkyE8KvP-x-Q. Accessed 14 May 2020.

[6] Dr. Franz Zotter (n.d.). Ambisonics Order Image. [Accessed 14 May 2020].
