
Future plan and ambisonics

In my previous blog posts, I gave an overview of my topic, introduced existing technology, shared some of the literature review I have done, and explored my listening test. In this final blog, I want to summarise my plan for the summer, and let you know more about my plans and goals.

If you are interested in my listening test, please don’t hesitate to contact me; I will be very grateful!

Project Plan

As shown in the Gantt chart in Figure 1, I will carry out my project in three parts: 1. Capture video and audio; 2. Design and conduct the listening test; 3. Write a report. Each part is divided into more detailed tasks. In the pre-listening test stage, I will first try the test with a small group of people, then summarise the errors found in the pre-test. Finally, I will redesign the listening test and then conduct the full listening test.


Supplementary material

Ambisonics

I know I have already mentioned ambisonics before, but today I am going to talk about ambisonics in more detail.

In a previous blog, I already compared surround sound and ambisonics. Ambisonics is designed to deliver a full sphere complete with elevation, where sounds are easily represented as coming from above and below as well as from in front of or behind the listener.

Now, let’s look at how Ambisonics represents an entire 360-degree sound field!

Let’s take a look at the most basic (and today the most widely used) Ambisonics format, the 4-channel B-format, also known as first-order Ambisonics B-format.

[1]



The four channels in first-order B-format are called W, X, Y and Z. One simplified and not entirely accurate way to describe these four channels is to say that each represents a different directionality in the 360-degree sphere: centre, front-back, left-right, and up-down[2].

A more accurate explanation is that each of these four channels represents (see the picture below), in mathematical language, a different spherical harmonic component – or, in language more familiar to audio engineers, a different microphone polar pattern pointing in a specific direction, with the four patterns being coincident (that is, conjoined at the centre point of the sphere)[3]:

[4]
  • W is an omni-directional polar pattern, containing all sounds in the sphere, coming from all directions at equal gain and phase.
  • X is a figure-8 bi-directional polar pattern pointing forward.
  • Y is a figure-8 bi-directional polar pattern pointing to the left.
  • Z is a figure-8 bi-directional polar pattern pointing up.

Within these four channels, we have all the information necessary to recreate a three-dimensional sound field completely. However, as I said before, having 4 channels doesn’t mean that each channel feeds its own speaker. We would need at least 4 speakers, but each speaker reproduces a combination of the four channels. That’s why we need a decoder to generate the signal that each speaker has to play back.
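To make the decoder idea more concrete, here is a minimal Python sketch of the simplest possible “sampling” decode for a horizontal ring of loudspeakers. This is only an illustration of the principle (every speaker feed is a mix of all the channels) under a traditional B-format convention – not YouTube’s or any production decoder – and it ignores the gain-normalisation details that real decoders care about:

    import numpy as np

    def decode_foa_horizontal(W, X, Y, speaker_azimuths_deg):
        """Naive "sampling" decode of the horizontal part of first-order
        B-format to a ring of loudspeakers. Each speaker feed mixes all
        channels, weighted by that speaker's direction."""
        feeds = []
        for az in speaker_azimuths_deg:
            theta = np.deg2rad(az)  # azimuth counter-clockwise from front
            feed = (W + X * np.cos(theta) + Y * np.sin(theta)) / len(speaker_azimuths_deg)
            feeds.append(feed)
        return feeds

For example, decode_foa_horizontal(W, X, Y, [45, 135, 225, 315]) produces four feeds for a square layout, and each feed really does contain a combination of W, X and Y.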

So, I’ve explained how ambisonics works once the sound field is already in ambisonics format – but what about placing a mono source in the ambisonics domain? In that case, we need an encoder; this uses mathematical formulas to add the necessary information from our mono source to each of the four channels. The amount of signal added to each channel always depends on the position of the mono source.

[5]
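To see what the encoder actually computes, here is a minimal Python sketch of first-order encoding using the traditional B-format formulas that match the polar patterns described above (the 1/√2 on W is the traditional convention; AmbiX/SN3D scales the channels differently). It is a textbook sketch, not any particular product’s encoder:

    import numpy as np

    def encode_mono_foa(signal, azimuth_deg, elevation_deg):
        """Encode a mono signal into traditional first-order B-format.
        Azimuth is counter-clockwise from the front; elevation is upward.
        The gains depend only on the source position, as described above."""
        az = np.deg2rad(azimuth_deg)
        el = np.deg2rad(elevation_deg)
        W = signal * (1.0 / np.sqrt(2.0))     # omni component
        X = signal * np.cos(az) * np.cos(el)  # front-back figure-8
        Y = signal * np.sin(az) * np.cos(el)  # left-right figure-8
        Z = signal * np.sin(el)               # up-down figure-8
        return W, X, Y, Z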

In previous posts, I mentioned ambisonic orders many times. Some people will ask: what do ambisonic orders mean? The four-channel example that I just explained is called 1st-order ambisonics, and it is the minimum order needed to obtain a 3D representation of the sound field. These four channels are the first four spherical harmonics.

But there are many more spherical harmonics that we can add. To increase the ambisonic order, a further layer of the pyramidal structure of harmonics must be added each time. Therefore, 2OA has 9 channels, 3OA has 16, etc. “But… why?” The higher the order, the more channels of information there are. Each channel contributes more information about the sound field, meaning that the encoding/decoding process becomes more precise.
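In other words, a full 3D scene of order N needs (N + 1)² channels, which is where 4, 9 and 16 come from. A one-line check in Python:

    def ambisonic_channels(order):
        # (order + 1) squared: order 1 -> 4, order 2 -> 9, order 3 -> 16
        return (order + 1) ** 2

    print([ambisonic_channels(n) for n in (1, 2, 3)])  # prints [4, 9, 16]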

Here is a video comparing different ambisonic orders:

Recording Ambisonics

An ambisonic recording microphone is built from four microphone capsules mounted closely together. These capsules have cardioid polar patterns, and the signals they record are usually referred to as “Ambisonics A-format.” The A-format is then transformed to B-format – the W, X, Y and Z channels – by a simple matrix[6].

[7]
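For the curious, the “simple matrix” really is just sums and differences of the four capsule signals. Here is a Python sketch assuming the classic SoundField-style capsule arrangement (front-left-up, front-right-down, back-left-down, back-right-up); note that real microphones also apply per-capsule correction filters, which this sketch omits:

    def a_to_b_format(flu, frd, bld, bru):
        """Convert tetrahedral A-format capsule signals to B-format.
        flu/frd/bld/bru = front-left-up, front-right-down,
        back-left-down, back-right-up capsule signals."""
        W = flu + frd + bld + bru          # omni: everything
        X = (flu + frd) - (bld + bru)      # front minus back
        Y = (flu + bld) - (frd + bru)      # left minus right
        Z = (flu + bru) - (frd + bld)      # up minus down
        return W, X, Y, Z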

Further reading

If you are entirely new to the topic, reading only the information given in this post might not be enough. Ambisonics is a huge topic, and HRTFs, which I have mentioned but not explained in depth, are another huge one. Therefore, here are some links and videos that might help you better understand spatial audio and ambisonics:

Reference list

[1] Ambisonics. (n.d.). [image] Available at: <http://www.matthiaskronlachner.com/> [Accessed 15 May 2020].

[2] Furness, R.K. (1990). Ambisonics – An Overview. [online] www.aes.org. Available at: <http://www.aes.org/e-lib/browse.cfm?elib=5417> [Accessed 16 May 2020].

[3] Ambisonics Four Channels. (n.d.). [image] Available at: <https://knightlab.northwestern.edu/assets/posts/capturing-the-soundfield/image_3.jpg> [Accessed 15 May 2020].

[4] Production Expert. (n.d.). Ambisonic Formats Explained – What Is The Difference Between A Format And B Format. [online] Available at: <https://www.pro-tools-expert.com/production-expert-1/2019/12/10/ambeo-and-ambisonic-formats-a-bluffers-guide> [Accessed 15 May 2020].

[5] Block diagram of the proposed Ambisonic decoder for irregular arrays. (n.d.). [image] Available at: <https://www.researchgate.net/profile/Yukio_Iwaya2/publication/268328126/figure/fig1/AS:295535442448389@1447472548932/Block-diagram-of-the-proposed-Ambisonic-decoder-for-irregular-arrays.png> [Accessed 15 May 2020].

[6] Creativefieldrecording.com. (2017). An Introduction to Ambisonics | Creative Field Recording. [online] Available at: <https://www.creativefieldrecording.com/2017/03/01/explorers-of-ambisonics-introduction/> [Accessed 19 May 2020].

[7] Sound field Microphone – Tetrahedral Microphone Array. (n.d.). [image] Available at: <https://www.creativefieldrecording.com/wp-content/uploads/> [Accessed 15 May 2020].


Explore existing technology

Since the last post, I’ve been immersed in the first stage of the project: research and literature review. Spatial audio is not a new topic, so I wanted to analyse the methods that already exist. Over the past few days, I’ve been reviewing papers and websites.

Let’s have a look at what I found:

Technology

I found several existing encoder technologies that can help you combine ambisonics with 360 videos!

Works Plugin + G’Audio Works Encoder (G’Audio)[1]:

  • Spatial audio input formats: GA5, FOA, head-locked stereo
  • Output destinations: YouTube, Facebook, G’Player for Gear VR

As for software, G’Audio Lab has produced a free and very intuitive spatial plugin called “Works” that integrates seamlessly with 360 videos within Pro Tools.

[2]

G’Audio Works lets you place objects and ambisonic tracks directly on a QuickTime video to easily synchronize the locations of sounds.
The audio tracks in your Pro Tools session appear in Works with object-based controls such as azimuth (lateral position), elevation (height position) and distance. When you move the colourful dots on the screen, the positioning parameters are taken care of in a single movement.


FB360 Encoder [3]:

  • Spatial audio input formats: FOA, 2OA, head-locked stereo
  • Output destinations: YouTube, Facebook, Oculus Video

Here is an introduction video for the 360 Spatial Workstation

On the official Facebook 360 Spatial Workstation website[3], they introduce it as follows: “The 360 Spatial Workstation is a software suite for designing spatial audio for 360 video and cinematic VR. It includes plugins for popular audio workstations, a time synchronized 360 video player and utilities to help design and publish spatial audio in a variety of formats. Audio produced with the tools can be experienced on Facebook News Feed on Android and iOS devices, Chrome for desktop and the Samsung Gear VR headset through headphones.”

Here is a video on using the 360 Spatial Workstation with Reaper:


Spatial Media Metadata Injector (YouTube)[4]

  • Spatial audio input formats: FOA
  • Output destinations: YouTube

Here are instructions on how to use spatial audio in 360-degree and VR videos on YouTube: https://support.google.com/youtube/answer/6395969?hl=en

We can choose Reaper as the DAW tool for the YouTube platform, to combine 360-degree/VR videos with ambisonics.

Here is a good example teaching you how to upload videos with spatial audio to YouTube:

The tools for uploading


360° Ambisonics Tools in Waves[5]:

The Waves B360 plugin can help us convert stereo and surround to Ambisonics B-format, mix B-format audio, and monitor it in high fidelity on our regular stereo headphones. It has mono, stereo, 5.1 and 7.1 components that encode the input into B-format, with controls that allow people to position (pan) each element in the sound field.

The Waves company has also created:

Nx Virtual Mix Room: Use this plugin to monitor your B-format mix on any pair of standard stereo headphones, with professional audio quality that doesn’t color your sound. Simply insert the plugin’s Nx Ambisonics component on the bus you wish to monitor, and send it to your regular headphones.

Nx Head Tracker: This small Bluetooth device enhances the 360° realism of the Nx plugin by tracking your head movements with full precision.

Here is a video introducing the Waves plugins:

Media requirements

Since my project mostly targets YouTube, I looked up the minimum requirements for spatial audio on YouTube.

The website[4] states:

  • “Metadata is added to your file.
  • Only one audio track is used.
    • Multiple audio tracks, such as tracks with spatial and stereo/mono audio in the same file, are not supported
  • Spatial audio uses Ambisonics (AmbiX) format:
    • ACN channel ordering
    • SN3D normalization
  • Supported First Order Ambisonics (FOA) formats:
    • W, Y, Z, X as a 4-channel audio track in your uploaded file, sample rate: 48 kHz
    • PCM encoded audio in a MOV container:
    • AAC encoded audio in a MP4/MOV container:
      • Min. bitrate: 256 kbps
    • OPUS encoded audio in an MP4 container:
      • Channel mapping family: 2
      • Min. bitrate 512 kbps
  • Supported First Order Ambisonics (FOA) with Head-Locked Stereo format:
    • W, Y, Z, X, L, R as a 6-channel audio track in your uploaded file, sample rate: 48 kHz
    • PCM encoded audio in a MOV container:
      • Sample rate: 48 kHz
    • OPUS encoded audio in an MP4 container:
      • Min. bitrate 768 kbps
      • Channel mapping family: 2.”
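As a small sanity check before uploading, here is a short Python sketch (using the soundfile library; the file name is a placeholder) that verifies a rendered file has the 4 channels and 48 kHz sample rate that the FOA requirement above asks for:

    import soundfile as sf

    info = sf.info("ambix_mix.wav")  # placeholder name for your rendered mix
    assert info.channels == 4, "FOA needs a 4-channel (W, Y, Z, X) track"
    assert info.samplerate == 48000, "YouTube expects 48 kHz"
    print(info.channels, "channels at", info.samplerate, "Hz - looks FOA-ready")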

Reading

I read ‘Auditory Spatial Perception: Auditory Localization’[6] this week. This book gave me an overall understanding of the development of localisation research, and laid the foundation for the listening test I will do later.

Elements that may affect a localisation listening test:

1. Monaural spectral cues

2. Head movement

3. Visual effects during the test

4. Sound onset / precedence effect

5. Other factors: hearing loss of the test subject, age (the mixed effects of age-related hearing loss), and gender (this book says that it is more difficult for women to locate a sound source in a noisy environment)

I also read ‘Auditory Localisation in Low-Bitrate Compressed Ambisonic Scenes’[7]. In the paper, the authors show that localisation depends to some degree on the OPUS bit-rate: localisation accuracy declines under low-bit-rate compression. Moreover, localisation accuracy also depends on the ambisonic order, because a higher order gives more accurate localisation resolution.

This connects with the media requirements above, which raise questions about higher-order ambisonics.

After this reading, we can understand that higher-order ambisonics can give more accurate localisation.

And my Approach?

I’m also going to use some of these systems and encoders to control spatial audio, but my approach will be slightly different. In my method, I am going to explore localisation and audio quality for different ambisonic orders in different contexts (such as VR gaming and immersive music).

This week, I will meet my supervisor to talk about how to decode the audio and combine it with different contexts. Next week, I’m going to write a post sharing my thoughts on the listening test. If you have any questions, do not hesitate to comment or email me!

Conclusion:

After this extensive review (not just what I posted here), I realised that spatial audio is widely researched by lots of companies and university research groups. Currently, the YouTube platform only uses first-order ambisonics. But the research of Rudzki et al.[7] has established the relationship between ambisonic order, OPUS bit-rate, and localisation accuracy. In the coming weeks, I may consider combining different ambisonic orders with videos on different topics (VR games, VR presentations…).

Reference list

[1] Gaudio Lab. (n.d.). WORKS Plugin. [online] Available at: <https://gaudiolab.com/tech-works-plugin/> [Accessed 16 May 2020].

[2] Gaudio Lab. (n.d.). Works spatial plugin. Available at: <https://gaudiolab.com/> [Accessed 6 May 2020].

[3] Fb.com. (2019). Spatial Workstation. [online] Available at: <https://facebook360.fb.com/spatial-workstation/>.

[4] support.google.com. (n.d.). Use spatial audio in 360-degree and VR videos – YouTube Help. [online] Available at: <https://support.google.com/youtube/answer/6395969?hl=en> [Accessed 16 May 2020].

[5] waves.com. (2020). 360° Ambisonics Tools. [online] Available at: <https://www.waves.com/hardware/360-ambisonics-tools> [Accessed 16 May 2020].

[6] Letowski, T.R. and Letowski, S.T. (2012). Auditory Spatial Perception: Auditory Localization. [online] Available at: <https://apps.dtic.mil/docs/citations/ADA563540> [Accessed 12 May 2020].

[7] Rudzki, T., et al. (2019). Auditory Localization in Low-Bitrate Compressed Ambisonic Scenes. Applied Sciences, 9(13), p. 2618. doi:10.3390/app9132618 [Accessed 12 May 2020].


Project Overview

This project is my final Master’s project for the MSc in Audio and Music Technology at the University of York, with Google as industry partner. I started thinking about it approximately one month ago, when I was assigned the topic. Now, however, I’m starting to research full-time, and I will be working on it until the end of August.

If I tell you that the formal title is “Analysis of Context-Dependent OPUS Compression for Ambisonics”, your reply will probably be something like: “Sounds cool, but… what does it mean?” Let’s start from the beginning then:


Background

“Analysis of Context Dependent

In my project, I will analyse spatial audio quality in a variety of different media contexts (immersive music, virtual reality gaming, cinematic content, and teleconference-style presentations).

[1]

YouTube’s spatial audio format only supports first-order ambisonics (no head-locked stereo), so what if we use third-order ambisonics with low-bit-rate OPUS? Could it give the audience a different experience?

OPUS

Opus[2] is a lossy audio coding format developed by the Xiph.Org Foundation and later standardized by the Internet Engineering Task Force (IETF).

[3]

The goal was to replace Speex and Vorbis with a single format that covers both speech and general audio and is suitable for low-latency, real-time sound transmission over the network. The standard format is defined in RFC 6716. Opus is an open format, royalty-free and without restrictions on its use.

Opus integrates two audio coding technologies: the speech-oriented SILK and the low-latency CELT. Opus can seamlessly adjust between high and low bit rates. Inside the encoder, it uses linear predictive coding at lower bit rates and transform coding at higher bit rates (a combination of the two is used around the crossover between high and low bit rates). Opus has a very low algorithmic delay (22.5 ms by default), which makes it very suitable for coding low-latency voice calls, such as real-time voice streaming on the network, real-time synchronized voice narration, etc.

Compression

Compression[4], or “data compression,” is used to reduce the size of one or more files. When a file is compressed, it takes up less disk space than an uncompressed version and can be transferred to other systems more quickly.

for Ambisonics”

Ambisonics[5] is a method for recording, mixing and playing back three-dimensional, 360-degree audio. It was invented in the 1970s but was never commercially adopted until recently, with the development of the VR industry, which requires 360° audio solutions.

[6]

Why do I want to research ambisonics?

First, the development of virtual reality. The most popular ambisonics format today, widely used in VR and 360 video, is a 4-channel format called Ambisonics B-format, which uses as few as four channels (more on which below) to reproduce a complete sphere of sound.

Second, ambisonics is different from binaural and surround sound.

Traditional surround technologies are more immersive than simple two-channel stereo, but the principle behind them is the same: they all create an audio image by sending audio to a specific, pre-determined array of speakers. Stereo sends audio to two speakers; 5.1 surround to six; 7.1 to eight; and so on.

By contrast, Ambisonics does not send the audio signal to any particular number of speakers; it is “speaker-agnostic.” Instead, Ambisonics can be decoded to any speaker array (more on which below). Ambisonic audio represents a full, uninterrupted sphere of sound, without being restricted by the limitations of any specific playback system.

Moreover, traditional surround formats can provide good imaging when static; but as the sound field rotates, the sound tends to ‘jump’ from one speaker to another. By contrast, Ambisonics can create a smooth, stable and continuous sphere of sound, even when the audio scene rotates (as, for example, when a gamer wearing a VR headset moves her head around). This is because Ambisonics is not pre-limited to any particular speaker array.

Aim and subjects

The aim is to investigate the optimal codec parameters for immersive music, virtual reality gaming, cinematic content and teleconference-style presentations.

What will I do in my project?

I will do a listening test!

The listening test may involve Google Cardboard: I will use the Cardboard to bring in VR and test spatial audio quality through subjective timbral and localisation accuracy studies with different ambisonic orders, compression rates, and channel mappings. The evaluation will consist of headphone listening using both generic head-related impulse responses (HRIRs) and measured, individualised HRIRs. The results of the listening tests will be valuable in ensuring optimal compression strategies for spatial audio quality on YouTube and Google services.
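To give an idea of how the headphone rendering could work, here is a minimal Python sketch of the usual virtual-loudspeaker approach: decode the (horizontal) first-order scene to a few virtual speakers, then convolve each feed with the HRIR pair measured for that direction. The speaker layout and HRIR arrays are placeholders, not my actual test setup:

    import numpy as np
    from scipy.signal import fftconvolve

    def foa_to_binaural(W, X, Y, speaker_az_deg, hrirs_left, hrirs_right):
        """Virtual-loudspeaker binaural rendering of a horizontal FOA scene.
        hrirs_left/right hold one impulse response per virtual speaker,
        taken from a generic or individualised HRTF set."""
        n = len(speaker_az_deg)
        left, right = 0.0, 0.0
        for az, h_l, h_r in zip(speaker_az_deg, hrirs_left, hrirs_right):
            theta = np.deg2rad(az)
            feed = (W + X * np.cos(theta) + Y * np.sin(theta)) / n
            left = left + fftconvolve(feed, h_l)    # left-ear contribution
            right = right + fftconvolve(feed, h_r)  # right-ear contribution
        return left, right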

Here is a video introducing Google Cardboard Plastic.

However, before starting a listening test I still have several questions to think about. In the blogs after this first one, I may answer them:

  • How to combine ambisonics with the different contexts?
  • How to change the ambisonic order?
  • How to design the listening test?
  • Do I need to use higher-order ambisonics?
  • How to judge the audio quality?

Conclusion

I will evaluate ambisonic audio quality and find the optimal codec parameters for different contexts (immersive music, virtual reality gaming, cinematic content, and teleconference-style presentations).


Reference list

[1] YouTube. (n.d.). YouTube Image. [image] Available at: <https://icons8.cn/icons/set/youtube> [Accessed 16 May 2020].

[2] Ietf.org. (2013). Ogg Encapsulation for the Opus Audio Codec. [online] Available at: <https://tools.ietf.org/html/draft-terriberry-oggopus-01> [Accessed 20 May 2020].

[3] Dr. Matt Ternoway (n.d.). OPUS Image. [Accessed 14 May 2020].

[4] Streaming VR for Immersion: Quality Aspects of Compressed Spatial Audio. (2017). [online] IEEE Xplore. Available at: <https://ieeexplore.ieee.org/abstract/document/8346301> [Accessed 14 May 2020].

[5] Nachbar, C., et al. (2011). AmbiX – A Suggested Ambisonics Format. [Accessed 14 May 2020].

[6] Dr. Franz Zotter (n.d.). Ambisonics Order Image. [Accessed 14 May 2020].

Reverb: Room Impulse Response

Hi there! I also created another plugin this summer. Previously, I measured a room impulse response (RIR) with my classmate while studying for my Master’s degree, and I also convolved the RIR with audio in MATLAB.
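For readers who want to try this themselves, here is a minimal Python equivalent of that offline MATLAB convolution. The file names are placeholders, and it assumes mono WAV files at the same sample rate:

    import numpy as np
    import soundfile as sf
    from scipy.signal import fftconvolve

    dry, fs = sf.read("dry_audio.wav")                  # mono source recording
    rir, fs_rir = sf.read("room_impulse_response.wav")  # measured mono RIR
    assert fs == fs_rir, "resample first if the sample rates differ"

    wet = fftconvolve(dry, rir)    # convolution stamps the room onto the audio
    wet /= np.max(np.abs(wet))     # normalise to avoid clipping
    sf.write("reverberant_audio.wav", wet, fs)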

In this project, the RIR is used in a plugin built on the JUCE platform. The purpose of this project is to create a reverb effect.

As shown in the graph below:

Listening Test and work process

In the previous posts, I mentioned that I need to design a listening test for evaluating spatial audio quality, and that some elements may affect localisation in the listening test. Today, I am going to introduce some technologies that I may use in my listening test, and my work process so far.

Several technologies that may be used in the listening test

OptiTrack:

OptiTrack[1] creates software called Motive that provides an optical motion tracking system, enabling tracking of multiple people in much larger spaces for a collaborative VR experience. I would actually have used this software in my listening test to measure localisation, if Covid-19 had not appeared…

[2]

Max:

Max can create multiple channels for ambisonics. The listening test software for loudspeaker presentation can be created using the visual audio programming environment Max. Here is a practice video:

Reaper:

Reaper is a DAW (digital audio workstation) application for computers; it can act as the audio engine behind the listening test software.

It can also encode ambisonics through plugins, such as the ATK, or monitoring VSTs combined with the ambix_rotator_o1 VST…

[3] The User interface of Ambix_rotator
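To show what a rotator like this does under the hood, here is a short Python sketch of a first-order yaw rotation (rotation around the vertical axis, the main operation behind head-tracked playback). It illustrates the maths only, not the plugin’s actual implementation, and sign conventions differ between tools:

    import numpy as np

    def rotate_foa_yaw(W, X, Y, Z, yaw_deg):
        """Rotate a first-order B-format scene around the vertical axis.
        W (omni) and Z (up-down) are unchanged; X and Y rotate like a
        2D vector, moving a source at azimuth t to azimuth t + yaw."""
        a = np.deg2rad(yaw_deg)
        X_rot = X * np.cos(a) - Y * np.sin(a)
        Y_rot = X * np.sin(a) + Y * np.cos(a)
        return W, X_rot, Y_rot, Z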

If you are interested in ambisonics mixing in Reaper, you can click here.

VR Videos:

Considering the listening test’s purpose (evaluating audio quality in different contexts), I need to find some VR videos. In the beginning, I considered creating VR videos myself. However, I realised that is impossible because the project only has 11 weeks. I have to manage my time properly; it is pointless to waste time in unfamiliar fields, so I will put more time into the audio parts.
I will talk with my supervisor; maybe the industry partner can offer some videos. Moreover, I have already found some VR videos and contacted their authors. I hope I can get the videos to use soon.

Special situation

Unfortunately, this year is very special due to Covid-19. I may have to complete my final project at home, and a lot of tools (like the Neumann KU100) cannot be used… So, I may use Reaper and send testers my project material (ambisonics combined with video).

Reference list

[1] OptiTrack. (2019). Motion Capture Systems. [online] Available at: <https://optitrack.com/> [Accessed 16 May 2020].

[2] OptiTrack. (n.d.). Motive software. Available at: <https://www.optitrack.com/software/> [Accessed 16 May 2020].

[3] Matthiaskronlachner.com. (n.d.). ambiX v0.2.10 – Ambisonic plug-in suite. Available at: <http://www.matthiaskronlachner.com/?p=2015> [Accessed 16 May 2020].
