Mobile Audio Processing and Memory Bus Bandwidth and Load

Gabor Szanto

A good way to understand the fundamental performance constraints on mobile is to reframe the mobile SoC hardware as a set of restaurants that coordinate their work to cook food (perform computing tasks) from raw ingredients (video and audio data).

It’s simple, we promise.

The CPU makes spaghetti.

The GPU makes cupcakes.

The audio chip makes lemonade.

The RAM is the storehouse for raw ingredients.

And the memory bus connects them all together.

NOTE: On the memory bus, the ingredients are transported between the CPU, GPU, audio chip and RAM on moving walkways rather than on roads with cars.

These moving walkways never turn off and never waver in speed, which, as we’ll see below, has ramifications for the quest to reach the all-important 60 FPS.

Recipes are Algorithms

In order to produce finished dishes, the CPU restaurant has to send multiple data requests in a strictly specified sequence. These sequences are recipes (algorithms).

For the CPU to make spaghetti, it asks RAM for tomatoes first, then pasta, then salt. Once they have been delivered, it combines those three and then sends one last request to RAM for a delivery of hot water. The CPU will always make spaghetti in that sequence, and as it waits for the ingredients to arrive in the specified order, it will kick off other orders in parallel too.

The CPU won’t bake cupcakes from start to finish, though. If the CPU gets an order for cupcakes, it starts putting the ingredients together and sends the wet dough to the GPU; the GPU finishes baking the cupcake and sends it straight to the customer (the device’s screen).

Both the CPU and the GPU are fast and efficient at combining ingredients and producing spaghetti and cupcakes...but only after the memory bus has delivered the ingredients.

Von Neumann Bottlenecking Computing Capacity

There are three fundamental problems with the moving walkways that slow down spaghetti and cupcake production.

1. The moving walkway speed is too low. We know this is the case because the CPU and GPU can manufacture spaghetti and cupcakes faster than the moving walkway can deliver the ingredients. Often, the CPU and the GPU wait unoccupied for ingredients to arrive instead of actually cooking any food. The unoccupied time is wasted clock cycles in which work isn’t, but could be, completed.

2. The moving walkway could be wider. Okay, so the speed is fixed but if there were more lanes or wider lanes, even at the same speed, the moving walkway capacity would increase and there would be less idle time where the CPU and GPU don’t do anything.

3. Inefficient use of slots. The walkway carries ingredients in a limited number of slots, and a lot of these slots aren’t carrying ingredients for either spaghetti or cupcakes, but instead carry the ingredients for lemonade.

Remember: the lemonade is the audio.

But why would lemonade be a wasteful, inefficient use of capacity?

After all, audio is paramount to media, game, app and VR user experiences.

Because the way the sugar, water and lemons are combined to make lemonade isn’t actually the optimal way to combine them. The current method has too many redundant steps, and our audio-lemonade (sugar, water and lemons) is trivial to make once it arrives at the CPU restaurant.

Lastly, the volume of audio data (slots full of sugar, water and lemons on the moving walkway) slows the production of everything else, the stuff that most people come for, i.e. visuals, video and graphics.

Few developers even consider CPU load, and most don’t realize that the real processing and computing bottlenecks are not at the CPU but at the memory bus.

[Expert readers will recognize that a generalized version of this problem is known as the Von Neumann Bottleneck.]
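To put rough numbers on the walkway picture, here is a minimal sketch of our own, not a measurement of any real device. It assumes walkway capacity is simply speed times width, and that ingredient delivery and cooking overlap perfectly, so a task takes as long as the slower of the two. The function names and every figure are hypothetical.

```python
def bus_bandwidth(transfers_per_second, bytes_per_transfer):
    """Walkway capacity: speed (transfers/s) times width (bytes per transfer)."""
    return transfers_per_second * bytes_per_transfer


def idle_fraction(bytes_needed, bandwidth, operations, operations_per_second):
    """Fraction of a task's duration the processor spends waiting for data.

    Simplifying assumption: delivery and computation overlap perfectly, so the
    task lasts as long as the slower of the two.
    """
    delivery_time = bytes_needed / bandwidth
    compute_time = operations / operations_per_second
    total_time = max(delivery_time, compute_time)
    return (total_time - compute_time) / total_time


# Hypothetical figures, chosen only to keep the arithmetic easy to follow.
bandwidth = bus_bandwidth(transfers_per_second=1_000_000_000, bytes_per_transfer=8)
waiting = idle_fraction(
    bytes_needed=64_000_000,
    bandwidth=bandwidth,
    operations=8_000_000,
    operations_per_second=2_000_000_000,
)
print(f"bandwidth: {bandwidth / 1e9:.1f} GB/s, idle: {waiting:.0%}")
```

In this toy model the processor only stops idling when fewer bytes have to travel or the walkway gets faster or wider, and the rest of this article assumes the walkway is fixed.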

How the Memory Bus Bandwidth, not the CPU, Limits Mobile App Performance

As we pointed out in How 3D Audio Bottlenecks VR Video, inefficient audio processing noticeably reduces the performance of 3D graphics engines, reducing framerate, increasing device temperature and making for a frustrating experience for developers and consumers alike.

Again, to boost spaghetti and cupcake production capacity, we cannot add lanes to the moving walkway and we cannot make the memory bus go faster. Since we cannot fight physics, those two are fixed. So what can we do?

Better Hardware vs Better Algorithms

The way the human genome was sequenced provides an answer. We didn’t get there with faster hardware; rather, pioneers discovered new algorithms and a technique called shotgun sequencing that catapulted Craig Venter to the forefront of genomics.

Closer to home, John Carmack did the same thing with first-person shooters when he figured out how to render 3D spaces efficiently; his algorithms and software techniques -- not improved hardware -- underpin most modern game design.

Better audio software technology, the type necessary for the mobile era, can cut through Gordian Knots by providing radically better, more efficient algorithms for audio processing. This creates enormous, system-wide benefits such as:

1. More performance per watt, by boosting production of spaghetti and cupcakes. Because better lemonade recipes take fewer slots for carrying water, sugar and lemons, those slots can and will be used to deliver flour, water and eggs faster.

2. Less waiting around idly by the CPU and GPU for ingredients to be delivered.

Got that? The better the audio tech gets, the better the actual device performs.

Why? Again, because the fundamental bottleneck on computing capacity isn’t the CPU, it’s the memory bus load. So let’s free up capacity with better technology. More capacity on the moving walkway means that the CPU and GPU get their data faster, and can process faster. More work, same number of clock cycles.

In other words: to avoid dropping frames, developers need to remember to optimize for audio too, to achieve better quality and faster image and video processing.
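The frame-dropping argument can be turned into a back-of-the-envelope budget. The sketch below is only an illustration under stated assumptions: the bus has a fixed capacity in bytes per second, audio processing consumes a known share of it, and whatever remains is available to graphics. The function names and all traffic figures are hypothetical.

```python
def frame_budget_seconds(frames_per_second):
    """Time available to produce one frame."""
    return 1.0 / frames_per_second


def graphics_bytes_per_frame(bus_bytes_per_second, audio_bytes_per_second,
                             frames_per_second=60):
    """Bus capacity left for graphics in one frame after audio takes its share."""
    free_bytes_per_second = max(0, bus_bytes_per_second - audio_bytes_per_second)
    return free_bytes_per_second * frame_budget_seconds(frames_per_second)


# Hypothetical numbers: an 8 GB/s bus, before and after halving the memory
# traffic caused by audio processing (400 MB/s down to 200 MB/s).
before = graphics_bytes_per_frame(8_000_000_000, 400_000_000)
after = graphics_bytes_per_frame(8_000_000_000, 200_000_000)
print(f"frame budget: {frame_budget_seconds(60) * 1000:.2f} ms")
print(f"extra bytes per frame for graphics: {after - before:,.0f}")
```

At 60 FPS each frame has roughly 16.67 ms, so every byte per second that audio stops moving translates directly into more bytes the graphics pipeline can move inside that window.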

More Efficient Audio Increased FPS

The Crossfader case study of the Superpowered Audio SDK demonstrates that Superpowered's audio engine helped significantly improve the Crossfader app's audio performance AND the graphical framerate.

Namely, replacing Apple’s Core Audio with Superpowered Audio increased the framerate of Core Animation/Quartz/UIKit.

Superpowered technology uses fewer CPU clock cycles than other audio DSP solutions (better recipes) and therefore also requires fewer memory accesses and transfers (fewer slots on the moving walkway). The result is higher performance and more memory bus bandwidth available for functions like texture transfers, which happen quite often in Core Animation.
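To see how a better recipe can mean fewer slots on the walkway, consider a generic illustration. This is not Superpowered's implementation; it is a sketch of one well-known idea, fusing several passes over an audio buffer into one. The functions `mix_in_passes` and `mix_fused` are hypothetical, and the traffic counter simply counts sample reads and writes as a stand-in for memory transfers, ignoring caches.

```python
def clip(x):
    """Limit a sample to the range [-1.0, 1.0]."""
    return max(-1.0, min(1.0, x))


def mix_in_passes(a, b, gain):
    """Three separate passes; each one reads and writes whole buffers."""
    n = len(a)  # assumes a and b have the same length
    traffic = 0
    scaled = [x * gain for x in a]              # pass 1: read a, write scaled
    traffic += 2 * n
    mixed = [x + y for x, y in zip(scaled, b)]  # pass 2: read scaled and b, write mixed
    traffic += 3 * n
    out = [clip(x) for x in mixed]              # pass 3: read mixed, write out
    traffic += 2 * n
    return out, traffic


def mix_fused(a, b, gain):
    """One pass: each input sample is read once, each output written once."""
    out = [clip(x * gain + y) for x, y in zip(a, b)]
    traffic = 3 * len(a)                        # read a, read b, write out
    return out, traffic


a = [0.5, -0.25, 0.9]
b = [0.1, 0.2, 0.8]
slow_out, slow_traffic = mix_in_passes(a, b, gain=2.0)
fast_out, fast_traffic = mix_fused(a, b, gain=2.0)
print(slow_out == fast_out, slow_traffic, fast_traffic)
```

Both versions produce the same samples, but the fused version touches each buffer once instead of shuttling intermediate results back and forth. On real hardware the saving depends on buffer sizes and cache behavior, so treat the counts as a rough proxy.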

In the VR world, right now, serious teams with considerable technical chops are devoting substantial resources to solving these and related challenges: teams like the folks at JauntVR, Google Cardboard, High Fidelity and Dolby.

In the near future, as more developers come to this new medium, it is patently clear that the average developer cannot and will not spend even 1/1,000,000th of the time, money and attention on VR audio as we are seeing now.

Most VR demos, games and other applications today are created by high-profile companies with great audio teams. Those teams have the resources to deal with the bottlenecks of 3D audio. For example, if there are too many spatialized sounds (virtual speakers) taxing the CPU, they optimize the entire scene.
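What might "optimizing the entire scene" look like? One plausible approach, sketched here as an assumption and not as a description of what any of these teams actually do, is to give full 3D treatment only to the sounds the listener is most likely to notice. The `Source` type, the level estimate (gain divided by distance, with distances under one metre treated as one metre) and the budget of two spatialized sources are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    position: tuple  # (x, y, z) in metres
    gain: float      # linear gain at 1 m


def perceived_level(source, listener):
    """Rough level estimate: gain falls off with distance from the listener."""
    distance = max(1.0, math.dist(source.position, listener))
    return source.gain / distance


def split_by_budget(sources, listener, max_spatialized):
    """Return (sources to spatialize, sources to mix the cheap way)."""
    ranked = sorted(sources, key=lambda s: perceived_level(s, listener), reverse=True)
    return ranked[:max_spatialized], ranked[max_spatialized:]


listener = (0.0, 0.0, 0.0)
scene = [
    Source("voice", (1.0, 0.0, 0.0), 1.0),
    Source("engine", (0.0, 10.0, 0.0), 1.0),
    Source("music", (2.0, 0.0, 0.0), 0.5),
    Source("birds", (0.0, 0.0, 20.0), 0.2),
]
spatialized, cheap = split_by_budget(scene, listener, max_spatialized=2)
print([s.name for s in spatialized], [s.name for s in cheap])
```

Tuning a budget like this by hand for every scene is exactly the kind of effort a well-funded audio team can afford and the average developer cannot.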

The average VR developer of the future will have neither the resources nor the will to make these efforts. Developers want a switch that magically transforms the current game sound in Unity or Unreal or any other game engine into an immersive 3D environment. In other words, developers want “one click” 3D audio that just works.

As the market starts to mature, there needs to be a way to make audio invisible in terms of CPU load (and memory bus load).

As we’ve noted elsewhere on our site:

“Using Superpowered is like taking a VW Bug, adding magic technology to it, which would make it accelerate like a Porsche and get the fuel efficiency of a Prius at the same time.

Now the physics of automobiles won’t actually allow you to do that, but the physics of digital signal processing do.”

For Crossfader, this increase in performance was significant and resulted in more positive user reviews, with mention of "fluid 60fps" and "the iOS device no longer runs hot".

So if "squeezing performance out of your Unity Gear VR Game" is important to you, you should squeeze performance out of your audio as well, using Superpowered.

Readers interested in audio processing performance and optimizing for memory bus load should also read How 3D Spatialized Audio Bottlenecks Virtual Reality Video, 3D Audio HRTF Processing Memory Bus Calculation, and The 1% Rule for Mobile App Power Consumption.

  • memory bus bandwidth
  • memory bus load
  • iOS
  • Android
  • spatialized audio