Riding The Latest Waves On The Web

Patrick Vlaskovits

The WebAudio API has been a game-changer for online audio processing and synthesis since its introduction. With the technology continually evolving, recent developments have made significant strides in enhancing and simplifying the way we create and manipulate audio on the web.

In this blog post, we thought it might be a good time to delve into some of the latest advancements in WebAudio, exploring how they're revolutionising the digital soundscape and what it means for developers of audio applications on the web.

AudioWorklets: Unravelling The Threads

One of the most notable recent improvements in WebAudio is the significant reduction in latency. Through the use of AudioWorklets, developers can now run custom DSP directly within a dedicated audio rendering thread with PCM sample access. This allows for better performance and precision when processing audio signals, ultimately leading to a more seamless and immersive audio experience for users.

AudioWorklet processing is isolated from the main JavaScript execution context, with a MessagePort or SharedArrayBuffers (more on those later) available to exchange data and commands between the threads. This isolation is critical to ensuring high-performance audio processing without any noticeable glitches or dropouts.

This glitch-free, low-latency approach swings open the doors for the web to become a viable and stable platform for more advanced gaming, audio processing pipelines, online DAWs and online collaboration tools. With a fixed frame size of 128 samples (around 3ms at 44.1kHz) across the board, the processing latency allows for very low round-trip times, which make conversations more natural, games more responsive, live performance recordings tighter and synths much more playable.
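If you haven't written one before, here's a minimal, vanilla (non-Superpowered) sketch of the moving parts: a processor class registered on the audio rendering thread, and an AudioWorkletNode created on the main thread with a MessagePort between the two. The file and processor names are just placeholders.


// gain-processor.js — runs on the dedicated audio rendering thread
class GainProcessor extends AudioWorkletProcessor {
    process(inputs, outputs) {
        const input = inputs[0];
        const output = outputs[0];
        // Each channel arrives as a Float32Array of 128 samples (the render quantum).
        for (let channel = 0; channel < input.length; channel++) {
            for (let sample = 0; sample < input[channel].length; sample++) {
                output[channel][sample] = input[channel][sample] * 0.5; // simple -6dB gain
            }
        }
        return true; // keep the processor alive
    }
}
registerProcessor("gain-processor", GainProcessor);

// main.js — main thread
async function setupAudio() {
    const context = new AudioContext();
    await context.audioWorklet.addModule("gain-processor.js");
    const gainNode = new AudioWorkletNode(context, "gain-processor");
    gainNode.port.postMessage({ hello: "audio thread" }); // MessagePort communication between the threads
    return gainNode;
}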

After a few years of API stabilisation and waiting patiently for the vendors’ release cycles, AudioWorklets are now supported across all major browsers and devices, achieving near-identical real-world performance.

You might want to take a look at https://caniuse.com/?search=audioworklet to check specific versions, but generally speaking it currently reports around 94% coverage of users, which is pretty solid in our opinion.

We don’t want to dwell too much on the past, but we need to mention the glitchy elephant that used to be in everyone’s web audio code: the ScriptProcessorNode. Prior to universal AudioWorklet support, developers (us included) relied on WebAudio’s ScriptProcessorNode for any kind of custom audio processing. For a long time, this was the only way to get custom DSP working on Safari.

The ScriptProcessorNode had severe performance limitations, such as running audio processing on the main JavaScript thread and requiring relatively large buffer sizes, which could (and almost always would) cause audio glitches and latency issues. Not fun. Safe to say, if you see ScriptProcessorNodes being used in any web audio libraries in 2023, it’s time to say goodbye we’re afraid; they just won’t cut the mustard these days.

As soon as full AudioWorklet support got the much-anticipated green light in Safari, Superpowered moved from using ScriptProcessorNodes over to AudioWorklets.

AudioWorklets And WebAssembly: A Dreamy Match Made In Heaven

Although WebAssembly itself is not particularly recent, universal support for AudioWorklets allows us to reliably lean on it for audio processing in the browser, with outstanding results. You’ll now find WebAssembly in use across much of the audio on the web, with a spectacular example being Superpowered, of course.

WebAssembly allows developers to port C, C++ and Rust audio code to the web with relative ease using tools like Emscripten, achieving very close to native performance. Look under the hood of the main players in online audio applications at the moment and you’re likely to find the WebAudio API audio graph in conjunction with WebAssembly DSP code, running via AudioWorklets.
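As a rough illustration of that pattern (not Superpowered’s internals, just a hand-rolled sketch with placeholder names), the compiled module’s bytes can be fetched on the main thread and handed to the worklet, where synchronous compilation and instantiation are acceptable because we’re already off the main thread:


// main thread — fetch the compiled DSP (dsp.wasm is a placeholder asset)
async function createWasmNode(audioContext) {
    const wasmBytes = await (await fetch("dsp.wasm")).arrayBuffer();
    await audioContext.audioWorklet.addModule("wasm-processor.js");
    return new AudioWorkletNode(audioContext, "wasm-processor", {
        processorOptions: { wasmBytes } // structured-cloned over to the audio thread
    });
}

// wasm-processor.js — audio rendering thread
class WasmProcessor extends AudioWorkletProcessor {
    constructor(options) {
        super();
        const module = new WebAssembly.Module(options.processorOptions.wasmBytes);
        this.instance = new WebAssembly.Instance(module, {});
    }
    process(inputs, outputs) {
        // Call into the exported C/C++/Rust DSP here, e.g. this.instance.exports.render(...)
        // (the export name and its arguments depend entirely on your own module).
        return true;
    }
}
registerProcessor("wasm-processor", WasmProcessor);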

Seamlessly Superpowered

As we’ve said, it’s no secret that Superpowered is using WebAssembly under the hood. We’ve invested a lot of time into streamlining your experience when working with Superpowered, allowing you to harness the power of WebAssembly based DSP without getting bogged down with configuring a correct and efficient design pattern that works across browsers and devices.

Our well-considered and efficient C++ API is expertly mapped over to WebAssembly and provides a lot of helper utilities to get you off the ground quickly, allowing you to focus on your product and not on technicalities. Did we mention our API is near-identical across languages? Already have a Superpowered native application and want to easily port it to the web? We’ve got you covered.

AutomaticVocalPitchCorrection example

Want to run lightning-fast, real-time automatic vocal pitch correction in the browser? Let’s take a look at how easy this can be with Superpowered.

First, use the Superpowered helpers in the main scope to set up a low-latency WebAudio AudioContext neatly across browsers.


this.superpowered = await SuperpoweredGlue.Instantiate(
    "ExampleLicenseKey-WillExpire-OnNextUpdate",
    superPoweredLocation
);
 
this.webaudioManager = new SuperpoweredWebAudio(
    minimumSampleRate,
    this.superpowered
);

Next, use the Superpowered helpers to grab your user’s microphone feed.


async setupAudioCapture() {
    // either pass in {'fastAndTransparentAudio':true} for the default soundcard/device with no processing (mono/channel 1),
    // or pass in a MediaStreamConstraints object - see https://developer.mozilla.org/en-US/docs/Web/API/MediaStreamConstraints
    const userInputStream = await this.webaudioManager.getUserMediaForAudioAsync(
      {
        fastAndTransparentAudio: true
      }
    );
    if (!userInputStream) throw Error("Could not access user microphone");

    // We then create a WebAudio API MediaStreamSourceNode - https://developer.mozilla.org/en-US/docs/Web/API/MediaStreamAudioSourceNode
    const userInputStreamSourceNode = this.webaudioManager.audioContext.createMediaStreamSource(
      userInputStream
    );

    // If the input is mono (by default), then upmix the mono channel to a stereo node with a WebAudio API ChannelMergerNode.
    // This is to prepare the signal to be passed into Superpowered
    this.userInputMergerNode = this.webaudioManager.audioContext.createChannelMerger(
      2
    );
    // connect the userInputStreamSourceNode input to channels 0 and 1 (L and R)
    userInputStreamSourceNode.connect(this.userInputMergerNode, 0, 0);
    userInputStreamSourceNode.connect(this.userInputMergerNode, 0, 1);

    // from here we now have a stereo audio node (userInputMergerNode) which we can connect to an AudioWorklet Node
  }

Then use the Superpowered helpers to create an AudioWorkletNode, passing in the URL of the AudioWorkletProcessor script, the name it was registered with, and a callback to handle incoming messages from the audio thread in our main scope.


this.audioWorkletProcessorNode = await this.webaudioManager.createAudioNodeAsync(
    "http://[wherever your processor script is hosted]",
    "PlayerProcessor", // the name the processor script was registered with
    this.onMessageProcessorAudioScope.bind(this) // a callback to handle incoming messages from the AudioWorkletProcessor thread
);

Next, as with any WebAudio graph, we connect our nodes. In this case we want to connect our stereo user microphone feed to the `audioWorkletProcessorNode` input stage, then take the output stage of the `audioWorkletProcessorNode` and connect it to the speakers (the AudioContext destination) to hear the output.


this.userInputMergerNode.connect(this.audioWorkletProcessorNode);
this.audioWorkletProcessorNode.connect(this.webaudioManager.audioContext.destination);

Great, everything on the main thread is set up. Now let’s take a look at how easy things are in the AudioWorklet, where Superpowered removes all of the complexity for you.


class SuperpoweredAutomaticVocalPitchCorrectionProcessor extends SuperpoweredWebAudio.AudioWorkletProcessor {
    // Runs after the constructor
    onReady() {
        this.effect = new this.Superpowered.AutomaticVocalPitchCorrection(
            this.samplerate
        );

        this.effect.scale = this.Superpowered.AutomaticVocalPitchCorrection.DMINOR;
        this.effect.range = this.Superpowered.AutomaticVocalPitchCorrection.ALTO;
        this.effect.speed = this.Superpowered.AutomaticVocalPitchCorrection.MEDIUM;
    }

    processAudio(inputBuffer, outputBuffer, buffersize, parameters) {
        // Render the output buffers
        this.effect.process(
            inputBuffer.pointer,
            outputBuffer.pointer,
            true,
            buffersize
        );
        // If no output was produced, you could bypass by copying the input straight to the output instead:
        // this.Superpowered.memoryCopy(outputBuffer.pointer, inputBuffer.pointer, buffersize * 8);
    }
}

Want to hear this in action? Take a look at our online documentation.

Audio Worklet import statements

Module importing is still a bit of a mess in the world of JavaScript, although it is starting to get straightened out. For now, we’ve noticed Firefox is a bit behind the crowd when it comes to support for ES import statements within AudioWorkletProcessor scripts. To find out more about how we get around this for the Superpowered library, please see https://docs.superpowered.com/getting-started/how-to-integrate?lang=js#firefox-es6-compatibility.

Assets and SharedArrayBuffer

Real-time AudioWorklet threads are certainly not the place to be downloading audio assets; in fact, you’ll find the Fetch API is not even available in that scope. Because of this, we need to fetch and decode audio assets elsewhere, preferably in an additional Worker. The challenge then comes when we need to transfer the data across threads.

If you've been exploring the world of JavaScript, you might have come across a mysterious object called SharedArrayBuffer. This object represents a generic, fixed-length raw binary data buffer, similar to the ArrayBuffer, but with a different kind of superpower—it enables sharing bytes between JavaScript Workers, making it a critical feature for concurrent processing in JavaScript.

Let's dive into the world of SharedArrayBuffers and see how they bring the much-needed concurrency to the JavaScript language.

JavaScript, for a long time, has been single-threaded, meaning only one operation happens at a time. While it's possible to run JavaScript in different threads using Web Workers, these threads can't share memory—they can only communicate through message passing.

This is where SharedArrayBuffer comes into play. SharedArrayBuffers provide a shared memory area that allows different workers to read and write the same data. This can lead to significant performance improvements in multi-threaded applications.

Creating a SharedArrayBuffer is similar to creating an ArrayBuffer:


// Create a SharedArrayBuffer with a size in bytes
const sharedBuffer = new SharedArrayBuffer(16);

Here, we've created a SharedArrayBuffer that can hold 16 bytes of data.

You can't directly manipulate the contents of an ArrayBuffer or SharedArrayBuffer. Instead, you need to create a "view" on the buffer using typed arrays (like Int32Array, Uint8Array, etc.) or a DataView.


// Create a view
const view = new Int32Array(sharedBuffer);

// Set some data
view[0] = 42;

Sharing Memory between Workers

The true power of SharedArrayBuffers lies in their ability to be used across different workers. To do this, you post the SharedArrayBuffer to the worker using postMessage.


// Main thread
const worker = new Worker('worker.js');

// Create a SharedArrayBuffer
const sharedBuffer = new SharedArrayBuffer(16);
const view = new Int32Array(sharedBuffer);
view[0] = 42;

// Send it to the worker
worker.postMessage(sharedBuffer);

// worker.js
self.onmessage = function(event) {
    const sharedBuffer = event.data;
    const view = new Int32Array(sharedBuffer);

    // Now the worker can access the same memory
    console.log(view[0]); // Outputs: 42
};

The Power of Atomics

When multiple threads are reading and writing to the same memory, we can run into race conditions, where the output depends on the sequence or timing of other uncontrollable events. To manage these conditions, JavaScript provides the Atomics global object.

Atomics provides static methods to perform atomic operations on shared memory, such as addition, subtraction, and loading or storing data. These operations are "atomic," meaning they're indivisible and can't be interrupted.


// Main thread
const worker = new Worker('worker.js');

const sharedBuffer = new SharedArrayBuffer(16);
const view = new Int32Array(sharedBuffer);
view[0] = 42;

Atomics.add(view, 0, 1); // Atomically adds 1 to the value at index 0

worker.postMessage(sharedBuffer);

// worker.js
self.onmessage = function(event) {
    const sharedBuffer = event.data;
    const view = new Int32Array(sharedBuffer);

    Atomics.sub(view, 0, 1); // Atomically subtracts 1 from the value at index 0
};

Since the Spectre attack disclosures in 2018, growing SharedArrayBuffer support took a huge blow due to concerns about its role in breaking out of the sandbox security model. The support that had been making progress was halted, then slowly began to be exposed again behind experimental flags. Since 2020, browsers have adopted a new security approach that relies on HTTP headers (cross-origin isolation).
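In practical terms, that means opting the top-level document into cross-origin isolation with two response headers. Here’s a minimal sketch, assuming a Node/Express static server; any server or CDN configuration that sets the same headers will do.


const express = require("express"); // assuming an Express-based static server
const app = express();

// These two headers opt the page into cross-origin isolation,
// which is what unlocks SharedArrayBuffer in modern browsers.
app.use((req, res, next) => {
    res.setHeader("Cross-Origin-Opener-Policy", "same-origin");
    res.setHeader("Cross-Origin-Embedder-Policy", "require-corp");
    next();
});

app.use(express.static("public"));
app.listen(8080);

// In the page itself you can verify that isolation took effect:
// console.log(self.crossOriginIsolated); // true when SharedArrayBuffer is available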

This new approach allows us, as audio developers, to start passing asset data around in a clean and efficient way. Each browser vendor’s implementation varies slightly, but you’ll find full support for allocating and reading shared memory via Atomics. The support is relatively new, so you’ll often still see memory being duplicated in both the audio thread and WebAssembly, especially while we’re still relying on automatic garbage collection to clear up stale memory.

It’s worth noting that the amount of memory that can be allocated on the main thread, in Workers or via WebAssembly is still not clearly defined. WebAssembly in theory allows up to 4GB of memory to be allocated, but in practice we’re seeing that anything above around 350MB will crash an iPhone Safari tab, which is entirely undocumented.
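If you want to get a feel for where a given browser draws the line, you can probe it by growing a WebAssembly.Memory by hand. A rough sketch (the step size is arbitrary, and real-world ceilings vary wildly by browser and device):


// WebAssembly memory is allocated in 64KB pages; 65536 pages is the 4GB theoretical ceiling.
const memory = new WebAssembly.Memory({ initial: 256, maximum: 65536 }); // start at 16MB

try {
    // Grow in 16MB (256-page) steps until the environment refuses.
    for (let pages = 256; pages < 65536; pages += 256) {
        memory.grow(256);
    }
} catch (e) {
    // grow() throws a RangeError once the allocation can no longer be satisfied.
    console.log("Gave up at roughly", Math.round(memory.buffer.byteLength / (1024 * 1024)), "MB");
}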

When it comes to memory management and WebAudio, tread quite carefully. Superpowered provides methods to clear up any unused memory, but can only act within its own WebAssembly sandbox and any associated Workers, so be conscious of how you pass assets into your AudioWorklets.

WebCodecs: Pick It Apart

WebCodecs aims to solve performance issues tied to the encoding and decoding of audio and video in web applications. Typically, these processes require expensive memory copying and format conversions. With WebCodecs, developers can maintain control of the lifecycle of the media they're manipulating, which can lead to significant performance improvements.

Let’s take a look at the main WebCodecs API features. We’re bringing video into the picture here as they really go hand in hand in this context.

  1. Codec Access: WebCodecs provides direct access to the browser's built-in codecs. These are the same codecs used by HTMLMediaElement, Media Source Extensions (MSE), WebAudio and WebRTC.
  2. Codec Interfaces: It offers separate interfaces for audio (AudioEncoder, AudioDecoder) and video (VideoEncoder, VideoDecoder) codecs.
  3. Codecs Configuration: Developers can configure each codec with specific parameters, allowing for optimal control over encoding and decoding processes.
  4. Performance: By providing a more direct path to the underlying codec implementations, WebCodecs can help to reduce latency and improve efficiency in web-based media applications.
  5. Frame and Chunk Manipulation: WebCodecs provides classes like VideoFrame and AudioData for manipulating individual frames of media, and EncodedAudioChunk and EncodedVideoChunk for manipulating encoded data.

Anyone wanting to write a complex audio application for the web (with or without video) will find these concepts appealing. More control, more flexibility, less overhead.

As it stands, any customised decoding or encoding that requires access to the raw frames and samples of media is typically performed via WebAssembly with library ports of FFmpeg and MP4Box. This duplicates codecs that already exist within the browser, adds bandwidth requirements, and consumes extra performance and development overhead to manage.
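To give a flavour of the audio side of the API, here’s a hedged sketch of decoding Opus packets with AudioDecoder. The codec string, sample rate and the encodedPackets array (which would come from your own demuxer) are assumptions for illustration only.


async function decodeOpusPackets(encodedPackets) {
    const config = { codec: "opus", sampleRate: 48000, numberOfChannels: 2 };

    // WebCodecs lets you ask the browser up front whether it can handle a configuration.
    const { supported } = await AudioDecoder.isConfigSupported(config);
    if (!supported) return;

    const decoder = new AudioDecoder({
        output: (audioData) => {
            // audioData is an AudioData object holding raw PCM frames.
            console.log(audioData.numberOfFrames, "frames at", audioData.sampleRate, "Hz");
            audioData.close(); // release the memory as soon as you're done with it
        },
        error: (e) => console.error(e)
    });
    decoder.configure(config);

    // encodedPackets is assumed to be an array of { data, timestamp } from your own demuxer.
    for (const packet of encodedPackets) {
        decoder.decode(new EncodedAudioChunk({
            type: "key",
            timestamp: packet.timestamp,
            data: packet.data
        }));
    }
    await decoder.flush();
}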

WebTransport and Audio: A Harmonious Symphony

In the realm of web development, the continuous evolution of technologies and APIs offers opportunities for richer and more interactive experiences. Let’s take a quick look at the new WebTransport API standard and audio on the web, exploring how this powerful combination is set to change the way we send audio over the web.

WebTransport is a protocol framework that aims to provide clients constrained by the web security model with a path to communicate with a remote server using a low-latency, reliable and secure multiplexed transport. It was designed as part of an ongoing effort to provide web developers with capabilities equivalent to those of a native platform. In simpler terms, it offers a robust and speedy connection that's music to the ears of developers and users alike.

WebTransport has several features that set the stage for its use in audio applications:

  1. Low Latency: Latency can be the nemesis of real-time audio applications. WebTransport addresses this concern head-on by offering low-latency data transmission.
  2. Multiplexing: The ability to transmit multiple data streams concurrently, a boon for applications requiring simultaneous transmission of different types of data.
  3. Reliable and Unreliable Transport: WebTransport supports both reliable (like TCP) and unreliable (like UDP) modes of transport, providing flexibility based on application requirements.

Here's how WebTransport might harmonise with audio applications:

  1. Real-Time Communication: Whether it's VoIP, video conferencing, or online multiplayer games, real-time communication relies heavily on low-latency and reliable data transmission. Traditional HTTP-based protocols often fall short here, leading to delays or interruptions. WebTransport, with its speedy and dependable data delivery, can improve the quality of these applications significantly, ensuring smoother conversations and interactions.
  2. Live Streaming Services: With its support for both datagrams and reliable streams, WebTransport can tune up live audio broadcasts. Whether it's an online radio show or a live concert, you can deliver a low-latency, high-quality streaming experience that keeps audiences hooked.
  3. Collaborative Music Applications: Imagine an online platform where musicians from all around the world come together to jam in real-time. WebTransport's low latency and multiplexed communication can synchronise the various data streams, enabling seamless collaboration, no matter how many miles separate the band.
  4. Interactive Audio Applications: From web-based games to interactive music experiences, WebTransport enables real-time communication between the client and server, ensuring responsive and immersive user experiences.

The API offers direct access to multiple UDP-like transmissions of audio outside of the WebRTC API, allowing custom audio formats to be streamed over the web using these various transport methods, as sketched below.
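Here’s a rough sketch of what that could look like, sending audio packets over unreliable datagrams and control data over a reliable stream. The endpoint URL and packet contents are purely illustrative.


async function streamAudioOverWebTransport(encodedPackets) {
    // Connect to a WebTransport-capable (HTTP/3) server — the URL is a placeholder.
    const transport = new WebTransport("https://example.com:4433/audio");
    await transport.ready;

    // Unreliable, UDP-like delivery: great for low-latency audio packets.
    const datagramWriter = transport.datagrams.writable.getWriter();
    for (const packet of encodedPackets) {
        await datagramWriter.write(packet); // each packet is a Uint8Array
    }

    // Reliable, ordered delivery for anything you can't afford to lose.
    const stream = await transport.createUnidirectionalStream();
    const streamWriter = stream.getWriter();
    await streamWriter.write(new TextEncoder().encode("end-of-session"));
    await streamWriter.close();
}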

The standard is still gaining support from browsers, so it’s not ready for use across the board just yet.

TimingObject API: Timing is Everything

In the realm of audio application development, timing is everything. The Timing Object API, still in its proposal stage, aims to address this fundamental need for precise timing and synchronisation in web-based applications. We'll take a look at what the Timing Object API is, and the potential it holds for transforming the way we manage time-sensitive elements across devices.

Introduced by the Multi-Device Timing Community Group (MDTCG), the Timing Object API is an exciting proposal that provides a unified, shared clock reference. It allows different components of a multimedia presentation, such as audio, video, and animations, to stay in perfect sync across multiple devices. With it, developers can define and control the timing of these elements with unprecedented precision and flexibility. More simply put, think of a stopwatch that everyone can see at the same time.

Here are some key aspects of the Timing Object API that make it such a promising proposal:

  1. Unified Timing: The Timing Object API acts as a single conductor for the symphony of timed operations in your multimedia presentation, ensuring a seamless experience for the audience.
  2. Shared Timing: In an age of distributed digital experiences, Timing Objects can be shared across different devices, opening the door for perfectly synchronised multi-device performances.
  3. Precision: The Timing Object API brings high-precision timing control to your fingertips, making it suitable for applications that require spot-on synchronisation like complex multimedia presentations, online gaming, and collaborative music apps.
  4. Flexibility: Programmatically control the flow of time. With the Timing Object API, you have the power to control playback rate, direction, and even pause or resume operations as per your needs.

The Timing Object API could revolutionise various aspects of digital experiences. For instance, consider a multi-device media presentation where each device presents a different element of the presentation. One is playing audio, another one is showcasing video, while a third animates corresponding visuals. The Timing Object API could provide a single timing reference for all devices, ensuring perfect harmony in the presentation, regardless of the number of devices involved.

Another exciting application could be in the domain of collaborative music applications. The Timing Object API could help keep each collaborator's contributions in perfect sync, potentially helping to compensate for network delays or individual hardware differences.
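Since no browser ships this natively yet, you would currently reach for the reference implementation (the timingsrc library maintained by the MDTCG). A minimal sketch of the proposed API, assuming that polyfill is loaded:


// Assumes the timingsrc reference implementation / polyfill is available.
const to = new TimingObject();

// Start the clock: position advances at a velocity of 1 unit per second.
to.update({ position: 0.0, velocity: 1.0 });

// Any component (audio player, animation, remote peer) can query the shared clock.
const vector = to.query(); // { position, velocity, acceleration, timestamp }

// React to externally driven changes such as pause, seek or rate change.
to.on("change", () => {
    console.log("timing changed:", to.query());
});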

You can find out more about TimingObject over at https://webtiming.github.io/timingobject/ , which is the draft proposal being put together by the Multi-Device Timing Community Group (MDTCG).

WebMIDI API: Qwerty Keyboards Are Tapping Out

WebMIDI is a web-based API that provides direct access to MIDI devices from a browser. MIDI, an industry-standard protocol for communication between musical instruments and computers, can be thought of as the language of musical devices. With WebMIDI, this language can be understood and spoken by our web applications, enabling interaction with a plethora of MIDI devices like synthesisers, drum machines, and even lighting systems.

The crux of WebMIDI lies in its ability to send and receive MIDI messages. These messages contain instructions, such as to play a note, change an instrument, or modify a sound effect, that MIDI devices can interpret and act upon.

Here’s an idea of the kind of things you can achieve with WebMIDI:

  1. Device Access: WebMIDI, once given user permission, lists the connected MIDI devices. You can access any of these devices and communicate with them directly from your application.
  2. Sending Messages: To perform a task on a MIDI device, like playing a note, you send a MIDI message to that device. This message, usually a three-byte sequence, carries information about the command (e.g., note on or off), the note to be played, and the velocity (how hard the note is played).
  3. Receiving Messages: MIDI devices can also send messages back to your application, such as information about a note played on a MIDI keyboard. This allows for interactive experiences, where actions on a MIDI device can trigger events in your web application.

Code-wise, it looks a little something like this:


if (navigator.requestMIDIAccess) {
    console.log('WebMIDI is supported in this browser.');
    navigator.requestMIDIAccess()
        .then(onMIDISuccess, onMIDIFailure);
} else {
    console.log('WebMIDI is not supported in this browser.');
}

function onMIDISuccess(midiAccess) {
    console.log('MIDI Access Object', midiAccess);
    
    // List input devices
    const inputs = midiAccess.inputs.values();
    for (let input = inputs.next(); input && !input.done; input = inputs.next()) {
        console.log('MIDI input:', input.value);
        input.value.onmidimessage = onMIDIMessage;
    }

    // List output devices
    const outputs = midiAccess.outputs.values();
    for (let output = outputs.next(); output && !output.done; output = outputs.next()) {
        console.log('MIDI output:', output.value);
    }
}

function onMIDIFailure() {
    console.log('Could not access your MIDI devices.');
}

function onMIDIMessage(midiMessage) {
    console.log('Received MIDI message:', midiMessage);
}
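Sending is just as straightforward. A note-on message is a three-byte array; here’s a small addition to the onMIDISuccess flow above that plays middle C on the first available output (device choice and timings are illustrative).


function sendTestNote(midiAccess) {
    const output = midiAccess.outputs.values().next().value;
    if (!output) return;

    // 0x90 = note-on, channel 1; 60 = middle C; 0x7f = maximum velocity.
    output.send([0x90, 60, 0x7f]);

    // Send the matching note-off (0x80) half a second later.
    setTimeout(() => output.send([0x80, 60, 0x40]), 500);
}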

From your main thread, you could then pass the MIDI messages over a MessagePort to any AudioWorklets that need them.

WebMIDI is fairly well supported now, but unfortunately Safari is behind the crowd in implementing the spec, so it’s not quite ready for use across the board yet.

Machine Learning: Don’t stop me now

In recent years, web technologies have been evolving at an unprecedented pace, bringing new dimensions to our online experiences. Two areas that have seen remarkable growth are web-based audio processing and machine learning. As these technologies mature, they are beginning to intersect in innovative and exciting ways. Let’s explore the fusion of audio and machine learning on the web and envision their harmonious future.

Before we dive into their interplay, let's establish what we mean by web-based audio processing and machine learning.

Web-based Audio Processing involves manipulating audio signals in a browser environment. Using APIs like the Web Audio API, developers can capture, analyse, and generate audio, as well as apply effects and spatialisation.

On the other hand, Machine Learning (ML) is a subfield of artificial intelligence that focuses on teaching machines to learn from data and make predictions or decisions without being explicitly programmed. Thanks to libraries like TensorFlow.js, we can now run machine learning models directly in the browser.

Harmonising Audio and Machine Learning

The exciting part begins when we combine these two. Machine learning, with its ability to find patterns and make predictions from data, can extract meaningful information from audio signals, leading to a wide range of applications.

  1. Sound Classification: One of the most straightforward applications of machine learning with audio data is sound classification. A machine learning model can be trained to distinguish between different types of sounds, such as identifying a musical instrument or recognising a specific word or phrase. This functionality could enable innovative interfaces that respond to voice commands or music, creating engaging and interactive web experiences (see the sketch after this list).
  2. Music Generation: Machine learning models can be trained on a dataset of music to generate new compositions in a similar style. Once trained, these models can be run in the browser to create dynamic, AI-generated soundtracks that respond to user input or changes in the environment.
  3. Noise Cancellation: Machine learning can be used to identify and remove background noise from audio signals in real-time. With the Web Audio API, this processed audio can then be played back in the browser, providing users with clearer, more focused sound.
  4. Speech-to-Text: With machine learning models, you can transcribe speech captured in the browser, providing real-time subtitles for video content or enabling voice-controlled web applications.
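To make the first of those ideas concrete, here’s a hedged sketch of keyword-style sound classification in the browser using TensorFlow.js and its pre-trained speech-commands model. The package names and threshold are assumptions; any comparable model would slot in the same way.


import * as speechCommands from "@tensorflow-models/speech-commands";
import "@tensorflow/tfjs"; // backend required by the model

async function startKeywordSpotting() {
    // Load the pre-trained browser FFT model (it captures the microphone via getUserMedia).
    const recognizer = speechCommands.create("BROWSER_FFT");
    await recognizer.ensureModelLoaded();

    const labels = recognizer.wordLabels(); // e.g. "yes", "no", "up", "down", ...

    recognizer.listen(result => {
        // result.scores holds one probability per label.
        const scores = Array.from(result.scores);
        const best = scores.indexOf(Math.max(...scores));
        console.log("heard:", labels[best]);
    }, { probabilityThreshold: 0.75 });
}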

With more and more talk around Machine Learning and audio processing, it’s fairly clear that it’s a combination that will play a significant role in the future. Machine learning for audio on the web is still finding its feet. We expect to see a lot of developments here very soon.

Conclusion

While these applications offer a glimpse into what is possible, the combination of web-based audio and machine learning is still a largely unexplored frontier. As web technologies continue to advance and machine learning becomes more accessible, we can expect to see even more creative applications emerge at this intersection.

It is also worth noting that this convergence of technologies highlights the importance of ethical considerations. When dealing with audio data, especially when captured in real-time, it is crucial to handle user data responsibly and respect privacy boundaries. With great power comes great responsibility.

Web technologies are like musical instruments, and developers are the composers. By leveraging the capabilities of web-based audio and machine learning, we can compose symphonies that were previously unimaginable. The stage is set, and it's time to play the next note in the melody of the web. As always, it's recommended to stay updated with the latest advancements in web-based audio processing to stay in tune.

  • Superpowered documentation
  • Superpowered reference
  • Superpowered examples
  • Superpowered tutorial