VstAudioBuffer optimisation

Jul 18, 2015 at 5:03 PM
I've been profiling a bit and the property indexer showed up on top:
/// <summary>
/// Gets or sets (see remarks) the sample value at <paramref name="index"/>.
/// </summary>
/// <param name="index">A zero-based index into the buffer.</param>
/// <returns>Returns the sample value.</returns>
/// <remarks>The setter will cause an exception when <see cref="CanWrite"/> is false.</remarks>
/// <exception cref="ArgumentOutOfRangeException">Thrown when the specified <paramref name="index"/> is less than zero
/// or greater than or equal to <see cref="SampleCount"/>.</exception>
/// <exception cref="InvalidOperationException">Thrown when the setter was used on a read-only buffer (<see cref="CanWrite"/> is false).</exception>
public float this[int index]
{
    get
    {
        Throw.IfArgumentNotInRange(index, 0, SampleCount - 1, "index");

        unsafe
        {
            return Buffer[index];
        }
    }
    set
    {
        if (!CanWrite) throw new InvalidOperationException(Properties.Resources.VstAudioBuffer_BufferNotWritable);
        Throw.IfArgumentNotInRange(index, 0, SampleCount - 1, "index");

        unsafe
        {
            Buffer[index] = value;
        }
    }
}
I cooked up this reflection incantation to get a pointer to the private buffer.
System.Reflection.FieldInfo field = typeof(VstAudioBuffer).GetField("<Buffer>k__BackingField", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic);
float* a = (float*)System.Reflection.Pointer.Unbox(field.GetValue(mixOutputBuffers[i]));
Subsequently I just write directly to the pointer in an unsafe block.
What are your thoughts about it?
Jul 25, 2015 at 9:52 AM
Are you having performance problems?

I don't really see any issue with using unsafe code in this context, but of course the question must be asked: is this necessary? Are you doing very complex audio operations?
Jul 25, 2015 at 9:58 AM
I'm currently having a problem with crackling audio, and I'm wondering if it's a performance problem...

The CPU doesn't spike but I think I've eliminated every other possibility...
Jul 25, 2015 at 10:09 AM
This method seems to be causing a problem. It's a very inefficient way to copy an array of buffers, but I wouldn't think it would cause my very fast CPU to not keep up.

        private static VstAudioBuffer[] Copy(List<VstAudioBuffer[]> buffers, int bufferCount, int bufferSize)
        {
            using (interophost.VstAudioBufferManager outputMgr = new interophost.VstAudioBufferManager(bufferCount, bufferSize))
            {
                var outputBuffer = outputMgr.ToArray();

                foreach (var buffer in buffers)
                {
                    for (int i = 0; i < bufferCount; i++)
                    {
                        for (int n = 0; n < bufferSize; n++)
                        {
                            outputBuffer[i][n] = buffer[i][n];
                        }
                    }
                }

                return outputBuffer;
            }
        }
Jul 25, 2015 at 1:41 PM
Hi Kruddler,

Yeah, intermittent crackling is probably a sign you couldn't fill the audio buffer fast enough in the audio card callback. The most important optimization I made was to register the audio thread with MMCSS. In my case most of the time was spent in the mixing operations; writing to the buffers with pointer arithmetic makes a measurable difference. The release build is getting usable but I still get crackling in debug builds.
Jul 25, 2015 at 1:56 PM
The speed of your CPU is one factor, availability of the CPU is another. Ideally you'd have one CPU core spinning, waiting to execute your audio callback as soon as needed. Let's say you have 50ms to get out of the audio callback. The thread scheduler notices the callback event and waits 30ms before handing over control. Now you only have 20ms to do the job. One way to alleviate this kind of problem is to add more latency with a larger buffer count, which masks the thread-scheduling overhead. Another way is to register the thread with MMCSS, which guarantees faster scheduling if a core is available. I'd like the bottleneck to be the VST ProcessReplacing in my host. That makes sense since that's where the heavy DSP takes place.
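For reference, registering a thread with MMCSS is a couple of P/Invoke calls along these lines (rough sketch only; "Pro Audio" is the standard task name for this, error handling omitted):
using System;
using System.Runtime.InteropServices;

static class Mmcss
{
    // avrt.dll ships with Windows Vista and later.
    [DllImport("avrt.dll", CharSet = CharSet.Unicode)]
    static extern IntPtr AvSetMmThreadCharacteristics(string taskName, ref uint taskIndex);

    [DllImport("avrt.dll")]
    static extern bool AvRevertMmThreadCharacteristics(IntPtr handle);

    static IntPtr _handle = IntPtr.Zero;

    // Call from the audio thread itself, before it starts processing.
    public static void Register()
    {
        uint taskIndex = 0;
        _handle = AvSetMmThreadCharacteristics("Pro Audio", ref taskIndex);
        // IntPtr.Zero means registration failed; the thread keeps its normal priority.
    }

    // Call when the audio thread shuts down.
    public static void Unregister()
    {
        if (_handle != IntPtr.Zero)
        {
            AvRevertMmThreadCharacteristics(_handle);
            _handle = IntPtr.Zero;
        }
    }
}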
Jul 25, 2015 at 3:20 PM
Edited Jul 25, 2015 at 3:30 PM
This method seems to be causing a problem. It's a very inefficient way to copy an array of buffers, but I wouldn't think it would cause my very fast CPU to not keep up.
  • This method also relies heavily on the indexer property I mentioned in the first post. When you execute this:
outputBuffer[i][n] = buffer[i][n];
You are in fact executing the indexer property overload for the [ ] operator:
public float this[int index] { ... }
Multiply this by the number of VSTs, channels, samples and mixing operations, and you spend a whole lot of time in the indexer. It's much more efficient to write sequentially through the raw pointer and let the CPU pipeline the writes:
// Where buffers are type of float*[]
unsafe
{
   for (int i = 0; i < channels; i++)
      for (int j = 0; j < bufferCount; j++)
         outputBuffer[i][j] = buffer[i][j];
}
One thing I worried about was memory relocation though. Whether that's a problem here, and whether the fixed keyword can 'fix' it, I don't know.
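For a managed float[] the pinning pattern would be something like this (just a sketch; whether it's needed here depends on where the buffers are actually allocated, and bufferSize is a placeholder):
// Sketch: 'fixed' pins a managed array so the GC can't relocate it while the pointer is in use.
unsafe
{
    float[] samples = new float[bufferSize];   // bufferSize is a placeholder

    fixed (float* p = samples)
    {
        for (int n = 0; n < bufferSize; n++)
            p[n] *= 0.5f;
    }   // the array is unpinned when the fixed block ends
}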
Jul 26, 2015 at 1:25 AM
Thanks again Yury!

I don't work in the unsafe world very often, but this may be what I need to get over the hump. I'll post my results soon.

Good timing that you're working in the same area right now.
Jul 27, 2015 at 11:53 AM
I'm working on a modular synth. Just the other day, processing was a huge issue but I've gotten over the humps now.

2 Big things!

1) While using the indexer property probably was using up CPU, I found that my major problem was creating buffers in a loop. I was calling this code:
                var outputMgr = new interophost.VstAudioBufferManager(channelCount, sampleCount);
                var outputBuffer = outputMgr.ToArray();
That was massively expensive, so now, I create an array of working buffers at startup, and never recreate the array of buffers until I need more.
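Roughly what the startup allocation looks like now (a sketch; the names here are just illustrative):
// Sketch: allocate working buffers once at startup instead of once per processing loop.
private interophost.VstAudioBufferManager _workingBufferManager;
private VstAudioBuffer[] _workingBuffers;

private void InitializeWorkingBuffers(int channelCount, int sampleCount)
{
    _workingBufferManager = new interophost.VstAudioBufferManager(channelCount, sampleCount);
    _workingBuffers = _workingBufferManager.ToArray();
}

// The processing loop just reuses _workingBuffers; they are only re-created
// when the buffer size changes or more buffers are needed.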

2) Parallel Processing!

One of the issues anyone in .Net will face is that, by default, you will be working on the same thread as the UI, and that processing is done serially. But in my case, because I am running the ProcessReplacing method on many synths at once, I can take full advantage of the many cores I have.
        public static void ProcessInputs(VstAudioBuffer[] inputAudioBuffers, List<IAudioElement> inputElements, VstAudioBuffer[][] workingBuffers)
        {
            Parallel.ForEach<IAudioElement>(inputElements, currentInput =>
            {
                currentInput.ProcessReplacing(inputAudioBuffers, workingBuffers[inputElements.IndexOf(currentInput)]);
            });
        }
Unfortunately, this caused an error in some other code. This method was throwing:
        public VstTimeInfo GetTimeInfo(VstTimeInfoFlags filterFlags)
        {
            return _HostCmdStub.GetTimeInfo(filterFlags);
        }
Maybe Marc never planned for parallel programming?

But anyway, my code still works when I stub the method out to return null:
        public VstTimeInfo GetTimeInfo(VstTimeInfoFlags filterFlags)
        {
            return null;
        }
The moral of the story?

Don't create buffers in a loop, and use parallel programming!
Jul 27, 2015 at 12:10 PM
Thanks to some ideas from Yury, this is my summing (audio mixing) code:
    public static void Sum(VstAudioBuffer[][] sourceBuffers, int channelCount, int sampleSize, ref VstAudioBuffer[] summedAudioBuffer)
    {
        unsafe
        {
            for (int i = 0; i < channelCount; i++)
                for (int n = 0; n < sampleSize; n++)
                {
                    float value = 0;

                    for (int x = 0; x < sourceBuffers.Length; x++)                      
                        value += sourceBuffers[x][i][n];

                    summedAudioBuffer[i][n] = value;
                }

        }
    }
Yury, would using the code you posted above help to speed this up?
System.Reflection.FieldInfo field = typeof(VstAudioBuffer).GetField("<Buffer>k__BackingField", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic);
float* a = (float*)System.Reflection.Pointer.Unbox(field.GetValue(mixOutputBuffers[i]));
Jul 27, 2015 at 12:28 PM
I'm considering leaving the code like this...

I'm not sure if this would be useful or not, but I figure that if the CPU is busy with something like the UI, it would push the work to a different core. Is this reasonable?

        public static void Sum(VstAudioBuffer[][] sourceBuffers, int channelCount, int sampleSize, VstAudioBuffer[] summedAudioBuffer)
        {
            Parallel.Invoke
            (
                () =>
                {
                    unsafe
                    {
                        for (int i = 0; i < channelCount; i++)
                            for (int n = 0; n < sampleSize; n++)
                            {
                                float value = 0;

                                for (int x = 0; x < sourceBuffers.Length; x++)
                                    value += sourceBuffers[x][i][n];

                                summedAudioBuffer[i][n] = value;
                            }

                    }
                }
            );
        }
Jul 27, 2015 at 3:51 PM
Edited Jul 27, 2015 at 3:53 PM
Yury, would using the code you posted above help to speed this up?
  • Yes, the idea is to hoist the private Buffer variable out of VstAudioBuffer and cast it to something like float* or float*[]. After that, use that pointer instead of the VstAudioBuffer. If you're interested I can post the complete code later.
Maybe Marc never planned for parallel programming?
  • Vst.Net isn't thread-safe. If you're working in a multithreaded context (like I do) I'd advise wrapping every call into the Vst.Net DLL in a lock (rough sketch below). It's common for a library not to be thread-safe, because the caller is able to make better locking decisions.
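Something along these lines (sketch only; _pluginLock and _pluginCommandStub are illustrative names):
// Sketch: serialize all calls into the plugin from different threads behind one lock.
private readonly object _pluginLock = new object();

public void Process(VstAudioBuffer[] inputs, VstAudioBuffer[] outputs)
{
    lock (_pluginLock)
    {
        // _pluginCommandStub: the host-side stub for the plugin (illustrative).
        _pluginCommandStub.ProcessReplacing(inputs, outputs);
    }
}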
You definitely don't want to do audio processing on the main (UI) thread. On the other hand, I have doubts about parallel processing for audio in .Net. The overhead to set up, synchronize and tear down the parallel processing might be higher than the cost of executing ProcessReplacing in series. In my experience the Parallel library seems better suited to long-running tasks. One thing worth checking is the priority of the threads spawned by the Parallel library; one high-priority thread could be faster than five normal-priority threads running in parallel.
I'm not sure if this would be useful or not, but I figure that if the CPU is busy with something like the UI, it would push the work to a different core. Is this reasonable?
  • Yes, this would likely be scheduled onto another core just like a regular high-priority thread, but the cost of frequently spawning new threads might hurt in the long run.
Here's my approach to thread management:
  • Find out your entry point for audio processing, in my case it's the audio callback.
  • Make sure you create/initialize that callback on a new thread; a regular thread will do, but I recommend boosting its priority with MMCSS.
  • Push events to that thread to initialize other audio processing tasks such as VST plugins.
  • When the callback executes, you should already be in the context of that new thread.
Open Visual Studio's Debug->Windows->Threads window and verify that the current thread is not the UI thread in the audio processing path.

The end result is a normal priority thread for the UI and a high-priority thread for the audio.
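Stripped down, the setup looks something like this (a sketch; a BlockingCollection is just one way to push work to the thread, and all names are illustrative):
using System;
using System.Collections.Concurrent;
using System.Threading;

sealed class AudioThread
{
    private readonly BlockingCollection<Action> _work = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public AudioThread()
    {
        _thread = new Thread(Run) { IsBackground = true, Name = "Audio" };
        _thread.Start();
    }

    // Called from the UI thread to push initialization/processing work to the audio thread.
    public void Post(Action work)
    {
        _work.Add(work);
    }

    private void Run()
    {
        // Boost this thread with MMCSS here, before any audio work runs on it.
        foreach (Action work in _work.GetConsumingEnumerable())
            work();
    }
}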
Jul 28, 2015 at 1:39 AM
Edited Jul 28, 2015 at 2:55 AM
This should give you a better idea of the direct buffer access trick:
public unsafe sealed class VstHost
{
        float*[] directBuffers;
        int channelCount;
        int sampleCount;
        [...]

        // Call this only once before processing audio, reflection is an expensive operation
        unsafe void Init()
        {
            VstAudioBufferManager vstAudioBufferManager = new VstAudioBufferManager(channelCount, sampleCount);
            VstAudioBuffer[] vstAudioBuffer = vstAudioBufferManager.ToArray();

            directBuffers = new float*[channelCount];

            System.Reflection.FieldInfo field = typeof(VstAudioBuffer).GetField("<Buffer>k__BackingField", 
                                                                                System.Reflection.BindingFlags.Instance | 
                                                                                System.Reflection.BindingFlags.NonPublic);

            for (int i = 0; i < channelCount; i++)
                directBuffers[i] = (float*)System.Reflection.Pointer.Unbox(field.GetValue(vstAudioBuffer[i]));
        }

        // Perform DSP on the direct buffer instead of vstAudioBuffer
        // Direct buffer points to the private member vstAudioBuffer.Buffer 
        unsafe void ChangeVolume(float volume)
        {
            for (int i = 0; i < channelCount; i++)
                for (int j = 0; j < sampleCount; j++)
                    directBuffers[i][j] *= volume;
        }
}
Jul 28, 2015 at 2:28 AM
Edited Jul 28, 2015 at 2:44 AM
Here's a relevant screenshot from the debugger in the audio processing path of my host:
Image

Process Explorer by Mark Russinovich is another good tool to check your threads' behavior:
Image
Coordinator
Aug 2, 2015 at 8:41 AM
Edited Aug 3, 2015 at 7:57 AM
Hi Guys (I'm back)

A few words on the VstAudioBuffer.

The VstAudioBufferManager manages a contiguous piece of unmanaged (pinned) memory for the Managed Host. The idea is to use a static instance of the manager in your host application. All buffers you need should come from this one manager. So if you need additional (temp) buffers for other purposes, include them in the count you pass into the ctor. If you need buffers of different sizes - see if you can use larger buffers or, worst case, allocate another manager for that size. But keep the count of managers low.

Also allocate all your buffers up front. Do NOT allocate (large) memory blocks during processing. Using the VstAudioBufferManager as a temp tool to copy is NOT recommended at all!
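In sketch form (naming is up to you; the count includes any temp/working buffers you will need later):
// Sketch: a single manager for the whole host, created once at startup.
public static class HostBuffers
{
    public static VstAudioBufferManager Manager { get; private set; }
    public static VstAudioBuffer[] All { get; private set; }

    public static void Initialize(int totalBufferCount, int blockSize)
    {
        Manager = new VstAudioBufferManager(totalBufferCount, blockSize);
        All = Manager.ToArray();
    }
}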

The VstAudioBuffer instances that a manager owns all point to their own little piece (of the given size) in the contiguous memory block. Internally the VstAudioBuffer is nothing more than a pointer to the raw sample buffer. For a managed plugin this raw pointer is provided by the host, and for a managed host the VstAudioBufferManager takes care of that for you. The thinking is that this is the most performant way to interface an unmanaged pointer to managed code (thank god C# allows this).

The public methods of the VstAudioBuffer only expose members that allow safe access to the underlying unmanaged/unsafe memory buffer. But there is an IDirectBufferAccess32/64 interface that the VstAudioBuffer implements that exposes the internal raw pointer.
http://vstnet.codeplex.com/SourceControl/latest#Source/Code/Jacobi.Vst.Core/IDirectBufferAccess.cs
Use that (instead of reflection ;-) to gain access to the raw buffer. Note that YOU are responsible for not overrunning the buffer size! Perhaps implement the buffer access logic with the safe public methods in your code to check for correct function, and #ifdef the unsafe implementation to get the speed you need. That way you know that if your code works in safe mode, any bugs will have to be caused by your unsafe pointer manipulation code.
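For example (a sketch; UNSAFE_BUFFERS is just an example define):
// Sketch: safe reference implementation by default, raw-pointer version behind a define.
public static void ApplyGain(VstAudioBuffer buffer, float gain)
{
#if UNSAFE_BUFFERS
    unsafe
    {
        float* raw = ((IDirectBufferAccess32)buffer).Buffer;
        for (int n = 0; n < buffer.SampleCount; n++)
            raw[n] *= gain;              // YOU guard against overrunning SampleCount
    }
#else
    for (int n = 0; n < buffer.SampleCount; n++)
        buffer[n] *= gain;               // safe indexer, range-checked
#endif
}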

The VstAudioBuffer.Copy method is an example of how to use the unmanaged pointer. http://vstnet.codeplex.com/SourceControl/latest#Source/Code/Jacobi.Vst.Core/VstAudioBuffer.cs

Hope it helps.
Marc
Aug 2, 2015 at 7:26 PM
Edited Aug 2, 2015 at 7:40 PM
Thanks Marc, I didn't notice that the buffer was accessible through an interface. I'm glad to get rid of reflection :)

Would this be better?
int nbInputs = 2;
int nbOutputs = 2;

float*[] inputsDirectBuffer = new float*[nbInputs];
float*[] outputsDirectBuffer = new float*[nbOutputs];

VstAudioBuffer[] inputsVstAudioBuffer = new VstAudioBuffer[nbInputs];
VstAudioBuffer[] outputsVstAudioBuffer = new VstAudioBuffer[nbOutputs];

VstAudioBufferManager vstAudioBufferManager = new VstAudioBufferManager(nbInputs + nbOutputs, blockSize);
IEnumerator bufferEnumerator = vstAudioBufferManager.GetEnumerator();
bufferEnumerator.MoveNext();

for (int i = 0; i < nbInputs; i++)
{
    inputsVstAudioBuffer[i] = (VstAudioBuffer)bufferEnumerator.Current;
    inputsDirectBuffer[i] = ((IDirectBufferAccess32)inputsVstAudioBuffer[i]).Buffer;
    bufferEnumerator.MoveNext();
}

for (int i = 0; i < nbOutputs; i++)
{
    outputsVstAudioBuffer[i] = (VstAudioBuffer)bufferEnumerator.Current;
    outputsDirectBuffer[i] = ((IDirectBufferAccess32)outputsVstAudioBuffer[i]).Buffer;
    bufferEnumerator.MoveNext();
}
Aug 3, 2015 at 12:01 AM
Thanks to the both of you!

I've been focusing on a different part of the app, but this discussion is a treasure trove of useful information!

I will look in to implementing this stuff when time permits.
Coordinator
Aug 3, 2015 at 8:16 AM
Edited Aug 3, 2015 at 8:16 AM
@Yury: perhaps a class that contains the VstAudioBuffer and raw float* pair (and a Count?) would be a nice way to keep them together.
public unsafe class MyAudioBufferInfo
{
    public float* Raw {get;}
    public int Count {get;}
    public VstAudioBuffer Buffer {get;}
}
Also, I would not re-assign the VstAudioBuffer array elements like that - I would prefer to use a List<MyAudioBufferInfo> - but it's not wrong...
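Populating it could look roughly like this (sketch only; it assumes you give MyAudioBufferInfo a constructor or settable properties, and vstAudioBufferManager is the manager instance from the code above):
// Sketch: one info entry per buffer, raw pointer taken once via IDirectBufferAccess32.
var bufferInfos = new List<MyAudioBufferInfo>();

unsafe
{
    foreach (VstAudioBuffer buffer in vstAudioBufferManager)
    {
        bufferInfos.Add(new MyAudioBufferInfo(
            ((IDirectBufferAccess32)buffer).Buffer,   // Raw
            buffer.SampleCount,                       // Count
            buffer));                                 // Buffer
    }
}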

Oh, there was also a discussion on Parallel / multi-threading / locking. What Yury said was bang on: VST.NET is NOT thread-safe because those decisions have to be made by your implementation. Any locks VST.NET would add would hurt some scenario. Note that loading VST.NET plugins MUST be single-threaded. So don't do any fancy stuff (parallel directory scanning etc.) when scanning / loading plugins.

[2c]
Marc
Oct 25, 2015 at 2:54 AM
Guys, I'm resurrecting this thread. Using this interface has helped a lot. The accessors Jacobi.Vst.Core.VstAudioBuffer.set_Item and Jacobi.Vst.Core.VstAudioBuffer.get_Item are taking up far fewer of the exclusive samples in the profiler. I'm getting a lot less crackling. However, I still get a little crackling. Am I using the code correctly? Any other tips here?

        public static void Sum(VstAudioBuffer[][] sourceBuffers, int channelCount, int sampleSize, VstAudioBuffer[] summedAudioBuffer)
        {
            //TODO: Are these safe to declare here? Will they get overwritten by each other?
            IDirectBufferAccess32 summedAudioBufferAsIDirectBufferAccess32 = null;
            IDirectBufferAccess32 sourceBufferAsIDirectBufferAccess32 = null;

            Parallel.Invoke
            (
                () =>
                {
                    unsafe
                    {
                        for (int i = 0; i < channelCount; i++)
                        {

                            for (int n = 0; n < sampleSize; n++)
                            {
                                summedAudioBufferAsIDirectBufferAccess32 = summedAudioBuffer[i];
                                summedAudioBufferAsIDirectBufferAccess32.Buffer[n] = 0;

                                for (int x = 0; x < sourceBuffers.Length; x++)
                                {
                                    sourceBufferAsIDirectBufferAccess32 = sourceBuffers[x][i];
                                    summedAudioBufferAsIDirectBufferAccess32.Buffer[n] += sourceBufferAsIDirectBufferAccess32.Buffer[n];
                                }
                            }
                        }
                    }
                }
            );
        }
And
        public static void ProcessInputs(VstAudioBuffer[] inputAudioBuffers, ObservableCollection<IAudioElement> inputElements, VstAudioBuffer[][] workingBuffers)
        {
            //TODO: Are these safe to declare here? Will they get overwritten by each other?
            IDirectBufferAccess32 workingBuffer = null;

            Parallel.ForEach<IAudioElement>(inputElements, currentInput =>
            {
                unsafe
                {
                    var index = inputElements.IndexOf(currentInput);
                    currentInput.ProcessReplacing(inputAudioBuffers, workingBuffers[index]);

                    for (int i = 0; i < workingBuffers[index].Length; i++)
                    {
                        for (int n = 0; n < workingBuffers[index][i].SampleCount; n++)
                        {
                            workingBuffer = workingBuffers[index][i];
                            workingBuffer.Buffer[n] *= inputElements[index].Gain;
                        }
                    }
                }
            });
        }
Oct 25, 2015 at 7:17 AM
I've had a bit of a frustrating day. I've been optimizing like a mofo. According to the profiler, what I am doing is making the code more efficient, because I'm reducing the exclusive % of time spent in my methods. But I'm still getting crackling. Here's one of my methods that has been further optimized. I removed parallel processing because I think it was causing more performance loss than it was gaining.
        public static void ProcessInputs(VstAudioBuffer[] inputAudioBuffers, ObservableCollection<IAudioElement> inputElements, VstAudioBuffer[][] workingBuffers)
        {
            //TODO: Are these safe to declare here? Will they get overwritten by each other?
            IDirectBufferAccess32 workingBuffer = null;

            unsafe
            {
                int index = 0;
                var inputElementsEnumerator = inputElements.GetEnumerator();
                while (inputElementsEnumerator.MoveNext())
                {
                    var currentInput = inputElementsEnumerator.Current;

                    currentInput.ProcessReplacing(inputAudioBuffers, workingBuffers[index]);

                    for (int i = 0; i < workingBuffers[index].Length; i++)
                    {
                        for (int n = 0; n < workingBuffers[index][i].SampleCount; n++)
                        {
                            workingBuffer = workingBuffers[index][i];
                            workingBuffer.Buffer[n] *= inputElements[index].Gain;
                        }
                    }

                    index++;
                }
            }
        }
The question I'm really left with is: how do I know that the crackling is because of performance? Actually, I'm really starting to doubt that it is, because I can load up several instances of VSTs, sum them, and it doesn't crackle. But as soon as I mix a signal with an effect on it (the delay) with a dry signal, it starts crackling. Not sure how to start diagnosing the problem.
Oct 25, 2015 at 7:32 AM
I tested by putting 5 delays in serial which I would expect to cause some performance degradation, but actually no. No crackling. So I'm starting to wonder where else the problem could be. The summing algorithm seems to work fine when I mix two VSTs together. But, not 1 affected, and 1 not affected.
Oct 25, 2015 at 7:39 AM
I think I've proved my point.

Here is the routing that works:
https://onedrive.live.com/redir?resid=47321C0630B57E1D!5322&authkey=!AI7k4B_tAwEYQwY&v=3&ithint=photo%2cpng

And this is what causes crackling (FM8->Output)
https://onedrive.live.com/redir?resid=47321C0630B57E1D!5322&authkey=!AI7k4B_tAwEYQwY&v=3&ithint=photo%2cpng

This isn't a performance problem...
Coordinator
Oct 28, 2015 at 3:52 PM
Why are you not using foreach? The GetEnumerator-while(MoveNext)-Current is exactly what foreach does....

For the rest of the code: I am not sure what you're trying to do here (but then I've never written a host)...?
Oct 28, 2015 at 6:19 PM
The question I'm really left with is: how do I know that the crackling is because of performance?
  • First make sure you are running an optimized release build. Simplify/comment out the DSP code to do the most minimal working processing (e.g. just copy input to output, switch to mono, skip the gain stage, etc.). When you get to a state where the DSP is good, work your way back by adding little bits at a time and testing along the way.
In my experience there are two causes of crackling:
  1. Insufficient time to fill the buffers; in this case the fix is to increase the latency (block size).
  2. Partially filled buffers; in this case you need to review the DSP algorithm to make sure you fill the buffers completely.
Both scenarios will cause crackling because they lead to gaps in the audio buffer used by the audio interface.
Oct 28, 2015 at 6:32 PM
Edited Oct 28, 2015 at 6:32 PM
Not the cause of your problem, but I think you can hoist an assignment out of the inner loop:
                    for (int i = 0; i < workingBuffers[index].Length; i++)
                    {
                            // Moved from inner block
                            workingBuffer = workingBuffers[index][i];

                            for (int n = 0; n < workingBuffers[index][i].SampleCount; n++)
                                 workingBuffer.Buffer[n] *= inputElements[index].Gain;
                    }
I don't see anything obviously wrong here. Maybe the cause is in:
currentInput.ProcessReplacing(inputAudioBuffers, workingBuffers[index]);

My guess is that 'VstAudioBuffer[] inputAudioBuffers' has holes in it.
After all processing is done, check inputAudioBuffers and outputAudioBuffers while audio should be playing.
If you find a huge run of zeros when it's supposed to be playing, then you have holes in the signal.
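A quick and dirty way to check for that from code (a sketch, using the IDirectBufferAccess32 interface Marc pointed out):
// Sketch: find the longest run of consecutive zero samples in a buffer.
static unsafe int LongestZeroRun(VstAudioBuffer buffer)
{
    float* raw = ((IDirectBufferAccess32)buffer).Buffer;
    int longest = 0;
    int current = 0;

    for (int n = 0; n < buffer.SampleCount; n++)
    {
        current = (raw[n] == 0f) ? current + 1 : 0;
        if (current > longest)
            longest = current;
    }

    return longest;   // a run close to SampleCount while audio should be playing = a hole
}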
Oct 29, 2015 at 1:04 AM
Edited Oct 29, 2015 at 2:26 AM
obiwanjacobi wrote:
Why are you not using foreach? The GetEnumerator-while(MoveNext)-Current is exactly what foreach does....
  • Maybe because I used it to iterate in my previous code example?
    My rationale behind it is pretty weak, if not pedantic. The past has shown that the foreach implementation is not exactly contractual (see iterator variable hoisting). Avoiding the syntactic sugar in a critical execution path is an attempt to avoid breaking changes in the code generation of future C# compilers. I tend to avoid 'sugary' syntax in the DSP code. What I write is what I get... well, kind of, but we can help the compiler a little bit and hope :D
Looking back at it, I used it because I wanted to reuse the enumerator across two non-nested loops, like this:
var enumeratorInputOutput = x.GetEnumerator();
ForEach Input Get Next enumeratorInputOutput
ForEach Output Get Next enumeratorInputOutput
Oct 31, 2015 at 12:35 AM
Haha. Yes, Yury - I did copy your code there. I was just experimenting to see what that pattern would look like. I wasn't sure if it was a performance booster or not. I guess now I understand.

Anyway, this was never a performance problem in the first place. I've just diverted the thread accidentally. It turns out that I was outputting from a VST to 2 different things, so ProcessReplacing was getting called twice on the VST within one frame.

I would really like for it to be possible to output to 2 things at once in my synth. I guess I'll have to figure out a way to make a copy of the buffer for multiple outputs.
Nov 4, 2015 at 1:53 AM
Edited Nov 4, 2015 at 1:56 AM
Routing in a modular host can get messy, so it's worth taking the time to have a generic routing algorithm.
It's a computer science 101 problem, so there's probably a documented optimal solution, but I haven't bothered researching it.

Here's my take on it; I'd like to know how I could improve it if you have ideas.
Each plugin with audio inputs can act as a mixer, so it needs its own intermediate mixing buffer.

Before starting playback I order all modules in a tree structure starting from the final outputs (audio card, wave recorder...). I convert this tree to a linear structure (a List), going left to right, bottom to top: every item of the tree at level 5 first, then level 4, etc., up to the last item (the audio card) at level 0.

Each item in that list is a module tree node with pointers to its input and output modules. When playback starts I iterate sequentially over the modules list. I call ProcessReplacing on each module using its intermediate mixing buffer as the input buffer. The ProcessReplacing output buffer is a temporary buffer if there are multiple outputs. After processing I mix/copy the temporary buffer into the intermediate mixing buffers of the node's output modules. When there is only one output I skip the temporary buffer as an optimization and call ProcessReplacing with the intermediate buffer as both input and output. Kind of hard to explain... hope you get the idea.
Coordinator
Nov 4, 2015 at 1:27 PM
Edited Nov 4, 2015 at 1:28 PM
I don't think you need an intermediate mixing buffer...
Some hosts even pass the same buffers for input and output to the plugin. That does mean the plugin must not re-read any input samples it has already written output samples over, but for most plugins this seems to work fine. To be safe you could use a double buffer and switch/swap them for every other plugin.

But in essence just pass the output buffers as input buffers to the next plugin...

[2c]
Nov 4, 2015 at 9:08 PM
I don't think you need an intermediate mixing buffer...
  • That's interesting. By intermediate mixing buffer I mean that, to do generic mixing without calling ProcessReplacing twice on the same plugin, I think I need more than 2 buffers overall. I see how using only 2 buffers is possible if you have a completely linear mixing process. But since in a modular host the user is able to route plugins in any arbitrary (non-linear) fashion, I fail to see how I can achieve the same results for every possible routing.
Let's say you have only an input and an output buffer to work with, so we must reuse them for each plugin's ProcessReplacing call.
Given the following tree, how would you process the modules?
Image

Let's try the left to right bottom to top route:
  • Process module #3, input is empty, output contains #3
  • Process module #4, input is empty, output contains #4, overwriting #3's output
  • When it's time to process module #0 we have lost #3's output buffer
Let's try another route:
  • Process modules #4, #5 and #6; inputs are empty, output is a mix of #4, #5 and #6
  • When it's time to process module #2 we have lost the output of #6
My approach is simply to put mixing buffers in each node. Even using tricks like passing the same buffer as input/output, it seems that eventually I'll run into complex trees that can't be tackled with only 2 buffers while not calling ProcessReplacing multiple times on the same plugin. I've never been too good at these kinds of mathematical (graph?) problems. Perhaps there's something obvious I'm missing. Even then, it seems that an alternative algorithm without mixing buffers would be substantially more complicated. The logic behind my use of intermediate buffers is that the DSP pressure is on the CPU rather than memory, so caching the results in memory is better than calling ProcessReplacing each time and discarding the results.
Coordinator
Nov 5, 2015 at 7:42 AM
Edited Nov 5, 2015 at 7:44 AM
OK, allowing a graph complicates things. I was assuming effect/MIDI plugin chains where mixing only occurs when adding to a bus.

But as a rule you should not have to call a plugin's ProcessReplacing multiple times. You start at the beginning (0?) and process that first. That output goes to 1, 2 and 3 as input. The connection from the input (or is it its output?) of 3 to (the output of?) 1 is a mixing action. You can view mixing as a system plugin with a variable number of inputs. This allows you to have one engine that simply works on ProcessReplacing calls. That output (of the mix) is then used as input for 4, 5 and 6. And so on.

So you always start at the beginning and can call a plugin's ProcessReplacing when ALL its inputs have been processed. When for instance the output of 1 is routed to the inputs of 4, 5 and 6 - you obviously reuse that buffer content. You should prescan the graph to determine the largest number of buffers you will need and pre-allocate those before the audio engine starts. If you want to keep things simple, you allocate a buffer for each node in the graph (times the number of channels).

Disclaimer: this is all theory in my head. I have never actually built something like this...
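But in pseudo-C# the scheduling could look roughly like this (Node, Inputs, Outputs and Process are all illustrative; Process would mix the node's input buffers and call ProcessReplacing once):
// Sketch: process each node only after ALL of its inputs have been processed.
static void ProcessGraph(IEnumerable<Node> allNodes)
{
    var pending = new Dictionary<Node, int>();   // node -> inputs not yet processed
    var ready = new Queue<Node>();

    foreach (Node node in allNodes)
    {
        pending[node] = node.Inputs.Count;
        if (node.Inputs.Count == 0)
            ready.Enqueue(node);                 // sources (instruments) have no inputs
    }

    while (ready.Count > 0)
    {
        Node node = ready.Dequeue();
        node.Process();                          // mix input buffers, call ProcessReplacing once

        foreach (Node downstream in node.Outputs)
            if (--pending[downstream] == 0)
                ready.Enqueue(downstream);
    }
}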
Nov 5, 2015 at 5:48 PM
Ok allowing a graph complicates things. I was assuming effects/midi plugin chains and mixing only occurs when adding to a bus.
  • Yeah, that's a major selling point of modular hosts. Younger people who have not worked in a formal studio environment find it easier to route everything to everything instead of learning bus systems. The only thing they don't get is the way digital audio handles feedback. I decided to block feedback paths in my host; it won't allow you to create a feedback path in the graph. The only alternative I see is to include a delay in the DSP to prevent an infinite loop, and the implementation effort doesn't seem worth it.
The graph was actually upside down, with the final output (0) at the top, but I get your point. I did combine mixing with the buffer copy operation; I hadn't thought of it as a separate ProcessReplacing operation. That could have made the code much simpler for a little bit of overhead.
You should prescan the graph to determine the largest number of buffers you will need and pre-allocate those before the audio engine starts. If you want to keep things simple, you allocate a buffer for each node in the graph (times the number of channels).
  • That's exactly the optimization I'm missing at the moment, as each node has its own buffer. When no mixing is needed I skip using that buffer but it is still allocated. Thankfully I have not noticed memory pressure in the profiler as it is, probably because most of the application code is static and memory pre-allocated.
Thanks for your input.
Nov 5, 2015 at 9:48 PM
Edited Nov 5, 2015 at 9:48 PM
I think I've made this sound more complicated than it actually is.

Yes, my synth uses a graph (or nexus) system for processing. It's very simple. Each element has a ProcessReplacing method (as part of the interface) which is exactly the same as the Vst.Net ProcessReplacing method. Then, in turn, each element has a set of inputs. These inputs are visually displayed as lines on the UI. When ProcessReplacing is called, it subsequently calls ProcessReplacing on all of its inputs, and they in turn call their inputs. If there are multiple inputs, mixing is necessary. If there is only 1 input, no mixing is necessary. People like to call mixing "Summing" because you are adding the amplitudes together. That's why I called my method "Sum". I posted it earlier.

Actually what Obiwan has described in his last post is pretty much exactly what my synth does.

The problem I was having was that an instance of FM8 was the input on two different elements, so ProcessReplacing was getting called on it twice. I supposed that some VSTs might actually handle that scenario, but FM8 doesn't. So I've just got to keep a copy of the buffer, so that if a second element asks for the input from FM8, it just gets a copy of that copy (rough idea below). Shouldn't be a problem to code up. But if it is, I'll just stop users from being able to do it.
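Something like this per element that wraps a VST (just a sketch; _plugin, _cachedOutput and the frame flag are illustrative, and the flag would get reset at the start of every audio frame):
// Sketch: run ProcessReplacing once per frame and hand the cached result to every consumer.
private VstAudioBuffer[] _cachedOutput;      // pre-allocated working buffers
private bool _processedThisFrame;

public VstAudioBuffer[] GetOutput(VstAudioBuffer[] inputs)
{
    if (!_processedThisFrame)
    {
        _plugin.ProcessReplacing(inputs, _cachedOutput);   // _plugin: the wrapped VST's command stub
        _processedThisFrame = true;
    }

    // Consumers that modify the buffer should take their own copy first.
    return _cachedOutput;
}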

You should prescan the graph to determine the largest number of buffers you will need and pre-allocate those before the audio engine starts. If you want to keep things simple, you allocate a buffer for each node in the graph (times the number of channels). That's exactly the optimization I'm missing at the moment, as each node has its own buffer. When no mixing is needed I skip using that buffer but it is still allocated. Thankfully I have not noticed memory pressure in the profiler as it is, probably because most of the application code is static and memory pre-allocated.

Actually, what I've done is keep a set of buffers for each element. They don't ever get refreshed until the buffer size changes. That's uncommon, so I don't think it affects performance. I don't think there's really a need to prescan for the largest number of buffers every time. It sounds like I'm doing what Yury is doing.
Nov 5, 2015 at 10:21 PM
Yeah, seems like we all agree on the same thing. To stay off-topic, Kruddler, how would you deal with feedback in the audio path? I think hosts like Reaktor allow it and add some internal delay to processing, or some audio fade-out, to get out of the infinite loop. I'm still split on this issue; I'd like to implement it but haven't, because it's pretty hard to implement and I doubt there's really a practical need for it.
Nov 9, 2015 at 8:36 AM
Yury, I totally agree. I've thought about this a fair bit. For completeness' sake it would be good to allow for a feedback loop, but under normal circumstances it will not be useful for most users. So I'm going to just validate against it for now.

Saying that, I think I will eventually invent some kind of Max Depth variable that stops processing after the signal has fed back through itself, say, 4 or 5 times. I think it would be a fun feature for users to be able to set up a string of effects in serial and then feed that back on itself. It would create very unpredictable results, and that's what I'm all about.