Understanding Samples

Topics: Audio, Newbie, Plugin Development, VST.NET Core
Dec 14, 2015 at 9:29 AM
Just to clear things up: in this case, when I say "sample" I mean one of the stream of numbers that is eventually translated into audio, as opposed to the samples in a wave file.

So, is my understanding correct?
  • A sample is a number that represents amplitude.
  • Audio over time is represented by a stream of samples.
  • VST.NET represents samples with the .NET type float.
But a float is a 32-bit number, and my audio card only deals with 16-bit and 24-bit numbers. How does the conversion work? How is a 32-bit float eventually converted to the 16-bit or 24-bit number that the sound card outputs?

I ask this because I am trying to use some code that manipulates a stream of samples, but the comments say the data it expects is:

The data passed to the routine in indata[] should be in the range [-1.0, 1.0)

So, how do I transform the samples within VST.NET to fit the buffer it is asking for?

Sorry if this sounds like an absolute DSP for noobs question. It's because I'm a noob.
Coordinator
Dec 14, 2015 at 11:33 AM
Yes, a sample is a single value (float, or double for extra precision) in the [-1.0, 1.0] range that represents the amplitude of the audio signal. The sample rate indicates how many of these values there are for every second.

If your audio library supports Int16 (or UInt16?) you have to map the values, so that the full range of -1.0 -> 1.0 represents the full range of Int16.MinValue -> Int16.MaxValue. You can map these values linearly.
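The linear mapping above can be sketched as follows. This is an illustrative sketch in Python, not VST.NET code; the function names are made up for this example. One wrinkle: the Int16 range is asymmetric (-32768 to 32767), so scaling by 32768 and clamping the top end is a common convention.

```python
# Linearly map a float sample in [-1.0, 1.0] to a signed 16-bit
# integer and back. Function names are hypothetical, for illustration.

INT16_MIN, INT16_MAX = -32768, 32767

def float_to_int16(sample: float) -> int:
    # Clamp first so out-of-range input cannot overflow Int16.
    sample = max(-1.0, min(1.0, sample))
    # Scale by 32768 (the magnitude of Int16.MinValue); clamp the top,
    # since +1.0 * 32768 would exceed Int16.MaxValue by one.
    return min(INT16_MAX, int(round(sample * 32768.0)))

def int16_to_float(value: int) -> float:
    # Inverse mapping back into [-1.0, 1.0).
    return value / 32768.0
```

Note the division by 32768 on the way back, which matches the "[-1.0, 1.0)" range mentioned in the question: 32767 maps to just under 1.0.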

Check out the download page, there is a converter sample project that implements some methods of converting these samples.
There are also several other threads in the discussion board about this subject.

Hope it helps,
Dec 14, 2015 at 7:10 PM

Thanks. Yes, that's roughly what I thought. The audio library I'm using is the same: floats in [-1, 1].

Out of curiosity, is float used because we never know the bit depth of the audio card? I mean, if Int16 were used, it wouldn't have high enough resolution for, say, 24-bit audio cards?

Is float the generally accepted standard? Does 16 bits' worth of resolution exist between -1 and 1?

Sorry about all the obvious questions. I just need to make sure I understand the fundamentals.




Coordinator
Dec 14, 2015 at 7:57 PM
The float is a choice by Steinberg, who designed the VST interface(s). It could have been something else, but they decided on float in a -1.0 to 1.0 value range.
A float uses 32 bits to represent its value. I don't know the technical details (I'm sure Wikipedia can help out there), but in practice it's more than enough resolution.
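To make "more than enough resolution" concrete: a 32-bit IEEE 754 float has a 24-bit significand, so every possible 16-bit integer sample maps to a distinct float in [-1.0, 1.0] and survives the round trip without loss. A quick sketch (illustrative Python, using `struct` to force true 32-bit float precision):

```python
# Demonstrate that float32 can represent all 65536 Int16 sample
# values exactly: float -> int16 -> float round-trips are lossless.
import struct

def to_float32(x: float) -> float:
    # Round-trip through 4 bytes to truncate to 32-bit float precision.
    return struct.unpack('f', struct.pack('f', x))[0]

# Every possible Int16 sample survives the round trip.
lossless = all(
    round(to_float32(i / 32768.0) * 32768.0) == i
    for i in range(-32768, 32768)
)
```

The same argument covers 24-bit samples, since 24 bits still fit inside the float's significand.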
Dec 14, 2015 at 8:56 PM

It got too technical so now I'm going to listen to the album "Floating Point" by John McLaughlin as per Wikipedia's recommendation.

Thanks though! That has cleared things up. I only ask because users tend to fixate on stuff like this. There are religious levels of faith that people put in certain DAWs and VSTs, and an equal amount of disdain for others. All unwarranted, I'm sure. I'm always looking for objective ways to explain things to people: i.e. numbers are numbers, bits are bits.


Dec 14, 2015 at 10:27 PM
Wasn't the VstAudioPrecisionBuffer intended for 64-bit precision (double) and the VstAudioBuffer intended for 32-bit precision (float)?
Dec 15, 2015 at 12:30 AM
Edited Dec 15, 2015 at 12:37 AM
The simplest way to understand what a sample is, IMO, is to think of its direct effect on a physical speaker cone.
The cone can move backward and forward X times per second. That X is the sample rate: with a 44.1 kHz sample rate you are moving the cone 44100 times per second. The motion range of the cone is equivalent to the range of your sample, [-1, 1], and the sample is the position in that interval where the speaker should move to. The difference between 64-bit (double samples) and 32-bit (float samples) is the precision of the speaker position. If you had 1-bit audio (0 or 1), the speaker could only be at min/max, or move one step toward min/max from where it currently is. This is usually not practical, but it works if the sampling rate is very high. Some new hardware supports that: the sample value is not an absolute position in the range [-1, 1], it's instead a relative value that says move one step down or up from the current position.
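The "1-bit, move up or down" idea in the paragraph above is essentially delta modulation, and a toy sketch makes it easy to see why a very high rate is needed. This is illustrative Python, not any real codec; the step size and signal are arbitrary choices for the demo:

```python
# Toy delta modulation: each 1-bit sample nudges the reconstructed
# "cone position" by a fixed step, rather than giving an absolute
# position. Step size and test signal are arbitrary for illustration.
import math

def delta_modulate(signal, step=0.05):
    position, bits = 0.0, []
    for target in signal:
        bit = 1 if target > position else 0   # move toward the target
        position += step if bit else -step
        bits.append(bit)
    return bits

def delta_demodulate(bits, step=0.05):
    position, out = 0.0, []
    for bit in bits:
        position += step if bit else -step
        out.append(position)
    return out

# A slow sine (many samples per cycle) so the staircase can keep up.
sine = [math.sin(2 * math.pi * t / 200) for t in range(400)]
rebuilt = delta_demodulate(delta_modulate(sine))
```

If the signal changes faster than one step per sample, the staircase can't keep up, which is why this scheme only works when the sample rate vastly exceeds the audio bandwidth.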

Working on sample amplitudes (known as the time domain) is a pretty easy concept; audio becomes complicated when frequencies are involved (known as the frequency domain). To work on frequencies, changing the amplitude of a single sample is insufficient: you have to look at a bigger window of samples and perform complex math to interpret the frequencies (this involves the Fast Fourier transform and convolution).
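To see the time-domain/frequency-domain distinction in miniature, here is a naive discrete Fourier transform (deliberately the slow textbook version, not a fast FFT) picking the dominant frequency bin out of a window of samples. Illustrative Python; the window size and tone are arbitrary:

```python
# Naive DFT over a window of samples: each bin k measures how much
# of "k cycles per window" is present in the signal.
import cmath, math

def dft(samples):
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
            for t in range(n))
        for k in range(n)
    ]

# A pure tone completing exactly 5 cycles in the window should
# dominate bin 5 of the spectrum.
n = 64
tone = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
spectrum = dft(tone)
peak_bin = max(range(n // 2), key=lambda k: abs(spectrum[k]))
```

Note that no single sample "contains" the frequency; only the whole window does, which is exactly why frequency-domain processing needs a block of samples rather than one.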
Dec 15, 2015 at 1:50 AM
Bonus: the microphone is the inverse process. Sound waves put pressure on a foil; the position of the foil is measured X times per second (the sampling rate), and the precision of the measured position is determined by the number of bits. Not technically accurate at the hardware level, but as a basic concept I find it easy to understand.

It's similar to image sampling. When you had 1-bit color you could only do black and white; then a 2-bit palette gave 4 colors. At that time monitors were very low resolution. Now imagine if, back then, they had 4-bit color (sample bits) with 256K resolution (sample rate) on a 14" screen. With a setup like that, you can use only 4 bits for fully saturated red/green/blue and achieve a great result, because the resolution (sample rate) is so high that your eyes blend the low-precision pixels together. The same happens with audio: sample rate and sample precision are correlated in the end result, with sample rate being the more important variable, as it can compensate for a lack of precision. Practically, once sufficient precision is reached (arguably 16 bits), better-sounding results are achieved by increasing the sample rate; precision is useful in scenarios where you are going to do a lot of math processing on the audio, like mastering.