All About Sampling

Here it is, finally! Part three of the “Tech Talk” series on the subject of high definition audio. In part one we gave a general overview of high definition digital audio and discussed dither. In part two we went “beyond dither” and discussed noise shaped dither and pre-emphasis. We also explored the question of resolution: “how many bits?”. If you are not familiar with these subjects, it may be useful to review the first two installments of this series.

Now that we have an idea about how to best use our available bits, we need to decide which sample rate is best for the transparent reproduction of audio. Let’s start with a brief discussion of the basics of sampling. The French mathematician Joseph Fourier showed that any waveform can be described as a sum of simple sine waves, and on that foundation rests the sampling theorem (usually credited to Harry Nyquist and Claude Shannon), which states that if you take enough discrete samples of a bandlimited waveform, you can exactly reproduce it by reconstructing those samples. At regular intervals, one takes a digital “snapshot” of the voltage of a waveform at that particular instant and stores it as a binary value. The sampling theorem tells us that if we take enough of these samples, we’ll be able to accurately reproduce the waveform. So the question is, “how many is enough?”. The most basic answer is the Nyquist criterion: you must sample at twice the rate of the highest frequency you wish to reproduce. (For those really into the details, the theorem states that the sample rate must be greater than twice the desired bandwidth, but the “greater than” part is often omitted in conversation.) Therefore, if you wish to reproduce the commonly accepted range of human hearing, 20Hz to 20kHz, you sample at 2 X 20kHz, or 40kHz - 40,000 samples per second. At the CD standard sample rate of 44.1kHz, the system takes 44,100 samples every second. On the face of it, this seems like it would be enough according to the Nyquist criterion, and in a perfect theoretical world it may be, provided you believe that ultrasonic frequencies are undetectable by humans, and that they don’t affect the audible band. But even if you accept all this, it is, of course, not that simple. There are many other real world implementation problems working against us. The first is the problem of aliasing.
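For the curious, the reconstruction half of the theorem can be sketched in a few lines of Python. All the numbers below (a 1kHz tone, an 8kHz sample rate, a tenth of a second of samples) are arbitrary choices for the demonstration: each stored sample contributes one scaled sinc pulse, and summing those pulses rebuilds the original waveform between the sample instants.

```python
import numpy as np

fs = 8000            # sample rate (Hz), comfortably above the Nyquist rate for 1 kHz
f0 = 1000            # test tone frequency (Hz)
T = 1.0 / fs

# Take 0.1 s of discrete "snapshots" of the waveform.
n = np.arange(800)
samples = np.sin(2 * np.pi * f0 * n * T)

# Whittaker-Shannon reconstruction: each sample contributes one sinc pulse.
# Evaluate the rebuilt waveform *between* the original sample instants,
# well away from the edges of our finite window of samples.
t = np.linspace(0.045, 0.055, 81)
recon = np.sinc((t[:, None] - n * T) / T) @ samples

# With enough samples on either side, the reconstruction matches the
# original continuous waveform to within a small truncation error.
error = np.max(np.abs(recon - np.sin(2 * np.pi * f0 * t)))
```

In an actual converter this summing of sinc pulses is performed by the reconstruction filter, which we will meet below.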

Aliasing and Filters

Let’s look at our example of wanting to reproduce 20Hz to 20kHz by sampling at 40kHz. Real world music has content above 20kHz, even if we grant for the moment that it is unimportant to human perception (which is a whole other debated subject). So does sampling just ignore anything above 20kHz? Unfortunately not. For mathematical reasons, the spectrum is “folded down” around the Nyquist frequency, so 21kHz is reproduced as 19kHz, 22kHz is reproduced as 18kHz, and so on. You can see why this is a problem! The answer is to limit the bandwidth of the incoming signal so that no content above the Nyquist frequency reaches the converter, preventing these “aliases” from appearing. This, of course, is called the anti-aliasing filter.
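A quick numerical sketch makes the fold-down easy to see. In this hypothetical setup (Python, a 40kHz sample rate, and deliberately no anti-aliasing filter in front of the converter), a 21kHz tone is sampled and its captured spectrum examined:

```python
import numpy as np

fs = 40_000          # sample rate (Hz): the Nyquist frequency is 20 kHz
f_in = 21_000        # input tone (Hz), just above the Nyquist frequency
N = 4000             # 0.1 s of audio, giving an FFT bin spacing of 10 Hz

t = np.arange(N) / fs
x = np.sin(2 * np.pi * f_in * t)   # sample the 21 kHz tone with no filtering

# Find the strongest frequency actually present in the captured audio.
spectrum = np.abs(np.fft.rfft(x))
peak_hz = np.argmax(spectrum) * fs / N

# The 21 kHz input has "folded down" around 20 kHz and appears at 19 kHz.
```

The tone has not been ignored; it has been misfiled, landing squarely in the audible band as a 19kHz alias.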

The unfortunate reality of building a filter is that it cannot be “ideal.” In other words, you can’t just have all the frequencies below 20kHz unaffected, and anything greater than that eliminated. There are three parts to a filter’s response: the passband, where all frequencies are passed; the stopband, where the frequencies can’t get through (i.e. the area of maximum attenuation); and the transition band (rolloff). The rolloff is a slope between the passband and stopband, so the frequencies close to the passband pass more easily than the frequencies close to the stopband. So the first problem we need to account for is the space it takes for the filter to reduce the output to nothing. This is one major reason why CD uses 44.1kHz instead of 40kHz: to allow space for the rolloff without attenuating the desired passband. The second problem is that of filter ripple. Ripple is a deviation from the flat amplitude response of an ideal filter. On either side of the rolloff region, some frequencies are passed a little louder, and some a little softer, than the flat input. As you try to implement a steeper filter (with a smaller rolloff region), the ripple gets worse, the filter becomes more difficult and expensive to build, and the third problem with filters becomes even more pronounced and problematic: phase distortion. Depending on the action of the filter, certain frequencies propagate through the filter more quickly than others. Some expensive digital EQs currently offer an option called “linear phase,” which delays all frequencies by the same amount (there are also analog designs, both approximately linear phase and minimum phase). A purely linear phase shift isn’t really distortion at all: it just introduces a short, uniform delay into the signal. The larger problem is what we find in more than 99% of the other filters in the world. Nonlinear phase distortion, usually discussed in terms of “group delay,” causes certain frequencies to lag behind others, and the more aggressive the filter, the worse the potential distortion.
In a 44.1 kHz system, the passband ripple and group delay have an impact on the sound quality, and the cheaper the filter, the worse the problem usually is. The last problem, at least with analog filters, is that you have resistors, capacitors, amplifiers, and diodes whose characteristics can drift as they heat or cool, with age, or from one production lot to another, and this unpredictability must be taken into account in the filter design. So even if you are of the opinion that the information in music above 20 kHz is completely unimportant (and I’m not sure I totally agree with this), it becomes obvious that a 44.1 kHz sample rate is inadequate for transparent audio reproduction based solely on the real world problems associated with the anti-aliasing filter. It should be mentioned that one way the current state of the art addresses filter problems is with oversampling. If one initially samples at a rate that is several multiples higher than the target rate for the audio, one can relax the analog filter requirements in the initial sampling, then convert this high rate audio to the desired sample rate using linear phase, equiripple digital filters. It’s not a perfect solution, but oversampling and current converter design is a subject unto itself, so we’ll leave it there for the moment.
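To make the group delay idea concrete, here is a minimal numerical sketch. It uses a simple one-pole recursive lowpass as a stand-in for a real filter (a real anti-aliasing design is far steeper, and the coefficient here is just an illustrative choice), computes its phase response, and takes the slope of that phase to get the delay at each frequency:

```python
import numpy as np

# A one-pole lowpass, y[n] = (1 - a) * x[n] + a * y[n - 1], as a crude
# stand-in for the recursive, non-linear-phase filters discussed above.
a = 0.9

w = np.linspace(0.001, np.pi, 2000)            # frequency axis (radians/sample)
H = (1 - a) / (1 - a * np.exp(-1j * w))        # frequency response of the filter

phase = np.unwrap(np.angle(H))
group_delay = -np.diff(phase) / np.diff(w)     # -d(phase)/d(omega), in samples

# The delay is far from constant across the band: the lowest frequencies
# lag roughly nine samples behind the highest ones. That frequency-dependent
# lag is exactly the nonlinear phase ("group delay") distortion in question.
delay_low = group_delay[0]                     # delay near DC
delay_high = group_delay[-1]                   # delay near the Nyquist frequency
```

Steeper recursive filters only exaggerate this spread, which is why the aggressive brickwall filters a 44.1kHz system demands are so troublesome.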

While on the topic of filters, there is another at work in digital audio: the anti-image (reconstruction) filter. While anti-aliasing filters are often analog, or combinations of analog and digital (as in today’s common oversampling converters), anti-imaging filters are more often digital. These filters are similar to the anti-aliasing filter, but their main purpose is to eliminate the “stairsteps” from the D/A process. All of the same problems exist in these filters, including ripple and group delay. One problem that is less pronounced is the unpredictability of analog filter components. However, another problem creeps in: pre-ringing. A steep linear phase low pass filter has a symmetric impulse response, so part of its output actually arrives before each sharp transient. You get a tiny burst of signal ahead of the transient, which leads to a blurring of the transient. So the simplest reason to raise the sampling rate is to make the filters less steep, easier to implement, cheaper to make, and less obtrusive to the audible band.
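Pre-ringing is easy to demonstrate numerically. The sketch below (the filter length, cutoff, and test signal are all arbitrary demonstration choices, not taken from any real converter) builds a linear phase lowpass from a truncated sinc and feeds it a step, a crude stand-in for a musical transient. The output visibly dips and oscillates before the transient ever arrives:

```python
import numpy as np

# A linear-phase FIR lowpass: a truncated sinc, symmetric about its center.
numtaps = 101
fc = 0.1                                     # cutoff as a fraction of the sample rate
n = np.arange(numtaps)
h = 2 * fc * np.sinc(2 * fc * (n - (numtaps - 1) / 2))

# The test "transient": silence, then a step up to full level.
x = np.concatenate([np.zeros(200), np.ones(200)])
y = np.convolve(x, h)

# Because the impulse response is symmetric, the filter rings *before*
# the edge: the output swings below zero ahead of the transient itself.
edge_out = 200 + (numtaps - 1) // 2          # where the step lands in the output
pre_ring = np.min(y[:edge_out])              # most negative value before the edge
```

A gentler filter (shorter, with a wider transition band) rings less, which is the same argument for higher sample rates made above.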

Hearing and Sample Rates

Let’s look at some of the other factors concerning higher sample rates. The most obvious one is the debated question of whether sound above the commonly quoted human auditory perception limit of 20 kHz is significant or not. There are tests showing that some younger people can hear sine waves in air up to 24 kHz if reproduced loudly enough, so that alone suggests needing higher sampling rates, at least for some listeners. Other tests show that bone conduction provides perception out to 90 kHz, though the sound is often heard as being between 8 and 16 kHz. This suggests to some that a distortion process is at work, and to others that human hearing has not been sufficiently studied to conclude that ultrasonic sounds are irrelevant. Some also point out that we are concerned with sound in air, not direct bone conduction, as sound in air is how we listen to music. There have also been studies of the possibility of sensing ultrasonic frequencies with our skin. Certainly, ultrasonics at high levels can cause dizziness and nausea, and thus there are workplace limits on how high ultrasonic sound levels can legally be. There has also been plenty of anecdotal evidence, and some tests that are only semi-scientific, but compelling nonetheless. Rupert Neve does a test where he changes sine waves to square waves with high fundamentals, and people can hear the difference when they theoretically should not be able to, as the only difference is in harmonics that are above the commonly accepted audible range. He also tells a story of Geoff Emerick correctly pointing out a couple of improperly terminated channels just by listening to the console output, when the differences were only a few dB down at around 50 kHz. In both cases, there may be other distortions at work that explain the differences heard, but it remains interesting nonetheless.
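Neve’s sine-versus-square point can be mimicked numerically. In the sketch below (an 8kHz fundamental and a 192kHz working rate are my own choices for the demonstration, not Neve’s actual test conditions), a square wave is built from its odd harmonics and then brickwall filtered at 20kHz. All that survives is a pure sine at the fundamental, confirming that below 20kHz the two signals are identical except in level, and differ only in their ultrasonic content:

```python
import numpy as np

fs, f0, N = 192_000, 8_000, 1920            # 80 exact periods of an 8 kHz tone
t = np.arange(N) / fs

# Build the square wave from its Fourier series (odd harmonics at 4/(pi*k)),
# keeping only harmonics below fs/2 so the construction is alias-free.
square = np.zeros(N)
for k in range(1, int(fs / 2 / f0) + 1, 2):
    square += (4 / (np.pi * k)) * np.sin(2 * np.pi * k * f0 * t)

# Brickwall at 20 kHz: zero every FFT bin above 20 kHz.
spectrum = np.fft.rfft(square)
freqs = np.fft.rfftfreq(N, d=1 / fs)
spectrum[freqs > 20_000] = 0.0
filtered = np.fft.irfft(spectrum, n=N)

# Only the 8 kHz fundamental sits below 20 kHz, so what remains is a pure
# sine: the square wave's "squareness" lived entirely above the audible band.
residual = np.max(np.abs(filtered - (4 / np.pi) * np.sin(2 * np.pi * f0 * t)))
```

So if listeners reliably distinguish the two signals, either ultrasonic content matters in some way, or some downstream distortion is folding it into the audible band.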
It has also been pointed out that a trumpet with a harmon mute has a harmonic near 50 kHz which is close to the amplitude of the fundamental, so the argument that the upper harmonics are always too low in level to matter is not entirely accurate. It seems, then, that there is sometimes significant energy above 20 kHz, and that ultrasonics may in some way be perceptible to humans, or may possibly have some effect on what’s in the audible band. The jury may still be out, but it seems reasonable to make some effort to leave a margin of safety in our chosen sample rate. With these things in mind, many people adopt a “better safe than sorry” attitude and shoot for the sky where sample rates are concerned. One problem with this approach is that the storage space and the available rate of transfer from a storage medium have practical limits, thus reducing the number of audio channels or other related data (pictures, video, text, etc.) that can be included, and requiring more DSP power (thus more money) to deal with these large data requirements. It makes little sense to waste available resources for no reason.

Even if you discount the contested evidence on human perception of ultrasonic frequencies, to ensure coverage of the entire population you still need to cover a 24 kHz bandwidth according to the studies, plus leave room for gentler filter slopes, and a bit of space to ensure that the filters won’t have audible artifacts due to ripple. At the very least, then, you need 60 - 64 kHz sample rates according to most studies and industry task groups. Interestingly, the committee on sample rate in the ’70s had suggested a 60 kHz sample rate, but for practical reasons having to do with the available technology at the time, the 44.1 and 48 kHz rates were settled upon. And the last advantage of a higher sample rate, which was mentioned in the second installment of this series, is that you gain flexibility with noise shaped dither: you can push more of the dither energy higher in the spectrum, thus improving low level detail in the critical bands.

One additional thing that should be mentioned is that it has been shown that at the standard CD sample rate, two channels arriving even a sample out of sync can be detected by our brains as they compare the signals from our left and right ears. This is of concern, especially when you add four more channels for 5.1 surround sound. According to some experts like Bob Stuart and Tom Holman, the resolution of the sampling on the time axis can potentially be improved. This should not be confused with the fallacy that a higher sample rate provides a more detailed representation of a complex wave within the bandwidth of a system. It pertains only to the potential for better imaging among multiple channels, whether stereo or surround. Higher sample rates could theoretically give you an advantage in capturing comparative time axis resolution among multiple channels, resulting in better and sharper imaging.
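Some quick arithmetic puts “a sample out of sync” into physical terms (the 343 m/s figure for the speed of sound in air is approximate, at room temperature):

```python
# How much time, and how much distance in air, one sample represents.
fs = 44_100                 # CD sample rate (Hz)
c = 343.0                   # approximate speed of sound in air (m/s)

period_us = 1e6 / fs        # one sample period, in microseconds
shift_mm = 1e3 * c / fs     # distance sound travels in one sample period (mm)

# Roughly 22.7 us per sample, or about 7.8 mm of acoustic path: a channel
# that is one sample late behaves like a speaker moved about 8 mm farther away.
```

Interaural timing differences smaller than that are audible as image shifts, which is why whole-sample misalignment at 44.1kHz is a legitimate imaging concern.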

We’ve seen that there are numerous advantages to increasing the sample rate and bit-resolution of our digital audio systems. The actual increases required remain debated, and in practical, real world situations, we have to make decisions based on our bit budget as well as our desire for the best possible quality. How much storage space do we have? How fast can we read data from the delivery medium? What do we want to fit on the disc: how many audio channels, video, still images, text, animations? If we educate ourselves as to the drawbacks, benefits, or lack of benefits of certain things, we can make informed decisions for our project, resulting in the best quality from the available resources.
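As a starting point for that budgeting, the raw PCM data rate is simply channels times sample rate times bits per sample. A couple of hypothetical configurations (the 5.1 example is my own illustrative choice, not any particular format’s specification):

```python
# Rough "bit budget" arithmetic for uncompressed PCM audio.
def pcm_bits_per_second(channels, sample_rate, bits):
    """Raw data rate in bits per second for linear PCM."""
    return channels * sample_rate * bits

cd = pcm_bits_per_second(2, 44_100, 16)         # stereo CD audio
hi_res = pcm_bits_per_second(6, 96_000, 24)     # hypothetical 5.1 at 96 kHz / 24-bit

# cd     -> 1,411,200 bits/s (about 1.4 Mbit/s)
# hi_res -> 13,824,000 bits/s (about 13.8 Mbit/s, nearly ten times the CD rate)
```

Multiply by running time and the storage cost of “shooting for the sky” becomes obvious, which is why these choices have to be made deliberately.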

I know I originally promised to talk in this installment about all of the acronyms inundating us lately, but sampling was too big an issue! You’ll have to wait until next time to get the basics on DVD-A, DVD-V, DTS, DD, AC-3, MLP, and a bunch of other letters that concern modern audio professionals.

Copyright 1998 Jay Frigoletto