There is a single very annoying thing about lots of audio software products which is due to either lack of the programmers' knowledge about the human auditory system, laziness, or even both. If you are a programmer who could ever be involved — even remotely — in the development of a software or hardware product involving sound, please read this text carefully, burn its core message into your memory and spread the news!
Because people have so little time nowadays, here is the essence of this text, compressed into a few sentences:
If you want to know more, read on. Otherwise, read the above list again and make sure you'll never forget it.
Most audio software nowadays has sliders or even rotating knobs to control the volume. The intention is to simulate the sliders of ‘classic’ audio hardware. Unfortunately, there is one thing about a lot of volume sliders which makes them a pain in the ass: they are LINEAR. You might ask, what could possibly be wrong with a linear slider: it is zero at the one end, 100% at the other end, and neatly linear in between, isn't that just ideal? The answer is a big no.
Just try this: open your favourite audio player, start playing a song, grab the volume slider, and wobble it to and fro at the ‘loud’ end of the volume range. Next, do the same at the ‘silent’ end of the volume range. Chances are that you will experience the following: almost no audible volume variations at the ‘loud’ end, and extreme volume variations at the ‘silent’ end even if you made smaller excursions with the latter. In that case you can be pretty sure the slider is linear.
A few popular applications that suffer from this flaw at the time of this writing, are:
The evil has even spread to hardware. Velleman sells a solderable kit of a graphic equaliser, K4302. I don't know if this has been corrected now, but when I bought the kit around 1995 it had linear sliders while they should be logarithmic (C law if I'm correct). Even the G3 iMac's volume control was linear, and I'm afraid that this is just one of many examples. The result is that the most silent volume setting is still way too loud even if the volume increments are small, and the perceived maximum volume level is already reached around the middle of the slider. Ultimately this leads to frustrated people cursing the damn volume control, or feeling uneasy while using your product without really knowing why. Luckily there are lots of products with correct volume controls, but I have the feeling that they are only a minority.
Now what exactly is wrong with a linear volume slider? The answer lies within the way our ears perceive sound. The point is that our sensation of ‘loudness’ is LOGARITHMIC.
This means that with silent sounds, we are much more sensitive to small variations in amplitude than with loud sounds. This allows us to cope with a very large dynamic range of sound amplitudes. It also means that with a linear volume slider we have a logarithmic sensation of volume variations, and that just doesn't feel right. At the right you can see a logarithmic curve. Two identical sections are marked on the horizontal axis (read: the volume slider). The vertical axis shows the perceived volume changes. As you can see, the corresponding section marked by the curve at the ‘silent’ end is much larger than at the ‘loud’ end.
The solution to implementing a real volume slider is fairly simple: instead of being linear, the slider should be EXPONENTIAL. Because log(exp(x)) = x, the sensation of volume variations will be linear, and that is what we want(2).
In this text I will assume that the amplitude of the audio hardware is controlled by giving it a value between zero (silence) and 1.0 (maximum).
Exponential functions however have two annoying properties. The first is that they only reach zero at minus infinity. One can't make a slider that is infinitely long (but as will be explained below, in an ideal setup there is no need to reach absolute zero). The second is that in the general form y = a·exp(b·x)+c, an exponential function going through two points can have various shapes. Even a linear function is a limit case of such a curve. Luckily in the case of our volume control, we can and should limit the equation to y = a·exp(b·x) because our ears do not have an offset. This means that two points suffice to obtain a unique solution for the constants a and b. We already know one of those points, because we want the function to have a value of 1 for x = 1. This means that a = 1/exp(b). So the problem is reduced to determining the correct value of b, which controls the shape of the curve. Large values produce a very ‘sharp’ curve while small values produce a flatter, more linear-like curve.
If you are still thinking linearly you might be tempted to pick (0,0) as the second point, but that would be wrong. As I said above, our exponential volume control will inevitably still have a non-zero amplitude at the zero slider position. This is not a problem because the logarithmic response curve of our ears also hits zero below a certain non-zero input loudness, the hearing threshold. In fact a sound played in any normal environment will already become inaudible when its loudness is below that of the background noise, which is nearly always well above the hearing threshold. The major problem is that even though the hearing threshold is roughly the same across different people, the loudness produced by any audio system for a given signal amplitude depends on a multitude of parameters. To determine the correct value for b, we need more information. If we want to provide the user with a “fully linear volume control sensation”, we would need to know how ‘loud’ their audio equipment plays at its loudest setting. You'll immediately understand that this is not a practical question. There simply is no specific answer to it unless you are developing software for some very specific audio hardware. So we will have to make some assumptions. First something about how sound ‘loudness’ is measured.
Because the human auditory system has a logarithmic sensitivity curve, a special unit of ‘sound loudness’ was invented. The unit is the deciBel, abbreviated to dB. Actually the original unit is the Bel but this unit is so large that it is always used with a factor 1/10, hence deci-Bel (1 Bel = 10 dB). There are two kinds of dB scales, an absolute scale and a relative scale.
The absolute scale tries to give an indication of how loud a certain sound is perceived by an average human listener, aka the “sound pressure level” (SPL). There are some variations on this scale, but the most widely used one is the “dB(A)” scale. To determine the dB(A) value for a certain sound, the sound has to be filtered through a filter which corresponds to the frequency response curve of the “average human”. Next, the 10-base logarithm of the power is taken and the result is multiplied by 10. I will not go into more detail on this because it is not of much use here. What you should know is that the most silent audible volume level (the ‘hearing threshold’) corresponds to 0dB(A). In practice, people will already perceive 30dB(A) as silence because that is about the background noise level in many environments. Being in an environment with 0dB(A) is actually a weird experience. The loudest volume level (the ‘pain threshold’) is about 120dB(A). A classical orchestra can produce about 94dB(A). Note that because of the logarithm, multiplying the power of a sound by a factor of 10 means adding 10 to the dB(A) value.
The relative scale is used for all kinds of physical quantities, and indicates the relative amplitude of a signal compared to another. The symbol is simply “dB”. The calculation of the dB value depends on whether you are working with amplitude values or power values. For power values, the formula is 10·log10(x), with x the relative power. For amplitude values, the formula is 20·log10(x). The reason is that power ∝ amplitude², and the square (second power) becomes a factor two after taking the logarithm.
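As a rough illustration of the two formulas, the sketch below converts ratios to dB and then applies the amplitude formula to a linear slider, where the amplitude simply equals the slider position. The function names are made up for this example.

```python
import math

def power_to_db(ratio):
    """Relative level in dB for a power ratio: 10*log10(x)."""
    return 10.0 * math.log10(ratio)

def amplitude_to_db(ratio):
    """Relative level in dB for an amplitude ratio: 20*log10(x)."""
    return 20.0 * math.log10(ratio)

# Linear slider (assumption for this example): amplitude == slider position.
# Two moves of the same length, one at each end of the slider.
loud_step_db = amplitude_to_db(1.0 / 0.9)   # slider moved from 0.9 to 1.0
quiet_step_db = amplitude_to_db(0.2 / 0.1)  # slider moved from 0.1 to 0.2

print(f"0.9 -> 1.0 changes the level by {loud_step_db:.2f} dB")
print(f"0.1 -> 0.2 changes the level by {quiet_step_db:.2f} dB")
```

The same slider travel gives less than 1 dB at the loud end and about 6 dB at the silent end, which matches the wobble experiment described earlier.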
Theoretically, the absolute and relative scales cannot be readily interchanged. When taking a sound of 90dB(A) and attenuating it to −20dB, there is no guarantee at all that it will be perceived exactly as 70dB(A). But in practice it will be an OK approximation, therefore in this text I consider the scales as compatible.
Now that we know more about the dB scale, we can go back to our problem of determining a good b value in a·exp(b·x). We should make sure the curve gives the listener a near-linear loudness experience. To start with, there is little to no point in going below 30dB(A) because the background noise in any realistic environment will be around that level. So we should consider 30dB(A) as the threshold instead of 0dB(A).
Now let us assume that the maximum loudness that can be produced by the user's equipment is 90dB(A). This is quite loud already and people will generally not want to expose themselves to more than 90dB(A) for a prolonged period anyway. Built-in speakers in PCs and laptops may not even be able to reach this level, but earphones and headphones as well as Hi-Fi or PA systems can easily exceed it.
We now know two points of our y = a·exp(b·x) curve, namely: (0, 30dB(A)) and (1, 90dB(A)). If we move to relative units, this translates to (0, −60dB) and (1, 0dB) when working with the usual convention of attenuation levels. If we offset this by 60dB we get (0, 0dB) and (1, 60dB), making our calculations somewhat more intuitive. Given that we work with amplitudes, 60dB is 10^(60/20) = 1000 times the amplitude of 0dB. The amplitude at x = 1 must therefore be 1000 times the amplitude at x = 0, hence 1000 = exp(b·1) and b = ln(1000) = 6.908. The value of a is simply 1/1000.
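The derivation above can be sketched in a few lines. This is only a minimal example under the 60dB assumption made in the text; the names `slider_to_amplitude`, `A` and `B` are invented for the illustration.

```python
import math

# Assumption from the text: 90 dB(A) maximum, 30 dB(A) background noise.
DYNAMIC_RANGE_DB = 60.0

B = math.log(10 ** (DYNAMIC_RANGE_DB / 20))  # ln(1000), about 6.908
A = 1.0 / math.exp(B)                        # 1/1000

def slider_to_amplitude(x):
    """Map a slider position x in [0, 1] to an amplitude in [0.001, 1]."""
    return A * math.exp(B * x)

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    ampl = slider_to_amplitude(x)
    print(f"x = {x:.2f}  amplitude = {ampl:.5f}  level = {20 * math.log10(ampl):6.1f} dB")
```

Each quarter of the slider travel changes the level by the same 15dB, which is exactly the "linear sensation" the curve is meant to give.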
Now we have a practical curve which should produce an agreeable result in most situations. Theoretically, the lowest position on the slider should correspond to 30dB(A), the level at which any sound becomes masked by background noise. Although this means there is no real need to force the output to zero, in practice this is desirable because people expect absolute silence at the zero setting, and this is not guaranteed with all our guesswork. A simple solution is to add “if(x == 0) ampl = 0;” to the slider code. For a smoother transition to zero, something like this could be used: “if(x < .1) ampl *= x*10;”
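Put together, the smooth roll-off might look like the sketch below. It assumes the 60dB curve from above (a = 1/1000, b = ln(1000)); the function name is hypothetical.

```python
import math

A = 0.001             # assumed: 60 dB dynamic range
B = math.log(1000.0)  # about 6.908

def slider_to_amplitude(x):
    """Exponential volume curve with a linear fade to true silence below x = 0.1."""
    ampl = A * math.exp(B * x)
    if x < 0.1:
        # Scale the bottom 10% of the slider down towards zero.
        ampl *= x * 10.0
    return ampl

print(slider_to_amplitude(0.0))   # exactly zero
print(slider_to_amplitude(0.05))
print(slider_to_amplitude(0.1))
```

At x = 0.1 the extra factor equals 1, so the curve joins the plain exponential without a jump.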
Table 1 shows values for a and b for various dynamic ranges (i.e. the difference between the maximum loudness and background noise level), giving the ‘ideal’ response curve for a volume control whose position is described by a number in the interval [0,1]. If you can afford implementing the exponential function in your software/hardware, by all means use this formula. If you do not know for sure what the actual maximum loudness is that the consumer's hardware can produce with the volume control at position 1, try to make an educated guess. A maximum of 90dB(A) with a background noise level of 30dB(A), hence a useful dynamic range of 60dB, is probably a good guess. It will never be exact anyway because the dB(A) value also depends on the kind of sound being played. Yet even a curve with parameters for a maximum dB(A) that is off by quite a bit will still be much better than a silly linear curve, especially when using the smooth roll-off to zero as described above.
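Since a and b follow directly from the dynamic range, they can also be computed on the fly. The sketch below applies the same derivation as for the 60dB case; the function name and the list of ranges are just examples chosen for this illustration.

```python
import math

def curve_constants(dynamic_range_db):
    """Return (a, b) for y = a*exp(b*x) with y(1) = 1 and y(0) at -dynamic_range_db."""
    ratio = 10 ** (dynamic_range_db / 20)  # amplitude ratio between slider max and min
    return 1.0 / ratio, math.log(ratio)

# Example ranges only; pick whatever matches your own educated guess.
for db in (50, 60, 70, 80, 90, 100):
    a, b = curve_constants(db)
    print(f"{db:3d} dB: a = {a:.3e}, b = {b:.3f}")
```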
Many programmers will not like including an entire math library just to calculate an exponential function to make their volume slider right. Luckily, there is an alternative which sufficiently approximates an exponential curve, is much cheaper and reaches zero at zero automatically. The graph at the right shows three curves: the linear curve (yuck), the 60dB exponential curve (red), and the curve of the function x⁴ (blue). As you can see, the blue curve lies pretty close to the red curve, and you can also see how monstrously the linear curve deviates. The fourth-power function demands only three multiplications (or two at the cost of an extra line of code), and it starts from zero, what more could one want?
I tried the x⁴ curve in some experiments and for most volume settings it has a very natural ‘feel’, so I can highly recommend it. Depending on your personal taste you may find x⁵ an even better approximation. Keep in mind that in situations where the maximum volume is rather quiet you may need a less ‘strong’ curve like x², and a ‘stronger’ curve if the maximum volume is really loud. For a dynamic range of 90dB, x⁶ is a good approximation but keep in mind that only few systems will need that kind of range.
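A minimal sketch of the fourth-power curve, written with two multiplications, next to the 60dB exponential for comparison. The function names are invented here, and the exponential is only included to print the difference in dB.

```python
import math

def volume_x4(x):
    """Fourth-power volume curve: two multiplications, and zero at x = 0."""
    x2 = x * x
    return x2 * x2

def volume_exp60(x):
    """Reference: the 60 dB exponential curve, a = 1/1000 and b = ln(1000)."""
    return 0.001 * math.exp(math.log(1000.0) * x)

for x in (0.25, 0.5, 0.75, 1.0):
    db_x4 = 20 * math.log10(volume_x4(x))
    db_exp = 20 * math.log10(volume_exp60(x))
    print(f"x = {x:.2f}  x^4: {db_x4:6.1f} dB   exp: {db_exp:6.1f} dB")
```

Swapping the exponent for 2, 5 or 6 only changes the body of `volume_x4`; the rest of the slider code stays the same.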
If you are going to use a discrete control instead of a slider, e.g. a volume control that can be increased or decreased in steps by pressing an ‘up’ and ‘down’ button, you should be aware that the smallest difference in volume that humans can perceive is about 1dB, or roughly 10%. Actually this also holds for many other perceptions, like the size of an object, or speed. Hence it is useless to make your increments smaller than 10%, but you shouldn't make them too large either or your volume control will be too coarse.
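For a stepped control, one option is to space the steps evenly in dB. The sketch below assumes 30 steps over a 60dB range, i.e. 2dB per step, with step 0 reserved for silence; these numbers and the function name are illustrative choices, not a recommendation from the text.

```python
def step_to_amplitude(step, num_steps=30, dynamic_range_db=60.0):
    """Map a button-press count (0..num_steps) to an amplitude, in equal dB steps."""
    if step <= 0:
        return 0.0  # lowest position is true silence
    step = min(step, num_steps)
    attenuation_db = dynamic_range_db * (1.0 - step / num_steps)
    return 10 ** (-attenuation_db / 20)

for step in (0, 1, 15, 29, 30):
    print(step, step_to_amplitude(step))
```

Every press then multiplies the amplitude by the same factor (about 1.26 for 2dB), so each press is perceived as a similar change.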
I sometimes get mails from people who want to know how they should configure a hardware or software volume control that already uses dB values by design. Some seem to believe they still need to apply a non-linear transformation to the dB values. No! The only things you need to determine there, are the range you want to use and the step size if applicable. For instance, if the volume control offers a range of 120dB, most likely you will want to limit it to the upper 60dB range. Some controls offer attenuation (negative dB values) as well as amplification (positive dB values), you need to determine if for your application you need either or both.
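For a control that already takes dB values, the slider position can be mapped to dB with a plain linear formula. The sketch below assumes a hypothetical control that accepts a gain in dB and restricts the slider to the upper 60dB; the function and parameter names are made up.

```python
def slider_to_gain_db(x, usable_range_db=60.0, max_gain_db=0.0):
    """Map slider position x in [0, 1] linearly onto the top usable_range_db of a dB control."""
    x = min(max(x, 0.0), 1.0)  # clamp out-of-range positions
    return max_gain_db - usable_range_db * (1.0 - x)

for x in (0.0, 0.5, 1.0):
    print(f"x = {x:.1f} -> {slider_to_gain_db(x):.1f} dB")
```

No exponential appears here because the dB scale is already logarithmic.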
Some people have half-baked knowledge about sound perception being logarithmic, which causes them to make half-baked reasonings like the following. “A sound of 98dB(A) is annoyingly loud, but if we can reduce it to 95dB(A), it is only half the power, therefore only half as annoying!” Right and wrong. The power is indeed halved (and the amplitude reduced by a factor of 0.71), but the perceived loudness is only 3dB lower. Since a 1dB difference is at the limit of being unnoticeable, 3dB is only just above barely noticeable. 95dB(A) is still mightily loud and unless other characteristics of the sound have changed, it will be only slightly less annoying than at 98dB(A). The same reasoning is often applied to hearing damage, which is equally wrong because the relation between loudness and hearing damage is not linear either.
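The numbers in this example are easy to check with the two dB formulas. A small sketch, with variable names chosen for the illustration:

```python
# A drop from 98 dB(A) to 95 dB(A) is a change of -3 dB.
change_db = 95.0 - 98.0

power_ratio = 10 ** (change_db / 10)      # dB to power ratio
amplitude_ratio = 10 ** (change_db / 20)  # dB to amplitude ratio

print(f"power ratio:     {power_ratio:.3f}")      # about 0.50
print(f"amplitude ratio: {amplitude_ratio:.3f}")  # about 0.71
```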
Remember that all this does not only apply to sliders. It also applies to rotating knobs (although these are quite rare in software, but you bet that all potentiometers of decent audio equipment have an exponential characteristic) and menus with volume presets. It also holds for equalisers, because these are volume controls in their own right, even if they only control a part of the frequency spectrum. After reading this text it should be clear that implementing volume controls is not exact science except in well-controlled situations. However, the core message you should take home is: volume must be exponential, or at least look like it!
Frequency is somewhat less of an issue than volume, because few applications have to deal with frequencies at the user end. However, for those that do, a similar story holds, but with a slight difference. The human sensation of ‘tone’ is also far from linear. But it is not exactly exponential either. At the low frequency side, it is more linear, while at the high frequency side it is exponential. However, an exponential curve is a much better overall approximation than a linear curve. So please, no linear frequency controls either! You would not want to listen to a piano tuned to a linear scale.
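An exponential frequency control can be sketched the same way as the volume curve. The 20Hz to 20kHz range and the function name below are assumptions made for this example only.

```python
def slider_to_frequency(x, f_min=20.0, f_max=20000.0):
    """Map slider position x in [0, 1] exponentially onto [f_min, f_max] in Hz."""
    return f_min * (f_max / f_min) ** x

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"x = {x:.2f} -> {slider_to_frequency(x):8.1f} Hz")
```

With this mapping the middle of the slider lands around 632Hz rather than at 10kHz, so every octave gets the same amount of slider travel.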
This does not only apply to sound generation but also to sound analysis. If you want to create a spectral analysis, the graph should have a logarithmic scale (on both axes, frequency and amplitude) unless there are specific reasons to use a linear scale. With a linear frequency scale all low frequencies will be squeezed into a few lines while the high frequencies will be smeared over a wide area. Mind that even though the audible sound range extends up to 20kHz, “high frequencies” already start at about 2kHz! The most interesting stuff in music happens below 2kHz. For speech, you can't do much with frequencies above 4kHz (that's why telephones filter these out). Yet these would occupy 80% of a linear spectrogram that spans 0 to 20kHz!
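The 80% figure, and how it changes on a logarithmic axis, can be worked out as follows. The axis limits (0 to 20kHz for the linear axis, 20Hz to 20kHz for the logarithmic one) and the function names are assumptions for this sketch.

```python
import math

def linear_share(lo, hi, f_min=0.0, f_max=20000.0):
    """Fraction of a linear frequency axis taken up by the band lo..hi (Hz)."""
    return (hi - lo) / (f_max - f_min)

def log_share(lo, hi, f_min=20.0, f_max=20000.0):
    """Fraction of a logarithmic frequency axis taken up by the band lo..hi (Hz)."""
    return math.log(hi / lo) / math.log(f_max / f_min)

print(f"4-20 kHz on a linear axis: {linear_share(4000.0, 20000.0):.0%}")
print(f"4-20 kHz on a log axis:    {log_share(4000.0, 20000.0):.0%}")
```

On the logarithmic axis the same band shrinks to roughly a quarter of the plot, leaving the rest for the frequencies where most of the music and speech lives.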
Unfortunately it is not easy to generate a spectrum with a logarithmic frequency axis. FFTs are linear and the only way of getting a log scale from an FFT is to warp the output, resulting in poor resolution at the low frequencies and exaggerated resolution at the high frequencies. To counter this, one could take an FFT of such high resolution that it is accurate enough even at the lowest frequencies. That will however result in poor temporal resolution at high frequencies. There is no such thing as a variation on the FFT which produces a log scale right away, but other approaches can be used, for instance a filter bank with filters whose bandwidth increases with frequency. The only problem with that approach is that the time interval of the lower frequency filters needs to be longer than that of the higher frequency filters, which makes it difficult to provide a unified frequency response at any given time.
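To get a feeling for the resolution problem, one can count how many FFT bins fall inside an octave. The sketch below assumes a 44.1kHz sample rate and a 4096-point FFT, both picked only for illustration, and the function name is hypothetical.

```python
import math

def bins_in_band(lo_hz, hi_hz, sample_rate=44100, fft_size=4096):
    """Count FFT bins whose centre frequency f satisfies lo_hz <= f < hi_hz."""
    spacing = sample_rate / fft_size  # bins are equally spaced in Hz
    return math.ceil(hi_hz / spacing) - math.ceil(lo_hz / spacing)

print("40 Hz - 80 Hz:  ", bins_in_band(40, 80), "bins")
print("5 kHz - 10 kHz: ", bins_in_band(5000, 10000), "bins")
```

Both bands are one octave wide and would get the same width on a logarithmic axis, yet the low octave has only a handful of bins to draw from while the high one has hundreds.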
(1): A funny example of this is the current (early 2012) BBC video player that is embedded in news article pages. It has a Spinal Tap-inspired volume slider that goes “all the way up to eleven” but because it is linear, the difference between 10 and 11 is completely unnoticeable. There is hardly even any perceivable increase in volume between 8 and 11.
(2): This equation may not be strictly mathematically correct, but it is sufficiently valid for all intents and purposes in this text.