Audio Normalization
I consider my understanding of audio technology limited, but sometimes, on internet forums, even I recognize incorrect answers about topics I consider very basic. Before reading on, you might want to read one of my earlier posts, From (True) Peak via RMS to LUFS.
Recently, there was a discussion about audio normalization, following a question about what exactly it is and whether or not you should normalize your audio fragments (e.g., as part of gain staging). A couple of answers indicated that you should never normalize your audio since it would affect its dynamics. This is not true. Dynamics are changed by, for example, compressors and limiters but, in principle, normalization should not affect them.
When I say (below) that dynamics are not affected, I should be more precise. In music production, dynamic range means the difference between the loudest and quietest sounds, measured in decibels (dB). In a single audio track, it is the dB difference between the loudest and quietest moment in the audio file. If we normalize an audio file then, of course, the absolute levels change. However, a constant gain shifts every level by the same number of dB, so the dB difference between the loudest and quietest moment stays the same. When I say ‘dynamics are not changing’ I am talking about these relative dynamics, i.e., the ratios between the audio levels at different points in time do not change.
However, there is much more to say about audio normalization than it seems at first sight. A further explanation follows below.
dBFS
Let us first define dB Full Scale (dBFS). 0 dBFS is the highest signal level achievable in a digital audio WAV file. Higher levels are possible inside a DAW such as Cubase, but in the files that are recorded on disk, 0 dBFS is the highest level. All other levels can be defined with respect to 0 dBFS. So, for example, a signal that is 10 decibels lower than the maximum possible level is at -10 dBFS.
Now, we can define the gain as G(dBFS) = 20 log10(A/A0), with A representing the amplitude (e.g., the peak voltage level) of the audio wave and A0 the reference level, i.e., the amplitude at 0 dBFS. Thus, a gain of 0 dBFS corresponds to A = A0 (since log10(1) = 0).
If we reduce the audio level (amplitude A) to -6 dBFS then we get -6 = 20 log10(A/A0), which gives A = 10^(-6/20) * A0 ≈ 0.5*A0. Thus, reducing the audio level (volume) by a factor of 2 is the same as applying a gain of about -6 dB. Similarly, reducing the volume to 25% corresponds to a gain of about -12 dB.
We can also look at this a little differently. If we change the audio level by a factor f we get G(dBFS) = 20 log10(f*A/A0) = 20 log10(f) + 20 log10(A/A0). Here we consider a reduction of volume, so f <= 1.0 and 20 log10(f) <= 0. Thus we see that multiplying the audio signal by a factor f is identical to adding 20 log10(f) dB, which for f <= 1.0 means subtracting |20 log10(f)| dB.
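The relation between a linear factor and a gain in dB can be checked with a few lines of Python. This is only a minimal sketch; the function names `db_to_factor` and `factor_to_db` are my own and do not come from any audio library.

```python
import math

def db_to_factor(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (gain_db / 20)

def factor_to_db(factor):
    """Convert a linear amplitude factor (> 0) to a gain in dB."""
    return 20 * math.log10(factor)

print(round(factor_to_db(0.5), 2))   # halving the amplitude   -> -6.02
print(round(factor_to_db(0.25), 2))  # quartering the amplitude -> -12.04
print(round(db_to_factor(-6.0), 3))  # factor for exactly -6 dB -> 0.501
```

So "-6 dB" and "half the amplitude" are close but not exactly identical, which is why the values above are rounded.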
The maximum peak level is reached when a sample uses the full range of the bit depth (loosely speaking, the largest value that can be represented in, typically, a 16-bit or 24-bit system). A sample value of zero, then, represents no digital signal.
Audio normalization
Audio normalization is nothing more than applying a constant amount of gain so that a pre-specified target level (e.g., 0 dBFS, the highest level in a digital system) is reached. This affects neither the signal-to-noise ratio nor the dynamics.
But there is more to say. We can distinguish between peak normalization and loudness normalization. In peak normalization the audio level is increased until its highest level (peak) reaches its target value. In loudness normalization the level is adjusted based on perceived loudness. However, both only affect the audio level (volume) and nothing else.
In peak normalization we thus multiply the audio signal by a constant factor such that its highest peak reaches the target level (e.g., 0 dBFS). The procedure for loudness normalization is more complicated but in essence it does the same thing (see this MSc thesis).
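As a minimal sketch of peak normalization, assuming the samples are floating-point values with full scale at 1.0 (the function name `peak_normalize` and the toy sample values are hypothetical, and this is not how Cubase or WaveLab implement it):

```python
import math

def peak_normalize(samples, target_dbfs=0.0):
    """Scale samples (floats, full scale = 1.0) so that the largest
    absolute sample value lands on target_dbfs.
    Returns (scaled_samples, applied_gain_in_db)."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples), 0.0  # pure silence: nothing to scale
    target = 10 ** (target_dbfs / 20)   # target level as a linear value
    factor = target / peak              # one constant factor for all samples
    return [s * factor for s in samples], 20 * math.log10(factor)

clip = [0.05, -0.2, 0.4, -0.1]          # toy "audio clip"
normalized, gain_db = peak_normalize(clip, target_dbfs=-1.0)

print(round(gain_db, 2))                # applied gain in dB -> 6.96
# The ratio between any two samples is the same before and after.
print(math.isclose(clip[2] / clip[1], normalized[2] / normalized[1]))  # True
```

Because every sample is multiplied by the same factor, the ratios between the levels (the relative dynamics) are untouched; only the overall level moves.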
Since peak normalization is based on the highest level, it does not by itself account for the apparent loudness of the audio. Our perception of loudness is largely unrelated to the peaks in a track, and much more dependent on the average level throughout the track. We naturally perceive a track with a higher average level and lower peaks as “louder” than a track with a lower average level and higher peaks. Peak normalization to 0 dBFS can still clip the audio signal due to inter-sample peaks (True Peaks) or due to further processing of the signal. Therefore, if possible, one should preferably normalize to the true peaks, and in general it is a good idea to leave some headroom.
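The difference between peak level and average level can be illustrated with two synthetic signals that share the same peak. This is only a sketch: it uses plain RMS as a rough stand-in for the average level, which is not the same as a LUFS measurement, and the helper names and test signals are my own assumptions.

```python
import math

def peak_dbfs(samples):
    """Level of the highest absolute sample, relative to full scale (1.0)."""
    return 20 * math.log10(max(abs(s) for s in samples))

def rms_dbfs(samples):
    """RMS level relative to full scale (1.0); a crude average level."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

n = 1000
# Sustained signal: ten cycles of a full-scale sine wave.
sustained = [math.sin(2 * math.pi * 10 * i / n) for i in range(n)]
# Sparse signal: silence with one single full-scale click.
sparse = [0.0] * n
sparse[100] = 1.0

for name, signal in (("sustained", sustained), ("sparse", sparse)):
    print(name, round(peak_dbfs(signal), 2), round(rms_dbfs(signal), 2))
```

Both signals peak at about 0 dBFS, so peak normalization treats them identically, yet their RMS levels differ by roughly 27 dB (about -3 dB for the sine against -30 dB for the click).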
With loudness normalization, since it normalizes the average level, the peaks of the audio may start to clip, resulting in distorted audio. One should therefore be careful: while loudness normalization should not affect the dynamics, it may do so in practice if one attempts to increase the average level too much (in which case compression or limiting might be applied by the algorithm).
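Whether a loudness target will push the peaks over full scale can be estimated with simple dB arithmetic, since a constant gain raises the loudness and the peak by the same amount. The sketch below assumes the loudness and peak have already been measured by some meter; the function names are hypothetical and the numbers are rounded, illustrative values (about -20 LUFS and a peak of about -4 dB).

```python
def loudness_gain_db(measured_lufs, target_lufs):
    """Constant gain (in dB) needed to move the measured loudness to the target."""
    return target_lufs - measured_lufs

def peak_after_gain(peak_db, gain_db):
    """A constant gain shifts the peak level by the same number of dB."""
    return peak_db + gain_db

measured_lufs = -20.0   # illustrative measured loudness
true_peak_db = -4.0     # illustrative measured peak level

for target in (-16.0, -12.0):
    gain = loudness_gain_db(measured_lufs, target)
    new_peak = peak_after_gain(true_peak_db, gain)
    status = "clips" if new_peak > 0.0 else "fits"
    print(f"target {target} LUFS: gain {gain:+.1f} dB, peak {new_peak:+.1f} dB, {status}")
```

With these numbers a target of -16 LUFS just fits below full scale, while a target of -12 LUFS would need the peaks to reach +4 dB, which is only possible if the peaks are clipped or limited.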
In the past I peak normalized all my audio clips in Cubase as a sort of ‘gain staging’. I abandoned that approach for other, more appropriate, gain staging approaches.
Some examples
To demonstrate both types of normalization I have taken a short audio fragment of a drum track (32-bit, 44.1 kHz) and normalized it in Steinberg WaveLab Pro 10.0.
Not normalized (original wave file)
You can see the maximum levels (digital peaks and true peaks) and the loudness level in the screens below (analyzed by WaveLab). The true peak level is around -4 dB and the loudness around -20 LUFS (Loudness Units relative to Full Scale). Strangely, the maximum levels of the wave do not seem to completely correspond to the digital peak levels as analyzed by WaveLab. For this analysis this is not too important. You can listen to this audio clip on SoundCloud.
Loudness normalized file with subsequent peak normalization to -5.5dBFS
Next, I applied peak normalization to the previous loudness normalized clip. This brings the peaks back to about -5.5 dBFS but does not, of course, restore the dynamics. In comparison to the unnormalized clip, this clip still has a higher volume (about -18 LUFS compared to -20 LUFS for the unnormalized clip) but it also sounds different because the dynamics were changed. Thus, one should be careful with loudness normalization.
Loudness Normalization to -16LUFS
Next, I applied loudness normalization to about -16 LUFS (instead of -12 LUFS in the previous example). This does not cause any peaks to go beyond 0 dBFS (no peak clipping); hence the dynamics are not changed and this clip sounds the same as the peak normalized clip.
Further references
- Loopmasters
- Pro-tools-expert.com
- From (True) Peak via RMS to LUFS
- The ultimate dB guide
- Adaptive Normalisation of Programme Loudness in Audiovisual Broadcasts (MSc Thesis, Herman Molinder)