

10.5 Discrete Cosine Transform in Audio Compression

As with image compression, we now apply the DCT to audio compression. The result is a first-level approximation of MPEG audio compression, without the bells and whistles.

An audio waveform is stored as one long vector of samples, whereas an image is stored as a matrix. We therefore need to split the vector into pieces ourselves, since there are no existing rows or columns to rely on.
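One way to picture this splitting is to trim the vector to a whole number of pieces and reshape it so that each column holds one piece. This is only a sketch on a toy vector. The names x, frameLen, numFrames, and frames are illustrative and are not part of the lecture code, which walks the vector with a loop instead.

```matlab
% Toy signal: 20 samples in a single column vector (illustrative data only).
x = (1:20)';
frameLen = 8;                              % hypothetical piece length

% Number of complete pieces; leftover samples at the end are dropped.
numFrames = floor(length(x) / frameLen);

% Each column of frames holds frameLen consecutive samples.
frames = reshape(x(1:numFrames*frameLen), frameLen, numFrames);

disp(size(frames));                        % frameLen rows, numFrames columns
```

With 20 samples and a piece length of 8, two complete pieces fit and the last 4 samples are left out.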

Read the audio file:

% wavread was removed from recent MATLAB releases; audioread is its replacement.
[funky, f] = audioread('funky.wav');
Determine a value for the number of samples that will undergo a DCT at once. In other words, the audio vector will be divided into pieces of this length.

windowSize = 8192;
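To get a feel for this window length, it can be converted to a duration using the sampling rate. The sketch below assumes a rate of 44100 Hz purely for illustration; the actual rate of funky.wav is the value returned in f. The names fAssumed and windowSeconds are hypothetical.

```matlab
windowSize = 8192;
fAssumed = 44100;                          % assumption: sampling rate in Hz, not read from the file
windowSeconds = windowSize / fAssumed;     % duration of one window in seconds
fprintf('One window spans about %.3f s\n', windowSeconds);
```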
Again, we examine three compression rates, expressed as the percentage of DCT coefficients discarded in each window:
  • 50%
  • 75%
  • 87.5%

samplesHalf = windowSize / 2;
samplesQuarter = windowSize / 4;
samplesEighth = windowSize / 8;
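The link between these coefficient counts and the percentages listed above is simple arithmetic, sketched here. The variable names kept and percentDiscarded are illustrative only.

```matlab
windowSize = 8192;
kept = windowSize ./ [2 4 8];                       % coefficients retained per window
percentDiscarded = 100 * (1 - kept / windowSize);   % share of coefficients dropped
disp([kept; percentDiscarded]);                     % first row: counts, second row: percentages
```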
The following vectors will hold the reconstructed (compressed, then decompressed) audio:

funkyCompressed2 = [];
funkyCompressed4 = [];
funkyCompressed8 = [];
For simplicity, we iterate over the vector, window-by-window, but we discard whatever remainder exists:

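for i = 1:windowSize:length(funky)-windowSize+1   % +1 keeps a final window that fits exactly
    % Transform one window, then rebuild it from only its leading coefficients.
    % idct(y, n) zero-pads the truncated coefficient vector back to length n.
    windowDCT = dct(funky(i:i+windowSize-1));
    funkyCompressed2(i:i+windowSize-1) = idct(windowDCT(1:samplesHalf), windowSize);
    funkyCompressed4(i:i+windowSize-1) = idct(windowDCT(1:samplesQuarter), windowSize);
    funkyCompressed8(i:i+windowSize-1) = idct(windowDCT(1:samplesEighth), windowSize);
end
Plot the original and the three reconstructed waveforms:
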
h1 = subplot(4,1,1); plot(funky), title('Original Waveform');
subplot(4,1,2), plot(funkyCompressed2), title('Compression Factor 2'), axis(axis(h1));
subplot(4,1,3), plot(funkyCompressed4), title('Compression Factor 4'), axis(axis(h1));
subplot(4,1,4), plot(funkyCompressed8), title('Compression Factor 8'), axis(axis(h1));
Figure 10.39
Figure 10.40
h1 = subplot(4,1,1); plot(funky(100000:120000)), title('Portion of Original Waveform');
subplot(4,1,2), plot(funkyCompressed2(100000:120000)), title('Portion of Compression Factor 2'), axis(axis(h1));
subplot(4,1,3), plot(funkyCompressed4(100000:120000)), title('Portion of Compression Factor 4'), axis(axis(h1));
subplot(4,1,4), plot(funkyCompressed8(100000:120000)), title('Portion of Compression Factor 8'), axis(axis(h1));
Closer inspection of a short portion of the signal (samples 100000 to 120000) does reveal qualitative differences in the densely packed regions (high frequencies).

Figure 10.41
subplot(4,1,1), specgram(funky), title('Original Waveform');
subplot(4,1,2), specgram(funkyCompressed2), title('Compression Factor 2');
subplot(4,1,3), specgram(funkyCompressed4), title('Compression Factor 4');
subplot(4,1,4), specgram(funkyCompressed8), title('Compression Factor 8');
The spectrograms clearly show the loss of high frequencies, which grows with the compression factor.

Figure 10.42
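The loss of high frequencies follows from how the DCT orders its coefficients: coefficient k of an N-point DCT (counting from 0) corresponds to a frequency of roughly k*f/(2N), so keeping only the first K coefficients acts much like a low-pass filter with a cutoff near K*f/(2N). The sketch below evaluates that cutoff for the three settings. It assumes a 44100 Hz sampling rate for illustration only (the real value is in f), and fAssumed, kept, and cutoffHz are hypothetical names.

```matlab
windowSize = 8192;
fAssumed = 44100;                                % assumption: sampling rate in Hz
kept = windowSize ./ [2 4 8];                    % coefficients retained per window
cutoffHz = kept * fAssumed / (2 * windowSize);   % approximate highest frequency retained
fprintf('Approximate cutoff: %.2f Hz\n', cutoffHz);
```

Under that assumed rate, each doubling of the compression factor halves the retained bandwidth.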
The difference in quality is readily apparent when listening to the audio:

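% wavplay was removed from recent MATLAB releases; playblocking on an
% audioplayer object likewise waits for each clip to finish before continuing.
playblocking(audioplayer(funky, f));
disp('Compression Factor 2');
playblocking(audioplayer(funkyCompressed2, f));
disp('Compression Factor 4');
playblocking(audioplayer(funkyCompressed4, f));
disp('Compression Factor 8');
playblocking(audioplayer(funkyCompressed8, f));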