| |
|
10.4 Discrete Cosine Transforms in Image Compression
The discrete cosine transform (DCT) is used to transform a signal from the spatial domain into the frequency domain. The reverse process, that of transforming a signal from the frequency domain into the spatial domain, is called the inverse discrete cosine transform (IDCT).
A signal in the frequency domain contains the same information as that in the spatial domain. The order of values obtained by applying the DCT is coincidentally from lowest to highest frequency.
This feature and the psychological observation that the human eye and ear are less sensitive to recognizing the higher-order frequencies leads to the possibility of compressing a spatial signal by transforming it to the frequency domain and dropping high-order values and keeping low-order ones. When reconstructing the signal, and transforming it back to the spatial domain, the results are remarkably similar to the original signal.
This process, with a few extra bells and whistles and slightly modified versions of DCT, is the essence behind jpeg, mpeg, and mp3 compression.
Here, we look at a simplified case of compression using the DCT and IDCT without bells and whistles. The process:
- X = Apply DCT to a sequence of values.
- X' = Drop a portion of high-order values from X.
- X'' = Apply IDCT to X'
- Draw X'' and observe the similarity to the original X.
For the purpose of comparison, we drop high-order values in the following ranges:
- 1/2 of all values, resulting in 50% compression.
- 3/4 of all values, resulting in 75% compression.
- 7/8 of all values, resulting in 87.5% compression.
10.4.1 1-dimensional compression
We apply the above process to only one dimension (the horizontal) for now. Each row is considered as a sequence of values, and undergoes a DCT and an IDCT.
pomegranate = imread('pomegranateSeeds.jpg');
|
The original size of each row:
origWidth = size(pomegranate, 2);
|
Number of low-order samples to extract for the 3 cases (50%, 75%, 87.5% compression):
samplesHalf = floor(origWidth / 2); samplesQuarter = floor(origWidth / 4); samplesEighth = floor(origWidth / 8);
|
Final results are stored in these matrices:
pomegranateCompressed2 = []; pomegranateCompressed4 = []; pomegranateCompressed8 = [];
|
The following code traverses all colors and for each color all rows. Each row is transformed using the DCT, then high-order samples are discarded, and the low-order samples are reverse-transformed using IDCT.
Function
dct
takes as an argument a sequence of numerical values, and returns a sequence of coefficients.
Function
idct
takes as an argument coefficients, as well as a value for the size of the desired inverse transform vector. Values of the input sequence are padded or truncated accordingly.
For efficiency of coding, all 3 images are built in one round
for k=1:3 % all color layers: RGB for i=1:size(pomegranate, 1) % all rows rowDCT=dct(double(pomegranate(i,:,k))); pomegranateCompressed2(i,:,k) = idct(rowDCT(1:samplesHalf), origWidth); pomegranateCompressed4(i,:,k) = idct(rowDCT(1:samplesQuarter), origWidth); pomegranateCompressed8(i,:,k) = idct(rowDCT(1:samplesEighth), origWidth); end end
|
subplot(2,2,1), image(uint8(pomegranate)), title('Original Image'); subplot(2,2,2), image(uint8(pomegranateCompressed2)), title('Compression Factor 2'); subplot(2,2,3), image(uint8(pomegranateCompressed4)), title('Compression Factor 4'); subplot(2,2,4), image(uint8(pomegranateCompressed8)), title('Compression Factor 8');
|
|
Figure 10.33 Click image to enlarge, or click here to open
|
|
|
Figure 10.34 Click image to enlarge, or click here to open
|
|
10.4.2 2-dimensional compression
In addition to the previous compression of rows, we can compress columns of data as well. The compression rate for the entire image then increases by some factor:
- 50% horizontal, 50% vertical = 25% total
- 75% horiztonal, 75% vertical = 93.75% total
- 87.5% horizontal, 87.5% vertical = 98.44% total
Starting from the horizontal compression:
origHeight = size(pomegranate, 1); samplesHalf = floor(origHeight / 2); samplesQuarter = floor(origHeight / 4); samplesEighth = floor(origHeight / 8);
|
pomegranateCompressed2full = []; pomegranateCompressed4full = []; pomegranateCompressed8full = [];
|
for k=1:3 % all color layers: RGB for i=1:size(pomegranate, 2) % all columns columnDCT2=dct(double(pomegranateCompressed2(:,i,k))); columnDCT4=dct(double(pomegranateCompressed4(:,i,k))); columnDCT8=dct(double(pomegranateCompressed8(:,i,k))); pomegranateCompressed2full(:,i,k) = idct(columnDCT2(1:samplesHalf), origHeight); pomegranateCompressed4full(:,i,k) = idct(columnDCT4(1:samplesQuarter), origHeight); pomegranateCompressed8full(:,i,k) = idct(columnDCT8(1:samplesEighth), origHeight); end end
|
subplot(2,2,1), image(uint8(pomegranate)), title('Original Image'); subplot(2,2,2), image(uint8(pomegranateCompressed2full)), title('Compression Factor 2 * 2'); subplot(2,2,3), image(uint8(pomegranateCompressed4full)), title('Compression Factor 4 * 4'); subplot(2,2,4), image(uint8(pomegranateCompressed8full)), title('Compression Factor 8 * 8');
|
|
Figure 10.35 Click image to enlarge, or click here to open
|
|
|
Figure 10.36 Click image to enlarge, or click here to open
|
|
10.4.3 Discussion
The compression shown here is somewhat simplistic - we compressed full rows separately from full columns. Typically, images are compressed in macroblocks of 8 rows by 8 columns, where each block is linearized into a one-dimensional vector. This approach is certainly better, since it leverages localization of color within a macroblock. It is unlikely that color within an 8x8 block changes dramatically, and thus high compression rates can be achieved while retaining quality of detail.
When does compression fail? Frequently changing colors in dense spaces cannot be represented well with few coefficients. For example, a row of pixels interchanging between black and white pixel-by-pixel, is viewed as a high frequency in the frequency domain. However, a high frequency cannot be represented with few coefficients, and thus dropping high-order coefficients from the DCT removes the necessary detail. This is also the reason why diagrams are not compressed using jpeg compression.
10.4.4 3-rd dimension compression
Out of curiosity, we will compress the 3rd dimension of the image, i.e. the 3-value color for each pixel. Because this requires computing the DCT for rows*columns vectors, and such computation is lengthy, we will resize the original image.
A simple algorithm for resizing by factors of 2 is the nearest neighbor algorithm. For a half-size in both dimensions, we simply remove every other pixel in rows and columns. This is easily done in Matlab:
pomegranateSmall = pomegranate(1:2:size(pomegranate,1),1:2:size(pomegranate,2),:);
|
This image is roughly the same as the original, but it is missing 75% of its content.
Because there aren't many options for compressing 3 values, we will examine compression for:
- 3 to 2 colors
- 3 to 1 color (grayscale)
The final compressed-and-uncompressed images will be stored in these variables:
pomegranateColorCompression2 = []; pomegranateColorCompression3 = [];
|
It is necessary to iterate over all rows and all columns, and for each pixel, we linearize the RGB color values, take the DCT, drop values, and apply the IDCT.
for i=1:size(pomegranateSmall, 1) for j=1:size(pomegranateSmall, 2) rgb = pomegranateSmall(i,j,:); rgb = rgb(:)'; rgbDCT = dct(double(rgb)); rgbIDCT2 = idct(rgbDCT(1:2), 3); rgbIDCT3 = idct(rgbDCT(1), 3);
pomegranateColorCompression2(i,j,1) = rgbIDCT2(1); pomegranateColorCompression2(i,j,2) = rgbIDCT2(2); pomegranateColorCompression2(i,j,3) = rgbIDCT2(3);
pomegranateColorCompression3(i,j,1) = rgbIDCT3(1); pomegranateColorCompression3(i,j,2) = rgbIDCT3(2); pomegranateColorCompression3(i,j,3) = rgbIDCT3(3); end end
|
subplot(2,2,1), image(pomegranateSmall), title('Original Image'); subplot(2,2,3), image(uint8(pomegranateColorCompression2)), title('Compression 2 color values'); subplot(2,2,4), image(uint8(pomegranateColorCompression3)), title('Compression 3 color values');
|
|
Figure 10.37 Click image to enlarge, or click here to open
|
|
|
Figure 10.38 Click image to enlarge, or click here to open
|
|
|
|