{"title": "PCA-Pyramids for Image Compression", "book": "Advances in Neural Information Processing Systems", "page_first": 941, "page_last": 948, "abstract": null, "full_text": "PCA-Pyramids for Image Compression*

Horst Bischof
Department for Pattern Recognition and Image Processing
Technical University Vienna
Treitlstraße 3/1832
A-1040 Vienna, Austria
bis@prip.tuwien.ac.at

Kurt Hornik
Institut für Statistik und Wahrscheinlichkeitstheorie
Technische Universität Wien
Wiedner Hauptstraße 8-10/1071
A-1040 Vienna, Austria
Kurt.Hornik@ci.tuwien.ac.at

*This work was supported in part by a grant from the Austrian National Fonds zur Förderung der wissenschaftlichen Forschung (No. S7002MAT) to Horst Bischof.

Abstract

This paper presents a new method for image compression by neural networks. First, we show that neural networks can be used in a pyramidal framework, yielding the so-called PCA pyramids. Then we present an image compression method based on the PCA pyramid, which is similar to the Laplace pyramid and the wavelet transform. Some experimental results with real images are reported. Finally, we present a method to combine the quantization step with the learning of the PCA pyramid.

1 Introduction

In the past few years, a lot of work has been done on using neural networks for image compression, cf. e.g. (Cottrell et al., 1987; Sanger, 1989; Mougeot et al., 1991; Schweizer et al., 1991). Typically, networks which perform a Principal Component Analysis (PCA) were employed; for a recent overview of PCA networks, see (Baldi and Hornik, 1995).

A well studied and thoroughly understood PCA network architecture is the linear autoassociative network, see (Baldi and Hornik, 1989; Bourlard and Kamp, 1988). This network consists of N input and output units and M < N hidden units, and is
trained (usually by back-propagation) to reproduce the input at the output units. All units are linear. Bourlard & Kamp (Bourlard and Kamp, 1988) have shown that at the minimum of the usual quadratic error function, the hidden units project the input onto the space spanned by the first M principal components of the input distribution. In fact, as long as the output units are linear, nothing is gained by using non-linear hidden units. On average, all hidden units have equal variance.

However, PCA is not the only method for image compression. Among many others, the Laplace pyramid (Burt and Adelson, 1983) and wavelets (Mallat, 1989) have successfully been used to compress images. Of particular interest is the fact that these techniques provide a hierarchical representation of the image which can be used for progressive image transmission. However, these hierarchical methods are not adaptive.

In this paper, we present a combination of autoassociative networks with hierarchical methods. We propose the so-called PCA pyramids, which can be seen both as an extension of image pyramids with a learning algorithm and as cascaded, locally connected autoassociative networks. In other words, we combine the structure of image pyramids with neural network learning algorithms, resulting in learning pyramids.

The structure of this paper is as follows. We first present image pyramids and, in particular, the PCA pyramid. Then we discuss how these pyramids can be used for image compression, and present some experimental results. Next, we discuss a method to combine the quantization step of compression with the transformation. Finally, we give some conclusions and an outline of further research.

2 The PCA Pyramid

Before we introduce the PCA pyramid, let us describe regular image pyramids.
For a discussion of irregular pyramids and their relation to neural networks, see (Bischof, 1993). In the simplest case, each successive level of the pyramid is obtained from the previous level by a filtering operation followed by a sampling operator. More general functions can be used to achieve the desired reduction; we therefore call them reduction functions. The structure of a pyramid is determined by the neighbor relations within the levels of the pyramid and by the "father-son" relations between adjacent levels. A cell (if it is not at the base level) has a set of children (sons) at the level directly below which provide input to the cell, a set of neighbors (brothers/sisters) at the same level, and (if it is not the apex of the pyramid) a set of parents (fathers) at the level directly above. We denote the structure of a (regular) pyramid by the expression n x n/r, where n x n (the number of sons) is the size of the reduction window and r is the reduction factor, which describes how the number of cells decreases from level to level.

2.1 PCA Pyramids

Since a pyramid reduces the information content of an image level by level, an objective for the reduction function would be to preserve as much information as possible, given the restrictions imposed by the structure of the pyramid, or equivalently, to minimize the information loss of the reduction function. This naturally leads to the idea of representing the pyramid by a suitable PCA network. Among the many alternatives for such networks, we have chosen the autoassociative networks for two reasons. First, the analysis of Hornik & Kuan (Hornik and Kuan, 1992) shows that these networks are more stable than competing models. Second, autoassociative networks have the nice feature that they automatically provide us with the expansion function (weights from the hidden layer to the output layer).
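The reduction/expansion idea can be illustrated with a plain linear autoassociative network trained by gradient descent: the input-to-hidden weights play the role of the reduction function, the hidden-to-output weights that of the expansion function. This is only a minimal sketch, not the authors' implementation; the 8-3 dimensions anticipate the one-dimensional example discussed below, and the training data, learning rate, and step count are our own illustrative assumptions.

```python
# Sketch of a linear autoassociative network: the hidden layer realizes the
# reduction function R, the hidden-to-output weights the expansion function E.
# Data, learning rate, and step count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))              # training patterns (e.g. image patches)
W_r = rng.normal(scale=0.1, size=(8, 3))   # reduction weights R
W_e = rng.normal(scale=0.1, size=(3, 8))   # expansion weights E

lr = 0.05
for _ in range(2000):
    H = X @ W_r                 # hidden activations: reduced representation
    Y = H @ W_e                 # reconstruction of the input
    err = Y - X                 # residual of the quadratic error 0.5 * sum(err**2)
    grad_e = H.T @ err / len(X)
    grad_r = X.T @ (err @ W_e.T) / len(X)
    W_e -= lr * grad_e
    W_r -= lr * grad_r

# mean squared reconstruction error after training; with 3 of 8 components
# retained, part of the input variance necessarily remains unexplained
mse = float(np.mean((X @ W_r @ W_e - X) ** 2))
```

After training, the hidden layer spans (approximately) the space of the first three principal components of the data, in line with the Bourlard & Kamp result quoted above.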
Since the neural network should have the same connectivity as the pyramid (i.e., the same father-son relations), its topology is determined by the structure of the pyramid. In this paper, we confine ourselves to the 4 x 4/4 pyramid for two reasons. First, the 4 x 4/4 pyramid has the nice property that every cell has the same number of fathers, which results in homogeneous networks. Second, as experiments have shown (Bischof, 1993), the results achieved with this pyramid are similar to those of other structures, e.g. the 5 x 5/4 pyramid, while using fewer weights.

Figure 1: From the structure of the pyramid to the topology of the network. (a) General setting, with $\hat{I}_n = E(I_{n+1}) = E(R(I_n))$; (b) 4/2 pyramid; (c) corresponding network.

Figure 1 depicts the one-dimensional situation of a 4/2 pyramid (this is the one-dimensional counterpart of the two-dimensional 4 x 4/4 pyramid). Figure 1a shows the general goal to be achieved and the notations employed; Figure 1b shows a 4/2 pyramid. When constructing the corresponding network, we start at the output layer (i.e., the reconstruction $\hat{I}_n$). For an n/r pyramid we typically choose the size of the output layer as n. Next, we have to include all fathers of the cells in the output layer as hidden units. Finally, we have to include all sons of the hidden layer cells in the input layer. For the 4/2 pyramid, this results in an 8-3-4 network as shown in Figure 1c. A similar construction yields an 8 x 8-3 x 3-4 x 4 network for the 4 x 4/4 pyramid.

Next, we have to consider the constraints on the network weights due to the overlaps in the pyramid. To completely cover the input image with output units, we can shift the network only by four cells in each direction. Therefore, the hidden units at the borders overlap. For the 4/2 pyramid, the left and right hidden units must have identical weights.
In the case of the 4 x 4/4 pyramid, the network has four independent hidden units.

The network thus constructed can be trained by some suitable learning algorithm, typically of the back-propagation type, using patches of an image as input for training the first pyramid level. After that, the second level of the pyramid can be trained in the same way using the first pyramid level as training data, and so on.

2.2 PCA-Laplace Pyramid and Image Compression

Thus far, we have introduced a network which can learn the reduction function R and the expansion function E of a pyramid. Analogously to the Laplace pyramid and the wavelet transform, we can now introduce the level $L_i$ of the PCA-Laplace pyramid, given by

$$L_i = I_i - \hat{I}_i = I_i - E(R(I_i)).$$

It should be noted that during learning we exactly minimize the squared Laplace level.

Figure 2: Results of the PCA-Laplace pyramid. (a) First two levels of a Laplace pyramid (upper half) and PCA-Laplace pyramid (lower half) (grey = 0); (b) reconstruction error of the house image with quantization of 3 bits, 4 bits, 7 bits, and the reconstructed image.

The original image $I_0$ can be completely recovered from level $I_n$ and the Laplace levels $L_0, \ldots, L_{n-1}$ by

$$I_0 = E(\cdots E(E(I_n) + L_{n-1}) + L_{n-2}) \cdots) + L_0.$$

Since the level $I_n$ is rather small (e.g., 32 x 32 pixels) and the levels of the PCA-Laplace pyramid are typically sparse (i.e., many pixels are zero, see Figure 2a) and can therefore be compressed considerably by a conventional compression algorithm (e.g. Lempel-Ziv (Ziv and Lempel, 1977)), this image representation results in a lossless image compression algorithm.

In order to achieve higher compression ratios, we can quantize the levels of the PCA-Laplace pyramid.
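The decomposition $L_i = I_i - E(R(I_i))$ and the exact reconstruction can be sketched as follows. As a caveat: the learned reduction and expansion functions are stood in for here by simple 2x2 block averaging and pixel replication, so this illustrates only the pyramid mechanics, not the trained PCA network of the paper.

```python
# Sketch of the (PCA-)Laplace pyramid: L_i = I_i - E(R(I_i)) and
# I_0 = E(...E(E(I_n) + L_{n-1}) + L_{n-2}...) + L_0.
# R and E are simple stand-ins (our assumption), not the learned network.
import numpy as np

def R(img):   # reduction: 2x2 block average, halves each dimension
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def E(img):   # expansion: replicate each pixel into a 2x2 block
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def build_pyramid(I0, levels):
    laplace, I = [], I0
    for _ in range(levels):
        nxt = R(I)
        laplace.append(I - E(nxt))   # Laplace level L_i
        I = nxt
    return laplace, I                # (L_0 .. L_{n-1}, top level I_n)

def reconstruct(laplace, top):
    I = top
    for L in reversed(laplace):      # expand and add back each Laplace level
        I = E(I) + L
    return I

img = np.random.default_rng(1).random((16, 16))
laplace, top = build_pyramid(img, levels=2)
assert np.allclose(reconstruct(laplace, top), img)   # lossless without quantization
```

With quantized Laplace levels, the same reconstruction loop applies, but the final assertion would hold only approximately.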
In this case, the compression is lossy, because the original image cannot be recovered exactly. The compression ratio and the amount of loss can be controlled by the number of bits used to quantize the levels of the PCA-Laplace pyramid.

To measure the difference between the compressed and the original image, we use the normalized mean squared error (NMSE) as in (Cottrell et al., 1987; Sanger, 1989). The NMSE is given by the mean squared error divided by the average squared intensity of the image, i.e.,

$$\mathrm{NMSE} = \frac{\mathrm{MSE}}{\langle I_0^2 \rangle} = \frac{\langle (I_0 - C(I_0))^2 \rangle}{\langle I_0^2 \rangle},$$

where $I_0$ and $C(I_0)$ are the original and the compressed image, respectively. The compression ratio is measured by the number of bits used to store $I_0$ divided by the number of bits used to store $C(I_0)$.

2.3 Results

For the results reported here, we trained the networks by a conjugate gradient algorithm for 100 steps[1] and used a uniform quantization which is fixed for all levels of the pyramid. As was shown in (Burt and Adelson, 1983; Mayer and Kropatsch, 1989), the results could be improved by gradually increasing the quantization from bottom to top.

Figure 2b shows the error images obtained when the levels of the PCA-Laplace pyramid are quantized with 3, 4, and 7 bits, as well as the image reconstructed from the 7 bit version. Note that we used the same lookup table for all error images. To compress the levels of the PCA-Laplace pyramid, we employed the standard UNIX compress program, which implements a Lempel-Ziv algorithm.

From these images one can see that the results with the 4 and 7 bit quantization are very good. Visually, no difference between the reconstructed and the original image can be perceived. Table 1 shows the compression ratios and the NMSEs for these images. We have performed experiments on 20 different images; the results on these images are comparable to the ones reported here.
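The two measures used in these comparisons can be computed as in the following small sketch; the helper names and the toy arrays are illustrative assumptions, not from the paper.

```python
# Sketch of the quality and compression measures defined above.
# Function names and example data are illustrative assumptions.
import numpy as np

def nmse(original, compressed):
    # mean squared error normalized by the average squared intensity
    return np.mean((original - compressed) ** 2) / np.mean(original ** 2)

def compression_ratio(bits_original, bits_compressed):
    # bits needed to store I_0 divided by bits needed to store C(I_0)
    return bits_original / bits_compressed

I0 = np.array([10.0, 20.0, 30.0, 40.0])
C = np.array([10.0, 20.0, 30.0, 42.0])   # a slightly distorted "compressed" image
err = nmse(I0, C)                        # = 1.0 / 750.0, i.e. about 0.00133
```

An 8 bit/pixel image stored at 2 bits/pixel, for instance, yields `compression_ratio(8, 2) == 4.0`.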
These results compare favorably with the results in the literature (see Table 1). We have also applied a 5 x 5/4 Laplace pyramid to the house image, which gave a compression ratio of 3.42 with an NMSE of 0.000087 for quantization of the Laplace levels with four bits. We have also included results achieved with JPEG. One can see that our method gives considerably better results.

We have also investigated experimentally what happens if we train a pyramid on one image and then apply this pyramid to another image without retraining. These experiments indicate that the errors are only slightly larger for images not trained on. With five additional steps of training, the errors are almost the same.

[1] In all our experiments the training algorithm converged (usually after 200 steps; however, the improvements between step 20 and convergence are negligible).

Table 1: Compression ratios and NMSE for various compression methods

  Method                              Compression ratio   Bits/Pixel   NMSE
  3 Bit                                          37.628        0.212   0.0172
  4 Bit                                          24.773        0.323   0.0019
  7 Bit                                           8.245        0.970   0.0000215
  no Quant.                                       3.511        2.279   0.0
  Cottrell (Cottrell et al., 1987)                8.0          1.000   0.0059
  Sanger (Sanger, 1989)                          22.0          0.360   0.043
  5 x 5/4 Laplace                                 3.420        2.339   0.000087
  JPEG                                            8.290        0.965   0.00139
  JPEG                                           15.774        0.507   0.00348

From these results we can conclude that we do not need to retrain the pyramid for each new image.

3 Integration of Quantization

For the results reported in the previous section, we used a fixed and uniform quantization scheme. This can be improved by using adaptive quantizers such as the Lloyd I algorithm, Kohonen's feature maps, learning vector quantization, or something similar. Such an approach was taken by Schweizer (Schweizer et al., 1991), who combined a Cottrell-type network with self-organizing feature maps.
However, we can go further.

With the PCA network we minimize the squared Laplace level, which does not necessarily yield low compression errors. What we really want to minimize are the quantized Laplace levels. Usually, the Laplace levels have a unimodally shaped histogram centered at zero. However, for the result of the compression (i.e., compression ratio and NMSE), it is irrelevant whether we shift the histogram to the left or the right, as long as we shift the quantization intervals in the same way. The best results could be achieved if we had a multimodal histogram with peaks centered at the quantization points.

Using neural networks for both PCA and quantization, this goal could e.g. be achieved by a modular network as in Figure 3 for the 4/2 pyramid. For quantization, we could either apply a vector quantizer to a whole patch of the Laplace level, or use a scalar quantizer (as depicted in Figure 3) for each pixel of the Laplace level. In the second case, we have to constrain the weights of the quantization network to be identical for every Laplace pixel. Since scalar quantization is simpler to analyze and uses fewer free parameters, we only consider this case.

As each quantization subnetwork can be treated separately (we only have to average the weight changes over all subnetworks), the following only considers the case of one output unit of the PCA network.

Figure 3: PCA network and quantization network

The error to be minimized is the squared quantization error

$$E = \sum_p E_p = \sum_p (l_p - c_{k_p})^2,$$

where p refers to the patterns in the training set, $c_k$ is the kth weight of the quantization network, $k_p$ is the index of the weight closest to $l_p$, and $l$ is the output of the PCA-Laplace unit.

Changing the weights of the quantization network by gradient descent leads to the LVQ1 rule of Kohonen:

$$\Delta c_k = \begin{cases} 2\alpha (l_p - c_k) & \text{if } k = k_p \text{ is the winning unit,} \\ 0 & \text{otherwise.} \end{cases}$$
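The scalar LVQ1 update above can be sketched in a few lines: for each sample, only the winning (nearest) codebook weight moves toward the sample. The bimodal training data, codebook size, and learning rate below are illustrative assumptions.

```python
# Sketch of the scalar LVQ1 rule: move the winning weight c_k toward l_p.
# Training data, codebook size, and learning rate are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
# toy "Laplace pixel" values drawn from two well-separated clusters
samples = np.concatenate([rng.normal(-2.0, 0.1, 200), rng.normal(2.0, 0.1, 200)])
c = np.array([-0.5, 0.5])      # initial quantization weights c_k
alpha = 0.05                   # learning rate

for l_p in rng.permutation(samples):
    k_p = np.argmin(np.abs(l_p - c))       # winning unit: nearest weight
    c[k_p] += 2 * alpha * (l_p - c[k_p])   # LVQ1 step; other weights unchanged
```

After one pass the weights settle near the cluster centers (about -2 and 2), i.e. at the points where quantization peaks should lie.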
\n\nFor the PCA network we can proceed similarly to back-propagation to obtain the \nrule \n\nAWij = -K 8Ep = _K 8Ep alp = _K 8Ep 8~p 8i~ = -2K(lp _ Ck) 8i~ . \n8Wij \n\nalp 8z~ 8Wij \n\nalp 8Wij \n\n8Wij \n\nOf course, this is only one out of many possible algorithms. More elaborate mi(cid:173)\nnimization techniques than gradient descent could be used; similarly, LVQl could \nbe replaced by a different quantization algorithm. But the basic idea of letting \nthe quantization step and the the compression step adapt to each other remains \nunchanged. \n\n4 Conclusions \n\nIn this paper, we presented a new image compression scheme based on neural net(cid:173)\nworks. The PCA and PCA-Laplace pyramids were introduced, which can be seen \nas both an extension of image pyramids to learning pyramids and as cascaded, lo(cid:173)\ncally connected autoassociators. The results achieved are promising and compare \nfavorably to work reported in the literature. \n\nA lot of work remains to be done to analyze these networks analytically. The \nconvergence properties of the PCA pyramid are not known; we expect results similar \n\n\f948 \n\nHorst Bischof, Kurt Hornik \n\nto the ones (Baldi and Hornik, 1989) for the autoassociative network. Also, for the \nPCA network it would be desirable to characterize the features which are extracted. \nSimilarly, the integrated network needs to be analyzed. It is clear that for such \nnetworks, the usual error function has local minima, but maybe they can be avoided \nby a proper training regime (i.e. start training the PCA pyramid, then train the \nvector quantizer, and finally train them together). \n\nReferences \n\nBaldi, P. and Hornik, K. (1989). Neural Networks and principal component analysis: \n\nLearning from examples without local minima. Neural Networks, 2:53-58. \n\nBaldi, P. and Hornik, K. (1995). Learning in Linear Neural Networks: a Survey. \n\nIEEE Transactions on Neural Networks, to appear. \n\nBischof, H. (1993). 
Pyramidal Neural Networks. PhD thesis, TU Vienna, Institute for Automation, Department for Pattern Recognition and Image Processing.

Bourlard, H. and Kamp, Y. (1988). Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics, 59:291-294.

Burt, P. J. and Adelson, E. H. (1983). The Laplacian pyramid as a compact image code. IEEE Transactions on Communications, COM-31(4):532-540.

Cottrell, G., Munro, P., and Zipser, D. (1987). Learning internal representations from grey-scale images: An example of extensional programming. In Ninth Annual Conference of the Cognitive Science Society, pages 462-473. Hillsdale: Erlbaum.

Hornik, K. and Kuan, C. (1992). Convergence analysis of local feature extraction algorithms. Neural Networks, 5(2):229-240.

Mallat, S. G. (1989). A theory for multiresolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-11(7):674-693.

Mayer, H. and Kropatsch, W. G. (1989). Progressive Bildübertragung mit der 3x3/2 Pyramide. In Burkhardt, H., Höhne, K., and Neumann, B., editors, Informatik-Fachberichte 219: Mustererkennung 1989, pages 160-167, Hamburg. 11. DAGM-Symposium, Springer Verlag.

Mougeot, M., Azencott, R., and Angeniol, B. (1991). Image compression with back propagation: Improvement of the visual restoration using different cost functions. Neural Networks, 4:467-476.

Sanger, T. (1989). Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks, 2:433-459.

Schweizer, L., Parladori, G., Sicuranza, G., and Marsi, S. (1991). A fully neural approach to image compression. In Kohonen, T., Mäkisara, K., Simula, O., and Kangas, J., editors, Artificial Neural Networks, volume I, pages 815-820.

Ziv, J. and Lempel, A. (1977). A universal algorithm for sequential data compression. IEEE Trans.
on Information Theory, 23(5):337-343.", "award": [], "sourceid": 922, "authors": [{"given_name": "Horst", "family_name": "Bischof", "institution": null}, {"given_name": "Kurt", "family_name": "Hornik", "institution": null}]}