{"title": "Image Recognition in Context: Application to Microscopic Urinalysis", "book": "Advances in Neural Information Processing Systems", "page_first": 963, "page_last": 969, "abstract": null, "full_text": "Image Recognition in Context: Application to \n\nMicroscopic Urinalysis \n\nXuboSong* \n\nDepartment of Electrical and Computer Engineering \nOregon Graduate Institute of Science and Technology \n\nBeaverton, OR 97006 \nxubosong@ece.ogi.edu \n\nJoseph Sill \n\nDepartment of Computation and Neural Systems \n\nCalifornia Institute of Technology \n\nPasadena, CA 91125 \n\njoe@busy.work.caltech.edu \n\nYaser Abu-Mostafa \n\nHarvey Kasdan \n\nDepartment of Electrical Engineering \n\nInternational Remote Imaging Systems, Inc. \n\nCalifornia Institute of Technology \n\nChatsworth, CA 91311 \n\nPasadena, CA 91125 \n\nyase r@work.caltech.edu \n\nAbstract \n\nWe propose a new and efficient technique for incorporating contextual \ninformation into object classification. Most of the current techniques face \nthe problem of exponential computation cost. In this paper, we propose a \nnew general framework that incorporates partial context at a linear cost. \nThis technique is applied to microscopic urinalysis image recognition, \nresulting in a significant improvement of recognition rate over the context \nfree approach. This gain would have been impossible using conventional \ncontext incorporation techniques. \n\n1 BACKGROUND: RECOGNITION IN CONTEXT \n\nThere are a number of pattern recognition problem domains where the classification of an \nobject should be based on more than simply the appearance of the object itself. In remote \nsensing image classification, where each pixel is part of ground cover, a pixel is more like(cid:173)\nly to be a glacier if it is in a mountainous area, than if surrounded by pixels of residential \nareas. 
In text analysis, one can expect to find certain letters occurring regularly in particular arrangements with other letters (qu, ee, est, tion, etc.). The information conveyed by the accompanying entities is referred to as contextual information. Human experts apply contextual information in their decision making [2][6]. It makes sense to design techniques and algorithms that let computers aggregate and utilize a more complete set of information in their decision making, the way human experts do.\n\n*Author for correspondence\n\nIn pattern recognition systems, however, the primary (and often only) source of information used to identify an object is the set of measurements, or features, associated with the object itself. Augmenting this information by incorporating context into the classification process can yield significant benefits.\n\nConsider a set of N objects T_i, i = 1, ..., N. With each object we associate a class label c_i that is a member of a label set Ω = {1, ..., D}. Each object T_i is characterized by a set of measurements x_i ∈ R^p, which we call a feature vector. Many techniques [1][2][4][6] incorporate context by conditioning the posterior probability of objects' identities on the joint features of all accompanying objects, i.e., p(c_1, c_2, ..., c_N | x_1, ..., x_N), and then maximizing it with respect to c_1, c_2, ..., c_N. It can be shown that\n\np(c_1, c_2, ..., c_N | x_1, ..., x_N) ∝ p(c_1|x_1) ... p(c_N|x_N) · p(c_1, ..., c_N) / (p(c_1) ... p(c_N))\n\ngiven certain reasonable assumptions.\n\nOnce the context-free posterior probabilities p(c_i|x_i) are known, e.g. through the use of a standard machine learning model such as a neural network, computing p(c_1, ..., c_N | x_1, ..., x_N) for all possible c_1, ..., c_N would entail (2N + 1)D^N multiplications, and finding the maximum has complexity of D^N,
which is intractable for large N and D [2].\n\nAnother problem with this formulation is the estimation of the high dimensional joint distribution p(c_1, ..., c_N), which is ill-posed and data hungry.\n\nOne way of dealing with these problems is to limit context to local regions. With this approach, only the pixels in a close neighborhood, or the letters immediately adjacent, are considered [4][6][7]. Such techniques may ignore useful information, and will not apply to situations where context lacks such locality, as in the case of microscopic urinalysis image recognition. Another way is to simplify the problem using specific domain knowledge [1], but this is only possible in certain domains.\n\nThese difficulties motivate the efficient incorporation of partial context as a general framework, formulated in section 2. In section 3, we discuss microscopic urinalysis image recognition and address the importance of using context for this application. Also in section 3, techniques are proposed to identify the relevant context. Empirical results are shown in section 4, followed by discussion in section 5.\n\n2 FORMULATION FOR INCORPORATION OF PARTIAL CONTEXT\n\nTo avoid the exponential computational cost of using the identities of all accompanying objects directly as context, we use \"partial context\", denoted by A. It is called \"partial\" because it is derived from the class labels, as opposed to consisting of an explicit labelling of all objects. The physical definition of A depends on the problem at hand. In our application, A represents the presence or absence of certain classes. Then the posterior probability of an object T_i having class label c_i, conditioned on its feature vector and the relevant context A, is\n\np(c_i|x_i, A) = p(x_i|c_i, A) p(c_i|A) / p(x_i|A)\n\nWe assume that the feature distribution of an object depends only on its own class,
i.e., p(x_i|c_i, A) = p(x_i|c_i). This assumption is roughly true for most real world problems. Then,\n\np(c_i|x_i, A) = p(x_i|c_i) p(c_i|A) / p(x_i|A) = p(c_i|x_i) · (p(c_i|A)/p(c_i)) · (p(x_i)/p(x_i|A)) ∝ p(c_i|x_i) p(c_i|A)/p(c_i) = p(c_i|x_i) P(c_i, A)\n\nwhere P(c_i, A) = p(c_i|A)/p(c_i) is called the context ratio, through which context plays its role. The context-sensitive posterior probability p(c_i|x_i, A) is obtained from the context-free posterior probability p(c_i|x_i) modified by the context ratio P(c_i, A).\n\nThe partial-context maximum likelihood decision rule chooses the class label c_i for element i such that\n\nc_i = argmax_{c_i} p(c_i|x_i, A)    (1)\n\nA systematic approach to identifying the relevant context A is addressed in section 3.3.\n\nThe partial-context approach treats each element in a set individually, but with additional information from the context-bearing factor A. Once p(c_i|x_i) are known for all i = 1, ..., N, and the context A is obtained, maximizing p(c_i|x_i, A) over the D possible values that c_i can take on, for all i, requires a total of 2N multiplications, and the complexity of finding the maximum is ND. Both are linear in N. The density estimation part is also trivial, since it is very easy to estimate p(c|A).\n\n3 MICROSCOPIC URINALYSIS\n\n3.1 INTRODUCTION\n\nUrine is one of the most complex body fluid specimens: it potentially contains about 60 meaningful types of elements. Microscopic urinalysis detects the presence of elements that often provide early diagnostic information concerning dysfunction, infection, or inflammation of the kidneys and urinary tract. Thus this non-invasive technique can be of great value in clinical case management.
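As a minimal sketch of the section-2 update rule (not the authors' implementation; the function name and the numbers below are hypothetical), the context ratio simply rescales the context-free posteriors and the result is renormalized:

```python
import numpy as np

def contextual_posterior(p_c_given_x, p_c_given_A, p_c):
    """Apply the context ratio P(c, A) = p(c|A) / p(c) to the
    context-free posteriors p(c|x), then renormalize over classes.

    All arguments are length-D vectors over the class labels."""
    unnorm = p_c_given_x * (p_c_given_A / p_c)
    return unnorm / unnorm.sum()

# Hypothetical 3-class example: the context A makes class 0 far more
# likely a priori, flipping nothing here but sharpening the posterior.
p_c_given_x = np.array([0.5, 0.3, 0.2])   # context-free model output
p_c         = np.array([0.4, 0.4, 0.2])   # class priors p(c)
p_c_given_A = np.array([0.8, 0.1, 0.1])   # priors given the observed context

post = contextual_posterior(p_c_given_x, p_c_given_A, p_c)
print(post.argmax())  # class chosen by the partial-context decision rule
```

The cost is one multiply and one divide per class per object, which is where the linear-in-N complexity of the method comes from.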
Traditional manual microscopic analysis relies on human operators who read the samples visually and identify the elements, and is therefore time-consuming, labor-intensive and difficult to standardize. Automated microscopy of all specimens is more practical than manual microscopy because it eliminates variation among different technologists, variation that becomes more pronounced as a technologist examines increasing numbers of specimens. It is also less labor-intensive, and thus less costly, than manual microscopy, and it provides more consistent and accurate results. An automated urinalysis workstation (The Yellow IRIS(TM), International Remote Imaging Systems, Inc.) has been introduced in numerous clinical laboratories for automated microscopy. Urine samples are processed and examined at 100x (low power field) and 400x (high power field) magnifications with bright-field illumination. The Yellow IRIS(TM) automated system collects video images of formed analytes in a stream of uncentrifuged urine passing an optical assembly. Each image contains one analyte. These images are given to a computer algorithm for automatic identification of the analytes.\n\nContext is rich in urinalysis and plays a crucial role in analyte classification. Some combinations of analytes are more likely than others. For instance, the presence of bacteria indicates the presence of white blood cells, since bacteria tend to cause infection and thus trigger the production of more white blood cells. If amorphous crystals show up, they tend to show up in bunches and in all sizes. Therefore, if there are amorphous crystal look-alikes in various sizes, it is quite possible that they are amorphous crystals. Squamous epithelial cells can appear either flat or rolled up. If squamous epithelial cells in one form are detected, then it is likely that there are squamous epithelial cells in the other form. Utilizing such context is crucial for classification accuracy.\n\nTable 1: Features extracted from urine analyte images\n\nFeature number | Feature description\n1 | the mean of the blue distribution\n2 | the mean of the green distribution\n- | 15th percentile of the gray level histogram\n- | 85th percentile of the gray level histogram\n- | the standard deviation of gray level intensity\n- | the energy of the Laplacian transformation of the gray level image\n\nThe classes we consider are bacteria, calcium oxalate crystals, red blood cells, white blood cells, budding yeast, amorphous crystals, uric acid crystals, and artifacts. The task of automated microscopic urinalysis is, given a urine specimen that consists of up to a few hundred images of analytes, to classify each analyte into one of these classes. The automated urinalysis system we developed consists of three steps: image processing and feature extraction, learning and pattern recognition, and context incorporation. Figure 1 shows some example analyte images. Table 1 gives a list of features extracted from the analyte images.^1\n\n3.2 CONTEXT-FREE CLASSIFICATION\n\nThe features are fed into a nonlinear feed-forward neural network with 16 inputs, 15 hidden units with sigmoid transfer functions, and 8 sigmoid output units. A cross-entropy error function is used in order to give the outputs a probability interpretation. Denoting the input feature vector as x, the network outputs a D-dimensional vector (D = 8 in our case) p = {p(d|x)}, d = 1, ..., D, where\n\np(d|x) = Prob(an analyte belongs to class d | feature x)\n\nThe decision made at this stage is\n\nd(x) = argmax_d p(d|x)\n\n3.3 IDENTIFICATION OF RELEVANT PARTIAL CONTEXT\n\nNot all classes are relevant in terms of carrying contextual information.
We propose three criteria with which we can systematically investigate the relevance of a class presence. To use these criteria, we need to know the following distributions: the class prior distribution p(c) for c = 1, ..., D; the conditional class distribution p(c|A_d) for c = 1, ..., D and d = 1, ..., D; and the class presence prior distribution p(A_d) for d = 1, ..., D. Here A_d is a binary random variable indicating the presence of class d: A_d = 1 if class d is present, and A_d = 0 otherwise. All of these distributions can be easily estimated from the database.\n\n^1 λ_1 and λ_2 are, respectively, the larger and the smaller eigenvalues of the second moment matrix of an image.\n\nThe first criterion is the correlation coefficient between the presences of any two classes. The second is the classical mutual information I(c; A_d) between the presence of a class, A_d, and the class distribution p(c), defined as\n\nI(c; A_d) = H(c) - H(c|A_d)\n\nwhere H(c) = -Σ_{i=1}^{D} p(c = i) ln p(c = i) is the entropy of the class priors and H(c|A_d) = P(A_d = 1) H(c|A_d = 1) + P(A_d = 0) H(c|A_d = 0) is the conditional entropy of c conditioned on A_d. The third criterion is what we call the expected relative entropy D(c||A_d) between the presence of a class A_d and the labeling probability p(c), which we define as\n\nD(c||A_d) = P(A_d = 1) D(p(c)||p(c|A_d = 1)) + P(A_d = 0) D(p(c)||p(c|A_d = 0))\n\nwhere D(p(c)||p(c|A_d = 1)) = Σ_{i=1}^{D} p(c = i|A_d = 1) ln[ p(c = i|A_d = 1) / p(c = i) ] and D(p(c)||p(c|A_d = 0)) = Σ_{i=1}^{D} p(c = i|A_d = 0) ln[ p(c = i|A_d = 0) / p(c = i) ].\n\nAccording to the first criterion, one type of analyte is considered relevant to another if the absolute value of their correlation coefficient is beyond a certain threshold. It turns out that uric acid crystals, budding yeast and calcium oxalate crystals are not relevant to any other types, even by a generous threshold of 0.10.
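The mutual-information criterion above can be sketched as follows (a hypothetical illustration on made-up numbers, not the authors' code; `mutual_information` and the two-class distributions are assumptions for the example):

```python
import numpy as np

def entropy(p):
    """H(p) = -sum p ln p, skipping zero-probability entries."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information(p_c, p_Ad, p_c_given_Ad1, p_c_given_Ad0):
    """I(c; A_d) = H(c) - H(c|A_d) for a binary presence variable A_d."""
    h_cond = p_Ad * entropy(p_c_given_Ad1) + (1 - p_Ad) * entropy(p_c_given_Ad0)
    return entropy(p_c) - h_cond

# Hypothetical 2-class illustration: the presence of A_d strongly
# shifts the class distribution, so I(c; A_d) is large.
p_Ad = 0.5
p_c_given_Ad1 = np.array([0.9, 0.1])
p_c_given_Ad0 = np.array([0.1, 0.9])
p_c = p_Ad * p_c_given_Ad1 + (1 - p_Ad) * p_c_given_Ad0  # marginal p(c)
print(mutual_information(p_c, p_Ad, p_c_given_Ad1, p_c_given_Ad0))
```

A class whose presence barely changes p(c) would score near zero and be dropped from the relevant-context set.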
Similarly, the larger the mutual information between the presence of a class and the class distribution, the more relevant that class is. Ranking the analyte types by I(c; A_d) in descending order gives the following list: bacteria, amorphous crystals, red blood cells, white blood cells, uric acid crystals, budding yeast and calcium oxalate crystals. Likewise, ranking the analyte types by D(c||A_d) in descending order gives: bacteria, red blood cells, amorphous crystals, white blood cells, calcium oxalate crystals, budding yeast and uric acid crystals. All three criteria lead to similar conclusions regarding the relevance of class presence: bacteria, red blood cells, amorphous crystals, and white blood cells are relevant, while calcium oxalate crystals, budding yeast and uric acid crystals are not. (Based on prior knowledge, we discard artifacts from the outset as an irrelevant class.)\n\n3.4 ALGORITHM FOR INCORPORATING PARTIAL CONTEXT\n\nOnce the M relevant classes are identified, the following algorithm is used to incorporate partial context.\n\nStep 0 Estimate the priors p(c|A_d) and p(c), for c ∈ {1, 2, ..., D} and d ∈ {1, 2, ..., D}.\n\nStep 1 For a given x_i, compute p(c_i|x_i) for c_i = 1, 2, ..., D using whichever base machine learning model is preferred (in our case, a neural network).\n\nStep 2 Let the M relevant classes be R_1, ..., R_M. According to the no-context p(c_i|x_i) and certain criteria for detecting the presence or absence of all the relevant classes, obtain A_{R_1}, ..., A_{R_M}.\n\nStep 3 Let p(c_i|x_i, A_0) = p(c_i|x_i), where A_0 is the null element. Incorporate context from each relevant class sequentially, i.e., for m = 1 to M, iteratively compute\n\np(c_i|x_i, A_0, ..., A_{R_{m-1}}, A_{R_m}) ∝ p(c_i|x_i, A_0, ..., A_{R_{m-1}}) · p(c_i|A_{R_m}) / p(c_i)\n\nStep 4 Recompute A_{R_1}, ..., A_{R_M} based on the new class labels.
Return to step 3 and repeat until the algorithm converges.^2\n\nFigure 1: Examples of some of the analyte images: amorphous crystals, artifacts, calcium oxalate crystals, hyaline casts.\n\nStep 5 Label the objects according to the final context-containing p(c_i|x_i, A_{R_1}, ..., A_{R_M}), i.e., c_i = argmax_{c_i} p(c_i|x_i, A_{R_1}, ..., A_{R_M}) for i = 1, ..., N.\n\nThis algorithm is invariant with respect to the ordering of the M relevant classes (A_{R_1}, ..., A_{R_M}). The proof is omitted here.\n\n4 RESULTS\n\nThe algorithm using partial context was tested on a database of 83 urine specimens containing a total of 20,276 analyte images. Four classes are considered relevant according to the criteria described in section 3.3: bacteria, red blood cells, white blood cells and amorphous crystals. We measure two types of error: analyte-by-analyte error and specimen diagnostic error. The average analyte-by-analyte error is reduced from 44.48% without context to 36.66% with context, a relative error reduction of 17.6% (Table 2). The diagnosis for a specimen is either normal or abnormal. Tables 3 and 4 compare the diagnostic performance with and without context, and Table 5 lists the relative changes. Using context significantly increases correct diagnoses for both normal and abnormal specimens, and reduces both false positives and false negatives.\n\nTable 2: Comparison of analyte-by-analyte error with and without contextual information.\n\n                                   without context   with context\naverage analyte-by-analyte error   44.48%            36.66%\n\n^2 Hence, the algorithm has an E-M flavor, in that it goes back and forth between finding the most probable class labels given the context and determining the context given the class labels.
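The iterative procedure of steps 1-5 can be sketched as follows. This is a rough illustration, not the authors' implementation: the function name, the way presence is detected from the current labels, and the two-class example data are all hypothetical.

```python
import numpy as np

def partial_context_classify(P, priors, cond, detect, max_iter=10):
    """Iterative partial-context labeling in the spirit of section 3.4.

    P      -- (N, D) array of context-free posteriors p(c_i|x_i)
    priors -- (D,) class priors p(c)
    cond   -- {d: (p(c|A_d=1), p(c|A_d=0))} for each relevant class d
    detect -- function mapping current labels to the set of relevant
              classes judged present (steps 2 and 4)
    """
    labels = P.argmax(axis=1)  # step 1: context-free labels
    for _ in range(max_iter):
        present = detect(labels)                       # step 2 / step 4
        Q = P.copy()
        for d, (p_present, p_absent) in cond.items():  # step 3: fold in
            p_c_given_Ad = p_present if d in present else p_absent
            Q = Q * (p_c_given_Ad / priors)            # context ratio
        Q = Q / Q.sum(axis=1, keepdims=True)
        new_labels = Q.argmax(axis=1)                  # step 5
        if np.array_equal(new_labels, labels):         # converged
            return new_labels
        labels = new_labels
    return labels

# Hypothetical 2-class example: class 0's presence (detected from the
# current labels) boosts class 0 everywhere, flipping the second object.
P = np.array([[0.60, 0.40], [0.45, 0.55]])
priors = np.array([0.5, 0.5])
cond = {0: (np.array([0.8, 0.2]), np.array([0.3, 0.7]))}
detect = lambda labels: {0} if (labels == 0).any() else set()
print(partial_context_classify(P, priors, cond, detect))
```

Each pass costs O(NMD) multiplications, consistent with the linear-in-N complexity claimed in section 2.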
\nTable 3: Diagnostic confusion matrix without using context.\n\n                 estimated normal   estimated abnormal\ntruly normal     40.96%             7.23%\ntruly abnormal   19.28%             32.53%\n\nTable 4: Diagnostic confusion matrix using context.\n\n                 estimated normal   estimated abnormal\ntruly normal     42.17%             6.02%\ntruly abnormal   16.87%             34.94%\n\nTable 5: Relative accuracy improvement (diagonal elements) and error reduction (off-diagonal elements) in the diagnostic confusion matrix by using context.\n\n                 estimated normal   estimated abnormal\ntruly normal     +2.95%             -16.73%\ntruly abnormal   -12.50%            +7.41%\n\n5 CONCLUSIONS\n\nWe proposed a novel framework that can incorporate context in a simple and efficient manner, avoiding exponential computation and high dimensional density estimation. The application of the partial context technique to microscopic urinalysis image recognition demonstrated the efficacy of the algorithm. The algorithm is not domain dependent, and thus can be readily generalized to other pattern recognition areas.\n\nACKNOWLEDGEMENTS\n\nThe authors would like to thank Alexander Nicholson, Malik Magdon-Ismail, and Amir Atiya of the Caltech Learning Systems Group for helpful discussions.\n\nReferences\n\n[1] Song, X.B., Sill, J., Abu-Mostafa, Y. & Kasdan, H. (1997) \"Incorporating Contextual Information in White Blood Cell Identification\", in M. Jordan, M.J. Kearns and S.A. Solla (eds.), Advances in Neural Information Processing Systems 7, 1997, pp. 950-956. Cambridge, MA: MIT Press.\n\n[2] Song, Xubo (1999) \"Contextual Pattern Recognition with Application to Biomedical Image Identification\", Ph.D. Thesis, California Institute of Technology.\n\n[3] Boehringer-Mannheim Corporation, Urinalysis Today, Boehringer-Mannheim Corporation, 1991.
\n\n[4] Kittler, J..\"Relaxation labelling\", Pattern Recognition Theory and Applications, 1987, pp. 99-\n108., Pierre A. Devijver and Josef Kittler, Editors, Springer-Verlag. \n\n[5] Kittler, J. & Illingworth, J., \"Relaxation Labelling Algorithms - A Review\", Image and Vision \nComputing, 1985, vol. 3, pp. 206-216. \n\n[6] Toussaint, G., \"The Use of Context in Pattern Recognition\", Pattern Recognition, 1978, vol. 10, \npp. 189-204. \n\n[7] Swain, P. & Vardeman, S. & Tilton, J., \"Contextual Classification of Multispectral Image Data\", \nPattern Recognition, 1981, Vol. 13, No.6, pp. 429-441. \n\n\f", "award": [], "sourceid": 1675, "authors": [{"given_name": "Xubo", "family_name": "Song", "institution": null}, {"given_name": "Joseph", "family_name": "Sill", "institution": null}, {"given_name": "Yaser", "family_name": "Abu-Mostafa", "institution": null}, {"given_name": "Harvey", "family_name": "Kasdan", "institution": null}]}