{"title": "Incorporating Contextual Information in White Blood Cell Identification", "book": "Advances in Neural Information Processing Systems", "page_first": 950, "page_last": 956, "abstract": "", "full_text": "Incorporating Contextual Information in White \n\nBlood Cell Identification \n\nXubo Song* \n\nDepartment of Electrical Engineering \n\nCalifornia Institute of Technology \n\nPasadena, CA 91125 \n\nxubosong@fire.work.caltech.edu \n\nYaser Abu-Mostafa \n\nDept. of Electrical Engineering \nand Dept. of Computer Science \nCalifornia Institute of Technology \n\nPasadena, CA 91125 \n\nYaser@over. work.caltech.edu \n\nJoseph Sill \n\nComputation and Neural Systems Program \n\nCalifornia Institute of Technology \n\nPasadena, CA 91125 \n\njoe@busy.work.caltech.edu \n\nHarvey Kasdan \n\nInternational Remote Imaging Systems \n\n9162 Eton Ave., \n\nChatsworth, CA 91311 \n\nAbstract \n\nIn this paper we propose a technique to incorporate contextual informa(cid:173)\ntion into object classification. In the real world there are cases where the \nidentity of an object is ambiguous due to the noise in the measurements \nbased on which the classification should be made. It is helpful to re(cid:173)\nduce the ambiguity by utilizing extra information referred to as context, \nwhich in our case is the identities of the accompanying objects. This \ntechnique is applied to white blood cell classification. Comparisons are \nmade against \"no context\" approach, which demonstrates the superior \nclassification performance achieved by using context. In our particular \napplication, it significantly reduces false alarm rate and thus greatly re(cid:173)\nduces the cost due to expensive clinical tests. \n\n\u2022 Author for correspondence. 
\n\n1 \n\nINTRODUCTION \n\nOne of the most common assumptions made in the study of machine learning is that the examples are drawn independently from some joint input-output distribution. There are cases, however, where this assumption is not valid. One application where the independence assumption does not hold is the identification of white blood cell images. Abnormal cells are much more likely to appear in bunches than in isolation. Specifically, in a sample of several hundred cells, it is more likely to find either no abnormal cells or many abnormal cells than it is to find just a few. \n\nIn this paper, we present a framework for pattern classification in situations where the independence assumption is not satisfied. In our case, the identity of an object depends on the identities of the accompanying objects, which provide the contextual information. Our method takes into consideration the joint distribution of all the classes, and uses it to adjust the object-by-object classification. \n\nIn Section 2, the framework for incorporating contextual information is presented, and an efficient algorithm is developed. In Section 3 we discuss the application area of white blood cell classification, and address the importance of using context for this application. Empirical testing results are shown in Section 4, followed by conclusions in Section 5. \n\n2 \n\nINCORPORATING CONTEXTUAL INFORMATION INTO CLASSIFICATION \n\n2.1 THE FRAMEWORK \n\nLet x_i be the feature vector of an object, and c_i = C(x_i) be the classification for x_i, i = 1, ..., N, where N is the total number of objects. c_i ∈ {1, ..., D}, where D is the total number of classes. 
\n\nAccording to Bayes' rule, \n\np(c|x) = p(x|c) p(c) / p(x) \n\nIt follows that the \"with context\" a posteriori probability of the class labels of all the objects assuming values c_1, c_2, ..., c_N, given all the feature vectors, is \n\np(c_1, c_2, ..., c_N | x_1, x_2, ..., x_N) = p(x_1, x_2, ..., x_N | c_1, c_2, ..., c_N) p(c_1, c_2, ..., c_N) / p(x_1, x_2, ..., x_N)    (1) \n\nIt is reasonable to assume that the feature distribution given a class is independent of the feature distributions of the other classes, i.e., \n\np(x_1, x_2, ..., x_N | c_1, c_2, ..., c_N) = p(x_1|c_1) ... p(x_N|c_N) \n\nThen Equation (1) can be rewritten as \n\np(c_1, c_2, ..., c_N | x_1, x_2, ..., x_N) = [p(c_1|x_1) ... p(c_N|x_N) p(x_1) ... p(x_N) p(c_1, c_2, ..., c_N)] / [p(c_1) ... p(c_N) p(x_1, x_2, ..., x_N)]    (2) \n\nwhere p(c_i|x_i) is the \"no context\" object-by-object Bayesian a posteriori probability, p(c_i) is the a priori probability of the classes, p(x_i) is the marginal probability of the features, and p(x_1, x_2, ..., x_N) is the joint distribution of all the feature vectors. \n\nSince the features (x_1, x_2, ..., x_N) are given, p(x_1, x_2, ..., x_N) and the p(x_i) are constant, so \n\np(c_1, c_2, ..., c_N | x_1, x_2, ..., x_N) ∝ p(c_1|x_1) ... p(c_N|x_N) p̃(c_1, c_2, ..., c_N) \n\nwhere \n\np̃(c_1, c_2, ..., c_N) = p(c_1, c_2, ..., c_N) / [p(c_1) ... p(c_N)]    (3) \n\nThe quantity p̃(c_1, c_2, ..., c_N), which we call the context ratio and through which the context plays its role, captures the dependence among the objects. In the case where all the objects are independent, p̃(c_1, c_2, ..., c_N) equals one - there is no context. In the dependent case, p̃(c_1, c_2, ..., c_N) will not equal one, and the context has an effect on the classifications. \n\nWe deal with applications of object classification where it is the count in each class, rather than the particular ordering or numbering of the objects, that matters. As a result, p(c_1, c_2, ..., c_N) is only a function of the count in each class. Let N_d be the count in class d, and ν_d = N_d / N, d = 1, ..., D. Then \n\np(c_1, c_2, ..., c_N) = p(ν_1, ν_2, ..., ν_D)    (4) \n\nso the context ratio becomes p̃(c_1, c_2, ..., c_N) = p(ν_1, ..., ν_D) / (p_1^{N_1} ... p_D^{N_D}), where p_d is the prior probability of class d, for d = 1, ..., D, with sum_{d=1}^{D} N_d = N and sum_{d=1}^{D} ν_d = 1. \n\nThe decision rule is to choose class labels c_1, c_2, ...
, c_N such that \n\n(c_1, c_2, ..., c_N) = argmax p(c_1, c_2, ..., c_N | x_1, x_2, ..., x_N)    (5) \n\nwhere the argmax is taken over all assignments (c_1, c_2, ..., c_N). When implementing the decision rule, we need to compute and compare D^N cases for Equation (5). In the case of white blood cell recognition, D = 14 and N is typically around 600, which makes direct implementation virtually impossible. \n\nIn many cases, additional constraints can be used to reduce the computation. This is the case in white blood cell identification, as demonstrated in the following section. \n\n3 WHITE BLOOD CELL RECOGNITION \n\nLeukocyte analysis is one of the major routine laboratory examinations. The utility of leukocyte classification in clinical diagnosis relates to the fact that in various physiological and pathological conditions the relative percentage composition of the blood leukocytes changes. An estimate of the percentage of each class present in a blood sample conveys information which is pertinent to the hematological diagnosis. Typical commercial differential WBC counting systems are designed to identify five major mature cell types. But blood samples may also contain other types of cells, i.e., immature cells. These cells occur infrequently in normal specimens, and most commercial systems will simply indicate the presence of these cells because they cannot be individually identified by the systems. But it is precisely these cell types that relate to the production rate and maturation of new cells and thus are important indicators of hematological disorders. 
Our system is designed to differentiate fourteen WBC types, which include the five major mature types: segmented neutrophils, lymphocytes, monocytes, eosinophils, and basophils; the immature types: bands (unsegmented neutrophils), metamyelocytes, myelocytes, promyelocytes, blasts, and variant lymphocytes; as well as nucleated red blood cells and artifacts. Differential counts are made based on the cell classifications, which further lead to diagnosis or prognosis. \n\nThe data was provided by IRIS, Inc. Blood specimens are collected at Harbor UCLA Medical Center from local patients, then dyed with Basic Orange 21 metachromatic supravital stain. Each specimen is then passed through a flow microscopic imaging and image processing instrument, where the blood cell images are captured. Each image contains a single cell in full color. There are typically 600 images from each specimen. The task of the cell recognition system is to categorize the cells based on the images. \n\n3.1 PREPROCESSING AND FEATURE EXTRACTION \n\nThe size of each cell image is automatically tailored to the size of the cell it contains: images containing larger cells are bigger than those with small cells. The sizes range from 20x20 to 40x40 pixels, with an average of around 25x25. See Figure 1. At the preprocessing stage, the images are segmented to set the cell interior apart from the background. Features based on the interior of the cells are extracted from the images. The features include size, shape, color¹ and texture. See Table 1 for the list of features.² \n\nFigure 1: Example of some of the cell images. \n\n3.2 CELL-BY-CELL CLASSIFICATION \n\nThe features are fed into a nonlinear feed-forward neural network with 20 inputs, 15 hidden units with sigmoid transfer functions, and 14 sigmoid output units. 
¹A color image is decomposed into three intensity images: red, green, and blue, respectively. \n²The red-blue distribution is the pixel-by-pixel log(red) - log(blue) distribution for pixels in the cell interior. The red distribution is the distribution of the red intensity in the cell interior. \n\nTable 1: The list of features. \n\nfeature number | feature description \n1 | cell area \n2 | number of pixels on cell edge \n3 | the 4th quantile of red-blue distribution \n4 | the 4th quantile of green-red distribution \n5 | the median of red-blue distribution \n6 | the median of green-red distribution \n7 | the median of blue-green distribution \n8 | the standard deviation of red-blue distribution \n9 | the standard deviation of green-red distribution \n10 | the standard deviation of blue-green distribution \n11 | the 4th quantile of red distribution \n12 | the 4th quantile of green distribution \n13 | the 4th quantile of blue distribution \n14 | the median of red distribution \n15 | the median of green distribution \n16 | the median of blue distribution \n17 | the standard deviation of red distribution \n18 | the standard deviation of green distribution \n19 | the standard deviation of blue distribution \n20 | the standard deviation of the distance from the edge to the mass center \n\nA cross-entropy error function is used in order to give the output a probability interpretation. Denoting the input feature vector as x, the network outputs a D-dimensional vector (D = 14 in our case) p = {p(d|x)}, d = 1, ..., D, where \n\np(d|x) = Prob(a cell belongs to class d | feature x) \n\nThe decision made at this stage is \n\nd(x) = argmax_d p(d|x) \n\n3.3 COMBINING CONTEXTUAL INFORMATION \n\nThe \"no context\" cell-by-cell decision is based only on the features presented by a cell, without looking at any other cells. 
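For concreteness, the cell-by-cell rule above and the joint "with context" rule of Section 2 (Equations (3)-(5)) can be contrasted on a toy problem. This is only an illustrative sketch: the class names, priors, posteriors, and count prior below are all invented, and the brute-force D^N search it performs is exactly what is infeasible at the real scale of D = 14 and N around 600.

```python
# Toy sketch of Equations (3)-(5): adjust independent per-cell posteriors
# p(c_i|x_i) by a count-based context ratio, then pick the jointly most
# probable labeling. All numbers are invented for illustration.
import itertools

D = 2                        # classes: 0 = "normal", 1 = "abnormal"
prior = [0.9, 0.1]           # per-cell class priors p(c)

# "No context" posteriors p(c_i | x_i) for N = 3 ambiguous cells.
posteriors = [[0.60, 0.40],
              [0.55, 0.45],
              [0.50, 0.50]]

def count_prior(counts, n):
    """Made-up stand-in for the joint label prior p(c_1, ..., c_N), a
    function of class counts only: abnormal cells come in bunches, so
    mostly-normal or mostly-abnormal specimens are favored."""
    f = counts[1] / n
    return 0.5 * (1.0 - f) ** 4 + 0.5 * f ** 4

def with_context_score(labels):
    """Posterior up to a constant: prod_i p(c_i|x_i) times the context
    ratio p(c_1,...,c_N) / prod_i p(c_i) from Equation (3)."""
    counts = [list(labels).count(d) for d in range(D)]
    score = count_prior(counts, len(labels))
    for i, c in enumerate(labels):
        score *= posteriors[i][c] / prior[c]
    return score

no_context = tuple(max(range(D), key=lambda d: p[d]) for p in posteriors)
with_context = max(itertools.product(range(D), repeat=len(posteriors)),
                   key=with_context_score)
print("no context:", no_context)      # each cell labeled individually
print("with context:", with_context)  # jointly most probable labeling
```

With these invented numbers, the three individually normal-leaning cells are jointly relabeled, because the bunching prior makes a mixed specimen unlikely once the context ratio is applied.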
When human experts make decisions, they always look at the whole specimen, taking into consideration the identities of the other cells and adjusting the cell-by-cell decision on a single cell according to the company it keeps. On top of the visual perception of the cell patterns, such as shape, color, size, and texture, comparisons and associations, either mental or visual, with other cells in the same specimen are made to infer the final decision. A cell is assigned a certain identity if the company it keeps supports that identity. For instance, the difference between a lymphocyte and a blast can sometimes be very subtle, especially when the cell is large. A large unusual mononuclear cell with the characteristics of both blast and lymphocyte is more likely to be a blast if surrounded or accompanied by other abnormal cells or an abnormal distribution of the cells. \n\nThis scenario fits the framework we described in Section 2. The Combining Contextual Information algorithm was used as the post-processing of the cell-by-cell decisions. \n\n3.4 OBSERVATIONS AND SIMPLIFICATIONS \n\nDirect implementation of the proposed algorithm is difficult due to the computational complexity. In the application of WBC identification, simplification is possible. We observed the following. First, we are primarily concerned with one class, blast, the presence of which has clinical significance. Second, blast is confused only with one other class, lymphocyte. In other words, for a potential blast, p(blast|x) >> 0, p(lymphocyte|x) >> 0, and p(any other class|x) ≈ 0. Finally, we are fairly certain about the classification of all other classes, i.e., p(a certain class|x) ≈ 1, p(any other class|x) ≈ 0. Based on the above observations, we can simplify the algorithm instead of doing an exhaustive search. \n\nLet p_i^d = p(c_i = d | x_i), i = 1, ..., N. 
More specifically, let p_i^B = p(blast|x_i), p_i^L = p(lymphocyte|x_i), and p_i^* = p(class *|x_i), where * is neither blast nor lymphocyte. Suppose there are K potential blasts. Order the p_i^B's in a descending manner over i, such that \n\np_1^B ≥ p_2^B ≥ ... ≥ p_K^B \n\nThen the probability that there are k blasts is \n\nP_B(k) = p_1^B ... p_k^B p_{k+1}^L ... p_K^L p_{K+1}^* ... p_N^* p(ν_B = k/N, ν_L = ν'_L + (K-k)/N, ν_3, ..., ν_D) \n\nwhere ν'_L is the proportion of unambiguous lymphocytes and ν_3, ..., ν_D are the proportions of the other cell types. \n\nWe can compute the P_B(k)'s recursively: \n\nP_B(k+1) = P_B(k) (p_{k+1}^B / p_{k+1}^L) [p(ν_B = (k+1)/N, ν_L = ν'_L + (K-k-1)/N, ν_3, ..., ν_D) / p(ν_B = k/N, ν_L = ν'_L + (K-k)/N, ν_3, ..., ν_D)] \n\nfor k = 1, ..., K-1, and \n\nP_B(1) = p_1^B p_2^L ... p_K^L p_{K+1}^* ... p_N^* p(ν_B = 1/N, ν_L = ν'_L + (K-1)/N, ν_3, ..., ν_D) \n\nThis way we only need to compute K terms to get the P_B(k)'s. Pick the optimal number of blasts k* that maximizes P_B(k), k = 1, ..., K. \n\nAn important step is to calculate p(ν_1, ..., ν_D), which can be estimated from the database. \n\n3.5 THE ALGORITHM \n\nStep 1: Estimate p(ν_1, ..., ν_D) from the database. \nStep 2: Compute the object-by-object \"no context\" a posteriori probabilities p(c_i|x_i), i = 1, ..., N, c_i ∈ {1, ..., D}. \nStep 3: Compute P_B(k) for k = 1, ..., K, find k*, and relabel the cells accordingly. \n\n4 EMPIRICAL TESTING \n\nThe algorithm has been intensively tested at IRIS, Inc. on the specimens obtained at Harbor UCLA Medical Center. We compared the performance with and without using contextual information on blood samples from 220 specimens (consisting of 13,200 cells). In about 50% of the cases, a false alarm would have occurred had context not been used: most cells are correctly classified, but a few are incorrectly labelled as immature cells, which raises a flag for the doctors. Changing the classification of a specimen to abnormal requires expert intervention before the false alarm is eliminated, and it may cause unnecessary worry. 
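The simplified blast-count search of Sections 3.4 and 3.5 can be sketched in a few lines. This is a hedged toy version: the count prior and every number below are invented stand-ins for the database estimate of p(ν_1, ..., ν_D), and k = 0 is included so that the context can veto every candidate blast.

```python
# Toy sketch of the Section 3.4 recursion: evaluate P_B(k) for each
# candidate number of blasts k among the K ambiguous cells, instead of
# a D**N search. All numbers are invented for illustration.

N = 10                                  # cells in the toy specimen
p_b = [0.70, 0.60, 0.55]                # p(blast | x_i), sorted descending
p_l = [0.30, 0.40, 0.45]                # p(lymphocyte | x_i), same cells
K = len(p_b)
nu_l_clear = 0.4                        # proportion of unambiguous lymphocytes

def count_prior(nu_blast, nu_lymph):
    """Invented p(nu_B, nu_L, ...): blasts are rare overall, but once
    present they tend to come in bunches rather than singly."""
    return 0.6 if nu_blast == 0 else 0.4 * nu_blast

# Base case k = 0: all K ambiguous cells are called lymphocytes.
pb = [1.0] * (K + 1)
for q in p_l:
    pb[0] *= q
pb[0] *= count_prior(0.0, nu_l_clear + K / N)

# Recursion: turning the (k+1)-th strongest candidate into a blast swaps
# one posterior factor (p_b / p_l) and shifts the count prior.
for k in range(K):
    shift = count_prior((k + 1) / N, nu_l_clear + (K - k - 1) / N) \
          / count_prior(k / N, nu_l_clear + (K - k) / N)
    pb[k + 1] = pb[k] * (p_b[k] / p_l[k]) * shift

k_star = max(range(K + 1), key=lambda k: pb[k])
print("P_B(k) for k = 0..K:", [round(v, 4) for v in pb], "-> k* =", k_star)
```

With these invented numbers the strongest candidate (p(blast|x) = 0.70) would be called a blast cell-by-cell, yet the search returns k* = 0: the bunching prior eliminates the isolated false alarm, which is the effect reported in the next section.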
\nWhen context is applied, the false alarms for most of the specimens were eliminated, and no false negatives were introduced. \n\nmethods | cell classification | normality identification | false positive | false negative \nno context | 88% | ~50% | ~50% | 0% \nwith context | 89% | ~90% | ~10% | 0% \n\nTable 2: Comparison with and without using contextual information. \n\n5 CONCLUSIONS \n\nIn this paper we presented a novel framework for incorporating contextual information into object identification, developed an algorithm to implement it efficiently, and applied it to white blood cell recognition. Empirical tests showed that the \"with context\" approach is significantly superior to the \"no context\" approach. The technique described could be generalized to a number of domains where contextual information plays an essential role, such as speech recognition, character recognition, and other medical diagnosis problems. \n\nAcknowledgments \n\nThe authors would like to thank the members of the Learning Systems Group at Caltech for helpful suggestions and advice: Dr. Amir Atiya, Zehra Cataltepe, Malik Magdon-Ismail, and Alexander Nicholson. \n\nReferences \n\nRichard, M.D., & Lippmann, R.P. (1991) Neural network classifiers estimate Bayesian a posteriori probabilities. Neural Computation 3, pp. 461-483. Cambridge, MA: MIT Press. \n\nKasdan, H.K., Pelmulder, J.P., Spolter, L., Levitt, G.B., Lincir, M.R., Coward, G.N., Haiby, S.J., Lives, J., Sun, N.C.J., & Deindoerfer, F.H. (1994) The WhiteIRIS leukocyte differential analyzer for rapid high-precision differentials based on images of cytoprobe-reacted cells. Clinical Chemistry, Vol. 40, No. 9, pp. 1850-1861. \n\nHaralick, R.M., & Shapiro, L.G. (1992) Computer and Robot Vision, Vol. 1. Addison-Wesley. \n\nAus, H.A., Harms, H., ter Meulen, V., & Gunzer, U. 
(1987) Statistical evaluation of computer extracted blood cell features for screening population to detect leukemias. In Pierre A. Devijver and Josef Kittler (eds.), Pattern Recognition Theory and Applications, pp. 509-518. Springer-Verlag. \n\nKittler, J. (1987) Relaxation labelling. In Pierre A. Devijver and Josef Kittler (eds.), Pattern Recognition Theory and Applications, pp. 99-108. Springer-Verlag. \n", "award": [], "sourceid": 1435, "authors": [{"given_name": "Xubo", "family_name": "Song", "institution": null}, {"given_name": "Yaser", "family_name": "Abu-Mostafa", "institution": null}, {"given_name": "Joseph", "family_name": "Sill", "institution": null}, {"given_name": "Harvey", "family_name": "Kasdan", "institution": null}]}