{"title": "Comparing the prediction accuracy of artificial neural networks and other statistical models for breast cancer survival", "book": "Advances in Neural Information Processing Systems", "page_first": 1063, "page_last": 1067, "abstract": "", "full_text": "Comparing the prediction accuracy of \n\nartificial neural networks and other \nstatistical models for breast cancer \n\nsurvival \n\nHarry B. Burke \n\nDepartment of Medicine \nNew York Medical College \n\nValhalla, NY 10595 \n\nDavid B. Rosen \n\nDepartment of Medicine \nNew York Medical College \n\nValhalla, NY 10595 \n\nPhilip H. Goodman \nDepartment of Medicine \n\nUniversity of Nevada School of Medicine \n\nReno, Nevada 89520 \n\nAbstract \n\nThe TNM staging system has been used since the early 1960s \nto predict breast cancer patient outcome. In an attempt to increase \nprognostic accuracy, many putative prognostic factors have \nbeen identified. Because the TNM stage model cannot accommodate \nthese new factors, the proliferation of factors in breast \ncancer has led to clinical confusion. What is required is a new \ncomputerized prognostic system that can test putative prognostic \nfactors and integrate the predictive factors with the TNM variables \nin order to increase prognostic accuracy. Using the area under \nthe curve of the receiver operating characteristic, we compare \nthe accuracy of the following predictive models in terms of five-year \nbreast cancer-specific survival: pTNM staging system, principal \ncomponent analysis, classification and regression trees, logistic \nregression, cascade correlation neural network, conjugate gradient \ndescent neural network, probabilistic neural network, and backpropagation \nneural network. Several statistical models are significantly more accurate \nthan the TNM staging system. 
Logistic regression and the \nbackpropagation neural network are the most accurate \nmodels for predicting five-year breast cancer-specific survival. \n\n1 INTRODUCTION \n\nFor over thirty years, measuring cancer outcome has been based on the TNM staging \nsystem (tumor size, number of lymph nodes with metastatic disease, and distant \nmetastases) (Beahr et al., 1992). There are several problems with this model \n(Burke and Henson, 1993). First, it is not very accurate; for breast cancer it is \n44% accurate. Second, its accuracy cannot be improved because predictive variables \ncannot be added to the model. Third, it does not apply to all cancers. In \nthis paper we compare computerized prediction models to determine if they can \nimprove prognostic accuracy. Artificial neural networks (ANNs) are a class of nonlinear \nregression and discrimination models. ANNs are being used in many areas \nof medicine, with several hundred articles published in the last year. Representative \nareas of research include anesthesiology (Westenskow et al., 1992), radiology \n(Tourassi et al., 1992), cardiology (Leong and Jabri, 1982), psychiatry (Palombo, \n1992), and neurology (Gabor and Seyal, 1992). ANNs are being used in cancer \nresearch, including image processing (Goldberg et al., 1992), analysis of laboratory \ndata for breast cancer diagnosis (O'Leary et al., 1992), and the discovery of \nchemotherapeutic agents (Weinstein et al., 1992). It should be pointed out that \nthe analyses in this paper rely upon previously collected prognostic factors. These \nfactors were selected for collection because they were significant in a generalized \nlinear model such as the linear or logistic models. There is no predictive model that \ncan improve upon linear or logistic prediction models when the predictor variables \nmeet the assumptions of these models and there are no interactions. Therefore, \n
Therefore \nhe objective of this paper is not to outperform linear or logistic models on these \ndata. Rather, our objective is to show that, with variables selected by generalized \nlinear models, artificial neural networks can perform as well as the best traditional \nmodels . There is no a priori reason to believe that future prognostic factors will \nbe binary or linear, and that there will not be complex interactions between prog(cid:173)\nnostic factors. A further objective of this paper is to demonstrate that artificial \nneural networks are likely to outperform the conventional models when there are \nunanticipated nonmonotonic factors or complex interactions. \n\n2 METHODS \n\n2.1 DATA \n\nThe Patient Care Evaluation (PCE) data set is collected by the Commission on \nCancer of the American College of Surgeons (ACS). The ACS, in October 1992, \nrequested cancer information from hospital tumor registries in the United States. \nThe ACS asked for the first 25 cases of breast cancer seen at that institution in 1983, \nand it asked for follow up information on each of these 25 patients through the date \nof the request. These are only cases of first breast cancer. Follow-up information \nincluded known deaths. The PCE data set contains, at best, eight year follow-up. \n\n\fPrediction Accuracy of Models for Breast Cancer Survival \n\n1065 \n\nWe chose to use a five year survival end-point. This analysis is for death due to \nbreast cancer, not all cause mortality. \n\nFor this analysis cases with missing data, and cases censored before five years, are \nnot included so that the prediction models can be compared without putting any \nprediction model at a disadvantage. We randomly divided the data set into training, \nhold-out, and testing subsets of 3,100, 2,069, and 3,102 cases, respectively. 
\n\n2.2 MODELS \n\nThe TNM stage model used in this analysis is the pathologic model (pTNM) based \non the 1992 American Joint Committee on Cancer's Manual for the Staging of \nCancer (Beahr et al., 1992). The pathologic model relies upon pathologically determined \ntumor size and lymph nodes; this contrasts with clinical staging, which \nrelies upon the clinical examination to provide tumor size and lymph node information. \nTo determine the overall accuracy of the TNM stage model we compared \nthe model's prediction for each patient, where the individual patient's prediction \nis the fraction of all the patients in that stage who survive, to each patient's true \noutcome. \n\nPrincipal components analysis is a data reduction technique based on the linear \ncombinations of predictor variables that maximize the variance across patients (Jollie, \n1982). The logistic regression analysis is performed in a stepwise manner, without \ninteraction terms, using the statistical language S-PLUS (S-PLUS, 1992), with \nthe continuous variable age modeled with a restricted cubic spline to avoid assuming \nlinearity (Harrell et al., 1988). Two types of Classification and Regression Tree \n(CART) (Breiman et al., 1984) analyses are performed using S-PLUS. The first \nis a 9-node pruned tree (with 10-fold cross validation on the deviance), and the \nsecond is a shrunk tree with 13.7 effective nodes. \n\nThe multilayer perceptron neural network training in this paper is based on the \nmaximum likelihood function unless otherwise stated, and backpropagation refers \nto gradient descent. Two neural networks that are not multilayer perceptrons are \ntested. They are the Fuzzy ARTMAP neural network (Carpenter et al., 1991) and \nthe probabilistic neural network (Specht, 1990). \n\n2.3 ACCURACY \n\nThe measure of comparative accuracy is the area under the curve of the receiver \noperating characteristic (Az). 
Generally, the Az is a nonparametric measure of \ndiscrimination. Whereas squared error summarizes how close each patient's predicted value is \nto its true outcome, the Az measures the relative goodness of the set of predictions \nas a whole by comparing the predicted probability of each patient with that of all \nother patients. The computational approach to the Az that employs the trapezoidal \napproximation to the area under the receiver operating characteristic curve for \nbinary outcomes was first reported by Bamber (Bamber, 1975), and later in the \nmedical literature by Hanley (Hanley and McNeil, 1982). This was extended by \nHarrell (Harrell et al., 1988) to continuous outcomes. \n\nTable 1: PCE 1983 Breast Cancer Data: 5 Year Survival Prediction, 54 Variables. \n\nPREDICTION MODEL                   ACCURACY*   SPECIFICATIONS \n\npTNM Stages                        .720        0, I, IIA, IIB, IIIA, IIIB, IV \nPrincipal Components Analysis      .714        one scaling iteration \nCART, pruned                       .753        9 nodes \nCART, shrunk                       .762        13.7 nodes \nStepwise Logistic regression       .776        with cubic splines \nFuzzy ARTMAP ANN                   .738        54-F2a, 128-1 \nCascade correlation ANN            .761        54-21-1 \nConjugate gradient descent ANN     .774        54-30-1 \nProbabilistic ANN                  .777        bandwidth = 16s \nBackpropagation ANN                .784        54-5-1 \n\n* The area under the curve of the receiver operating characteristic. \n\n3 RESULTS \n\nAll results are based on the independent variable sample not used for training (i.e., \nthe testing data set), and all analyses employ the same testing data set. Using \nthe PCE breast cancer data set, we can assess the accuracy of several prediction \nmodels using the most powerful of the predictor variables available in the data set \n(see Table 1). \nPrincipal components analysis is not expected to be a very accurate model; with \none scaling iteration, its accuracy is .714. 
Two types of classification and regression \ntrees (CART), pruned and shrunk, demonstrate accuracies of .753 and .762, \nrespectively. Logistic regression with cubic splines for age has an accuracy of .776. \nIn addition to the backpropagation neural network and the probabilistic neural network, \nthree types of neural networks are tested. Fuzzy ARTMAP's accuracy is \nthe poorest at .738, and it was too computationally intensive to be a practical model. \nCascade-correlation and conjugate gradient descent have the potential to do as well \nas backpropagation. The PNN accuracy is .777. 