{"title": "Comparing the prediction accuracy of artificial neural networks and other statistical models for breast cancer survival", "book": "Advances in Neural Information Processing Systems", "page_first": 1063, "page_last": 1067, "abstract": "", "full_text": "Comparing the prediction accuracy of artificial neural networks and other statistical models for breast cancer survival\n\nHarry B. Burke\nDepartment of Medicine\nNew York Medical College\nValhalla, NY 10595\n\nDavid B. Rosen\nDepartment of Medicine\nNew York Medical College\nValhalla, NY 10595\n\nPhilip H. Goodman\nDepartment of Medicine\nUniversity of Nevada School of Medicine\nReno, Nevada 89520\n\nAbstract\n\nThe TNM staging system has been used since the early 1960's to predict breast cancer patient outcome. In an attempt to increase prognostic accuracy, many putative prognostic factors have been identified. Because the TNM stage model cannot accommodate these new factors, the proliferation of factors in breast cancer has led to clinical confusion. What is required is a new computerized prognostic system that can test putative prognostic factors and integrate the predictive factors with the TNM variables in order to increase prognostic accuracy. Using the area under the curve of the receiver operating characteristic, we compare the accuracy of the following predictive models in terms of five-year breast cancer-specific survival: pTNM staging system, principal component analysis, classification and regression trees, logistic regression, cascade correlation neural network, conjugate gradient descent neural network, probabilistic neural network, and backpropagation neural network. Several statistical models are significantly more accurate than the TNM staging system. Logistic regression and the backpropagation neural network are the most accurate prediction models for predicting five-year breast cancer-specific survival.\n\n1  INTRODUCTION\n\nFor over thirty years, measuring cancer outcome has been based on the TNM staging system (tumor size, number of lymph nodes with metastatic disease, and distant metastases) (Beahrs et al., 1992). There are several problems with this model (Burke and Henson, 1993). First, it is not very accurate; for breast cancer it is 44% accurate. Second, its accuracy cannot be improved because predictive variables cannot be added to the model. Third, it does not apply to all cancers. In this paper we compare computerized prediction models to determine if they can improve prognostic accuracy.\n\nArtificial neural networks (ANN) are a class of nonlinear regression and discrimination models. ANNs are being used in many areas of medicine, with several hundred articles published in the last year. Representative areas of research include anesthesiology (Westenskow et al., 1992), radiology (Tourassi et al., 1993), cardiology (Leong and Jabri, 1982), psychiatry (Palombo, 1992), and neurology (Gabor and Seyal, 1992). ANNs are being used in cancer research, including image processing (Goldberg et al., 1992), analysis of laboratory data for breast cancer diagnosis (O'Leary et al., 1992), and the discovery of chemotherapeutic agents (Weinstein et al., 1992).\n\nIt should be pointed out that the analyses in this paper rely upon previously collected prognostic factors. These factors were selected for collection because they were significant in a generalized linear model such as the linear or logistic models. There is no predictive model that can improve upon linear or logistic prediction models when the predictor variables meet the assumptions of these models and there are no interactions. Therefore, the objective of this paper is not to outperform linear or logistic models on these data. Rather, our objective is to show that, with variables selected by generalized linear models, artificial neural networks can perform as well as the best traditional models. There is no a priori reason to believe that future prognostic factors will be binary or linear, and that there will not be complex interactions between prognostic factors. A further objective of this paper is to demonstrate that artificial neural networks are likely to outperform the conventional models when there are unanticipated nonmonotonic factors or complex interactions.\n\n2  METHODS\n\n2.1  DATA\n\nThe Patient Care Evaluation (PCE) data set is collected by the Commission on Cancer of the American College of Surgeons (ACS). The ACS, in October 1992, requested cancer information from hospital tumor registries in the United States. The ACS asked for the first 25 cases of breast cancer seen at each institution in 1983, and it asked for follow-up information on each of these 25 patients through the date of the request. These are only cases of first breast cancer. Follow-up information included known deaths. The PCE data set contains, at best, eight-year follow-up. We chose to use a five-year survival end-point. This analysis is for death due to breast cancer, not all-cause mortality.
\n\nFor this analysis, cases with missing data and cases censored before five years are not included, so that the prediction models can be compared without putting any prediction model at a disadvantage. We randomly divided the data set into training, hold-out, and testing subsets of 3,100, 2,069, and 3,102 cases, respectively.\n\n2.2  MODELS\n\nThe TNM stage model used in this analysis is the pathologic model (pTNM) based on the 1992 American Joint Committee on Cancer's Manual for the Staging of Cancer (Beahrs et al., 1992). The pathologic model relies upon pathologically determined tumor size and lymph nodes; this contrasts with clinical staging, which relies upon the clinical examination to provide tumor size and lymph node information. To determine the overall accuracy of the TNM stage model, we compared the model's prediction for each patient, where the individual patient's prediction is the fraction of all the patients in that stage who survive, to each patient's true outcome.\n\nPrincipal components analysis is a data reduction technique based on the linear combinations of predictor variables that maximize the variance across patients (Jolliffe, 1986). The logistic regression analysis is performed in a stepwise manner, without interaction terms, using the statistical language S-PLUS (S-PLUS, 1992), with the continuous variable age modeled with a restricted cubic spline to avoid assuming linearity (Harrell et al., 1988). Two types of Classification and Regression Tree (CART) (Breiman et al., 1984) analyses are performed using S-PLUS. The first is a 9-node pruned tree (with 10-fold cross validation on the deviance), and the second is a shrunk tree with 13.7 effective nodes.
\n\nThe multilayer perceptron neural network training in this paper is based on the maximum likelihood function unless otherwise stated, and backpropagation refers to gradient descent. Two neural networks that are not multilayer perceptrons are tested. They are the Fuzzy ARTMAP neural network (Carpenter et al., 1991) and the probabilistic neural network (Specht, 1990).\n\n2.3  ACCURACY\n\nThe measure of comparative accuracy is the area under the curve of the receiver operating characteristic (Az). Generally, the Az is a nonparametric measure of discrimination. Squared error summarizes how close each patient's predicted value is to its true outcome. By contrast, the Az measures the relative goodness of the set of predictions as a whole by comparing the predicted probability of each patient with that of all other patients. The computational approach to the Az that employs the trapezoidal approximation to the area under the receiver operating characteristic curve for binary outcomes was first reported by Bamber (Bamber, 1975), and later in the medical literature by Hanley (Hanley and McNeil, 1982). This was extended by Harrell (Harrell et al., 1988) to continuous outcomes.\n\nTable 1: PCE 1983 Breast Cancer Data: 5 Year Survival Prediction, 54 Variables.\n\nPREDICTION MODEL                ACCURACY*  SPECIFICATIONS\npTNM Stages                     .720       0, I, IIA, IIB, IIIA, IIIB, IV\nPrincipal Components Analysis   .714       one scaling iteration\nCART, pruned                    .753       9 nodes\nCART, shrunk                    .762       13.7 nodes\nStepwise logistic regression    .776       with cubic splines\nFuzzy ARTMAP ANN                .738       54-F2a, 128-1\nCascade correlation ANN         .761       54-21-1\nConjugate gradient descent ANN  .774       54-30-1\nProbabilistic ANN               .777       bandwidth = 16s\nBackpropagation ANN             .784       54-5-1\n\n* The area under the curve of the receiver operating characteristic.\n\n3  RESULTS\n\nAll results are based on the independent sample not used for training (i.e., the testing data set), and all analyses employ the same testing data set. Using the PCE breast cancer data set, we can assess the accuracy of several prediction models using the most powerful of the predictor variables available in the data set (see Table 1).\n\nPrincipal components analysis is not expected to be a very accurate model; with one scaling iteration, its accuracy is .714. Two types of classification and regression trees (CART), pruned and shrunk, demonstrate accuracies of .753 and .762, respectively. Logistic regression with cubic splines for age has an accuracy of .776. In addition to the backpropagation neural network and the probabilistic neural network (PNN), three other types of neural networks are tested. Fuzzy ARTMAP's accuracy is the poorest of the neural networks at .738, and it was too computationally intensive to be a practical model. Cascade correlation and conjugate gradient descent, with accuracies of .761 and .774, respectively, have the potential to do as well as backpropagation. The PNN accuracy is .777. The PNN has many interesting features, but it also has several drawbacks, including its storage requirements. The backpropagation neural network's accuracy is .784.\n\n4  DISCUSSION\n\nFor predicting five-year breast cancer-specific survival, several computerized prediction models are more accurate than the TNM stage system, and artificial neural networks are as good as the best traditional statistical models.\n\nReferences\n\nBamber D (1975). The area above the ordinal dominance graph and the area below the receiver operating characteristic. J Math Psych 12:387-415.\n\nBeahrs OH, Henson DE, Hutter RVP, Kennedy BJ (1992). Manual for staging of cancer, 4th ed. Philadelphia: JB Lippincott.\n\nBurke HB, Henson DE (1993). Criteria for prognostic factors and for an enhanced prognostic system. Cancer 72:3131-5.\n\nBreiman L, Friedman JH, Olshen RA (1984). Classification and Regression Trees. Pacific Grove, CA: Wadsworth and Brooks/Cole.\n\nCarpenter GA, Grossberg S, Rosen DB (1991). Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system. Neural Networks 4:759-771.\n\nGabor AJ, Seyal M (1992). Automated interictal EEG spike detection using artificial neural networks. Electroencephalogr Clin Neurophysiology 83:271-80.\n\nGoldberg V, Manduca A, Ewert DL (1992). Improvement in specificity of ultrasonography for diagnosis of breast tumors by means of artificial intelligence. Med Phys 19:1275-81.\n\nHanley JA, McNeil BJ (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29-36.\n\nHarrell FE, Lee KL, Pollock BG (1988). Regression models in clinical studies: determining relationships between predictors and response. J Natl Cancer Inst 80:1198-1202.\n\nJolliffe IT (1986). Principal Component Analysis. New York: Springer-Verlag.\n\nLeong PH, Jabri MA (1982). MATIC - an intracardiac tachycardia classification system. PACE 15:1317-31.\n\nO'Leary TJ, Mikel UV, Becker RL (1992). Computer-assisted image interpretation: use of a neural network to differentiate tubular carcinoma from sclerosing adenosis. Modern Pathol 5:402-5.
\n\nPalombo SR (1992). Connectivity and condensation in dreaming. J Am Psychoanal Assoc 40:1139-59.\n\nS-PLUS (1991), v 3.0. Seattle, WA: Statistical Sciences, Inc.\n\nSpecht DF (1990). Probabilistic neural networks. Neural Networks 3:109-18.\n\nTourassi GD, Floyd CE, Sostman HD, Coleman RE (1993). Acute pulmonary embolism: artificial neural network approach for diagnosis. Radiology 189:555-58.\n\nWeinstein JN, Kohn KW, Grever MR, et al. (1992). Neural computing in cancer drug development: predicting mechanism of action. Science 258:447-51.\n\nWestenskow DR, Orr JA, Simon FH (1992). Intelligent alarms reduce anesthesiologist's response time to critical faults. Anesthesiology 77:1074-9.", "award": [], "sourceid": 988, "authors": [{"given_name": "Harry", "family_name": "Burke", "institution": null}, {"given_name": "David", "family_name": "Rosen", "institution": null}, {"given_name": "Philip", "family_name": "Goodman", "institution": null}]}