{"title": "Confidence Intervals for the Area Under the ROC Curve", "book": "Advances in Neural Information Processing Systems", "page_first": 305, "page_last": 312, "abstract": null, "full_text": " Confidence Intervals for the Area under the\n ROC Curve\n\n\n\n Corinna Cortes Mehryar Mohri\n Google Research Courant Institute, NYU\n 1440 Broadway 719 Broadway\n New York, NY 10018 New York, NY 10003\n corinna@google.com mohri@cs.nyu.edu\n\n\n\n\n Abstract\n\n In many applications, good ranking is a highly desirable performance for\n a classifier. The criterion commonly used to measure the ranking quality\n of a classification algorithm is the area under the ROC curve (AUC). To\n report it properly, it is crucial to determine an interval of confidence for\n its value. This paper provides confidence intervals for the AUC based\n on a statistical and combinatorial analysis using only simple parameters\n such as the error rate and the number of positive and negative examples.\n The analysis is distribution-independent, it makes no assumption about\n the distribution of the scores of negative or positive examples. The results\n are of practical use and can be viewed as the equivalent for AUC of the\n standard confidence intervals given in the case of the error rate. They\n are compared with previous approaches in several standard classification\n tasks demonstrating the benefits of our analysis.\n\n\n1 Motivation\n\nIn many machine learning applications, the ranking quality of a classifier is critical. For\nexample, the ordering of the list of relevant documents returned by a search engine or\na document classification system is essential. The criterion widely used to measure the\nranking quality of a classification algorithm is the area under an ROC curve (AUC). But, to\nmeasure and report the AUC properly, it is crucial to determine an interval of confidence\nfor its value as it is customary for the error rate and other measures. 
It is also important to make the computation of the confidence interval practical by relying only on a small number of simple parameters. In the case of the error rate, such intervals are often derived from just the sample size N.

We present an extensive theoretical analysis of the AUC and show that a similar confidence interval can be derived for its value using only simple parameters such as the error rate k/N, the number of positive examples m, and the number of negative examples n = N - m. Thus, our results extend to the AUC the computation of confidence intervals from a small number of readily available parameters.

Our analysis is distribution-independent in the sense that it makes no assumption about the distribution of the scores of negative or positive examples. The use of the error rate helps determine tight confidence intervals. This contrasts with existing approaches presented in the statistical literature [11, 5, 2], which are based either on weak distribution-independent assumptions resulting in overly loose confidence intervals, or on strong distribution-dependent assumptions leading to tight but unsafe confidence intervals.

We show that our results are of practical use. We also compare them with previous approaches on several standard classification tasks, demonstrating the benefits of our analysis. Our results are also useful for testing the statistical significance of the difference between the AUC values of two classifiers.

The paper is organized as follows. We first introduce the definition of the AUC and its connection with the Wilcoxon-Mann-Whitney statistic (Section 2), and briefly review some essential aspects of the existing literature related to the computation of confidence intervals for the AUC (Section 3). Our computation of the expected value and variance of the AUC for a fixed error rate requires establishing several combinatorial identities. 
Section 4 presents some existing identities and gives the proofs of novel ones useful for the computation of the variance. Section 5 gives the reduced expressions for the expected value and variance of the AUC for a fixed error rate. These can be efficiently computed and used to determine our confidence intervals for the AUC (Section 6). Section 7 reports the results of the comparison of our method with previous approaches, including empirical results for several standard tasks.

2 Definition and Properties of the AUC

Receiver Operating Characteristic (ROC) curves were originally introduced in signal detection theory [6] in connection with the study of radio signals, and have been used since then in many other applications, in particular for medical decision-making. Over the last few years, they have found increased interest in the machine learning and data mining communities for model evaluation and selection [14, 13, 7, 12, 16, 3]. The ROC curve for a binary classification problem plots the true positive rate as a function of the false positive rate. The points of the curve are obtained by sweeping the classification threshold from the most positive classification value to the most negative. For a fully random classification, the ROC curve is a straight line connecting the origin to (1, 1). Any improvement over random classification results in an ROC curve at least partially above this straight line. The AUC is defined as the area under the ROC curve.

Consider a binary classification task with m positive examples and n negative examples. Let C be a fixed classifier that outputs a strictly ordered list for these examples. Let x_1, ..., x_m be the output of C on the positive examples and y_1, ..., y_n its output on the negative examples, and denote by 1_X the indicator function of a set X. 
Then, the AUC, A, associated with C is given by:

    A = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} 1_{x_i > y_j}}{mn}    (1)

which is the value of the Wilcoxon-Mann-Whitney statistic [10]. Thus, the AUC is closely related to the ranking quality of the classification. It can be viewed as a measure based on pairwise comparisons between classifications of the two classes. It is an estimate of the probability P_xy that the classifier ranks a randomly chosen positive example higher than a negative example. With a perfect ranking, all positive examples are ranked higher than the negative ones and A = 1. Any deviation from this ranking decreases the AUC; the expected AUC value for a random ranking is 0.5.

3 Overview of Related Work

This section briefly describes some previous distribution-dependent approaches presented in the statistical literature to derive confidence intervals for the AUC and compares them to our method. The starting point for these analyses is a formula giving the variance of the AUC, A, for a fixed distribution of the scores P_x of the positive examples and P_y of the negative examples [10, 1]:

    \sigma_A^2 = \frac{A(1 - A) + (m - 1)(P_{xxy} - A^2) + (n - 1)(P_{xyy} - A^2)}{mn}    (2)

where P_xxy is the probability that the classifier ranks two randomly chosen positive examples higher than a negative one, and P_xyy the probability that it ranks two randomly chosen negative examples lower than a positive one. To compute the variance exactly using Equation 2, the distributions P_x and P_y must be known.

Hanley and McNeil [10] argue in favor of exponential distributions, loosely claiming that this upper-bounds the variance of normal distributions with various means and ratios of variances. 
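As a concrete illustration of Eq. (1), the AUC can be evaluated directly by pairwise comparison of the scores. The sketch below is ours (the function and example scores are not from the paper) and assumes a strictly ordered list, i.e. no tied scores:

```python
def auc(pos_scores, neg_scores):
    """AUC of Eq. (1): the fraction of positive-negative score pairs
    that are correctly ordered. Assumes no ties between scores."""
    m, n = len(pos_scores), len(neg_scores)
    correct = sum(1 for x in pos_scores for y in neg_scores if x > y)
    return correct / (m * n)

# A perfect ranking gives A = 1; one misranked pair out of four gives 0.75.
print(auc([0.9, 0.8], [0.3, 0.1]))  # 1.0
print(auc([0.9, 0.2], [0.3, 0.1]))  # 0.75
```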
They show that for exponential distributions, P_xxy = A/(2 - A) and P_xyy = 2A^2/(1 + A). The resulting confidence intervals are of course relatively tight, but their validity is questionable, since they are based on a strong assumption about the distributions of the positive and negative scores that may not hold in many cases.

An alternative considered by several authors to the exact computation of the variance is to determine instead the maximum of the variance over all possible continuous distributions with the same expected value of the AUC. For all such distributions, one can fix m and n and compute the expected AUC and its variance. The maximum variance is denoted by σ²_max and is given by [5, 2]:

    \sigma^2_{max} = \frac{A(1 - A)}{\min\{m, n\}} \le \frac{1}{4\min\{m, n\}}    (3)

Unfortunately, this often yields loose confidence intervals of limited practical use.

Our approach for computing the mean and variance of the AUC is distribution-independent and inspired by the machine learning literature, where analyses typically center on the error rate. We require only that the error rate be measured, and we compute the mean and variance of the AUC over all distributions P_x and P_y that maintain the same error rate. Our approach is in line with that of [5, 2], but it crucially avoids considering the maximum of the variance. We show that it is possible to compute directly the mean and variance of the AUC assigning equal weight to all the possible distributions. Of course, one could argue that not all distributions P_x and P_y are equally probable, but since these distributions are highly problem-dependent, we find it risky to make any general assumption on the distributions and thereby limit the validity of our results. Our approach is further justified empirically by the experiments reported in the last section.

4 Combinatorial Analysis

The analysis of the statistical properties of the AUC given a fixed error rate requires various combinatorial calculations. 
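Before turning to those calculations, the two distribution-based estimates of Section 3 can be checked numerically. The following sketch is ours (the helper names are assumptions, not from the paper): it plugs the exponential-assumption values P_xxy = A/(2 - A) and P_xyy = 2A^2/(1 + A) into Eq. (2) and compares the result with the maximum-variance bound of Eq. (3):

```python
from math import sqrt

def sigma_hanley(A, m, n):
    """Std. dev. of the AUC from Eq. (2) under the exponential assumption
    of Hanley and McNeil: P_xxy = A/(2 - A), P_xyy = 2A^2/(1 + A)."""
    p_xxy = A / (2 - A)
    p_xyy = 2 * A * A / (1 + A)
    var = (A * (1 - A) + (m - 1) * (p_xxy - A * A)
           + (n - 1) * (p_xyy - A * A)) / (m * n)
    return sqrt(var)

def sigma_max(A, m, n):
    """Distribution-free maximum std. dev. of the AUC, Eq. (3)."""
    return sqrt(A * (1 - A) / min(m, n))

# The maximum-variance bound is looser than the exponential-assumption value.
print(sigma_hanley(0.85, 400, 200), sigma_max(0.85, 400, 200))
```

The gap between the two values is exactly the trade-off discussed above: the first is tight but rests on a strong distributional assumption, the second is safe but loose.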
This section describes several of the combinatorial identities that are used in our computation of the confidence intervals. For all q ≥ 0, let X_q(k, m, n) be defined by:

    X_q(k, m, n) = \sum_{x=0}^{k} \binom{M}{x} \binom{M'}{x'} x^q    (4)

where M = m - (k - x) + x, M' = n + (k - x) - x, and x' = k - x. In previous work, we derived the following two identities, which we used to compute the expected value of the AUC [4]:

    X_0(k, m, n) = \sum_{x=0}^{k} \binom{n + m + 1}{x}
    X_1(k, m, n) = \sum_{x=0}^{k} \frac{(k - x)(m - n) + k}{2} \binom{n + m + 1}{x}

To simplify the expression of the variance of the AUC, we need to compute X_2(k, m, n).

Proposition 1 Let k, m, n be non-negative integers such that k ≤ min{m, n}; then:

    X_2(k, m, n) = \sum_{x=0}^{k} P_2(k, m, n, x) \binom{m + n + 1}{x}    (5)

where P_2 is the following 4th-degree polynomial:

    P_2(k, m, n, x) = \frac{k - x}{12} \big( -2x^3 + 2x^2(2m - n + 2k - 4) + x(-3m^2 + 3nm + 3m - 5km - 2k^2 + 2 + k + nk + 6n) + (3(k - 1)m^2 - 3nm(k - 1) + 6km + 5m + k^2 m + 8n + 8 - 9nk + 3k + k^2 + k^2 n) \big)

Proof. The proof of the proposition is left to a longer version of this paper.

5 Expectation and Variance of the AUC

This section presents the expressions of the expectation and variance of the AUC for a fixed error rate k/N, assuming that all classifications or rankings with k errors are equiprobable. For a given classification, there may be x, 0 ≤ x ≤ k, false positive examples. Since the number of errors is fixed, there are x' = k - x false negative examples. The expression X_q discussed in the previous section represents the q-th moment of x over all classifications with exactly k errors. In previous work, we gave the exact expression of the expectation of the AUC for a fixed number of errors k:

Proposition 2 ([4]) Assume that a binary classification task with m positive examples and n negative examples is given. 
Then, the expected value of the AUC, A, over all classifications with k errors is given by:

    E[A] = 1 - \frac{k}{m + n} - \frac{(n - m)^2 (m + n + 1)}{4mn} \left( \frac{k}{m + n} - \frac{\sum_{x=0}^{k-1} \binom{m+n}{x}}{\sum_{x=0}^{k} \binom{m+n+1}{x}} \right)

Note that the two sums in this expression cannot be further simplified since they are known not to admit a closed form [9]. We also gave the expression of the variance of the AUC in terms of the function F defined for all Y by:

    F(Y) = \frac{\sum_{x=0}^{k} \binom{M}{x} \binom{M'}{x'} Y}{\sum_{x=0}^{k} \binom{M}{x} \binom{M'}{x'}}    (6)

The following proposition reproduces that result:

Proposition 3 ([4]) Assume that a binary classification task with m positive examples and n negative examples is given. Then, the variance of the AUC A over all classifications with k errors is given by:

    \sigma^2(A) = F\left( \left(1 - \frac{\frac{x}{n} + \frac{k-x}{m}}{2}\right)^2 \right) - F\left( 1 - \frac{\frac{x}{n} + \frac{k-x}{m}}{2} \right)^2 + F\left( \frac{m x^2 + n(k - x)^2 + (m(m+1)x + n(n+1)(k - x)) - 2x(k - x)(m + n + 1)}{12 m^2 n^2} \right)

Because of the products of binomial terms, the computation of the variance using this expression is inefficient even for relatively small values of m and n. This expression can however be reduced using the identities presented in the previous section, which leads to significantly more efficient computations that we have been using in all our experiments.

Corollary 1 ([4]) Assume that a binary classification task with m positive examples and n negative examples is given. Then, the variance of the AUC A over all classifications with k errors is given by:

    \sigma^2(A) = \frac{(m+n+1)(m+n)(m+n-1) T \big( (m+n-2) Z_4 - (2m - n + 3k - 10) Z_3 \big)}{72 m^2 n^2} + \frac{(m+n+1)(m+n) T \big( m^2 - nm + 3km - 5m + 2k^2 - nk + 12 - 9k \big) Z_2}{48 m^2 n^2} - \frac{(m+n+1)^2 (m - n)^4 Z_1^2}{16 m^2 n^2} - \frac{(m+n+1) Q_1 Z_1}{72 m^2 n^2} + \frac{k Q_0}{144 m^2 n^2}

with:

    Z_i = \frac{\sum_{x=0}^{k-i} \binom{m+n+1-i}{x}}{\sum_{x=0}^{k} \binom{m+n+1}{x}},    T = 3\big( (m - n)^2 + m + n \big) + 2, and:

    Q_0 = (m + n + 1) T k^2 + \big( (-3n^2 + 3mn + 3m + 1) T - 12(3mn + m + n) - 8 \big) k + (-3m^2 + 7m + 10n + 3nm + 10) T - 4(3mn + m + n + 1)

    Q_1 = T k^3 + 3(m - 1) T k^2 + \big( (-3n^2 + 3mn - 3m + 8) T - 6(6mn + m + n) \big) k + (-3m^2 + 7(m + n) + 3mn) T - 2(6mn + m + n)

Proof. 
The expression of the variance given in Proposition 3 requires the computation of X_q(k, m, n), q = 0, 1, 2. Using the identities giving the expressions of X_0 and X_1 and Proposition 1, which provides the expression of X_2, σ²(A) can be reduced to the expression given by the corollary.

6 Theory and Analysis

Our estimate of the confidence interval for the AUC is based on a simple and natural assumption. The main idea for its computation is the following. Assume that a confidence interval E = [e_1, e_2] is given for the error rate of a classifier C over a sample S, with the confidence level 1 - ε. This interval may have been derived from a binomial model of C, which is a standard assumption for determining a confidence interval for the error rate, or from any other model used to compute that interval. For a given error rate e ∈ E, or equivalently for a given number of misclassifications, we can use the expectation and variance computed in the previous section and Chebyshev's inequality to predict a confidence interval A_e for the AUC at the confidence level 1 - δ. Since our equiprobable model for the classifications is independent of the model used to compute the interval of confidence for the error rate, we can use E and A_e, e ∈ E, to compute a confidence interval for the AUC at the level (1 - ε)(1 - δ).

Theorem 1 Let C be a binary classifier and let S be a data sample of size N with m positive examples and n negative examples, N = m + n. Let E = [e_1, e_2] be a confidence interval for the error rate of C over S at the confidence level 1 - ε. Then, for any δ, 0 ≤ δ ≤ 1, we can compute a confidence interval for the AUC value of the classifier C at the confidence level (1 - ε)(1 - δ) that depends only on ε, δ, m, n, and the interval E.

Proof. Let k_1 = N e_1 and k_2 = N e_2 be the numbers of errors associated with the error rates e_1 and e_2, and let I_K be the interval I_K = [k_1, k_2]. 
For a fixed k ∈ I_K, by Proposition 2 and Corollary 1, we can compute the exact values of the expectation E[A_k] and variance σ²(A_k) of the AUC A_k. Using Chebyshev's inequality, for any k ∈ I_K and any θ_k > 0,

    P\big( |A_k - E[A_k]| \ge \theta_k \big) \le \frac{\sigma^2(A_k)}{\theta_k^2}    (7)

where E[A_k] and σ(A_k) are the expressions given in Proposition 2 and Corollary 1, which depend only on k, m, and n. For δ_k > 0, choosing θ_k = σ(A_k)/√δ_k makes the right-hand side of Eq. 7 equal to δ_k. Let α_1 and α_2 be defined by:

    \alpha_1 = \min_{k \in I_K} \left( E[A_k] - \frac{\sigma(A_k)}{\sqrt{\delta_k}} \right) \qquad \alpha_2 = \max_{k \in I_K} \left( E[A_k] + \frac{\sigma(A_k)}{\sqrt{\delta_k}} \right)    (8)

α_1 and α_2 depend only on I_K (i.e., on e_1 and e_2) and on k, m, and n. Let I_A be the confidence interval defined by I_A = [α_1, α_2], and let δ_k = δ for all k ∈ I_K. Using the fact that the confidence interval E is independent of our equiprobability model for fixed-k AUC values, and Bayes' rule:

    P(A \in I_A) = \sum_{k \in \mathbb{R}_+} P(A \in I_A \mid K = k) P(K = k)    (9)
    \ge \sum_{k \in I_K} P(A \in I_A \mid K = k) P(K = k)    (10)
    \ge (1 - \delta) \sum_{k \in I_K} P(K = k) \ge (1 - \delta)(1 - \epsilon)    (11)

where we used the property of Eq. 7 and the definitions of the intervals I_K and I_A. Thus, I_A constitutes a confidence interval for the AUC value of C at the confidence level (1 - ε)(1 - δ).

In practice, the confidence interval E is often determined as a result of the assumption that C follows a binomial law. This leads to the following theorem.

Figure 1: Comparison of the standard deviations for three different methods with: (a) m = n = 500; (b) m = 400 and n = 200. The curves are obtained by computing the expected AUC and its standard deviations for different values of the error rate using the maximum-variance approach (Eq. 3), our distribution-independent method, and the distribution-dependent approach of Hanley [10].

Theorem 2 Let C be a binary classifier, let S be a data sample of size N with m positive examples and n negative examples, N = m + n, and let k_0 be the number of misclassifications of C on S. Assume that C follows a binomial law; then, for any ε, 0 ≤ ε ≤ 1, we can compute a confidence interval for the AUC value of the classifier C at the confidence level 1 - ε that depends only on ε, k_0, m, and n.

Proof. Assume that C follows a binomial law with coefficient p. Then, Chebyshev's inequality yields:

    P(|C - E[C]| \ge \eta) \le \frac{p(1 - p)}{N \eta^2} \le \frac{1}{4 N \eta^2}    (12)

Thus,

    E = \left[ \frac{k_0}{N} - \frac{1}{2\sqrt{(1 - \sqrt{1 - \epsilon}) N}},\ \frac{k_0}{N} + \frac{1}{2\sqrt{(1 - \sqrt{1 - \epsilon}) N}} \right]

forms a confidence interval for the error rate of C at the confidence level √(1 - ε). By Theorem 1, we can compute for the AUC value a confidence interval at the level (1 - (1 - \sqrt{1 - \epsilon}))(1 - (1 - \sqrt{1 - \epsilon})) = 1 - ε, depending only on ε, m, n, and the interval E, i.e., on k_0, N = m + n, and ε.

For large N, we can use the normal approximation of the binomial law to determine a finer interval E. Indeed, for large N,

    P(|C - E[C]| \ge \eta) \simeq 2\Phi(2\eta\sqrt{N})    (13)

with \Phi(u) = \int_u^{\infty} \frac{1}{\sqrt{2\pi}} e^{-x^2/2} dx. Thus,

    E = \left[ \frac{k_0}{N} - \frac{\Phi^{-1}\big(\frac{1 - \sqrt{1 - \epsilon}}{2}\big)}{2\sqrt{N}},\ \frac{k_0}{N} + \frac{\Phi^{-1}\big(\frac{1 - \sqrt{1 - \epsilon}}{2}\big)}{2\sqrt{N}} \right]

is the confidence interval for the error rate at the confidence level √(1 - ε).

For simplicity, in the proof of Theorem 2, δ_k was chosen to be a constant (δ_k = δ), but, in general, it can be another function of k, leading to tighter confidence intervals. The results presented in the next section were obtained with δ_k = a_0 exp((k - k_0)^2 / (2 a_1^2)), where a_0 and a_1 are constants selected so that inequality (11) holds.

7 Experiments and Comparisons

The analysis in the previous section provides a principled method for computing a confidence interval for the AUC value of a classifier C at the confidence level 1 - ε that depends only on k, m, and n. 
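Before turning to the comparisons, the construction of Theorems 1 and 2 can be sketched end to end. The code below is our illustration, not the authors' implementation: it computes E[A_k] and σ(A_k) by direct weighted sums over the number x of false positives (the weights C(M, x)·C(M', x') of Section 4, i.e. the averages F of Proposition 3, rather than the faster reduced expressions of Corollary 1), derives the error-rate interval from the normal approximation with the conservative bound p(1 - p) ≤ 1/4, and combines the two confidence levels. All function names are ours, and the per-k Chebyshev level is held constant, as in the proof:

```python
from math import comb, sqrt
from statistics import NormalDist

def _avg(k, m, n, term):
    """Average of term(x) over classifications with k errors, x of them
    false positives, weighted by C(M, x) * C(M', k - x) (Section 4)."""
    num = den = 0.0
    for x in range(k + 1):
        big_m, big_mp = m - (k - x) + x, n + (k - x) - x
        if big_m < 0 or big_mp < 0:
            continue
        w = comb(big_m, x) * comb(big_mp, k - x)
        num += w * term(x)
        den += w
    return num / den

def auc_moments(k, m, n):
    """E[A_k] and sigma(A_k) for a fixed number of errors k (Proposition 3).
    The expectation can be cross-checked against Proposition 2's closed form."""
    z = lambda x: 1 - (x / n + (k - x) / m) / 2
    mean = _avg(k, m, n, z)
    cond_var = lambda x: (m * x**2 + n * (k - x)**2
                          + (m * (m + 1) * x + n * (n + 1) * (k - x))
                          - 2 * x * (k - x) * (m + n + 1)) / (12 * m**2 * n**2)
    var = _avg(k, m, n, lambda x: z(x) ** 2) - mean**2 + _avg(k, m, n, cond_var)
    return mean, sqrt(max(var, 0.0))

def auc_interval(k0, m, n, eps=0.05):
    """Confidence interval for the AUC at level 1 - eps, in the spirit of
    Theorem 2: an error-rate interval at level sqrt(1 - eps) combined with
    per-k Chebyshev intervals at the same level."""
    N = m + n
    lvl = sqrt(1 - eps)
    delta = 1 - lvl                      # Chebyshev failure probability per k
    z = NormalDist().inv_cdf((1 + lvl) / 2)
    half = z / (2 * sqrt(N))             # conservative binomial half-width
    k_lo = max(0, int((k0 / N - half) * N))
    k_hi = min(min(m, n), int((k0 / N + half) * N) + 1)
    lo, hi = 1.0, 0.0
    for k in range(k_lo, k_hi + 1):
        mean, sd = auc_moments(k, m, n)
        lo = min(lo, mean - sd / sqrt(delta))
        hi = max(hi, mean + sd / sqrt(delta))
    return max(lo, 0.0), min(hi, 1.0)

# Example: 10% observed error rate on a balanced 1000-example test set.
print(auc_interval(100, 500, 500))
```

On a balanced sample the (n - m)² term of Proposition 2 vanishes and E[A_k] is simply 1 - k/N, so the interval above is centered near 0.9; the Chebyshev step is what makes it conservative.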
As already discussed, other expressions found in the statistical literature lead to either too loose or unsafely narrow confidence intervals, based on questionable assumptions about the probability functions P_x and P_y [10, 15]. Figure 1 shows a comparison of the standard deviations obtained using the maximum-variance approach (Eq. 3), the distribution-dependent expression from [10], and our distribution-independent method for various error rates. For m = n = 500, our distribution-independent method consistently leads to tighter confidence intervals (Fig. 1(a)). It also leads to tighter confidence intervals for AUC values greater than 0.75 for the uneven distribution m = 400 and n = 200 (Fig. 1(b)). For lower AUC values, the distribution-dependent approach produces tighter intervals, but its underlying assumptions may not hold.

NAME          m + n   n/(m+n)   AUC    k/(m+n)   σ_indep   σ_A      σ_dep    σ_max
pima          368     0.63      0.70   0.24      0.0297    0.0440   0.0269   0.0392
yeast         700     0.67      0.63   0.26      0.0277    0.0330   0.0215   0.0317
credit        303     0.54      0.87   0.13      0.0176    0.0309   0.0202   0.0281
internet-ads  1159    0.17      0.85   0.05      0.0177    0.0161   0.0176   0.0253
page-blocks   2473    0.10      0.84   0.03      0.0164    0.0088   0.0161   0.0234
ionosphere    201     0.37      0.85   0.13      0.0271    0.0463   0.0306   0.0417

Table 1: Accuracy and AUC values for AdaBoost [8] and estimated standard deviations for several datasets from the UC Irvine repository. σ_indep is the distribution-independent standard deviation obtained using our method (Theorem 2). σ_A is given by Eq. (2) with the values of A, P_xxy, and P_xyy derived from data. σ_dep is the distribution-dependent standard deviation of Hanley [10], which is based on assumptions that may not always hold. σ_max is defined by Eq. (3). All results were obtained on a randomly selected test set of size m + n.

A different comparison was made using several datasets available from the UC Irvine repository (Table 1). 
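The σ_A column of Table 1 can be reproduced from held-out scores by estimating A, P_xxy, and P_xyy by brute-force enumeration of pairs and triples and plugging them into Eq. (2). A minimal sketch of ours (O(m²n + mn²) enumeration, so only for modest sample sizes; it requires at least two examples of each class):

```python
from itertools import combinations
from math import sqrt

def sigma_A(pos, neg):
    """Plug-in std. dev. of the AUC, Eq. (2), with A, P_xxy, and P_xyy
    estimated from the scores by brute-force enumeration."""
    m, n = len(pos), len(neg)
    A = sum(x > y for x in pos for y in neg) / (m * n)
    # P_xxy: two random positives both ranked above a random negative.
    p_xxy = sum(min(x1, x2) > y
                for x1, x2 in combinations(pos, 2)
                for y in neg) / (m * (m - 1) / 2 * n)
    # P_xyy: two random negatives both ranked below a random positive.
    p_xyy = sum(max(y1, y2) < x
                for y1, y2 in combinations(neg, 2)
                for x in pos) / (n * (n - 1) / 2 * m)
    var = (A * (1 - A) + (m - 1) * (p_xxy - A * A)
           + (n - 1) * (p_xyy - A * A)) / (m * n)
    return sqrt(max(var, 0.0))

# Perfectly separated scores give zero variance; mixed scores do not.
print(sigma_A([3, 4, 5], [1, 2]), sigma_A([3, 1, 5], [2, 4]))
```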
The table shows that our estimates of the standard deviations (σ_indep) are in general close to, or tighter than, the distribution-dependent standard deviation σ_dep of Hanley [10], despite the fact that we make no assumption about the distributions of positive and negative examples. In contrast, Hanley's method is based on specific assumptions about these distributions. Plots of the actual ranking distribution demonstrate, however, that these assumptions are often violated. Thus, the relatively good performance of Hanley's approach on several data sets can be viewed as fortuitous and is not general. Our distribution-independent method provides tight confidence intervals, in some cases tighter than those derived from σ_A, in particular because it exploits the information provided by the error rate. Our analysis can also be used to determine whether the AUC values produced by two classifiers differ significantly, by checking whether the AUC value of one falls within the confidence interval of the other.

8 Conclusion

We presented principled techniques for computing useful confidence intervals for the AUC from simple parameters: the error rate, and the negative and positive sample sizes. We demonstrated the practicality of these confidence intervals by comparing them to previous approaches in several tasks. We also derived the exact expression of the variance of the AUC for a fixed k, which can be of interest in other analyses related to the AUC.

The Wilcoxon-Mann-Whitney statistic is a general measure of the quality of a ranking that is an estimate of the probability that the classifier ranks a randomly chosen positive example higher than a negative example. One could argue that accuracy at the top or the bottom of the ranking is of higher importance. 
Contrary to some belief, however, this is already captured to a certain degree by the definition of the Wilcoxon-Mann-Whitney statistic, which penalizes errors at the top or the bottom of the ranking more heavily. It is, however, an interesting research problem to determine how to incorporate this bias more strictly, in the form of a score-specific weight in the ranking measure, a weighted Wilcoxon-Mann-Whitney statistic, and how to compute the corresponding expected value and standard deviation in a general way and design machine learning algorithms to optimize such a measure. A preliminary analysis suggests, however, that the calculation of the expectation and the variance is likely to be extremely complex in that case. Finally, it could also be interesting, but difficult, to adapt our results to the distribution-dependent case and compare them to those of [10].

Acknowledgments

We thank Rob Schapire for pointing out to us the problem of the statistical significance of the AUC, Daryl Pregibon for the reference to [11], and Saharon Rosset for various discussions about the topic of this paper.

References

[1] D. Bamber. The Area above the Ordinal Dominance Graph and the Area below the Receiver Operating Characteristic Graph. Journal of Mathematical Psychology, 12, 1975.

[2] Z. W. Birnbaum and O. M. Klose. Bounds for the Variance of the Mann-Whitney Statistic. Annals of Mathematical Statistics, 38, 1957.

[3] J.-H. Chauchat, R. Rakotomalala, M. Carloz, and C. Pelletier. Targeting Customer Groups Using Gain and Cost Matrix: a Marketing Application. Technical report, ERIC Laboratory - University of Lyon 2, 2001.

[4] Corinna Cortes and Mehryar Mohri. AUC Optimization vs. Error Rate Minimization. In Advances in Neural Information Processing Systems (NIPS 2003), volume 16, Vancouver, Canada, 2004. MIT Press.

[5] D. van Dantzig. On the Consistency and Power of Wilcoxon's Two Sample Test. 
In Koninklijke Nederlandse Akademie van Wetenschappen, Series A, volume 54, 1951.

[6] J. P. Egan. Signal Detection Theory and ROC Analysis. Academic Press, 1975.

[7] C. Ferri, P. Flach, and J. Hernandez-Orallo. Learning Decision Trees Using the Area Under the ROC Curve. In Proceedings of the 19th International Conference on Machine Learning. Morgan Kaufmann, 2002.

[8] Yoav Freund and Robert E. Schapire. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. In Proceedings of the Second European Conference on Computational Learning Theory, volume 2, 1995.

[9] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics. Addison-Wesley, Reading, Massachusetts, 1994.

[10] J. A. Hanley and B. J. McNeil. The Meaning and Use of the Area under a Receiver Operating Characteristic (ROC) Curve. Radiology, 1982.

[11] E. L. Lehmann. Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, San Francisco, California, 1975.

[12] M. C. Mozer, R. Dodier, M. D. Colagrosso, C. Guerra-Salcedo, and R. Wolniewicz. Prodding the ROC Curve: Constrained Optimization of Classifier Performance. In Advances in Neural Information Processing Systems (NIPS 2002). MIT Press, 2002.

[13] C. Perlich, F. Provost, and J. Simonoff. Tree Induction vs. Logistic Regression: A Learning Curve Analysis. Journal of Machine Learning Research, 2003.

[14] F. Provost and T. Fawcett. Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining. AAAI, 1997.

[15] Saharon Rosset. Ranking-Methods for Flexible Evaluation and Efficient Comparison of 2-Class Models. Master's thesis, Tel-Aviv University, 1999.

[16] L. Yan, R. Dodier, M. C. Mozer, and R. Wolniewicz. Optimizing Classifier Performance via the Wilcoxon-Mann-Whitney Statistic. 
In Proceedings of the International Conference on Machine Learning, 2003.
", "award": [], "sourceid": 2645, "authors": [{"given_name": "Corinna", "family_name": "Cortes", "institution": null}, {"given_name": "Mehryar", "family_name": "Mohri", "institution": null}]}