{"title": "History distribution matching method for predicting effectiveness of HIV combination therapies", "book": "Advances in Neural Information Processing Systems", "page_first": 424, "page_last": 432, "abstract": "This paper presents an approach that predicts the effectiveness of HIV combination therapies by simultaneously addressing several problems affecting the available HIV clinical data sets: the different treatment backgrounds of the samples, the uneven representation of the levels of therapy experience, the missing treatment history information, the uneven therapy representation and the unbalanced therapy outcome representation. The computational validation on clinical data shows that, compared to the most commonly used approach that does not account for the issues mentioned above, our model has significantly higher predictive power. This is especially true for samples stemming from patients with longer treatment history and samples associated with rare therapies. Furthermore, our approach is at least as powerful for the remaining samples.", "full_text": "History distribution matching method for predicting\n\neffectiveness of HIV combination therapies\n\nJasmina Bogojeska\n\nMax-Planck Institute for Computer Science\n\nCampus E1 4\n\n66123 Saarbr\u00a8ucken, Germany\n\njasmina@mpi-inf.mpg.de\n\nAbstract\n\nThis paper presents an approach that predicts the effectiveness of HIV combina-\ntion therapies by simultaneously addressing several problems affecting the avail-\nable HIV clinical data sets: the different treatment backgrounds of the samples, the\nuneven representation of the levels of therapy experience, the missing treatment\nhistory information, the uneven therapy representation and the unbalanced ther-\napy outcome representation. 
The computational validation on clinical data shows that, compared to the most commonly used approach that does not account for the issues mentioned above, our model has significantly higher predictive power. This is especially true for samples stemming from patients with longer treatment history and samples associated with rare therapies. Furthermore, our approach is at least as powerful for the remaining samples.

1 Introduction

According to [18], more than 33 million people worldwide are infected with the human immunodeficiency virus (HIV), for which there exists no cure. HIV patients are treated by administration of combinations of antiretroviral drugs, which succeed in suppressing the virus much longer than the monotherapies based on a single drug. Eventually, the drug combinations also become ineffective and need to be replaced. On such occasions, the very large number of potential therapy combinations makes the manual search for an effective therapy increasingly impractical. The search is particularly challenging for patients in the mid to late stages of antiretroviral therapy because of the drug resistance accumulated over all previous therapies. The availability of large clinical data sets enables the development of statistical methods that offer an automated procedure for predicting the outcome of potential antiretroviral therapies. An estimate of the therapy outcome can assist physicians in choosing a successful regimen for an HIV patient.

However, the HIV clinical data sets suffer from several problems. First, the clinical data comprise therapy samples that originate from patients with different treatment backgrounds, and the various levels of therapy experience, ranging from therapy-naïve to heavily pretreated, are represented with different sample abundances. Second, the samples on different combination therapies have widely differing frequencies.
In particular, many therapies are represented with only very few data points. Third, the clinical data do not necessarily contain complete information on all administered HIV therapies, and for many patients it is not even known whether the recorded treatment history is complete. Finally, the imbalance between the effective and the ineffective therapies is increasing over time: owing to the knowledge acquired from HIV research and clinical practice, the quality of treating HIV patients has increased considerably in recent years, rendering the proportion of effective therapies in recently collected data samples much larger than that of ineffective ones. These four problems introduce bias into the data sets, which might negatively affect the usefulness of the derived statistical models.

In this paper we present an approach that addresses all of these problems simultaneously. To tackle the issues of the uneven therapy representation and the different treatment backgrounds of the samples, we use information on both the current therapy and the patient's treatment history. Additionally, our method uses a distribution matching approach to account for the problems of missing information in the treatment history and the growing gap between the abundances of effective and ineffective HIV therapies over time. The performance of our history distribution matching approach is assessed by comparing it with common reference methods in the so-called time-oriented validation scenario, where all models are trained on data from the more distant past, while their performance is assessed on data from the more recent past. In this way we account for the evolving trends in composing drug combination therapies for treating HIV patients.

Related work.
Various statistical learning methods, including arti\ufb01cial neural networks, decision\ntrees, random forests, support vector machines (SVMs) and logistic regression [19, 11, 14, 10, 16,\n1, 15], have been used to predict the effectiveness of HIV combination therapies from clinical data.\nNone of these methods considers the problems affecting the available clinical data sets: different\ntreatment backgrounds of the samples, uneven representations of therapies and therapy outcomes,\nand incomplete treatment history information. Some approaches [2, 4] deal with the uneven therapy\nrepresentation by training a separate model for each combination therapy on all available samples\nwith properly derived sample weights. The weights re\ufb02ect the similarities between the target therapy\nand all training therapies. However, the therapy-speci\ufb01c approaches do not address the bias orig-\ninating from the different treatment backgrounds of the samples, or the missing treatment history\ninformation.\n\n2 Problem setting\n\nLet z denote a therapy sample that comprises the viral genotype g represented as a binary vector in-\ndicating the occurrence of a set of resistance-relevant mutations, the therapy combination z encoded\nas a binary vector that indicates the individual drugs comprising the current therapy, the binary vec-\ntor h representing the drugs administered in all known previous therapies, and the label y indicating\nthe success (1) or failure (\u22121) of the therapy z. Let D = {(g1, z1, h1, y1), . . . , (gm, zm, hm, ym)}\ndenote the training set and let s refer to the therapy sample of interest. Let start(s) refer to the point\nof time when the therapy s was started and patient(s) refer to the patient identi\ufb01er corresponding\nto the therapy sample s. Then:\n\nr(s) = {z | (start(z) \u2264 start(s)) and (patient(z) = patient(s))}\n\ndenotes the complete treatment data associated with the therapy sample s and will be referred to as\ntherapy sequence. 
It contains all known therapies administered to patient(s) not later than start(s), ordered by their corresponding starting times. We point out that each therapy sequence also contains the current therapy, i.e., the most recent therapy in the therapy sequence r(s) is s. Our goal is to train a model f(g, s, h) that addresses the different types of bias associated with the available clinical data sets when predicting the outcome of the therapy s. In the rest of the paper we denote the set of input features (g, s, h) by x.

3 History distribution matching method

The main idea behind the history distribution matching method we present in this paper is that the predictions for a given patient should originate from a model trained using samples from patients with treatment backgrounds similar to the one of the target patient. The details of this method are summarized in Algorithm 1. In what follows, we explain each step of this algorithm.

3.1 Clustering based on similarities of therapy sequences

Clustering partitions a set of objects into clusters, such that the objects within each cluster are more similar to one another than to the objects assigned to a different cluster [7]. In the first step of Algorithm 1, all available training samples are clustered based on the pairwise dissimilarity of their corresponding therapy sequences. In the following, we first describe a similarity measure for therapy sequences and then present the details of the clustering.

Algorithm 1: History distribution matching method

1. Cluster the training samples by using the pairwise dissimilarities of their corresponding therapy sequences.

2.
For each (target) cluster:

• Compute sample weights that match the distribution of all available training samples to the distribution of samples in the target cluster.
• Train a sample-weighted logistic regression model using the sample weights computed in the previous distribution matching step.

Similarity of therapy sequences. In order to quantify the pairwise similarity of therapy sequences we use a slightly modified version of the alignment similarity measure introduced in [5]. It adapts sequence alignment techniques [13] to the problem of aligning therapy sequences by considering the specific therapies given to a patient, their respective resistance-relevant mutations, the order in which they were applied and the length of the therapy history. The alphabet used for the therapy sequence alignment comprises all distinct drug combinations occurring in the clinical data set. The pairwise similarities between the different drug combinations are quantified with the resistance mutations kernel [5], which uses the table of resistance-associated mutations of each drug provided by the International AIDS Society [8]. First, binary vectors indicating the resistance-relevant mutations for the set of drugs occurring in a combination are calculated for each therapy. Then, the similarity score of two therapies of interest is computed as the normalized inner product of their corresponding resistance mutation vectors. In this way, the therapy similarity also accounts for the similarity of the genetic fingerprint of the potential latent virus populations of the compared therapies. Each therapy sequence ends with the current (most recent) therapy – the one that determines the label of the sample – and the sequence alignment is adapted such that the most recent therapies are always matched. Therefore, it also accounts for the problem of uneven representation of the different therapies in the clinical data.
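As a concrete illustration of the resistance mutations kernel, the sketch below computes the normalized inner product of binary mutation-indicator vectors. The drug-to-mutation table here is a tiny hypothetical stand-in for the IAS table used in the paper:

```python
from math import sqrt

# Hypothetical toy lookup (drug -> resistance-relevant mutations);
# the paper uses the IAS table of resistance-associated mutations instead.
DRUG_MUTATIONS = {
    "AZT": {"M41L", "K70R", "T215Y"},
    "3TC": {"M184V"},
    "EFV": {"K103N", "Y181C"},
}
ALL_MUTATIONS = sorted(set().union(*DRUG_MUTATIONS.values()))


def mutation_vector(therapy):
    """Binary indicator over all resistance-relevant mutations for the
    set of drugs comprising a combination therapy."""
    muts = set().union(*(DRUG_MUTATIONS[d] for d in therapy))
    return [1.0 if m in muts else 0.0 for m in ALL_MUTATIONS]


def resistance_kernel(therapy_a, therapy_b):
    """Normalized inner product of the two resistance mutation vectors."""
    va, vb = mutation_vector(therapy_a), mutation_vector(therapy_b)
    dot = sum(a * b for a, b in zip(va, vb))
    norm = sqrt(sum(a * a for a in va)) * sqrt(sum(b * b for b in vb))
    return dot / norm
```

The normalization by the vector norms keeps every score in [0, 1], mirroring the normalized inner product described above.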
The alignment similarity measure has one parameter, which specifies the linear gap cost penalty.

For the history distribution matching method, we modified the alignment similarity kernel described in the paragraph above such that it also takes the importance of the different resistance-relevant mutations into account. This is achieved by updating the resistance mutations kernel: instead of using binary vectors that indicate the occurrence of a set of resistance-relevant mutations, we use vectors that indicate their importance. If two or more drugs from a certain drug group comprising a target therapy share a resistance mutation, then we consider its maximum importance score. Importance scores for the resistance-relevant mutations are derived from in-vivo experiments and can be obtained from the Stanford University HIV Drug Resistance Database [12]. Furthermore, we want to keep the cluster similarity measure parameter-free, such that in the process of model selection the clustering step (Step 1 in Algorithm 1) is decoupled from Step 2 and is computed only once. This is achieved by computing the alignments with zero gap costs and ensures a time-efficient model selection procedure. However, in this case only the similarities of the matched therapies comprising the two compared therapy sequences contribute to the similarity score, and thus the differing lengths of the therapy sequences are not accounted for. Having a clustering similarity measure that addresses the differing sequence lengths is important for tackling the uneven sample representation with respect to the level of therapy experience. In order to achieve this we normalize each pairwise similarity score by the length of the longer therapy sequence. This yields pairwise similarity values in the interval [0, 1], which can easily be converted to dissimilarity values in the same range by subtracting them from 1.

Clustering.
Once we have a measure of dissimilarity of therapy sequences, we cluster our data using the most popular version of K-medoids clustering [7], referred to as partitioning around medoids (PAM) [9]. The main reason why we choose this approach instead of the simpler K-means clustering [7] is that it can use any precomputed dissimilarity matrix. We select the number of clusters with the silhouette validation technique [17], which uses the so-called silhouette value to assess the quality of the clustering and select the optimal number of clusters.

3.2 Cluster distribution matching

The clustering step of our method groups the training data into different bins based on their therapy sequences. However, the complete treatment history is not necessarily available for all patients in our clinical data set. Therefore, by restricting the prediction model for a target sample only to the data from its corresponding cluster, the model might ignore relevant information from the other clusters. The approach we use to deal with this issue is inspired by the multi-task learning with distribution matching method introduced in [2].

In our current problem setting, the goal is to train a prediction model fc : x → y for each cluster c of similar treatment sequences, where x denotes the input features and y denotes the label. The straightforward approach to achieve this is to train a prediction model by using only the samples in cluster c. However, since the available treatment history for some samples might be incomplete, entirely excluding the samples from all other clusters (≠ c) discards relevant information for the model fc. Furthermore, the cluster-specific tasks are related, and the samples from the other clusters – especially those close to the boundaries of cluster c – also carry valuable information for the model fc.
Therefore, we use a multi-task learning approach where a separate model is trained for each cluster by not only using the training samples from the target cluster, but also the available training samples from the remaining clusters with appropriate sample-specific weights. These weights are computed by matching the distribution of all samples to the distribution of the samples of the target cluster, and they thereby reflect the relevance of each sample for the target cluster. In this way, the model for the target cluster uses information from the input features to extract relevant knowledge from the other clusters.

More formally, let D = {(x_1, y_1, c_1), ..., (x_m, y_m, c_m)} denote the training data, where c_i denotes the cluster associated with the training sample (x_i, y_i) in the history-based clustering. The training data are governed by the joint training distribution Σ_c p(c)p(x, y|c). The most accurate model for a given target cluster t minimizes the loss with respect to the conditional probability p(x, y|t), referred to as the target distribution. In [2] it is shown that:

E_{(x,y)~p(x,y|t)}[ℓ(f_t(x))] = E_{(x,y)~Σ_c p(c)p(x,y|c)}[r_t(x, y) ℓ(f_t(x))],    (1)

where:

r_t(x, y) = p(x, y|t) / Σ_c p(c)p(x, y|c).    (2)

In other words, by using sample-specific weights r_t(x, y) that match the training distribution Σ_c p(c)p(x, y|c) to the target distribution p(x, y|t), we can minimize the expected loss with respect to the target distribution by minimizing the expected loss with respect to the training distribution. The weighted training data are governed by the correct target distribution p(x, y|t), and the sample weights reflect the relevance of each training sample for the target model. The weights are derived based on information from the input features. If a sample was assigned to the wrong cluster due to the incompleteness of the treatment history, by matching the training to the target distribution it can still receive a high sample weight for the model of its correct cluster.

In order to avoid the estimation of the high-dimensional densities p(x, y|t) and p(x, y|c) in Equation 2, we follow the example of [3, 2] and compute the sample weights r_t(x, y) using a discriminative model for a conditional distribution with a single variable:

r_t(x, y) = p(t|x, y) / p(t),    (3)

where p(t|x, y) quantifies the probability that a sample (x, y) randomly drawn from the training set D belongs to the target cluster t, and p(t) is the prior probability, which can easily be estimated from the training data.

As in [2], p(t|x, y) is modeled for all clusters jointly using a kernelized version of multi-class logistic regression with a feature mapping that separates the effective from the ineffective therapies:

Φ(x, y) = [δ(y, +1)x ; δ(y, −1)x],    (4)

where the two blocks are stacked and δ is the Kronecker delta (δ(a, b) = 1 if a = b, and δ(a, b) = 0 if a ≠ b). In this way, we can train the cluster-discriminative models for the effective and the ineffective therapies independently, and thus, by proper time-oriented model selection, address the increasing imbalance in their representation over time.
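As a rough sketch of this weighting scheme (Equations 3 and 4) under simplifying assumptions — a plain, non-kernelized multinomial logistic model trained by gradient descent, and tiny made-up feature vectors — the computation might look like:

```python
import math


def phi(x, y):
    # Feature map of Eq. (4): x fills the "success" half for y = +1
    # and the "failure" half for y = -1.
    zeros = [0.0] * len(x)
    return (list(x) + zeros) if y == 1 else (zeros + list(x))


def fit_cluster_classifier(feats, clusters, n_clusters,
                           lr=0.5, steps=2000, l2=0.01):
    """Multinomial logistic regression p(cluster | phi(x, y)), trained by
    gradient descent with an L2 penalty (the Gaussian prior on v)."""
    d, n = len(feats[0]), len(feats)
    V = [[0.0] * d for _ in range(n_clusters)]
    for _ in range(steps):
        grads = [[l2 * v for v in row] for row in V]
        for f, c in zip(feats, clusters):
            scores = [sum(v * fi for v, fi in zip(row, f)) for row in V]
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]
            Z = sum(exps)
            for k in range(n_clusters):
                err = exps[k] / Z - (1.0 if k == c else 0.0)
                for j in range(d):
                    grads[k][j] += err * f[j] / n
        for k in range(n_clusters):
            for j in range(d):
                V[k][j] -= lr * grads[k][j]
    return V


def matching_weights(feats, clusters, V, target):
    """r_t(x, y) = p(t | x, y) / p(t), Eq. (3)."""
    prior = sum(1 for c in clusters if c == target) / len(clusters)
    weights = []
    for f in feats:
        scores = [sum(v * fi for v, fi in zip(row, f)) for row in V]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        weights.append(exps[target] / sum(exps) / prior)
    return weights
```

For a cleanly separable toy data set, samples from the target cluster receive weights above those from the other clusters, while the weights average to roughly one over the whole training set.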
Formally, the multi-class model is trained by maximizing the log-likelihood over the training data using a Gaussian prior on the model parameters:

argmax_v Σ_{(x_i, y_i, c_i) ∈ D} log p(c_i | x_i, y_i, v) − v^T Σ^{-1} v,

where v are the model parameters (a concatenation of the cluster-specific parameters v_c), and Σ is the covariance matrix of the Gaussian prior.

3.3 Sample-weighted logistic regression method

As described in the previous subsection, we use a multi-task distribution matching procedure to obtain sample-specific weights for each cluster, which reflect the relevance of each sample for the corresponding cluster. Then, a separate logistic regression model that uses all available training data with the proper sample weights is trained for each cluster. More formally, let t denote the target cluster and let r_t(x, y) denote the weight of the sample (x, y) for the cluster t. Then, the prediction model for the cluster t that minimizes the loss over the weighted training samples is given by:

argmin_{w_t} (1/|D|) Σ_{(x_i, y_i) ∈ D} r_t(x_i, y_i)^γ · ℓ(f_t(x_i), y_i) + σ w_t^T w_t,    (5)

where w_t are the model parameters, σ is the regularization parameter, γ is a smoothing parameter for the sample-specific weights, and ℓ(f(x, w_t), y) = ln(1 + exp(−y w_t^T x)) is the loss of linear logistic regression.

All in all, our method first clusters the training data based on their corresponding therapy sequences and then learns a separate model for each cluster by using relevant data from the remaining clusters. By doing so it tackles the problems of the different treatment backgrounds of the samples and the uneven sample representation in the clinical data sets with respect to the level of therapy experience. Since the alignment kernel considers the most recent therapy and the drugs comprising this therapy are encoded as a part of the input feature
space, our method also deals with the differing therapy\nabundances in the clinical data sets. Once we have the models for each cluster, we use them to\npredict the label of a given test sample x as follows: First of all, we use the therapy sequence of the\ntarget sample to calculate its dissimilarity to the therapy sequences of each of the cluster centers.\nThen, we assign the sample x to the cluster c with the closest cluster center. Finally, we use the\nlogistic regression model trained for cluster c to predict the label y for the target sample x.\n\n4 Experiments and results\n\n4.1 Data\n\nThe clinical data for our model are extracted from the EuResist [16] database that contains informa-\ntion on 93014 antiretroviral therapies administered to 18325 HIV (subtype B) patients from several\ncountries in the period from 1988 to 2008. The information employed by our model is extracted\nfrom these data: the viral sequence g assigned to each therapy sample is obtained shortly before\nthe respective therapy was started (up to 90 days before); the individual drugs of the currently ad-\nministered therapy z; all available (known) therapies administered to each patient h, r(z); and the\nresponse to a given therapy quanti\ufb01ed with a label y (success or failure) based on the virus load val-\nues (copies of viral RNA per ml blood plasma) measured during its course (for more details see [4]\nand the Supplementary material). Finally, our training set comprises 6537 labeled therapy samples\nfrom 690 distinct therapy combinations.\n\n4.2 Validation setting\n\nTime-oriented validation scenario. The trends of treating HIV patients change over time as a\nresult of the gathered practical experience with the drugs and the introduction of new antiretroviral\ndrugs. In order to account for this phenomenon we use the time-oriented validation scenario [4]\nwhich makes a time-oriented split when selecting the training and the test set. 
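Assuming each sample carries its therapy start date, the time-oriented selection of training, tuning, and test sets can be sketched as below, following the 20%/25% cuts used in this section; the function name and signature are illustrative:

```python
def time_oriented_split(samples, start_date, test_frac=0.2, tune_frac=0.25):
    """Order samples by therapy start date and cut off the most recent
    fractions: test_frac of everything for the test set, then tune_frac
    of the remaining (older) samples for the tuning set."""
    ordered = sorted(samples, key=start_date)
    n_test = round(len(ordered) * test_frac)
    rest, test = ordered[:len(ordered) - n_test], ordered[len(ordered) - n_test:]
    n_tune = round(len(rest) * tune_frac)
    train, tune = rest[:len(rest) - n_tune], rest[len(rest) - n_tune:]
    return train, tune, test
```

Because the cuts are taken from the sorted tail, every tuning sample is more recent than every training sample, and every test sample is more recent than both.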
First, we order all available samples by their corresponding therapy starting dates. We then make a time-oriented split by selecting the most recent 20% of the samples as the test set and the rest as the training set. For model selection we split the training set further in a similar manner: we take the most recent 25% of the training set for selecting the best model parameters (see Supplementary material) and refer to this set as the tuning set. In this way, our models are trained on the data from the more distant past, while their performance is measured on the data from the more recent past. This scenario is more realistic than other scenarios since it captures how a given model would perform on the recent trends of combining the drugs. The details of the data sets resulting from this scenario are given in Table 1, where one can also observe the large gap between the abundances of the effective and ineffective therapies, especially for the most recent data.

Table 1: Details on the data sets generated in the time-oriented validation scenario.

Data set       training   tuning   test
Sample count   3596       1634     1307
Success rate   69%        79%      83%

The search for an effective HIV therapy is particularly challenging for patients in the mid to late stages of antiretroviral therapy, when the number of therapy options is reduced and effective therapies are increasingly hard to find because of the drug resistance mutations accumulated over all previous therapies. The therapy samples gathered in the HIV clinical data sets are associated with patients whose treatment histories differ in length: while some patients receive their first antiretroviral treatment, others are heavily pretreated.
These different sample groups, from treatment-naïve to heavily pretreated, are represented unevenly in the HIV clinical data, with fewer samples associated with therapy-experienced patients (see Figure 1 (a) in the Supplementary material). In order to assess the ability of a given target model to address this problem, we group the therapy samples in the test set into different bins based on the number of therapies administered prior to the therapy of interest – the current therapy (see Table 1 in the Supplementary material). Then, we assess the quality of a given target model by reporting its performance for each of the bins. In this way we can assess the predictive power of the models in dependence on the level of therapy experience.

Another important property of an HIV model is its ability to address the uneven representation of the different therapies (see Figure 1 (b) in the Supplementary material). In order to assess this, we group the therapies in the test set based on the number of samples they have in the training set, and then we measure the model performance on each of the groups. The details on the sample counts in each of the bins are given in Table 2 of the Supplementary material. In this manner we can evaluate the performance of the models for the rare therapies. Due to the lack of data and practical experience for the rare HIV combination therapies, predicting their effectiveness is more challenging compared to estimating the effectiveness of the frequent therapies.

Reference methods. In our computational experiments we compare the results of our history distribution matching approach, denoted as the transfer history clustering validation scenario, to those of three reference approaches, namely the one-for-all validation scenario, the history-clustering validation scenario, and the therapy-specific validation scenario.
The one-for-all method mimics the\nmost common approaches in the \ufb01eld [16, 1, 19] that train a single model (here logistic regression)\non all available therapy samples in the data set. The information on the individual drugs comprising\nthe target (most recent) therapy and the drugs administered in all its available preceding therapies\nare encoded in a binary vector and supplied as input features. The history-clustering method imple-\nments a modi\ufb01ed version of Algorithm 1 that skips the distribution matching step. In other words, a\nseparate model is trained for each cluster by using only the data from the respective cluster. We intro-\nduce this approach to assess the importance of the distribution matching step. The therapy-speci\ufb01c\nscenario implements the drugs kernel therapy similarity model described in [4]. It represents the\napproaches that train a separate model for each combination therapy by using not only the sam-\nples from the target therapy but also the available samples from similar therapies with appropriate\nsample-importance weights.\n\nPerformance measures. The performance of all considered methods is assessed by reporting their\ncorresponding accuracies (ACC) and AUCs (Area Under the ROC Curve). The accuracy re\ufb02ects the\nability of the methods to make correct predictions, i.e., to discriminate between successful and fail-\ning HIV combination therapies. With the AUC we are able to assess the quality of the ranking based\n\n6\n\n\fon the probability of therapy success. For this reason, we carry out the model selection based on\nboth accuracy and AUC and then use accuracy or AUC, respectively, to assess the model perfor-\nmance. In order to compare the performance of two methods on a separate test set, the signi\ufb01cance\nof the difference of two accuracies as well as their standard deviations are calculated based on a\npaired t-test. 
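For illustration only (this is a generic sketch, not the study's evaluation code), accuracy and a rank-based AUC estimate over ±1 labels can be computed as follows:

```python
def accuracy(labels, predictions):
    """Fraction of correctly predicted +1/-1 therapy outcomes."""
    return sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)


def auc(labels, scores):
    """Rank-based (Mann-Whitney) estimate of the area under the ROC curve:
    the probability that a random success is scored above a random failure,
    with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfectly separating successful from failing therapies.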
The standard deviations of the AUC values and the signi\ufb01cance of the difference of\ntwo AUCs used for the pairwise method comparison are estimated as described in [6].\n\n4.3 Experimental results\n\nAccording to the results from the silhouette validation technique [17] displayed in Figure 2 in the\nSupplementary material, the \ufb01rst clustering step of Algorithm 1 divides our training data into two\nclusters \u2013 one comprises the samples with longer therapy sequences (with average treatment history\nlength of 5.507 therapies), and the other one those with shorter therapy sequences (with average\ntreatment history length of 0.308 therapies). Thus, the transfer history distribution matching method\ntrains two models, one for each cluster. The clustering results are depicted in Figure 3 in the Sup-\nplementary material. In what follows, we \ufb01rst present the results of the time-oriented validation\nscenario strati\ufb01ed for the length of treatment history, followed by the results strati\ufb01ed for the abun-\ndance of the different therapies. In both cases we report both the accuracies and the AUCs for all\nconsidered methods.\nThe computational results for the transfer history method and the three reference methods strati\ufb01ed\nfor the length of the therapy history are summarized in Figure 1, where (a) depicts the accuracies,\nand (b) depicts the AUCs. For samples with a small number (\u2264 5) of previously administered ther-\napies, i.e., with short treatment histories, all considered models have comparable accuracies. For\ntest samples from patients with longer (> 5) treatment histories, the transfer history clustering ap-\nproach achieves signi\ufb01cantly better accuracy (p-values \u2264 0.004) compared to those of the reference\nmethods. 
According to the paired difference test described in [6], the transfer history approach has significantly better AUC performance for test samples with longer (> 5) treatment histories compared to the one-for-all (p-value = 0.043) and the history-clustering (p-value = 0.044) reference methods. It also has better AUC performance compared to that of the therapy-specific model, yet this improvement is not significant (p-value = 0.253). Furthermore, the transfer history approach achieves better AUCs for test samples with fewer than five previously administered therapies compared to all reference methods. However, the improvement is only significant for the one-for-all method (p-value = 0.007). The corresponding p-values for the history-clustering method and the therapy-specific method are 0.080 and 0.178, respectively.

Figure 1: Accuracy (a) and AUC (b) results of the different models obtained on the test set in the time-oriented validation scenario. Error bars indicate the standard deviations of each model. The test samples are grouped based on their corresponding number of known previous therapies.

The experimental results, stratified by the abundance of the therapies and summarizing the accuracies and AUCs for all considered methods, are depicted in Figure 2 (a) and (b), respectively. As can be observed from Figure 2 (a), all considered methods have comparable accuracies for the test therapies with more than seven samples.
The transfer history method achieves signi\ufb01cantly better\naccuracy (p-values \u2264 0.0001) compared to all reference methods for the test therapies with few\n(0 \u2212 7) available training samples. Considering the AUC results in Figure 2 (b), the transfer history\napproach outperforms all the reference models for the rare test therapies (with 0 \u2212 7 training sam-\nples) with estimated p-values of 0.05 for the one-for-all, 0.042 for the therapy-speci\ufb01c and 0.1 for\nthe history-clustering model. The one-for-all and the therapy-speci\ufb01c models have slightly better\nAUC performance compared to the transfer history and the history-clustering approaches for test\ntherapies with 8 \u2212 30 available training samples. However, according to the paired difference test\ndescribed in [6], the improvements are not signi\ufb01cant with p-values larger than 0.141 for all pair-\nwise comparisons. Moreover, considering the test therapies with more than 30 training samples the\ntransfer history approach signi\ufb01cantly outperforms the one-for-all approach with estimated p-value\nof 0.037. It also has slightly better AUC performance than the history-clustering model and the\ntherapy-speci\ufb01c model, however these improvements are not signi\ufb01cant with estimated p-values of\n0.064 and 0.136, respectively.\n\n(a)\n\n(b)\n\nFigure 2: Accuracy (a) and AUC (b) results of the different models obtained on the test set in the\ntime-oriented validation scenario. Error bars indicate the standard deviations of each model. 
The test samples are grouped based on the number of available training examples for their corresponding therapy combinations.

5 Conclusion

This paper presents an approach that simultaneously addresses several problems affecting the available HIV clinical data sets: the different treatment backgrounds of the samples, the uneven representation of the different levels of therapy experience, the missing treatment history information, the uneven therapy representation, and the unbalanced therapy outcome representation that is especially pronounced in recently collected samples. The transfer history clustering model has its prime advantage for samples stemming from patients with long treatment histories and for samples associated with rare therapies. In particular, for these two groups of test samples it achieves significantly better accuracy than all considered reference approaches. Moreover, for these test samples the AUC performance of our method is better than that of all reference methods, and significantly better than that of the one-for-all method. For the remaining test samples, both the accuracy and the AUC performance of the transfer history method are at least as good as the corresponding performances of all considered reference methods.

Acknowledgments

We gratefully acknowledge the EuResist EEIG for providing the clinical data. We thank Thomas Lengauer for the helpful comments and for supporting this work. We also thank Levi Valgaerts for the constructive suggestions. This work was funded by the Cluster of Excellence (Multimodal Computing and Interaction).

References

[1] A. Altmann, M. Däumer, N.
Beerenwinkel, Y. Peres, E. Schülter, A. Büch, S. Rhee, A. Sönnerborg, WJ. Fessel, RW. Shafer, M. Zazzi, R. Kaiser, and T. Lengauer. Predicting response to combination antiretroviral therapy: retrospective validation of geno2pheno-THEO on a large clinical database. Journal of Infectious Diseases, 199:999-1006, 2009.

[2] S. Bickel, J. Bogojeska, T. Lengauer, and T. Scheffer. Multi-task learning for HIV therapy screening. In Proceedings of the International Conference on Machine Learning, 2008.

[3] S. Bickel, M. Brückner, and T. Scheffer. Discriminative learning for differing training and test distributions. In Proceedings of the International Conference on Machine Learning, 2007.

[4] J. Bogojeska, S. Bickel, A. Altmann, and T. Lengauer. Dealing with sparse data in predicting outcomes of HIV combination therapies. Bioinformatics, 26:2085-2092, 2010.

[5] J. Bogojeska, D. Stöckel, M. Zazzi, R. Kaiser, F. Incardona, M. Rosen-Zvi, and T. Lengauer. History-alignment models for bias-aware prediction of virological response to HIV combination therapy. Submitted, 2011.

[6] J. Hanley and B. McNeil. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology, 148:839-843, 1983.

[7] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2009.

[8] VA. Johnson, F. Brun-Vezinet, B. Clotet, HF. Günthard, DR. Kuritzkes, D. Pillay, JM. Schapiro, and DD. Richman. Update of the drug resistance mutations in HIV-1: December 2008. Topics in HIV Medicine, 16:138-145, 2008.

[9] L. Kaufman and PJ. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley and Sons, Inc., 1990.

[10] B. Larder, D. Wang, A. Revell, J. Montaner, R. Harrigan, F. De Wolf, J. Lange, S. Wegner, L. Ruiz, MJ. Pérez-Elías, S. Emery, J. Gatell, A. D'Arminio Monforte, C. Torti, M. Zazzi, and C.
Lane. The development of artificial neural networks to predict virological response to combination HIV therapy. Antiviral Therapy, 12:15-24, 2007.

[11] RH. Lathrop and MJ. Pazzani. Combinatorial optimization in rapidly mutating drug-resistant viruses. Journal of Combinatorial Optimization, 3:301-320, 1999.

[12] TF. Liu and RW. Shafer. Web resources for HIV type 1 genotypic-resistance test interpretation. Clinical Infectious Diseases, 42, 2006.

[13] S. Needleman and C. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443-453, 1970.

[14] DA. Ouattara. Mathematical analysis of the HIV-1 infection: parameter estimation, therapies effectiveness and therapeutical failures. In Engineering in Medicine and Biology Society, 2005.

[15] M. Prosperi, A. Altmann, M. Rosen-Zvi, E. Aharoni, G. Borgulya, F. Bazso, A. Sönnerborg, E. Schülter, D. Struck, G. Ulivi, A. Vandamme, J. Vercauteren, and M. Zazzi. Investigation of expert rule bases, logistic regression, and non-linear machine learning techniques for predicting response to antiretroviral treatment. Antiviral Therapy, 14:433-442, 2009.

[16] M. Rosen-Zvi, A. Altmann, M. Prosperi, E. Aharoni, H. Neuvirth, A. Sönnerborg, E. Schülter, D. Struck, Y. Peres, F. Incardona, R. Kaiser, M. Zazzi, and T. Lengauer. Selecting anti-HIV therapies based on a variety of genomic and clinical factors. Proceedings of the ISMB, 2008.

[17] P. J. Rousseeuw. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20:53-65, 1987.

[18] UNAIDS/WHO. Report on the global AIDS epidemic: 2010. 2010.

[19] D. Wang, BA. Larder, A. Revell, R. Harrigan, and J. Montaner.
A neural network model using clinical cohort data accurately predicts virological response and identifies regimens with increased probability of success in treatment failures. Antiviral Therapy, 8:U99-U99, 2003.