{"title": "Multi-Instance Multi-Label Learning with Application to Scene Classification", "book": "Advances in Neural Information Processing Systems", "page_first": 1609, "page_last": 1616, "abstract": "", "full_text": "Multi-Instance Multi-Label Learning with\n\nApplication to Scene Classi\ufb01cation\n\nZhi-Hua Zhou\n\nMin-Ling Zhang\n\nNational Laboratory for Novel Software Technology\n\nNanjing University, Nanjing 210093, China\n\n{zhouzh,zhangml}@lamda.nju.edu.cn\n\nAbstract\n\nIn this paper, we formalize multi-instance multi-label learning, where each train-\ning example is associated with not only multiple instances but also multiple class\nlabels. Such a problem can occur in many real-world tasks, e.g. an image usually\ncontains multiple patches each of which can be described by a feature vector, and\nthe image can belong to multiple categories since its semantics can be recognized\nin different ways. We analyze the relationship between multi-instance multi-label\nlearning and the learning frameworks of traditional supervised learning, multi-\ninstance learning and multi-label learning. Then, we propose the MIMLBOOST\nand MIMLSVM algorithms which achieve good performance in an application to\nscene classi\ufb01cation.\n\n1 Introduction\n\nIn traditional supervised learning, an object is represented by an instance (or feature vector) and\nassociated with a class label. Formally, let X denote the instance space (or feature space) and Y\nthe set of class labels. Then the task is to learn a function f : X \u2192 Y from a given data set\n{(x1, y1), (x2, y2), \u00b7 \u00b7 \u00b7 , (xm, ym)}, where xi \u2208 X is an instance and yi \u2208 Y the known label of xi.\nAlthough the above formalization is prevailing and successful, there are many real-world problems\nwhich do not \ufb01t this framework well, where a real-world object may be associated with a number of\ninstances and a number of labels simultaneously. 
For example, an image usually contains multiple patches, each of which can be represented by an instance, while in image classification such an image can belong to several classes simultaneously, e.g. an image can belong to mountains as well as Africa. Another example is text categorization, where a document usually contains multiple sections, each of which can be represented as an instance, and the document can be regarded as belonging to different categories if it is viewed from different aspects, e.g. a document can be categorized as scientific novel, Jules Verne's writing or even books on travelling. Web mining is a further example, where each of the links can be regarded as an instance while the web page itself can be recognized as news page, sports page, soccer page, etc.

In order to deal with such problems, in this paper we formalize multi-instance multi-label learning (abbreviated as MIML). In this learning framework, a training example is described by multiple instances and associated with multiple class labels. Formally, let X denote the instance space and Y the set of class labels. Then the task is to learn a function f_MIML : 2^X → 2^Y from a given data set {(X_1, Y_1), (X_2, Y_2), ···, (X_m, Y_m)}, where X_i ⊆ X is a set of instances {x_1^(i), x_2^(i), ···, x_{n_i}^(i)}, x_j^(i) ∈ X (j = 1, 2, ···, n_i), and Y_i ⊆ Y is a set of labels {y_1^(i), y_2^(i), ···, y_{l_i}^(i)}, y_k^(i) ∈ Y (k = 1, 2, ···, l_i). Here n_i denotes the number of instances in X_i and l_i the number of labels in Y_i.

After analyzing the relationship between MIML and the frameworks of traditional supervised learning, multi-instance learning and multi-label learning, we propose two MIML algorithms, MIMLBOOST and MIMLSVM.
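As a concrete illustration of the formalization above, an MIML training example can be represented directly in code. This is a minimal sketch; the class name and field layout below are our own illustrative choices, not part of the paper.

```python
from dataclasses import dataclass
from typing import List, Set

# An MIML example: a bag of instances (feature vectors) plus a set of labels.
# The names here are illustrative, not from the paper.
@dataclass
class MIMLExample:
    instances: List[List[float]]  # X_i = {x_1^(i), ..., x_{n_i}^(i)}
    labels: Set[str]              # Y_i, a subset of the label space Y

# A toy image described by three patch feature vectors and two labels.
example = MIMLExample(
    instances=[[0.1, 0.9], [0.4, 0.4], [0.8, 0.2]],
    labels={"mountains", "Africa"},
)

n_i = len(example.instances)  # number of instances in the bag
l_i = len(example.labels)     # number of labels
print(n_i, l_i)  # → 3 2
```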
Application to scene classification shows that solving some real-world problems in the MIML framework can achieve better performance than solving them in existing frameworks such as multi-instance learning and multi-label learning.

2 Multi-Instance Multi-Label Learning

We start by investigating the relationship between MIML and the frameworks of traditional supervised learning, multi-instance learning and multi-label learning, and then we develop some solutions.

Multi-instance learning [4] studies the problem where a real-world object described by a number of instances is associated with one class label. Formally, the task is to learn a function f_MIL : 2^X → {−1, +1} from a given data set {(X_1, y_1), (X_2, y_2), ···, (X_m, y_m)}, where X_i ⊆ X is a set of instances {x_1^(i), x_2^(i), ···, x_{n_i}^(i)}, x_j^(i) ∈ X (j = 1, 2, ···, n_i), and y_i ∈ {−1, +1} is the label of X_i.^1 Multi-instance learning techniques have been successfully applied to diverse applications including scene classification [3, 7].

Multi-label learning [8] studies the problem where a real-world object described by one instance is associated with a number of class labels.
Formally, the task is to learn a function f_MLL : X → 2^Y from a given data set {(x_1, Y_1), (x_2, Y_2), ···, (x_m, Y_m)}, where x_i ∈ X is an instance and Y_i ⊆ Y a set of labels {y_1^(i), y_2^(i), ···, y_{l_i}^(i)}, y_k^(i) ∈ Y (k = 1, 2, ···, l_i).^2 Multi-label learning techniques have also been successfully applied to scene classification [1].

In fact, the multi- learning frameworks result from the ambiguity in representing real-world objects. Multi-instance learning studies the ambiguity in the input space (or instance space), where an object has many alternative input descriptions, i.e. instances; multi-label learning studies the ambiguity in the output space (or label space), where an object has many alternative output descriptions, i.e. labels; while MIML considers the ambiguity in the input and output spaces simultaneously. We illustrate the differences among these learning frameworks in Figure 1.

Figure 1: Four different learning frameworks. (a) Traditional supervised learning; (b) Multi-instance learning; (c) Multi-label learning; (d) Multi-instance multi-label learning.

Traditional supervised learning is evidently a degenerated version of multi-instance learning as well as a degenerated version of multi-label learning, while traditional supervised learning, multi-instance learning and multi-label learning are all degenerated versions of MIML. Thus, we can tackle MIML by identifying its equivalence in the traditional supervised learning framework, using multi-instance learning or multi-label learning as the bridge.

^1 According to notions used in multi-instance learning, (X_i, y_i) is a labeled bag while X_i is an unlabeled bag.
^2 Although most works on multi-label learning assume that an instance can be associated with multiple valid labels, there are also works assuming that only one of the labels associated with an instance is correct [6].
We adopt the former assumption in this paper.

Solution 1: Using multi-instance learning as the bridge: We can transform a MIML learning task, i.e. to learn a function f_MIML : 2^X → 2^Y, into a multi-instance learning task, i.e. to learn a function f_MIL : 2^X × Y → {−1, +1}. For any y ∈ Y, f_MIL(X_i, y) = +1 if y ∈ Y_i and −1 otherwise. The proper labels for a new example X* can be determined according to Y* = {y | arg_{y∈Y} [f_MIL(X*, y) = +1]}. We can transform this multi-instance learning task further into a traditional supervised learning task, i.e. to learn a function f_SISL : X × Y → {−1, +1}, under a constraint specifying how to derive f_MIL(X_i, y) from f_SISL(x_j^(i), y) (j = 1, ···, n_i). For any y ∈ Y, f_SISL(x_j^(i), y) = +1 if y ∈ Y_i and −1 otherwise. Here the constraint can be f_MIL(X_i, y) = sign[Σ_{j=1}^{n_i} f_SISL(x_j^(i), y)], which has been used in transforming multi-instance learning tasks into traditional supervised learning tasks [9].^3 Note that other kinds of constraint can also be used here.

Solution 2: Using multi-label learning as the bridge: We can also transform a MIML learning task, i.e. to learn a function f_MIML : 2^X → 2^Y, into a multi-label learning task, i.e. to learn a function f_MLL : Z → 2^Y. For any z_i ∈ Z, f_MLL(z_i) = f_MIML(X_i) if z_i = φ(X_i), φ : 2^X → Z. The proper labels for a new example X* can be determined according to Y* = f_MLL(φ(X*)). We can transform this multi-label learning task further into a traditional supervised learning task, i.e. to learn a function f_SISL : Z × Y → {−1, +1}. For any y ∈ Y, f_SISL(z_i, y) = +1 if y ∈ Y_i and −1 otherwise. That is, f_MLL(z_i) = {y | arg_{y∈Y} [f_SISL(z_i, y) = +1]}.
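The first transformation of Solution 1 can be sketched in a few lines: each MIML example (X_i, Y_i) becomes |Y| labeled multi-instance bags, one per possible label. A minimal sketch, assuming a fixed label space; the function and variable names are our own illustrative choices.

```python
# Sketch of Solution 1's MIML -> multi-instance transformation.
# All names here are illustrative choices, not the paper's code.
LABEL_SPACE = ["desert", "mountains", "sea", "sunset", "trees"]

def miml_to_mi_bags(instances, label_set):
    """Turn one MIML example (X_i, Y_i) into |Y| labeled bags.

    Each bag (X_i, y) gets label +1 if y is in Y_i, else -1,
    i.e. the role played by f_MIL's training labels.
    """
    return [((instances, y), +1 if y in label_set else -1)
            for y in LABEL_SPACE]

# A toy example with two instances and two labels.
bags = miml_to_mi_bags([[0.1, 0.9], [0.8, 0.2]], {"sea", "sunset"})
positive = [y for ((_, y), label) in bags if label == +1]
print(positive)  # → ['sea', 'sunset']
```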
Here the mapping φ can be implemented with constructive clustering, which has been used in transforming multi-instance bags into traditional single instances [11]. Note that other kinds of mapping can also be used here.

3 Algorithms

In this section, we propose two algorithms for solving MIML problems: MIMLBOOST works along the first solution described in Section 2, while MIMLSVM works along the second solution.

3.1 MIMLBOOST

Given any set Ω, let |Ω| denote its size, i.e. the number of elements in Ω; given any predicate π, let [[π]] be 1 if π holds and 0 otherwise; given (X_i, Y_i), for any y ∈ Y, let Ψ(X_i, y) = +1 if y ∈ Y_i and −1 otherwise, where Ψ is a function Ψ : 2^X × Y → {−1, +1}. The MIMLBOOST algorithm is presented in Table 1.

In the first step, each MIML example (X_u, Y_u) (u = 1, 2, ···, m) is transformed into a set of |Y| multi-instance bags, i.e. {[(X_u, y_1), Ψ(X_u, y_1)], [(X_u, y_2), Ψ(X_u, y_2)], ···, [(X_u, y_|Y|), Ψ(X_u, y_|Y|)]}. Note that [(X_u, y_v), Ψ(X_u, y_v)] (v = 1, 2, ···, |Y|) is a labeled multi-instance bag where (X_u, y_v) is a bag containing n_u instances, i.e. {(x_1^(u), y_v), (x_2^(u), y_v), ···, (x_{n_u}^(u), y_v)}, and Ψ(X_u, y_v) ∈ {+1, −1} is the label of this bag. Thus, the original MIML data set is transformed into a multi-instance data set containing m × |Y| bags, i.e. {[(X_1, y_1), Ψ(X_1, y_1)], ···, [(X_1, y_|Y|), Ψ(X_1, y_|Y|)], [(X_2, y_1), Ψ(X_2, y_1)], ···, [(X_m, y_|Y|), Ψ(X_m, y_|Y|)]}.
Let [(X^(i), y^(i)), Ψ(X^(i), y^(i))] denote the ith of these m × |Y| bags, that is, (X^(1), y^(1)) denotes (X_1, y_1), ···, (X^(|Y|), y^(|Y|)) denotes (X_1, y_|Y|), ···, (X^(m×|Y|), y^(m×|Y|)) denotes (X_m, y_|Y|), where (X^(i), y^(i)) contains n_i instances, i.e. {(x_1^(i), y^(i)), (x_2^(i), y^(i)), ···, (x_{n_i}^(i), y^(i))}.

Then, from the data set a multi-instance learning function f_MIL can be learned, which can accomplish the desired MIML function because f_MIML(X*) = {y | arg_{y∈Y} (sign[f_MIL(X*, y)] = +1)}. Here we use MIBOOSTING [9] to implement f_MIL.

^3 This constraint assumes that all instances contribute equally and independently to a bag's label, which is different from the standard multi-instance assumption that there is one 'key' instance in a bag that triggers whether the bag's class label will be positive or negative. Nevertheless, it has been shown that this assumption is reasonable and effective [9]. Note that the standard multi-instance assumption does not always hold, e.g. the label Africa of an image is usually triggered by several patches jointly instead of by only one patch.

Table 1: The MIMLBOOST algorithm

1  Transform each MIML example (X_u, Y_u) (u = 1, 2, ···, m) into |Y| multi-instance bags {[(X_u, y_1), Ψ(X_u, y_1)], ···, [(X_u, y_|Y|), Ψ(X_u, y_|Y|)]}. Thus, the original data set is transformed into a multi-instance data set containing m × |Y| multi-instance bags, denoted by {[(X^(i), y^(i)), Ψ(X^(i), y^(i))]} (i = 1, 2, ···, m × |Y|).
2  Initialize the weight of each bag to W^(i) = 1/(m × |Y|) (i = 1, 2, ···, m × |Y|).
3  Repeat for t = 1, 2, ···, T iterations:
3a   Set W_j^(i) = W^(i)/n_i (i = 1, 2, ···, m × |Y|), assign the bag's label Ψ(X^(i), y^(i)) to each of its instances (x_j^(i), y^(i)) (j = 1, 2, ···, n_i), and build an instance-level predictor h_t[(x_j^(i), y^(i))] ∈ {−1, +1}.
3b   For the ith bag, compute the error rate e^(i) ∈ [0, 1] by counting the number of misclassified instances within the bag, i.e. e^(i) = Σ_{j=1}^{n_i} [[h_t[(x_j^(i), y^(i))] ≠ Ψ(X^(i), y^(i))]] / n_i.
3c   If e^(i) < 0.5 for all i ∈ {1, 2, ···, m × |Y|}, go to Step 4.
3d   Compute c_t = arg min_{c_t} Σ_{i=1}^{m×|Y|} W^(i) exp[(2e^(i) − 1)c_t].
3e   If c_t ≤ 0, go to Step 4.
3f   Set W^(i) = W^(i) exp[(2e^(i) − 1)c_t] (i = 1, 2, ···, m × |Y|) and re-normalize such that 0 ≤ W^(i) ≤ 1 and Σ_{i=1}^{m×|Y|} W^(i) = 1.
4  Return Y* = {y | arg_{y∈Y} sign(Σ_j Σ_t c_t h_t[(x_j*, y)]) = +1} (x_j* is X*'s jth instance).

For convenience, let (B, g) denote the bag [(X, y), Ψ(X, y)]. Then, here the goal is to learn a function F(B) minimizing the bag-level exponential loss E_B E_{G|B}[exp(−gF(B))], which ultimately estimates the bag-level log-odds function (1/2) log [Pr(g = +1|B) / Pr(g = −1|B)]. In each boosting round, the aim is to expand F(B) into F(B) + cf(B), i.e. adding a new weak classifier, so that the exponential loss is minimized.
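The weight-update portion of a boosting round (Steps 3b–3f of Table 1) can be sketched as follows. This is a minimal illustration assuming the per-bag error rates from the instance-level predictor are already computed; the coarse grid search for c_t stands in for the numeric optimization the text mentions, and all names are our own.

```python
import math

def boosting_round(bag_errors, weights, cs=None):
    """One MIMLBoost-style weight update (sketch, not the paper's code).

    bag_errors: e^(i), fraction of misclassified instances per bag (Step 3b).
    weights:    current bag weights W^(i).
    Returns (c_t, new_weights); c_t is chosen by a coarse grid search,
    standing in for proper numeric optimization of the loss (Step 3d).
    """
    if cs is None:
        cs = [k * 0.01 for k in range(1, 501)]  # candidate values in (0, 5]
    loss = lambda c: sum(w * math.exp((2 * e - 1) * c)
                         for w, e in zip(weights, bag_errors))
    c_t = min(cs, key=loss)
    if loss(c_t) >= loss(0.0):  # no positive c helps: the c_t <= 0 stop (Step 3e)
        return 0.0, weights
    new_w = [w * math.exp((2 * e - 1) * c_t)
             for w, e in zip(weights, bag_errors)]
    z = sum(new_w)              # re-normalize so the weights sum to 1 (Step 3f)
    return c_t, [w / z for w in new_w]

errors = [0.1, 0.4, 0.2, 0.3]          # toy per-bag error rates
c_t, w = boosting_round(errors, [0.25] * 4)
```

As in AdaBoost, the update concentrates weight on the bags the current weak classifier handles worst.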
Assuming that all instances in a bag contribute equally and independently to the bag's label, f(B) = (1/n_B) Σ_j h(b_j) can be derived, where h(b_j) ∈ {−1, +1} is the prediction of the instance-level classifier h(·) for the jth instance in bag B, and n_B is the number of instances in B. It has been shown by [9] that the best f(B) to be added can be achieved by seeking h(·) which maximizes Σ_i Σ_{j=1}^{n_i} [(1/n_i) W^(i) g^(i) h(b_j^(i))], given the bag-level weights W = exp(−gF(B)). By assigning each instance the label of its bag and the corresponding weight W^(i)/n_i, h(·) can be learned by minimizing the weighted instance-level classification error. This actually corresponds to Step 3a of MIMLBOOST. When f(B) is found, the best multiplier c > 0 can be obtained by directly optimizing the exponential loss:

E_B E_{G|B}[exp(−gF(B) + c(−gf(B)))] = Σ_i W^(i) exp[c(−g^(i) Σ_j h(b_j^(i)) / n_i)] = Σ_i W^(i) exp[(2e^(i) − 1)c]

where e^(i) = (1/n_i) Σ_j [[h(b_j^(i)) ≠ g^(i)]] (computed in Step 3b). Minimization of this expectation actually corresponds to Step 3d, where numeric optimization techniques such as the quasi-Newton method can be used. Finally, the bag-level weights are updated in Step 3f according to the additive structure of F(B).

3.2 MIMLSVM

Given (X_i, Y_i) and z_i = φ(X_i) where φ : 2^X → Z, for any y ∈ Y, let Φ(z_i, y) = +1 if y ∈ Y_i and −1 otherwise, where Φ is a function Φ : Z × Y → {−1, +1}. The MIMLSVM algorithm is presented in Table 2.

In the first step, the X_u of each MIML example (X_u, Y_u) (u = 1, 2, ···, m) is collected and put into a data set Γ. Then, in the second step, k-medoids clustering is performed on Γ.
Since each data item in Γ, i.e. X_u, is an unlabeled multi-instance bag instead of a single instance, we employ the Hausdorff distance [5] to measure the distance between bags. In detail, given two bags A = {a_1, a_2, ···, a_{n_A}} and B = {b_1, b_2, ···, b_{n_B}}, the Hausdorff distance between A and B is defined as

d_H(A, B) = max{ max_{a∈A} min_{b∈B} ||a − b||, max_{b∈B} min_{a∈A} ||b − a|| }

where ||a − b|| measures the distance between the instances a and b, which takes the form of Euclidean distance here.

Table 2: The MIMLSVM algorithm

1  For MIML examples (X_u, Y_u) (u = 1, 2, ···, m), let Γ = {X_u | u = 1, 2, ···, m}.
2  Randomly select k elements from Γ to initialize the medoids M_t (t = 1, 2, ···, k), then repeat until all M_t do not change:
2a   Γ_t = {M_t} (t = 1, 2, ···, k).
2b   Repeat for each X_u ∈ (Γ − {M_t | t = 1, 2, ···, k}): index = arg min_{t∈{1,···,k}} d_H(X_u, M_t), Γ_index = Γ_index ∪ {X_u}.
2c   M_t = arg min_{A∈Γ_t} Σ_{B∈Γ_t} d_H(A, B) (t = 1, 2, ···, k).
3  Transform (X_u, Y_u) into a multi-label example (z_u, Y_u) (u = 1, 2, ···, m), where z_u = (z_u1, z_u2, ···, z_uk) = (d_H(X_u, M_1), d_H(X_u, M_2), ···, d_H(X_u, M_k)).
4  For each y ∈ Y, derive a data set D_y = {(z_u, Φ(z_u, y)) | u = 1, 2, ···, m}, and then train an SVM h_y = SVMTrain(D_y).
5  Return Y* = {arg max_{y∈Y} h_y(z*)} ∪ {y | h_y(z*) ≥ 0, y ∈ Y}, where z* = (d_H(X*, M_1), d_H(X*, M_2), ···, d_H(X*, M_k)).

After the clustering process, we divide the data set Γ into k partitions whose medoids are M_t (t = 1, 2, ···, k), respectively.
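The Hausdorff distance above translates directly into code. A minimal sketch, using plain Python lists for bags of instances (our own illustrative implementation):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two instances (feature vectors)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hausdorff(A, B):
    """Hausdorff distance between two bags of instances:
    the larger, over both directions, of the farthest
    nearest-neighbour distance."""
    d_ab = max(min(euclidean(a, b) for b in B) for a in A)
    d_ba = max(min(euclidean(b, a) for a in A) for b in B)
    return max(d_ab, d_ba)

A = [[0.0, 0.0], [1.0, 0.0]]
B = [[0.0, 0.0], [0.0, 2.0]]
print(hausdorff(A, B))  # → 2.0
```

Note that d_H is symmetric, so it can serve directly as the dissimilarity measure for the bag-level k-medoids clustering of Table 2.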
With the help of these medoids, we transform the original multi-instance example X_u into a k-dimensional numerical vector z_u, where the ith (i = 1, 2, ···, k) component of z_u is the distance between X_u and M_i, that is, d_H(X_u, M_i). In other words, z_ui encodes some structure information of the data, that is, the relationship between X_u and the ith partition of Γ. This process resembles the constructive clustering process used by [11] in transforming multi-instance examples into single-instance examples, except that in [11] the clustering is executed at the instance level while here we execute it at the bag level. Thus, the original MIML examples (X_u, Y_u) (u = 1, 2, ···, m) have been transformed into multi-label examples (z_u, Y_u) (u = 1, 2, ···, m), which corresponds to Step 3 of MIMLSVM. Note that this transformation may lose information; nevertheless, the performance of MIMLSVM is still good. This suggests that MIML is a powerful framework which captures more of the original information than other learning frameworks.

Then, from the data set a multi-label learning function f_MLL can be learned, which can accomplish the desired MIML function because f_MIML(X*) = f_MLL(z*). Here we use MLSVM [1] to implement f_MLL.

Concretely, MLSVM decomposes the multi-label learning problem into multiple independent binary classification problems (one per class), where each example associated with the label set Y is regarded as a positive example when building the SVM for any class y ∈ Y, while regarded as a negative example when building the SVM for any class y ∉ Y, as shown in Step 4 of MIMLSVM. In making predictions, the T-Criterion [1] is used, which corresponds to Step 5 of the MIMLSVM algorithm.
That is, the test example is labeled by all the class labels with positive SVM scores, except that when all the SVM scores are negative, the test example is labeled by the class label with the top (least negative) score.

4 Application to Scene Classification

The data set consists of 2,000 natural scene images belonging to the classes desert, mountains, sea, sunset, and trees, as shown in Table 3. Some images were from the COREL image collection while some were collected from the Internet. Over 22% of the images belong to multiple classes simultaneously.

Table 3: The image data set (d: desert, m: mountains, s: sea, su: sunset, t: trees)

label   # images | label    # images | label    # images | label        # images
d       340      | d + m    19       | m + su   19       | d + m + su   1
m       268      | d + s    5        | m + t    106      | d + su + t   3
s       341      | d + su   21       | s + su   172      | m + s + t    6
su      216      | d + t    20       | s + t    14       | m + su + t   1
t       378      | m + s    38       | su + t   28       | s + su + t   4

4.1 Comparison with Multi-Label Learning Algorithms

Since the scene classification task has been successfully tackled by multi-label learning algorithms [1], we compare the MIML algorithms with the established multi-label learning algorithms ADABOOST.MH [8] and MLSVM [1]. The former is the core of a successful multi-label learning system, BOOSTEXTER [8], while the latter has achieved excellent performance in scene classification [1].

For MIMLBOOST and MIMLSVM, each image is represented as a bag of nine instances generated by the SBN method [7]. Here each instance actually corresponds to an image patch, and better performance can be expected with a better image patch generation method. For ADABOOST.MH and MLSVM, each image is represented as a feature vector obtained by concatenating the instances of MIMLBOOST or MIMLSVM.
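The T-Criterion labeling rule used here (Step 5 of Table 2) can be sketched as a stand-alone function; this is an illustrative sketch assuming real-valued SVM scores per class, not the paper's code.

```python
def t_criterion(scores):
    """T-Criterion labeling (sketch): return every class with a
    non-negative SVM score; if all scores are negative, fall back to
    the single class with the top (least negative) score.
    `scores` maps class label -> real-valued SVM output h_y(z*)."""
    positive = {y for y, s in scores.items() if s >= 0}
    if positive:
        return positive
    return {max(scores, key=scores.get)}  # least negative score

print(t_criterion({"sea": 0.4, "sunset": 0.1, "trees": -0.7}))
print(t_criterion({"sea": -0.4, "sunset": -0.1, "trees": -0.7}))
```

The fallback guarantees every test example receives at least one label, which matters when accuracy is computed per class as in Section 4.2.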
Gaussian kernel LIBSVM [2] is used to implement MLSVM, where the cross-training strategy is used to build the classifiers while the T-Criterion is used to label the images [1]. The MIMLSVM algorithm is also realized with a Gaussian kernel, while the parameter k is set to 20% of the number of training images.^4 Note that the instance-level predictor used in Step 3a of MIMLBOOST is also a Gaussian kernel LIBSVM (with default parameters).

Since ADABOOST.MH and MLSVM make multi-label predictions, here the performance of the compared algorithms is evaluated according to five multi-label evaluation metrics, as shown in Tables 4 to 7, where '↓' indicates 'the smaller the better' while '↑' indicates 'the bigger the better'. Details of these evaluation metrics can be found in [8]. Tenfold cross-validation is performed and 'mean ± std' is presented in the tables, where the best performance achieved by each algorithm is bolded. Note that since in each boosting round MIMLBOOST performs more operations than ADABOOST.MH does, for a fair comparison, the number of boosting rounds used by ADABOOST.MH is set to ten times that used by MIMLBOOST so that their time costs are comparable.

Table 4: The performance of MIMLBOOST with different boosting rounds

boosting rounds   hamm. loss ↓   one-error ↓   coverage ↓    rank. loss ↓   ave. prec. ↑
 5                .202±.011      .373±.045     1.026±.093    .208±.028      .764±.027
10                .197±.010      .362±.040     1.013±.109    .191±.027      .770±.026
15                .195±.009      .361±.034     1.004±.101    .186±.025      .772±.023
20                .193±.008      .355±.037     .996±.102     .183±.025      .775±.024
25                .189±.009      .351±.039     .989±.103     .181±.026      .777±.025

Table 5: The performance of ADABOOST.MH with different boosting rounds

boosting rounds   hamm. loss ↓   one-error ↓   coverage ↓    rank. loss ↓   ave. prec. ↑
 50               .228±.013      .473±.031     1.299±.099    .263±.022      .695±.022
100               .234±.019      .465±.042     1.292±.138    .259±.030      .698±.033
150               .233±.020      .465±.053     1.279±.140    .255±.032      .700±.033
200               .232±.012      .453±.031     1.269±.107    .253±.022      .706±.020
250               .231±.018      .451±.046     1.258±.137    .250±.031      .708±.030

Table 6: The performance of MIMLSVM with different γ used in Gaussian kernel

Gaussian kernel   hamm. loss ↓   one-error ↓   coverage ↓    rank. loss ↓   ave. prec. ↑
γ = .1            .181±.017      .332±.036     1.024±.089    .187±.018      .780±.021
γ = .2            .180±.017      .327±.033     1.022±.085    .187±.018      .783±.020
γ = .3            .188±.016      .344±.032     1.065±.094    .196±.020      .772±.020
γ = .4            .193±.014      .358±.030     1.080±.099    .202±.022      .764±.021
γ = .5            .196±.014      .370±.033     1.109±.101    .209±.023      .757±.023

Table 7: The performance of MLSVM with different γ used in Gaussian kernel

Gaussian kernel   hamm. loss ↓   one-error ↓   coverage ↓    rank. loss ↓   ave. prec. ↑
γ = 1             .200±.014      .379±.032     1.125±.115    .214±.020      .751±.022
γ = 2             .196±.013      .368±.032     1.115±.122    .211±.023      .756±.022
γ = 3             .195±.015      .370±.034     1.129±.113    .214±.022      .754±.023
γ = 4             .196±.016      .372±.034     1.151±.122    .220±.024      .751±.023
γ = 5             .202±.015      .388±.032     1.181±.128    .229±.026      .741±.023

^4 In preliminary experiments, several percentage values were tested, ranging from 20% to 100% with an interval of 20%. The results show that these values do not significantly affect the performance of MIMLSVM.

Comparing Tables 4 to 7 we can find that both MIMLBOOST and MIMLSVM are apparently better than ADABOOST.MH and MLSVM. Impressively, pair-wise t-tests at the .05 significance level reveal that the worst performance of MIMLBOOST (with 5 boosting rounds) is even significantly better than the best performance of ADABOOST.MH (with 250 boosting rounds) on all the evaluation metrics, and is significantly better than the best performance of MLSVM (with γ = 2) in terms of coverage while comparable on the remaining metrics; the worst performance of MIMLSVM (with γ = .5) is even comparable to the best performance of MLSVM and is significantly better than the best performance of ADABOOST.MH on all the evaluation metrics. These observations confirm that formalizing the scene classification task as a MIML problem to solve by MIMLBOOST or MIMLSVM is better than formalizing it as a multi-label learning problem to solve by ADABOOST.MH or MLSVM.

4.2 Comparison with Multi-Instance Learning Algorithms

Since the scene classification task has been successfully tackled by multi-instance learning algorithms [7], we compare the MIML algorithms with the established multi-instance learning algorithms DIVERSE DENSITY [7] and EM-DD [10].
The former is one of the most influential multi-instance learning algorithms and has achieved excellent performance in scene classification [7], while the latter has achieved excellent performance on multi-instance benchmark tests [10].

Here all the compared algorithms use the same input representation. That is, each image is represented as a bag of nine instances generated by the SBN method [7]. The parameters of DIVERSE DENSITY and EM-DD are set according to the settings that resulted in the best performance [7, 10]. The MIMLBOOST and MIMLSVM algorithms are implemented as described in Section 4.1, with 25 boosting rounds for MIMLBOOST and γ = .2 for MIMLSVM.

Since DIVERSE DENSITY and EM-DD make single-label predictions, here the performance of the compared algorithms is evaluated according to predictive accuracy, i.e. classification accuracy on the test set. Note that for MIMLBOOST and MIMLSVM, the top-ranked class is regarded as the single-label prediction. Tenfold cross-validation is performed and 'mean ± std' is presented in Table 8, where the best performance on each image class is bolded. Note that besides the predictive accuracies on each class, the overall accuracy is also presented, which is denoted by 'overall'.

We can find from Table 8 that MIMLBOOST achieves the best performance on the image classes desert and trees while MIMLSVM achieves the best performance on the remaining image classes. Overall, MIMLSVM achieves the best performance. Pair-wise t-tests at the .05 significance level reveal that the overall performance of MIMLSVM is comparable to that of MIMLBOOST, and both are significantly better than that of DIVERSE DENSITY and EM-DD.
These observations confirm that formalizing the scene classification task as a MIML problem to solve by MIMLBOOST or MIMLSVM is better than formalizing it as a multi-instance learning problem to solve by DIVERSE DENSITY or EM-DD.

Table 8: Compare predictive accuracy of MIMLBOOST, MIMLSVM, DIVERSE DENSITY and EM-DD

Image class   MIMLBOOST   MIMLSVM     DIVERSE DENSITY   EM-DD
desert        .869±.014   .868±.026   .768±.037         .751±.047
mountains     .791±.024   .820±.022   .721±.030         .717±.036
sea           .729±.026   .730±.030   .587±.038         .639±.063
sunset        .864±.033   .883±.023   .841±.036         .815±.063
trees         .801±.015   .798±.017   .781±.028         .632±.060
overall       .811±.022   .820±.024   .739±.034         .711±.054

5 Conclusion

In this paper, we formalize multi-instance multi-label learning, where an example is associated with multiple instances and multiple labels simultaneously. Although there were some works investigating the ambiguity of alternative input descriptions or alternative output descriptions associated with an object, this is the first work studying both of these ambiguities simultaneously. We show that an MIML problem can be solved by identifying its equivalence in the traditional supervised learning framework, using multi-instance learning or multi-label learning as the bridge. The proposed algorithms, MIMLBOOST and MIMLSVM, have achieved good performance in the application to scene classification. An interesting future issue is to develop MIML versions of other popular machine learning algorithms. Moreover, whether MIML can be tackled directly, possibly by exploiting the connections between the instances and the labels, remains an open problem. It is also interesting to discover the relationship between the instances and labels.
By unravelling the mixed connections, we may gain a deeper understanding of ambiguity.

Acknowledgments

This work was supported by the National Science Foundation of China (60325207, 60473046).

References

[1] M. R. Boutell, J. Luo, X. Shen, and C. M. Brown. Learning multi-label scene classification. Pattern Recognition, 37(9):1757–1771, 2004.
[2] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. Technical report, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, 2001.
[3] Y. Chen and J. Z. Wang. Image categorization by learning and reasoning with regions. Journal of Machine Learning Research, 5:913–939, 2004.
[4] T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez. Solving the multiple-instance problem with axis-parallel rectangles. Artificial Intelligence, 89(1-2):31–71, 1997.
[5] G. A. Edgar. Measure, Topology, and Fractal Geometry. Springer, Berlin, 1990.
[6] R. Jin and Z. Ghahramani. Learning with multiple labels. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 897–904. MIT Press, Cambridge, MA, 2003.
[7] O. Maron and A. L. Ratan. Multiple-instance learning for natural scene classification. In Proceedings of the 15th International Conference on Machine Learning, pages 341–349, Madison, WI, 1998.
[8] R. E. Schapire and Y. Singer. BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2-3):135–168, 2000.
[9] X. Xu and E. Frank. Logistic regression and boosting for labeled bags of instances. In H. Dai, R. Srikant, and C. Zhang, editors, Lecture Notes in Artificial Intelligence 3056, pages 272–281. Springer, Berlin, 2004.
[10] Q. Zhang and S. A. Goldman. EM-DD: An improved multi-instance learning technique. In T. G. Dietterich, S. Becker, and Z.
Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 1073–1080. MIT Press, Cambridge, MA, 2002.
[11] Z.-H. Zhou and M.-L. Zhang. Solving multi-instance problems with classifier ensemble based on constructive clustering. Knowledge and Information Systems, in press.
", "award": [], "sourceid": 3047, "authors": [{"given_name": "Zhi-Hua", "family_name": "Zhou", "institution": ""}, {"given_name": "Min-Ling", "family_name": "Zhang", "institution": null}]}