{"title": "MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features", "book": "Advances in Neural Information Processing Systems", "page_first": 841, "page_last": 848, "abstract": "We present a new co-clustering problem of images and visual features. The problem involves a set of non-object images in addition to a set of object images and features to be co-clustered. Co-clustering is performed in a way of maximising discrimination of object images from non-object images, thus emphasizing discriminative features. This provides a way of obtaining perceptual joint-clusters of object images and features. We tackle the problem by simultaneously boosting multiple strong classifiers which compete for images by their expertise. Each boosting classifier is an aggregation of weak-learners, i.e. simple visual features. The obtained classifiers are useful for multi-category and multi-view object detection tasks. Experiments on a set of pedestrian images and a face data set demonstrate that the method yields intuitive image clusters with associated features and is much superior to conventional boosting classifiers in object detection tasks.", "full_text": "MCBoost: Multiple Classi\ufb01er Boosting for Perceptual\n\nCo-clustering of Images and Visual Features\n\nTae-Kyun Kim\u2217\n\nSidney Sussex College\nUniversity of Cambridge\nCambridge CB2 3HU, UK\n\ntkk22@cam.ac.uk\n\nRoberto Cipolla\n\nDepartment of Engineering\nUniversity of Cambridge\nCambridge CB2 1PZ, UK\ncipolla@cam.ac.uk\n\nAbstract\n\nWe present a new co-clustering problem of images and visual features. The prob-\nlem involves a set of non-object images in addition to a set of object images and\nfeatures to be co-clustered. Co-clustering is performed in a way that maximises\ndiscrimination of object images from non-object images, thus emphasizing dis-\ncriminative features. This provides a way of obtaining perceptual joint-clusters\nof object images and features. 
We tackle the problem by simultaneously boosting multiple strong classifiers which compete for images by their expertise. Each boosting classifier is an aggregation of weak-learners, i.e. simple visual features. The obtained classifiers are useful for object detection tasks which exhibit multi-modalities, e.g. multi-category and multi-view object detection tasks. Experiments on a set of pedestrian images and a face data set demonstrate that the method yields intuitive image clusters with associated features and is much superior to conventional boosting classifiers in object detection tasks.\n\n1 Introduction\n\nIt is known that visual cells (visual features) selectively respond to imagery patterns in perception. The learning process may be associated with co-clusters of visual features and imagery data in a way that facilitates image data perception. We formulate this in the context of boosting classifiers with simple visual features for the object detection task [3]. There are two sets of images: a set of object images and a set of non-object images, labelled as positive and negative class members respectively. There is also a huge number of simple image features, only a small fraction of which are selected to discriminate the positive class from the negative class by $H(x) = \\sum_t \\alpha_t h_t(x)$, where $x$ is an input vector and $\\alpha_t$, $h_t$ are the weight and the score of the $t$-th weak-learner using a single feature. As object images typically exhibit multi-modalities, a single aggregation of simple features often does not dichotomise all object images from non-object images. Our problem is to find subsets of object images, each of which is associated with a set of features for maximising classification. 
Note that image clusters to be obtained are coupled with selected features and likewise features to be selected are dependent on image clusters, requiring a concurrent clustering of images and features.\n\n[Figure 1 panel labels: Face image set, Random image set, Visual feature set, Face cluster-1, Feature set-1, Face cluster-2, Feature set-2, ...]\n\nFigure 1: Perceptual co-clusters of images and visual features. Given a set of face and random images and simple visual features, the proposed method finds perceptual joint-clusters of face images and features, which facilitates classification of face images from random images. Face clusters are pose-wise obtained.\n\n\u2217Webpage: http://mi.eng.cam.ac.uk/~tkk22\n\nSee Figure 1 for an example where subsets of face images are pose-wise obtained with associated features by the proposed method (Section 3). Features are placed around the eyes, nose, mouth, etc. as the cues for discriminating faces from background. As such facial features are distributed differently mainly according to face pose, the obtained pose-wise face clusters are, therefore, intuitive and desirable in perception. Note the challenges in achieving this: the input set of face images mixes different faces, lighting conditions and poses. Some are photographs of real faces and the others are drawings. Desired image clusters are not observable in input space. See Figure 2 for the result of the traditional unsupervised method (k-means clustering) applied to the face images. Images of the obtained clusters are almost random with respect to pose. To obtain perceptual face clusters, a method requires a discriminative process and part-based representations (like the simple features used). 
Technically, we must be able to cope with an arbitrary initialisation of image clusters (as target clusters are hidden) and feature selection among a huge number of simple visual features.\n\nThe proposed method (Section 3) has potential for wide application in perceptual data exploration. It generally solves a new co-clustering problem of a data set (e.g. a set of face images) and a feature set (e.g. simple visual features) in a way to maximise discrimination of the data set from another data set (e.g. a set of random images). The method is also useful for object detection tasks. Boosting a classifier with simple features [3] is state of the art in object detection tasks. It delivers high accuracy and is very time-efficient. Conventionally, multiple boosting classifiers are separately learnt for multiple categories and/or multiple views of object images [6]. It is, however, tedious to manually label category/pose for a large data set and, importantly, it is not clear how to define object categories and the scope of each pose. Would there be a better partitioning for learning multiple boosting classifiers? We let this be a part of automatic learning in the proposed method. It simultaneously boosts multiple strong classifiers, each of which has expertise on a particular set of object images by a set of weak-learners.\n\nFigure 2: Image sets obtained by the k-means clustering method. [Panel labels: Face cluster-1, Face cluster-2]\n\nThe remainder of this paper is arranged as follows: we briefly review the previous work in Section 2 and present our solution in Section 3. Experiments and conclusions are presented in Section 4 and Section 5 respectively.\n\n2 Related work\n\nExisting co-clustering work (e.g. [1]) is formulated as an unsupervised learning task. It simultaneously clusters rows and columns of a co-occurrence table by e.g. maximising mutual information between the cluster variables. 
In contrast, we make use of class labels for discriminative learning. The co-occurrence table used in prior work is also prohibitive given the huge number of visual features that we consider.\n\nMixture of Experts [2] (MoE) jointly learns multiple classifiers and data partitions. It strongly emphasises local experts and is suitable when input data can be naturally divided into homogeneous subsets, which is, however, often not possible as observed in Figure 2. In practice, it is difficult to establish a good initial data partition and to perform expert selection based on localities. Note that EM in MoE reaches only a local optimum. Furthermore, the data partitions of MoE could be undesirably affected by a large background class in our problem and the linear transformations used in MoE are limited for delivering a meaningful part-based representation of images.\n\n[Figure 3 (right) labels: Classifier 1, Classifier 2, Classifier 3; Step 1 to Step 5; sample groups A, B and C]\n\nFigure 3: (left) Risk map for given two-class data (circle and cross). The weak-learners (either a vertical or horizontal line) found by the AdaBoost method [7] are placed on high risk regions. (right) State diagram for the concept of MCBoost.\n\nBoosting [7] is a sequential method of aggregating multiple (weak) classifiers. It finds weak-learners to correctly classify the samples misclassified by previous weak-learners. While MoE makes a decision by dynamically selected local experts, all weak-learners contribute to a decision with learnt weights in a boosting classifier. As aforementioned, expert selection is a difficult problem when an input space is not naturally divided into sub-regions (clusters). 
A boosting classifier solves various non-linear classification problems but cannot solve XOR problems where only half the data can be correctly classified by a set of weak-learners. Two disjoint sets of weak-learners, i.e. two boosting classifiers, are required so that each conquers one half of the data.\n\nTorralba et al. have addressed joint-learning of multiple boosting classifiers for multiple category and multiple view object detection [4]. The complexity of resulting classifiers is reduced by sharing visual features among classifiers. Each classifier in their method is based on each of category-wise or pose-wise clusters of object images, which requires manual labels for category/pose, whereas we optimise image clusters and boosting classifiers simultaneously.\n\n3 MCBoost: multiple strong classifier boosting\n\nOur formulation considers $K$ strong classifiers, each of which is represented by a linear combination of weak-learners as\n\n$$H_k(x) = \\sum_t \\alpha_{kt} h_{kt}(x), \\quad k = 1, \\ldots, K, \\qquad (1)$$\n\nwhere $\\alpha_{kt}$ and $h_{kt}$ are the weight and the score of the $t$-th weak-learner of the $k$-th strong classifier. Each strong classifier is devoted to a subset of input patterns allowing repetition and each weak-learner in a classifier comprises a single visual feature and a threshold. For aggregating multiple strong classifiers, we formulate Noisy-OR as\n\n$$P(x) = 1 - \\prod_k (1 - P_k(x)), \\qquad (2)$$\n\nwhere $P_k(x) = 1/(1 + \\exp(-H_k(x)))$. It assigns samples to a positive class if any of the classifiers does and assigns samples to a negative class if every classifier does. Conventional design in object detection study [6] also favours OR decision as it does not require classifier selection. 
An individual classifier is learnt from a subset of positive samples and all negative samples, enforcing a positive sample to be accepted by one of the classifiers and a negative sample to be rejected by all. Our derivation builds on the previous Noisy-OR Boost algorithm [5], which has been proposed for multiple instance learning.\n\nAlgorithm 1. MCBoost\nInput: A data set $(x_i, y_i)$ and a set of pre-defined weak-learners\nOutput: Multiple boosting classifiers $H_k(x) = \\sum_{t=1}^{T} \\alpha_{kt} h_{kt}(x)$, $k = 1, \\ldots, K$\n1. Compute a reduced set of weak-learners $\\mathcal{H}$ by risk map (4) and randomly initialise the weights $w_{ki}$\n2. Repeat for $t = 1, \\ldots, T$:\n3.   Repeat for $k = 1, \\ldots, K$:\n4.     Find weak-learners $h_{kt}$ that maximise $\\sum_i w_{ki} \\cdot h_{kt}(x_i)$, $h_{kt} \\in \\mathcal{H}$.\n5.     Find the weak-learner weights $\\alpha_{kt}$ that maximise $J(H + \\alpha_{kt} h_{kt})$.\n6.     Update the weights by $w_{ki} = \\frac{y_i - P(x_i)}{P(x_i)} \\cdot P_k(x_i)$.\n7.   End\n8. End\n\nFigure 4: Pseudocode of MCBoost algorithm\n\nThe sample weights are initialised by random partitioning of positive samples, i.e. $w_{ki} = 1$ if $x_i \\in k$ and $w_{ki} = 0$ otherwise, where $i$ and $k$ denote the $i$-th sample and the $k$-th classifier respectively. We set $w_{ki} = 1/K$ for all $k$ for negative samples. For given weights, the method finds $K$ weak-learners at the $t$-th round of boosting, to maximise\n\n$$\\sum_i w_{ki} \\cdot h_{kt}(x_i), \\quad h_{kt} \\in \\mathcal{H}, \\qquad (3)$$\n\nwhere $h_{kt} \\in \\{-1, +1\\}$ and $\\mathcal{H}$ is a reduced set of weak-learners for speeding up the proposed multiple classifier boosting. The reduced set is obtained by restricting the location of weak-learners around the expected decision boundary. Each weak-learner, $h(x) = \\mathrm{sign}(a^T x + b)$, where $a$ and $b$ represent a simple feature and its threshold respectively, can be represented by $a^T (x - x_o)$, where $x_o$ is interpreted as the location of the weak-learner. 
By limiting $x_o$ to the data points that have high risk to be misclassified, the complexity of searching weak-learners at each round of boosting is greatly reduced. The risk is defined as\n\n$$R(x_i) = \\exp\\left\\{ -\\frac{\\sum_{j \\in N_i^B} \\|x_i - x_j\\|^2}{1 + \\sum_{j \\in N_i^W} \\|x_i - x_j\\|^2} \\right\\} \\qquad (4)$$\n\nwhere $N_i^B$ and $N_i^W$ are the sets of a predefined number of nearest neighbors of $x_i$ in the opposite class and the same class of $x_i$ respectively (see Figure 3). The weak-learner weights $\\alpha_{kt}$, $k = 1, \\ldots, K$ are then found to maximise $J(H + \\alpha_{kt} h_{kt})$ by a line search. Following the AnyBoost method [8], we set the sample weights as the derivative of the cost function with respect to the classifier score. For the cost function $J = \\log \\prod_i P(x_i)^{y_i} (1 - P(x_i))^{(1 - y_i)}$, where $y_i \\in \\{0, 1\\}$ is the label of the $i$-th sample, the weight of the $k$-th classifier over the $i$-th sample is updated by\n\n$$w_{ki} = \\frac{\\partial J}{\\partial H_k(x_i)} = \\frac{y_i - P(x_i)}{P(x_i)} \\cdot P_k(x_i). \\qquad (5)$$\n\nSee Figure 4 for the pseudocode of the proposed method.\n\n3.1 Data clustering\n\nWe propose a new data clustering method which assigns a positive sample $x_i$ to a classifier (or cluster) that has the highest $P_k(x_i)$. The sample weight of the $k$-th classifier in (5) is determined by the joint probability $P(x)$ and the probability of the $k$-th classifier $P_k(x)$. For a negative class ($y_i = 0$), the weights only depend on the probability of the $k$-th classifier. The classifier gives high weights to the negative samples that are misclassified by itself, independently of other classifiers. For a positive class, high weights are assigned to the samples that are misclassified jointly (i.e. the left term in (5)) but may be correctly classified by the $k$-th classifier at next rounds (i.e. high $P_k(x)$). That is, classifiers concentrate on samples in their expertise through the rounds of boosting. 
This can be interpreted as data partitioning.\n\n3.2 Examples\n\n[Figure 5 plots (right): weak learner weight versus boosting round for classifier 1, classifier 2 and classifier 3]\n\nFigure 5: Example of learning on XOR classification problem. For a given random initialisation (three different color blobs in the left), the method learns three classifiers that nicely settle into desired clusters and decision boundaries (middle). The weak-learner weights (right) show the convergence.\n\nFigure 3 (right) illustrates the concept of the MCBoost algorithm. The method iterates two main steps: learning weak-learners and updating sample weights. States in the figure represent the samples that are correctly classified by weak-learners at each step. The sample weighting (5) is represented by data re-allocation. Assume that a positive class has samples of three target clusters denoted by A, B and C. Samples of more than two target clusters are initially assigned to every classifier. Weak-learners are found to classify dominant samples (bold letter) in each classifier (step 1). Classifiers then re-assign samples according to their expertise (step 2): Samples C that are misclassified by all are given more importance (bold letter). Samples B are moved to the third classifier as the expert on B. The first classifier learns next weak-learners for classifying sample C while the second and third classifiers focus on samples A and B respectively (step 3). 
Similarly, samples A and C are moved to their respective most expert classifiers (step 4) and all re-allocated samples are correctly classified by weak-learners (step 5).\n\nWe present an example of an XOR classification problem (see Figure 5). The positive class (circle) comprising the three sub-clusters and the negative class (cross) in background make the XOR configuration. Any single or double boosting classifiers, therefore, cannot successfully dichotomise the classes. We exploit vertical or horizontal lines as weak-learners and set the number of classifiers $K$ to be three. We performed random partitioning of positive samples (shown in the left by three different color blobs) for initialising the sample weights. The final decision boundaries and the tracks of data cluster centres of the three boosting classifiers are shown in the middle. Despite the mixed-up initialisation, the method learns the three classifiers that nicely settle into the target clusters after a bit of jittering in the first few rounds. The weak-learner weights (in the right) show the convergence of the three classifiers. Note that the method does not exploit any distance information between input data points, by which conventional clustering methods can apparently yield the same data clusters in this example. As exemplified in Figure 2, obtaining desired data clusters by conventional ways is, however, difficult in practice. The proposed method works well with random initialisations and desirably exhibits quicker convergence when a better initialisation is given.\n\n3.3 Discussion on mixture of experts and future work\n\nThe existing local optimisation method, MoE, suffers from the absence of a good initialisation solution, but has nice properties once a good initialisation exists. We have implemented MoE in the AnyBoost framework. 
The sample probability in MoE is\n\n$$P(x_i) = 1 / \\left(1 + \\exp\\left(-\\sum_k Q_k(x_i) \\cdot H_k(x_i)\\right)\\right)$$\n\nwhere $Q_k(x_i)$ is the responsibility of the $k$-th classifier over $x_i$. Various clustering methods can define the function $Q_k(x_i)$. By taking the derivative of the cost function, the sample weight of the $k$-th classifier is given as $w_{ki} = (y_i - P(x_i)) \\cdot Q_k(x_i)$. An EM-like algorithm alternates between a round of boosting and the update of $Q_k(x_i)$. Dynamic selection of local experts helps time-efficient classification as it does not use all experts.\n\nUseful future studies on the MCBoost method include development of a method to automatically determine $K$, the number of classifiers. At the moment, we first try a large $K$ and decide the right number as the number of visually heterogeneous clusters obtained (see Section 4). A post-corrective step of initial weak-learners would be useful for more efficient classification. When the classifiers start from wrong initial clusters and oscillate between clusters until settling down, some initial weak-learners are wrong and others may be wasted to make up for the wrong ones. Once the classifiers find right clusters, they exhibit convergence by decreasing the weak-learner weights.\n\n[Figure 6 panel labels: Random images and simple visual features; Image cluster centres; Pedestrian images; Face images; K=5, K=3, K=9]\n\nFigure 6: Perceptual clusters of pedestrian and face images. Clusters are found to maximise discrimination power of pedestrian and face images from random images by simple visual features.\n\n4 Experiments\n\nWe performed experiments using a set of INRIA pedestrian data [10] and PIE face data [9]. The INRIA set contains 618 pedestrian images as a positive class and 2436 random images as a negative class in training and 589 pedestrian and 9030 random images in testing. 
The pedestrian images show wide variations in background, human pose and shapes, clothes and illuminations (Figure 6). The PIE data set involves 900 face images as a positive class (20 persons, 9 poses and 5 lighting conditions) and 2436 random images as a negative class in training and 900 face and 12180 random images in testing. The 9 poses are distributed from left profile to right profile of face, and the 5 lighting conditions make sharp changes on face appearance as shown in Figure 6. Some facial parts are not visible depending on both pose and illumination. All images are cropped and resized into 24\u00d724 pixel images. A total number of 21780 simple rectangle features (as shown in Figure 1) were exploited.\n\nMCBoost learning was performed with the initial weights that were obtained by the k-means clustering method. Avoiding the case that any of the k-means clusters is too small (or zero) in size has helped quick convergence in the proposed method. We set the portion of high risk data as 20% of total samples for speeding up. The number of classifiers was set as $K \\in \\{2, 3, 4, 5\\}$ and $K \\in \\{3, 5, 7, 9\\}$ for the INRIA and PIE data set respectively. For all cases, every classifier converged within 50 boosting rounds.\n\nFigure 6 shows the cluster centres obtained by the proposed method. The object images were partitioned into $K$ clusters (or classifiers) by assigning them to the classifier that has the highest $P_k(x)$. For the given pedestrian images, the first three cluster centres look unique and the last two are rather redundant. The three pedestrian clusters obtained are intuitive. They emphasise the direction of intensity changes at contours of the human body as discriminating cues of pedestrian images from random images. It is interesting to see distinction of upper and lower body in the second cluster, which may be due to different clothes. 
For the PIE data set, the obtained face clusters reflect both pose and illumination changes, which is somewhat different from our initial expectation of getting purely pose-wise clusters as is the case in Figure 1. This result is, however, also reasonable when considering the strong illumination conditions that cause shadowing of face parts. For example, frontal faces whose right-half side is not visible by the lighting cannot share any features with those having left-half side not visible. Certain profile faces rather share more facial features (e.g. one eye, eye brow and a half mouth) with the half-shadowed frontal faces, jointly making a cluster. All 9 face clusters seem to capture unique characteristics of the face images.\n\nWe have also evaluated the proposed method in terms of classification accuracy. Figure 7 shows false-negative and false-positive curves of MCBoost method and AdaBoost method [7]. We set all conditions (e.g. number of weak-learners) equivalent in both methods. The k-means clustering method was applied to positive samples. Boosting classifiers were individually learnt by the positive samples of each cluster and all negative samples in AdaBoost method. The clusters obtained by the k-means method were exploited as the initialisation in MCBoost method. For the PIE data set, we also performed data partitioning by the manual pose label and learnt boosting classifiers separately for each pose in AdaBoost method. For both pedestrian and face experiments and all different numbers of classifiers $K$, MCBoost significantly outperformed AdaBoost method by finding optimal data clusters and associated feature sets. Our method is also much superior to the AdaBoost learnt with manual pose labels (Figure 7, bottom right).\n\n[Figure 7 plots: false negatives versus false positives for MCBoost and AdaBoost; pedestrian panels K=2, K=3, K=4, K=5; face panels K=3, K=5, K=7, K=9, the K=9 panel also showing the pose-label baseline]\n\nFigure 7: ROC curves for the pedestrian data (top four) and face data (bottom four). MCBoost significantly outperformed AdaBoost method for both data sets and different cluster numbers K. MCBoost is also much superior to AdaBoost method learnt with manual pose label (bottom right).\n\nIn the AdaBoost method, increasing number of clusters deteriorated the accuracy for the pedestrian data, whereas it increased the performance for the face data. This may be explained by the number of meaningful data clusters. We observed in Figure 6 that there are only three heterogeneous pedestrian clusters while there are more than nine face clusters. In general, a smaller number of positive samples in each classifier (i.e. a larger $K$) causes performance degradation, if it is not counteracted by finding meaningful clusters. We deduce, by a similar reason, that the performance of our method was not much boosted when the number of classifiers was increased (although it tended to gradually improve the accuracy for both data sets).\n\nFigure 8: Example pedestrian detection result.\n\nFigure 8 shows an example pedestrian detection result. Scanning the example image yields a total number of 172,277 image patches to classify. 
Our method took 3.6 seconds using non-optimised Matlab code on a PC with a 3 GHz CPU.\n\n5 Conclusions\n\nWe have introduced a discriminative co-clustering problem of images and visual features and have proposed a method of multiple classifier boosting called MCBoost. It simultaneously learns image clusters and boosting classifiers, each of which has expertise on an image cluster. The method works well with either random initialisation or initialisation by conventional unsupervised clustering methods. We have shown in the experiments that the proposed method yields perceptual co-clusters of images and features. In object detection tasks, it significantly outperforms two conventional designs that individually learn multiple boosting classifiers by the clusters obtained by the k-means clustering method and pose-labels.\n\nWe will apply MCBoost to various other co-clustering problems in the future. Some useful future studies on the MCBoost method are discussed in Section 3.3. Learning with a more exhaustive training set would improve the performance of the method in object detection tasks.\n\nAcknowledgements\n\nThe authors are grateful to many people who have helped by proofreading drafts and providing comments and suggestions. They include Z. Ghahramani, B. Stenger, T. Woodley, O. Arandjelovic, F. Viola and J. Kittler. T-K. Kim is financially supported by the research fellowship of the Sidney Sussex College of the University of Cambridge.\n\nReferences\n\n[1] I.S. Dhillon, S. Mallela and D.S. Modha, Information-theoretic co-clustering, Proc. ACM SIGKDD Int\u2019l Conf. on Knowledge discovery and data mining, pages 89\u201398, 2003.\n[2] M.I. Jordan and R.A. Jacobs, Hierarchical mixture of experts and the EM algorithm, Neural Computation, 6(2):181\u2013214, 1994.\n[3] P. Viola and M. Jones, Robust real-time object detection, Int\u2019l J. Computer Vision, 57(2):137\u2013154, 2002.\n[4] A. Torralba, K. P. Murphy and W. T. 
Freeman, Sharing visual features for multiclass and multiview object detection, IEEE Trans. on Pattern Analysis and Machine Intelligence, 29(5):854\u2013869, 2007.\n[5] P. Viola, J.C. Platt and C. Zhang, Multiple Instance Boosting for Object Detection, Proc. Advances in Neural Information Processing Systems, pages 1417\u20131426, 2006.\n[6] S.Z. Li and Z. Zhang, Floatboost learning and statistical face detection, IEEE Trans. on Pattern Analysis and Machine Intelligence, 26(9):1112\u20131123, 2004.\n[7] R. Schapire, The strength of weak learnability, Machine Learning, 5(2):197\u2013227, 1990.\n[8] L. Mason, J. Baxter, P. Bartlett and M. Frean, Boosting algorithms as gradient descent, Proc. Advances in Neural Information Processing Systems, pages 512\u2013518, 2000.\n[9] T. Sim, S. Baker, and M. Bsat, The CMU Pose, Illumination, and Expression Database, IEEE Trans. on Pattern Analysis and Machine Intelligence, 25(12):1615\u20131618, 2003.\n[10] N. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, Proc. IEEE Conf. Computer Vision and Pattern Recognition, pages 886\u2013893, 2005.", "award": [], "sourceid": 637, "authors": [{"given_name": "Tae-kyun", "family_name": "Kim", "institution": null}, {"given_name": "Roberto", "family_name": "Cipolla", "institution": null}]}