{"title": "Robot Docking Using Mixtures of Gaussians", "book": "Advances in Neural Information Processing Systems", "page_first": 945, "page_last": 951, "abstract": null, "full_text": "Robot Docking using Mixtures of Gaussians \n\nMatthew Williamson* \n\nRoderick Murray-Smith\u2020 \n\nVolker Hansen\u2021 \n\nAbstract \n\nThis paper applies the Mixture of Gaussians probabilistic model, combined with Expectation Maximization optimization, to the task of summarizing three dimensional range data for a mobile robot. This provides a flexible way of dealing with uncertainties in sensor information, and allows the introduction of prior knowledge into low-level perception modules. Problems with the basic approach were solved in several ways: the mixture of Gaussians was reparameterized to reflect the types of objects expected in the scene, and priors on model parameters were included in the optimization process. Both approaches force the optimization to find 'interesting' objects, given the sensor and object characteristics. A higher level classifier was used to interpret the results provided by the model, and to reject spurious solutions. \n\n1  Introduction \n\nThis paper applies the Mixture of Gaussians (MoG) probabilistic model (Titterington et al., 1985) to a robot docking task. We use the Expectation-Maximization (EM) approach (Dempster et al., 1977) to fit Gaussian sub-models to a sparse 3D representation of the robot's environment, finding walls, boxes, etc. 
We have modified the MoG formulation in three ways to incorporate prior knowledge about the task and the sensor characteristics: the parameters of the Gaussians are recast to constrain how they fit the data, priors on these parameters are calculated and incorporated into the EM algorithm, and a higher level processing stage is included which interprets the fit of the Gaussians on the data, detects misclassifications, and provides prior information to guide the model-fitting. \n\n*Corresponding author: MIT AI Lab, Cambridge, MA, USA. matt@ai.mit.edu \n\u2020Dept. of Mathematical Modelling, Technical University of Denmark. rod@imm.dtu.dk \n\u2021DaimlerChrysler, Alt-Moabit 96a, Berlin, Germany. hansen@dbag.bln.daimlerbenz.com \n\nThe robot is equipped with a LIDAR 3D laser range-finder (PIAP, 1995) which it uses to identify possible docking objects. The range-finder calculates the time of flight for a light pulse reflected off objects in the scene. The particular LIDAR used is not very powerful, making objects with poor reflectance (e.g., dark or shiny surfaces, or surfaces not perpendicular to the laser beam) invisible. The scan pattern is also very sparse, especially in the vertical direction, as shown in the scan of a wall in Figure 1. However, if an object is detected, the range returned is accurate (\u00b11-2 cm). When the range data is plotted in Cartesian space it forms a number of sparse clusters, leading naturally to the use of MoG clustering algorithms to make sense of the scene. While the Gaussian assumption is not an ideal model of the data, the generality of MoG and its ease of implementation and analysis motivated its use over a more specialized approach. The sparse nature of the data inspired the modifications to the MoG formulation described in this paper. 
\n\nModel-based object recognition from dense range images has been widely reported (see Arman and Aggarwal (1993) for a review), but is not relevant in this case given the sparseness of the data. Denser range images could be collected by combining multiple scans, but the poor visibility of the sensor hampers the application of these techniques. The advantage of the MoG technique is that the segmentation is \"soft\", and perception proceeds iteratively during learning. This is especially useful for mobile robots where evidence accumulates over time, and the allocation of attention is time and state-dependent. The EM algorithm is useful since it is guaranteed to converge to a local maximum. \n\nThe following sections of the paper describe the re-parameterization of the Gaussians to model plane-like clusters, the formulation of the priors, and the higher level processing which interprets the clustered data in order to both move the robot and provide prior information to the model-fitting algorithm. \n\nFigure 1: Plot showing data from a LIDAR scan of a wall, plotted in Cartesian space. The robot is located at the origin, with the y axis pointing forward, x to the right, and z up. The sparse scan pattern is visible, as well as the visibility constraint: the wall extends beyond where the scan ends, but is invisible to the LIDAR due to the orientation of the wall. \n\n2  Mixture of Gaussians model \n\nThe range-finder returns a set of data, each of which is a position in Cartesian space $\\mathbf{x}_i = (x_i, y_i, z_i)$. The complete set of data D = {x_1, ..., 
x_N} is modeled as being generated by a mixture density \n\n$$P(x_n) = \\sum_{i=1}^{M} P(x_n | i, \\mu_i, \\Sigma_i, \\pi_i) P(i),$$ \n\nwhere we use a Gaussian as the sub-model, with mean $\\mu_i$, covariance $\\Sigma_i$ and weight $\\pi_i$, which makes the probability of a particular data point: \n\n$$P(x_n | \\mu, \\Sigma, \\pi) = \\sum_{i=1}^{M} \\frac{\\pi_i}{(2\\pi)^{3/2} |\\Sigma_i|^{1/2}} \\exp\\left(-\\frac{1}{2}(x_n - \\mu_i)^T \\Sigma_i^{-1} (x_n - \\mu_i)\\right).$$ \n\nGiven a set of data D, the most likely set of parameters is found using the EM algorithm. This algorithm has a number of advantages, such as guaranteed convergence to a local maximum of the likelihood, and efficient computational performance. \n\nIn 3D Cartesian space, the Gaussian sub-models form ellipsoids, where the size and orientation are determined by the covariance matrix $\\Sigma_i$. In the general case, the EM algorithm can be used to learn all the parameters of $\\Sigma_i$. The sparseness of the LIDAR data makes this parameterization inappropriate, as various odd collections of points could be clustered together. By changing the parameterization of $\\Sigma_i$ to better model plane-like structures, the system can be improved. The reparameterization is most readily expressed in terms of the eigenvalues $\\Lambda_i$ and eigenvectors $V_i$ of the covariance matrix, $\\Sigma_i = V_i \\Lambda_i V_i^{-1}$. \n\nThe covariance matrix of a normal approximation to a plane-like vertical structure will have a large eigenvalue in the z direction, and in the x-y plane one large and one small eigenvalue. Since $\\Sigma_i$ is symmetrical, the eigenvectors are orthogonal, $V_i^{-1} = V_i^T$, and $\\Sigma_i$ can be written: \n\n$$\\Sigma_i = V_i \\, \\mathrm{diag}(a_i, \\gamma a_i, b_i) \\, V_i^T,$$ \n\nwhere $V_i$ is a rotation about the z axis by $\\theta_i$, the angle of orientation of the ith sub-model in the x-y plane, $a_i$ scales the cluster in the x and y directions, and $b_i$ scales in the z direction. The constant $\\gamma$ controls the aspect ratio of the ellipsoid in the x-y plane. 
[1] \n\nThe optimal values of these parameters ($a_i$, $b_i$, $\\theta_i$) are found using EM, first calculating the probability $h_{in}$ that data point $x_n$ is modeled by Gaussian $i$, for every data point $x_n$ and every Gaussian $i$, \n\n$$h_{in} = \\frac{\\pi_i |\\Sigma_i|^{-1/2} \\exp\\left(-\\frac{1}{2}(x_n - \\mu_i)^T \\Sigma_i^{-1} (x_n - \\mu_i)\\right)}{\\sum_{j=1}^{M} \\pi_j |\\Sigma_j|^{-1/2} \\exp\\left(-\\frac{1}{2}(x_n - \\mu_j)^T \\Sigma_j^{-1} (x_n - \\mu_j)\\right)}.$$ \n\nThis \"responsibility\" is then used as a weighting for the updates to the other parameters, \n\n$$\\mu_i = \\frac{\\sum_n h_{in} x_n}{\\sum_n h_{in}},$$ \n\n$$\\theta_i = \\frac{1}{2} \\tan^{-1}\\left(\\frac{2 \\sum_n h_{in} (x_{n1} - \\mu_{i1})(x_{n2} - \\mu_{i2})}{\\sum_n h_{in} [(x_{n1} - \\mu_{i1})^2 - (x_{n2} - \\mu_{i2})^2]}\\right),$$ \n\n$$a_i = \\frac{\\sum_n h_{in} \\zeta_{in}}{2 \\gamma \\sum_n h_{in}}, \\qquad b_i = \\frac{\\sum_n h_{in} (x_{n3} - \\mu_{i3})^2}{\\sum_n h_{in}},$$ \n\n$$\\zeta_{in} = (\\gamma - 1)\\left((x_{n1} - \\mu_{i1}) \\sin\\theta_i + (x_{n2} - \\mu_{i2}) \\cos\\theta_i\\right)^2 + (x_{n1} - \\mu_{i1})^2 + (x_{n2} - \\mu_{i2})^2,$$ \n\nwhere $x_{n1}$ is the first element of $x_n$ etc., and $\\zeta_{in}$ corresponds to the projection of the data into the plane of the cluster. Footnote [2] gives some intuition for the $\\theta_i$ update. It is important to update the means $\\mu_i$ first, and use the new values to update the other parameters. Figure 2 shows a typical model response on real LIDAR data. \n\n[1] By experimentation, a value of $\\gamma$ of 0.01 was found to be reasonable for this application. \n\n2.1  Practicalities of application, and results \n\nStarting values for the model parameters are important, as EM is only guaranteed to find a local optimum. The Gaussian mixture components are initialized with a large covariance, allowing them to pick up data and move to the correct positions. We found that initializing the means $\\mu_i$ to random data points, rather than randomly in the input space, tended to 
work better, especially given the sensor characteristics: if the LIDAR returned a range measurement, it was likely to be part of an interesting object. \n\n[2] Intuition for the $\\theta_i$ update can be obtained by considering that $(x_{n1} - \\mu_{i1})$ is the x component of the distance between $x_n$ and $\\mu_i$, which is $|x_n - \\mu_i| \\cos\\theta$, and similarly $(x_{n2} - \\mu_{i2})$ is $|x_n - \\mu_i| \\sin\\theta$, so $\\tan 2\\theta = \\frac{\\sin 2\\theta}{\\cos 2\\theta} = \\frac{2 \\sin\\theta \\cos\\theta}{\\cos^2\\theta - \\sin^2\\theta} = \\frac{2 (x_{n1} - \\mu_{i1})(x_{n2} - \\mu_{i2})}{(x_{n1} - \\mu_{i1})^2 - (x_{n2} - \\mu_{i2})^2}$. \n\nFigure 2: Example of clustering of the 3D data points. The left hand graph shows the view from above (the x-y plane), and the right graph shows the view from the side (the y-z plane), with the robot positioned at the origin. The scene shows a box at an oblique angle, with a wall behind. The extent of the plane-like Gaussian sub-models is illustrated using the ellipses, which are drawn at a probability of 0.5. \n\nDespite the accuracy of measurement, there are still outlying data points, and it is impossible to fully segment the data into separate objects. One simple solution we found was to define a \"junk\" Gaussian. This is a sub-model placed in the center of the data, with a large covariance $\\Sigma$. This Gaussian then becomes responsible for the outliers in the data (i.e. sparsely distributed data over the whole scene, none of which are associated with a specific object), allowing the object-modeling Gaussians to work undistracted. \n\nThe use of EM with the $a$, $b$, $\\theta$ parameterization found and represented plane-like data clusters better than models where all the elements of the covariance matrix were free to adapt. 
It also tended to converge faster, probably due to the reduced number of parameters in the covariance matrix (3 as opposed to 6). Although the algorithm is constrained to find planes, the parameterization was flexible enough to model other objects such as thin vertical lines (say from a table leg). The only problem with the algorithm was that it occasionally found poor local minimum solutions, such as those illustrated in Figure 3. This is a common problem with least squares based clustering methods (Duda and Hart, 1973). \n\nFigure 3: Two examples of 'undesirable' local minimum solutions found by EM. Both graphs show the top view of a scene of a box in front of a wall. The algorithm has incorrectly clustered the box with the left hand side of the wall. \n\n3  Incorporating prior information \n\nAs well as reformulating the Gaussian models to suit our application, we also incorporated prior knowledge on the parameters of the sub-models. Sensor characteristics are often well-defined, and it makes sense to use these as early as possible in perception, rather than dealing with their side-effects at higher levels of reasoning. Here, for example, the visibility constraint, by which only planes which are almost perpendicular to the LIDAR rays are visible, could be included by writing $P(x_n) = \\sum_{i=1}^{M} P(x_n | i, \\beta_i) P(i) P(\\mathrm{visible} | \\beta_i)$; the updates could then be recalculated, and the feature immediately brought into the modeling process. In addition, prior knowledge about the locations and sizes of objects, maybe from other sensors, can be used to influence the modeling procedure. 
This allows the sensor to make better use of the sparse data. \n\nFor a model with parameters $\\beta$ and data D, Bayes rule gives: \n\n$$P(\\beta | D) = \\frac{P(\\beta)}{P(D)} \\prod_n P(x_n | \\beta).$$ \n\nNormally the logarithm of this is taken, to give the log-likelihood, which in the case of mixtures of Gaussians is \n\n$$L(D | \\beta) = \\log p(\\{\\mu_i, \\pi_i, a_i, b_i, \\theta_i\\}) - \\log p(D) + \\sum_n \\log \\sum_i p(x_n | i, \\mu_i, \\pi_i, a_i, b_i, \\theta_i).$$ \n\nTo include the parameter priors in the EM algorithm, distributions for the different parameters are chosen, then the log-likelihood is differentiated as usual to find the updates to the parameters (McMichael, 1995). The calculations are simplified if the priors on all the parameters are assumed to be independent, $p(\\{\\mu_i, \\pi_i, a_i, b_i, \\theta_i\\}) = \\prod_i p(\\mu_i) p(\\pi_i) p(a_i) p(b_i) p(\\theta_i)$. \n\nThe exact form of the prior distributions varies for different parameters, both to capture different behavior and for ease of implementation. For the element means ($\\mu_i$), a flat distribution over the data is used, specifying that the means should be among the data points. For the element weights, a multinomial Dirichlet prior can be used, $p(\\pi | \\alpha) \\propto \\prod_{i=1}^{M} \\pi_i^{\\alpha}$. When the hyperparameter $\\alpha > 0$, the algorithm favours weights around $1/M$, and when $-1 < \\alpha < 0$, weights close to 0 or 1. [3] The expected value of $a_i$ (written as $\\bar{a}_i$) can be encoded using a truncated inverse exponential prior (McMichael, 1995), setting $p(a_i | \\bar{a}_i) = K \\exp(-\\bar{a}_i / (2 a_i))$, where $K$ is a normalizing factor. [4] The prior for $b_i$ has the same form. Priors for $\\theta_i$ were not used, but could be useful to capture the visibility constraint. Given these distributions, the updates to the parameters become \n\n$$a_i = \\frac{\\sum_n h_{in} \\zeta_{in} / \\gamma + \\bar{a}_i}{2 \\sum_n h_{in}}, \\qquad b_i = \\frac{\\sum_n h_{in} (x_{n3} - \\mu_{i3})^2 + \\bar{b}_i}{\\sum_n h_{in}}.$$ 
\n\n$$\\pi_i = \\frac{\\sum_n h_{in} + \\alpha}{\\sum_n \\sum_j h_{jn} + \\alpha}.$$ \n\nThe update for $\\mu_i$ is the same as before, the prior having no effect. The update for $a_i$ and $b_i$ forces them to be near $\\bar{a}_i$ and $\\bar{b}_i$, and the update for $\\pi_i$ is affected by the hyperparameter $\\alpha$. \n\nThe priors on $a_i$ and $b_i$ had noticeable effects on the models obtained. Figure 4 shows the results from two fits, starting from identical initial conditions. By adjusting the size of the prior, the algorithm can be guided into finding different sized clusters. Large values of the prior are shown here to demonstrate its effect. \n\n[3] In this paper we make little use of the $\\alpha$ priors, but introducing separate $\\alpha_i$'s for each object could be a useful next step for scenes with varying object sizes. \n\n[4] To deal with the case when $a_i = 0$, the prior is truncated, setting $p(a_i | \\bar{a}_i) = 0$ when $a_i < P_{crit}$. \n\nFigure 4: Example of the action of the priors on $a_i$ and $b_i$. The photograph shows a visual image of the scene: a box in front of a wall. The priors were chosen to prefer a distribution matching the wall. The two left hand graphs show the top and side view of the scene clustered without priors, while the two right hand graphs use priors on $a_i$ and $b_i$. The priors give a preference for large values of $a_i$ and $b_i$, so biasing the optimization to find a mixture component matching the whole wall as opposed to just the top of it. 
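The effect of the size priors can be illustrated with a short sketch. The Python code below is not the authors' implementation. It assumes the weighted updates for a_i and b_i from Sections 2 and 3, takes the responsibilities h_in and the in-plane terms zeta_in as precomputed inputs, and uses hypothetical names (update_sizes, a_bar, b_bar) chosen for illustration only.

```python
def update_sizes(h, zeta, dz2, gamma, a_bar=0.0, b_bar=0.0):
    '''Return (a_i, b_i) for one mixture component.

    h     -- responsibilities h_in of this component, one per data point
    zeta  -- in-plane terms zeta_in, one per data point
    dz2   -- squared vertical offsets (x_n3 - mu_i3)**2, one per data point
    gamma -- aspect-ratio constant of the ellipse in the x-y plane
    a_bar, b_bar -- prior expected sizes (hypothetical argument names);
                    zero reproduces the updates without priors
    '''
    h_sum = sum(h)
    # a_i = (sum_n h_in * zeta_in / gamma + a_bar) / (2 * sum_n h_in)
    a = (sum(hn * zn for hn, zn in zip(h, zeta)) / gamma + a_bar) / (2.0 * h_sum)
    # b_i = (sum_n h_in * (x_n3 - mu_i3)**2 + b_bar) / sum_n h_in
    b = (sum(hn * dn for hn, dn in zip(h, dz2)) + b_bar) / h_sum
    return a, b
```

Calling the function with a_bar and b_bar left at zero gives the plain weighted updates. Passing positive values adds a fixed amount to each numerator, so the estimates are pulled towards larger clusters most strongly when few points support the component.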
\n\n4  Classification and diagnosis \n\nFigure 5: Schematic of system (block labels: sensor data, model fitting with the EM algorithm, features, higher level processing, prior information, move command for robot). \n\nThis section describes how higher-level processing can be used not only to interpret the clusters fitted by the EM algorithm, but also to affect the model-fitting using prior information. The processes of model-fitting and analysis are thus coupled, and not sequential. \n\nThe results of the model fitting are primarily processed to steer the robot. Once the cluster has been recognized as a box, wall, etc., the location and orientation are used to calculate a move command. To perform the object-recognition, we used a simple classifier on a feature vector extracted from the clustered data. The labels used were specific to docking and to commonly clustered objects (boxes, walls, thin vertical lines), but also included labels for clustering errors (like those shown in Figure 3). The features used were the values of the parameters $a_i$, $b_i$, giving the size of the clusters, but also measures of the visibility of the clusters, and the skewness of the within-cluster data. The classification used simple models of the probability distributions of the features $f_i$ given the objects $O_j$ (i.e. $P(f_i | O_j)$), estimated from a set of training data. In addition to moving the robot, the classifier can modify the behavior of the model fitting algorithm. If a poor clustering solution is found, EM can be re-run with slightly different initial conditions. If the probable locations or sizes of objects are known from previous scans, or indeed from other sensors, then these can constrain the clustering through priors, or provide initial means. 
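The classification step can be sketched in a few lines. The paper does not state the form of the feature models, so the Python code below makes its own assumptions: each feature is modeled by an independent Gaussian per object label, all labels are treated as equally likely beforehand, and the function names and training values are hypothetical, chosen only for illustration.

```python
import math


def fit_class_models(training):
    '''Fit per-label feature models.

    training maps a label to a list of feature vectors.
    Returns a dict mapping each label to (means, variances).
    '''
    models = {}
    for label, rows in training.items():
        n = len(rows)
        dims = len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dims)]
        # A small floor keeps the variance positive for tiny training sets.
        variances = [max(sum((r[d] - means[d]) ** 2 for r in rows) / n, 1e-6)
                     for d in range(dims)]
        models[label] = (means, variances)
    return models


def log_likelihood(features, model):
    '''Log of the product of independent Gaussian densities, one per feature.'''
    means, variances = model
    return sum(-0.5 * math.log(2.0 * math.pi * v) - (f - m) ** 2 / (2.0 * v)
               for f, m, v in zip(features, means, variances))


def classify(features, models):
    '''Return the label whose feature model gives the highest log-likelihood.'''
    return max(models, key=lambda label: log_likelihood(features, models[label]))


# Made-up (a_i, b_i) feature vectors for two labels.
training = {
    'wall': [[2.0, 1.0], [2.2, 1.1], [1.8, 0.9]],
    'box': [[0.3, 0.2], [0.35, 0.25], [0.25, 0.15]],
}
models = fit_class_models(training)
label = classify([2.1, 1.0], models)
```

A label for clustering errors could be added to the training dictionary in the same way as the object labels, which is how a poor fit would be flagged for a re-run of EM in this sketch.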
\n\n5  Summary \n\nThis paper shows that the Mixture of Gaussians architecture combined with EM optimization and the use of parameter priors can be used to segment and analyze real data from the 3D range-finder of a mobile robot. The approach was successfully used to guide a mobile robot towards a docking object, using only its range-finder for perception. \n\nFor the learning community this provides more than an example of the application of a probabilistic model to a real task. We have shown how the usual Mixture of Gaussians model can be parameterized to include expectations about the environment in a way which can be readily extended. We have included prior knowledge at three different levels: 1. The use of problem-specific parameterization of the covariance matrix to find expected patterns (e.g. planes at particular angles). 2. The use of problem-specific parameter priors to automatically rule out unlikely objects at the lowest level of perception. 3. The results of the clustering process were post-processed by higher-level classification algorithms which interpreted the parameters of the mixture components, diagnosed typical misclassifications, provided new priors for future perception, and gave the robot control system new targets. \n\nIt is expected that the basic approach can be fruitfully applied to other sensors, to problems which track dynamically changing scenes, or to problems which require relationships between objects in the scene to be accounted for and interpreted. A problem common to all modeling approaches is that it is not trivial to determine the number and types of clusters needed to represent a given scene. 
Markov-Chain Monte-Carlo approaches have recently been applied successfully to mixtures of Gaussians (Richardson and Green, 1997), allowing a Bayesian solution to this problem, which could provide control systems with even richer probabilistic information (a series of models conditioned on the number of clusters). \n\nAcknowledgements \n\nAll authors were employed by Daimler-Benz AG during stages of the work. R. Murray-Smith gratefully acknowledges the support of Marie Curie TMR grant FMBICT961369. \n\nReferences \n\nArman, F. and Aggarwal, J. K. (1993). Model-based object recognition in dense-range images: a review. ACM Computing Surveys, 25 (1), 5-43. \n\nDempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statistical Society Series B, 39, 1-38. \n\nDuda, R. O. and Hart, P. E. (1973). Pattern Classification and Scene Analysis. New York, Wiley. \n\nMcMichael, D. W. (1995). Bayesian growing and pruning strategies for MAP-optimal estimation of Gaussian mixture models. In 4th IEE International Conf. on Artificial Neural Networks, pp. 364-368. \n\nPIAP (1995). PIAP impact report on TRC lidar performance. Technical Report 1, Industrial Research Institute for Automation and Measurements, 02-486 Warszawa, Al. Jerozolimskie 202, Poland. \n\nRichardson, S. and Green, P. J. (1997). On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society B, 59 (4), 731-792. \n\nTitterington, D., Smith, A., and Makov, U. (1985). Statistical Analysis of Finite Mixture Distributions. Chichester, John Wiley & Sons. 
", "award": [], "sourceid": 1538, "authors": [{"given_name": "Matthew", "family_name": "Williamson", "institution": null}, {"given_name": "Roderick", "family_name": "Murray-Smith", "institution": null}, {"given_name": "Volker", "family_name": "Hansen", "institution": null}]}