Bayesian Robustification for Audio Visual Fusion

Part of Advances in Neural Information Processing Systems 10 (NIPS 1997)


Javier Movellan, Paul Mineiro


We discuss the problem of catastrophic fusion in multimodal recognition systems. This problem arises in systems that need to fuse different channels in non-stationary environments. Practice shows that when recognition modules within each modality are tested in contexts inconsistent with their assumptions, their influence on the fused product tends to increase, with catastrophic results. We explore a principled solution to this problem based upon Bayesian ideas of competitive models and inference robustification: each sensory channel is provided with simple white-noise context models, and the perceptual hypothesis and context are jointly estimated. Consequently, context deviations are interpreted as changes in white noise contamination strength, automatically adjusting the influence of the module. The approach is tested on a fixed lexicon automatic audiovisual speech recognition problem with very good results.
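
The joint estimation of hypothesis and context described above can be illustrated with a toy sketch. It is not the authors' implementation: it assumes each channel scores a word with an isotropic Gaussian template (a stand-in for whatever recognition module a real system uses), and it models the context as a single white-noise variance per channel that may only grow beyond the variance seen in training. The names `channel_loglik`, `fuse`, `templates`, and `train_vars`, and all numbers, are hypothetical.

```python
import math


def channel_loglik(x, mean, train_var, robust=True):
    """Log-likelihood of observation x under an isotropic Gaussian template.

    With robust=True the channel variance is estimated jointly with the
    hypothesis: it is the maximum-likelihood variance, constrained to be
    no smaller than the variance assumed at training time (assumption).
    With robust=False the training variance is used unchanged.
    """
    d = len(x)
    mse = sum((xi - mi) ** 2 for xi, mi in zip(x, mean)) / d
    var = max(train_var, mse) if robust else train_var
    return -0.5 * d * (math.log(2 * math.pi * var) + mse / var)


def fuse(observations, templates, train_vars, robust=True):
    """Sum per-channel log-likelihoods and return the best-scoring word."""
    labels = next(iter(templates.values())).keys()
    scores = {
        label: sum(
            channel_loglik(observations[m], templates[m][label],
                           train_vars[m], robust)
            for m in observations
        )
        for label in labels
    }
    return max(scores, key=scores.get), scores


# Hypothetical two-word lexicon with identical templates in both channels.
templates = {
    "audio": {"yes": [1.0, 1.0, 1.0, 1.0], "no": [-1.0, -1.0, -1.0, -1.0]},
    "video": {"yes": [1.0, 1.0, 1.0, 1.0], "no": [-1.0, -1.0, -1.0, -1.0]},
}
train_vars = {"audio": 0.25, "video": 0.25}

# Clean audio that supports "yes"; video corrupted far beyond training noise.
observations = {
    "audio": [0.9, 1.1, 1.0, 0.8],
    "video": [-6.0, 5.0, -7.0, 4.0],
}

naive_word, naive_scores = fuse(observations, templates, train_vars, robust=False)
robust_word, robust_scores = fuse(observations, templates, train_vars, robust=True)
print(naive_word, robust_word)
```

With these made-up numbers, the fixed-variance fusion picks "no" because the corrupted video channel produces a large log-likelihood gap that outweighs the clean audio evidence. When the noise variance is estimated jointly, the video channel explains its poor fit as heavy white-noise contamination, its log-likelihoods flatten across words, and the fused decision returns to "yes".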