Discriminative Keyword Selection Using Support Vector Machines

Part of Advances in Neural Information Processing Systems 20 (NIPS 2007)

Bibtex Metadata Paper

Authors

Fred Richardson, William Campbell

Abstract

Many tasks in speech processing involve classification of long term characteristics of a speech segment such as language, speaker, dialect, or topic. A natural technique for determining these characteristics is to first convert the input speech into a sequence of tokens such as words, phones, etc. From these tokens, we can then look for distinctive phrases, keywords, that characterize the speech. In many applications, a set of distinctive keywords may not be known a priori. In this case, an automatic method of building up keywords from short context units such as phones is desirable. We propose a method for construction of keywords based upon Support Vector Machines. We cast the problem of keyword selection as a feature selection problem for n-grams of phones. We propose an alternating filter-wrapper method that builds successively longer keywords. Application of this method on a language recognition task shows that the technique produces interesting and significant qualitative and quantitative results.