Part of Advances in Neural Information Processing Systems 16 (NIPS 2003)
Jun Suzuki, Yutaka Sasaki, Eisaku Maeda
This paper devises a novel kernel function for structured natural language data. In the ﬁeld of Natural Language Processing, feature extraction consists of the following two steps: (1) syntactically and semantically analyzing raw data, i.e., character strings, then representing the results as discrete structures, such as parse trees and dependency graphs with part-of-speech tags; (2) creating (possibly high-dimensional) numerical feature vectors from the discrete structures. The new kernels, called Hier- archical Directed Acyclic Graph (HDAG) kernels, directly accept DAGs whose nodes can contain DAGs. HDAG data structures are needed to fully reﬂect the syntactic and semantic structures that natural language data inherently have. In this paper, we deﬁne the kernel function and show how it permits efﬁcient calculation. Experiments demonstrate that the proposed kernels are superior to existing kernel functions, e.g., se- quence kernels, tree kernels, and bag-of-words kernels.