Sub-Microwatt Analog VLSI Support Vector Machine for Pattern Classification and Sequence Estimation

Part of Advances in Neural Information Processing Systems 17 (NIPS 2004)


Authors

Shantanu Chakrabartty, Gert Cauwenberghs

Abstract

An analog system-on-chip for kernel-based pattern classification and sequence estimation is presented. State transition probabilities conditioned on input data are generated by an integrated support vector machine. Dot-product based kernels and support vector coefficients are implemented in analog programmable floating-gate translinear circuits, and probabilities are propagated and normalized using sub-threshold current-mode circuits. A 14-input, 24-state, 720-support vector forward-decoding kernel machine is integrated on a 3 mm × 3 mm chip in 0.5 µm CMOS technology. Experiments with the processor trained for speaker verification and phoneme sequence estimation demonstrate real-time recognition accuracy at par with floating-point software, at sub-microwatt power.

1 Introduction

The key to attaining autonomy in wireless sensory systems is to embed pattern recognition intelligence directly at the sensor interface. Severe power constraints in wireless integrated systems call for design optimization across device, circuit, architecture and system levels [1]. Although system-on-chip methodologies have been primarily digital, analog integrated systems are emerging as promising alternatives with higher energy efficiency and integration density, exploiting the analog sensory interface and computational primitives inherent in device physics [2]. Analog VLSI has been chosen, for instance, to implement Viterbi [3] and HMM-based [4] sequence decoding in communications and speech processing.

Forward-Decoding Kernel Machines (FDKM) [5] provide an adaptive framework for general maximum a posteriori (MAP) sequence decoding that avoids the need for backward recursion over the data required in Viterbi and HMM-based sequence decoding [6]. At the core of FDKM is a support vector machine (SVM) [7] for large-margin trainable pattern classification, performing noise-robust regression of transition probabilities in forward sequence estimation. The achievable limits of FDKM power consumption are determined by the number of support vectors (i.e., regression templates), which in turn are determined by the complexity of the discrimination task and the signal-to-noise ratio of the sensor interface [8].

Figure 1: FDKM system architecture.

In this paper we describe an implementation of FDKM in silicon, for use in adaptive sequence detection and pattern recognition. The chip is fully configurable, with parameters directly downloadable onto an array of floating-gate CMOS computational memory cells. By means of calibration and chip-in-the-loop training, the effect of mismatch and non-linearity in the analog implementation is significantly reduced. Section 2 reviews the FDKM formulation and notation. Section 3 describes the hardware implementation of FDKM at the schematic level. Section 4 presents results from experiments conducted with the fabricated chip, and Section 5 concludes with future directions.

2 FDKM Sequence Decoding

FDKM recognition and sequence decoding are formulated in the framework of MAP (maximum a posteriori) estimation, combining Markovian dynamics with kernel machines. The MAP forward decoder receives the sequence $\mathbf{X}[n] = \{\mathbf{x}[1], \mathbf{x}[2], \ldots, \mathbf{x}[n]\}$ and produces an estimate of the conditional probability measure of the state variable $q[n]$ over all classes $i = 1, \ldots, S$: $\alpha_i[n] = P(q[n] = i \mid \mathbf{X}[n])$. Unlike hidden Markov models, the states directly encode the symbols, and the observations $\mathbf{x}$ modulate transition probabilities between states [6]. Estimates of the posterior probability $\alpha_i[n]$ are obtained from estimates of local transition probabilities using the forward-decoding procedure [6]

$$\alpha_i[n] = \sum_{j=1}^{S} P_{ij}[n]\,\alpha_j[n-1] \qquad (1)$$

where $P_{ij}[n] = P(q[n] = i \mid q[n-1] = j, \mathbf{x}[n])$ denotes the probability of making a transition from class $j$ at time $n-1$ to class $i$ at time $n$, given the current observation vector $\mathbf{x}[n]$. Forward decoding (1) expresses first-order Markovian sequential dependence of state probabilities conditioned on the data. The transition probabilities $P_{ij}[n]$ in (1) attached to each outgoing state $j$ are obtained by normalizing the SVM regression outputs $f_{ij}(\mathbf{x})$:
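As a concrete software illustration (not part of the chip description), the forward recursion (1) can be sketched in a few lines of NumPy; the array shapes, function name, and the renormalization step are assumptions made for numerical convenience rather than features of the analog implementation.

```python
import numpy as np

def forward_decode(P_seq, alpha0):
    """Forward recursion (1): alpha_i[n] = sum_j P_ij[n] * alpha_j[n-1].

    P_seq  : (T, S, S) array with P_seq[n, i, j] = P(q[n] = i | q[n-1] = j, x[n])
    alpha0 : length-S initial state distribution
    """
    alpha = np.asarray(alpha0, dtype=float)
    history = [alpha]
    for P in P_seq:              # one S x S transition matrix per time step
        alpha = P @ alpha        # matrix-vector product implements the sum over j
        alpha /= alpha.sum()     # guard against numerical drift (columns of P already sum to 1)
        history.append(alpha)
    return np.array(history)
```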

$$P_{ij}[n] = \left[f_{ij}(\mathbf{x}[n]) - z_j[n]\right]_+ \qquad (2)$$



Figure 2: Schematic of the SVM stage. (a) Multiply-accumulate cell and reference cell for the MVM blocks in Figure 1. (b) Combined input, kernel, and MVM modules.

where $[\cdot]_+ = \max(\cdot, 0)$. The normalization mechanism is subtractive rather than divisive, with the normalization offset $z_j[n]$ obtained using a reverse-waterfilling criterion with respect to a probability margin $\gamma$ [10],

$$\sum_i \left[f_{ij}(\mathbf{x}[n]) - z_j[n]\right]_+ = \gamma. \qquad (3)$$
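A software sketch of the reverse-waterfilling step may help make (3) concrete: the function below solves for the offset $z_j$ given the scores $f_{ij}(\mathbf{x}[n])$ of one outgoing state $j$ and the margin $\gamma$. The sorting-based solver and all names are illustrative assumptions; the chip computes this normalization implicitly in current mode.

```python
import numpy as np

def waterfill_offset(f, gamma):
    """Solve sum_i max(f_i - z, 0) = gamma for the offset z, as in (3)."""
    f_sorted = np.sort(np.asarray(f, dtype=float))[::-1]   # scores in descending order
    csum = np.cumsum(f_sorted)
    ks = np.arange(1, f_sorted.size + 1)
    # largest active set whose smallest score still lies above the water level
    rho = ks[f_sorted - (csum - gamma) / ks > 0].max()
    return (csum[rho - 1] - gamma) / rho

f_j = np.array([0.8, 0.1, -0.3, 0.5])        # hypothetical margins f_ij(x[n]) for one j
z_j = waterfill_offset(f_j, gamma=1.0)        # offset satisfying (3)
P_j = np.maximum(f_j - z_j, 0.0)              # rectified outputs of (2); they sum to gamma
```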

Besides improved robustness [8], the advantage of the subtractive normalization (3) is its amenability to current-mode implementation, as opposed to logistic normalization [11], which requires exponentiation of currents. The SVM outputs (margin variables) $f_{ij}(\mathbf{x})$ are given by
$$f_{ij}(\mathbf{x}) = \sum_{s=1}^{N} \lambda_{ij}^{s}\, K(\mathbf{x}, \mathbf{x}_s) + b_{ij} \qquad (4)$$

where $K(\cdot, \cdot)$ denotes a symmetric positive-definite kernel¹ satisfying the Mercer condition, such as a Gaussian radial basis function or a polynomial spline [7], and $\mathbf{x}_s$, $s = 1, \ldots, N$, denote the support vectors. The parameters $\lambda_{ij}^{s}$ in (4) and the support vectors $\mathbf{x}_s$ are determined by training on a labeled training set using the recursive FDKM procedure described in [5].
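As an illustrative software counterpart of (4) (not the chip's analog datapath), the margin variables can be evaluated as below; the kernel argument, array shapes, and names are assumptions made for this sketch.

```python
import numpy as np

def svm_margins(x, support_vectors, lam, b, kernel):
    """Evaluate the margin variables f_ij(x) of (4) for all state pairs (i, j).

    x               : length-d input vector
    support_vectors : (N, d) array of support vectors x_s
    lam             : (S, S, N) array of coefficients lambda_ij^s
    b               : (S, S) array of biases b_ij
    kernel          : Mercer kernel K(x, x_s), e.g. lambda u, v: (u @ v) ** 2
    """
    k = np.array([kernel(x, xs) for xs in support_vectors])   # K(x, x_s) for s = 1..N
    return lam @ k + b                                         # sum over s for every (i, j)
```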

3 Hardware Implementation

A second-order polynomial kernel $K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y})^2$ was chosen for convenience of implementation. This inner-product based architecture maps directly onto an analog computational array, where storage and computation share common circuit elements.

¹$K(\mathbf{x}, \mathbf{y}) = \Phi(\mathbf{x}) \cdot \Phi(\mathbf{y})$. The map $\Phi(\cdot)$ need not be computed explicitly, as it only appears in inner-product form.
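The footnote's Mercer factorization can be checked numerically for the second-order polynomial kernel adopted above: one valid (illustration-only) choice of $\Phi$ collects all pairwise products of the input components. The snippet below is a sanity check, not part of the chip.

```python
import numpy as np

def phi(x):
    """Explicit feature map for K(x, y) = (x . y)^2: all pairwise products x_i * x_j."""
    return np.outer(x, x).ravel()

x, y = np.random.randn(14), np.random.randn(14)   # 14-dimensional inputs, as on the chip
assert np.isclose((x @ y) ** 2, phi(x) @ phi(y))  # Mercer form K(x, y) = Phi(x) . Phi(y)
```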



Figure 3: Schematic of the margin propagation block.

The FDKM system architecture is shown in Figure 1. It consists of several SVM stages that generate the state transition probabilities $P_{ij}[n]$ modulated by the input data $\mathbf{x}[n]$, and a forward-decoding block that performs maximum a posteriori (MAP) estimation of the state sequence $\alpha_i[n]$.