{"title": "Speech Production Using A Neural Network with a Cooperative Learning Mechanism", "book": "Advances in Neural Information Processing Systems", "page_first": 232, "page_last": 239, "abstract": null, "full_text": "232 \n\nSPEECH PRODUCTION USING A NEURAL \n\nNETWORK WITH A COOPERATIVE \n\nLEARNING MECHANISM \n\nInternational Institute for Advanced Study of Social Information Science, \n\nMitsuo Komura \n\nAkio Tanaka \n\n140 Miyamoto, Numazu-shi Shizuoka, 410-03 Japan \n\nFujitsu Limited \n\nABSTRACT \n\nWe propose a new neural network model and its learning \nalgorithm. The proposed neural network consists of four layers \n- input, hidden, output and final output layers. The hidden and \noutput layers are multiple. Using the proposed SICL(Spread \nPattern Information and Cooperative Learning) algorithm, it \nis possible to learn analog data accurately and to obtain \nsmooth outputs. Using this neural network, we have developed \na speech production system consisting of a phonemic symbol \nproduction subsystem and a speech parameter production \nsubsystem. We have succeeded in producing natural speech \nwaves with high accuracy. \n\nINTRODUCTION \n\nOur purpose is to produce natural speech waves. In general, speech synthesis \nby rule is used for producing speech waves. However, there are some \ndifficulties in speech synthesis by rule. First, the rules are very complicated. \nSecond, extracting a generalized rule is difficult. Therefore, it is hard to \nsynthesize a natural speech wave by using rules. We use a neural network for \nproducing speech waves. Using a neural network, it is possible to learn speech \nparameters without rules. (Instead of describing rules explicitly, selecting a \ntraining data set becomes an important subject.) In this paper, we propose a \nnew neural network model and its learning algorithm. Using the proposed \nneural network, it is possible to learn and produce analog data accurately. 
We apply the network to a speech production system and examine the system performance.

PROPOSED NEURAL NETWORK AND ITS LEARNING ALGORITHM

We use an analog neuron-like element in the neural network. The element has the logistic activation function given by equation (3). As a learning algorithm, the BP (Back Propagation) method is widely used. With this method it is possible to learn the weighting coefficients of units whose target values are not given directly. However, it has disadvantages. First, there are singular points at 0 and 1 (the outputs of the neuron-like element). Second, finding the optimum values of the learning constants is not easy. We have proposed a new neural network model and its learning algorithm to solve these problems. The proposed SICL (Spread Pattern Information and Cooperative Learning) method has the following features.

(a) The singular points of the BP method are removed. (Outputs are not simply 0 or 1.) This improves the convergence rate.

(b) A spread pattern information (SI) learning algorithm is proposed. In the SI learning algorithm, the weighting coefficients from the hidden layers to the output layers are fixed to random values. Pattern information is spread over the memory space of the weighting coefficients. As a result, the network can learn analog data accurately.

(c) A cooperative learning (CL) algorithm is proposed. This algorithm makes it possible to obtain smooth and stable output. The CL system is shown in Fig. 1, where D(L) is a delay line which delays the signal by L time units.

In the following sections, we define a three-layer network, introduce the BP method, and propose the SICL method.

[Figure: input layer, multiple hidden layers, multiple output layers (k = 1, ..., K), and a final output layer.]

Figure 1.
Cooperative Learning System (Speech Parameter / Phonemic Symbol Production Subsystem)

THREE-LAYER NETWORK

We define a three-layer network that has an input layer, a hidden layer, and an output layer. The propagation of a signal from the input layer to the hidden layer is represented as follows:

    u_j = Σ_i w_IHij x_i,    y_j = f(u_j − θ_j),    (1)

where i = 1, 2, ..., I; j = 1, 2, ..., J; x_i is an input, y_j the output of the hidden layer, and θ_j a threshold value. The propagation of a signal from the hidden layer to the output layer is represented as follows:

    z_k = f(Σ_j w_HOjk y_j − θ_k),    (2)

where z_k is the output of the output layer and k = 1, 2, ..., K. The activation function f(u) is defined by

    f(u) = (1 + exp(−u + θ))^(−1).    (3)

Setting y = f(u), the derivative of f(u) is then given by f′(u) = y(1 − y).

BACK PROPAGATION (BP) METHOD

The three-layer BP algorithm is as follows. The back propagation error, δ_Ok(n), for an output unit is given by

    δ_Ok(n) = (t_k − z_k(n)) f′(u_k),    (4)

where n is the iteration number and t_k the target value of output unit k. The back propagation error, δ_Hj(n), for a hidden unit is given by

    δ_Hj(n) = (Σ_k δ_Ok(n) w_HOjk) f′(u_j).
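The forward pass of equations (1)-(2) and the two back-propagation error terms can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the variable names and layer sizes are ours, thresholds are folded in as simple subtractions, and only a single hidden/output pair is shown (under the paper's SI scheme, the hidden-to-output weights w_ho would additionally be held fixed at random values).

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes (illustrative choices, not from the paper).
I, J, K = 4, 8, 2
w_ih = rng.normal(0.0, 0.5, (I, J))   # input-to-hidden weights w_IHij
w_ho = rng.normal(0.0, 0.5, (J, K))   # hidden-to-output weights w_HOjk
theta_h = np.zeros(J)                 # hidden thresholds theta_j
theta_o = np.zeros(K)                 # output thresholds theta_k

def f(u):
    """Logistic activation, eq. (3); f'(u) = y(1 - y) for y = f(u)."""
    return 1.0 / (1.0 + np.exp(-u))

def forward(x):
    y = f(x @ w_ih - theta_h)         # eq. (1)
    z = f(y @ w_ho - theta_o)         # eq. (2)
    return y, z

def bp_errors(x, t):
    """Output-unit error (eq. 4) and hidden-unit error."""
    y, z = forward(x)
    delta_o = (t - z) * z * (1.0 - z)             # delta_Ok(n)
    delta_h = (delta_o @ w_ho.T) * y * (1.0 - y)  # delta_Hj(n)
    return delta_o, delta_h

x = rng.uniform(size=I)
t = np.array([0.2, 0.8])
delta_o, delta_h = bp_errors(x, t)
print(delta_o.shape, delta_h.shape)   # one error per output unit / hidden unit
```

In a full BP step these errors would drive the weight updates; under SI, only w_ih would actually be updated while w_ho stays at its random initialization.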