{"title": "Multilayer Neural Networks: One or Two Hidden Layers?", "book": "Advances in Neural Information Processing Systems", "page_first": 148, "page_last": 154, "abstract": null, "full_text": "Multilayer neural networks: \none or two  hidden layers? \n\nG.  Brightwell \n\nDept of Mathematics \nLSE,  Houghton Street \n\nLondon  WC2A 2AE,  U.K. \n\nc.  Kenyon,  H.  Paugam-Moisy \n\nLIP,  URA  1398  CNRS \n\nENS  Lyon,  46  alIee  d'Italie \n\nF69364  Lyon  cedex,  FRANCE \n\nAbstract \n\nWe study the number of hidden layers required by a multilayer neu(cid:173)\nral network with threshold  units to compute a function  f  from n d \nto {O, I}.  In dimension d  =  2,  Gibson  characterized  the functions \ncomputable with just one hidden layer, under  the assumption that \nthere  is no  \"multiple intersection  point\"  and that f  is only defined \non a compact set.  We consider the restriction of f  to the neighbor(cid:173)\nhood of a  multiple intersection  point or of infinity,  and give  neces(cid:173)\nsary  and sufficient  conditions for  it  to  be  locally computable with \none  hidden  layer.  We  show  that  adding  these  conditions  to  Gib(cid:173)\nson's  assumptions  is  not  sufficient  to  ensure  global  computability \nwith one hidden layer,  by exhibiting a new  non-local configuration, \nthe  \"critical cycle\",  which  implies  that f  is  not  computable  with \none  hidden  layer. \n\n1 \n\nINTRODUCTION \n\nThe number of hidden layers is a crucial parameter for the architecture of multilayer \nneural networks.  Early research,  in the 60's,  addressed  the problem of exactly real(cid:173)\nizing Boolean functions with binary networks or binary multilayer networks.  On the \none  hand, more recent  work focused  on  approximately realizing real functions  with \nmultilayer neural networks with one hidden layer [6,  7,  11]  or with two hidden units \n[2].  
On the other hand, some authors [1, 12] were interested in finding bounds on the architecture of multilayer networks for exact realization of a finite set of points. Another approach is to search for the minimal architecture of multilayer networks exactly realizing real functions from R^d to {0,1}. Our work, of the latter kind, is a continuation of the effort of [4, 5, 8, 9] towards characterizing the real dichotomies which can be exactly realized with a single hidden layer neural network composed of threshold units.\n\n1.1 NOTATIONS AND BACKGROUND\n\nA finite set of hyperplanes {H_i}_{1<=i<=h} defines a partition of the d-dimensional space into convex polyhedral open regions, the union of the H_i's being neglected as a subset of measure zero. A polyhedral dichotomy is a function f : R^d -> {0,1}, obtained by associating a class, equal to 0 or to 1, to each of those regions. Thus both f^{-1}(0) and f^{-1}(1) are unions of a finite number of convex polyhedral open regions. The h hyperplanes which define the regions are called the essential hyperplanes of f. A point P is an essential point if it is the intersection of some set of essential hyperplanes.\n\nIn this paper, all multilayer networks are supposed to be feedforward neural networks of threshold units, fully interconnected from one layer to the next, without skipping interconnections. A network is said to realize a function f : R^d -> {0,1} if, for an input vector x, the network output is equal to f(x), almost everywhere in R^d. The functions realized by our multilayer networks are the polyhedral dichotomies.\n\nBy definition of threshold units, each unit of the first hidden layer computes a binary function y_j of the real inputs (x_1, ..., x_d). 
Therefore, subsequent layers compute a Boolean function. Since any Boolean function can be written in DNF form, two hidden layers are sufficient for a multilayer network to realize any polyhedral dichotomy. Two hidden layers are sometimes also necessary, e.g. for realizing the \"four-quadrant\" dichotomy which generalizes the XOR function [4].\n\nFor all j, the j-th unit of the first hidden layer can be seen as separating the space by the hyperplane H_j : sum_{i=1}^{d} w_{ij} x_i = theta_j. Hence the first hidden layer necessarily contains at least one hidden unit for each essential hyperplane of f. Thus each region R can be labelled by a binary number Y = (y_1, ..., y_h) (see [5]). The j-th digit y_j will be denoted by H_j(R).\n\nUsually there are fewer than 2^h regions and not all possible labels actually exist. The Boolean family B_f of a polyhedral dichotomy f is defined to be the set of all Boolean functions on h variables which are equal to f on all the existing labels.\n\n1.2 PREVIOUS RESULTS\n\nIt is straightforward that all polyhedral dichotomies which have at least one linearly separable function in their Boolean family can be realized by a one-hidden-layer network. However the converse is far from true. A counter-example was produced in [5]: adding extra hyperplanes (i.e. extra units on the first hidden layer) can eliminate the need for a second hidden layer. Hence the problem of finding a minimal architecture for realizing dichotomies cannot be reduced to the neural computation of Boolean functions. Finding a generic description of all the polyhedral dichotomies which can be realized exactly by a one-hidden-layer network is still an open problem. This paper is a new step towards its resolution. 
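To make the labelling and the DNF construction concrete, here is a small Python sketch (ours, not from the paper) for the four-quadrant dichotomy: the two essential hyperplanes x_1 = 0 and x_2 = 0 give each quadrant a binary label, no single output threshold unit over these two labels realizes the dichotomy, but a two-hidden-layer DNF network does. All function names are illustrative.

```python
# Illustrative sketch: the 'four-quadrant' dichotomy is class 1 iff
# x1 and x2 have the same sign, i.e. an XOR-like function of the
# hyperplane labels y1 = [x1 > 0], y2 = [x2 > 0].

from itertools import product

def step(z):
    return 1 if z > 0 else 0

# Labels of the four regions and their classes (XOR-like truth table).
regions = {(0, 0): 1, (1, 1): 1, (0, 1): 0, (1, 0): 0}

# A single output threshold unit on (y1, y2) cannot realize it:
# exhaustive search over small integer weights finds no separator.
def separable(table):
    for w1, w2, t in product(range(-3, 4), repeat=3):
        if all(step(w1*y1 + w2*y2 - t) == c for (y1, y2), c in table.items()):
            return True
    return False

print(separable(regions))  # False: one output unit over these labels fails

# Two hidden layers suffice via DNF: one AND unit per class-1 label,
# then an OR unit on top.
def two_layer(y1, y2):
    and1 = step(y1 + y2 - 1.5)      # detects label (1, 1)
    and2 = step(-y1 - y2 + 0.5)     # detects label (0, 0)
    return step(and1 + and2 - 0.5)  # OR of the two detectors

assert all(two_layer(*lbl) == c for lbl, c in regions.items())
```

The brute-force search only illustrates non-separability on an integer grid; the algebraic proof is the usual XOR argument.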
\n\nOne approach consists of finding geometric configurations which imply that a function is not realizable with a single hidden layer. There are three known such geometric configurations: the XOR-situation, the XOR-bow-tie and the XOR-at-infinity (see Figure 1).\n\nA polyhedral dichotomy is said to be in an XOR-situation iff one of its essential hyperplanes H is inconsistent, i.e. if there are four regions B, B', W, W' such that B and B' are in class 1, W and W' are in class 0, B and W' are on one side of H, B' and W are on the other side of H, and B and W are adjacent along H, as well as B' and W'.\n\nGiven a point P, two regions containing P in their closure are called opposite with respect to P if they are in different halfspaces w.r.t. all essential hyperplanes going through P. A polyhedral dichotomy is said to be in an XOR-bow-tie iff there exist four distinct regions B, B', W, W', such that B and B', both in class 1 (resp. W and W', both in class 0), are opposite with respect to point P.\n\nThe third configuration is the XOR-at-infinity, which is analogous to the XOR-bow-tie at a point at infinity added to R^d. There exist four distinct unbounded regions B, B' (in class 1), W, W' (in class 0) such that, for every essential hyperplane H, either all of them are on the same side of H (e.g. the horizontal line), or B and B' are on opposite sides of H, and W and W' are on opposite sides of H (see [3]).\n\nFigure 1: Geometrical representation of XOR-situation, XOR-bow-tie and XOR-at-infinity in the plane (black regions are in class 1, grey regions are in class 0). 
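The XOR-bow-tie condition is easy to check mechanically once the regions around P are identified by their labels w.r.t. the hyperplanes through P: opposite regions have complementary labels. A hypothetical helper (ours, not from the paper):

```python
# Detect an XOR-bow-tie at a point P. Regions around P are keyed by
# their 0/1 labels w.r.t. the k essential hyperplanes through P;
# 'opposite' regions carry complementary labels. A bow-tie needs an
# opposite same-class pair in class 1 AND one in class 0.

def has_xor_bowtie(classes):
    # classes: dict mapping label tuples (one bit per hyperplane) -> class
    found = {0: False, 1: False}
    for label, c in classes.items():
        opposite = tuple(1 - b for b in label)
        if classes.get(opposite) == c:
            found[c] = True
    return found[0] and found[1]

# Four-quadrant XOR around the origin: opposite quadrants share a class.
xor_regions = {(0, 0): 1, (1, 1): 1, (0, 1): 0, (1, 0): 0}
print(has_xor_bowtie(xor_regions))   # True: this is an XOR-bow-tie

# A linearly separable labelling has no bow-tie.
sep_regions = {(0, 0): 0, (1, 1): 1, (0, 1): 0, (1, 0): 0}
print(has_xor_bowtie(sep_regions))   # False
```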
\n\nTheorem 1 If a polyhedral dichotomy f, from R^d to {0,1}, can be realized by a one-hidden-layer network, then it cannot be in an XOR-situation, nor in an XOR-bow-tie, nor in an XOR-at-infinity.\n\nThe proof can be found in [5] for the XOR-situation, in [13] for the XOR-bow-tie, and in [5] for the XOR-at-infinity.\n\nAnother research direction, implying a function is realizable by a single hidden layer network, is based on the universal approximator property of one-hidden-layer networks, applied to intermediate functions obtained by constructively adding extra hyperplanes to the essential hyperplanes of f. This direction was explored by Gibson [9], but there are virtually no results known beyond two dimensions. Gibson's result can be reformulated as follows:\n\nTheorem 2 If a polyhedral dichotomy f is defined on a compact subset of R^2, if f is not in an XOR-situation, and if no three essential hyperplanes (lines) intersect, then f is realizable with a single hidden layer network.\n\nUnfortunately Gibson's proof is not constructive, and extending it to remove some of the assumptions or to go to higher dimensions seems challenging. Both XOR-bow-tie and XOR-at-infinity are excluded by his assumptions of compactness and no multiple intersections. In the next section, we explore the two cases which are excluded by Gibson's assumptions. We prove that, in R^2, the XOR-bow-tie and the XOR-at-infinity are the only restrictions to local realizability.\n\n2 LOCAL REALIZATION IN R^2\n\n2.1 MULTIPLE INTERSECTION\n\nTheorem 3 Let f be a polyhedral dichotomy on R^2 and let P be a point of multiple intersection. Let C_P be a neighborhood of P which does not intersect any essential hyperplane other than those going through P. 
The restriction of f to C_P is realizable by a one-hidden-layer network iff f is not in an XOR-bow-tie at P.\n\nThe proof is in three steps: first, we reorder the hyperplanes in the neighborhood of P, so as to get a nice looking system of inequalities; second, we apply Farkas' lemma; third, we show how an XOR-bow-tie can be deduced.\n\nProof: Let P be the intersection of k >= 3 essential hyperplanes of f. All the hyperplanes which intersect at P can be renumbered and re-oriented so that the intersecting hyperplanes are totally ordered. Thus the label of the regions which have the point P in their closure is very regular. If one drops all the digits corresponding to the essential hyperplanes of f which do not contain P, the remaining part of the region labels are exactly like those of Figure 2.\n\nFigure 2: Labels of the regions in the neighborhood of P, and matrix A.\n\nThe problem of finding a one-hidden-layer network which realizes f can be rewritten as a system of inequalities. The unknown variables are the weights w_i and threshold theta of the output unit. Let (S) denote the subsystem of inequalities obtained from the 2k regions which have the point P in their closure. 
The regular numbering of these 2k regions allows us to write the system as follows:\n\n(S): for 1 <= i <= k, sum_{m=1}^{i} w_m < theta if region i is in class 0, and sum_{m=1}^{i} w_m > theta if region i is in class 1; for k+1 <= i <= 2k, sum_{m=i-k+1}^{k} w_m < theta if region i is in class 0, and sum_{m=i-k+1}^{k} w_m > theta if region i is in class 1.\n\nThe system (S) can be rewritten in the matrix form Ax <= b, where\n\nx^T = [w_1, w_2, ..., w_k, theta] and b^T = [b_1, b_2, ..., b_k, b_{k+1}, ..., b_{2k}]\n\nwhere b_i = -eps for all i, and eps is an arbitrary small positive number. Matrix A can be seen in Figure 2, where eps_j = +1 or -1 depending on whether region j is in class 0 or 1. The next step is to apply Farkas' lemma, or an equivalent version [10], which gives a necessary and sufficient condition for finding a solution of Ax <= b.\n\nLemma 1 (Farkas' lemma) There exists a vector x in R^n such that Ax <= b iff there does not exist a vector y in R^m such that y^T A = 0, y >= 0 and y^T b < 0.\n\nAssume that Ax <= b is not solvable. Then, by Lemma 1 for n = k+1 and m = 2k, a vector y can be found such that y^T A = 0, y >= 0 and y^T b < 0. Since in addition y^T b = -eps sum_{j=1}^{2k} y_j, the condition y^T b < 0 implies that y_{j_1} > 0 for some j_1. But y^T A = 0 is equivalent to the system (E) of k+1 equations:\n\n(E): for 1 <= i <= k, sum_{m=i}^{i+k-1} y_m over class-0 regions = sum_{m=i}^{i+k-1} y_m over class-1 regions; for i = k+1, sum_{m=1}^{2k} y_m over class-0 regions = sum_{m=1}^{2k} y_m over class-1 regions.\n\nSince y_{j_1} > 0 for some j_1, the last equation (E_{k+1}) of system (E) implies that there exists j_2 with class(region j_1) != class(region j_2) and y_{j_2} > 0. Without loss of generality, assume that j_1 and j_2 are less than k and that region j_1 is in class 0 and region j_2 is in class 1. Comparing two successive equations of (E), for i < k, we can write\n\n(for all lambda in {0,1}) sum_{(E_{i+1})} y_m over class-lambda regions = sum_{(E_i)} y_m over class-lambda regions - y_i [if region i is in class lambda] + y_{i+k} [if region i+k is in class lambda]. 
\n\nSince y_{j_1} > 0 and region j_1 is in class 0, the transition from E_{j_1} to E_{j_1+1} implies that y_{j_1+k} = y_{j_1} > 0 and region j_1+k, which is opposite to region j_1, is also in class 0. Similarly, the transition from E_{j_2} to E_{j_2+1} implies that both opposite regions j_2+k and j_2 are in class 1. These conditions are necessary for the system (E) to have a non-negative solution and they correspond exactly to the definition of an XOR-bow-tie at point P. The converse comes from Theorem 1.\n\n2.2 UNBOUNDED REGIONS\n\nIf no two essential hyperplanes are parallel, the case of unbounded regions is exactly the same as a multiple intersection. All the unbounded regions can be labelled as on Figure 2. The same argument holds for proving that, if the local system (S), Ax <= b, is not solvable, then there exists an XOR-at-infinity. The case of parallel hyperplanes is more intricate because matrix A is more complex. The proof requires a heavy case-by-case analysis and cannot be given in full in this paper (see [3]).\n\nTheorem 4 Let f be a polyhedral dichotomy on R^2. Let C_inf be the complementary region of the convex hull of the essential points of f. The restriction of f to C_inf is realizable by a one-hidden-layer network iff f is not in an XOR-at-infinity.\n\nFrom Theorems 3 and 4 we can deduce that a polyhedral dichotomy is locally realizable in R^2 by a one-hidden-layer network iff f has no XOR-bow-tie and no XOR-at-infinity. Unfortunately this result cannot be extended to the global realization of f in R^2 because more intricate distant configurations can involve contradictions in the complete system of inequalities. The object of the next section is to point 
The object  of the  next  section  is  to  point \nout  such  a  situation  by  producing  a  new  geometric configuration,  called  a  critical \ncycle, which  implies that  f  cannot be realized  with one  hidden  layer. \n\n3  CRITICAL CYCLES \n\nIn contrast to section  2, the results  of this section hold for  any dimension d 2::  2. \nWe  first  need  some definitions.  Consider a  pair of regions  {T, T'} in the same class \nand  which  both  contain  an  essential  point  P  in  their  closure.  This  pair  is  called \ncritical with respect  to P  and H  if there is  an essential hyperplane H  going through \nP  such  that T'  is  adjacent  along H  to the  region  opposite  to T .  Note  that T  and \nT'  are  both on the same side of H. \nWe  define  a  graph G  whose  nodes  correspond  to the  critical  pairs of regions  of f. \nThere  is  a  red  edge  between  {T, T'}  and  {U, U'}  if the  pairs,  in  different  classes, \nare  both critical  with respect  to the  same point  (e.g.,  {Bp, Bp} and  {Wp, Wi>}  in \nfigure  3) .  There  is  a  green  edge  between  {T, T'}  and  {U, U'}  if the pairs are  both \ncritical with respect  to the same hyperplane H, and either the two pairs are on the \n\n\fMultilayer Neural Networks:  One or Two Hidden Layers? \n\n153 \n\nsame side  of H,  but  in  different  classes  (e.g.,  {W p, Wp} and  {BQ, BQ})'  or  they \nare on  different  sides of H,  but in the same class  (e.g.,  {Bp,Bp}  and  {BR, Bk})\u00b7 \n\nDefinition 1  A  critical cycle  is  a  cycle  in  graph  G,  with  alternating colors. \n\n....  ; \n\nP \n\nP. \n\nP \n\nP, \n\n-i B  B'  }--.. -.------- {Y  Y'} \n\nf \nI \nI \nI  {B Q, B'Q~- \u00b7-\u00b7-{Y Q ,Y'Q}', \nI \nI \nI \nI \n\" {  B R, B'R}l---{Y R ,Y'R}\" \n\n~ \n\n; \n\nred edge \ngreen edge \n\nFigure 3:  Geometrical configuration and graph of a critical cycle, in the plane.  
Note that one can augment the figure in such a way that there is no XOR-situation, no XOR-bow-tie, and no XOR-at-infinity.\n\nTheorem 5 If a polyhedral dichotomy f, from R^d to {0,1}, can be realized by a one-hidden-layer network, then it cannot have a critical cycle.\n\nProof: For the sake of simplicity, we will restrict ourselves to doing the proof for a case similar to the example of Figure 3, with notation as given in that figure, but without any restriction on the dimension d of f. Assume, for a contradiction, that f has a critical cycle and can be realized by a one-hidden-layer network. Consider the sets of regions {B_P, B'_P, B_Q, B'_Q, B_R, B'_R} and {W_P, W'_P, W_Q, W'_Q, W_R, W'_R}. Consider the regions defined by all the hyperplanes associated to the hidden layer units (in general, these hyperplanes are a large superset of the essential hyperplanes). There is a region b_P contained in B_P, whose border contains P and a (d-1)-dimensional subset of H_1. Similarly we can define b'_P, ..., b'_R, w_P, ..., w'_R. Let B be the set of such regions which are in class 1 and W be the set of such regions in class 0.\n\nLet H be the hyperplane associated to one of the hidden units. For T a region, let H(T) be the digit label of T w.r.t. H, i.e. H(T) = 1 or 0 according to whether T is above or below H (cf. Section 1.1). We do a case-by-case analysis.\n\nIf H does not go through P, then H(b_P) = H(b'_P) = H(w_P) = H(w'_P); similar equalities hold for lines not going through Q or R. 
If H goes through P but is not equal to H_1 or to H_2, then, from the viewpoint of H, things are as if b'_P was opposite to b_P, and w'_P was opposite to w_P, so the two regions of each pair are on different sides of H, and so H(b_P) + H(b'_P) = H(w_P) + H(w'_P) = 1; similar equalities hold for hyperplanes going through Q or R. If H = H_1, then we use the fact that there is a green edge between {W_P, W'_P} and {B_Q, B'_Q}, meaning in the case of the figure that all four regions are on the same side of H_1 but in different classes. Then H(b_P) + H(b'_P) + H(b_Q) + H(b'_Q) = H(w_P) + H(w'_P) + H(w_Q) + H(w'_Q). In fact, this equality would also hold in the other case, as can easily be checked. Thus for all H, we have sum_{b in B} H(b) = sum_{w in W} H(w). But such an equality is impossible: since each b is in class 1 and each w is in class 0, this implies a contradiction in the system of inequalities and f cannot be realized by a one-hidden-layer network.\n\nObviously there can exist cycles of length longer than 3, but the extension of the proof is straightforward.\n\n4 CONCLUSION AND PERSPECTIVES\n\nThis paper makes partial progress towards characterizing functions which can be realized by a one-hidden-layer network, with a particular focus on dimension 2. Higher dimensions are more challenging, and it is difficult to even propose a conjecture: new cases of inconsistency emerge in subspaces of intermediate dimension. Gibson gives an example of an inconsistent line (dimension 1) resulting from its intersection with two hyperplanes (dimension 2) which are not inconsistent in R^3.\n\nThe principle of using Farkas' lemma for proving local realizability still holds but the matrix A becomes more and more complex. 
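For the smallest instance the certificate of Lemma 1 is entirely explicit. A minimal sketch (our construction, not from the paper), taking k = 2 and an XOR-bow-tie labelling of the four regions around P: the system Ax <= b over x = (w1, w2, theta) is infeasible, and y = (1, 1, 1, 1) is a Farkas certificate.

```python
# Farkas certificate for the k = 2 bow-tie case: regions 1 and 3
# (opposite) are in class 1, regions 2 and 4 (opposite) in class 0.
# Following (S), the rows of A x <= b with b = -eps are:
#   region 1 (class 1): theta - w1      <= -eps
#   region 2 (class 0): w1 + w2 - theta <= -eps
#   region 3 (class 1): theta - w2      <= -eps
#   region 4 (class 0): 0 - theta       <= -eps   (empty weight sum)
eps = 1.0
A = [(-1, 0, 1),
     (1, 1, -1),
     (0, -1, 1),
     (0, 0, -1)]
b = [-eps] * 4
y = [1, 1, 1, 1]   # candidate certificate, y >= 0

# Check the two Farkas conditions: yT A = 0 and yT b < 0.
col_sums = [sum(y[i] * A[i][j] for i in range(4)) for j in range(3)]
print(col_sums)                                 # [0, 0, 0]
print(sum(yi * bi for yi, bi in zip(y, b)))     # -4.0, i.e. < 0

# Hence no (w1, w2, theta) satisfies all four inequalities: the
# bow-tie labels defeat any single output threshold unit.
```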
In R^d, even for d = 3, the labelling of the regions, for instance around a point P of multiple intersection, can become very complex.\n\nIn conclusion, it seems that neither the topological method of Gibson, nor our algebraic point of view, can easily be extended to higher dimensions. Nevertheless, we conjecture that in dimension 2, a function can be realized by a one-hidden-layer network iff it does not have any of the four forbidden types of configurations: XOR-situation, XOR-bow-tie, XOR-at-infinity, and critical cycle.\n\nAcknowledgements\n\nThis work was supported by European Esprit III Project no. 8556, NeuroCOLT.\n\nReferences\n\n[1] E. B. Baum. On the capabilities of multilayer perceptrons. Journal of Complexity, 4:193-215, 1988.\n\n[2] E. K. Blum and L. K. Li. Approximation theory and feedforward networks. Neural Networks, 4(4):511-516, 1991.\n\n[3] G. Brightwell, C. Kenyon, and H. Paugam-Moisy. Multilayer neural networks: one or two hidden layers? Research Report 96-37, LIP, ENS Lyon, 1996.\n\n[4] M. Cosnard, P. Koiran, and H. Paugam-Moisy. Complexity issues in neural network computations. In I. Simon, editor, Proc. of LATIN'92, volume 583 of LNCS, pages 530-544. Springer Verlag, 1992.\n\n[5] M. Cosnard, P. Koiran, and H. Paugam-Moisy. A step towards the frontier between one-hidden-layer and two-hidden-layer neural networks. In Proc. of IJCNN'93-Nagoya, volume 3, pages 2292-2295, 1993.\n\n[6] G. Cybenko. Approximation by superpositions of a sigmoidal function. Math. Control, Signals, Systems, 2:303-314, 1989.\n\n[7] K. Funahashi. On the approximate realization of continuous mappings by neural networks. Neural Networks, 2(3):183-192, 1989.\n\n[8] G. J. Gibson. 
A combinatorial approach to understanding perceptron decision regions. IEEE Trans. Neural Networks, 4:989-992, 1993.\n\n[9] G. J. Gibson. Exact classification with two-layer neural nets. Journal of Computer and System Sciences, 52(2):349-356, 1996.\n\n[10] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer-Verlag, Berlin, Heidelberg, 1988.\n\n[11] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359-366, 1989.\n\n[12] S.-C. Huang and Y.-F. Huang. Bounds on the number of hidden neurons in multilayer perceptrons. IEEE Trans. Neural Networks, 2:47-55, 1991.\n\n[13] P. J. Zweitering. The complexity of multi-layered perceptrons. PhD thesis, Technische Universiteit Eindhoven, 1994.\n", "award": [], "sourceid": 1239, "authors": [{"given_name": "Graham", "family_name": "Brightwell", "institution": null}, {"given_name": "Claire", "family_name": "Kenyon", "institution": null}, {"given_name": "H\u00e9l\u00e8ne", "family_name": "Paugam-Moisy", "institution": null}]}