{"title": "Classification of Multi-Spectral Pixels by the Binary Diamond Neural Network", "book": "Advances in Neural Information Processing Systems", "page_first": 1143, "page_last": 1150, "abstract": null, "full_text": "Classification of Multi-Spectral Pixels \n\nby the \n\nBinary Diamond Neural Network \n\nDepartment of Physics and CSTEA, Howard University, Washington, DC 20059 \n\nYehuda Salu \n\nAbstract \n\nA new neural network, the Binary Diamond, is presented and its use \nas a classifier is demonstrated and evaluated. The network is of the \nfeed-forward type. It learns from examples in the 'one shot' mode, \nand recruits new neurons as needed. It was tested on the problem of \npixel classification, and performed well. Possible applications of the \nnetwork in associative memories are outlined. \n\n1 INTRODUCTION: CLASSIFICATION BY CLUES \n\nClassification is a process by which an item is assigned to a class. Classification is \nwidely used in the animal kingdom. Identifying an item as food is classification. \nAssigning words to objects, actions, feelings, and situations is classification. The \npurpose of this work is to introduce a new neural network, the Binary Diamond, \nwhich can be used as a general purpose classification tool. The design and \noperational mode of the Binary Diamond are influenced by observations of the \nunderlying mechanisms that take place in human classification processes. \n\nAn item to be classified consists of basic features. Any arbitrary combination of basic \nfeatures will be called a clue. Generally, an item will consist of many clues. Clues are \nrelated not only to the items which contain them, but also to the classes. Each class \nthat resides in the memory has a list of clues which are associated with it. These clues \nare the basic building blocks of the classification rules. 
A classification rule for a class \nX would have the following general form: \n\nClassification rule: If an item contains clue X1, or clue X2, ..., or clue Xn, and if it \ndoes not contain clue x1, nor clue x2, ..., nor clue xm, it is classified as belonging \nto class X. \n\nClues X1, ..., Xn are the excitatory clues of class X, and clues x1, ..., xm are the \ninhibitory clues of class X. \n\nWhen classifying an item, we first identify the clues that it contains. We then match \nthese clues with the classification rules, and find the class of the item. It may happen \nthat a certain item satisfies classification rules of different classes. Some of the clues \nmatch one class, while others match another. In such cases, a second set of rules, \nthe disambiguation rules, is employed. These rules select one class out of those tagged \nby the classification rules. The disambiguation rules rely on a hierarchy that exists \namong the clues, a hierarchy that may vary from one classification scheme to another. \nFor example, in a certain hierarchy clue A is considered more reliable than clue B if \nit contains more features. In a different hierarchy scheme, the most frequent clue is \nconsidered the most reliable. In the disambiguation process, the most reliable clue, \nout of those that have actively contributed to the classification, is identified and serves \nas the pointer to the selected class. This classification approach will be called \nclassification by clues (CBC). \n\nThe classification rules may be 'loaded' into our memory in two ways. First, the \nprecise rules may be spelled out and recorded (e.g. 'A red light means stop'). Second, \nwe may learn the classification rules from examples presented to us, utilizing innate \ncommon sense learning mechanisms. 
These mechanisms enable us to deduce, from the \nexamples presented to us, what clues should serve in the classification rules of the \nappropriate classes, and what clues have no specificity and should be ignored. For \nexample, when we point to a red balloon and say red, an infant may associate each of \nthe stimuli red and balloon as pointers to the word red. After we present a red car \nand say red, and present a green balloon and say green, the infant has \nenough information to deduce that the stimulus red is associated with the word red, \nand the stimulus balloon should not be classified as red. \n\n2 THE BINARY DIAMOND \n\n2.1 STRUCTURE \n\nIn order to perform a CBC in a systematic way, all the clues that are present in the \nitem to be classified have to be identified first, and then compared against the \nclassification rules. The Binary Diamond enables carrying out these tasks quickly and \neffectively. Assume that there are N different basic features in the environment. Each \nfeature can be assigned to a certain bit in an N dimensional binary vector. An item \nwill be represented by turning on (from the default value of 0 to the value of 1) all the \nbits that correspond to basic features that are present in the item. The total number \nof possible clues in this environment is at most 2^N. One way to represent these \npossible clues is by a lattice, in which each possible clue is represented by one node. \nThe Binary Diamond is a lattice whose nodes represent clues. It is arranged in layers. \nThe first (bottom) layer has N nodes that represent the basic features in the \nenvironment. The second layer has N(N-1)/2 nodes that represent clues consisting of \n2 basic features. 
The K'th layer has nodes that represent clues which consist of K \nbasic features. Nodes from neighboring layers which represent clues that differ by \nexactly one basic feature are connected by a line. Figure 1 is a diagram of the Binary \nDiamond for N = 4. \n\nFigure 1: The Binary Diamond of order 4. The numbers inside the nodes are the \nbinary codes for the feature combination that the node represents, e.g. 1 <=> \n(0,0,0,1), 5 <=> (0,1,0,1), 14 <=> (1,1,1,0), 15 <=> (1,1,1,1). \n\n2.2 THE BINARY DIAMOND NEURAL NETWORK \n\nThe Binary Diamond can be turned into a feed-forward neural network by treating \neach node as a neuron, and each line as a synapse leading from a neuron in a lower \nlayer (k) to a neuron in the higher layer (k + 1). All synaptic weights are set to 0.6, and \nall thresholds are set to 1, in a standard McCulloch-Pitts neuron. The output of a \nfiring neuron is 1. An item is entered into the network by turning on the neurons in \nthe first layer that represent the basic features constituting this item. Signals \npropagate forward one layer per time tick, and neurons stay active for one time tick. \nIt is easy to verify that all the clues that are part of the input item, and only such clues, \nwill be turned on as the signals propagate in the network. In other words, the network \nidentifies all the clues in the item to be classified. An item consisting of M basic \nfeatures will activate neurons in the first M layers. The activated neuron in the M'th \nlayer is the representation of the entire item. As an example, consider the input item \nwith feature vector (0,1,1,1), using the notation of Figure 1. It is entered by activating \nneurons 1, 2, and 4 in the first layer. The signals will propagate to neurons 3, 5, 6, and \n7, which represent all the clues that the input item contains. 
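The claim that exactly the clues contained in the item are turned on can be checked with a short simulation. The sketch below is illustrative only and is not the author's implementation: it builds the full lattice for N = 4, encodes each clue as an integer bit mask in the numbering of Figure 1, and applies the weight 0.6 and threshold 1 given in the text. The function names build_lattice, parents, and propagate are hypothetical. A neuron in layer k + 1 has k + 1 incoming synapses; when its clue is contained in the item all of them fire (total input at least 1.2), and otherwise at most one fires (total input 0.6), so the threshold separates the two cases.

```python
from itertools import combinations

WEIGHT = 0.6      # synaptic weight stated in the text
THRESHOLD = 1.0   # firing threshold stated in the text


def build_lattice(n_features):
    # layers[k] lists the bit-mask codes of all clues with k + 1 features
    # (full lattice; an assumption made for this small illustration).
    layers = []
    for size in range(1, n_features + 1):
        layer = [sum(1 << b for b in bits)
                 for bits in combinations(range(n_features), size)]
        layers.append(layer)
    return layers


def parents(code):
    # Clues one layer below: the same clue with one feature removed.
    return [code & ~(1 << b)
            for b in range(code.bit_length()) if code & (1 << b)]


def propagate(item_code, n_features):
    # Return the sorted codes of every neuron that fires for the item.
    layers = build_lattice(n_features)
    active = {c for c in layers[0] if c & item_code}
    fired = set(active)
    for layer in layers[1:]:
        next_active = set()
        for code in layer:
            n_active_inputs = sum(1 for p in parents(code) if p in active)
            if WEIGHT * n_active_inputs >= THRESHOLD:
                next_active.add(code)
        fired |= next_active
        active = next_active   # neurons stay active for one tick only
    return sorted(fired)
```

With item code 7, i.e. (0,1,1,1), the call propagate(7, 4) returns neurons 1 through 7, which agrees with the example in the text.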
\n\n2.3 INCORPORATING CLASS INFORMATION \n\nEach neuron in the Binary Diamond represents a possible clue in the environment \nspanned by N basic features. When an item is entered in the first layer, all the clues that it \ncontains activate their representing neurons in the upper layers. This is the first step \nin the classification process. Next, these clues have to point to the appropriate class, \nbased upon the classification rule. The possible classes are represented by neurons \noutside of the Binary Diamond. Let x denote the neuron, outside the Binary \nDiamond, that represents class X. An excitatory clue Xi (from the Binary Diamond) \nwill synapse onto x with a synaptic weight of 1. An inhibitory clue of class X (in the Binary \nDiamond) will synapse onto x with an inhibitory weight of -z, where z is a very large \nnumber (larger than the maximum number of clues that may point to a class). This \narrangement ensures that the classification rule formulated above is carried out. In \ncases of ambiguity, where a number of classes have been activated in the process, the \nclass that was activated by the clue in the highest layer will prevail. This clue has the \nlargest number of features, as compared with the other clues that actively participated \nin the classification. \n\n2.4 GROWING A BINARY DIAMOND \n\nA possible limitation on the processes described in the two previous sections is that, if \nthere are many basic features in the environment, the 2^N nodes of the Binary \nDiamond may be too many to handle. However, in practical situations, not all the \nclues really occur, and there is no need to actually represent all of them by nodes. \nOne way of taking advantage of this simplifying situation is to grow the network one \nevent (a training item and its classification) at a time. 
At the beginning, there is just \nthe first layer with N neurons that represent the N basic features. Each event adds its \nneurons to the network, in the exact positions that they would occupy in the regular \nBinary Diamond. A clue that has already been represented in previous events is not \nduplicated. After the new clues of the event have been added to the network, the \ninformation about the relationships between clues and classes is updated. This is done \nfor all the clues that are contained in the new event. The new neurons send synapses \nto the neuron that represents the class of the current event. Neurons of the current \nevent that took part in previous events are checked for consistency. If they point to \nother classes, their synapses are cut off. They have just lost their specificity. It should \nbe noted that there is no need to present an event more than one time for it to be \ncorrectly recorded ('one shot learning'). A new event will never adversely interfere \nwith previously recorded information. Neither the order of presenting the events, nor \nrepetitions in presenting them will affect the final structure of the network. Figure 2 \nillustrates how a Binary Diamond is grown. It encodes the information contained in \ntwo events, each having three basic features, in an environment that has four basic \nfeatures. The first event belongs to class A, and the second to class B. \n\nFigure 2: Growing a Binary Diamond. Left: All the feature combinations of the three-\nfeature item (0,1,1,1) are represented by a 3rd-order Binary Diamond, which is grown \nfrom the basic features represented by neurons 1, 2, and 4. 
All these combinations, \nmarked by a wavy background, are, for the time being, specific clues to class A. \nRight: The three-feature item (1,1,1,0) is added, as another 3rd-order Binary \nDiamond. At this point, only neurons 1, 3, 5, and 7 represent specific clues to class A. \nNeurons 8, 10, 12, and 14 represent specific clues to class B, and neurons 2, 4, and 6 \nrepresent non-specific clues. \n\n3 CLASSIFICATION OF MULTI-SPECTRAL PIXELS \n\n3.1 THE PROBLEM \n\nSpectral information of land pixels, which is collected by satellites, is used in \npreparation of land cover maps and similar applications. Depending on the satellite \nand its instrumentation, the spectral information consists of the intensities of several \nlight bands, usually in the visible and infra-red ranges, which have been reflected from \nthe land pixels. One method of classification of such pixels relies on independent \nknowledge of the land cover of some pixels in the scene. These classified pixels serve \nas the training set for a classification algorithm. Once the algorithm is trained, it \nclassifies the rest of the pixels. \n\nThe work described here tests the Binary Diamond on a pixel \nclassification problem. The tests were done on four scenes from the vicinity of \nWashington DC, each consisting of approximately 22,000 pixels. The spectral \ninformation of each pixel consisted of intensities of four spectral bands, as collected \nby the Thematic Mapper of the Landsat 4 satellite. Ground covers of these scenes \nwere determined independently by ground and aerial surveys. There were 17 classes \nof ground covers. The following list gives the number of pixels per class in one of the \nscenes. The distributions in the other scenes were similar. \n\n1) water (28). 
2) miscellaneous crops (299). 3) corn-standing (0). 4) corn-stubble \n(349). 5) shrub-land (515). 6) grass/pasture (3,184). 7) soybeans (125). 8) bare-\nsoil, clear land (535). 9) hardwood, canopy > 50% (10,169). 10) hardwood, canopy \n< 50% (945). 11) conifer forest (2,051). 12) mixed wood forest (616). 13) \nasphalt (390). 14) single family housing (2,220). 15) multiple family housing (26). \n16) industrial/commercial (118). 17) bare soil-plowed field (382). Total 21,952. \n\n3.2 METHODS \n\nApproximately 10% of the pixels in each of the four scenes were randomly selected to \nbecome a training set. Four Binary Diamond networks were grown, based on these \nfour training sets. In the evaluation phase, each network classified each scene. \n\nThe intensity of the light in each band was discretized into 64 intervals. Each interval \nwas considered as a basic feature. So, each pixel was characterized by four basic \nfeatures (one for each band), out of 4x64=256 possible basic features. The first layer \nof the Binary Diamond consisted of 256 neurons, representing these basic features. \nPixels of the training set were treated like events. They were presented sequentially, \none at a time, once each, and the neurons that represent their clues were added to \nthe network, as explained in section 2.4. After the training phase, the rest of the pixels \nwere presented, and the network classified them. The results of this classification \nwere kept for comparisons with the observed ground cover values. \n\nThe same training sets were used to train two other classification algorithms: a \nback-propagation neural network, and a nearest neighbor classifier. The back-\npropagation network had four neurons in the input layer, each representing a spectral \nband. 
It had seventeen neurons in the output layer, each representing a class, and a \nhidden layer of ten neurons. The nearest neighbor classifier used the pixels of the \ntraining set as models. The Euclidean distance between the feature vector of a pixel to \nbe classified and each model pixel was computed. The pixel was classified according \nto the class of its closest model. \n\n3.3 RESULTS \n\nIn auto-classification, the pixels of a scene are classified by an algorithm that was \ntrained using pixels from the same scene. In cross-classification, the classification of a \nscene is done by an algorithm that was trained by pixels of another scene. It was found \nthat in both auto-classification and cross-classification, the results depend on the \nconsistency of the training set. Boundary pixels, which form the boundary (on the \nground) between two classes, may contain a combination of two ground cover classes. \nIf boundary pixels were excluded from the scene, the results of all the classification \nmethods improved significantly. Table 1 compares the overall performance of the \nthree classification methods in auto-classification and cross-classification, when only \nnon-boundary pixels were considered. Similar ordering of the classification methods was \nobtained when all the pixels were considered. 
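The auto- and cross-classification figures can be summarized directly from Table 1. The following sketch is only an illustration: it averages the diagonal entries (training and testing on the same map) and the off-diagonal entries (training on one map, testing on another) for each method. The values are transcribed from Table 1; the unweighted averaging, which ignores differences in scene size, and the names TABLE_1, auto_and_cross_means, and SUMMARY are assumptions made here, not part of the original study.

```python
# Percent correct from Table 1 (non-boundary pixels), transcribed by hand.
# Rows are testing maps 1-4, columns are training maps 1-4.
TABLE_1 = {
    'Binary Diamond': [[83, 58, 71, 74],
                       [41, 78, 50, 44],
                       [49, 48, 75, 52],
                       [54, 44, 57, 76]],
    'Nearest Neighbor': [[83, 41, 46, 61],
                         [27, 73, 28, 17],
                         [43, 38, 62, 35],
                         [52, 36, 39, 70]],
    'Back-Propagation': [[73, 60, 33, 64],
                         [25, 55, 38, 26],
                         [33, 38, 52, 37],
                         [48, 43, 42, 60]],
}


def auto_and_cross_means(matrix):
    # Diagonal = auto-classification, off-diagonal = cross-classification.
    n = len(matrix)
    auto = [matrix[i][i] for i in range(n)]
    cross = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(auto) / len(auto), sum(cross) / len(cross)


# Unweighted means per method: (auto mean, cross mean).
SUMMARY = {name: auto_and_cross_means(m) for name, m in TABLE_1.items()}
```

Under these assumptions the Binary Diamond averages 78.0% in auto-classification and 53.5% in cross-classification, against 72.0% and about 38.6% for the nearest neighbor classifier, and 60.0% and about 40.6% for back-propagation.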
\n\nBinary Diamond \n     1   2   3   4 \n1   83  58  71  74 \n2   41  78  50  44 \n3   49  48  75  52 \n4   54  44  57  76 \n\nNearest Neighbor \n     1   2   3   4 \n1   83  41  46  61 \n2   27  73  28  17 \n3   43  38  62  35 \n4   52  36  39  70 \n\nBack-Propagation \n     1   2   3   4 \n1   73  60  33  64 \n2   25  55  38  26 \n3   33  38  52  37 \n4   48  43  42  60 \n\nTable 1: The percent of correctly classified pixels for the implementations of the \nthree methods, for non-boundary pixels only, as tested on the four maps. The column \nindex is the training map; the row index is the testing map. \n\nTable 2 compares the performances of the three methods class by class, as obtained in \nthe classification of the first scene. Similar results were obtained for the other scenes. \n\nI =    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17 \nBD    48  10   0  33   7  44  10  53  88  37  34   5  32  69  33  34  41 \nBDp   57   8   0  48  10  14  10  58  87  37  35   6  30  69  42  25  27 \nbNN   63  54   0  72  47  19  52  77  60  63  48  60  70  43  64  62  72 \nBP    68   5   0  68   0   1  11  66  80  74  11   1  26  45  54  52  27 \n\nTable 2: The percent of pixels from category I that have been classified as category I. \nAuto-classification of scene 1. All the pixels are included. BD = results of Binary \nDiamond where the feature vectors are in the standard Cartesian representation. \nBDp = results of Binary Diamond where the feature vectors are in four dimensional \npolar coordinates. bNN = results of nearest neighbor, and BP = results of back-propagation. \n\nThe overall performance of the Binary Diamond was better than that of the nearest \nneighbor and the back-propagation classifiers. This was the case in auto-classification \nand in cross-classification, in scenes that included all the pixels, and in scenes that \nconsisted only of non-boundary pixels. 
However, when comparing individual classes, it \nwas found that different classes may have different best classifiers. In practical \napplications, the costs of correct and incorrect classifications of each class, as well as \nthe frequency of the classes in the environment, will determine the optimal classifier. \n\nAll the networks recruited their neurons as needed during the training phase. They \nall started with 256 neurons in the first layer, and with seventeen neurons in the class \nlayer, outside the Binary Diamond. At the end of the training phase of the first scene, \nthe Binary Diamond consisted of 5,622 neurons, in four layers. This is a manageable \nnumber, and it is much smaller than the maximum number of possible clues, \n64^4 = 2^24. \n\n4 OTHER APPLICATIONS OF THE BINARY DIAMOND \n\nThe Binary Diamond, as presented here, was the core of a network that was used as a \nclassifier. Because of its special structure, the Binary Diamond can be used in other \nrelated problems, such as in associative memories. In associative memory, a presented \nclue has to retrieve all the basic features of an associated item. If we start from any \nnode in the Binary Diamond, and cascade down along the existing lines, we reach all the \nbasic features of this clue in the first layer. So, to retrieve an associated item, the \nsignals of the input clue first have to climb up the Binary Diamond until they reach a \nnode which is the best generalization of this clue, and then cascade down and \nactivate the basic features of this generalization. The synaptic weights in the upward \ndirection can encode information about causality relationships and the frequency of \nco-activations of the pre- and post-synaptic neurons. This information can be used in \nthe retrieval of the most appropriate generalization to the given clue. 
An associative \nmemory of this kind retrieves information in ways similar to human associative \nretrieval (paper submitted). \n\nREFERENCES \n\nA reference list, as well as more details about pixel classification, can be found in: \nClassification of Multi-Spectral Image Data by the Binary Diamond Neural Network \nand by Non-Parametric Pixel-by-Pixel Methods, by Yehuda Salu and James Tilton. \nIEEE Transactions on Geoscience and Remote Sensing, 1993 (in press). \n\n", "award": [], "sourceid": 853, "authors": [{"given_name": "Yehuda", "family_name": "Salu", "institution": null}]}