NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Reviewer 1
- It seems like the second layer in the model is lower dimensional than the first layer. Is there evidence of a dimensionality reduction from retina and LGN that would match this feature of the model?
- "most artificial systems are obtained through heuristics and hours of painstaking parameter tweaking." => This does not sound like a relevant comparison, because those artificial systems can solve much more complicated tasks than MNIST.

Clarity:
- The task on which the network was tested (MNIST) should be mentioned in the abstract.
- "The algorithm organizes inter-layer connections to construct a convolutional pooling layer, a key constituent of convolutional neural networks" => The term "convolutional" implies weight tying, but here you can only obtain locally connected units, without weight tying. Please change the language here and everywhere CNNs are mentioned.
- "Our work on growing artificial systems got us interested in how critical times of different developmental processes are controlled, and whether they were controlled by an internal clock." => Please share your thoughts on this interesting aspect (or remove this sentence).

Originality: The work looks original, but the authors should be more explicit about how their work differs from refs [25-29]. In particular, can you expand on what these other studies contributed?

Significance: Interesting results for neuroscience. How could this network apply to ML (beyond a slight benefit on MNIST compared to random networks)? Maybe new bio-inspired hardware?
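To make the weight-tying distinction concrete: a convolutional layer replicates one filter across all spatial locations, whereas a locally connected layer stores a separate, untied filter per location. A minimal NumPy sketch of the difference (shapes and names are illustrative, not taken from the paper under review):

```python
import numpy as np

def conv_layer(x, w):
    """x: (H, W) input; w: (k, k) single shared filter (weight tying)."""
    k = w.shape[0]
    H, W = x.shape
    out = np.empty((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w)  # same w everywhere
    return out

def locally_connected_layer(x, w):
    """x: (H, W) input; w: (H-k+1, W-k+1, k, k) distinct filter per location."""
    k = w.shape[-1]
    out = np.empty(w.shape[:2])
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w[i, j])  # untied weights
    return out
```

The self-organized units described in the paper correspond to the second case: each unit has its own receptive field, with no filter shared across the map.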
Reviewer 2
The main contributions of this paper are to propose an algorithm to learn a pooling architecture and one to grow the architecture using only self-organization principles. The developmental algorithm is evaluated on a different input geometry and on experiments with faults in the first layer. A last experiment evaluates the proposed algorithms on an MNIST classification task.

I like the originality of the work: the authors propose the principle of a growing machine that is able to yield a functional architecture from a limited set of rules. The principles to follow for building such a self-organized network are clearly exposed.

The concerns expressed below have been answered by the authors' rebuttal. I appreciate that the network is implemented with spiking neurons relying on bio-inspired hardware, as these have nice properties for processing temporal input streams. The results could be more convincing with a task relying on coincidence detection or time-structured events, like audio processing, echolocation, or event-based vision. The authors argued in the rebuttal that the inhibition scheme is limited in space, so it does not have to scale with the size of the network, which was my main concern; this is thus a more biologically feasible approach that is compatible with hardware implementation. The "flexibility" of the network allows it to address the so-called packing problem, that is, how to efficiently cover the sensor space. The authors showed in their response convincing experimental results with a hyperbolic geometry that has non-uniform density.

-- Original comments --

I could not understand why the authors rely on a spiking model for layer I and ReLU units for layer II. Sensory inputs with intrinsic noise could be modeled with neural masses or discrete neural fields. What is the benefit of spiking neurons in this contribution?

In layer I, the local-excitation and global-inhibition scheme is encountered in the literature, but it is limited by the inhibition range. Biological observations are seldom in favor of global inhibition, except in organisms with a limited brain size (mainly insects), as a truly global inhibition for all neurons requires a complex network of connections.

In my opinion, the proposed self-organization for arbitrary input-layer geometry of Sect. 4 is not convincing enough, mainly because the tiling of the input layer is uniform. I think that variation in the density of input sensors could better demonstrate the approach, for example following a hyperbolic distribution (like the Poincaré disk).

In Sect. 6, I find that the random networks perform very well on MNIST classification; how does a network with random connectivity fed by spiking neurons reach around 88% accuracy?

As a small note, I did not find the movie of Fig. 2 in the supplementary material.

This approach is highly original and clearly exposed. The quality and significance are less clear, as this approach is difficult to compare to existing methods.
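For readers unfamiliar with the local-excitation / global-inhibition scheme discussed above, here is a minimal rate-based sketch of the generic soft winner-take-all dynamics the review refers to; the parameters (sigma, w_inh, dt, tau) and the Euler update are illustrative assumptions, not the paper's exact model:

```python
import numpy as np

def lateral_step(r, positions, sigma=1.0, w_inh=0.5, dt=0.1, tau=1.0):
    """One Euler step of rate dynamics: Gaussian excitation from nearby
    units plus a uniform inhibitory term driven by the total activity."""
    # pairwise squared distances between unit positions, shape (N, N)
    d2 = np.sum((positions[:, None, :] - positions[None, :, :]) ** 2, axis=-1)
    w_exc = np.exp(-d2 / (2 * sigma ** 2))      # local excitation kernel
    drive = w_exc @ r - w_inh * r.sum()         # global inhibition term
    return r + dt / tau * (-r + np.maximum(drive, 0.0))
```

With `positions` on a 2-D grid, iterating `lateral_step` yields a localized bump of activity. The scaling concern raised above is visible in the `r.sum()` term: implemented literally, it requires every unit to receive input from every other unit.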
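The suggested non-uniform tiling can also be made concrete with a small sampler that draws sensor positions uniformly with respect to the hyperbolic area measure of the Poincaré disk, so that density grows toward the rim. The inverse-CDF form and the `r_max` cutoff are assumptions of this sketch, not taken from the paper or the rebuttal:

```python
import numpy as np

def poincare_disk_sensors(n, r_max=0.95, seed=0):
    """Sample n points in the unit disk, uniform w.r.t. the hyperbolic
    area element 4 r / (1 - r^2)^2 dr dtheta, truncated at r_max."""
    rng = np.random.default_rng(seed)
    u = rng.random(n)
    c = 1.0 / (1.0 - r_max ** 2) - 1.0       # normalizing constant
    r = np.sqrt(1.0 - 1.0 / (1.0 + u * c))   # inverse CDF in radius
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)
```

Feeding such positions to the developmental algorithm would test whether the self-organization still tiles the sensor space efficiently when the input density is strongly non-uniform.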
Reviewer 3
[Update after author's reply:] The authors have produced a strong response, with important additional results. I am switching my assessment to "moderate accept", UNDER the condition that the authors include these new results in the paper AND replace occurrences of "convolutional" with "retinotopic".

[Original review below:]

First, the authors oversell their results considerably, claiming to observe the emergence of "convolutional" and "pooling" cells in the sense of convolutional neural networks, which would indeed be an interesting result. However, the learned receptive fields are not "convolutional": there is no feature-specific filter being replicated across the map. Indeed, there isn't any feature selectivity at all! They cannot be called "pooling" either, unless the term is extended to apply to any receptive field whatsoever. The proper term the authors are looking for is "retinotopy": the neurons learn to restrict their inputs to a specific topological neighbourhood, i.e., they become retinotopic.

This is considerably below the state of the art from the 90s, when various authors demonstrated the emergence of feature-selective neurons, with orientation maps and higher-level selectivity. I encourage the authors to consult the work of Poggio (Riesenhuber and Poggio, Serre and Poggio, Masquelier and Poggio) and Rolls (especially Treves & Rolls, Stringer & Rolls). Miikkulainen is cited, but not for his work on the highly relevant LISSOM map-learning system. All of these systems showed selectivity for features of varying complexity, certainly much more complex than pure spatial location (i.e., retinotopy) as shown here. (Note: I am not affiliated with any of these authors.)

The text is otherwise clear and well-written and seems technically sound.

Originality: 1/5
Quality: 4/5
Clarity: 4/5
Significance: 1/5

My overall score is 4 rather than 3, in order to encourage the authors to work more on this project and improve it for future re-submission (possibly at a workshop). See "Improvements".