NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 5999
Title: Gaussian-Based Pooling for Convolutional Neural Networks

Reviewer 1

Originality. This paper views existing pooling methods as convex combinations of local feature activations. Based on this model, the authors clearly explain how the proposed Gaussian pooling differs from existing pooling methods. The paper also proposes to modify the Gaussian distribution so that the pooling value becomes larger than the mean, motivated by prior knowledge about local pooling. To the best of my knowledge, such pooling functions are novel.

Quality. The quality of this paper is good. The proposed algorithm is reasonable and technically sound. The experiments are conducted on large-scale datasets and compared with related pooling methods. Minor problem: the results of stochastic pooling [32] are missing in Table 3 (c) and (d).

Clarity. This paper is clearly written except for some points described below. More explanation of the inverse softplus function and the iSP-Gaussian distribution would improve clarity. In particular, why the term exp(x)/(exp(x) - 1) appears in Eq. (10) is not clear enough; it would be better to explain how Eq. (10) is derived. In the experimental section, it is not clear to which layers the proposed method is applied. According to the discussion of global pooling (line 219-), it seems that pool1 and/or pool2 of Table (a) are used in the previous comparisons, but there is no explanation. The fact that this paper focuses only on local pooling is not made clear until this section.

Significance. The new pooling method is useful for improving recognition accuracy on various recognition problems. This paper proposes a novel form of pooling by modifying the parameters of a Gaussian distribution, which is shown to be more effective than state-of-the-art pooling methods on a large-scale dataset. Thus, other researchers or practitioners can use the proposed method in any CNN-based algorithm.
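To make the convex-combination view of pooling and the inverse softplus concrete, here is a minimal sketch (my own illustration, not the authors' code): average and max pooling are convex combinations with particular weight choices, and the inverse softplus log(exp(x) - 1) undoes softplus(y) = log(1 + exp(y)).

```python
import numpy as np

def pool_convex(x, w):
    """Pool a local region as a convex combination of its activations:
    weights are non-negative and sum to one."""
    w = np.asarray(w, dtype=float)
    assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)
    return float(np.dot(w, x))

x = np.array([0.2, 1.5, 0.7, 0.9])      # activations in a 2x2 pooling region

avg = pool_convex(x, np.full(4, 0.25))  # average pooling: uniform weights
w = np.zeros(4); w[np.argmax(x)] = 1.0
mx = pool_convex(x, w)                  # max pooling: all weight on the argmax

# Inverse softplus: since softplus(y) = log(1 + exp(y)), its inverse is
# log(exp(x) - 1); expm1/log1p keep the computation stable near zero.
def inv_softplus(x):
    return np.log(np.expm1(x))
```

Intermediate pooling methods then correspond to weightings between these two extremes.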

Reviewer 2

Updates: I appreciate the additional experiments and clarifications in the rebuttal. I think this is a good paper and would like to increase the rating.

Overall, the paper is clearly written and easy to follow. It proposes an interesting novel approach to pooling that leads to favorable gains in performance. Although the core mechanism, i.e., estimating pooling parameters using global features, comes from the previous GFGP method [1], I think connecting it to probabilistic models is not trivial and can be regarded as a satisfactory technical contribution.

My biggest concern is about its practical usefulness. The paper says that the proposed pooling method requires additional O(C^2) parameters, which are not negligible. For example, one could simply use more convolution filters with average/max pooling to improve performance. It would be more convincing if the authors could compare methods with roughly the same number of parameters, to show that the improvement is not merely due to the increase in model capacity from adding parameters.

I would like to know more details of the derivation of the approximation in Eq. (15), particularly how the fixed numbers are derived. Also, Eq. (16) is confusing because it only applies to the case sigma_0 = 0, not in general. The authors mention the log-Gaussian as a possible alternative to the iSP-Gaussian, but it is not reported in the experiments. Was it impossible to train the log-Gaussian-based model because of instability?
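A back-of-the-envelope accounting of the O(C^2) concern (an illustrative sketch under my own assumptions, not the paper's exact architecture): suppose the pooling module predicts two per-channel parameters from a global feature via C-to-C fully connected maps, and compare that with the cost of one additional 3x3 convolution layer.

```python
def pooling_extra_params(C, n_params=2):
    """Hypothetical accounting: one C->C fully connected map per estimated
    per-channel pooling parameter (e.g. mu_0 and sigma_0)."""
    return n_params * C * C

def conv3x3_params(C_in, C_out):
    """Parameters of a 3x3 convolution layer, ignoring bias."""
    return 3 * 3 * C_in * C_out

C = 256
extra = pooling_extra_params(C)  # 2 * 256^2 = 131072 extra parameters
conv = conv3x3_params(C, C)      # 9 * 256^2 = 589824 for one more conv layer
```

Under these assumptions the pooling overhead is a fraction of one extra conv layer per stage, which is why a capacity-matched comparison would settle the question.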

Reviewer 3

Originality: The proposed pooling function is novel. The authors show the results of their pooling function with several network architectures on several datasets, including ImageNet.

Quality: The submission is technically sound. The authors show that their pooling function outperforms existing pooling functions such as max and average pooling. Section 3.3 shows how the estimated parameters \mu_0 and \sigma_0 of the pooling function change during training.

Clarity: The paper is well written and well organized.

Significance: The experiments show that the proposed pooling function is better than standard pooling functions such as max and average pooling. The improvement is about 1% on the ImageNet dataset.

---------------------------------------
The rebuttal addresses my concerns and I think it is a good paper.