Paper ID: 1892

Title: The Product Cut

Quick Summary

The authors propose a new objective function for clustering networks, which they call the product cut. It is defined via a personalized PageRank matrix (although a larger class of matrices seems possible, and this choice appears incidental). The key idea is to take a product of per-partition properties rather than a simple sum of graph-based metrics of the partition, as in many standard methods. One of the complexities involves using the personalized PageRank matrix to define the objective. This objective is shown to satisfy a simple analytical property: it preserves the partitions in some region around a sample problem with two clear clusters, with a third cluster acting as the perturbation. Next, in a small surprise, the problem is shown to be solvable via a relaxed convex _maximization_ problem (this gives no computational efficiency in itself, but it does enable more practical algorithms). The relaxed problem has integer solutions at any extremum, which motivates a heuristic LP procedure to optimize the function. A set of experimental results reports the standard comparisons on a few test problems in terms of "purity". It would be nice to see a few more evaluation measures (such as NMI or F1) as well.

Small issues & questions

1. One issue with this paper relates to the convergence of the optimization strategy for convex maximization. The results in the paper justify neither convergence to a local optimum nor convergence of the clustering at all: if f(x_{k+1}) >= f(x_k), we cannot conclude that a local maximum is found (even with an upper bound), nor does this give convergence of the sequence x_k. Were you using additional properties in your statement of this conclusion?

2. In your simple example, wouldn't a clustering into A, B, C also be appropriate? You seem to show that introducing a good-conductance cluster (C) breaks the NCut objective for bipartition. But this isn't surprising, as you are asking for a bisection of the graph rather than the trisection that NCut would more naturally produce. Can you clarify what your method would do if you asked for a tripartition in this case?

Suggestions

There should be a broader class of matrices than the personalized PageRank matrices that would support your procedure. (For instance, at least the heat kernel would seem to work as well.) Some parameter study of alpha would be nice. Since this is really a paper about a new objective function, it would be nice to see more motivation for it.

Presentation notes

* Given that the objective function is first defined with the most important term missing, it is hard to appreciate its properties.
* Typo in the caption of Figure 1a.
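To make the "broader class of matrices" suggestion concrete, here is a minimal sketch of the personalized PageRank matrix under its standard definition, Gamma_alpha = alpha * (I - (1 - alpha) * W * D^{-1})^{-1}. This is an assumption about the paper's convention (the exact normalization and the role of alpha may differ there); the graph and the value alpha = 0.15 are purely illustrative.

```python
import numpy as np

# Hypothetical 6-node graph: two triangles joined by a single edge.
W = np.zeros((6, 6))
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
for i, j in edges:
    W[i, j] = W[j, i] = 1.0

D_inv = np.diag(1.0 / W.sum(axis=1))
P = W @ D_inv          # column-stochastic random-walk matrix
alpha = 0.15           # teleportation parameter (illustrative choice)

# Personalized PageRank matrix: column j is the PPR vector seeded at node j.
Gamma = alpha * np.linalg.inv(np.eye(6) - (1 - alpha) * P)

# For a connected graph, every entry is strictly positive and each column
# sums to 1 -- the density is what keeps log-terms of the objective finite.
assert np.allclose(Gamma.sum(axis=0), 1.0)
assert (Gamma > 0).all()
```

A heat-kernel analogue would replace the resolvent by a matrix exponential of the random-walk Laplacian, which is likewise dense and column-stochastic for connected graphs, supporting the reviewer's point.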

This paper has some issues that would need to be addressed, related to the use of the final optimization procedure. If these could be corrected, the rest of the paper makes a small contribution in the context of a new objective function. Adding new objective functions to the literature is a lot like adding new standards: it's unclear whether they will be useful, but this one seems to have a few nice properties that would merit acceptance.

3-Expert (read the paper in detail, know the area, quite certain of my opinion)

The authors propose a new method for partitioning a given graph. The related optimization problem parallels the classical Normalized Cut optimization in terms of its underlying notion of cluster. They show how to cast this problem as a convex maximization program, which leads to an exact continuous relaxation of the discrete problem. With this formulation they develop a monotonic algorithm for solving the optimization problem via a sequence of linear programs, and introduce a randomized version of this strategy that leads to a simple algorithm. The authors carry out an experimental evaluation of the proposed method, showing its performance in terms of cluster purity on a variety of real-world data sets.
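The "monotonic algorithm via a sequence of linear programs" follows the generic linearize-and-maximize pattern for convex maximization over a polytope: maximize the gradient linearization at the current iterate, which by convexity cannot decrease the objective and always lands on a vertex. Below is a sketch of that generic pattern on a toy convex quadratic over the unit box, not the paper's actual objective or feasible set; note that monotonicity alone, as reviewer 1 points out, does not by itself guarantee convergence to a local optimum.

```python
import numpy as np

c = np.array([0.3, 0.7, 0.5])      # center of the toy convex objective

def f(x):
    # Convex toy objective (not the product cut): f(x) = ||x - c||^2.
    return float(np.sum((x - c) ** 2))

x = np.array([0.5, 0.5, 0.5])      # interior starting point
values = [f(x)]
for _ in range(10):
    grad = 2 * (x - c)
    # Maximizing the linearization grad . y over the box [0,1]^3 is an LP
    # whose optimum is the vertex y_i = 1 if grad_i > 0, else 0.
    y = (grad > 0).astype(float)
    if f(y) <= f(x):               # no strict ascent -> stop
        break
    x = y
    values.append(f(x))

# Convexity gives f(y) >= f(x) + grad . (y - x) >= f(x): monotone ascent,
# and every iterate after the first is a 0/1 vertex, mirroring the claim
# that extrema of the exact relaxation are integral.
assert all(b >= a for a, b in zip(values, values[1:]))
```

The monotone sequence here stops at a vertex after one step; on the paper's objective the LPs are over the partition polytope, but the ascent argument is the same.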

The paper is clearly written and well organized. The problem formalisation is interesting and, as far as I know, novel. The comparison with the classical Normalized Cut is meaningful, although I would have liked to see some additional analysis examples considering inputs a bit more significant/complex than a collection of isolated vertices, which might appear to be a "pathological input case". The usefulness of this partitioning method is nonetheless quite clear, as are its superior stability properties and performance. The possibility of casting the original minimization problem as a convex maximization program is very interesting, and it plays a crucial role in solving the original problem. The experimental evaluation of this method is carried out carefully and shows very good algorithmic performance in terms of cluster purity on a variety of real-world data sets. The results seem significant to me, and the methodology used seems rigorous enough. My overall evaluation is positive.

1-Less confident (might not have understood significant parts)

This paper introduces a new criterion (the product cut) for graph partitioning as an alternative to the popular normalized cut. An exact continuous relaxation is shown for the problem of minimizing the product cut (if the graph is connected). The continuous relaxation is then solved using sequential linear programming.

The authors introduce a new criterion called the product cut as an alternative to the popular normalized cut used in graph-based clustering. The main motivation seems to be enforcing stronger balance among the components of the partition. There is already a wide variety of balanced-cut criteria in the literature [Ref1] which address the imbalance problem pointed out in the paper; a comparison against these criteria is needed to analyze the advantages of the product cut. It is a bit surprising that they use a very old method [14] to argue that the normalized cut criterion yields highly imbalanced partitions on real data. They should use more recent methods [Ref1, Ref2] that are shown to produce better normalized cuts. Also, as far as I understood, they use the fully dense smoothed version (\Gamma_\alpha) to minimize the normalized cut instead of the similarity matrix W usually used in the literature.

The continuous relaxation works only if the product cut is defined on a fully dense matrix (\Gamma_\alpha); otherwise the objective of the continuous relaxation is not well-defined (because of the log-term; see lines 169-170). This is a serious modelling limitation. The algorithm proposed for the continuous relaxation suffers from early termination (line 197). They circumvent this problem by first incorporating a random subset of the constraints of the continuous problem and then adding more constraints. Moreover, in order to scale to large datasets, they employ several approximations in their algorithm (e.g., for solving the two linear systems in each iteration). So it is not clear how effectively the overall algorithm minimizes the product cut.

Experiments

It is a bit strange that they report the average purity over 500 different runs for the product cut. They should report the purity of the best initialization (found according to the product cut criterion). Comparisons against recent methods [Ref1, Ref2] that minimize the normalized cut are clearly missing from the experiments. Moreover, one needs a comparison against the other balanced-cut criteria given in [Ref1] to see the benefits of the product cut. The NMFR method [13] achieves almost the same clustering purity values as the product cut. It would be interesting to see the product cut values achieved by the partitions found by NMFR (and by the other methods as well, although they do not directly minimize it); this would show the effectiveness of the proposed method in minimizing the product cut.

Ref1: M. Hein and S. Setzer. Beyond Spectral Clustering - Tight Relaxations of Balanced Graph Cuts. NIPS 2011.
Ref2: S. Rangapuram, P. K. Mudrakarta and M. Hein. Tight continuous relaxation of the balanced k-cut problem. NIPS 2014.

3-Expert (read the paper in detail, know the area, quite certain of my opinion)

This paper introduces the notion of a Product Cut in a graph. The Product Cut objective for graph clustering is argued to be more natural and stable than existing objectives. The paper proposes an algorithm for finding partitions with a good product cut. The experiments show that the algorithm generates better clusters than existing methods.

The motivation for the product cut makes sense to me, and the paper is well-written. I did not verify all of the details of the proofs and of the algorithm for approximately maximizing the product cut. The experimental results for cluster purity are promising, showing that PCut performs favorably. The experiments on running time (Table 2) should be better explained: the experimental setup is not defined. What machine was used, and in what language are the algorithms implemented? There is also no discussion of why PCut runs faster than NMFR, and no comparison in terms of running time against the other methods.

Other comments:

* Is the problem NP-hard, like many existing objective functions? It would be helpful to have some discussion of computational complexity (even in the conclusion if it has not been worked out yet).
* Caption of Figure 1a: C should be in red?
* Lines 111-115: Are you restricting the set of vertices in B that are connected to C to be of size n_0? Then each vertex in C will connect to u_0*k_0 vertices in that set of n_0 vertices? If this is the construction, it should be stated that u_0*k_0 <= n_0.
* Line 119: It seems that the denominator of the conductance should be min(vol(A), vol(A^C)).
* Line 146: What program is being executed?
* Line 152: What is Table 2.1? It should probably be Figure 2.
* Line 209: close -> closed
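For reference, the conductance convention the comment on line 119 appeals to is cut(A) / min(vol(A), vol(A^c)). A small sketch of that definition on a hypothetical toy graph (two triangles joined by one edge); this is the standard convention, which may or may not match the paper's:

```python
import numpy as np

def conductance(W, A):
    """Conductance of vertex set A: cut(A) / min(vol(A), vol(A^c))."""
    mask = np.zeros(W.shape[0], dtype=bool)
    mask[list(A)] = True
    cut = W[mask][:, ~mask].sum()      # weight of edges leaving A
    vol_A = W[mask].sum()              # sum of degrees inside A
    vol_Ac = W[~mask].sum()            # sum of degrees outside A
    return cut / min(vol_A, vol_Ac)

# Illustrative graph: two triangles joined by the single edge (2, 3).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

phi = conductance(W, {0, 1, 2})
# cut = 1, vol(A) = vol(A^c) = 7, so phi = 1/7.
assert abs(phi - 1 / 7) < 1e-12
```

Without the min in the denominator, the measure is not symmetric in A and its complement, which is presumably the issue the comment flags.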

2-Confident (read it all; understood it all reasonably well)

This paper presents a clustering method. Compared with the normalized cut, the proposed model uses a multiplicative cut-based objective function to find a more balanced graph partition. A theoretical proof of the validity of the model is provided.

How to obtain a balanced partition is an interesting and important problem in clustering. However, this paper does not provide the necessary comparison with previous efforts on balanced partitions, such as [1]. Moreover, the presentation of the paper is not clear. In my opinion, the introduction section should be enriched and restructured; for now, readers might get confused in Section 1 before they proceed to the following sections.

Some suggestions on revising the paper:

1. I'd suggest the authors add a comparison with previous works on balanced partitions.
2. The introduction section seems unclear to me. I think the necessary background is needed before proposing Eq. (1).
3. Some derivations lack explanation, e.g., how do you obtain Eqs. (3) and (10)?
4. Maybe I missed it, but what do the three rows of numbers in Fig. 2 mean?
5. What does the percentage value in Table 2 mean?

[1] C. H. Q. Ding, X. He, H. Zha, M. Gu and H. D. Simon. A min-max cut algorithm for graph partitioning and data clustering. ICDM 2001, pp. 107-114.

2-Confident (read it all; understood it all reasonably well)

The paper under review proposes a new measure for multi-way graph partitioning problems, the product cut, and further proposes an algorithm for the product cut problem. The authors claim that the proposed algorithm performs significantly better than existing algorithms.

The problem proposed in the paper is very interesting, and the paper discusses some insights about clustering that are new to the best of my knowledge. The overall presentation is very good, and the paper is easy for a non-expert to follow.

Detailed comments:

- The runtime analysis of the algorithm is missing; a lemma summarising the algorithm's runtime and performance is necessary. Also, on Line 235 you state that your algorithm typically runs significantly faster than other algorithms. Is this statement based on experimental observation? I think a more rigorous analysis is needed for such a claim.
- You use purity to measure the performance of the algorithms. Some discussion of why you use this measure is necessary. Also, its meaning is not clear to me: does it relate to the number of misclassified nodes? If so, why does an algorithm outputting a higher purity value give a better guarantee?
- Line 224: There is a typo: "By and large".
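Regarding the question of what purity means: under its standard definition (an assumption here; the paper may define it differently), each cluster votes for its majority ground-truth label, and purity is the fraction of points covered by those majority labels, i.e., one minus the fraction misclassified under the best per-cluster label assignment. A minimal sketch:

```python
from collections import Counter

def purity(clusters, labels):
    """Standard cluster purity: each cluster is assigned its majority
    ground-truth label; purity is the fraction of points whose label
    matches their cluster's majority label."""
    n = len(labels)
    correct = 0
    for cid in set(clusters):
        members = [labels[i] for i in range(n) if clusters[i] == cid]
        correct += Counter(members).most_common(1)[0][1]
    return correct / n

# Toy example: 6 points in 2 clusters, with one point "misclassified".
clusters = [0, 0, 0, 1, 1, 1]
labels   = ['a', 'a', 'b', 'b', 'b', 'b']
assert purity(clusters, labels) == 5 / 6
```

So a higher purity does correspond to fewer misclassified nodes under the majority-label assignment, though it says nothing about balance (a trivial all-singleton clustering attains purity 1), which may be why the reviewer asks for justification.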

2-Confident (read it all; understood it all reasonably well)