{"title": "Probabilistic Watershed: Sampling all spanning forests for seeded segmentation and semi-supervised learning", "book": "Advances in Neural Information Processing Systems", "page_first": 2780, "page_last": 2791, "abstract": "The seeded Watershed algorithm / minimax semi-supervised learning on a graph computes a minimum spanning forest which connects every pixel / unlabeled node to a seed / labeled node. We propose instead to consider all possible spanning forests and calculate, for every node, the probability of sampling a forest connecting a certain seed with that node. We dub this approach \"Probabilistic Watershed\". Leo Grady (2006) already noted its equivalence to the Random Walker / Harmonic energy minimization. We here give a simpler proof of this equivalence and establish the computational feasibility of the Probabilistic Watershed with Kirchhoff's matrix tree theorem. Furthermore, we show a new connection between the Random Walker probabilities and the triangle inequality of the effective resistance. Finally, we derive a new and intuitive interpretation of the Power Watershed.", "full_text": "Probabilistic Watershed:\n\nSampling all spanning forests\n\nfor seeded segmentation and semi-supervised learning\n\nEnrique Fita Sanmart\u00edn,\n\nSebastian Damrich,\n\nFred A. Hamprecht\n\nHCI/IWR at Heidelberg University, 69115 Heidelberg, Germany\n\n{fita@stud, sebastian.damrich@iwr, fred.hamprecht@iwr}.uni-heidelberg.de\n\nAbstract\n\nThe seeded Watershed algorithm / minimax semi-supervised learning on a graph\ncomputes a minimum spanning forest which connects every pixel / unlabeled node\nto a seed / labeled node. We propose instead to consider all possible spanning\nforests and calculate, for every node, the probability of sampling a forest connecting\na certain seed with that node. We dub this approach \"Probabilistic Watershed\".\nLeo Grady (2006) already noted its equivalence to the Random Walker / Harmonic\nenergy minimization. 
We here give a simpler proof of this equivalence and establish\nthe computational feasibility of the Probabilistic Watershed with Kirchhoff\u2019s matrix\ntree theorem. Furthermore, we show a new connection between the Random Walker\nprobabilities and the triangle inequality of the effective resistance. Finally, we\nderive a new and intuitive interpretation of the Power Watershed.\n\n1\n\nIntroduction\n\nSeeded segmentation in computer vision and graph-based semi-supervised machine learning are\nessentially the same problem. In both, a popular paradigm is the following: given many unlabeled\npixels / nodes in a graph as well as a few seeds / labeled nodes, compute a distance from a given\nquery pixel / node to all of the seeds, and assign the query to a class based on the shortest distance.\nThere is obviously a large selection of distances to choose from, and popular choices include: i) the\nshortest path distance (e.g. [19]), ii) the commute distance (e.g. [47, 46, 5, 26]) or iii) the bottleneck\nshortest path distance (e.g. [28, 12]). Thanks to its matroid property, the latter can be computed\nvery ef\ufb01ciently \u2013 a greedy algorithm \ufb01nds the global optimum \u2013 and is thus widely studied and\nused in different \ufb01elds under names including widest, minimax, maximum capacity, topographic and\nwatershed path distance. In computer vision, the corresponding algorithm known as \u201cWatershed\u201d is\npopular in seeded segmentation not only because it is so ef\ufb01cient [13] but also because it works well\nin a broad range of problems [45, 3], is well understood theoretically [17, 1], and unlike Markov\nRandom Fields induces no shrinkage bias [4]. Even though the Watershed\u2019s optimization problem\ncan be solved ef\ufb01ciently, it is combinatorial in nature. One consequence is the \u201cwinner-takes-all\u201d\ncharacteristic of its solutions: a pixel or node is always unequivocally assigned to a single seed. 
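The greedy / matroid property mentioned above is easy to make concrete: seeded Watershed labeling under the bottleneck (minimax) path distance amounts to a Kruskal-style sweep that processes edges by increasing cost and grows the seed components, never merging two components that carry different seeds. A minimal sketch; the chain graph, costs and seed labels are assumed toy values, not taken from the paper:

```python
# Seeded Watershed / minimax labeling by a greedy Kruskal-style pass.
# Edges are processed in order of increasing cost; an edge that would merge
# two differently-seeded components is cut (it belongs to the mSF-cut).

def seeded_watershed(n, edges, seeds):
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    label = dict(seeds)  # component root -> seed label (seeds start as their own roots)
    for u, v, cost in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        lu, lv = label.get(ru), label.get(rv)
        if lu is not None and lv is not None and lu != lv:
            continue  # would connect two seeds: this edge is cut
        parent[ru] = rv
        if lv is None and lu is not None:
            label[rv] = lu  # propagate the seed label to the merged component
    return [label.get(find(x)) for x in range(n)]

# Chain graph with one costly edge in the middle; seeds at both ends.
edges = [(0, 1, 0.1), (1, 2, 0.2), (2, 3, 0.9), (3, 4, 0.2), (4, 5, 0.1)]
labels = seeded_watershed(6, edges, {0: 'A', 5: 'B'})
# labels == ['A', 'A', 'A', 'B', 'B', 'B']: the cut falls on the costliest edge.
```

On this toy chain the greedy pass finds the global optimum, which is the "winner-takes-all" behaviour the following paragraphs set out to soften.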
Given\nsuitable graph edge-weights, this solution is often but not always correct, see Figures 1 and 2.\nIntrigued by the value of the Watershed to many computer vision pipelines, we have sought to\nentropy-regularize the combinatorial problem to make it more amenable to end-to-end learning in\nmodern pipelines. Exploiting the equivalence of Watershed segmentations to minimum cost spanning\nforests, we hence set out from the following question: Is it possible to compute not just the minimum,\nbut all (!) possible spanning forests, and to compute, in closed form, the probability that a pixel of\ninterest is assigned to one of the seeds? More specifically, we envisaged a Gibbs distribution over the\nexponentially many distinct forests that span an undirected graph with edge-costs, where each forest\nis assigned a probability that decreases with increasing sum of the edge-costs in that forest.\nIf computed naively, this would be an intractable problem for all but the smallest graphs. However,\nwe show here that a closed-form solution can be found by resorting to Kirchhoff's matrix tree\ntheorem, and is given by the solution of the Dirichlet problem associated with commute distances\n[47, 46, 5, 26]. Leo Grady mentioned this connection in [26, 27] and based his argument on potential\ntheory, using results from [8]. Our informal poll amongst experts from both computer vision and\nmachine learning indicated that this connection has remained mostly unknown.\n\n1 Figures 1 and 2 were produced with the code at https://github.com/hci-unihd/Probabilistic_Watershed\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n[Figure 1 graphics: a toy graph with edge-costs, seeds s1, s2 and query node q; a histogram of the costs of all 288 spanning 2-forests; the mSF and three further low-cost forests; and the resulting per-node probabilities of being assigned to s2.]\n\nFigure 1: The Probabilistic Watershed computes the expected seed assignment of every node for\na Gibbs distribution over all exponentially many spanning forests in closed-form. It thus avoids\nthe winner-takes-all behaviour of the Watershed. (Top right) Graph with edge-costs and two seeds.\n(Bottom left) The minimum spanning forest (mSF) and other, higher cost forests. The Watershed\nselects the mSF, which assigns the query node q to seed s1. Other forests of low cost might however\ninduce different segmentations. The dashed lines indicate the cut of the segmentations. For instance,\nthe other depicted forests connect q to s2. (Top left) We therefore consider a Gibbs distribution over\nall spanning forests with respect to their cost (see equation (5), \u00b5 = 1). Each green bar corresponds\nto the cost of one of the 288 possible spanning forests. (Bottom right) Probabilistic Watershed\nprobabilities for assigning a node to s2. Query q is now assigned to s2. Considering a distribution\nover all spanning forests gives an uncertainty measure and can yield a segmentation different from\nthe mSF's. In contrast to the 288 forests in this toy graph, for the real-life image in Figure 2 one\nwould have to consider at least 10^11847 spanning forests separating the 13 seeds (see appendix G), a\nfeat impossible without the matrix tree theorem.
We hence offer a completely self-contained (except for the matrix tree theorem) and hopefully simpler proof.\nIn this entirely conceptual work, we\n\u2022 give a proof, using elementary graph constructions and building on the matrix tree theorem, that\nshows how to compute analytically the probability that a graph node is assigned to a particular\nseed in an ensemble of Gibbs distributed spanning forests (Section 3).\n\u2022 establish equivalence to the algorithm known as Random Walker in computer vision [26] and as\nLaplacian Regularized Least Squares and under other names in transductive machine learning\n[47, 46, 5]. In particular, we relate, for the first time, the probability of assigning a query node to a\nseed to the triangle inequality of the effective resistance between seeds and query (Section 4).\n\u2022 give a new interpretation of the so-called Power Watershed [15] (Section 5).\n\n1.1 Related work\n\nWatershed as a segmentation algorithm was first introduced in [6]. Since then it has been studied\nfrom different points of view [7, 16], notably as a minimum spanning forest that separates the seeds [17].\n\n(a) Image with seeds  (b) Watershed  (c) Probabilistic Watershed  (d) Uncertainty\n\nFigure 2: The Probabilistic Watershed profits from using all spanning forests instead of only the\nminimum cost one. (2a) Crop of a CREMI image [18] with marked seeds. (2b) and (2c) show results\nof Watershed and multiple seed Probabilistic Watershed (end of section 3) applied to edge-weights\nfrom [11]. (2d) shows the entropy of the label probabilities of the Probabilistic Watershed (white high,\nblack low). The Watershed errs in an area where the Probabilistic Watershed expresses uncertainty\nbut is correct.\n\nThe Random Walker [26, 46, 47, 5] calculates the probability that a random walker starting at a\nquery node reaches a certain seed before the other ones. 
Both algorithms are related in [15] by a limit\nconsideration termed Power Watershed algorithm. In this work, we establish a different link between\nthe Watershed and the Random Walker. The Watershed\u2019s and Random Walker\u2019s recent combination\nwith deep learning [45, 43, 11] also connects our Probabilistic Watershed to deep learning.\nRelated to our work by name though not in substance is the \"Stochastic Watershed\" [2, 34], which\nsamples different instances of seeds and calculates a probability distribution over segmentation\nboundaries. Instead, in [38] the authors suggest sampling the edge-costs in order to de\ufb01ne an\nuncertainty measure of the labeling. They show that it is NP-hard to calculate the probability that\na node is assigned to a seed if the edge-costs are stochastic. We derive a closed-form formula\nfor this probability for non-stochastic costs by sampling spanning forests. Ensemble Watersheds\nproposed by [12] samples part of the seeds and part of the features which determine the edge-costs.\nIntroducing stochasticity to distance transforms makes a subsequent Watershed segmentation more\nrobust to noise [36]. Minimum spanning trees are also applied in optimum-path forest learning, where\ncon\ufb01dence measures can be computed [21, 22]. Similar to our forest distribution, [30] considers a\nGibbs distribution over shortest paths. This approach is extended to more general bags-of-paths in\n[24].\nEntropic regularization has been used most successfully in optimal transport [20] to smooth the\ncombinatorial optimization problem and hence afford end-to-end learning in conjunction with deep\nnetworks [35]. Similarly, we smooth the combinatorial minimum spanning forest problem by\nconsidering a Gibbs distribution over all spanning forests.\nThe matrix tree theorem (MTT) plays a crucial role in our theory, permitting us to measure the weight\nof a set of forests. 
The MTT is applied in machine learning [31], biology [40] and network analysis\n[39, 41]. The matrix forest theorem (MFT), a generalization of the MTT, is applied in [14, 37]. By\nmeans of the MFT, a distance on the graph is defined in [14]. In a similar manner as we do with the\nMTT, [37] is able to compute a Gibbs distribution of forests using the MFT.\nSome of the theoretical results of our work are mentioned in [26, 27], where they refer to [8]. In\ncontrast to [26], we emphasize the relation with the Watershed and develop the theory in a simpler\nand more direct way.\n\n2 Background\n\n2.1 Notation and terminology\n\nLet G = (V, E, w, c) be a graph where V denotes the set of nodes, E the set of edges and w and\nc are functions that assign a weight w(e) \u2208 R_{\u22650} and a cost c(e) \u2208 R to each edge e \u2208 E. All the\ngraphs G considered will be connected and undirected. When we speak of a multigraph, we allow for\nmultiple edges incident to the same two nodes but not for self-loops. We will consider simple graphs\nunless stated otherwise.\nThe Laplacian of a graph, L \u2208 R^{|V|\u00d7|V|}, is defined as\n\nL_{uv} := -w({u, v}) if u \u2260 v, and L_{uu} := \u2211_{k\u2208V} w({u, k}),\n\nwhere we consider w({u, v}) = 0 if {u, v} \u2209 E. L^+ will denote its pseudo-inverse.\nWe define the weight of a graph as the product of the weights of all its edges, w(G) = \u220f_{e\u2208E} w(e). In a similar\nmanner, we define the cost of a graph as the sum of the costs of all its edges, c(G) = \u2211_{e\u2208E} c(e).\nThe weight of a set of graphs, w({G_i}^n_{i=0}), is the sum of the weights of the graphs.\nThe set of spanning trees of G will be denoted by T. Given a tree t \u2208 T and nodes u, v \u2208 V, the\nset of edges on the unique path between u and v in t will be denoted by P_t(u, v). By F^v_u we denote\nthe set of 2-trees spanning forests, i.e. spanning forests with two trees, such that u and v are not\nconnected. Furthermore, if we consider a third node q, we define F^v_{u,q} := F^v_u \u2229 F^v_q, i.e. all 2-trees\nspanning forests such that q and u are in one tree and v belongs to the other tree. Note that the sets\nF^v_{u,q} (= F^v_{q,u}) and F^u_{v,q} (= F^u_{q,v}) form a partition of F^v_u (= F^u_v), since q must be connected either\nto u or v, but not to both. In order to shorten the notation we will refer to 2-trees spanning forests\nsimply as 2-forests.\nWe consider w(e) = exp(-\u00b5c(e)), \u00b5 \u2265 0, as will be motivated in Section 3.1 by the definition of a\nGibbs distribution over the 2-forests in F^v_u. Thus, a low edge-cost corresponds to a large edge-weight,\nand a minimum edge-cost spanning forest (mSF) is equivalent to a maximum edge-weight spanning\nforest (MSF).\n\n2.2 Seeded Watershed as minimum cost spanning forest computation\n\nLet G = (V, E, c) be a graph and c(e) be the cost of edge e. The lower the cost, the higher the affinity\nbetween the nodes incident to e. Given different seeds, a forest in the graph defines a segmentation\nover the nodes as long as each component contains a different seed. The cost of a forest, c(f), is equal\nto the sum of the costs of its edges. The Watershed algorithm calculates a minimum cost spanning\nforest, mSF, (or maximum weight, MSF) such that the seeds belong to different components [17].\n\n2.3 Matrix tree theorem\n\nIn our approach we want to take all possible 2-forests in F^v_u into account. The probability of a node\nlabel will be measured by the cumulative weight of the 2-forests connecting the node to a seed of that\nlabel. To compute the weight of a set of 2-forests we will use the matrix tree theorem (MTT), which\ncan be found e.g. in chapter 4 of [42] (see Appendix A) and has its roots in [29].\nTheorem 2.1 (MTT). For any edge-weighted multigraph G the sum of the weights of the spanning\ntrees of G, w(T), is equal to\n\nw(T) := \u2211_{t\u2208T} w(t) = \u2211_{t\u2208T} \u220f_{e\u2208E_t} w(e) = (1/|V|) det(L + (1/|V|) 11^\u22a4) = det(L^{[v]}),\n\nwhere 1 is a column vector of 1's. L^{[v]} is the matrix obtained from L after removing the row and\ncolumn corresponding to an arbitrary but fixed node v.\nThis theorem considers trees instead of 2-forests. The key idea to obtain an expression for w(F^v_u) by\nmeans of the MTT is that any 2-forest f \u2208 F^v_u can be transformed into a tree by adding an artificial\nedge \u0113 = {u, v} which connects the two components of f (as done in section 9 of [8] or in the\noriginal work of Kirchhoff [29]). We obtain the following lemma, which is proven in Appendix A.\nLemma 2.2. Let G = (V, E, w) be an undirected edge-weighted connected graph and u, v \u2208 V\narbitrary vertices.\n\na) Let \u2113^+_{ij} denote the entry ij of the pseudo-inverse of the Laplacian of G, L^+. Then we get\n\nw(F^v_u) = w(T) (\u2113^+_{uu} + \u2113^+_{vv} - 2\u2113^+_{uv}).   (1)\n\nb) Let \u2113^{-1,[r]}_{ij} denote the entry ij of the inverse of the matrix L^{[r]} (the Laplacian L after removing\nthe row and the column corresponding to node r), then\n\nw(F^v_u) = w(T) (\u2113^{-1,[r]}_{uu} + \u2113^{-1,[r]}_{vv} - 2\u2113^{-1,[r]}_{uv}) if r \u2260 u, v;\nw(F^v_u) = w(T) \u2113^{-1,[v]}_{uu} if r = v and u \u2260 v;\nw(F^v_u) = w(T) \u2113^{-1,[u]}_{vv} if r = u and u \u2260 v.   (2)\n\n2.4 Effective resistance\n\nIn electrical network theory, the circuits are also interpreted as graphs, where the weights of the edges\nare defined by the reciprocal of the resistances of the circuit. 
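Theorem 2.1 and Lemma 2.2 can be sanity-checked numerically on a small graph by comparing the determinant and pseudo-inverse formulas against brute-force enumeration of spanning trees and separating 2-forests. A minimal sketch; the edge-weights below are assumed toy values, not taken from the paper:

```python
import itertools
import numpy as np

# Toy weighted graph (weights assumed for illustration): (u, v, weight).
edges = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 3.0), (3, 0, 1.5), (0, 2, 0.5)]
n = 4

# Graph Laplacian.
L = np.zeros((n, n))
for a, b, w in edges:
    L[a, a] += w; L[b, b] += w; L[a, b] -= w; L[b, a] -= w

def acyclic_components(subset):
    """Union the edges in `subset`; return component roots, or None on a cycle."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i in subset:
        a, b, _ = edges[i]
        ra, rb = find(a), find(b)
        if ra == rb:
            return None
        parent[ra] = rb
    return [find(x) for x in range(n)]

def weight(subset):
    return float(np.prod([edges[i][2] for i in subset]))

# Matrix tree theorem: w(T) = det(L with row/column v removed), for any node v.
w_T_mtt = np.linalg.det(np.delete(np.delete(L, 0, axis=0), 0, axis=1))
# Brute force: n-1 acyclic edges on n nodes form a spanning tree.
w_T_brute = sum(weight(s) for s in itertools.combinations(range(len(edges)), n - 1)
                if acyclic_components(s) is not None)

# Lemma 2.2a: w(F^v_u) = w(T) (l+_uu + l+_vv - 2 l+_uv) via the pseudo-inverse.
u, v = 0, 2
Lp = np.linalg.pinv(L)
w_F_lemma = w_T_mtt * (Lp[u, u] + Lp[v, v] - 2 * Lp[u, v])
# Brute force: n-2 acyclic edges form a 2-forest; keep those separating u and v.
w_F_brute = 0.0
for s in itertools.combinations(range(len(edges)), n - 2):
    roots = acyclic_components(s)
    if roots is not None and roots[u] != roots[v]:
        w_F_brute += weight(s)
```

Any connected weighted graph works here; the brute-force sums and the closed-form expressions must coincide by Theorem 2.1 and Lemma 2.2.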
The effective resistance between two\nnodes u and v can be defined as r^eff_{uv} := (\u03bd_u - \u03bd_v)/I, where \u03bd_u is the potential at node u and I is the\ncurrent flowing into the network. Other equivalent expressions for the effective resistance [25] in\nterms of the matrices L^+ and L^{[r]}, as defined in Lemma 2.2, are\n\nr^eff_{uv} = \u2113^+_{uu} + \u2113^+_{vv} - 2\u2113^+_{uv} = \u2113^{-1,[r]}_{uu} + \u2113^{-1,[r]}_{vv} - 2\u2113^{-1,[r]}_{uv} if r \u2260 u, v;\nr^eff_{uv} = \u2113^{-1,[v]}_{uu} if r = v and u \u2260 v;\nr^eff_{uv} = \u2113^{-1,[u]}_{vv} if r = u and u \u2260 v.   (3)\n\nWe observe that the expressions in Lemma 2.2 and in equation (3) are proportional. We will develop\nthis relation further in Section 3.2. An important property of the effective resistance is that it defines\na metric over the nodes of a graph ([23] Section 2.5.2).\n\n3 Probabilistic Watershed\n\nInstead of computing the mSF, as in the Watershed algorithm, we take into account all the 2-forests\nthat separate two seeds s1 and s2 in two trees according to their costs. Since each 2-forest assigns\na query node to exactly one of the two seeds, we calculate the probability of sampling a 2-forest\nthat connects the seed with the query node. Moreover, this provides an uncertainty measure of the\nassigned label. We call this approach to semi-supervised learning \"Probabilistic Watershed\".\n\n3.1 Probability of connecting two nodes in an ensemble of 2-forests\n\nIn Section 2.1, we defined the cost of a forest as the cumulative cost of its edges. We assume that\nthe 2-forests f \u2208 F^{s2}_{s1} follow a probability distribution that minimizes the expected cost of a 2-forest\namong all distributions of given entropy J. 
Formally, the 2-forests are sampled from the distribution\nwhich minimizes\n\nmin_P \u2211_{f\u2208F^{s2}_{s1}} P(f) c(f),  s.t. \u2211_{f\u2208F^{s2}_{s1}} P(f) = 1 and H(P) = J,   (4)\n\nwhere H(P) is the entropy of P. The lower the entropy, the more probability mass is given to the\n2-forests of lowest cost. The minimizing distribution is the Gibbs distribution (e.g. [44] 3.2):\n\nP(f) = exp(-\u00b5c(f)) / \u2211_{f'\u2208F^{s2}_{s1}} exp(-\u00b5c(f')) = \u220f_{e\u2208E_f} exp(-\u00b5c(e)) / \u2211_{f'\u2208F^{s2}_{s1}} \u220f_{e\u2208E_{f'}} exp(-\u00b5c(e)) = w(f) / \u2211_{f'\u2208F^{s2}_{s1}} w(f'),   (5)\n\nwhere \u00b5 implicitly determines the entropy. A higher \u00b5 implies a lower entropy (see Section 5\nand Figure 1 in the appendix). According to (5), an appropriate choice for the edge-weights is\nw(e) = exp(-\u00b5c(e)). The main definition of the paper is:\nDefinition 3.1 (Probabilities of the Probabilistic Watershed). Given two seeds s1 and s2 and a\nquery node q, we define the Probabilistic Watershed's probability that q and s1 have the same label as\nthe probability of sampling a 2-forest that connects s1 and q, while separating the seeds:\n\nP(q \u223c s1) := \u2211_{f\u2208F^{s2}_{s1,q}} P(f) = \u2211_{f\u2208F^{s2}_{s1,q}} w(f) / \u2211_{f'\u2208F^{s2}_{s1}} w(f') = w(F^{s2}_{s1,q}) / w(F^{s2}_{s1}).   (6)\n\n[Figure 3 graphics: four diagrams of a graph with seeds s1, s2 and query node q; panel (a) shows a spanning tree, panels (b)-(d) show the 2-forests obtained by cutting an edge on the paths between q, s1 and s2.]\n\n(a) spanning tree t \u2208 T  (b) forest f \u2208 F^q_{s2}  (c) forest f \u2208 F^q_{s1}  (d) forest f \u2208 F^{s2}_{s1}\n\nFigure 3: Amongst all spanning forests that isolate seed s1 from s2, we want to identify the fraction\nof forests connecting s1 and q (Definition 3.1). The dashed lines represent all spanning trees. Either\ncut in (3b) yields a forest separating q from s2. The blue ones are of interest to us. Diagrams (3b) -\n(3d) correspond to the three equations in the linear system (7), which can be solved for w(F^{s2}_{s1,q}).\n\nThe Watershed algorithm computes a minimum cost 2-forest, which is the most likely 2-forest\naccording to (5), and segments the nodes by their connection to seeds in the minimum cost spanning\n2-forest. However, it does not indicate which label assignments were ambiguous, for instance due\nto the existence of other low - but not minimum - cost 2-forests. This makes it a brittle \"winner-takes-all\" approach. In contrast, the Probabilistic Watershed takes all spanning 2-forests into account\naccording to their cost (see Figure 1). The resulting assignment probability of each node provides an\nuncertainty measure. 
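Definition 3.1 can be evaluated by brute force on a toy graph: enumerate every 2-forest that separates the seeds and accumulate the Gibbs weights w(f) = exp(-\u00b5 c(f)) of those connecting q to s1. A minimal sketch; the edge-costs, seed placement and \u00b5 below are assumed illustrative values:

```python
import itertools
import math

# Toy graph with edge-costs (assumed values), two seeds and a query node.
n = 4
edges = [(0, 1, 0.5), (1, 2, 1.0), (2, 3, 0.4), (3, 0, 2.0), (0, 2, 1.5)]  # (u, v, cost)
s1, s2, q = 0, 3, 1
mu = 1.0  # entropy parameter of the Gibbs distribution, equation (5)

def components(subset):
    """Union-find over an edge subset; returns component roots, or None on a cycle."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i in subset:
        a, b, _ = edges[i]
        ra, rb = find(a), find(b)
        if ra == rb:
            return None  # cycle: not a forest
        parent[ra] = rb
    return [find(x) for x in range(n)]

# Sum Gibbs weights w(f) = exp(-mu c(f)) over all 2-forests separating s1 and s2.
num = den = 0.0
for subset in itertools.combinations(range(len(edges)), n - 2):
    roots = components(subset)
    if roots is None or roots[s1] == roots[s2]:
        continue  # keep only seed-separating 2-forests
    w = math.exp(-mu * sum(edges[i][2] for i in subset))
    den += w
    if roots[q] == roots[s1]:
        num += w  # this forest connects q with s1
P_q_s1 = num / den  # equation (6)
```

The enumeration is exponential and only feasible on tiny graphs; the point of the paper is precisely that the same quantity is available in closed form via the matrix tree theorem.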
Assigning each node to the seed for which it has the highest probability can\nyield a segmentation different from the Watershed's.\n\n3.2 Computing the probability of a query being connected to a seed\n\nIn the previous subsection, we defined the probability of a node being assigned to a seed via a Gibbs\ndistribution over all exponentially many 2-forests. Here, we show that it can be computed analytically\nusing only elementary graph constructions and the MTT (Theorem 2.1). In Lemma 2.2 we have\nstated how to calculate w(F^v_u) for any u, v \u2208 V. Applying this to F^{s2}_{s1}, F^q_{s1} and F^q_{s2}, we can compute\nw(F^{s2}_{s1,q}) and w(F^{s1}_{s2,q}) by means of a linear system. F^v_{u,q} and F^u_{v,q} form a partition of\nF^v_u for any mutually distinct nodes u, v, q, as mentioned in Section 2.1. Thus, we obtain the linear system of three equations in three unknowns:\n\nw(F^{s2}_{s1,q}) + w(F^q_{s1,s2}) = w(F^q_{s2})\nw(F^q_{s1,s2}) + w(F^{s1}_{s2,q}) = w(F^q_{s1})\nw(F^{s2}_{s1,q}) + w(F^{s1}_{s2,q}) = w(F^{s2}_{s1}).   (7)\n\nIn this paragraph, we describe an alternative way of deriving (7) by relating spanning 2-forests to\nspanning trees before we solve it in (8). This is similar to our use of the MTT for counting spanning\n2-forests instead of trees in Lemma A.4 (see Appendix A). Let t be a spanning tree of G. To create a\n2-forest f \u2208 F^{s2}_{s1} from t we need to remove an edge e in the path from s1 to s2, that is e \u2208 P_t(s1, s2).\nThis edge e must be either in P_t(q, s1) \u2229 P_t(s1, s2) or in P_t(q, s2) \u2229 P_t(s1, s2) (shown in red and blue\nrespectively in Figure 3d), as the union of P_t(s1, q) and P_t(q, s2) contains P_t(s1, s2) and removing e\nfrom t cannot pairwise separate q, s1 and s2. If we remove an edge from P_t(q, s2) \u2229 P_t(s1, s2), we\nget f \u2208 F^{s2}_{s1,q} since we are disconnecting s2 from q; otherwise f \u2208 F^{s1}_{s2,q}. Analogously, we obtain a\n2-forest in F^q_{s1} or F^q_{s2} if we remove an edge e from P_t(s1, q) or P_t(s2, q) respectively (see Figure 3).\nWhen applied to all spanning trees, we obtain the system (7).\nSolving the linear system (7) we obtain\u00b2\n\nw(F^{s2}_{s1,q}) = (w(F^q_{s2}) + w(F^{s2}_{s1}) - w(F^q_{s1})) / 2.   (8)\n\nIn consequence of equation (8) and Definition 3.1 we get the following theorem:\nTheorem 3.1. The probability that q has the same label as seed s1 is\n\nP(q \u223c s1) = (w(F^q_{s2}) + w(F^{s2}_{s1}) - w(F^q_{s1})) / (2 w(F^{s2}_{s1})).\n\n\u00b2 Section IV.B of [26] states w(F^v_{u,q}) = w(F^v_u) - w(F^q_v) for any u, v, q \u2208 V, but that formula is incorrect.\nFor instance, it does not hold for the complete graph with nodes {u, v, q} and with w(e) = 1 for all edges e,\nsince w(F^v_{u,q}) = 1 \u2260 0 = 2 - 2 = w(F^v_u) - w(F^q_v).\n\nTheorem 3.1 expresses P(q \u223c s1) in terms of weights of 2-forests, which we can compute with\nLemma 2.2, which is based on the MTT. We use this expression to relate P(q \u223c s1) to the effective\nresistance. As a result of Lemma 2.2 and equation (3), for any nodes u, v \u2208 V we have\n\nr^eff_{uv} = w(F^v_u) / w(T).   (9)\n\nThis relation has already been proven in [8] (Proposition 17.1), but in terms of the effective conductance\n(the inverse of the effective resistance). Due to r^eff_{uv} being a metric, w(F^v_u) also defines a metric over\nthe nodes of the graph. Combining (9) with Theorem 3.1, we have that the probability of q having\nseed s1's label is\n\nP(q \u223c s1) = (r^eff_{s2 q} + r^eff_{s1 s2} - r^eff_{s1 q}) / (2 r^eff_{s1 s2}).   (10)\n\nThe probability is proportional to the gap in the triangle inequality r^eff_{s1 q} \u2264 r^eff_{s1 s2} + r^eff_{s2 q}. It will be\nshown in Section 4 that the probability defined in Definition 3.1 is equal to the probability given by\nthe Random Walker [26]. Equation (10) gives an interpretation of this probability, which is new to the\nbest of our knowledge. We can see that the greater the gap in the triangle inequality, the greater is the\nprobability. Further, we get P(q \u223c s1) \u2265 P(q \u223c s2) \u21d4 r^eff_{s1 q} \u2264 r^eff_{s2 q}. This relation has already\nbeen pointed out in [26] (section IV.B) in terms of the effective conductance between two nodes, but\nnot as explicitly as in (10). We note that any metric distance on the nodes of a graph, e.g. the ones\nmentioned in the introduction, can define an assignment probability along the lines of equation (10).\nOur discussion was constrained to the case of two seeds only to ease our explanation. We can reduce\nthe case of multiple seeds per label to the two seed case by merging all nodes seeded with the same\nlabel. Similarly, the case of more than two labels can be reduced to the two label scenario by using a\none versus all strategy: We choose one label and merge the seeds of other labels into one unique seed.\nIn both cases we might introduce multiple edges between node pairs. While having formulated our\narguments for simple graphs, they are also valid for multigraphs (see Appendix A).\n\n4 Connection between the Probabilistic Watershed and the Random Walker\n\nIn this section we will show that the Random Walker of [26] is equivalent to our Probabilistic\nWatershed, both computationally and in terms of the resulting label probabilities.\nTheorem 4.1. The probability x^{s1}_q that a random walker as defined in [26] starting at node q reaches\ns1 first before reaching s2 is equal to the Probabilistic Watershed probability defined in Definition 3.1:\n\nx^{s1}_q = P(q \u223c s1).\n\nThis equivalence, which we prove in Appendix B, was pointed out by Leo Grady in [26] section\nIV.B but with a different approach. Grady relied on results from [8], where potential theory is used.\nThere it is shown that x^{s1}_q = w(F^{s2}_{s1,q}) / (r^eff_{s1 s2} w(T)). From this formula we get Theorem 4.1 by\nusing equation (9):\n\nx^{s1}_q = w(F^{s2}_{s1,q}) / (r^eff_{s1 s2} w(T)) = w(F^{s2}_{s1,q}) / w(F^{s2}_{s1}) = P(q \u223c s1).\n\nWe have proven the same statement with elementary arguments and without the main theory of [8].\nThrough the use of the MTT, we have shown that the forest-sampling point of view is computationally\nequivalent to the Random Walker, which has proven very useful in practice (see [47, 26], and recently [43, 10, 11, 32, 9]),\nmaking our method just as potent. We thus refrained from adding further experiments and instead\ninclude a new interpretation of the Power Watershed within our framework.\n\n5 Power Watershed counts minimum cost spanning forests\n\nThe objective of this section is to recall the Power Watershed [15] (see Appendix C for a summary)\nand develop a new understanding of its nature. The Power Watershed is a limit of the Random Walker\nand thus of the equivalent Probabilistic Watershed. The latter's idea of measuring the weight of\na set of 2-forests carries over nicely to the Power Watershed, where, as a limit, only the maximum\nweight / minimum cost spanning forests are considered. This section details the connection.\nLet G = (V, E, w, c) and s1, s2 \u2208 V be as before. In [15] the following objective function is\nproposed:\n\narg min_x \u2211_{e={u,v}\u2208E} (w(e))^\u03b1 (|x_u - x_v|)^\u03b2,  s.t. x_{s1} = 1, x_{s2} = 0.   (11)\n\n[Figure 4 graphics: four colorings of a grid graph with seeds s1 and s2, each with a color scale from 0.0 to 1.0.]\n\n(a) P(node \u223c s1) and P(edge \u2208 some mSF)  (b) P(node \u223c s1) and P(edge \u223c s1 | edge \u2208 some mSF)  (c) P(node \u223c s1) and P(edge \u223c s1, edge \u2208 some mSF)  (d) P(node \u223c s2) and P(edge \u223c s2, edge \u2208 some mSF)\n\nFigure 4: Power Watershed result on a grid graph with seeds s1, s2 and with random edge-costs\noutside a plateau of edges with the same cost (wide edges). By the results in Theorem 5.1, the Power\nWatershed counts mSFs. This is illustrated both with the node- and edge-colors. (4a-4d) The nodes\nare colored by their probability of belonging to seed s1 (s2), i.e. by the share of mSFs that connect a\ngiven node to s1 (s2). (4a) The edge-color indicates the share of mSFs in which the edge is present.\n(4b) The edge-color indicates the share of mSFs in which the edge is connected to seed s1 among the\nmSFs that contain the edge. (4c - 4d) The edge-color indicates the share of mSFs in which the edge is\nconnected to s1 or s2, respectively, among all mSFs. See Appendix F for a more detailed explanation.\n\nFor \u03b1 = 1 and \u03b2 = 2 it gives the Random Walker's objective function. The Power Watershed\nconsiders the limit case when \u03b1 \u2192 \u221e and \u03b2 remains finite.\nIn section 3.1 we defined the weight of an edge e as w(e) = exp(-\u00b5c(e)), where c(e) was the edge-cost\nand \u00b5 implicitly determined the entropy of the 2-forest distribution. By raising the weight of\nthe edges to \u03b1 we obtain (w(e))^\u03b1 = exp(-\u00b5\u03b1c(e)) = exp(-\u00b5_\u03b1 c(e)), where \u00b5_\u03b1 := \u00b5\u03b1. Therefore,\nwe can absorb \u03b1 into \u00b5. When \u03b1 \u2192 \u221e (and therefore \u00b5_\u03b1 \u2192 \u221e) the distribution will have the lowest\npossible entropy. As a consequence only the mSFs / MSFs are considered in the Power Watershed:\nTheorem 5.1. Given two seeds s1 and s2, let us denote the potential of node q being assigned to\nseed s1 by the Power Watershed with \u03b2 = 2 as x^PW_q. Let further w_max be max_{f\u2208F^{s2}_{s1}} w(f). Then\n\nx^PW_q = |{f \u2208 F^{s2}_{s1,q} : w(f) = w_max}| / |{f \u2208 F^{s2}_{s1} : w(f) = w_max}|.\n\nTheorem 5.1, which we prove in Appendix D, interprets the Power Watershed potentials as a ratio of\n2-forests similar to the Probabilistic Watershed. But instead of all 2-forests the Power Watershed only\nconsiders minimum cost 2-forests (equivalently maximum weight 2-forests) as they are the only ones\nthat matter after taking the limit \u00b5 \u2192 \u221e (or \u03b1 \u2192 \u221e). In other words, the Power Watershed counts\nby how many seed separating mSFs a node is connected to a seed (see Figure 5). Note that there can\nbe more than one mSF when the edge-costs are not unique.\n\n[Figure 5 graphics: a toy graph with edge-costs, its three mSFs, the resulting probabilities and the allowed Random Walker transitions for \u00b5 \u2192 \u221e.]\n\n(a) Graph  (b) mSF1  (c) mSF2  (d) mSF3  (e) P\u221e(node \u223c s2)  (f) RW reachability\n\nFigure 5: Forest-interpretation of the Power Watershed. (5a) Graph with edge-costs and its mSFs in\n(5b)-(5d). (5e) Power Watershed probabilities for assigning a node to s2. The Power Watershed\ncomputes the ratio between the mSFs connecting a node to s2 and all possible mSFs. The dashed\nlines indicate the segmentation's cut. (5f) indicates the allowed Random Walker transitions when\n\u00b5 \u2192 \u221e with headed arrows. The Random Walker interpretation of the Power Watershed breaks down\nin the limit case since a Random Walker starting at node q does not reach any seed, but oscillates\nalong the bold arrow.\n\n
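Theorem 5.1 can be illustrated by brute force on a toy graph with non-unique edge-costs: enumerate the seed-separating 2-forests, keep the minimum-cost ones, and report the share that connects q to s1. The graph and costs below are assumed values chosen so that two minimum-cost separating 2-forests exist:

```python
import itertools

# Toy graph with integer edge-costs (assumed values; ties are intentional).
n = 4
edges = [(0, 1, 2), (1, 2, 1), (2, 3, 1), (3, 0, 3), (0, 2, 1)]  # (u, v, cost)
s1, s2, q = 0, 3, 1

def separating_two_forests():
    """Yield (edge subset, q-connected-to-s1 flag) for every 2-forest separating the seeds."""
    for subset in itertools.combinations(range(len(edges)), n - 2):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        ok = True
        for i in subset:
            a, b, _ = edges[i]
            ra, rb = find(a), find(b)
            if ra == rb:
                ok = False  # cycle: not a forest
                break
            parent[ra] = rb
        if ok and find(s1) != find(s2):
            yield subset, find(q) == find(s1)

forests = list(separating_two_forests())
min_cost = min(sum(edges[i][2] for i in f) for f, _ in forests)
msfs = [(f, with_s1) for f, with_s1 in forests
        if sum(edges[i][2] for i in f) == min_cost]

# Theorem 5.1: the Power Watershed potential is the share of minimum-cost
# seed-separating 2-forests that connect q to s1.
x_pw = sum(with_s1 for _, with_s1 in msfs) / len(msfs)
# For these assumed costs there are two mSFs and exactly one keeps q with s1,
# so x_pw = 0.5.
```

With unique edge-costs there is a single mSF and the potential collapses to the hard 0/1 Watershed assignment; the fractional value only appears on cost plateaus, exactly the situation of Figure 5.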
In Figure 4 we show the probability of an edge being part of a mSF (see Appendix F for a more exhaustive explanation). In addition, it is worth recalling that the cut given by the Power Watershed segmentation is a mSF-cut (Property 2 of [15]).

The Random Walker interpretation can break down in the limit case of the Power Watershed. After taking the power of the edge-weights to infinity, at any node a Random Walker would move along an incident edge with maximum weight / minimum cost. So, in the limit case a Random Walker can get stuck at an edge e = {u, v} that minimizes the cost among all the edges incident to u or v. In this case the Random Walker will not necessarily reach any seed (see Figure 5f). In contrast, the forest-counting interpretation carries over nicely to the limit case.

The Probabilistic Watershed with a Gibbs distribution over 2-forests of minimal entropy, µ = ∞, corresponds to the Power Watershed, while with maximal entropy, µ = 0, it only considers the graph's topology. The effect of µ is illustrated on a toy graph in Figure 1 of the appendix. One could perform a grid search to identify interesting intermediate values of µ. Alternatively, µ can be learned, alongside the edge-costs, by back-propagation [11] or by a first-order approximation thereof [43].

6 Discussion

In this work, we provided new understanding of well-known seeded segmentation algorithms.
We have presented a tractable way of computing the expected label assignment of each node by a Gibbs distribution over all the seed separating spanning forests of a graph (Definition 3.1).
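Concretely, this expected assignment coincides with the Random Walker potentials (the α = 1, β = 2 case of the Power Watershed objective), which are harmonic and solve a linear system in the graph Laplacian. A minimal numpy sketch on a made-up 4-node path graph (the graph, weights and seed placement are illustrative only):

```python
import numpy as np

# Path graph s1 - a - b - s2 with example edge weights w(e).
edges = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 2.0)]  # (u, v, w(e))
n = 4
L = np.zeros((n, n))  # weighted graph Laplacian
for u, v, w in edges:
    L[u, u] += w; L[v, v] += w
    L[u, v] -= w; L[v, u] -= w

seeds = [0, 3]                   # s1 = node 0, s2 = node 3
x_seed = np.array([1.0, 0.0])    # boundary conditions x_{s1} = 1, x_{s2} = 0
free = [i for i in range(n) if i not in seeds]

# First-order optimality of sum_e w(e) (x_u - x_v)^2:
# L_ff x_f = -L_fs x_s for the unseeded (free) nodes.
L_ff = L[np.ix_(free, free)]
L_fs = L[np.ix_(free, seeds)]
x_free = np.linalg.solve(L_ff, -L_fs @ x_seed)
print(x_free)  # potentials in (0, 1): here [0.75 0.25]
```

On this path graph the potentials drop linearly in the effective resistance from s1 to s2, which is the expected behaviour of the harmonic solution.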
Using the MTT, we showed that this is computationally feasible and that its result is equivalent to the Random Walker's [26]. Our approach has been developed without using potential theory (in contrast to [8]).

These facts have provided us with a novel understanding of the Random Walker (Probabilistic Watershed) probabilities: they are proportional to the gap produced by the triangle inequality of the effective resistance between the seeds and the query node.

Finally, we have proposed a new interpretation of the Power Watershed potentials for β = 2 and α → ∞: they are given as the probabilities of the Probabilistic Watershed when the latter is restricted to mSFs instead of all spanning forests.

A mSF can also be seen as a union of minimax paths between the vertices [33]. Recently, [12] showed that the Power Watershed assigns a query node q to the seed to which the minimax path from q has the lowest maximum edge cost. In future work, we hope to extend this path-related point of view to an intuitive understanding of the Power Watershed.

We are currently working on an extension of the Probabilistic Watershed framework to directed graphs, by means of the generalization of the MTT to directed graphs [42]. Here, one samples directed spanning forests with the seeds as sinks to segment the unlabelled nodes. This might lead to a new practical algorithm for semi-supervised learning on directed graphs such as social / citation or Web networks and could be related to directed random walks.

Acknowledgements

The authors would like to thank Prof. Marco Saerens for his profound and constructive comments as well as the anonymous reviewers for their helpful remarks. We would like to express our gratitude to Lorenzo Cerrone, who also shared the edge weights of [11], and Laurent Najman for the useful discussions about the Random Walker and Power Watershed algorithms, respectively.
We also acknowledge partial financial support of the DFG under grant No. DFG HA-4364 8-1.

References

[1] C. Allène, J. Y. Audibert, M. Couprie, J. Cousty, and R. Keriven. Some links between min-cuts, optimal spanning forests and watersheds. In Proceedings of the 8th International Symposium on Mathematical Morphology, pages 253–264, 2007.

[2] J. Angulo and D. Jeulin. Stochastic watershed segmentation. In Proceedings of the 8th International Symposium on Mathematical Morphology, pages 265–276, 2007.

[3] M. Bai and R. Urtasun. Deep watershed transform for instance segmentation. In CVPR, pages 2858–2866, 2017.

[4] T. Beier, C. Pape, N. Rahaman, T. Prange, S. Berg, D. D. Bock, A. Cardona, G. W. Knott, S. M. Plaza, L. K. Scheffer, U. Koethe, A. Kreshuk, and F. A. Hamprecht. Multicut brings automated neurite segmentation closer to human performance. Nature Methods, 14(2):101–102, January 2017.

[5] M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. JMLR, 7:2399–2434, 2006.

[6] S. Beucher and C. Lantuéjoul. Use of watersheds in contour detection. In International Workshop on Image Processing: Real-time Edge and Motion Detection/Estimation, volume 132, 1979.

[7] S. Beucher and F. Meyer. The morphological approach to segmentation: the watershed transformation. Mathematical Morphology in Image Processing, 34:433–481, 1993.

[8] N. Biggs. Algebraic potential theory on graphs. Bulletin of the London Mathematical Society, 29:641–682, 1997.

[9] N. Bockelmann, D. Krüger, D. F. Wieland, B. Zeller-Plumhoff, N. Peruzzi, S. Galli, R. Willumeit-Römer, F. Wilde, F. Beckmann, J. Hammel, et al. Sparse annotations with random walks for u-net segmentation of biodegradable bone implants in synchrotron microtomograms.
In International Conference on Medical Imaging with Deep Learning – Extended Abstract Track, 2019.

[10] V. Bui, L.-Y. Hsu, L.-C. Chang, and M. Y. Chen. An automatic random walk based method for 3D segmentation of the heart in cardiac computed tomography images. In ISBI, pages 1352–1355, 2018.

[11] L. Cerrone, A. Zeilmann, and F. A. Hamprecht. End-to-end learned random walker for seeded image segmentation. In CVPR, 2019.

[12] A. Challa, S. Danda, B. S. D. Sagar, and L. Najman. Watersheds for semi-supervised classification. IEEE Signal Processing Letters, 26:720–724, May 2019.

[13] B. Chazelle. A minimum spanning tree algorithm with inverse-Ackermann type complexity. J. ACM, 47:1028–1047, November 2000.

[14] P. Y. Chebotarev and E. Shamis. The matrix-forest theorem and measuring relations in small social groups. Automation and Remote Control, 58:1505–1514, 1997.

[15] C. Couprie, L. Grady, L. Najman, and H. Talbot. Power watershed: A unifying graph-based optimization framework. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011.

[16] M. Couprie, L. Najman, and G. Bertrand. Quasi-linear algorithms for the topological watershed. Journal of Mathematical Imaging and Vision, 22:231–249, 2005.

[17] J. Cousty, G. Bertrand, L. Najman, and M. Couprie. Watershed cuts: Minimum spanning forests and the drop of water principle. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31:1362–1374, 2009.

[18] CREMI. MICCAI challenge on circuit reconstruction from electron microscopy images, 2017. https://cremi.org.

[19] A. Criminisi, T. Sharp, and A. Blake. GeoS: Geodesic image segmentation. In ECCV, pages 99–112. Springer, 2008.

[20] M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In NIPS, pages 2292–2300, 2013.

[21] S. E. N. Fernandes and J. P. Papa.
Improving optimum-path forest learning using bag-of-classifiers and confidence measures. Pattern Analysis and Applications, 22:703–716, 2019.

[22] S. E. N. Fernandes, D. R. Pereira, C. C. O. Ramos, A. N. Souza, D. S. Gastaldello, and J. P. Papa. A probabilistic optimum-path forest classifier for non-technical losses detection. IEEE Transactions on Smart Grid, 10:3226–3235, 2019.

[23] F. Fouss, M. Saerens, and M. Shimbo. Algorithms and Models for Network Data and Link Analysis. Cambridge University Press, New York, NY, USA, 1st edition, 2016.

[24] K. Françoisse, I. Kivimäki, A. Mantrach, F. Rossi, and M. Saerens. A bag-of-paths framework for network data analysis. Neural Networks, 90:90–111, 2017.

[25] A. Ghosh, S. Boyd, and A. Saberi. Minimizing effective resistance of a graph. SIAM Review, 50:37–66, 2008.

[26] L. Grady. Random walks for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006.

[27] L. J. Grady and J. R. Polimeni. Discrete Calculus. Springer, London, 2010.

[28] K.-H. Kim and S. Choi. Label propagation through minimax paths for scalable semi-supervised learning. Pattern Recognition Letters, 45:17–25, 2014.

[29] G. Kirchhoff. Über die Auflösung der Gleichungen, auf welche man bei der Untersuchung der linearen Vertheilung galvanischer Ströme geführt wird. Annalen der Physik, 148:497–508, 1847.

[30] I. Kivimäki, M. Shimbo, and M. Saerens. Developments in the theory of randomized shortest paths with a comparison of graph node distances. Physica A: Statistical Mechanics and its Applications, 393:600–616, 2014.

[31] T. K. Koo, A. Globerson, X. Carreras, and M. Collins. Structured prediction models via the matrix-tree theorem. In EMNLP-CoNLL, 2007.

[32] Z. Liu, Y. Song, C. Maere, Q. Liu, Y. Zhu, H. Lu, and D. Yuan.
A method for PET-CT lung cancer segmentation based on improved random walk. In 24th International Conference on Pattern Recognition (ICPR), pages 1187–1192, 2018.

[33] B. M. Maggs and S. A. Plotkin. Minimum-cost spanning tree as a path-finding problem. Information Processing Letters, 26:291–293, 1988.

[34] F. Malmberg and C. L. L. Hendriks. An efficient algorithm for exact evaluation of stochastic watersheds. Pattern Recognition Letters, 47:80–84, 2014. Advances in Mathematical Morphology.

[35] A. Mensch and M. Blondel. Differentiable dynamic programming for structured prediction and attention. In ICML, 2018.

[36] J. Öfverstedt, J. Lindblad, and N. Sladoje. Stochastic distance transform. In International Conference on Discrete Geometry for Computer Imagery, pages 75–86. Springer, 2019.

[37] M. Senelle, S. García-Díez, A. Mantrach, M. Shimbo, M. Saerens, and F. Fouss. The sum-over-forests density index: Identifying dense regions in a graph. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36:1268–1274, 2014.

[38] C. Straehle, U. Koethe, G. Knott, K. Briggman, W. Denk, and F. A. Hamprecht. Seeded watershed cut uncertainty estimators for guided interactive segmentation. In CVPR, pages 765–772, 2012.

[39] A. Teixeira, P. Monteiro, J. Carriço, M. Ramirez, and A. Francisco. Spanning edge betweenness. In Workshop on Mining and Learning with Graphs, volume 24, pages 27–31, January 2013.

[40] A. Teixeira, P. Monteiro, J. Carriço, M. Ramirez, and A. P. Francisco. Not seeing the forest for the trees: Size of the minimum spanning trees (MSTs) forest and branch significance in MST-based phylogenetic analysis. PLOS ONE, 10, 2015.

[41] F.-S. Tsen, T.-Y. Sung, M.-Y. Lin, L.-H. Hsu, and W. Myrvold. Finding the most vital edge with respect to the number of spanning trees.
IEEE Transactions on Reliability, 43:600–602, 1994.

[42] W. T. Tutte. Graph Theory. Encyclopedia of Mathematics and its Applications. Addison-Wesley, 1984.

[43] P. Vernaza and M. Chandraker. Learning random-walk label propagation for weakly-supervised semantic segmentation. In CVPR, pages 7158–7166, 2017.

[44] G. Winkler. Image Analysis, Random Fields and Markov Chain Monte Carlo Methods: A Mathematical Introduction, volume 27. Springer Science & Business Media, 2012.

[45] S. Wolf, L. Schott, U. Köthe, and F. A. Hamprecht. Learned watershed: End-to-end learning of seeded segmentation. In ICCV, pages 2030–2038, 2017.

[46] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. In NIPS, pages 321–328, 2004.

[47] X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In ICML, pages 912–919, 2003.