{"title": "Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition", "book": "Advances in Neural Information Processing Systems", "page_first": 1401, "page_last": 1409, "abstract": "We propose a new sketch recognition framework that combines a rich representation of low level visual appearance with a graphical model for capturing high level relationships between symbols. This joint model of appearance and context allows our framework to be less sensitive to noise and drawing variations, improving accuracy and robustness. The result is a recognizer that is better able to handle the wide range of drawing styles found in messy freehand sketches. We evaluate our work on two real-world domains, molecular diagrams and electrical circuit diagrams, and show that our combined approach significantly improves recognition performance.", "full_text": "Learning from Neighboring Strokes:\n\nCombining Appearance and Context for\n\nMulti-Domain Sketch Recognition\n\nTom Y. Ouyang Randall Davis\n\nComputer Science and Arti\ufb01cial Intelligence Laboratory\n\nMassachusetts Institute of Technology\n{ouyang,davis}@csail.mit.edu\n\nCambridge, MA 02139 USA\n\nAbstract\n\nWe propose a new sketch recognition framework that combines a rich represen-\ntation of low level visual appearance with a graphical model for capturing high\nlevel relationships between symbols. This joint model of appearance and context\nallows our framework to be less sensitive to noise and drawing variations, improv-\ning accuracy and robustness. The result is a recognizer that is better able to handle\nthe wide range of drawing styles found in messy freehand sketches. We evaluate\nour work on two real-world domains, molecular diagrams and electrical circuit di-\nagrams, and show that our combined approach signi\ufb01cantly improves recognition\nperformance.\n\n1 Introduction\n\nSketches are everywhere. 
From flow charts to chemical structures to electrical circuits, people use them every day to communicate information across many different domains. They are also an important part of the early design process, helping us explore rough ideas and solutions in an informal environment. However, despite their ubiquity, there is still a large gap between how people naturally interact with sketches and how computers can interpret them today. Current authoring programs like ChemDraw (for chemical structures) and Visio (for general diagrams) still rely on the traditional point-click-drag style of interaction. While popular, they simply do not provide the ease of use, naturalness, or speed of drawing on paper.
We propose a new framework for sketch recognition that combines a rich representation of low level visual appearance with a probabilistic model for capturing higher level relationships. By “visual appearance” we mean an image-based representation that preserves the pictorial nature of the ink. By “higher level relationships” we mean the spatial relationships between different symbols. Our combined approach uses a graphical model that classifies each symbol jointly with its context, allowing neighboring interpretations to influence each other. This makes our method less sensitive to noise and drawing variations, significantly improving robustness and accuracy. The result is a recognizer that is better able to handle the range of drawing styles found in messy freehand sketches.
Current work in sketch recognition can, very broadly speaking, be separated into two groups. The first group focuses on the relationships between geometric primitives like lines, arcs, and curves, specifying them either manually [1, 4, 5] or learning them from labeled data [16, 20]. Recognition is then posed as a constraint satisfaction problem, as in [4, 5], or as an inference problem on a graphical model, as in [1, 16, 17, 20]. 
However, in many real-world sketches, it is difficult to extract these primitives reliably. Circles may not always be round, line segments may not be straight, and stroke artifacts like pen-drag (not lifting the pen between strokes), over-tracing (drawing over a previously drawn stroke), and stray ink may introduce false primitives that lead to poor recognition. In addition, recognizers that rely on extracted primitives often discard potentially useful information contained in the appearance of the original strokes.
The second group of related work focuses on the visual appearance of shapes and symbols. These include parts-based methods [9, 18], which learn a set of discriminative parts or patches for each symbol class, and template-based methods [7, 11], which compare the input symbol to a library of learned prototypes. The main advantage of vision-based approaches is their robustness to many of the drawing variations commonly found in real-world sketches, including artifacts like over-tracing and pen drag. However, these methods do not model the spatial relationships between neighboring shapes, relying solely on local appearance to classify a symbol.
In the following sections we describe our approach, which combines both appearance and context. It is divided into three main stages: (1) stroke preprocessing: we decompose strokes (each stroke is defined as the set of points collected from pen-down to pen-up) into smaller segments, (2) symbol detection: we search for potential symbols (candidates) among groups of segments, and (3) candidate selection: we select a final set of detections from these candidates, taking into account their spatial relationships.

2 Preprocessing

The first step in our recognition framework is to preprocess the sketch into a set of simple segments, as shown in Figure 1(b). The purpose of this step is twofold. 
First, like superpixels in computer vision [14], segments are much easier to work with than individual points or pixels; the number of points can be large even in moderate-sized sketches, making optimization intractable. Second, in the domains we evaluated, the boundaries between segments effectively preserve the boundaries between symbols. This is not the case when working with the strokes directly, so preprocessing allows us to handle strokes that contain more than one symbol (e.g., when a wire and resistor are drawn together without lifting the pen).
Our preprocessing algorithm divides strokes into segments by splitting them at their corner points. Previous approaches to corner detection focused primarily on local pen speed and curvature [15], but these measures are not always reliable in messy real-world sketches. Our corner detection algorithm, on the other hand, tries to find the set of vertices that best approximates the original stroke as a whole. It repeatedly discards the vertex vi that contributes the least to the quality of fit measure q, which we define as:

q(vi) = (MSE(v \ vi, s) - MSE(v, s)) * curvature(vi)    (1)

where s is the set of points in the original stroke, v is the current set of vertices remaining in the line segment approximation, curvature(vi) is a measure of the local stroke curvature,1 and (MSE(v \ vi, s) - MSE(v, s)) is the increase in mean squared error caused by removing vertex vi from the approximation.
Thus, instead of immediately trying to decide which point is a corner, our detector starts by making the simpler decision about which point is not a corner. The process ends when q(vi) is greater than a predefined threshold.2 At the end of the preprocessing stage, the system records the length of the longest segment L (after excluding the top 5% as outliers). 
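As a concrete illustration, the vertex-elimination loop above can be sketched in Python (a minimal stand-in, not the paper's implementation; the helper names `polyline_mse` and `curvature` and the use of an absolute threshold are our own simplified assumptions):

```python
import math

def polyline_mse(vertices, stroke):
    """Mean squared distance from each stroke point to the polyline approximation."""
    def dist2(p, a, b):
        # squared distance from point p to segment a-b
        ax, ay = a; bx, by = b; px, py = p
        dx, dy = bx - ax, by - ay
        l2 = dx * dx + dy * dy
        t = 0.0 if l2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / l2))
        return (px - (ax + t * dx)) ** 2 + (py - (ay + t * dy)) ** 2
    return sum(min(dist2(p, vertices[i], vertices[i + 1])
                   for i in range(len(vertices) - 1))
               for p in stroke) / len(stroke)

def curvature(v, i):
    """Distance from v[i] to the segment joining its neighbors (footnote 1)."""
    ax, ay = v[i - 1]; bx, by = v[i + 1]; px, py = v[i]
    dx, dy = bx - ax, by - ay
    l2 = dx * dx + dy * dy
    t = 0.0 if l2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / l2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def detect_corners(stroke, threshold):
    """Repeatedly discard the vertex with the lowest q (Eq. 1) until q exceeds threshold."""
    v = list(stroke)
    while len(v) > 2:
        base = polyline_mse(v, stroke)
        q, i = min(((polyline_mse(v[:i] + v[i + 1:], stroke) - base) * curvature(v, i), i)
                   for i in range(1, len(v) - 1))
        if q > threshold:
            break
        del v[i]
    return v
```

On an L-shaped stroke this reduces the vertex set to the two endpoints plus the single true corner. In the paper, the threshold is set relative to the stroke's bounding-box diagonal rather than being an absolute constant.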
This value is used in subsequent stages as a rough estimate for the overall scale of the sketch.

3 Symbol Detection

Our algorithm searches for symbols among groups of segments. Starting with each segment in isolation, we generate successively larger groups by expanding the group to include the next closest segment.3 This process ends when either the size of the group exceeds 2L (a spatial constraint) or when the group spans more strokes than the temporal window specified for the domain.4 Note that we allow temporal gaps in the detection region, so symbols do not need to be drawn with consecutive strokes. An illustration of this process is shown in Figure 1(c).

1 Defined as the distance between vi and the line segment formed by vi-1 and vi+1.
2 In our experiments, we set the threshold to 0.01 times the diagonal length of the stroke's bounding box.
3 Distance defined as mindist(s, g) + bbdist(s, g), where mindist(s, g) is the distance at the nearest point between segment s and group g and bbdist(s, g) is the diagonal length of the bounding box containing s and g.

Figure 1: Our recognition framework. (a) An example sketch of a circuit diagram and (b) the segments after preprocessing. (c) A subset of the candidate groups extracted from the sketch (only those with an appearance potential > 0.25 are shown). (d) The resulting graphical model: nodes represent segment labels, dark blue edges represent group overlap potentials, and light blue edges represent context potentials. (e) The final set of symbol detections after running loopy belief propagation.

We classify each candidate group using the symbol recognizer we described in [11], which converts the on-line stroke sequences into a set of low resolution feature images (see Figure 2(a)). This emphasis on visual appearance makes our method less sensitive to stroke level differences like over-tracing and pen drag, improving accuracy and robustness. 
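To make the feature-image idea concrete, here is a simplified sketch of how orientation images like those in Figure 2(a) might be rendered (our own illustration, not the implementation from [11]; the soft angular weighting and max-accumulation are assumptions, and the endpoint image and Gaussian smoothing are omitted):

```python
import math

def orientation_images(strokes, grid=12, n_orient=4):
    """Render strokes into n_orient low-resolution images, one per reference angle
    (0, 45, 90, 135 degrees). Each ink sample lights up the grid cell under it,
    weighted by how closely the local stroke direction matches the reference."""
    pts = [p for s in strokes for p in s]
    xs = [p[0] for p in pts]; ys = [p[1] for p in pts]
    x0, y0 = min(xs), min(ys)
    scale = max(max(xs) - x0, max(ys) - y0) or 1.0
    refs = [k * math.pi / n_orient for k in range(n_orient)]
    images = [[[0.0] * grid for _ in range(grid)] for _ in range(n_orient)]
    for s in strokes:
        for (xa, ya), (xb, yb) in zip(s, s[1:]):
            theta = math.atan2(yb - ya, xb - xa) % math.pi   # direction, mod 180 degrees
            col = min(grid - 1, int((((xa + xb) / 2 - x0) / scale) * grid))
            row = min(grid - 1, int((((ya + yb) / 2 - y0) / scale) * grid))
            for k, ref in enumerate(refs):
                d = abs(ref - theta)
                d = min(d, math.pi - d)                       # angular distance, wraps at 180
                w = max(0.0, 1.0 - d / (math.pi / n_orient))  # soft assignment to each channel
                images[k][row][col] = max(images[k][row][col], w)
    return images
```

A purely horizontal stroke, for example, activates only the 0-degree channel, which is what makes the representation robust to over-tracing: re-drawn ink lands in cells that are already lit.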
Since [11] was designed for classifying isolated shapes and not for detecting symbols in messy sketches, we augment its output with five geometric features and a set of local context features:

stroke count: The number of strokes in the group.
segment count: The number of segments in the group.
diagonal length: The diagonal length of the group's bounding box, normalized by L.
group ink density: The total length of the strokes in the group divided by the diagonal length. This feature is a measure of the group's ink density.
stroke separation: Maximum distance between any stroke and its nearest neighbor in the group.
local context: A set of four feature images that captures the local context around the group. Each image filters the local appearance at a specific orientation: 0, 45, 90, and 135 degrees. The images are centered at the middle of the group's bounding box and scaled so that each dimension is equal to the group's diagonal length, as shown in Figure 2(b). The initial 12x12 images are smoothed using a Gaussian filter and down-sampled by a factor of 4.

The symbol detector uses a linear SVM [13] to classify each candidate group, labeling it as one of the symbols in the domain or as mis-grouped “clutter”. The training data includes both valid symbols and clutter regions. Because the classifier needs to distinguish between more than two classes, we

4 The temporal window is 8 strokes for chemistry diagrams and 20 strokes for the circuit diagrams. These parameters were selected empirically, and can be customized by the system designer for each new domain.

Figure 2: Symbol Detection Features. (a) The set of five 12x12 feature images used by the isolated appearance-based classifier. 
The first four images encode stroke orientation at 0, 45, 90, and 135 degrees; the fifth captures the locations of stroke endpoints. (b) The set of four local context images for multi-segment symbols. (c) The set of four local context images for single-segment symbols.

use the one-vs-one strategy for combining binary classifiers. Also, to generate probability estimates, we fit a logistic regression model to the outputs of the SVM [12].
Many of the features above are not very useful for groups that contain only one segment. For example, an isolated segment always looks like a straight line, so its visual appearance is not very informative. Thus, we use a different set of features to classify candidates that contain only a single segment (e.g., wires in circuits and straight bonds in chemistry):

orientation: The orientation of the segment, discretized into evenly spaced bins of size π/4.
segment length: The length of the segment, normalized by L.
segment count: The total number of segments extracted from the parent stroke.
segment ink density: The length of the substroke matching the start and end points of the segment divided by the length of the segment. This is a measure of the segment's curvature and is higher for more curved segments.
stroke ink density: The length of the parent stroke divided by the diagonal length of the parent stroke's bounding box.
local context: Same as the local context for multi-segment symbols, except these images are centered at the midpoint of the segment, oriented in the same direction as the segment, and scaled so that each dimension is equal to two times the length of the segment. An example is shown in Figure 2(c).

4 Improving Recognition using Context

The final task is to select a set of symbol detections from the competing candidate groups. Our candidate selection algorithm has two main objectives. 
First, it must avoid selecting candidates that conflict with each other because they share one or more segments. Second, it should select candidates that are consistent with each other based on what the system knows about the likely spatial relationships between symbols.
We use an undirected graphical model to encode the relationships between competing candidates. Under our formulation, each segment (node) in the sketch needs to be assigned to one of the candidate groups (labels). Thus, our candidate selection problem becomes a segment labeling problem, where the set of possible labels for a given segment is the set of candidate groups that contain that segment. This allows us to incorporate local appearance, group overlap consistency, and spatial context into a single unified model.

Figure 3: Spatial relationships: The three measurements used to calculate the context potential ψc(ci, cj, xi, xj), where vi and vj are vectors representing segments xi and xj and vij is a vector from the center of vi to the center of vj.

The joint probability function over the entire graph is given by:

log P(c|x) = Σi ψa(ci, x) + Σij [ψo(ci, cj) + ψc(ci, cj, xi, xj)] - log(Z)    (2)

where the terms are the appearance, overlap, and context potentials respectively, x is the set of segments in the sketch, c is the set of segment labels, and Z is a normalizing constant.
Appearance potential. The appearance potential ψa measures how well the candidate group's appearance matches that of its predicted class. 
It uses the output of the isolated symbol classifier described in Section 3 and is defined as:

ψa(ci, x) = log Pa(ci|x)    (3)

where Pa(ci|x) is the likelihood score for candidate ci returned by the isolated symbol classifier.

Group overlap potential. The overlap potential ψo(ci, cj) is a pairwise compatibility that ensures the segment assignments do not conflict with each other. For example, if segments xi and xj are both members of candidate c and xi is assigned to c, then xj must also be assigned to c.

ψo(ci, cj) = -100 if ((xi ∈ cj) or (xj ∈ ci)) and (ci ≠ cj), and 0 otherwise    (4)

To improve efficiency, instead of connecting every pair of segments that are jointly considered in c, we connect the segments into a loop based on temporal ordering. This accomplishes the same constraint with fewer edges. An example is shown in Figure 1(d).

Joint Context Potential. The context potential ψc(ci, cj, xi, xj) represents the spatial compatibility between segments xi and xj, conditioned on their predicted class labels (e.g., resistor-resistor, resistor-wire, etc.). It is encoded as a conditional probability table that counts the number of times each spatial relationship (θ1, θ2, θ3) occurred for a given class pair (see Figure 3).

ψc(ci, cj, xi, xj) = log Pc(θ(xi, xj) | class(ci), class(cj))    (5)

where class(ci) is the predicted class for candidate ci and θ(xi, xj) is the set of three spatial relationships (θ1, θ2, θ3) between segments xi and xj. This potential is active only for pairs of segments whose distance at the closest point is less than L/2. 
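A minimal sketch of computing and discretizing the three spatial measurements (θ1, θ2, θ3) from Figure 3 (the vector conventions and bin-index representation are our own assumptions; the π/8 and L/4 bin sizes follow the paper):

```python
import math

def angle_between(a, b):
    """Unsigned angle between two 2-D vectors, in [0, pi]."""
    dot = a[0] * b[0] + a[1] * b[1]
    na, nb = math.hypot(*a), math.hypot(*b)
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

def spatial_relationship(vi, vj, center_i, center_j, L):
    """Return the discretized (theta1, theta2, theta3) tuple used to index the
    conditional probability table: pi/8 bins for the angles, L/4 bins for length."""
    vij = (center_j[0] - center_i[0], center_j[1] - center_i[1])
    theta1 = angle_between(vi, vj)                    # relative orientation of the segments
    theta2 = angle_between(vi, vij)                   # direction from xi to xj, relative to xi
    theta3 = abs(math.hypot(*vi) - math.hypot(*vj))   # difference in segment length
    return (int(theta1 // (math.pi / 8)),
            int(theta2 // (math.pi / 8)),
            int(theta3 // (L / 4)))
```

Counting how often each such tuple occurs for every pair of class labels in the training data yields the table Pc used in Equation 5.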
To build the probability table we discretize θ1 and θ2 into bins of size π/8 and θ3 into bins of size L/4.
The entries in the conditional probability table are defined as:

Pc(θ | classi, classj) = (Nθ,classi,classj + α) / (Σθ' Nθ',classi,classj + α)    (6)

where Nθ,classi,classj is the number of times we observed a pair of segments with spatial relationship θ and class labels (classi, classj) and α is a weak prior (α = 10 in our experiments).

[Figure 3 measurements: θ1 = angle(vi, vj), θ2 = angle(vi, vij), θ3 = abs(|vi| - |vj|)]

Inference. We apply the max-product belief propagation algorithm [22] to find the configuration that maximizes Equation 2. Belief propagation works by iteratively passing messages around the connected nodes in the graph; each message from node i to node j contains i's belief for each possible state of j. In our implementation we use an “accelerated” message passing schedule [21] that propagates messages immediately without waiting for other nodes to finish. The procedure alternates between forward and backward passes through the nodes based on the temporal ordering of the segments, running for a total of 100 iterations.

5 Evaluation

One goal of our research is to build a system that can handle the range of drawing styles found in natural, real world diagrams. As a result, our data collection program was designed to behave like a piece of paper, i.e., capturing the sketch but providing no recognition or feedback. 
Using the data we collected, we evaluated five versions of our system:

Appearance uses only the isolated appearance-based recognizer from [11].
Appearance+Geometry uses isolated appearance and geometric features.
Appearance+Geometry+Local uses isolated appearance, geometric features, and local context.
Complete is the complete framework described in this paper, using our corner detector.
Complete (corner detector from [15]) is the complete framework, using the corner detector in [15]. (We include this comparison to evaluate the effectiveness of our corner detection algorithm.)

Note that the first three versions still use the group overlap potential to select the best set of consistent candidates.

Chemistry
For this evaluation we recruited 10 participants who were familiar with organic chemistry and asked each of them to draw 12 real world organic compounds (e.g., Aspirin, Penicillin, Sildenafil, etc.) on a Tablet PC. We performed a set of user-independent performance evaluations, testing our system on one user while using the examples from the other 9 users as training data. By leaving out sketches from the same participant, this evaluation demonstrates how well our system would perform on a new user.
For this domain we noticed that users almost never drew multiple symbols using a single stroke, with the exception of multiple connected straight bonds (e.g., rings). Following this observation, we optimized our candidate extractor to filter out multi-segment candidates that break stroke boundaries.

Method                                  Accuracy
Complete (corner detector from [15])    0.806
Appearance                              0.889
Appearance+Geometry                     0.947
Appearance+Geometry+Local               0.958
Complete                                0.971

Table 1: Overall recognition accuracy for the chemistry dataset.

Note that for this dataset we report only accuracy (recall), because, unlike traditional object detection, there are no overlapping detections and every stroke is assigned to a symbol. 
Thus, a false positive always causes a false negative, so recall and precision are redundant: e.g., misclassifying one segment in a three-segment “H” makes it impossible to recognize the original “H” correctly.
The results in Table 1 show that our method was able to recognize 97% of the symbols correctly. To be considered a correct recognition, a predicted symbol needs to match both the segmentation and class of the ground truth label. By modeling joint context, the complete framework was able to reduce the error rate by 31% compared to the next best method. Figure 4 (top) shows several sketches interpreted by our system. We can see that the diagrams in this dataset can be very messy, and exhibit a wide range of drawing styles. Notice that in the center diagram, the system made two errors because the author drew hash bonds differently from all the other users, enclosing them inside a triangle.

Circuits
The second dataset is a collection of circuit diagrams collected by Oltmans and Davis [9]. The examples were from 10 users who were experienced in basic circuit design. Each user drew ten or eleven different circuits, and every circuit was required to include a pre-specified set of components. We again performed a set of user-independent performance evaluations. Because the exact locations of the ground truth labels are somewhat subjective (i.e., it is not obvious whether the resistor label should include the short wire segments on either end), we adopt the same evaluation metric used in the Pascal Challenge [2] and in [9]: a prediction is considered correct if the area of overlap between its bounding box and the ground truth label's bounding box is greater than 50% of the area of their union. 
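This overlap criterion is simple to state in code (a small self-contained check; the (x1, y1, x2, y2) box representation is our assumption):

```python
def boxes_match(a, b):
    """PASCAL-style overlap test on (x1, y1, x2, y2) boxes:
    intersection area over union area must exceed 0.5."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) > 0.5
```

The ratio is symmetric in the two boxes, so it does not matter which argument is the prediction and which is the ground truth label.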
Also, since we do not count wire detections for this dataset (as in [9]), we report precision as well as recall.

Method                                  Precision   Recall
Oltmans 2007 [9]                        0.257       0.739
Complete (corner detector from [15])    0.831       0.802
Appearance                              0.710       0.824
Appearance+Geometry                     0.774       0.832
Appearance+Geometry+Local               0.874       0.879
Complete                                0.908       0.912

Table 2: Overall recognition accuracy for the circuit diagram dataset.

Table 2 shows that our method was able to recognize over 91% of the circuit symbols correctly. Compared to the next best method, the complete framework was able to reduce the error rate by 30%. On this dataset Oltmans and Davis [9] were able to achieve a best recall of 73.9% at a precision of 25.7%. Compared to their reported results, we reduced the error rate by 66% and more than tripled the precision. As Figure 4 (bottom) shows, this is a very complicated and messy corpus with significant drawing variations like overtracing and pen drag.

Runtime
In the evaluations above, it took on average 0.1 seconds to process a new stroke in the circuits dataset and 0.02 seconds for the chemistry dataset (running on a 3.6 GHz machine, single-thread). With incremental interpretation, the system should be able to easily keep up in real time.

Related Work
Sketch recognition is a relatively new field, and we did not find any publicly available benchmarks for the domains we evaluated. In this section, we summarize the performance of existing systems that are similar to ours. Alvarado and Davis [1] proposed using dynamically constructed Bayesian networks to represent the contextual relationships between geometric primitives. They achieved an accuracy of 62% on a circuits dataset similar to ours, but needed to manually segment any strokes that contained more than one symbol. 
Gennari et al. [3] developed a system that searches for symbols in high density regions of the sketch and uses domain knowledge to correct low level recognition errors. They reported an accuracy of 77% on a dataset with 6 types of circuit components. Sezgin and Davis [16] proposed using an HMM to model the temporal patterns of geometric primitives, and reported an accuracy of 87% on a dataset containing 4 types of circuit components.
Shilman et al. [17] proposed an approach that treats sketch recognition as a visual parsing problem. Our work differs from theirs in that we use a rich model of low-level visual appearance and do not require a pre-defined spatial grammar. Ouyang and Davis [10] developed a sketch recognition system that uses domain knowledge to refine its interpretation. Their work focused on chemical diagrams, and detection was limited to symbols drawn using consecutive strokes. Outside of the sketch recognition community, there is also a great deal of interest in combining appearance and context for problems in computer vision [6, 8, 19].

Figure 4: Examples of chemical diagrams (top) and circuit diagrams (bottom) recognized by our system (complete framework). Correct detections are highlighted in green (teal for hash and wedge bonds), false detections in red, and missed symbols in orange.

6 Discussion

We have proposed a new framework that combines a rich representation of low level visual appearance with a probabilistic model for capturing higher level relationships. To our knowledge this is the first paper to combine these two approaches, and the result is a recognizer that is better able to handle the range of drawing styles found in messy freehand sketches. 
To preserve the familiar experience of using pen and paper, our system supports the same symbols, notations, and drawing styles that people are already accustomed to.
In our initial evaluation we apply our method on two real-world domains, chemical diagrams and electrical circuits (with 10 types of components), and achieve accuracy rates of 97% and 91% respectively. Compared to existing benchmarks in the literature, our method achieved higher accuracy even though the other systems supported fewer symbols [3, 16], trained on data from the same user [3, 16], or required manual pre-segmentation [1].

Acknowledgements
This research was supported in part by a DHS Graduate Research Fellowship and a grant from Pfizer, Inc. We thank Michael Oltmans for kindly making his dataset available to us.

References
[1] C. Alvarado and R. Davis. Sketchread: A multi-domain sketch recognition engine. In Proc. ACM Symposium on User Interface Software and Technology, 2004.
[2] M. Everingham, L. Van Gool, C. Williams, J. Winn, and A. Zisserman. The pascal visual object classes challenge 2008 results, 2008.
[3] L. Gennari, L. Kara, T. Stahovich, and K. Shimada. Combining geometry and domain knowledge to interpret hand-drawn diagrams. Computers & Graphics, 29(4):547-562, 2005.
[4] M. Gross. The electronic cocktail napkin: a computational environment for working with design diagrams. Design Studies, 17(1):53-69, 1996.
[5] T. Hammond and R. Davis. Ladder: a language to describe drawing, display, and editing in sketch recognition. In Proc. International Conference on Computer Graphics and Interactive Techniques, 2006.
[6] X. He, R. Zemel, and M. Carreira-Perpinan. Multiscale conditional random fields for image labeling. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2004.
[7] L. Kara and T. Stahovich. An image-based, trainable symbol recognizer for hand-drawn sketches. 
Computers & Graphics, 29(4):501-517, 2005.
[8] K. Murphy, A. Torralba, and W. Freeman. Using the forest to see the trees: a graphical model relating features, objects and scenes. Advances in Neural Information Processing Systems, 2003.
[9] M. Oltmans. Envisioning Sketch Recognition: A Local Feature Based Approach to Recognizing Informal Sketches. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, May 2007.
[10] T. Y. Ouyang and R. Davis. Recognition of hand drawn chemical diagrams. In Proc. AAAI Conference on Artificial Intelligence, 2007.
[11] T. Y. Ouyang and R. Davis. A visual approach to sketched symbol recognition. In Proc. International Joint Conference on Artificial Intelligence, 2009.
[12] J. Platt. Probabilities for sv machines. Advances in Neural Information Processing Systems, 1999.
[13] J. Platt. Sequential minimal optimization: A fast algorithm for training support vector machines. Advances in Kernel Methods-Support Vector Learning, 1999.
[14] X. Ren and J. Malik. Learning a classification model for segmentation. In Proc. IEEE International Conference on Computer Vision, pages 10-17, 2003.
[15] T. Sezgin and R. Davis. Sketch based interfaces: Early processing for sketch understanding. In Proc. International Conference on Computer Graphics and Interactive Techniques. ACM New York, NY, USA, 2006.
[16] T. Sezgin and R. Davis. Sketch recognition in interspersed drawings using time-based graphical models. Computers & Graphics, 32(5):500-510, 2008.
[17] M. Shilman, H. Pasula, S. Russell, and R. Newton. Statistical visual language models for ink parsing. Proc. AAAI Spring Symposium on Sketch Understanding, 2002.
[18] M. Shilman, P. Viola, and K. Chellapilla. Recognition and grouping of handwritten text in diagrams and equations. In Proc. International Workshop on Frontiers in Handwriting Recognition, 2004.
[19] J. Shotton, J. Winn, C. 
Rother, and A. Criminisi. Textonboost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. Lecture Notes in Computer Science, 3951:1, 2006.
[20] M. Szummer. Learning diagram parts with hidden random fields. In Proc. International Conference on Document Analysis and Recognition, 2005.
[21] M. Tappen and W. Freeman. Comparison of graph cuts with belief propagation for stereo, using identical mrf parameters. In Proc. IEEE International Conference on Computer Vision, 2003.
[22] J. Yedidia, W. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. Exploring Artificial Intelligence in the New Millennium, pages 239-269, 2003.
", "award": [], "sourceid": 520, "authors": [{"given_name": "Tom", "family_name": "Ouyang", "institution": null}, {"given_name": "Randall", "family_name": "Davis", "institution": null}]}