{"title": "Global seismic monitoring as probabilistic inference", "book": "Advances in Neural Information Processing Systems", "page_first": 73, "page_last": 81, "abstract": "The International Monitoring System (IMS) is a global network of sensors whose purpose is to identify potential violations of the Comprehensive Nuclear-Test-Ban Treaty (CTBT), primarily through detection and localization of seismic events. We report on the first stage of a project to improve on the current automated software system with a Bayesian inference system that computes the most likely global event history given the record of local sensor data. The new system, VISA (Vertically Integrated Seismological Analysis), is based on empirically calibrated, generative models of event occurrence, signal propagation, and signal detection. VISA exhibits significantly improved precision and recall compared to the current operational system and is able to detect events that are missed even by the human analysts who post-process the IMS output.", "full_text": "Global seismic monitoring as probabilistic inference\n\nNimar S. 
Arora\n\nDepartment of Computer Science\nUniversity of California, Berkeley\n\nBerkeley, CA 94720\n\nnimar@cs.berkeley.edu\n\nStuart Russell\n\nDepartment of Computer Science\nUniversity of California, Berkeley\n\nBerkeley, CA 94720\n\nrussell@cs.berkeley.edu\n\nPaul Kidwell\n\nLawrence Livermore National Lab\n\nLivermore, CA 94550\n\nkidwell1@llnl.gov\n\nErik Sudderth\n\nDepartment of Computer Science\nBrown University\n\nProvidence, RI 02912\n\nsudderth@cs.brown.edu\n\nAbstract\n\nThe International Monitoring System (IMS) is a global network of sensors whose\npurpose is to identify potential violations of the Comprehensive Nuclear-Test-Ban\nTreaty (CTBT), primarily through detection and localization of seismic events.\nWe report on the first stage of a project to improve on the current automated\nsoftware system with a Bayesian inference system that computes the most likely\nglobal event history given the record of local sensor data. The new system, VISA\n(Vertically Integrated Seismological Analysis), is based on empirically calibrated,\ngenerative models of event occurrence, signal propagation, and signal detection.\nVISA exhibits significantly improved precision and recall compared to the current\noperational system and is able to detect events that are missed even by the human\nanalysts who post-process the IMS output.\n\n1 Introduction\n\nThe CTBT aims to prevent the proliferation and the advancement of nuclear weapon technology\nby banning all nuclear explosions. A global network of seismic, radionuclide, hydroacoustic, and\ninfrasound sensors, the IMS, has been established to enforce the treaty. The IMS is the world\u2019s\nprimary global-scale, continuous, real-time system for seismic event monitoring. Data from the IMS\nsensors are transmitted via satellite in real time to the International Data Center (IDC) in Vienna,\nwhere automatic event-bulletins are issued at predefined latency. 
Perfect performance remains well\nbeyond the reach of current technology: the IDC\u2019s automated system, a highly complex and well-tuned\npiece of software, misses nearly one third of all seismic events in the magnitude range of\ninterest, and about half of the reported events are spurious. A large team of expert analysts post-processes\nthe automatic bulletins to improve their accuracy to acceptable levels.\n\nLike most current systems, the IDC operates by detection of arriving signals at each sensor station\n(the station processing stage) and then grouping multiple detections together to form events (the\nnetwork processing stage). Network processing is thus a data association problem similar to those\narising in multitarget tracking [1]. The time and location of each event are found by various search methods\nincluding grid search [2], the double-difference algorithm [3], and the intersection method [4]. In\nthe words of [5], \u201cSeismic event location is\u2014at its core\u2014a minimization of the difference between\nobserved and predicted arrival times.\u201d Although the mathematics of seismic event detection and\nlocalization has been studied for almost 100 years [6], the IDC results indicate that the problem is\nfar from trivial.\n\nThere are three primary sources of difficulty: 1) the travel time between any two points on the earth\nand the attenuation of various frequencies and wave types are not known accurately; 2) each detector\nis subject to local noise that may mask true signals and cause false detections (as many as 90% of\nall detections are false); and 3) there are many thousands of detections per day, so the combinatorial\nproblem of proposing and comparing possible events (subsets of detections) is daunting. These considerations\nsuggest that an approach based on probabilistic inference and combination of evidence\nmight be effective, and this paper demonstrates that this is in fact the case. 
For example, such an\napproach automatically takes into account non-detections as negative evidence for a hypothesized\nevent, something that classical methods cannot do.\n\nIn simple terms, let X be a random variable ranging over all possible collections of events, with\neach event defined by time, location, magnitude, and type (natural or man-made). Let Y range\nover all possible waveform signal recordings at all detection stations. Then $P_\\theta(X)$ describes a\nparameterized generative prior over events, and $P_\\phi(Y \\mid X)$ describes how the signal is propagated\nand measured (including travel time, selective absorption and scattering, noise, artifacts, sensor\nbias, sensor failures, etc.). Given observed recordings Y = y, we are interested in the posterior\n$P(X \\mid Y = y)$, and perhaps in the value of X that maximizes it\u2014i.e., the most likely explanation\nfor all the sensor readings. We also learn the model parameters $\\theta$ and $\\phi$ from historical data.\n\nOur overall project, VISA (Vertically Integrated Seismological Analysis), is divided into two stages. The\nfirst stage, NET-VISA, is the subject of the current paper. As the name suggests, NET-VISA deals\nonly with network processing and relies upon the IDC\u2019s pre-existing signal detection algorithms.\n(The second stage, SIG-VISA, will incorporate a signal waveform model and thereby subsume the\ndetection function.) NET-VISA computes a single most-likely explanation: a set of hypothesized\nevents with their associated detections, marking all other detections as noise. This input-output\nspecification, while not fully Bayesian in spirit, enables direct comparison to the current automated\nsystem bulletin, SEL3. Using the final expert-generated bulletin, LEB, as ground truth, we compared\nthe two systems on 7 days of held-out data. NET-VISA has 16% more recall at the same precision\nas SEL3, and 25% more precision at the same recall as SEL3. 
Furthermore, taking data from the\nmore comprehensive NEIC (National Earthquake Information Center) database as ground truth for the\ncontinental United States, we find that NET-VISA is able to detect events in the IMS data that are\nnot in the LEB report produced by IDC\u2019s expert analysts; thus, NET-VISA\u2019s true performance may\nbe higher than the LEB-based calculation would suggest.\n\nThe rest of the paper is structured as follows. Section 2 describes the problem in detail and covers\nsome elementary seismology. Sections 3 and 4 describe the probability model and inference\nalgorithm. Section 5 presents the results of our evaluation, and Section 6 concludes.\n\n2 The Seismic Association and Localization Problem\n\nSeismic events are disturbances in the earth\u2019s crust. Our work is concerned primarily with earthquakes\nand explosions (nuclear and conventional), but other types of events\u2014waves breaking, trees\nfalling, ice falling, etc.\u2014may generate seismic waves too. All such waves occur in a variety of types\n[7]\u2014body waves that travel through the earth\u2019s interior and surface waves that travel on the surface.\nThere are two types of body waves\u2014compression or P waves and shear or S waves. There are also\ntwo types of surface waves\u2014Love and Rayleigh. Further, body waves may be reflected off different\nlayers of the earth\u2019s crust and these are labeled distinctly by seismologists. Each particular wave\ntype generated by a given event is called a phase. These waves are picked up at seismic stations\nas ground vibrations. Typically, seismic stations have either a single 3-axis detector or an array\nof vertical-axis detectors spread over a scale of many kilometers. 
Most detectors are sensitive to\nnanometer-scale displacements, and so are quite susceptible to noise.\n\nRaw seismometer measurements are run through standard signal processing software that filters out\nnon-seismic frequencies and computes short-term and long-term averages of the signal amplitude.\nWhen the ratio of these averages exceeds a fixed threshold, a detection is announced. Various\nparameters of the detection are measured\u2014onset time, azimuth (direction from the station to the\nsource of the wave), slowness (related to the angle of declination of the signal path), amplitude, etc.\nFrom these parameters, a phase label may be assigned to the detection according to the standard\nIASPEI phase catalog [7]. All of these detection attributes may be erroneous.\n\nThe problem that we attempt to solve in this paper is to take a continuous stream of detections\n(with onset time, azimuth, slowness, amplitude, and phase label) from the roughly 120 IMS seismic\nstations as input and produce a continuous stream of events and associations between events and\ndetections. The parameters of an event are its longitude, latitude, depth, time, and magnitude (mb\nor body-wave magnitude). A 3-month dataset (660 GB) has been made available by the IDC for the\npurposes of this research. We have divided the dataset into 7 days of validation, 7 days of test, and\nthe rest as training data. We compute the accuracy of an event history hypothesis by comparison to a\nchosen ground-truth history. A bipartite graph is created between predicted and true events. An edge\nis added between a predicted and a true event that are at most 5 degrees in distance (see footnote 2) and 50 seconds\nin time apart. The weight of the edge is the distance between the two events. Finally, a min-weight\nmax-cardinality matching is computed on the graph. 
We report 3 quantities from this matching\u2014\nprecision (percentage of predicted events that are matched), recall (percentage of true events that\nare matched), and average error (average distance in kilometers between matched events).\n\n3 Generative Probabilistic Model\n\nOur generative model for seismic events and detections follows along the lines of the aircraft detection\nmodel in [8, Figure 3]. In our model, there is an unknown number of seismic events with\nunknown parameters (location, time, etc.). These events produce 14 different types of seismic waves\nor phases. A phase from an event may or may not be detected by a station. If a phase is detected at\na station, a corresponding detection is generated. However, the parameters of the detection may be\nimprecise. Additionally, an unknown number of noise detections are generated at each station. For\nNET-VISA, the evidence Y = y consists only of each station\u2019s set of detections and their parameters.\n\n3.1 Events\n\nThe events are generated by a time-homogeneous Poisson process. If $e$ is the set of events (of\nsize $|e|$), $\\lambda_e$ is the rate of event generation, and $T$ is the time period under consideration, we have\n\n$$P_\\theta(|e|) = \\frac{(\\lambda_e \\cdot T)^{|e|} \\exp(-\\lambda_e \\cdot T)}{|e|!}. \\tag{1}$$\n\nThe longitude and latitude of the $i$th event, $e^i_l$, are drawn from an event location density, $p_l(e_l)$,\non the surface of the earth. The depth of the event, $e^i_d$, is uniformly distributed up to a maximum\ndepth $D$ (700 km in our experiments). Similarly, the time of the event, $e^i_t$, is uniformly distributed\nbetween 0 and $T$. The magnitude of the event, $e^i_m$, is drawn from what seismologists refer to as the\nGutenberg-Richter distribution, which is in fact an exponential distribution with rate $\\lambda_m$:\n\n$$P_\\theta(e^i) = p_l(e^i_l) \\, \\frac{1}{D} \\, \\frac{1}{T} \\, \\lambda_m \\exp(-\\lambda_m e^i_m). \\tag{2}$$\n\nSince all the events are exchangeable, we have\n\n$$P_\\theta(e) = P_\\theta(|e|) \\cdot |e|! \\cdot \\prod_{i=1}^{|e|} P_\\theta(e^i) = \\exp(-\\lambda_e \\cdot T) \\prod_{i=1}^{|e|} p_l(e^i_l) \\, \\frac{1}{D} \\, \\lambda_e \\lambda_m \\exp(-\\lambda_m e^i_m). \\tag{3}$$\n\nFootnote 2: In this paper, by distance between two points on the surface of the earth we refer to the great-circle distance.\nThis can be represented in degrees, radians, or kilometers (using the average earth radius of 6371 km).\n\nFigure 1: Heat map (large values in red, small in blue) of the prior event location density $\\log p_l(e_l)$.\n\nMaximum likelihood estimates of $\\lambda_e$ and $\\lambda_m$ may be easily determined from historical event\nfrequencies and magnitudes. To approximate $p_l(e_l)$, we use a kernel density estimate derived from the\nfollowing exponentially decaying kernel:\n\n$$K_{b,x}(y) = \\frac{1 + 1/b^2}{2\\pi R^2} \\, \\frac{\\exp(-\\Delta_{xy}/b)}{1 + \\exp(-\\pi/b)}. \\tag{4}$$\n\nHere $b > 0$ is the bandwidth, $\\Delta_{xy}$ is the distance (in radians) between locations $x$ and $y$ on the\nsurface of the earth, and $R$ is the earth\u2019s radius. The bandwidth was estimated via cross-validation.\nIn addition, we additively mixed this kernel density with a uniform distribution, with prior probability\n0.001, to allow the possibility of explosions at an arbitrary location. 
The overall density, as\nillustrated in Figure 1, was pre-computed on a one degree grid and interpolated during inference.\n\n3.2 Correct Detections\n\nThe probability that an event\u2019s $j$th phase, $1 \\le j \\le J$, is detected by a station $k$, $1 \\le k \\le K$,\ndepends on the wave type or phase, the station, and the event\u2019s magnitude, depth, and distance to the\nstation. Let $d^{ijk}$ be a binary indicator variable for such a detection of event $i$, and $\\Delta_{ik}$ the distance\nbetween event $i$ and station $k$. Then we have\n\n$$P_\\phi(d^{ijk} = 1 \\mid e^i) = p_d^{jk}(e^i_m, e^i_d, \\Delta_{ik}). \\tag{5}$$\n\nIf an event phase is detected at a station, i.e. $d^{ijk} = 1$, our model specifies a probability distribution\nfor the attributes of that detection, $a^{ijk}$. The arrival time, $a_t^{ijk}$, is assigned a Laplacian distribution\nwhose mean consists of two parts. The first is the IASPEI travel time prediction for that phase,\nwhich depends only on the event depth and the distance between the event and station. The second is\na learned station-specific correction which accounts for inhomogeneities in the earth\u2019s crust, which\nallow seismic waves to travel faster or slower than the IASPEI prediction. The station-specific correction\nalso accounts for any systematic biases in picking seismic onsets from waveforms. Let $\\mu_t^{jk}$\nbe the location of this Laplacian (a function of the event time, depth, and distance to the station)\nand let $b_t^{jk}$ be its scale. Truncating this Laplacian to the range of possible arrival times produces a\nnormalization constant $Z_t^{jk}$, so that\n\n$$P_\\phi(a_t^{ijk} \\mid d^{ijk} = 1, e^i) = \\frac{1}{Z_t^{jk}} \\exp\\left(-\\frac{|a_t^{ijk} - \\mu_t^{jk}(e^i_t, e^i_d, \\Delta_{ik})|}{b_t^{jk}}\\right). \\tag{6}$$\n\nSimilarly, the arrival azimuth $a_z^{ijk}$ and slowness $a_s^{ijk}$ follow Laplacian distributions. The location $\\mu_z^{jk}$ of the\narrival azimuth distribution depends only on the location of the event, while the location $\\mu_s^{jk}$ of the arrival slowness\ndistribution depends only on the event depth and distance to the station. The scales of all these Laplacians\nare fixed for a given phase and station, so that\n\n$$P_\\phi(a_z^{ijk} \\mid d^{ijk} = 1, e^i) = \\frac{1}{Z_z^{jk}} \\exp\\left(-\\frac{|a_z^{ijk} - \\mu_z^{jk}(e^i_l)|}{b_z^{jk}}\\right), \\tag{7}$$\n\n$$P_\\phi(a_s^{ijk} \\mid d^{ijk} = 1, e^i) = \\frac{1}{Z_s^{jk}} \\exp\\left(-\\frac{|a_s^{ijk} - \\mu_s^{jk}(e^i_d, \\Delta_{ik})|}{b_s^{jk}}\\right). \\tag{8}$$\n\nThe arrival amplitude $a_a^{ijk}$ is similar to the detection probability in that it depends only on the event\nmagnitude, depth, and distance to the station. We model the log of the amplitude via a linear regression\nmodel with Gaussian noise:\n\n$$P_\\phi(a_a^{ijk} \\mid d^{ijk} = 1, e^i) = \\frac{1}{\\sqrt{2\\pi}\\,\\sigma_a^{jk}} \\exp\\left(-\\frac{(\\log(a_a^{ijk}) - \\mu_a^{jk}(e^i_m, e^i_d, \\Delta_{ik}))^2}{2 (\\sigma_a^{jk})^2}\\right). \\tag{9}$$\n\nFinally, the phase label $a_h^{ijk}$ automatically assigned to the detection follows a multinomial distribution\nwhose parameters depend on the true phase, $j$:\n\n$$P_\\phi(a_h^{ijk} \\mid d^{ijk} = 1, e^i) = p_h^{jk}(a_h^{ijk}). \\tag{10}$$\n\nThe phase- and station-specific detection distributions, $p_d^{jk}(\\cdot)$, were obtained using logistic regression\nmodels estimated via a hierarchical Bayesian procedure [9]. Because phase labels indicate\namong other things the general physical path taken from an event to a station, a distinct set\nof features was learned from the event characteristics for each phase. To estimate the individual\nstation weights $\\alpha_{wjk}$ for each phase $j$ and feature $w$, a hierarchical model was specified in\nwhich each station-specific weight is independently drawn from a feature-dependent global Normal\ndistribution, so that $\\alpha_{wjk} \\sim N(\\mu_{wj}, \\sigma_{wj}^2)$. Weakly informative diffuse priors $\\mu_{wj} \\sim N(0, 100^2)$,\n$\\sigma_{wj}^{-2} \\sim \\mathrm{Gamma}(0.01, 0.01)$, were placed on the parameters of these global distributions, and posterior\nmean estimates of the station-specific weights were obtained via Gibbs sampling. Figure 2 shows\ntwo of the empirical and modeled distributions for one phase-site.\n\nFigure 2: Conditional detection probabilities and arrival time distributions (relative to the IASPEI\nprediction) for the P phase at Station 6. [Plot panels: \u201cDetection probability at station 6 for P phase, surface event\u201d,\nx-axis Distance (deg), y-axis Probability, series: model 3.5 mb, data 3\u20134 mb; \u201cTime Residuals around IASPEI prediction\nfor P phase at station 6\u201d, x-axis Time, y-axis Probability, series: model, data.]\n\n3.3 False Detections\n\nEach station, $k$, also generates a set of false detections $f^k$ through a time-homogeneous Poisson\nprocess with rate $\\lambda_f^k$:\n\n$$P_\\phi(|f^k|) = \\frac{(\\lambda_f^k \\cdot T)^{|f^k|} \\exp(-\\lambda_f^k \\cdot T)}{|f^k|!}. \\tag{11}$$\n\nThe time $f_t^{kl}$, azimuth $f_z^{kl}$, and slowness $f_s^{kl}$ of these false detections are generated uniformly over\ntheir respective ranges. The amplitude $f_a^{kl}$ of the false detection is generated from a mixture of two\nGaussians, $p_a^k(f_a^{kl})$. Finally, the phase label $f_h^{kl}$ assigned to the false detection follows a multinomial\ndistribution, $p_h^k(f_h^{kl})$. If the azimuth and slowness take values on ranges of length $M_z$ and $M_s$,\nrespectively, then the probability of the $l$th false detection is given by\n\n$$P_\\phi(f^{kl}) = \\frac{1}{T} \\, \\frac{1}{M_z} \\, \\frac{1}{M_s} \\, p_a^k(f_a^{kl}) \\, p_h^k(f_h^{kl}). \\tag{12}$$\n\nSince the false detections at a station are exchangeable, we have\n\n$$P_\\phi(f^k) = P_\\phi(|f^k|) \\cdot |f^k|! \\prod_{l=1}^{|f^k|} P_\\phi(f^{kl}) = \\exp(-\\lambda_f^k \\cdot T) \\prod_{l=1}^{|f^k|} \\frac{\\lambda_f^k}{M_z M_s} \\, p_a^k(f_a^{kl}) \\, p_h^k(f_h^{kl}). \\tag{13}$$\n\n4 Inference\n\nCombining the model components developed in the preceding section, the overall probability of any\nhypothesized sequence of events $e$, detected event phases $d$, arrival attributes $a$ for correctly detected\nevent phases, and arrival attributes $f$ for false detections is\n\n$$P(e, d, a, f) = P_\\theta(e) \\, P_\\phi(d \\mid e) \\, P_\\phi(a \\mid d, e) \\, P_\\phi(f). \\tag{14}$$\n\nWe will attempt to find the most likely explanation consistent with the observations. This involves\ndetermining $e$, $d$, $a$, and $f$ which maximize $P(e, d, a, f)$, such that the set of detections implied\nby $d$, $a$, and $f$ corresponds exactly with the observed detections. Since detections from real seismic\nsensors are observed incrementally and roughly in time-ascending order, our inference algorithm\nalso produces an incremental hypothesis which advances with time. Our algorithm can be seen as a\nform of greedy search, in which the current hypothesis is improved via a set of local moves.\n\nLet $M_T$ denote the maximum travel time for any phase. Initially, we start with an event-window\nof size $W$ from $t_0 = 0$ to $t_1 = W$, and a detection-window of size $W + M_T$ from $t_0 = 0$ to\n$t_1 = W + M_T$. 
Our starting hypothesis is that all detections in our detection-window are false\ndetections and there are no events. We then repeatedly apply the birth, death, improve-event, and\nimprove-detection moves (described below) for a fixed number of iterations ($N$ times the number of\ndetections in that window) before shifting the windows forward by a step size $S$. Any new detections\nadded to the detection window are again assumed to be false detections. As the windows move\nforward the events older than $t_0 - M_T$ become stable: none of the moves modifies either these events or\ntheir associated detections. These events are then output. While in theory this algorithm never\nneeds to terminate, our experiments continue until the test dataset is fully consumed.\n\nIn order to simplify the computations needed to compare alternate hypotheses, we decompose the\noverall probability of Eq. (14) into the contribution from each event. We define the score $S_e$ of an\nevent as the probability ratio of two hypotheses: one in which the event exists, and another in which\nthe event doesn\u2019t exist and all of its associated detections are noise. If an event has score less than 1,\nan alternative hypothesis in which the event is deleted clearly has higher probability. Critically, this\nevent score is unaffected by other events in the current hypothesis. From Eqs. (3) and (13) we have\n\n$$S_e(e^i) = \\frac{p_l(e^i_l) \\, \\lambda_e \\lambda_m}{D} \\exp(-\\lambda_m e^i_m) \\prod_{j,k} P_\\phi(d^{ijk} \\mid e^i) \\left( \\delta(d^{ijk}, 0) + \\delta(d^{ijk}, 1) \\, \\frac{P_\\phi(a^{ijk} \\mid d^{ijk}, e^i)}{\\frac{\\lambda_f^k}{M_z M_s} \\, p_a^k(f_a^{kl}) \\, p_h^k(f_h^{kl})} \\right).$$\n\nNote that the final fraction is a likelihood ratio comparing interpretations of the same detection as\neither the detection of event $i$\u2019s $j$th phase at station $k$, or the $l$th false detection at station $k$. We\ncan further decompose the score into scores $S_d$ for each detection. The score of $d^{ijk}$, defined when\n$d^{ijk} = 1$, is the ratio of the probabilities of the hypothesis where the detection is associated with\nphase $j$ of event $i$ at station $k$, and one in which this detection is false and phase $j$ of event $i$ is\nmissed by station $k$:\n\n$$S_d(d^{ijk}) = \\frac{p_d^{jk}(e^i_m, e^i_d, \\Delta_{ik})}{1 - p_d^{jk}(e^i_m, e^i_d, \\Delta_{ik})} \\, \\frac{P_\\phi(a^{ijk} \\mid d^{ijk}, e^i)}{\\frac{\\lambda_f^k}{M_z M_s} \\, p_a^k(f_a^{kl}) \\, p_h^k(f_h^{kl})}. \\tag{15}$$\n\nBy definition, any detection with score less than 1 is more likely to be a false detection. Also, the\nscore of an individual detection is independent of other detections and unassociated events in the\nhypothesis. These scores play a key role in the following local search moves.\n\nBirth Move We randomly pick a detection, invert it into an event location (using the detection\u2019s\ntime, azimuth, and slowness), and sample an event in a 10 degree by 100 second ball around this\ninverted location. The depth of the event is fixed at 0, and the magnitude is uniformly sampled.\n\nImprove Detections Move For each detection in the detection window, we consider all possible\nphases $j$ of all events $i$ up to $M_T$ seconds earlier. We then associate the detection with the best event-phase\nthat is not already assigned to a detection with higher score at the same station $k$. 
If this\nbest event-phase has score $S_d(d^{ijk}) < 1$, the detection is changed to a false detection.\n\nImprove Events Move For each event $e^i$, we consider 10 points chosen uniformly at random in a\nsmall ball around the event (2 degrees in longitude and latitude, 100 km in depth, 5 seconds in time,\nand 2 units of magnitude), and choose those attributes with the highest score $S_e(e^i)$.\n\nDeath Move Any event $e^i$ with score $S_e(e^i) < 1$ is deleted, and all of its currently associated\ndetections are marked as false alarms.\n\nFinal Pruning Before outputting event hypotheses, we perform a final round of pruning to remove\nsome duplicate events. In particular, we delete any event for which there is another higher-scoring\nevent within 5 degrees distance and 50 seconds time. Such spurious, or shadow, event hypotheses\narise because real seismic events generate many more phases than we currently model. In addition,\na single phase may sometimes generate multiple detections due to waveform processing, or \u201cpick\u201d,\nerrors. These additional unmodeled detections, when taken together, often suggest an additional\nevent at about the same location and time as the original event.\n\nNote that the birth move is not a greedy move: the proposed event will almost always have a score\n$S_e(e^i) < 1$ until some number of detections are assigned in subsequent moves. The overall structure\nof these moves could be easily converted to an MCMC or simulated annealing algorithm. However,\nin our experiments this search outperformed simple MCMC methods in terms of speed and accuracy.\n\nFigure 3: Precision-recall performance of the proposed NET-VISA and deployed SEL3 algorithms,\ntreating the analyst-generated LEB as ground truth.\n\n5 Experimental Results\n\nAs discussed in Section 2, we measure the precision, recall, and average error of our predictions via\nan assumed ground truth. 
We \ufb01rst treat the IMS analyst-generated LEB as ground truth, and com-\npare the performance of our NET-VISA algorithm to the currently deployed SEL3 system. Using\nthe scores for hypothesized events, we have generated a precision-recall curve for NET-VISA, and\nmarked SEL3 on it as a point (see Figure 3). Also in this \ufb01gure, we show a precision-recall curve\nfor SEL3 using scores from an SVM trained to classify true and false SEL3 events [10] (SEL3 ex-\ntrapolation). As shown in the \ufb01gure, NET-VISA has at least 16% more recall at the same precision\nas SEL3, and at least 25% more precision at the same recall as SEL3.\nThe true precision of NET-VISA is perhaps higher than this comparison suggests. We have evaluated\nthe recall of LEB and NET-VISA with the NEIC dataset as ground truth. Since the NEIC has many\nmore sensors in the United States than the IMS, it is considered a more reliable summary of seismic\nactivity in this region. Out of 33 events in the continental United States, LEB found 4, and NET-\nVISA found 8 including the 4 found by LEB.\nFigure 4 shows the recall and error divided among different types of LEB events. The table on\nthe left shows a break-down by LEB event magnitude. For magnitudes up to 4, NET-VISA has\nnearly 20% higher recall with similar error. 
The table on the right shows a break-down by azimuth\ngap, defined as the largest difference in consecutive event-to-station azimuths for stations which\ndetect an event. Large gaps indicate that the event location is under-constrained. For example, if all\nstations are to the southwest of an event, the gap is greater than 270 degrees and the event will be\npoorly localized along a line running from southwest to northeast. By using evidence about missed\ndetections ignored by SEL3, NET-VISA reduces this uncertainty and performs much better.\n\n[Figure 3 plot: Precision-Recall curve with LEB as ground truth; axes: precision, recall; series: SEL3, SEL3 extrapolation, NET-VISA.]\n\nLeft table (by LEB event magnitude):\n\nmb         Count  SEL3 Recall  SEL3 Err  NET-VISA Recall  NET-VISA Err\n0 \u2013 2      74     64.9         101       85.1             91\n2 \u2013 3      36     50.0         186       75.0             171\n3 \u2013 4      558    66.5         104       85.1             109\n> 4        164    86.6         70        93.3             80\nall        832    69.7         99        86.3             103\n\nRight table (by azimuth gap):\n\nAzimuth gap  Count  SEL3 Recall  SEL3 Err  NET-VISA Recall  NET-VISA Err\n0 \u2013 90       72     100.0        28        100.0            38\n90 \u2013 180     315    88.9         76        93.7             72\n180 \u2013 270    302    51.0         134       82.1             126\n270 \u2013 360    143    51.0         176       72.0             187\nall          832    69.7         99        86.3             103\n\nFigure 4: Recall and error (km) broken down by LEB event magnitude and azimuth gap (degrees).\n\nAll of the results in this section were produced using 7 days of data from the validation set. The\ninference used a window size, W , of 30 minutes, a step size, S, of 15 minutes, and N = 1000\niterations. There were a total of 832 LEB events during this period, and roughly 120,000 detections.\nThe inference took about 4.5 days on a single core running at 2.5 GHz. Estimating model parameters\nfrom 2.5 months of training data took about 1 hour.\n\n6 Conclusions and Further Work\n\nOur results demonstrate that a Bayesian approach to seismic monitoring can improve significantly\non the performance of classical systems. 
The NET-VISA system can not only reduce the human\nanalyst effort required to achieve a given level of accuracy, but can also lower the magnitude thresh-\nold for reliable detection. Given that the dif\ufb01culty of seismic monitoring was cited as one of the\nprincipal reasons for non-rati\ufb01cation of the CTBT by the United States Senate in 1999, one hopes\nthat improvements in monitoring may increase the chances of \ufb01nal rati\ufb01cation and entry into force.\nPutting monitoring onto a sound probabilistic footing also facilitates further improvements such as\ncontinuous estimation of local noise conditions, travel time, and attenuation models without the need\nfor ground-truth calibration experiments (controlled explosions). We also expect to lower the detec-\ntion threshold signi\ufb01cantly by extending the generative model to include waveform characteristics,\nso that detection becomes part of a globally integrated inference process\u2014and hence susceptible to\ntop-down in\ufb02uences\u2014rather than being a purely local, bottom-up, hard-threshold decision.\n\nAcknowledgments\n\nWe would like to thank the many seismologists who patiently explained to us the intricacies of their\n\ufb01eld, among them Ronan LeBras, Robert Engdahl, David Bowers, Bob Pearce, Stephen Myers,\nDmitry Storchak, Istvan Bondar, and Barbara Romanowicz. We also received assistance from sev-\neral Berkeley undergraduates, including Matthew Cann, Hong Hu, Christopher Lin, and Andrew\nLee. The third author\u2019s work was performed under the auspices of the U.S. Department of Energy\nat Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. The other au-\nthors were partially supported by the Preparatory Commission for the CTBT. Finally, the \ufb01rst author\nwishes to thank his family for their in\ufb01nite patience and support.\n\nReferences\n[1] Y. Bar-Shalom and T.E. Fortmann. Tracking and Data Association. Academic Press, 1988.\n[2] P. M. Shearer. 
Improving local earthquake location using the L1 norm and waveform cross\ncorrelation: Application to the Whittier Narrows, California, aftershock sequence. J. Geophys.\nRes., 102:8269\u20138283, 1997.\n\n[3] F. Waldhauser and W. L. Ellsworth. A double-difference earthquake location algorithm:\nmethod and application to the Northern Hayward Fault, California. Bulletin of the Seismological\nSociety of America, 90:1353\u20131368, 2000.\n\n[4] J. Pujol. Earthquake location tutorial: graphical approach and approximate epicentral location\ntechniques. Seismol. Res. Lett., 75:63\u201374, 2004.\n\n[5] Stephen C. Myers, Gardar Johannesson, and William Hanley. A Bayesian hierarchical method\nfor multiple-event seismic location. Geophysical Journal International, 171:1049\u20131063, 2009.\n\n[6] L. Geiger. Probability method for the determination of earthquake epicenters from the arrival\ntime only. Bull. St. Louis Univ., 8:60\u201371, 1912.\n\n[7] D. A. Storchak, J. Schweitzer, and P. Bormann. The IASPEI standard seismic phase list.\nSeismol. Res. Lett., 74(6):761\u2013772, 2003.\n\n[8] Brian Milch, Bhaskara Marthi, Stuart J. Russell, David Sontag, Daniel L. Ong, and Andrey\nKolobov. BLOG: Probabilistic models with unknown objects. In IJCAI, pages 1352\u20131359,\n2005.\n\n[9] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman &\nHall, 2004.\n\n[10] Lester Mackey, Ariel Kleiner, and Michael I. Jordan. Improved automated seismic event extraction\nusing machine learning. Eos Trans. AGU, 90(52), 2009. Fall Meet. Suppl., Abstract\nS31B-1714.\n", "award": [], "sourceid": 891, "authors": [{"given_name": "Nimar", "family_name": "Arora", "institution": null}, {"given_name": "Stuart", "family_name": "Russell", "institution": null}, {"given_name": "Paul", "family_name": "Kidwell", "institution": null}, {"given_name": "Erik", "family_name": "Sudderth", "institution": null}]}