{"title": "Data-driven Estimation of Sinusoid Frequencies", "book": "Advances in Neural Information Processing Systems", "page_first": 5127, "page_last": 5137, "abstract": "Frequency estimation is a fundamental problem in signal processing, with applications in radar imaging, underwater acoustics, seismic imaging, and spectroscopy. The goal is to estimate the frequency of each component in a multisinusoidal signal from a finite number of noisy samples. A recent machine-learning approach uses a neural network to output a learned representation with local maxima at the position of the frequency estimates. In this work, we propose a novel neural-network architecture that produces a significantly more accurate representation, and combine it with an additional neural-network module trained to detect the number of frequencies. This yields a fast, fully-automatic method for frequency estimation that achieves state-of-the-art results. In particular, it outperforms existing techniques by a substantial margin at medium-to-high noise levels.", "full_text": "Data-driven Estimation of Sinusoid Frequencies

Gautier Izacard
Ecole Polytechnique
gautier.izacard@polytechnique.edu

Sreyas Mohan
Center for Data Science, New York University
sm7582@nyu.edu

Carlos Fernandez-Granda
Courant Institute of Mathematical Sciences, and Center for Data Science, New York University
cfgranda@cims.nyu.edu

Abstract

Frequency estimation is a fundamental problem in signal processing, with applications in radar imaging, underwater acoustics, seismic imaging, and spectroscopy. The goal is to estimate the frequency of each component in a multisinusoidal signal from a finite number of noisy samples. A recent machine-learning approach uses a neural network to output a learned representation with local maxima at the position of the frequency estimates.
In this work, we propose a novel neural-network architecture that produces a significantly more accurate representation, and combine it with an additional neural-network module trained to detect the number of frequencies. This yields a fast, fully-automatic method for frequency estimation that achieves state-of-the-art results. In particular, it outperforms existing techniques by a substantial margin at medium-to-high noise levels.

1 Introduction

1.1 Estimation of sinusoid frequencies

Estimating the frequencies of multisinusoidal signals from a finite number of samples is a fundamental problem in signal processing. Examples of applications include underwater acoustics [2], seismic imaging [5], target identification [3, 11], digital filter design [37], nuclear-magnetic-resonance spectroscopy [43], and power electronics [27]. In radar and sonar systems, the frequencies encode the direction of electromagnetic or acoustic waves arriving at a linear array of antennae or microphones [26].

In signal processing, multisinusoidal signals are usually represented as linear combinations of complex exponentials,

S(t) := Σ_{j=1}^{m} a_j exp(i2πf_j t),  (1)

where the unknown amplitudes a ∈ C^m encode the magnitude and phase of the different sinusoidal components, and t denotes time. The frequencies f_1, ..., f_m quantify the oscillation rate of each component. The goal of frequency estimation is to determine their values from noisy samples of the signal S.

Without loss of generality, let us assume that the true frequencies belong to the unit interval, i.e. 0 ≤ f_j ≤ 1, 1 ≤ j ≤ m.
33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Figure 1: Illustration of the frequency-estimation problem. A multisinusoidal signal given by equation 1 (dashed blue line) and its Nyquist-rate samples (blue circles) are depicted on the top row (infinite samples, N = 40, N = 20). The bottom row shows that the resolution of the frequency estimate obtained by computing the discrete-time Fourier transform from N samples decreases as we reduce N. The signal is real-valued, so its Fourier transform is even; only half of it is shown.

By the Sampling Theorem [25, 30, 35] the signal in equation 1 is completely determined by samples measured at a unit rate1: ..., S(−1), S(0), S(1), S(2), ... Computing the discrete-time Fourier transform from such samples recovers the frequencies exactly (intuitively, the discretized sinusoids form an orthonormal basis, see e.g. [31]). However this requires an infinite number of measurements, which is not an option in most practical situations.

In practice, the frequencies must be estimated from a finite number of measurements corrupted by noise. Here we study a popular measurement model, where the signal is sampled at a unit rate,

y_k := S(k) + z_k,  1 ≤ k ≤ N,  (2)

and the noise z_1, ..., z_N is additive. Limiting the number of samples is equivalent to multiplying the signal by a rectangular window of length N.
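As a concrete illustration of the measurement model, the samples in equations 1 and 2 can be simulated in a few lines. This is a sketch, not the authors' code: the function name and the complex-Gaussian noise convention are our choices.

```python
import numpy as np

def sample_signal(freqs, amps, N, noise_std=0.0, rng=None):
    """Noisy unit-rate samples y_k = S(k) + z_k, 1 <= k <= N (equations 1-2)."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(1, N + 1)                               # sampling times k = 1, ..., N
    S = np.exp(2j * np.pi * np.outer(t, freqs)) @ amps    # S(k) = sum_j a_j exp(i 2 pi f_j k)
    z = noise_std * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return S + z

# two components at frequencies 0.1 and 0.3, noiseless
y = sample_signal(np.array([0.1, 0.3]), np.array([1.0, 0.5 + 0.5j]), N=50)
```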
In the frequency domain, this corresponds to a convolution with a kernel (called a discrete sinc or Dirichlet kernel) of width 1/N that blurs the frequency information, limiting its resolution as illustrated in Figure 1 (see Section 1 in the Supplementary Material for a more detailed explanation). Because of this, the frequency-estimation problem is often known as spectral super-resolution in the literature (in signal processing, the spectrum of a signal refers to its frequency representation).

1.2 State of the art

A natural way to perform frequency estimation from data following the model in equation 2 is to compute the magnitude of their discrete-time Fourier transform. This is a linear estimation method known as the periodogram in the signal-processing literature [39]. As illustrated by Figure 1 and explained in more detail in Section 1 of the Supplementary Material, the periodogram yields a superposition of kernels centered at the true frequencies. The interference produced by the side lobes of the kernel complicates finding the locations precisely2 (see for example the middle spike in Figure 1 for N = 20). The periodogram consequently does not recover the true frequencies exactly, even if there is no noise in the data. However, it is a popular technique that often outperforms more sophisticated methods when the noise level is high.

The sample covariance matrix of the data in equation 2 is low rank [39]. This insight can be exploited to perform frequency estimation by performing an eigendecomposition of the matrix, a method known as MUltiple SIgnal Classification (MUSIC) [4, 34]. The approach is related to Prony's method [32, 42].
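The two classical baselines just described can be sketched as follows. These are illustrative single-snapshot implementations, not the authors' code: the grid size, the Hankel parameter L, and the peak-selection rules are our choices, and both functions assume the number of components m is known.

```python
import numpy as np

def periodogram_peaks(y, m, grid_size=1000):
    """Pick the m largest local maxima of the periodogram |DTFT of y| on a fine grid."""
    N = len(y)
    grid = np.arange(grid_size) / grid_size
    P = np.abs(np.exp(-2j * np.pi * np.outer(grid, np.arange(N))) @ y)
    is_peak = (P > np.roll(P, 1)) & (P > np.roll(P, -1))      # circular local maxima
    return np.sort(grid[np.argsort(np.where(is_peak, P, 0.0))[-m:]])

def music_peaks(y, m, L=25, grid_size=1000, eps=1e-12):
    """Single-snapshot MUSIC sketch: Hankel matrix -> SVD -> noise-subspace pseudospectrum."""
    N = len(y)
    H = np.array([y[i:i + L] for i in range(N - L + 1)]).T    # L x (N-L+1) Hankel matrix
    Un = np.linalg.svd(H)[0][:, m:]                           # noise subspace (m assumed known)
    grid = np.arange(grid_size) / grid_size
    a = np.exp(2j * np.pi * np.outer(np.arange(L), grid))     # steering vectors a(f) on the grid
    # pseudospectrum is large where a(f) is (nearly) orthogonal to the noise subspace
    ps = 1.0 / (eps + np.sum(np.abs(Un.conj().T @ a) ** 2, axis=0))
    return np.sort(grid[np.argsort(ps)[-m:]])
```

On noiseless data both recover well-separated frequencies to within the grid resolution; the contrast between them appears once noise is added.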
In a similar spirit, matrix-pencil techniques extract the frequencies by forming a matrix pencil before computing the eigendecomposition of the sample covariance matrix [21, 33]. We refer to [38] for an exhaustive list of related methods. Eigendecomposition-based methods are very accurate at low noise levels [28, 29], and provably achieve exact recovery of the frequencies in the absence of noise, but their performance degrades significantly as the signal-to-noise ratio decreases.

1 If we consider frequencies supported on an interval of length ℓ, then the sampling rate must equal ℓ.
2 To alleviate the interference one can multiply the data with a smoothing window, but this enlarges the width of the blurring kernel and consequently reduces the resolution of the data in the frequency domain [18].

Figure 2: Architecture of the DeepFreq method.

The periodogram and eigendecomposition-based methods assume prior knowledge of the number of frequencies to be estimated, which is usually not available in practice. Classical approaches to estimate the number of frequencies use information-theoretic criteria such as the Akaike information criterion (AIC) [44] or minimum description length (MDL) [45]. Both methods minimize a criterion based on maximum likelihood that involves the eigenvalues of the sample covariance matrix. An alternative technique known as the second-order statistic of eigenvalues (SORTE) [20, 17] produces an estimate of the number of frequencies based on the gap between the eigenvalues of the sample covariance matrix.

Variational techniques are based on an interpretation of frequency estimation as a sparse-recovery problem. Sparse solutions are obtained by minimizing a continuous counterpart of the ℓ1 norm [10, 40, 15]. The approach has been extended to settings with missing data [41], outliers [16], and varying noise levels [8].
As in the case of eigendecomposition-based methods, these techniques are known to be robust at low noise levels [9, 13, 40, 14], but exhibit a deteriorating empirical performance as the noise level increases. An important drawback of this methodology is the computational cost of solving the optimization problem, which is formulated as a semidefinite program or as a quadratic program in very high dimensions.

Very recently, the authors of [23] propose a learning-based approach to frequency estimation based on a deep neural network. The method is shown to be competitive with the periodogram and eigendecomposition-based methods for a range of noise levels, but requires an estimate of the number of sinusoidal components as an input. Other recent works apply deep learning to related inverse problems, including sparse recovery [47, 19], point-source superresolution [7], and acoustic source localization [1, 46, 12].

1.3 Contributions

This work introduces a novel deep-learning framework to perform frequency estimation from data corrupted by noise of unknown variance. The approach is inspired by the learning-based method in Ref. [23], which generates a frequency representation that can be used to perform estimation if the number of true frequencies is known. In this work, we propose a novel neural-network architecture that produces a significantly more accurate frequency representation, and combine it with an additional neural-network module trained to estimate the number of frequencies. This yields a fast, fully-automatic method for frequency estimation that achieves state-of-the-art results. The approach outperforms existing techniques by a substantial margin at medium-to-high noise levels. Our results showcase an area of great potential impact for machine-learning methodology: problems with accurate physical models where model-based methodology breaks down due to stochastic perturbations that can be simulated accurately.
The code used to train and evaluate our models is available online at https://github.com/sreyas-mohan/DeepFreq.

Figure 3: Top: Architecture of the DeepFreq frequency-representation module described in Section 2.2. Bottom: Heat maps showing the magnitudes of the Fourier transform of the rows of the matrices (A1, A2, A3) associated to three of the channels in the first layer of the encoder of the frequency-representation module. The diagonal pattern indicates that each channel computes a Fourier-like transformation. Note that the frequencies are ordered automatically. The reason is that after the first layer the network is convolutional and has a reduced field of view. In order to produce an accurate frequency representation at the output, the first layer needs to order the relevant frequency information so that it can be propagated by the convolutional layers.

2 Methodology

2.1 Overview

Most existing techniques for frequency estimation build continuous frequency representations of the observed data, as opposed to estimating the frequencies directly. In the case of the periodogram, the representation is just the discrete-time Fourier transform of the measurements. In the case of eigendecomposition-based methods, a different representation, known as the pseudo-spectrum, is computed using a subset of the eigenvectors of the sample covariance matrix of the data. One can show that in the absence of noise, the peaks of the pseudo-spectrum are located exactly at the locations of the true frequencies. For noisy data, the hope is that the perturbation does not vary the locations too much. In the case of variational methods, yet another representation is obtained from the solution to the dual of the sparsity-promoting convex program [10].
In this case, the frequencies are estimated by locating local maxima that have magnitude close to one.

Recently, the authors of [23] propose generating a frequency representation in a data-driven manner, training a neural network called the PSnet to produce it directly from the measurements. Frequency estimation is then carried out by finding the peaks of the representation. The authors show that the approach is more effective than using a deep-learning model to directly output the frequency values. Building upon the idea of learned frequency representations, we propose an improved version of the PSnet and combine it with an additional neural network that performs automatic estimation of the number of frequencies. Figure 2 shows a diagram of the architecture. First, the data are fed through a module that produces a frequency representation. Then, the representation is fed into a second frequency-counting module that outputs an estimate of the number of sinusoidal components m̂. Finally, the frequency estimates are computed by locating the m̂ highest maxima in the frequency representation. We call this method DeepFreq. Sections 2.2 and 2.3 describe the proposed architectures for the frequency-representation and frequency-counting modules respectively.

2.2 Frequency-representation module

Building upon the methodology proposed in Ref. [23], we implement the frequency-representation module as a feedforward deep neural network. Given a set of true frequencies f_1, ..., f_m, we define a ground-truth frequency

Figure 4: Frequency representation learned by DeepFreq for data generated from a signal with two sinusoidal components.
The amplitude of the first component has magnitude equal to one. The second component has magnitude equal to 0.5 (left) and 0.1 (right). For four different signal-to-noise ratios, the representation is averaged over 100 signals with random phases and different noise realizations. The error bars represent standard error.

representation FR as a superposition of narrow Gaussian kernels K : R → R centered at each frequency,

FR(u) := Σ_{j=1}^{m} K(u − f_j).  (3)

FR is a smooth function that has sharp peaks at the location of the true frequencies, and decays rapidly away from them. Note that amplitude information is not encoded in FR; each shifted kernel has the same amplitude. The neural network is calibrated to output an approximation to FR from N noisy, low-resolution data given by the model in equation 2. This is achieved by minimizing a training loss that penalizes the squared ℓ2-norm approximation error between the output and the true FR function over a fine grid for a database of examples.

Figure 3 shows the proposed architecture for the frequency-representation module. The overall structure is similar to the PSnet architecture from Ref. [23]. First, a linear encoder maps the input data to an intermediate feature space. Then, the features are processed by a series of convolutional layers with localized filters of length 3 and batch normalization [22], interleaved with ReLUs. The dimension of the input is preserved using circular padding. Finally, a decoder produces the FR estimate applying a transposed convolution (in the PSnet a fully connected layer is used instead). If the data are complex-valued, the real and imaginary parts are processed as pairs of real numbers.

The main difference between our proposed architecture and the PSnet is the encoder.
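The target representation of equation 3 is straightforward to evaluate on a grid. A minimal sketch, ignoring wraparound at the interval boundary (the paper does not specify this detail):

```python
import numpy as np

def target_fr(freqs, grid, sigma):
    """Ground-truth representation FR(u) = sum_j K(u - f_j) with Gaussian K (equation 3)."""
    diffs = grid[:, None] - np.asarray(freqs)[None, :]
    # one unit-height kernel per frequency; amplitudes a_j are deliberately not encoded
    return np.exp(-diffs ** 2 / (2 * sigma ** 2)).sum(axis=1)

grid = np.linspace(0.0, 1.0, 1000, endpoint=False)
fr = target_fr([0.1, 0.3], grid, sigma=0.3 / 50)   # sigma = 0.3/N, the value used in Section 3.2
```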
Intuitively, the encoder learns a Fourier-like transformation that concentrates frequency information locally so that it can be processed by the convolutional filters in the subsequent layer. The PSnet uses a single linear map to implement the transformation: for an input y ∈ C^N the output of the encoder is Ay, where A is a fixed M × N matrix and M > N. We propose to instead use multiple separate linear maps. The output of the DeepFreq encoder can be represented by a feature matrix

[A_1 y  A_2 y  ···  A_C y],  (4)

where each A_i, 1 ≤ i ≤ C, is a fixed M × N matrix. The C columns can be interpreted as different channels, which extract complementary features from the input. The filters in the next layer of the architecture combine the information of all channels, while acting convolutionally on the columns of the feature matrix. Visualizing the Fourier transform of the rows of A_1, ..., A_C for a trained DeepFreq network reveals that each of the channels implements similar, yet different, Fourier-like transformations: the rows are approximately sinusoidal, with frequencies that are ordered sequentially (see Figure 3). This provides a rich set of frequency features to the convolutional layers, which boosts the performance of the frequency-representation module with respect to the PSnet (see Section 3.2).

2.3 Frequency-counting module

Figure 4 shows the output of the frequency-representation module for a simple signal with two sinusoidal components. When one of the components has small amplitude and the data are noisy, the representation may still detect the frequency, but the magnitude of the corresponding peak decreases. In addition, spurious local maxima may appear due to the stochastic fluctuations in the data. In order to perform estimation by locating maxima in the learned representation, it is necessary to first decide how many components to look for.
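The multi-channel encoder of equation 4 above can be sketched with stand-in weights. Here random matrices replace the learned maps A_i, and the input is real for simplicity; in the actual module the A_i are trained and complex inputs are split into real and imaginary parts.

```python
import numpy as np

def encoder_features(y, As):
    """Multi-channel linear encoder (equation 4): one column A_i y per channel."""
    return np.stack([A @ y for A in As], axis=1)   # shape (M, C); columns are channels

rng = np.random.default_rng(0)
C, M, N = 64, 125, 50                              # sizes used in Section 3.2
As = [rng.standard_normal((M, N)) for _ in range(C)]   # random stand-ins for learned maps
y = rng.standard_normal(N)
features = encoder_features(y, As)
```

The subsequent convolutional filters then mix all C channels while sliding along the M-dimensional feature axis.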
This is a pervasive problem in frequency estimation, which is also an issue for traditional methods. Many published works assume that the number of components is known beforehand (including [23]), but this is often not the case in many practical applications. In this section we describe a frequency-counting module designed to estimate the number of sinusoidal components automatically.

Figure 5: False negative rate of DeepFreq compared to other methodologies. DeepFreq outperforms all other methods, including PSnet. Only CBLasso at high signal-to-noise ratios exhibits similar performance. The experiment is described in Section 3.2.

We propose to implement the frequency-counting module using a neural network. The network is trained to extract the number of components from the output of the frequency-representation module in Section 2.2. The representation produced by the module concentrates the frequency information locally, which makes it easier to count the number of components. Patterns indicating the presence of true frequencies can be expected to be invariant to translations as long as the noise is not structured in the frequency domain. We exploit this insight by applying a convolutional architecture to count the frequencies. An initial 1D strided convolutional layer with a wide kernel is followed by several convolutional blocks with localized filters. The final layer is fully connected. It outputs a single real number, which is rounded to the nearest integer to produce the count estimate. The counting module is calibrated on a training dataset containing instances of FR functions produced by the frequency-representation module. Note that the frequency-representation and frequency-counting modules are trained separately.
The loss function is given by the squared ℓ2-norm difference between the count estimate and the true number of sinusoidal components. Section 3.3 shows that our approach clearly outperforms eigendecomposition-based methods at medium-to-high noise levels.

3 Computational experiments

3.1 Experimental design

To validate our approach we simulate data according to the signal model in equation 1 and the measurement model in equation 2 for N := 50. The data generation process is the following:

1. The number of components m in each signal is chosen uniformly at random between 1 and 10.

2. The frequency values f_1, ..., f_m are generated so that the minimum separation between them is greater than or equal to 1/N. The minimum separation governs the difficulty of locating the frequencies. Under 2/N the problem is very challenging and under 1/N it is almost impossible (we refer the reader to [29, 36, 10] for an in-depth analysis of this phenomenon). The separation between the frequencies is set to equal 1/N + |w|, where w is a Gaussian random variable with standard deviation equal to 2.5/N.

3. The coefficients a_j, 1 ≤ j ≤ m, are given by a_j := (0.1 + |w_j|) e^{iθ_j}, where w_j is sampled from a standard Gaussian distribution and the phase θ_j is uniform in [0, 2π]. The minimum possible amplitude also governs the difficulty of the problem. We fix it to 0.1.

4. The noise level varies in a certain range, and is considered unknown. For each noise realization, we first sample the noise level σ uniformly in the interval [0, 1]. Then we generate N i.i.d. standard Gaussian samples. Finally, we scale the noise so that the ratio between the ℓ2 norm of the noise and the signal equals σ.
This yields a range of signal-to-noise ratios (SNR) between 0 dB and ∞.

Figure 6: Average error of the DeepFreq frequency-counting module, the DeepFreq frequency-counting module trained with the output of the PSnet, and three representative eigendecomposition-based methods for the experiment described in Section 3.3.

3.2 Frequency representation

As mentioned in Section 2, most existing methods for frequency estimation construct frequency representations. Here we compare these representations to the one learned by DeepFreq, in a setting where the noise level in the data is unknown. We consider four representative methods: the periodogram [39], MUSIC [4, 34], a variational method known as the concomitant Beurling lasso (CBLasso) [8], and the PSnet method in [23].

The architecture of the DeepFreq frequency-representation module follows the description in Section 2.2. We fix the standard deviation of the Gaussian filter in the representation to 0.3/N. We train a single model for the whole range of noise levels. The number of channels C in the encoder is set to 64. The output dimensionality M of the encoder is set to 125. The number of intermediate convolutional layers is set to 20. The width of the filter in the transposed convolution in the decoder is set to 25 with a stride of 8 in order to obtain a discretization of the representation on a grid of size 10^3. We build the training set by generating 2·10^5 clean signals. During training, new noise realizations are added at each epoch. The training loss is minimized using the Adam optimizer [24] with a starting learning rate of 3·10^{-4}.
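One example of the data-generation process of Section 3.1 can be sketched as follows. This is a simplified reading of the four steps: the handling of wraparound at the interval boundary and the complex-noise convention are our assumptions.

```python
import numpy as np

def generate_example(N=50, rng=None):
    """One training example following the steps in Section 3.1 (simplified sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    m = int(rng.integers(1, 11))                               # step 1: m uniform in {1, ..., 10}
    gaps = 1.0 / N + np.abs(rng.normal(0.0, 2.5 / N, size=m))  # step 2: separation 1/N + |w|
    freqs = np.sort((rng.uniform() + np.cumsum(gaps)) % 1.0)   # wraparound handling is ours
    theta = rng.uniform(0.0, 2.0 * np.pi, size=m)              # step 3: a_j = (0.1 + |w_j|) e^{i theta_j}
    amps = (0.1 + np.abs(rng.standard_normal(m))) * np.exp(1j * theta)
    clean = np.exp(2j * np.pi * np.outer(np.arange(1, N + 1), freqs)) @ amps
    sigma = rng.uniform()                                      # step 4: scale noise so ||z||/||S|| = sigma
    z = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    z *= sigma * np.linalg.norm(clean) / np.linalg.norm(z)
    return freqs, clean + z

freqs, y = generate_example(rng=np.random.default_rng(0))
```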
The same training procedure is used to train the PSnet network.

We evaluate the different methods on a test set where the clean signal samples follow the model in Section 3.1. For each noise level, we generate 10^3 signals, which are different from the ones in the training set. We assume that the true number of sinusoidal components m is known. The frequency estimates f̂_1, ..., f̂_m are obtained by locating the highest m maxima of the frequency representations constructed by the different methods from the noisy data. The representations are evaluated on a fine grid with 10^3 points. The accuracy of the estimate is measured by computing the false negative rate (FNR). The FNR is defined as the fraction of true frequencies that are undetected, meaning that there is no estimated frequency within a radius of (2N)^{-1} (recall that the minimum separation is 1/N).

Figure 5 shows the results. The DeepFreq frequency-representation module outperforms all other methods at low-to-middle SNRs, and is only matched by CBLasso at high SNRs. In particular, it outperforms the PSnet by between 4% and 7% over the whole range of noise levels. It is worth noting that CBLasso is extremely slow: its average running time is 1.71 seconds. The DeepFreq module is two orders of magnitude faster (42 milliseconds)3.

3.3 Frequency counting

In this section we report the performance of the DeepFreq frequency-counting module. To the best of our knowledge, the only existing techniques to estimate the number of sinusoidal components rely on an eigendecomposition of the sample covariance matrix of the data. We compare to three of the most popular examples: AIC [44], MDL [45] and SORTE [20].

The architecture of the module is convolutional with a final fully-connected layer, as described in Section 2.3. The initial layer contains 16 filters of size 25 with a stride of 5, which downsample the input into feature vectors of length 200.
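The FNR metric above has a short direct implementation; treating frequencies as points on the unit circle (wraparound distance) is our convention.

```python
import numpy as np

def false_negative_rate(true_f, est_f, N):
    """Fraction of true frequencies with no estimate within a radius of 1/(2N)."""
    d = np.abs(np.asarray(true_f)[:, None] - np.asarray(est_f)[None, :])
    d = np.minimum(d, 1.0 - d)                     # wraparound distance on the unit interval
    return float(np.mean(d.min(axis=1) > 1.0 / (2 * N)))
```

For example, with N = 50, a true frequency at 0.3 whose nearest estimate lies 0.199 away counts as undetected, while one at 0.1 with an estimate at 0.101 counts as detected.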
We set the number of subsequent convolutional layers to 20, each containing 16 filters of size 3. We generate training data by feeding the training data described in Section 3.2 through a DeepFreq frequency-representation module with fixed, calibrated weights. The training loss is minimized using the Adam optimizer [24]. Figure 6 shows the fraction of signals in the test set for which the number of components is not estimated correctly for different methodologies (the test data is generated as in Section 3.2). The DeepFreq frequency-counting module clearly outperforms the eigendecomposition-based methods except at very high signal-to-noise ratios. A natural question to ask is how DeepFreq compares to a model using our counting module combined with the PSnet. To investigate this, we train the proposed frequency-counting module using the representation produced by PSnet. As shown in Figure 6, replacing the DeepFreq representation by the PSnet representation results in a significant decrease in performance. This suggests that the performance of the counting module is highly dependent on the quality of the frequency representation provided as input.

3 Running times are measured on an Intel Core i5-6300HQ CPU.

Figure 7: Frequency-estimation performance of DeepFreq compared to other methodologies. Standard error bars for the DeepFreq method are shown in Section 2 of the Supplementary material. The experiment is described in Section 3.4.

3.4 Frequency estimation

In this section we evaluate the frequency-estimation performance of DeepFreq in a realistic setting where both the noise level and the number of sinusoidal components are unknown. The DeepFreq modules are calibrated separately, as described in Sections 3.2 and 3.3. Training takes 11 hours on an NVIDIA P40. The test data are generated as described in Section 3.2.
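The accuracy metric used in this section, the Chamfer distance of equation 5, also admits a short implementation (illustrative; it operates on plain arrays of frequencies and does not use wraparound):

```python
import numpy as np

def chamfer(f, f_hat):
    """Chamfer distance between true and estimated frequency sets (equation 5)."""
    D = np.abs(np.asarray(f)[:, None] - np.asarray(f_hat)[None, :])
    # each true frequency is matched to its nearest estimate, and vice versa
    return float(D.min(axis=1).sum() + D.min(axis=0).sum())
```

Note that the metric is symmetric in spirit but penalizes both missed frequencies and spurious estimates, which makes it suitable when the estimated count m̂ differs from m.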
We compare our approach to an eigendecomposition-based procedure that combines MUSIC with AIC or MDL, the CBLasso (where frequencies are selected from the dual solution using a threshold calibrated with a validation dataset), and to a model combining the PSnet with our proposed frequency-counting module. We measure estimation accuracy by computing the Chamfer distance [6] between the m true frequencies f := (f_1, ..., f_m) and the m̂ estimates f̂ := (f̂_1, ..., f̂_m̂):

d(f, f̂) = Σ_{f_i ∈ f} min_{f̂_j ∈ f̂} |f_i − f̂_j| + Σ_{f̂_j ∈ f̂} min_{f_i ∈ f} |f̂_j − f_i|.  (5)

Figure 7 shows the results. DeepFreq clearly outperforms the other methods over the whole range of noise levels.

4 Conclusion and future work

In this paper, we introduce a machine-learning framework for frequency estimation, which combines two neural-network modules calibrated with simulated data. The approach achieves state-of-the-art performance, is fully automatic, and can operate at varying (and unknown) signal-to-noise ratios. Our framework can be extended to other signal and noise models by modifying the training dataset accordingly. Our results illustrate an incipient shift of paradigm in modern signal processing, from model-based methods towards learning-based techniques. An interesting direction for future research is to design learning-based models capable of generating frequency representations that can be interpreted probabilistically in terms of the uncertainty of the estimate.

Acknowledgements

C.F. was supported by NSF award DMS-1616340.

References

[1] ADAVANNE, S., POLITIS, A., AND VIRTANEN, T.
Direction of arrival estimation for multiple sound sources using convolutional recurrent neural network. arXiv preprint arXiv:1710.10059 (2017).

[2] BEATTY, L. G., GEORGE, J. D., AND ROBINSON, A. Z. Use of the complex exponential expansion as a signal representation for underwater acoustic calibration. The Journal of the Acoustical Society of America 63, 6 (1978), 1782–1794.

[3] BERNI, A. J. Target identification by natural resonance estimation. IEEE Trans. on Aerospace and Electronic Systems, 2 (1975), 147–154.

[4] BIENVENU, G. Influence of the spatial coherence of the background noise on high resolution passive methods. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (1979), vol. 4, pp. 306–309.

[5] BORCEA, L., PAPANICOLAOU, G., TSOGKA, C., AND BERRYMAN, J. Imaging and time reversal in random media. Inverse Problems 18, 5 (2002), 1247.

[6] BORGEFORS, G. Distance transformations in arbitrary dimensions. Computer Vision, Graphics, and Image Processing 27, 3 (1984), 321–345.

[7] BOYD, N., JONAS, E., BABCOCK, H. P., AND RECHT, B. DeepLoco: Fast 3D localization microscopy using neural networks. BioRxiv (2018), 267096.

[8] BOYER, C., DE CASTRO, Y., AND SALMON, J. Adapting to unknown noise level in sparse deconvolution. Information and Inference: A Journal of the IMA 6, 3 (2017), 310–348.

[9] CANDÈS, E. J., AND FERNANDEZ-GRANDA, C. Super-resolution from noisy data. Journal of Fourier Analysis and Applications 19, 6 (2013), 1229–1254.

[10] CANDÈS, E. J., AND FERNANDEZ-GRANDA, C. Towards a mathematical theory of super-resolution. Communications on Pure and Applied Mathematics 67, 6 (2014), 906–956.

[11] CARRIERE, R., AND MOSES, R. L. High resolution radar target modeling using a modified Prony estimator. IEEE Trans. on Antennas and Propagation 40, 1 (1992), 13–18.

[12] CHAKRABARTY, S., AND HABETS, E. A.
Broadband DOA estimation using convolutional neural networks\ntrained with noise signals. In Applications of Signal Processing to Audio and Acoustics (WASPAA) (2017),\nIEEE, pp. 136\u2013140.\n\n[13] DUVAL, V., AND PEYR\u00c9, G. Exact support recovery for sparse spikes deconvolution. Foundations of\n\nComputational Mathematics (2015), 1\u201341.\n\n[14] FERNANDEZ-GRANDA, C. Support detection in super-resolution. In Proceedings of the 10th International\n\nConference on Sampling Theory and Applications (SampTA 2013) (2013), pp. 145\u2013148.\n\n[15] FERNANDEZ-GRANDA, C. Super-resolution of point sources via convex programming. Information and\n\nInference 5, 3 (2016), 251\u2013303.\n\n[16] FERNANDEZ-GRANDA, C., TANG, G., WANG, X., AND ZHENG, L. Demixing sines and spikes: Robust\n\nspectral super-resolution in the presence of outliers. Information and Inference 7, 1 (2017), 105\u2013168.\n\n[17] HAN, K., AND NEHORAI, A. Improved source number detection and direction estimation with nested\n\narrays and ulas using jackkni\ufb01ng. IEEE Transactions on Signal Processing 61, 23 (2013), 6118\u20136128.\n\n[18] HARRIS, F. On the use of windows for harmonic analysis with the discrete Fourier transform. Proceedings\n\nof the IEEE 66, 1 (1978), 51 \u2013 83.\n\n[19] HE, H., XIN, B., IKEHATA, S., AND WIPF, D. From Bayesian sparsity to gated recurrent nets. In\n\nAdvances in Neural Information Processing Systems (2017), pp. 5554\u20135564.\n\n[20] HE, Z., CICHOCKI, A., XIE, S., AND CHOI, K. Detecting the number of clusters in n-way probabilistic\nclustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 32, 11 (2010), 2006\u20132021.\n\n[21] HUA, Y., AND SARKAR, T. Matrix pencil method for estimating parameters of exponentially\ndamped/undamped sinusoids in noise. IEEE Trans. Acoust., Speech, Signal Process. 38, 5 (May 1990),\n814\u2013824.\n\n9\n\n\f[22] IOFFE, S., AND SZEGEDY, C. 
Batch normalization: Accelerating deep network training by reducing\n\ninternal covariate shift. arXiv preprint arXiv:1502.03167 (2015).\n\n[23] IZACARD, G., BERNSTEIN, B., AND FERNANDEZ-GRANDA, C. A learning-based framework for\n\nline-spectra super-resolution. CoRR abs/1811.05844 (2018).\n\n[24] KINGMA, D. P., AND BA, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980\n\n(2014).\n\n[25] KOTELNIKOV, V. A. On the carrying capacity of the \"ether\" and wire in telecommunications. In Material\nfor the First All-Union Conference on Questions of Communication (Russian), Izd. Red. Upr. Svyzai RKKA,\nMoscow, 1933 (1933).\n\n[26] KRIM, H., AND VIBERG, M. Two decades of array signal processing research: the parametric approach.\n\nIEEE signal processing magazine 13, 4 (1996), 67\u201394.\n\n[27] LEONOWICZ, Z., LOBOS, T., AND REZMER, J. Advanced spectrum estimation methods for signal\n\nanalysis in power electronics. IEEE Trans. on Industrial Electronics 50, 3 (2003), 514\u2013519.\n\n[28] LIAO, W., AND FANNJIANG, A. Music for single-snapshot spectral estimation: Stability and super-\n\nresolution. Applied and Computational Harmonic Analysis 40, 1 (2016), 33\u201367.\n\n[29] MOITRA, A. Super-resolution, extremal functions and the condition number of Vandermonde matrices. In\n\nProceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC) (2015).\n\n[30] NYQUIST, H. Certain topics in telegraph transmission theory. Trans. of the American Institute of Electrical\n\nEngineers 47, 2 (1928), 617\u2013644.\n\n[31] OPPENHEIM, A., WILLSKY, A., AND NAWAB, S. Signals and Systems. Prentice-Hall signal processing\n\nseries. Prentice Hall, 1997.\n\n[32] PRONY, B. G. R. D. Essai \u00e9xperimental et analytique: sur les lois de la dilatabilit\u00e9 de \ufb02uides \u00e9lastique et\nsur celles de la force expansive de la vapeur de l\u2019alkool, \u00e0 diff\u00e9rentes temp\u00e9ratures. 
Journal de l\u2019\u00c9cole\nPolytechnique 1, 22 (1795), 24\u201376.\n\n[33] ROY, R., AND KAILATH, T. ESPRIT- estimation of signal parameters via rotational invariance techniques.\n\nIEEE Trans. on Acoustics, Speech and Signal Processing 37, 7 (1989), 984 \u2013995.\n\n[34] SCHMIDT, R. Multiple emitter location and signal parameter estimation. IEEE Trans. on Antennas and\n\nPropagation 34, 3 (1986), 276 \u2013 280.\n\n[35] SHANNON, C. E. Communication in the presence of noise. Proceedings of the IRE 37, 1 (1949), 10\u201321.\n\n[36] SLEPIAN, D. Prolate spheroidal wave functions, Fourier analysis, and uncertainty. V - The discrete case.\n\nBell System Technical Journal 57 (1978), 1371\u20131430.\n\n[37] SMITH, J. O. Introduction to digital \ufb01lters: with audio applications, vol. 2. Julius Smith, 2008.\n\n[38] STOICA, P. List of references on spectral line analysis. Signal Processing 31, 3 (1993), 329\u2013340.\n\n[39] STOICA, P., AND MOSES, R. L. Spectral analysis of signals, 1 ed. Prentice Hall, Upper Saddle River,\n\nNew Jersey, 2005.\n\n[40] TANG, G., BHASKAR, B., AND RECHT, B. Near minimax line spectral estimation. Information Theory,\n\nIEEE Trans. on 61, 1 (Jan 2015), 499\u2013512.\n\n[41] TANG, G., BHASKAR, B. N., SHAH, P., AND RECHT, B. Compressed sensing off the grid. IEEE Trans.\n\non Information Theory 59, 11 (2013), 7465\u20137490.\n\n[42] VETTERLI, M., MARZILIANO, P., AND BLU, T. Sampling signals with \ufb01nite rate of innovation. IEEE\n\nTrans. on Signal Processing 50, 6 (2002), 1417\u20131428.\n\n[43] VITI, V., PETRUCCI, C., AND BARONE, P. Prony methods in NMR spectroscopy. International Journal\n\nof Imaging Systems and Technology 8, 6 (1997), 565\u2013571.\n\n[44] WAX, M., AND KAILATH, T. Detection of signals by information theoretic criteria. IEEE Transactions on\n\nAcoustics, Speech, and Signal Processing 33, 2 (1985), 387\u2013392.\n\n[45] WAX, M., AND ZISKIND, I. Detection of the number of coherent signals by the mdl principle. 
IEEE\n\nTransactions on Acoustics, Speech, and Signal Processing 37, 8 (1989), 1190\u20131196.\n\n10\n\n\f[46] XIAO, X., ZHAO, S., ZHONG, X., JONES, D. L., CHNG, E. S., AND LI, H. A learning-based approach to\ndirection of arrival estimation in noisy and reverberant environments. In Proceedings of the International\nConference on Acoustics, Speech and Signal Processing (2015), pp. 2814\u20132818.\n\n[47] XIN, B., WANG, Y., GAO, W., WIPF, D., AND WANG, B. Maximal sparsity with deep networks? In\n\nAdvances in Neural Information Processing Systems (2016), pp. 4340\u20134348.\n\n11\n\n\f", "award": [], "sourceid": 2808, "authors": [{"given_name": "Gautier", "family_name": "Izacard", "institution": "Ecole Polytechnique"}, {"given_name": "Sreyas", "family_name": "Mohan", "institution": "NYU"}, {"given_name": "Carlos", "family_name": "Fernandez-Granda", "institution": "NYU"}]}