Submitted by
Assigned_Reviewer_3
Q1: Comments to author(s).
First provide a summary of the paper, and then address the following
criteria: Quality, clarity, originality and significance. (For detailed
reviewing guidelines, see
http://nips.cc/PaperInformation/ReviewerInstructions)
Multivariate data streams are almost always
nonstationary, so methods for tracking and adapting to the local
statistics of the time series are required. This paper develops and
explores a model for locally adaptive Bayesian approaches, motivated
through the definition of a multivariate stochastic process. The authors
show that such an approach goes some way toward avoiding computational
overload and that it offers performance matching state-of-the-art
alternatives.
The paper is
well-written, with a clear explanation of the approach. I was able to
follow and re-derive the core expressions in the paper, though often this
required re-reading other material in the field.
Fig 4 needs more
explanation, and some significance testing. "Comparing boxplots in (a)
with those in (b) we can see that our model allows to obtain improvements
also in terms of prediction." This is very difficult to see and would seem
to be statistically insignificant. Please comment or refine the
description.

Q2: Please summarize your review in 1-2 sentences
An interesting paper on an important topic. The
theoretical developments are of broad interest, outside the finance domain
as well as within it. The choice of results presentation was disappointing
after a well-written modeling and development section.

Submitted by
Assigned_Reviewer_5
Q1: Comments to author(s).
Summary: The paper proposes a multivariate stochastic
process for modeling time series which incorporates locally varying
smoothness in the mean and in the covariance matrix. The process uses
latent dictionary functions with nested Gaussian process priors; the
dictionary functions are linearly related to the observations through a
sparse mapping. The authors outline MCMC and online algorithms for
approximate Bayesian inference and assess performance using simulation
and processing of financial data.
Quality: The paper extends the
application of the nested Gaussian process priors in [23] to the
multivariate case and employs them for both the mean and covariance. This
constitutes a sensible extension, and the authors develop an effective
inference algorithm.
The authors outline an online algorithm and
suggest that it could prove beneficial for high-frequency data. While this
is certainly attractive, the paper lacks any clear characterization of the
computational overhead (or complexity). Nor is there any indication of the
reduction in accuracy that is induced by executing the online method as
opposed to rerunning the full posterior computation. As such it is
difficult for the reader to gain a sense of the settings in which the
online approach might prove useful and appropriate.
The authors do
not provide compelling evidence that the extended model proposed in this
paper is important in a practical setting. The simulation examples are
very toy cases tailored to the technique, so that one would be
disappointed if the proposed strategy did not provide improved results.
The studies certainly hint at a setting where the local smoothness in the
model could prove beneficial, but they do not model a practical
measurement setting.
The paper provides an analysis of correlation
estimation for national stock indices, with a qualitative analysis of the
derived correlation results. Unfortunately, there is limited discussion of
the choice of hyperparameters for the inference and as a result it is
difficult to determine whether the comparison between LBCR and BCR in
Figure 3 is meaningful. The BCR graph has the appearance of a correlation
estimate that is constructed using much greater smoothing than the LBCR
method. It is not clear that LBCR is using less smoothing where necessary
and similar levels of smoothing where appropriate; it just appears to be a
much less smooth estimate of the correlation throughout the entire
time-series. I wonder whether BCR might achieve a more similar result with
different parameter choices.
Since there is no ground truth, it is
impossible to know whether the generally increased level of correlation is
real or an artifact of the analysis. One cannot know if the oscillations
in the first portion of the data (2004-2006) are a true reflection of
significant variation in the correlations over this period or if they
simply indicate that there has been inadequate smoothing in the formation
of the estimate. Although the authors provide plausible explanations for
the changes in the LBCR estimates, it is not at all clear that these were
hypothesized before the results were obtained. If not, then one cannot
place much value in the explanations, since it is almost always possible
to find some plausible explanation for an increase or decrease in
correlation of financial time series at the national level.
The
final section of the paper suggests that the model could be employed to
obtain better predictions of the log-returns of the national stock
indices. The results here add very little of value to the paper. The
authors suggest that the boxplots in Figure 4 indicate improved prediction
performance, but this figure provides no real evidence of any meaningful
improvement. It is extremely unlikely that any analysis conducted on only
7 weeks of log-returns (at the weekly level) would be able to provide any
evidence of any improvement in estimation quality.
Clarity: The
paper is well-written and easy to follow. At times it is not completely
clear what choices have been made in algorithmic simulation and data
analysis comparisons.
Originality: The model in the paper is a
variation of the model in [22], replacing Gaussian process priors with
nested Gaussian process priors. The use of nested Gaussian process priors
is suggested in [23], where it is applied to the univariate case, for the
mean only. The paper is not significantly original.
Significance:
The introduced model is a relatively minor adaptation of an existing model
and the inference techniques are fairly straightforward adaptations of
existing methods. Although there is some novelty and the model and method
are of some interest, the paper is unlikely to have a significant impact.
Q2: Please summarize your review in 1-2
sentences
The paper is well-written and introduces a novel
multivariate model and inference approach, but the innovation is
relatively minor. The authors do not provide convincing evidence that the
model can provide a meaningful improvement over existing techniques in a
practical setting.
Submitted by
Assigned_Reviewer_6
Q1: Comments to author(s).
The paper proposes a Bayesian method for modeling
multivariate (continuous) time series along with a sampling algorithm. The
method and algorithm improve existing methods by (1) offering a prior that
is locally adaptive to varying smoothness (since existing methods are
demonstrated to "under-smooth during periods of stability and over-smooth
during periods of sharp changes") and (2) allowing efficient inference due
to its formulation using a stochastic differential equation that includes
dependence only up to fixed derivative orders (estimates for which, along
with the instantaneous mean process A, may then be used as state). The
method is examined in both simulation studies, which demonstrate the
ability to capture locally varying smoothness compared to other methods,
and in an application to stock market indices, which shows both the
importance of the modeling regime and the effectiveness of the proposed
method.
The paper is well-written and the subject is of
significant interest to the NIPS community. However, it seems some aspects
of the method could be better explained. The model limitation that makes
inference tractable (limiting the derivative dependencies, if I understand
correctly, roughly corresponds to the bandwidth truncation used in other
methods) could be better highlighted along with the way in which it limits
the long-range (and unbounded derivative order) dependencies that general
GP modeling allows. Some notation, like the dictionary truncation levels
L* and K*, could be more clearly defined, though space constraints are
certainly active. (And around lines 106-107, should it read l=1,...,L and
k=1,...,K ?) The algorithmic complexity measures could be stated a bit
more carefully: getting all means and the full covariance in an
unstructured GP may be O(T^3) and require matrix inversion, and that
should be pointed out, but many of the competing methods cited (e.g. GPs
with truncated bandwidth) scale like O(T). Despite those quibbles, the
method seems explained clearly enough to be implemented.
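The cost contrast this paragraph draws can be made concrete. Below is a minimal sketch, not taken from the paper: the random-walk prior, kernel, and noise levels are illustrative assumptions. It shows that the posterior mean of an unstructured GP requires solving a T x T linear system, O(T^3), while the same random-walk prior written as a state-space model is handled by a single O(T) Kalman-filter pass.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
y = np.cumsum(rng.normal(size=T)) + rng.normal(scale=0.5, size=T)

# (a) Unstructured GP: the posterior mean needs a T x T solve,
#     O(T^3) time and O(T^2) memory.
t = np.arange(1, T + 1, dtype=float)
K = np.minimum.outer(t, t)                 # random-walk (Wiener) covariance
gp_mean = K @ np.linalg.solve(K + 0.25 * np.eye(T), y)

# (b) The same random-walk prior as a state-space model: one forward
#     Kalman-filter pass, O(T).  (An O(T) backward smoothing pass would
#     recover the two-sided GP posterior mean; only filtering is shown.)
m, P = 0.0, 0.0                            # state mean and variance at time 0
q, r = 1.0, 0.25                           # state / observation noise variances
filt = np.empty(T)
for i in range(T):
    P += q                                 # predict: random-walk transition
    gain = P / (P + r)                     # Kalman gain for observation y[i]
    m += gain * (y[i] - m)
    P *= 1.0 - gain
    filt[i] = m
```

The filtered estimate is one-sided; adding the smoothing pass (still O(T)) would match `gp_mean` exactly for this model.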
Overall,
the paper provides a thorough treatment (up to space constraints) of an
interesting new modeling idea that is very relevant to NIPS.

Q2: Please summarize your review in 1-2
sentences
This paper provides a pretty thorough treatment of an
interesting new time series modeling method that is of significant
interest to the NIPS community.
Q1: Author
rebuttal: Please respond to any concerns raised in the reviews. There are
no constraints on how you want to argue your case, except for the fact
that your text should be limited to a maximum of 6000 characters. Note
however that reviewers and area chairs are very busy and may not read long
vague rebuttals. It is in your own interest to be concise and to the
point.
Novelty & Significance:
Our proposed
approach is novel and significant in being the first coupled
mean-covariance process, which allows locally varying smoothness. Although
our work builds on the (unpublished) Fox and Dunson (2011) formulation,
their approach assumes a single level of smoothness over time, and hence
is substantially less flexible than ours. We accomplish this flexibility
by using nested Gaussian processes, which have only been considered
previously by Zhu and Dunson (2012) in an unpublished arXiv manuscript
focused on single function estimation. We additionally develop efficient
computational algorithms, reducing the O(T^3) bottleneck of GPs to O(T)
and obtaining an accurate online algorithm as an alternative to MCMC.
Our methodology can be broadly used and improve on the state of the art in
multivariate time series settings and beyond.
Technical Comments

1] Online: Our algorithm is not fully online: the time-varying dictionary
functions are updated online, while the time-stationary parameters are
fixed. As T increases the posterior for the time-stationary parameters
rapidly becomes concentrated, so it is reasonable to fix these parameters
at point estimates while dynamically updating the dictionary functions. We
validate this algorithm in simulation studies, showing that our online
approximation to the posterior yields accurate results and predictions.
The online algorithm also efficiently exploits the advantages of the
state-space formulation for the dictionary functions: the required matrix
computations depend only on the length H of the additional sequence and on
the number k of the most recent observations used to initialize the
algorithm, O((T+H)-(T-k)) = O(H+k), a massive reduction.
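To illustrate the O(H+k) claim, here is a hypothetical sketch using a generic random-walk Kalman filter (not the paper's actual nGP dictionary updates; `kalman_step`, `online_forecast`, and the noise variances are assumptions for illustration): restarting from a saved filter state, only the k newest observations and H forecast steps are processed, independent of the history length T.

```python
def kalman_step(m, P, y=None, q=1.0, r=0.25):
    """One step of a random-walk state-space model: predict, then
    (optionally) update with an observation."""
    P = P + q                        # predict through the random-walk transition
    if y is not None:
        gain = P / (P + r)           # update with observation y
        m = m + gain * (y - m)
        P = P * (1.0 - gain)
    return m, P

def online_forecast(m_saved, P_saved, last_k_obs, H):
    """Restart from the filter state saved at time T-k, process the k
    newest observations, then take H pure prediction steps.
    Total work is O(k + H), independent of the history length T."""
    m, P = m_saved, P_saved
    for y in last_k_obs:             # k filtering steps
        m, P = kalman_step(m, P, y)
    forecasts = []
    for _ in range(H):               # H forecast steps (no observations)
        m, P = kalman_step(m, P)
        forecasts.append((m, P))
    return forecasts
```

For example, `online_forecast(0.0, 1.0, [0.2, -0.1, 0.4], H=3)` does 3 + 3 = 6 constant-cost steps and returns three (mean, variance) pairs, with forecast variance growing by q per unobserved step.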
2] Simulation Study:
The simulated datasets are not tailored to our model, since the dictionary
functions are time-varying functions adapted from Donoho and Johnstone
(1994) rather than being generated from the nGP. Moreover, the structure
of the underlying mean and covariance processes has also been chosen to
mimic possible behavior in practical settings. The “bumps” in the
covariance functions from the simulated dataset are also found in the
estimated volatility processes in Figure 2 using real data. Similar bumps
are observed in protein mass spectrometry, influenza levels at different
locations over time, and electricity load trends (see e.g. Ba et al.,
2012).

3] Application: There is a rich theory literature
supporting usual GPs having careful hyperpriors on the covariance
parameters, including posterior consistency (Ghosal and Roy, 2006) and
minimax optimal rates of posterior concentration (van der Vaart and van
Zanten, 2008). However, this theory assumes that the function is in a
smooth class, with a single smoothness level at all locations. If the
smoothness varies locally, GPs with a stationary covariance will yield
badly sub-optimal rates. Our simulations for the Fox and Dunson (2011)
approach illustrate how this sub-optimality can lead to poor
performance in applications. We observe that the posteriors for
parameters characterizing the nGP dictionary functions concentrate on
values consistent with varying smoothness, even when priors for these
parameters are noninformative. We learn important new aspects of the data,
which were not previously apparent and do not show up with alternative
analysis methods. In particular, the change of regime, as well as the most
evident increase in correlation, occurs in correspondence with financial
events of major worldwide importance rather than national events. This
is consistent with the “international contagion effect” theory of
financial markets (Baig and Goldfajn, 1999; Claessens and Forbes, 2009)
and also with recent applications of stochastic volatility models to
exchange rates (see e.g. Kastner et al., 2013).

4] Prediction:
Prediction with the unconditional mean in a] seems to lead to over-predicted
values while our approach b] seems at least to provide median-unbiased
predictions. The combination of our approach and the use of conditional
distributions of one return given the others c] further improves
predictions, also reducing the variability of the predictive distribution.
We additionally obtain well-calibrated predictive intervals, unlike
competing methods.
REFERENCES
- Ba, A., Goude, Y.,
Sinn, M., & Pompey, P. (2012). “Adaptive Learning of Smoothing
Functions: Application to Electricity Load Forecasting.” In NIPS (Neural
Information Processing Systems).
- Baig, T., & Goldfajn, I.
(1999). “Financial Market Contagion in the Asian Crisis.” Staff Papers,
International Monetary Fund, 46, 167-195.
- Claessens, S., &
Forbes, K. (2009). International Financial Contagion, An overview of the
Issues. Springer.
- Donoho, D.L., & Johnstone, J.M. (1994).
“Ideal spatial adaptation by wavelet shrinkage.” Biometrika, 81, 425-455.
- Fox, E., & Dunson, D.B. (2011). “Bayesian Nonparametric
Covariance Regression.” arXiv:1101.2017.
- Ghosal, S., & Roy,
A. (2006). “Posterior consistency of Gaussian process prior for
nonparametric binary regression.” The Annals of Statistics, 34, 2413-2429.
- Kastner, G., Frühwirth-Schnatter, S., & Lopes, H.F. (2013).
“Efficient Bayesian inference for multivariate factor stochastic
volatility models”. In BAYSM2013 (Bayesian Young Statistician Meeting).
- Van der Vaart, A.W., & Van Zanten, J.H. (2008). “Rates of
contraction of posterior distributions based on Gaussian process priors.”
The Annals of Statistics, 36, 1435-1463.
- Zhu, B., & Dunson,
D.B. (2012). “Locally Adaptive Bayes Nonparametric Regression via Nested
Gaussian Processes.” arXiv:1201.4403.