{"title": "Cross-sectional Learning of Extremal Dependence among Financial Assets", "book": "Advances in Neural Information Processing Systems", "page_first": 3857, "page_last": 3867, "abstract": "We propose a novel probabilistic model to facilitate learning the multivariate tail dependence of multiple financial assets. Our method constructs, from known random vectors (e.g., standard normal), sophisticated jointly heavy-tailed random vectors featuring not only distinct marginal tail heaviness but also a flexible tail dependence structure. The novelty is that the pairwise tail dependence between any two dimensions is modeled separately from their correlation and is governed by its own parameter rather than by the correlation parameter, an essential advantage over many commonly used methods such as the multivariate $t$ or elliptical distributions. Compared with the copula approach, the model is also intuitive to interpret, tractable, and simple to sample from. We demonstrate its flexible tail dependence structure through simulation. Coupled with a GARCH model that removes the serial dependence of each individual asset return series, we use this method to model and forecast the multivariate conditional distribution of stock returns, and obtain notable performance improvements in multi-dimensional coverage tests. 
Finally, our empirical finding on the tail asymmetry of both the idiosyncratic component and the market component is interesting and deserves further study.", "full_text": "Cross-sectional Learning of Extremal Dependence among Financial Assets

Xing Yan
School of Data Science
City University of Hong Kong
yanxing128@gmail.com

Qi Wu∗
School of Data Science
City University of Hong Kong
qiwu55@cityu.edu.hk

Wen Zhang
JD Digits
zhangwen.jd@gmail.com

Abstract

We propose a novel probabilistic model to facilitate learning the multivariate tail dependence of multiple financial assets. Our method constructs, from known random vectors (e.g., standard normal), sophisticated jointly heavy-tailed random vectors featuring not only distinct marginal tail heaviness but also a flexible tail dependence structure. The novelty is that the pairwise tail dependence between any two dimensions is modeled separately from their correlation and is governed by its own parameter rather than by the correlation parameter, an essential advantage over many commonly used methods such as the multivariate t or elliptical distributions. Compared with the copula approach, the model is also intuitive to interpret, tractable, and simple to sample from. We demonstrate its flexible tail dependence structure through simulation. Coupled with a GARCH model that removes the serial dependence of each individual asset return series, we use this method to model and forecast the multivariate conditional distribution of stock returns, and obtain notable performance improvements in multi-dimensional coverage tests. 
Finally, our empirical finding on the tail asymmetry of both the idiosyncratic component and the market component is interesting and deserves further study.

1 Introduction

Extreme market movements and rare events, which we call tail risk, play a very important role in portfolio investment, institutional risk management, and financial regulation. For a single asset, a large body of literature concludes that asset returns follow non-normal distributions with significant heavy tails. A further complication is that multiple asset returns often exhibit non-negligible tail dependence, which implies a higher chance of extreme co-movements than in the jointly normal case or the independent case. Although much effort has been devoted to univariate heavy-tail modeling, there is a great need for models designed specifically for multivariate tail dependence.

To measure tail dependence, the most common definition of the down-tail dependence coefficient of two random variables $X$ and $Y$ is (see [Frahm et al., 2005]):

$$\lambda^D_{X,Y} = \lim_{\tau \to 0^+} \frac{P\{X < Q_X(\tau),\ Y < Q_Y(\tau)\}}{\tau}, \tag{1}$$

where $Q_X(\tau)$ and $Q_Y(\tau)$ are the $\tau$-quantiles of the marginal distributions of $X$ and $Y$, respectively. Similarly, the up-tail dependence coefficient is defined as $\lambda^U_{X,Y} = \lim_{\tau \to 1^-} P\{X > Q_X(\tau),\ Y > Q_Y(\tau)\}/(1 - \tau)$. The tail dependence coefficient measures the degree of extremal, rather than typical, co-movement between two random variables. It goes beyond the usual correlation, which measures average dependence. 
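Because Equation (1) is a limit, in practice one evaluates its finite-$\tau$ version with the probability and both quantiles replaced by sample estimates. The NumPy sketch below is our own illustration, not code from the paper: the helper `proxy_down_tail`, the correlation $\rho = 0.5$, the $t$ degrees of freedom $\nu = 3$, the sample size, and the seed are all assumptions. It compares a bivariate normal pair with a bivariate $t$ pair that has the same correlation matrix.

```python
import numpy as np

def proxy_down_tail(x, y, tau):
    # Finite-tau version of Equation (1): P{X < Q_X(tau), Y < Q_Y(tau)} / tau,
    # with the probability and both quantiles replaced by sample estimates.
    qx, qy = np.quantile(x, tau), np.quantile(y, tau)
    return np.mean((x < qx) & (y < qy)) / tau

rng = np.random.default_rng(0)
n, rho, nu = 400_000, 0.5, 3          # illustrative choices, not from the paper
cov = np.array([[1.0, rho], [rho, 1.0]])

# Bivariate normal with correlation rho.
normal = rng.multivariate_normal(np.zeros(2), cov, size=n)

# Bivariate t with nu degrees of freedom and the same correlation matrix:
# a normal vector divided by an independent sqrt(chi-square_nu / nu).
scale = np.sqrt(rng.chisquare(nu, size=n) / nu)
student = rng.multivariate_normal(np.zeros(2), cov, size=n) / scale[:, None]

for tau in (0.05, 0.01, 0.002):
    lam_normal = proxy_down_tail(normal[:, 0], normal[:, 1], tau)
    lam_t = proxy_down_tail(student[:, 0], student[:, 1], tau)
    print(f'tau = {tau}: normal {lam_normal:.3f}, t({nu}) {lam_t:.3f}')
```

As $\tau$ shrinks, the normal pair's proxy should drift toward 0 while the $t$ pair's stays bounded away from 0, even though both pairs have the same correlation.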
Tail dependence modeling is needed because correlation is limited in assessing extreme dependence risk during financial crises [Poon et al., 2003]. Ignoring tail dependence can also lead to massive mispricing of credit derivatives, with potentially disastrous consequences [Peng and Kou, 2009]. Researchers have even designed a systemic risk indicator based on tail dependence that predicted crisis-period stock returns well [Balla et al., 2014], so tail dependence is of great importance to regulatory agencies. Moreover, tail dependence in multivariate data is a general problem: it appears not only in financial markets but also in energy markets [Aderounmu and Wolff, 2014], climatic data [Schoelzel and Friederichs, 2008], and hydrological data [Poulin et al., 2007].

∗Corresponding author.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Correlation also affects tail dependence, since it is an average measure. But if one can separate out the effect of correlation, should tail dependence differ across pairs of assets? Before addressing this question, let us review some existing approaches to modeling tail dependence. First, two jointly normal random variables with non-zero correlation (less than 1) have no tail dependence: both the up- and down-tail dependence coefficients are 0. The multivariate normal distribution may therefore be inappropriate for modeling financial markets. Successful attempts to alleviate this shortcoming include switching to non-Gaussian multivariate distributions, such as the multivariate t and elliptical distributions, and resorting to the copula approach to construct non-trivial dependence structures.

The multivariate t is a direct extension of the univariate t-distribution. 
Its parameter, the degrees of freedom, represents the tail heaviness of each individual dimension and also controls pairwise tail dependence. Although the tail dependence coefficient is non-zero, once the effect of correlation is removed it is the same for every pair of dimensions, and it is also identical for the up and down tails (see [Chan and Li, 2008]). The elliptical distribution is a more general family than the multivariate t and covers many known families, including the multivariate t, Laplace, power exponential, and Kotz distributions. According to [Lesniewski et al., 2016], the tail dependence coefficient of an elliptical distribution is mathematically difficult to express exactly. But, as with the multivariate t, its pairwise tail dependence coefficients vary only with the pairwise correlations; the tail dependencies have no freedom of their own.

The copula approach is, in theory, flexible enough to describe many tail dependence structures (see the introductions in [Demarta and McNeil, 2005], [Embrechts et al., 2001], and [Aas et al., 2009]). Some literature uses copulas to model the tail dependence of financial returns, such as [Jondeau and Rockinger, 2006] and [Frahm et al., 2005], but most of it deals with only two markets or two assets, owing to the mathematical complexity of copulas with three or more variables. Other approaches model the tail dependence of assets with non-parametric methods or quantile regression; see [Poon et al., 2003] and [Beine et al., 2010].

So far, no flexible yet compact tail dependence model for multiple assets exists. Joint extreme events arise not only from the joint daily fluctuations of asset prices but also from tail shocks with widespread impact (e.g., the collapse of Lehman Brothers), so we argue that it is necessary to model tail dependence separately from correlation. 
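The lack of freedom in the multivariate t can be made concrete with a standard closed form: for a bivariate t with $\nu$ degrees of freedom and correlation $\rho$, the up- and down-tail dependence coefficients both equal $2\, t_{\nu+1}\big(-\sqrt{(\nu+1)(1-\rho)/(1+\rho)}\big)$, where $t_{\nu+1}$ is the univariate t CDF. The stdlib-only Python sketch below evaluates this formula; the helper names are hypothetical, and computing the CDF by Simpson integration is our own choice to avoid a SciPy dependency, not something the paper prescribes.

```python
import math

def t_pdf(x, df):
    # Density of the univariate Student t-distribution with df degrees of freedom.
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf_nonpositive(x, df, steps=4000):
    # CDF at x <= 0 using symmetry, F(x) = 0.5 - (integral of the pdf over [x, 0]),
    # with the integral computed by composite Simpson's rule (steps must be even).
    if x == 0:
        return 0.5
    h = -x / steps
    total = t_pdf(x, df) + t_pdf(0.0, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(x + k * h, df)
    return 0.5 - total * h / 3

def t_tail_dependence(rho, nu):
    # Tail dependence coefficient of a bivariate t with correlation rho and nu degrees
    # of freedom; the up- and down-tail coefficients coincide by symmetry.
    return 2 * t_cdf_nonpositive(-math.sqrt((nu + 1) * (1 - rho) / (1 + rho)), nu + 1)

for nu in (3, 5, 10):
    print(nu, [round(t_tail_dependence(rho, nu), 3) for rho in (0.0, 0.3, 0.6, 0.9)])
```

Once $\nu$ is fixed, $\rho$ is the only input, so two pairs with equal correlation necessarily share the same tail dependence, in both tails.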
Modeling tail dependence separately is also more realistic for assets with different financial fundamentals, or from different sectors or asset classes: just as their pairwise correlations differ, their pairwise tail dependencies should differ too.

In this paper, we solve this problem by proposing a transformation of a known random vector. The resulting random vector has distinct marginal tail heaviness. More importantly, the pairwise tail dependence of any two dimensions is governed by its own parameter rather than by the correlation parameter, so tail dependence is modeled separately from correlation. We demonstrate this flexible tail dependence structure through simulation. We also propose an intuitive learning algorithm to fit the model to data. Coupling the model with a GARCH filter to describe the conditional multivariate distribution of asset returns, we evaluate it with multi-dimensional coverage tests on the conditional distribution forecasts and achieve better performance than all the competing methods.

2 Lower-triangular Tail Dependence Model

Yan et al. [2018] proposed a novel parametric quantile function to represent a univariate distribution with asymmetric heavy tails, and used it to model the time-varying conditional distribution of single asset returns. It is a monotonically increasing nonlinear transformation of the quantile function $Z_\tau$, $\tau \in (0, 1)$, of some known distribution:

$$Q(\tau \mid \mu, \sigma, u, v) = \mu + \sigma Z_\tau \left( \frac{e^{u Z_\tau}}{A} + 1 \right) \left( \frac{e^{-v Z_\tau}}{A} + 1 \right), \tag{2}$$

where $Z_\tau$ can come from the standard normal, the t-distribution, or some other distribution. The resulting parametric function $Q(\tau \mid \mu, \sigma, u, v)$ is also a quantile function: it is increasing in $\tau$, so its inverse exists and is a distribution function. The parameters $u \ge 0$ and $v \ge 0$ control its right- and left-tail heaviness, respectively. In the usual case where $Z_\tau$ is standard normal, this quantile function has an inverted-S-shaped Q-Q plot against the standard normal, so it can represent heavy-tailed distributions. In this paper, we propose a simpler form:

$$Q(\tau \mid \mu, \sigma, u, v) = \mu + \sigma Z_\tau \left( \frac{u^{Z_\tau}}{A} + \frac{v^{-Z_\tau}}{A} + 1 \right) := \mu + \sigma g(Z_\tau \mid u, v). \tag{3}$$

Now we require $u \ge 1$ and $v \ge 1$, and $g$ is introduced to simplify notation. This form produces very similar heavy-tail effects and makes the tail heaviness less sensitive to $u$ and $v$ when they become large, which is good for numerical computation and analysis.

Equivalently, we can apply the same transformation to the corresponding known random variable $z$: $y = \mu + \sigma g(z \mid u, v)$. Because $g(\cdot \mid u, v)$ is continuous and strictly increasing, the following lemma shows that the quantile function of $y$ is exactly Equation (3).

Lemma 1. Let $X$ be a continuous random variable with continuous quantile function $Q_X(\tau)$, $\tau \in (0, 1)$, and let $Y = f(X)$, where $f$ is continuous and strictly increasing. Then $Y$ has quantile function $Q_Y(\tau) = f(Q_X(\tau))$.

This inspires us to extend the univariate heavy-tailed quantile function in Equation (3) to the multivariate case by transforming random variables. Recall that i.i.d. standard normal random variables $z = [z_1, \ldots, z_n]^\top$ can compose any nondegenerate multivariate normal random vector through the linear combination $\mu + Bz$, where $\mu$ is the mean vector and $B$ can be restricted to a lower-triangular matrix with strictly positive diagonal entries. We extend this directly and propose a new random vector $y = [y_1, \ldots, y_n]^\top$ with individual heavy tails and pairwise tail dependencies, obtained by transforming the latent i.i.d. random variables $z = [z_1, \ldots, z_n]^\top$ ($z_i$ can follow the standard normal, the t-distribution, etc.):

$$
\begin{aligned}
y_1 &= \mu_1 + \sigma_{11} g(z_1 \mid u_{11}, v_{11}), \\
y_2 &= \mu_2 + \sigma_{21} g(z_1 \mid u_{21}, v_{21}) + \sigma_{22} g(z_2 \mid u_{22}, v_{22}), \\
&\;\;\vdots \\
y_n &= \mu_n + \sigma_{n1} g(z_1 \mid u_{n1}, v_{n1}) + \sigma_{n2} g(z_2 \mid u_{n2}, v_{n2}) + \cdots + \sigma_{nn} g(z_n \mid u_{nn}, v_{nn}).
\end{aligned} \tag{4}
$$

Here $\sigma_{ii} > 0$, $u_{ij} \ge 1$, and $v_{ij} \ge 1$. $A$ is a positive constant satisfying $A \ge 3$ (see [Wu and Yan, 2019]); we set $A = 4$ in this paper. In total there are $n$ location parameters $\mu_1, \ldots, \mu_n$, $(n^2+n)/2$ usual correlation/scale parameters $\sigma_{11}, \sigma_{21}, \sigma_{22}, \ldots, \sigma_{nn}$, $(n^2+n)/2$ right-tail parameters $u_{11}, u_{21}, u_{22}, \ldots, u_{nn}$, and $(n^2+n)/2$ left-tail parameters $v_{11}, v_{21}, v_{22}, \ldots, v_{nn}$, for a total of $n + 3(n^2+n)/2$ parameters.

Our transformation is analogous to $\mu + Bz$, but we add a different tail heaviness for every $z_j$ in every $y_i$: $z_j$ is replaced by $g(z_j \mid u_{ij}, v_{ij})$ in the equation for $y_i$ ($j \le i$). Because $y_1$ and $y_2$ share the latent variable $z_1$, they are correlated and also tail dependent. Intuitively, the new random vector $y$ has different marginal tail heaviness and distinct pairwise tail dependencies, which we verify below. In addition, to make the model more robust, we may sometimes want to reduce the number of parameters in Equation (4). One effective way is to force $u_{i1} = u_{11}$, $v_{i1} = v_{11}$ for all $i \ge 1$, $u_{i2} = u_{22}$, $v_{i2} = v_{22}$ for all $i \ge 2$, and so on, which reduces the total number of parameters to $3n + (n^2+n)/2$.

2.1 Pairwise Tail Dependencies

Obtaining the exact tail dependence coefficients for pairs of dimensions of our lower-triangular $y$ is challenging, so in this subsection we show qualitatively that $y$ has distinct pairwise tail dependencies. 
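The construction in Equations (3) and (4) is easy to simulate. The NumPy sketch below is a hypothetical illustration, not the authors' code: it implements $g$ with $A = 4$, samples a 3-dimensional $y$ from Equation (4) with standard normal latent variables, and computes the empirical proxy $P\{y_1 < Q_{y_1}(\tau), y_2 < Q_{y_2}(\tau)\}/\tau$ for two values of $v_{21}$. The parameters loosely follow the settings used later in this subsection ($\sigma_{ii} = 1$, $\sigma_{ij} = 0.5$, $u = 1$, $v = 1.5$; setting the off-diagonal $u_{ij}$ and $v_{ij}$ like the diagonal ones is our own assumption), while $\tau = 0.01$, the sample size, and the seed are chosen only to keep the run small. The last lines check Lemma 1 on $y_1$.

```python
import numpy as np
from statistics import NormalDist

A = 4.0  # the paper sets A = 4

def g(z, u, v):
    # Equation (3): g(z|u,v) = z * (u**z / A + v**(-z) / A + 1), with u, v >= 1.
    return z * (u ** z / A + v ** (-z) / A + 1.0)

def sample_lower_triangular(mu, sigma, U, V, size, rng):
    # Equation (4) with i.i.d. standard normal latent z; only entries with j <= i are used.
    n = len(mu)
    z = rng.standard_normal((size, n))
    y = np.tile(np.asarray(mu, dtype=float), (size, 1))
    for i in range(n):
        for j in range(i + 1):
            y[:, i] += sigma[i, j] * g(z[:, j], U[i, j], V[i, j])
    return y

def proxy_down_tail(x, y, tau):
    # Empirical lambda^D(tau) = P{x < Q_x(tau), y < Q_y(tau)} / tau.
    return np.mean((x < np.quantile(x, tau)) & (y < np.quantile(y, tau))) / tau

rng = np.random.default_rng(0)
mu = np.zeros(3)
sigma = np.array([[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.5, 0.5, 1.0]])
U = np.ones((3, 3))

results = {}
for v21 in (1.0, 3.0):
    V = np.full((3, 3), 1.5)
    V[1, 0] = v21                      # only the left-tail parameter of z_1 in y_2 changes
    y = sample_lower_triangular(mu, sigma, U, V, 200_000, rng)
    results[v21] = proxy_down_tail(y[:, 0], y[:, 1], 0.01)
    print(f'v21 = {v21}: proxy lambda_12(0.01) = {results[v21]:.3f}')

# Lemma 1 check on y_1: its empirical tau-quantile should be close to
# mu_1 + sigma_11 * g(Phi^{-1}(tau) | u_11, v_11), i.e., Equation (3).
tau = 0.05
theory = mu[0] + sigma[0, 0] * g(NormalDist().inv_cdf(tau), U[0, 0], V[0, 0])
empirical = np.quantile(y[:, 0], tau)
print(f'Q_y1({tau}): empirical {empirical:.3f}, Equation (3) {theory:.3f}')
```

With these settings one would expect the proxy for $v_{21} = 3$ to be clearly larger than for $v_{21} = 1$; the exact values depend on the seed and sample size.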
We reveal these dependencies by numerically approximating the tail dependence of $y_1$ & $y_2$ and of $y_2$ & $y_3$, and analyzing how it depends on the model parameters. Since the definition in Equation (1) is a limit, we approximate the down-tail dependence coefficient of $y_i$ & $y_j$ by choosing a very small $\tau$ and using $\lambda^D_{ij}(\tau) = P\{y_i < Q_{y_i}(\tau),\ y_j < Q_{y_j}(\tau)\}/\tau$ as the proxy down-tail dependence. We set $\tau = 10^{-3}$, simulate $10^7$ observations of $y_i$ and $y_j$, and calculate the empirical value of $\lambda^D_{ij}(\tau)$. The latent $z$ is standard normal in this analysis.

Figure 1: Proxy down-tail dependence (blue lines) and correlation (red lines) against model parameters, varied one at a time. The top two subplots are for the pair $y_1$ & $y_2$, varying $\sigma_{21}$ and $v_{21}$, respectively; the bottom two are for the pair $y_2$ & $y_3$, varying $\sigma_{32}$ and $v_{32}$.

We first set $\mu_i = 0$, $\sigma_{ii} = 1$, $\sigma_{ij} = 0.5$ ($i > j$), $u_{ii} = 1$, and $v_{ii} = 1.5$ for a 3-dimensional $y$, and then change one parameter at a time to examine how the proxy down-tail dependencies $\lambda^D_{12}(\tau)$ and $\lambda^D_{23}(\tau)$ change. To distinguish tail dependence from the usual correlation, we also examine how the usual correlation coefficients change. The results are shown in Figure 1. In the first subplot, as $\sigma_{21}$ varies, $\lambda^D_{12}(\tau)$ (blue line) varies in much the same way as the usual correlation of $y_1$ & $y_2$ (red line). This indicates that the variation in $\lambda^D_{12}(\tau)$ can be attributed to the varying correlation, so $\sigma_{21}$ mainly determines the usual correlation. 
In the second subplot, where $v_{21}$ varies, the proxy down-tail dependence $\lambda^D_{12}(\tau)$ changes from nearly 0 to nearly 1 while the correlation stays nearly the same. We conclude that $v_{21}$ is the dominant parameter determining the down-tail dependence of $y_1$ & $y_2$, separately from the correlation. The same analysis applies to the pair $y_2$ & $y_3$, whose results are shown in the third and fourth subplots of Figure 1: $\sigma_{32}$ determines the usual correlation and $v_{32}$ determines the down-tail dependence of $y_2$ & $y_3$.

These findings are consistent with our intuition about the tail dependence structure of $y$. By extension, $v_{ij}$ is the parameter that mainly determines the down-tail dependence of $y_i$ & $y_j$; likewise, $u_{ij}$ mainly determines their up-tail dependence. Thus, our lower-triangular tail dependence model has a much richer tail dependence structure than the commonly used multivariate normal, multivariate t, and elliptical distributions. We verify this experimentally in Section 4. Note that $v_{i1}, v_{i2}, \ldots, v_{ij}$ should all contribute to the down-tail dependence of $y_i$ & $y_j$, because they act on the latent variables $z_1, \ldots, z_j$ that the two share. We believe $v_{ij}$ is the freest and dominant one, because the others also contribute to the tail dependence of $y_i$ & $y_{j-1}$. Finally, we have so far neglected an important situation in which the correlation parameter $\sigma_{ij}$ is negative. A negative correlation leads to negative tail dependence, which, unlike the positive tail dependence we focus on in this paper, measures the dependence when one variable goes to its positive extreme and the other to its negative extreme, or vice versa; this is also common in financial markets. Fortunately, the above analysis applies equally well, and similar conclusions can be drawn.

2.2 One-factor Tail Dependence Model

Real-world data are usually high-dimensional. 
In the high-dimensional case, we usually need to simplify the model structure and reduce the number of parameters. This is also the idea behind factor analysis and principal component analysis, and in finance there are factor models for asset returns as well. In this subsection, we propose a one-factor version of our tail dependence model for relatively high-dimensional asset returns; we do not consider the asset pricing problem here. Our one-factor model is a special case of the lower-triangular model in Equation (4).

Algorithm 1: Learning the parameters of our proposed tail dependence model from data.
Input: $K$ observations $\{y_{:k}\}_{k=1}^K$, where $y_{ik}$ is the $i$-th entry of the column vector $y_{:k}$.
Parameters: the positive constant $A \ge 3$ and a fine set of probability levels $\Psi \subset (0, 1)$.
Output: all parameters $\mu_i, \sigma_{ij}, u_{ij}, v_{ij}$ ($i \ge j$) of the model.
1: for $i = 1, \ldots, n$ do
2:   for $j = 1, \ldots, i - 1$ do
3:     Solve the following equation system to get the learned $\hat\sigma_{ij}, \hat u_{ij}, \hat v_{ij}$:
       $$\frac{1}{K} \sum_{k=1}^K y_{ik} z_{jk}^l = \sigma_{ij} \frac{1}{K} \sum_{k=1}^K z_{jk}^l\, g(z_{jk} \mid u_{ij}, v_{ij}), \quad l = 1, 3, 5. \tag{6}$$
4:   end for
5:   Let $y'_{ik} = y_{ik} - \sum_{j=1}^{i-1} \hat\sigma_{ij}\, g(z_{jk} \mid \hat u_{ij}, \hat v_{ij})$, $k = 1, \ldots, K$. Solve the following quantile regression problem to get the learned $\hat\mu_i, \hat\sigma_{ii}, \hat u_{ii}, \hat v_{ii}$:
       $$\min_{\mu_i, \sigma_{ii}, u_{ii}, v_{ii}} \frac{1}{K} \sum_{k=1}^K \sum_{\tau \in \Psi} L_\tau\big(y'_{ik}, Q(\tau \mid \mu_i, \sigma_{ii}, u_{ii}, v_{ii})\big). \tag{7}$$
6:   Solve the following equation to obtain realizations of $z_i$, i.e., $z_{i1}, \ldots, z_{iK}$:
       $$y'_{ik} = \hat\mu_i + \hat\sigma_{ii}\, g(z_{ik} \mid \hat u_{ii}, \hat v_{ii}). \tag{8}$$
7: end for
8: return the learned parameters $\hat\mu_i, \hat\sigma_{ij}, \hat u_{ij}, \hat v_{ij}$ ($i \ge j$).

For a market-wide or common variable $y_M$ and $n$ single-asset or individual variables $y_1, \ldots, y_n$, we model them as

$$
\begin{aligned}
y_M &= \alpha_M + \beta_M\, g(z_M \mid u_M, v_M), \\
y_i &= \alpha_i + \beta_i\, g(z_M \mid u^M_i, v^M_i) + \gamma_i\, g(z_i \mid u_i, v_i), \quad i = 1, \ldots, n,
\end{aligned} \tag{5}
$$

where $z_M, z_1, \ldots, z_n$ are latent i.i.d. random variables, e.g., standard normal. In a financial context, $y_M$ can be a market return such as that of the S&P 500 (first filtered by a GARCH-type model, as described later), and $u_M$ and $v_M$ represent its up and down heavy tails. $z_M$ is the market factor shared by every asset return $y_i$ (also after filtering), so $y_i$ is decomposed into a market component and an idiosyncratic component. While $\beta_i$ is the average sensitivity of the $i$-th asset to the market factor, $u^M_i$ and $v^M_i$ can be seen as its tail-side sensitivities. They give every $y_i$ correlation-separated tail dependence with $y_M$, as well as with the other assets. An extreme realization of $z_M$ has an additional impact on $y_i$ that the average-type sensitivity $\beta_i$ alone cannot capture, and the tail dependence increases as $u^M_i$ or $v^M_i$ increases. The idiosyncratic component $\gamma_i g(z_i \mid u_i, v_i)$ of each asset is also heavy-tailed, with $u_i$ and $v_i$ representing the idiosyncratic heavy tails. Although we focus on financial modeling in this paper, this model can be applied to other fields too.

3 Parameter Learning

We propose a recursive learning algorithm to fit the proposed tail dependence model to data. This algorithm works for any choice of $z$. It combines quantile regression and the method of moments. 
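The two ingredients of the algorithm can be sketched in a few lines of NumPy. This is a hypothetical illustration rather than the authors' implementation, assuming standard normal latent variables: `pinball` and `multi_quantile_objective` evaluate the quantile regression objective of Equations (7) and (9); `invert_g` solves Equation (8) for the latent variable by bisection, which is valid because $g$ is strictly increasing; and `sigma_reduced` solves the $l = 1$ moment condition of Equation (6) in closed form for the reduced-parameter model, in which $\sigma_{ij}$ is the only unknown. The sketch evaluates the objective but does not minimize it; a numerical optimizer would still be needed for that step. All parameter values are made up for the demonstration.

```python
import numpy as np
from statistics import NormalDist

A = 4.0  # constant A of Equation (3), set to 4 in the paper

def g(z, u, v):
    # Equation (3): g(z|u,v) = z * (u**z / A + v**(-z) / A + 1); strictly increasing in z.
    return z * (u ** z / A + v ** (-z) / A + 1.0)

def pinball(y, q, tau):
    # Quantile regression loss L_tau(y, q) = (tau - I(y < q)) * (y - q).
    return (tau - (y < q)) * (y - q)

def multi_quantile_objective(params, y, taus, z_taus):
    # Objective of Equations (7) and (9) for standard normal z:
    # (1/K) sum_k sum_{tau in Psi} L_tau(y_k, mu + sigma * g(Z_tau | u, v)).
    mu, sigma, u, v = params
    q = mu + sigma * g(z_taus, u, v)
    return np.mean(np.sum(pinball(y[:, None], q[None, :], taus[None, :]), axis=1))

def invert_g(target, u, v, lo=-40.0, hi=40.0, iters=200):
    # Vectorized bisection for z with g(z|u,v) = target, as needed in Equation (8)
    # after subtracting mu and dividing by sigma.
    lo = np.full_like(target, lo)
    hi = np.full_like(target, hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        below = g(mid, u, v) < target
        lo = np.where(below, mid, lo)
        hi = np.where(below, hi, mid)
    return 0.5 * (lo + hi)

def sigma_reduced(y_i, z_j, u_jj, v_jj):
    # Reduced-parameter model (u_ij = u_jj, v_ij = v_jj): the l = 1 moment condition of
    # Equation (6) has the single unknown sigma_ij and is solved in closed form.
    return np.mean(y_i * z_j) / np.mean(z_j * g(z_j, u_jj, v_jj))

# Synthetic two-dimensional example with made-up 'true' parameters.
rng = np.random.default_rng(1)
K = 50_000
mu1, s11, u11, v11 = 0.1, 1.0, 1.3, 1.8
s21, s22, u22, v22 = 0.6, 1.0, 1.2, 1.5
z1, z2 = rng.standard_normal(K), rng.standard_normal(K)
y1 = mu1 + s11 * g(z1, u11, v11)
y2 = s21 * g(z1, u11, v11) + s22 * g(z2, u22, v22)   # reduced model: z_1 reuses (u11, v11)

taus = np.arange(1, 100) / 100.0                     # Psi = {0.01, ..., 0.99}
z_taus = np.array([NormalDist().inv_cdf(t) for t in taus])
loss_true = multi_quantile_objective((mu1, s11, u11, v11), y1, taus, z_taus)
loss_shifted = multi_quantile_objective((mu1 + 0.3, s11, u11, v11), y1, taus, z_taus)

z1_hat = invert_g((y1 - mu1) / s11, u11, v11)        # step 6 for i = 1, given the parameters
s21_hat = sigma_reduced(y2, z1_hat, u11, v11)
print(loss_true, loss_shifted, np.max(np.abs(z1_hat - z1)), s21_hat)
```

Replacing the multiplier $z_j$ by $z_j^3$ and $z_j^5$ gives the other two moment conditions that the unrestricted model needs in order to identify $u_{ij}$ and $v_{ij}$.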
Because the one-factor version is a special case of the lower-triangular model, we only need to consider the learning algorithm for the lower-triangular model; the modifications needed for the one-factor version are straightforward. Suppose we have $K$ observations $\{y_{:k}\}_{k=1}^K$, where $y_{:k}$ is a column vector and $y_{ik}$ is its $i$-th entry. From the $y_1$ equation in Equation (4), $y_1$ has a quantile function of the form in Equation (3). So quantile regression [Koenker and Hallock, 2001] can be applied to learn the parameters of $y_1$ once a fine set of probability levels $\Psi \subset (0, 1)$ is chosen:

$$\min_{\mu_1, \sigma_{11}, u_{11}, v_{11}} \frac{1}{K} \sum_{k=1}^K \sum_{\tau \in \Psi} L_\tau\big(y_{1k}, Q(\tau \mid \mu_1, \sigma_{11}, u_{11}, v_{11})\big). \tag{9}$$

Here $L_\tau$ is the quantile regression loss between an observation and a $\tau$-quantile: $L_\tau(y, q) = (\tau - I(y < q))(y - q)$, where $I$ is the indicator function. See [Yan et al., 2018] for an introduction to multi-quantile regression with a parametric quantile function. In this paper, we set $\Psi = \{0.01, 0.02, \ldots, 0.98, 0.99\}$, with 99 probability levels. A smaller set that covers the interval $(0, 1)$ sufficiently is also acceptable, e.g., $\{0.01, 0.05, 0.1, \ldots, 0.9, 0.95, 0.99\}$ with 21 levels.

After solving the above optimization to get the learned parameters $\hat\mu_1, \hat\sigma_{11}, \hat u_{11}, \hat v_{11}$, one can invert the $y_1$ equation in Equation (4) to obtain realizations of $z_1$ from $y_{1k}$, which we denote by $z_{11}, \ldots, z_{1K}$. Then, to learn the parameters of $y_2$, we multiply both sides of the $y_2$ equation in Equation (4) by $z_1$ and take expectations. Since $z_1$ and $z_2$ are independent and $E[z_1] = 0$, we have $E[y_2 z_1] = \sigma_{21} E[z_1 g(z_1 \mid u_{21}, v_{21})]$. 
Replacing expectations by empirical averages leads to

$$\frac{1}{K} \sum_{k=1}^K y_{2k} z_{1k} = \sigma_{21} \frac{1}{K} \sum_{k=1}^K z_{1k}\, g(z_{1k} \mid u_{21}, v_{21}). \tag{10}$$

This is one equation with three unknowns, $\sigma_{21}$, $u_{21}$, and $v_{21}$. Multiplying both sides by $z_1^3$ and $z_1^5$ instead (the same argument applies because $E[z_1^3] = E[z_1^5] = 0$ for a symmetric latent variable such as the standard normal), we obtain two more equations:

$$\frac{1}{K} \sum_{k=1}^K y_{2k} z_{1k}^l = \sigma_{21} \frac{1}{K} \sum_{k=1}^K z_{1k}^l\, g(z_{1k} \mid u_{21}, v_{21}), \quad l = 3, 5. \tag{11}$$

Solving the above three equations jointly gives the learned parameters $\hat\sigma_{21}, \hat u_{21}, \hat v_{21}$. Next, we consider the new random variable $y'_2 = y_2 - \sigma_{21} g(z_1 \mid u_{21}, v_{21}) = \mu_2 + \sigma_{22} g(z_2 \mid u_{22}, v_{22})$, whose realizations are $y'_{2k} = y_{2k} - \hat\sigma_{21} g(z_{1k} \mid \hat u_{21}, \hat v_{21})$. Its quantile function is exactly of the form in Equation (3), specifically $Q(\tau \mid \mu_2, \sigma_{22}, u_{22}, v_{22})$, so we can again apply quantile regression as in Equation (9) to learn $\mu_2, \sigma_{22}, u_{22}, v_{22}$. All remaining parameters of $y_3, \ldots, y_n$ can be learned one by one following the same procedure. We summarize all the steps in Algorithm 1. Note that if one reduces the number of parameters by restricting $u_{i1} = u_{11}$, $v_{i1} = v_{11}$ for all $i \ge 1$, $u_{i2} = u_{22}$, $v_{i2} = v_{22}$ for all $i \ge 2$, and so on, Equation (6) reduces to a single equation in the single unknown $\sigma_{ij}$.

3.1 Modeling Multivariate Asset Returns

Suppose we have $n$ assets whose returns over $T$ days are $r_{it}$, $i = 1, \ldots, n$, $t = 1, \ldots, T$. To model the conditional distribution of $[r_{1t}, \ldots, r_{nt}]^\top$ using information up to time $t - 1$, we cannot ignore the serial dependence of each individual return series. The best-known serial dependence of single asset returns is volatility clustering, which is well captured by GARCH-type models [Engle, 1982][Bollerslev, 1986]. 
We first adopt an AR(1)-GARCH(1,1)-like model to describe each asset return series individually:

$$
\begin{aligned}
r_t &= \mu_t + \sigma_t \varepsilon_t, \\
\mu_t &= \gamma_0 + \gamma_1 r_{t-1}, \\
\sigma_t^2 &= \beta_0 + \beta_1 (\sigma_{t-1} \varepsilon_{t-1})^2 + \beta_2 \sigma_{t-1}^2.
\end{aligned} \tag{12}
$$

For simplicity, we drop the subscript $i$ in the above equations. Thus at every time $t$ there are $n$ innovations $[\varepsilon_{1t}, \ldots, \varepsilon_{nt}]^\top$. We model them with our proposed tail dependence model, in its lower-triangular or one-factor version, and assume they are i.i.d. over $t = 1, \ldots, T$.

This model for multivariate asset returns is not easy to fit directly, so we take an indirect but effective approach. First, an AR(1)-GARCH(1,1) model with t-distributed innovations is fitted to each return series. Then we collect all the residuals $[\hat\varepsilon_{1t}, \ldots, \hat\varepsilon_{nt}]^\top$, $t = 1, \ldots, T$, and fit our tail dependence model to them using Algorithm 1. For comparison, other methods such as the multivariate normal, multivariate t, elliptical distributions, or the copula approach can be used instead. We show the comparison results in Section 5.

4 Simulation Experiment

In this section, we verify experimentally, through simulation, the rich tail dependence structure of our model and compare it with the most widely used multivariate heavy-tailed distribution, the multivariate t-distribution. First, $10^6$ data points are sampled from a 3-dimensional t-distribution with 5 degrees of freedom, and we use these samples to fit our lower-triangular model with standard normal latent $z$. Then the proxy down-tail dependence $\lambda^D_{ij}(\tau) = P\{y_i < Q_{y_i}(\tau),\ y_j < Q_{y_j}(\tau)\}/\tau$ of each pair of dimensions $i$ & $j$ is calculated for both the multivariate t-distribution and our model. We vary $\tau$ in $[10^{-3}, 0.1]$ and plot $\lambda^D_{ij}(\tau)$ against $\tau$ in Figure 2. The first subplot of Figure 2 is from the 3-dimensional t-distribution we specify; the second is from our fitted model. Lines of different colors represent different pairs of dimensions. The two models generate very similar line patterns, indicating that our model does capture the tail dependence structure of the t-distribution. The different levels of the three lines are due to the different correlations of the pairs.

Figure 2: The proxy down-tail dependence $\lambda^D_{ij}(\tau)$ against $\tau$. The first subplot is from the 3-dimensional t-distribution we specify and the second is from our model fitted to samples from that t-distribution. The three lines represent the three pairs of dimensions.

Conversely, $10^6$ samples are generated from our lower-triangular model with standard normal latent $z$, and we fit a multivariate t-distribution to these samples. Again, the proxy down-tail dependencies of every pair of dimensions are calculated for both models. We plot these $\lambda^D_{ij}(\tau)$ against $\tau$ in Figure 3, where the 3-dimensional t-distribution (second subplot) cannot generate line patterns close to those generated by our model (first subplot), indicating that the t-distribution cannot capture the tail dependence structure of our model. This demonstrates the flexible tail dependence structure of our model.

Figure 3: The proxy down-tail dependence $\lambda^D_{ij}(\tau)$ against $\tau$. The first subplot is from our lower-triangular model and the second is from the 3-dimensional t-distribution fitted to samples from our model. The three lines represent the three pairs of dimensions.

5 Conditional Distribution Forecasts

After training, our method described in Section 3.1 can forecast the conditional distribution of multiple asset returns on test data. The training and test sets are two successive multivariate time series of daily returns. 
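The filtering step of Section 3.1, which also yields the one-step-ahead quantities needed for forecasting, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the AR(1)-GARCH(1,1) parameters are taken as known rather than estimated (the paper fits each series with t-distributed innovations, which requires a separate maximum-likelihood routine), and the starting values ($r_0 = 0$, and $\sigma_1^2$ equal to the unconditional variance), the parameter values, and the helper names are our own. Filtering simulated data with the true parameters recovers the simulated innovations, a quick consistency check of the recursion in Equation (12).

```python
import numpy as np

def ar1_garch11_filter(r, gamma0, gamma1, beta0, beta1, beta2, r0=0.0, sigma2_init=None):
    # Runs the recursion of Equation (12) over observed returns r[0..T-1] for known
    # parameters. Returns conditional means, volatilities, standardized residuals, and
    # the one-step-ahead (mean, volatility) for the first day after the sample.
    if sigma2_init is None:
        sigma2_init = beta0 / (1.0 - beta1 - beta2)    # unconditional variance (assumed start)
    T = len(r)
    mu, sigma, eps = np.empty(T), np.empty(T), np.empty(T)
    prev_r, sigma2 = r0, sigma2_init
    for t in range(T):
        mu[t] = gamma0 + gamma1 * prev_r
        sigma[t] = np.sqrt(sigma2)
        eps[t] = (r[t] - mu[t]) / sigma[t]
        # sigma_{t+1}^2 = beta0 + beta1 * (sigma_t * eps_t)^2 + beta2 * sigma_t^2
        sigma2 = beta0 + beta1 * (r[t] - mu[t]) ** 2 + beta2 * sigma2
        prev_r = r[t]
    return mu, sigma, eps, gamma0 + gamma1 * r[-1], np.sqrt(sigma2)

def simulate_ar1_garch11(eps, gamma0, gamma1, beta0, beta1, beta2, r0=0.0):
    # Generates returns from Equation (12) for a given innovation sequence eps.
    r = np.empty(len(eps))
    prev_r, sigma2 = r0, beta0 / (1.0 - beta1 - beta2)
    for t, e in enumerate(eps):
        mu_t = gamma0 + gamma1 * prev_r
        r[t] = mu_t + np.sqrt(sigma2) * e
        sigma2 = beta0 + beta1 * (r[t] - mu_t) ** 2 + beta2 * sigma2
        prev_r = r[t]
    return r

params = dict(gamma0=0.02, gamma1=0.05, beta0=0.02, beta1=0.08, beta2=0.9)   # made up
rng = np.random.default_rng(2)
eps_true = rng.standard_t(5, size=1000) / np.sqrt(5 / 3)    # unit-variance t(5) innovations
r = simulate_ar1_garch11(eps_true, **params)
mu, sigma, eps_hat, next_mu, next_sigma = ar1_garch11_filter(r, **params)
print(np.max(np.abs(eps_hat - eps_true)), next_mu, next_sigma)
```

For day $T + 1$, a joint conditional forecast would then combine each asset's one-step-ahead mean and volatility with joint innovation draws from the fitted tail dependence model.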
We use statistical hypothesis testing to evaluate the forecasts. Recall that, for evaluating univariate Value-at-Risk (VaR) forecasts, [Kupiec, 1995] proposed an unconditional coverage test that checks whether the proportion of VaR violations in the test period equals the probability level of the VaR. Inspired by this, we define a new type of violation for the two-dimensional case. Given random variables $(X, Y)$ and a fixed probability level $\tau$, suppose $\tau^*$ solves

$$P\{X < Q_X(\tau^*),\ Y < Q_Y(\tau^*)\} = \tau, \tag{13}$$

where $Q_X(\tau^*)$ and $Q_Y(\tau^*)$ are the $\tau^*$-quantiles of the marginal distributions of $X$ and $Y$, respectively. If a realization of $(X, Y)$ lies in the region $(-\infty, Q_X(\tau^*)] \times (-\infty, Q_Y(\tau^*)]$, we call it a violation; otherwise it is not. By construction, the probability of a violation is $\tau$. When the analytical distribution is unknown, Equation (13) can be solved for $\tau^*$ by the bisection method using many samples of $(X, Y)$.

Table 1: Unconditional coverage test statistics. (a) First stock group: Apple, IBM, Microsoft. (b) Second stock group: Apple, JP Morgan, Walmart. $\tau = 0.01$, and ∗ indicates that the hypothesis is rejected at the 95% confidence level. Parentheses give the number of violations against the ideal number of violations.

| Method | (a) 1&2 | (a) 1&3 | (a) 2&3 | (b) 1&2 | (b) 1&3 | (b) 2&3 |
|---|---|---|---|---|---|---|
| Normal | 6.96∗ (10/21) | 0.39 (18/21) | 0.83 (25/21) | 1.16 (19/24) | 8.98∗ (11/24) | 0.19 (22/24) |
| t-distribution | 10.33∗ (8/21) | 2.50 (14/21) | 0.15 (19/21) | 2.34 (17/24) | 12.53∗ (9/24) | 0.41 (21/24) |
| Clayton copula | 12.38∗ (7/21) | 3.37 (13/21) | 2.50 (14/21) | 2.34 (17/24) | 14.62∗ (8/24) | 2.34 (17/24) |
| Gumbel copula | 0.83 (25/21) | 3.66 (30/21) | 15.54∗ (41/21) | 3.00 (33/24) | 3.99∗ (15/24) | 4.40∗ (35/24) |
| Our model | 1.19 (16/21) | 0.24 (23/21) | 3.66 (30/21) | 0.15 (26/24) | 5.01∗ (14/24) | 0.15 (26/24) |

Suppose that, for a sequence of pairs $\{(X_t, Y_t)\}$, we have forecast the conditional distribution for every $t$. Given the realizations of $\{(X_t, Y_t)\}$, i.e., the observations in the test set, we obtain a sequence of indicators of whether a violation happens. Ideally, this 0-1 sequence should be an i.i.d. sample from a Bernoulli distribution with parameter $\tau$. To check whether the proportion of violations in this sequence is $\tau$, Kupiec's test [Kupiec, 1995] for the univariate case can be applied. The statistic of Kupiec's test is

$$T_K = 2 \log\left( \left(1 - \frac{m}{T}\right)^{T-m} \left(\frac{m}{T}\right)^m \right) - 2 \log\left( (1 - \tau)^{T-m} \tau^m \right), \tag{14}$$

where $T$ is the length of the sequence and $m$ is the number of violations. Under the null hypothesis, this statistic, which takes values in $[0, +\infty)$, is asymptotically chi-square distributed with 1 degree of freedom. A statistic of zero means the proportion of violations is exactly $\tau$, and a large value indicates a failure of the forecasts. At the 95% confidence level, the threshold for rejecting the hypothesis is 3.84. In our experiments, we set $\tau = 0.01$. For more than two assets, we apply this test to every pair of assets, even though the model itself may be high-dimensional. 
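The violation definition and Kupiec's statistic translate directly into code. The NumPy sketch below is our own illustration with hypothetical helper names: `solve_tau_star` applies bisection to Equation (13) using Monte Carlo samples of a forecast distribution, `violation_indicators` builds the 0-1 sequence, and `kupiec_statistic` evaluates Equation (14) in log form. The test length $T = 2075$ is an assumption (roughly one-fourth of the first group's 8,302 observations), not a figure stated in the paper.

```python
import math
import numpy as np

def kupiec_statistic(m, T, tau):
    # Equation (14) in log form:
    # 2[(T - m) log(1 - m/T) + m log(m/T)] - 2[(T - m) log(1 - tau) + m log(tau)],
    # using the convention 0 * log(0) = 0 when m = 0 or m = T.
    def xlogy(x, y):
        return 0.0 if x == 0 else x * math.log(y)
    p_hat = m / T
    log_alt = xlogy(T - m, 1.0 - p_hat) + xlogy(m, p_hat)
    log_null = (T - m) * math.log(1.0 - tau) + m * math.log(tau)
    return 2.0 * (log_alt - log_null)

def solve_tau_star(x, y, tau, tol=1e-8):
    # Bisection for tau* in Equation (13) from joint samples (x, y) of a forecast
    # distribution. The joint probability is nondecreasing in tau* and never exceeds
    # tau*, so the root lies in [tau, 1).
    lo, hi = tau, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        joint = np.mean((x < np.quantile(x, mid)) & (y < np.quantile(y, mid)))
        if joint < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def violation_indicators(x_obs, y_obs, qx, qy):
    # 0-1 sequence: 1 when both realizations fall in (-inf, qx] x (-inf, qy].
    return ((x_obs <= qx) & (y_obs <= qy)).astype(int)

rng = np.random.default_rng(3)
x, y = rng.standard_normal(200_000), rng.standard_normal(200_000)
tau_star_indep = solve_tau_star(x, y, 0.01)       # independent pair
stat = kupiec_statistic(16, 2075, 0.01)           # T = 2075 is an assumed test length
print(tau_star_indep, stat, stat > 3.84)
```

With $T = 2075$ and 16 violations the statistic is about 1.19, in line with the entry for our model on pair 1&2 in Table 1(a), while 41 violations would give about 15.5, well above the 3.84 threshold. For independent $X$ and $Y$, $\tau^*$ should be close to $\sqrt{\tau} = 0.1$, since then $P\{X < Q_X(\tau^*), Y < Q_Y(\tau^*)\} = (\tau^*)^2$.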
Testing every pair in this way shows whether the model captures the pairwise tail dependencies. Note that the latent z is always standard normal in our experiments.

5.1 Lower-triangular Model

We select two groups of stocks and evaluate our lower-triangular model, as well as competing methods, on them. Each group contains three stocks that are representative of the market. The first group consists of three stocks from the IT sector: Apple, IBM, and Microsoft. The second group consists of three stocks from different sectors: Apple, JP Morgan, and Walmart. The return data of the two groups start on 14 March 1986 and 15 December 1980, respectively, and both end on 20 February 2019, giving 8,302 and 9,627 observations, respectively. For each group, we hold out the last one-fourth of the time series as the testing set. All returns are computed as $r_t = 100\log(P_t/P_{t-1})$, where $P_t$ is the price.
For comparison, we also evaluate several competing methods: the multivariate normal, the multivariate t, the Clayton copula [Clayton, 1978], and the Gumbel copula [Kole et al., 2007]. The two copulas used here are bivariate, which suffices because our evaluation is pairwise. Table 1 reports the test statistic of each method together with the number of violations against the ideal number of violations. Part (a), for the first stock group, shows that every competing method is rejected on at least one of the three dimension pairs, whereas our model is not rejected on any. This suggests that our model captures the distinct pairwise tail dependencies. Part (b), for the second stock group, shows that our model performs well on dimension pairs 1&2 and 2&3, with 26 violations against an ideal 24 on both. All methods are rejected on dimension pair 1&3, and all of them produce fewer violations than the ideal number there. This suggests that a regime switch or a similar change may have occurred between the training and testing periods for that pair.
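The data preparation described in this subsection can be sketched as below. It computes percentage log returns and then applies a chronological split that holds out the last one-fourth of each series. The helper names are hypothetical. Truncating the training length to an integer is our own assumption about how the one-fourth split is rounded. Under that assumption, τ multiplied by the test-set length reproduces the ideal violation counts reported in Table 1: 21 for the 8,302-observation group and 24 for the 9,627-observation group.

```python
import numpy as np


# Percentage log returns r_t = 100 * log(P_t / P_{t-1}).
def log_returns(prices):
    prices = np.asarray(prices, dtype=float)
    return 100.0 * np.diff(np.log(prices))


# Chronological split: the last test_frac of the series becomes the testing set.
# Truncating n_train with int() is an assumption, not taken from the paper.
def chrono_split(series, test_frac=0.25):
    n_train = int(len(series) * (1.0 - test_frac))
    return series[:n_train], series[n_train:]


# Ideal number of violations: tau times the testing-set length, rounded.
def ideal_violations(n_obs, tau=0.01, test_frac=0.25):
    n_test = n_obs - int(n_obs * (1.0 - test_frac))
    return round(tau * n_test)


print(log_returns([100.0, 110.0, 99.0]))  # roughly [9.53, -10.54]
print(ideal_violations(8302), ideal_violations(9627))  # 21 24
```

The split is chronological rather than random, so the testing set always lies strictly after the training period, as in the experiments described above.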
Overall, these results indicate that the proposed model achieves its design goal.

5.2 One-factor Model

To evaluate our one-factor model, we collect data on 15 representative Dow Jones stocks, such as AAPL, BA, JPM, and PG, and use the SP500 return as the market-wide variable. The multivariate return data start on 1980-12-15 and end on 2019-05-21, with 9,690 observations. Again, the last one-fourth is held out for testing. We fit the one-factor model to the multivariate data using Algorithm 1 with a slight, straightforward modification.

Table 2: Learned parameters of each asset in the one-factor model (see Equation (5) for their definitions). SP500 is the market-wide variable, and 15 representative stocks are included in the model. The market variable SP500 has no parameters γ_i, u_i, v_i.

Asset    α_i       β_i      u^M_i    v^M_i    γ_i      u_i      v_i
SP500    -0.0098   0.6814   1.6914   1.8241   -        -        -
AAPL      0.0256   0.3295   1.0000   1.7909   0.6071   2.0504   1.6992
BA       -0.0223   0.3276   1.8424   1.8344   0.6043   1.7252   1.4940
CAT       0.0653   0.3882   1.0000   1.9623   0.5947   1.9417   1.6630
CVX       0.0368   0.3322   1.0000   1.9017   0.5898   1.7723   1.5853
DIS       0.0075   0.3557   1.6818   2.0383   0.5651   1.9214   1.6551
DWDP     -0.0231   0.3644   1.7973   1.8581   0.5576   1.9340   1.6005
IBM       0.0686   0.4330   1.0000   1.9335   0.5216   2.0673   1.8113
INTC      0.0410   0.4240   1.0000   1.6414   0.5443   1.7529   1.5982
JNJ      -0.0091   0.3430   1.8771   2.0107   0.5730   2.1325   1.7065
JPM       0.0581   0.4070   1.0000   2.0084   0.5393   1.9392   1.6606
KO        0.0219   0.3828   1.4783   1.9378   0.5759   1.9990   1.6679
MMM       0.0037   0.4023   1.8597   1.9330   0.5658   1.9396   1.7124
NKE      -0.0083   0.3627   2.4380   1.9460   0.7101   3.1765   2.6941
PG        0.0335   0.3829   1.3750   1.9521   0.5998   2.0163   1.7129
WMT      -0.0237   0.3769   1.8415   1.6061   0.5488   1.8287   1.6256
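The tail-asymmetry pattern in Table 2 can be checked mechanically. The short Python snippet below copies the u^M_i, v^M_i, u_i and v_i columns for the 15 stocks and counts how many stocks satisfy each inequality. The SP500 row is omitted because it has no idiosyncratic parameters. The snippet only restates the table and does not refit the model.

```python
# Columns of Table 2 for the 15 stocks (SP500 row omitted).
stocks = ['AAPL', 'BA', 'CAT', 'CVX', 'DIS', 'DWDP', 'IBM', 'INTC',
          'JNJ', 'JPM', 'KO', 'MMM', 'NKE', 'PG', 'WMT']
u_M = [1.0000, 1.8424, 1.0000, 1.0000, 1.6818, 1.7973, 1.0000, 1.0000,
       1.8771, 1.0000, 1.4783, 1.8597, 2.4380, 1.3750, 1.8415]
v_M = [1.7909, 1.8344, 1.9623, 1.9017, 2.0383, 1.8581, 1.9335, 1.6414,
       2.0107, 2.0084, 1.9378, 1.9330, 1.9460, 1.9521, 1.6061]
u = [2.0504, 1.7252, 1.9417, 1.7723, 1.9214, 1.9340, 2.0673, 1.7529,
     2.1325, 1.9392, 1.9990, 1.9396, 3.1765, 2.0163, 1.8287]
v = [1.6992, 1.4940, 1.6630, 1.5853, 1.6551, 1.6005, 1.8113, 1.5982,
     1.7065, 1.6606, 1.6679, 1.7124, 2.6941, 1.7129, 1.6256]

# Idiosyncratic right skew: u_i > v_i.
right_skewed = [s for s, a, b in zip(stocks, u, v) if a > b]
# Heavier market-driven tail on the down side: u^M_i < v^M_i.
down_heavier = [s for s, a, b in zip(stocks, u_M, v_M) if a < b]
# No up-side tail sensitivity to the market: u^M_i = 1.
no_up_sensitivity = [s for s, a in zip(stocks, u_M) if a == 1.0]

print(len(right_skewed), len(down_heavier), len(no_up_sensitivity))  # 15 12 6
print(sorted(set(stocks) - set(down_heavier)))  # ['BA', 'NKE', 'WMT']
```

All 15 stocks satisfy u_i > v_i. Twelve of the 15 satisfy u^M_i < v^M_i, with BA, NKE and WMT as the exceptions. Six stocks have u^M_i = 1.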
Our competing methods are a one-factor Gaussian model and a one-factor t model. In these, we replace $g(z_M \mid \ldots)$ and $g(z_i \mid \ldots)$ in our one-factor Equation (5) with Gaussian or t-distributed $z_M$ and $z_i$. The degrees of freedom of the t-distribution can differ across assets. Since we have 16 assets in total, there are $\binom{16}{2} = 120$ pairs of dimensions. We evaluate the three models by the number of rejections they obtain in these 120 tests. The one-factor Gaussian model, the one-factor t model, and our model obtain 52, 43, and 32 rejections, respectively. These improvements are consistent with the intuition that performance improves when heavy tails are modeled, and improves further when tails are modeled separately from correlations.
We also find an interesting asymmetry between the tails. Table 2 lists the parameters of the 15 stocks as well as of SP500. It is unlikely to be a coincidence that the idiosyncratic components of all stocks are right-skewed, i.e., $u_i > v_i$ for every i. In contrast, $u^M_i < v^M_i$ for most stocks (12 of the 15), which means that the market-wide tail impact is greater on the down side. A large SP500 drop affects nearly all stocks severely, while this is not the case on the up side. In fact, six stocks have $u^M_i = 1$, showing no tail sensitivity to the market variable on the up side. This pattern deserves further study.

6 Conclusions

In summary, we propose a novel random vector, obtained by transforming a known random vector such as a standard normal one, to learn the correlation-separated multivariate tail dependence structure of financial assets. It is designed to have not only different marginal tail heaviness but also distinct pairwise tail dependencies. Our model has a lower-triangular version and a one-factor version, and we also propose an algorithm to fit it to data. We show numerically that it exhibits distinct pairwise tail dependencies, which is an essential advantage over many commonly used methods.
Combined with a GARCH-type model, we use it to forecast the conditional distribution of multi-dimensional asset returns and achieve notable performance improvements in pairwise coverage tests.
The empirical findings on tail asymmetry are interesting and merit further study. Many related questions remain open. What is the source of the idiosyncratic right skewness? What produces the market variable's left skewness when every component stock is right-skewed? What are the consequences for asset pricing, both theoretically and empirically? Future work also includes a theoretical analysis of our model, especially an analytical tail dependence formula and the properties of the fitting algorithm.

Acknowledgements

Qi WU acknowledges financial support from the City University of Hong Kong grant SRG-Fd 7005300 and from the Hong Kong Research Grants Council, particularly the Early Career Scheme 24200514 and the General Research Funds 14211316 and 14206117. This work was undertaken in part while Xing YAN was working with JD Digits.

References

[Aas et al., 2009] Aas, K., Czado, C., Frigessi, A., and Bakken, H. (2009). Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics, 44(2):182–198.

[Aderounmu and Wolff, 2014] Aderounmu, A. A. and Wolff, R. (2014). Assessing tail dependence in electricity markets. Available at SSRN 2373591.

[Balla et al., 2014] Balla, E., Ergen, I., and Migueis, M. (2014). Tail dependence and indicators of systemic risk for large US depositories. Journal of Financial Stability, 15:195–209.

[Beine et al., 2010] Beine, M., Cosma, A., and Vermeulen, R. (2010). The dark side of global integration: Increasing tail dependence. Journal of Banking & Finance, 34(1):184–192.

[Bollerslev, 1986] Bollerslev, T. (1986).
Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31(3):307–327.

[Chan and Li, 2008] Chan, Y. and Li, H. (2008). Tail dependence for multivariate t-copulas and its monotonicity. Insurance: Mathematics and Economics, 42(2):763–770.

[Clayton, 1978] Clayton, D. G. (1978). A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika, 65(1):141–151.

[Demarta and McNeil, 2005] Demarta, S. and McNeil, A. J. (2005). The t copula and related copulas. International Statistical Review, 73(1):111–129.

[Embrechts et al., 2001] Embrechts, P., Lindskog, F., and McNeil, A. (2001). Modelling dependence with copulas. Rapport technique, Département de mathématiques, Institut Fédéral de Technologie de Zurich, Zurich.

[Engle, 1982] Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica: Journal of the Econometric Society, pages 987–1007.

[Frahm et al., 2005] Frahm, G., Junker, M., and Schmidt, R. (2005). Estimating the tail-dependence coefficient: properties and pitfalls. Insurance: Mathematics and Economics, 37(1):80–100.

[Jondeau and Rockinger, 2006] Jondeau, E. and Rockinger, M. (2006). The copula-GARCH model of conditional dependencies: An international stock market application. Journal of International Money and Finance, 25(5):827–853.

[Koenker and Hallock, 2001] Koenker, R. and Hallock, K. F. (2001). Quantile regression. Journal of Economic Perspectives, 15(4):143–156.

[Kole et al., 2007] Kole, E., Koedijk, K., and Verbeek, M. (2007). Selecting copulas for risk management. Journal of Banking & Finance, 31(8):2405–2423.

[Kupiec, 1995] Kupiec, P. (1995).
Techniques for verifying the accuracy of risk measurement models. FEDS Paper, (95-24).

[Lesniewski et al., 2016] Lesniewski, A., Sun, H., and Wu, Q. (2016). Asymptotics of portfolio tail risk metrics for elliptically distributed asset returns. Available at SSRN 2748970.

[Peng and Kou, 2009] Peng, X. and Kou, S. (2009). Default clustering and valuation of collateralized debt obligations. Working Paper.

[Poon et al., 2003] Poon, S.-H., Rockinger, M., and Tawn, J. (2003). Extreme value dependence in financial markets: Diagnostics, models, and financial implications. The Review of Financial Studies, 17(2):581–610.

[Poulin et al., 2007] Poulin, A., Huard, D., Favre, A.-C., and Pugin, S. (2007). Importance of tail dependence in bivariate frequency analysis. Journal of Hydrologic Engineering, 12(4):394–403.

[Schoelzel and Friederichs, 2008] Schoelzel, C. and Friederichs, P. (2008). Multivariate non-normally distributed random variables in climate research–introduction to the copula approach. Nonlinear Processes in Geophysics, 15(5):761–772.

[Wu and Yan, 2019] Wu, Q. and Yan, X. (2019). Capturing deep tail risk via sequential learning of quantile dynamics. Journal of Economic Dynamics and Control, page 103771.

[Yan et al., 2018] Yan, X., Zhang, W., Ma, L., Liu, W., and Wu, Q. (2018). Parsimonious quantile regression of financial asset tail dynamics via sequential learning. In Advances in Neural Information Processing Systems, pages 1582–1592.", "award": [], "sourceid": 2122, "authors": [{"given_name": "Xing", "family_name": "Yan", "institution": "City University of Hong Kong"}, {"given_name": "Qi", "family_name": "Wu", "institution": "City University of Hong Kong"}, {"given_name": "Wen", "family_name": "Zhang", "institution": "JD Digital"}]}