{"title": "Stock Selection via Nonlinear Multi-Factor Models", "book": "Advances in Neural Information Processing Systems", "page_first": 966, "page_last": 972, "abstract": null, "full_text": "Stock Selection via Nonlinear \n\nMulti-Factor Models \n\nAsriel U. Levin \n\nBZW Barclays Global Investors \n\nAdvanced Strategies and Research Group \n\n45 Fremont Street \n\nSan Francisco CA 94105 \n\nemail: asriel.levin@bglobal.com \n\nAbstract \n\nThis paper discusses the use of multilayer feed forward neural net(cid:173)\nworks for predicting a stock's excess return based on its exposure \nto various technical and fundamental factors. To demonstrate the \neffectiveness of the approach a hedged portfolio which consists of \nequally capitalized long and short positions is constructed and its \nhistorical returns are benchmarked against T-bill returns and the \nS&P500 index. \n\n1 \n\nIntroduction \n\nTraditional investment approaches (Elton and Gruber, 1991) assume that the return \nof a security can be described by a multifactor linear model: \n\n(1) \nwhere Hi denotes the return on security i, Fl are a set of factor values and Uil are \nsecurity i exposure to factor I, ai is an intercept term (which under the CAPM \nframework is assumed to be equal to the risk free rate of return (Sharpe, 1984)) \nand ei is a random term with mean zero which is assumed to be uncorrelated across \nsecurities. \n\nThe factors may consist of any set of variables deemed to have explanatory power for \nsecurity returns. These could be aspects of macroeconomics, fundamental security \nanalysis, technical attributes or a combination of the above. The value of a factor \nis the expected excess return above risk free rate of a security with unit exposure to \nthe factor and zero exposure to all other factors. 
The choice of factors can be viewed as a proxy for the \"state of the world\" and their selection defines a metric imposed on the universe of securities: once the factors are set, the model assumption is that, on average, two securities with similar factor loadings (u_{il}) will behave in a similar manner. \n\nThe factor model (1) was not originally developed as a predictive model, but rather as an explanatory model, with the returns R_i and the factor values F_l assumed to be contemporaneous. To utilize (1) in a predictive manner, each factor value must be replaced by an estimate, resulting in the model \n\n\hat{R}_i = a_i + u_{i1} \hat{F}_1 + u_{i2} \hat{F}_2 + ... + u_{iL} \hat{F}_L + e_i \n\n(2) \n\nwhere \hat{R}_i is a security's future return and \hat{F}_l is an estimate of the future value of factor l, based on currently available information. The estimation of F_l can be approached with varying degrees of sophistication, ranging from a simple use of the historical mean to estimate the factor value (setting \hat{F}_l = \bar{F}_l), to more elaborate approaches attempting to construct a time series model for predicting the factor values. \n\nFactor models of the form (2) can be employed both to control risk and to enhance return. In the first case, by capturing the major sources of correlation among security returns, one can construct a well balanced portfolio which diversifies specific risk away. For the latter, if one is able to predict the likely future value of a factor, higher return can be achieved by constructing a portfolio that tilts toward \"good\" factors and away from \"bad\" ones. \n\nWhile linear factor models have proven to be very useful tools for portfolio analysis and investment management, the assumption of a linear relationship between factor values and expected return is quite restrictive. 
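As a concrete illustration, the simplest predictive use of (2), with historical-mean factor estimates and zero intercepts, can be sketched as follows. All data here are randomly generated placeholders, not the factors used later in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy placeholder data: T past months of realized factor values F_l(t),
# and the current exposures u_il of n stocks to L factors.
T, n, L = 60, 30, 3
factor_history = rng.normal(loc=0.01, scale=0.05, size=(T, L))
exposures = rng.normal(size=(n, L))
intercepts = np.zeros(n)  # a_i, taken as zero in this sketch

# Simplest factor estimate: the historical mean of each factor.
F_hat = factor_history.mean(axis=0)

# Predicted return per (2): R_i = a_i + sum over l of u_il * F_hat_l
predicted = intercepts + exposures @ F_hat
```

The matrix product collapses the sum over factors, yielding one predicted return per security.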
Specifically, the use of linear models assumes that each factor affects the return independently and hence ignores possible interactions between different factors. Furthermore, with a linear model, the expected return of a security can grow without bound as its exposure to a factor increases. To overcome these shortcomings of linear models, one has to consider more general models that allow for nonlinear relationships among factor values, security exposures and expected returns. \n\nGeneralizing (2), while maintaining the basic premise that the state of the world can be described by a vector of factor values and that the expected return of a security is determined through its coordinates in this factor world, leads to the nonlinear model: \n\n\hat{R}_i = f(u_{i1}, u_{i2}, ..., u_{iL}, F_1, F_2, ..., F_L) + e_i \n\n(3) \n\nwhere f(.) is a nonlinear function and e_i is the noise unexplained by the model, or \"security specific risk\". \n\nThe prediction task for the nonlinear model (3) is substantially more complex than in the linear case since it requires both the estimation of future factor values as well as a determination of the unknown function f. The task can be somewhat simplified if factor estimates are replaced with their historical means: \n\n\hat{R}_i = f(u_{i1}, u_{i2}, ..., u_{iL}, \bar{F}_1, \bar{F}_2, ..., \bar{F}_L) + e_i \n\n(4) \n\nwhere now u_{il} are the security's factor exposures at the beginning of the period over which we wish to predict. \n\nTo estimate the unknown function f(.), a family of models needs to be selected, from which a model is to be identified. In the following we propose modeling the relationship between factor exposures and future returns using the class of multilayer feedforward neural networks (Hertz et al., 1991). Their universal approximation capabilities (Cybenko, 1989; Hornik et al., 1989), as well as the existence of an effective parameter tuning method (the backpropagation algorithm (Rumelhart et al., 1986)), make this family of models a powerful tool for the identification of nonlinear mappings and hence a natural choice for modeling (4). \n\n2 The stock selection problem \n\nOur objective in this paper is to test the ability of neural network based models of the form (4) to differentiate between attractive and unattractive stocks. Rather than trying to predict the total return of a security, the objective is to predict its performance relative to the market, hence eliminating the need to predict market direction and movements. \n\nThe data set consists of monthly historical records (1989 through 1995) for the largest 1200-1300 US companies as defined by the BARRA HiCap universe. Each data record (approximately 1300 per month) consists of an input vector composed of a security's factor exposures recorded at the beginning of the month; the corresponding output is the security's return over the month. The factors used to build the model include Earnings/Price, Book/Price, past price performance, consensus analyst sentiment, etc., which have been suggested in the financial literature as having explanatory power for security returns (e.g. (Fama and French, 1992)). To minimize risk, exposure to other unwarranted factors is controlled using a quadratic optimizer. \n\n3 Model construction and testing \n\nPotentially, changes in the price of a security are a function of a very large number of forces and events, of which only a small subset can be included in the factor model (4). All other sources of return play the role of noise whose magnitude is probably much larger than any signal that can be explained by the factor exposures. 
When this information is used to train a neural network, the network attempts to replicate the examples it sees and hence much of what it tries to learn will be the particular realizations of noise that appeared in the training set. \n\nTo minimize this effect, both a validation set and regularization are used in the training. The validation set is used to monitor the performance of the model on data on which it has not been trained. By stopping the learning process when the validation set error starts to increase, the learning of noise is minimized. Regularization further limits the complexity of the function realized by the network and, through the reduction of model variance, improves generalization (Levin et al., 1994). \n\nThe stock selection model is built using a rolling train/test window. First, M \"two layer\" feedforward networks are built for each month of data (the result is rather insensitive to the particular choice of M). Each network is trained using stochastic gradient descent with one quarter of the monthly data (randomly selected) used as a validation set. Regularization is done using principal component pruning (Levin et al., 1994). Once training is completed, the models constructed over N consecutive months of data (again, the result is insensitive to the particular choice of N) are combined (thus increasing the robustness of the model (Breiman, 1994)) to predict the returns in the following month. Thus the predicted (out of sample) return of stock i in month k is given by \n\n\hat{R}_i(k) = (1/N) \sum_{j=1}^{N} NN_{k-j}(u_{i1}^k, u_{i2}^k, ..., u_{iL}^k) \n\n(5) \n\nwhere \hat{R}_i(k) is stock i's predicted return, NN_{k-j}(.) denotes the neural network model built in month k - j and u_{il}^k are stock i's factor exposures as measured at the beginning of month k. \n\nFigure 1: Average correlation between predicted alphas and realized returns for linear and nonlinear models \n\n4 Benchmarking to linear \n\nAs a first step in evaluating the added value of the nonlinear model, its performance was benchmarked against a generalized least squares linear model. Each model was run over three universes: all securities in the HiCap universe, the extreme 200 stocks (top 100, bottom 100 as defined by each model), and the extreme 100 stocks. As a comparative performance measure we use the Sharpe ratio (Elton and Gruber, 1991). As shown in Table 1, while the performance of the two models is quite comparable over the whole universe of stocks, the neural network based model performs better at the extremes, resulting in a substantially larger Sharpe ratio (and of course, when constructing a portfolio, it is the extreme alphas that have the most impact on performance). \n\nPortfolio\\Model      | Linear | Nonlinear \nAll HiCap            | 6.43   | 6.92 \n100 long/100 short   | 4.07   | 5.49 \n50 long/50 short     | 3.07   | 4.23 \n\nTable 1: Ex ante Sharpe ratios: Neural network vs. linear \n\nWhile the numbers in the above table look quite impressive, it should be emphasized that they do not represent returns of a practical strategy: turnover is huge and the figures do not take transaction costs into account. The main purpose of the table is to compare the information that can be captured by the different models and specifically to show the added value of the neural network at the extremes. A practical implementation scheme and the associated performance will be discussed in the next section. 
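The rolling train/test scheme of Section 3 and the ensemble prediction of (5) can be sketched as below. This is a minimal illustration on synthetic data: a per-month ridge regression stands in for each trained network (the paper's actual models are regularized two layer feedforward networks), so only the rolling-window bookkeeping is faithful.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy monthly cross-sections (placeholders, not the BARRA data): T months,
# n stocks, L factor exposures; returns are a noisy nonlinear signal.
T, n, L = 14, 50, 4
w_true = rng.normal(size=L)
exposures = rng.normal(size=(T, n, L))
returns = np.tanh(exposures @ w_true) + 2.0 * rng.normal(size=(T, n))

def fit_month(X, y):
    """Stand-in for training one network on a month of data; a ridge fit
    keeps the sketch self-contained."""
    lam = 1.0
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return lambda Z: Z @ w

models = [fit_month(exposures[t], returns[t]) for t in range(T)]

N = 6  # number of consecutive monthly models combined, as in (5)

def predict_month(k):
    """Out-of-sample prediction for month k: average the models built on
    months k-N .. k-1, evaluated at month-k exposures (equation (5))."""
    return np.mean([models[k - j](exposures[k]) for j in range(1, N + 1)], axis=0)

alphas = predict_month(T - 1)  # predicted cross-section for the last month
ic = np.corrcoef(alphas, returns[T - 1])[0, 1]
```

Only models fitted on earlier months enter each prediction, so `alphas` is genuinely out of sample; `ic` is the cross-sectional correlation between predictions and realized returns, the quantity plotted per cell in Figure 1.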
\n\nFinally, some insight as to the reason for the improved performance can be gained \nby looking at the correlation between model predictions and realized returns for \ndifferent values of model predictions (commonly referred to as alphas). For that, \nthe alpha range was divided to 20 cells, 5% of observations in each and correlations \nwere calculated separately for each cell. As is shown in figure 1, while both neural \nnetwork and linear model seem to have more predictive power at the extremes, the \nnetwork's correlations are substantially larger for both positive and negative alphas. \n\n5 Portfolio construction \n\nGiven the superior predictive ability of the nonlinear model at the extremes, a \nnatural way of translating its predictions into an investment strategy is through the \nuse of a long/short construct which fully captures the model information on both \nthe positive as well as the negative side. \n\nThe long/short portfolio (Jacobs and Levy, 1993) is constructed by allocating equal \ncapital to long and short positions. By monitoring and controlling the risk charac(cid:173)\nteristics on both sides, one is able to construct a portfolio that has zero correlation \nwith the market ((3 = 0) - a \"market neutral\" portfolio. By construction, the re(cid:173)\nturn of a market neutral portfolio is insensitive to the market up or down swings \nand its only source of return is the performance spread between the long and short \npositions, which in turn is a direct function of the model (5) discernment ability. \n\nSpecifically, the translation of the model predictions into a realistically imple(cid:173)\nmentable strategy is done using a quadratic optimizer. Using the model predicted \nreturns and incorporating volatility information about the various stocks, the opti(cid:173)\nmizer is utilized to construct a portfolio with the following characteristics: \n\n\u2022 Market neutral (equal long and short capitalization). 
\n\u2022 Total number of assets in the portfolio <= 200. \n\u2022 Average (one sided) monthly turnover ~ 15%. \n\u2022 Annual active risk ~ 5%. \n\nIn the following, all results are test set results (out of sample), net of estimated transaction costs (assumed to be 1.5% round trip). The standard benchmark for a market neutral portfolio is the return on the 3-month T-bill and, as can be seen in Table 2, over the test period the market neutral portfolio has consistently and decisively outperformed its benchmark. Furthermore, the results reported for 1995 were recorded in real time (simulated paper portfolio). \n\nAn interesting feature of the long/short construct is its ease of transportability (Jacobs and Levy, 1993). Thus, while the base construction is insensitive to market movement, if one wishes, full exposure to a desired market can be achieved through the use of futures or swaps (Hull, 1993). As an example, by adding a permanent S&P500 futures overlay in an amount equal to the invested capital, one is fully exposed to the equity market at all times, and returns are the sum of the long/short performance spread and the profits or losses resulting from the market price movements. This form of a long/short strategy is referred to as an \"equitized\" strategy and the appropriate benchmark will be the overlaid index. The relative performance of the equitized strategy with an S&P500 futures overlay is presented in Table 2. \n\nStatistics            | T-Bill | Neutral | S&P500 | Equitized \nTotal Return(%)       | 27.8   | 131.5   | 102.0  | 264.5 \nAnnual total(Yr%)     | 4.6    | 16.8    | 10.4   | 27.0 \nActive Return(%)      | -      | 103.7   | -      | 162.5 \nAnnual active(Yr%)    | -      | 12.2    | -      | 16.6 \nActive risk(Yr%)      | -      | 4.8     | -      | 4.8 \nMax draw down(%)      | -      | 3.2     | 13.9   | 10.0 \nTurnover(Yr%)         | -      | 198.4   | -      | 198.4 \n\nTable 2: Comparative summary of ex ante portfolio performance (net of transaction costs) 8/90 - 12/95 \n\nFigure 2: Cumulative portfolio value 8/90 - 12/95 (net of estimated transaction costs) \n\nA summary of the accumulated returns over the test period for the market neutral and equitized portfolios, compared to T-bill and S&P500, is given in Figure 2. \n\nFinally, even though the performance of the model is quite good, it is very difficult to convince an investor to put his money in a \"black box\". A rather simple way to overcome this problem of neural networks is to utilize a CART tree (Breiman et al., 1984) to explain the model's structure. While the performance of the tree on the raw data is substantially inferior to the network's, it can serve as a very effective tool for analyzing and interpreting the information that is driving the model. \n\n6 Conclusion \n\nWe presented a methodology by which neural network based models can be used for security selection and portfolio construction. In spite of the very low signal to noise ratio of the raw data, the model was able to extract meaningful relationships between factor exposures and expected returns. When utilized to construct hedged portfolios, these predictions achieved persistent returns with very favorable risk characteristics. \n\nThe model is currently being tested in real time and, given its continued consistent performance, is expected to go live soon. \n\nReferences \n\nAnderson, J. and Rosenfeld, E., editors (1988). Neurocomputing: Foundations of Research. MIT Press, Cambridge. \n\nBreiman, L. (1994). Bagging predictors. Technical Report 416, Department of Statistics, University of California, Berkeley, CA. \n\nBreiman, L., Friedman, J., Olshen, R., and Stone, C. (1984). 
Classification and Regression Trees. Chapman & Hall. \n\nCybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2:303-314. \n\nElton, E. and Gruber, M. (1991). Modern Portfolio Theory and Investment Analysis. John Wiley. \n\nFama, E. and French, K. (1992). The cross section of expected stock returns. Journal of Finance, 47:427-465. \n\nHertz, J., Krogh, A., and Palmer, R. (1991). Introduction to the Theory of Neural Computation, volume 1 of Santa Fe Institute Studies in the Sciences of Complexity. Addison-Wesley Pub. Co. \n\nHornik, K., Stinchcombe, M., and White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2:359-366. \n\nHull, J. (1993). Options, Futures and Other Derivative Securities. Prentice-Hall. \n\nJacobs, B. and Levy, K. (1993). Long/short equity investing. Journal of Portfolio Management, pages 52-63. \n\nLevin, A. U., Leen, T. K., and Moody, J. E. (1994). Fast pruning using principal components. In Cowan, J. D., Tesauro, G., and Alspector, J., editors, Advances in Neural Information Processing Systems, volume 6. Morgan Kaufmann. To appear. \n\nRumelhart, D., Hinton, G., and Williams, R. (1986). Learning representations by back-propagating errors. Nature, 323:533-536. Reprinted in (Anderson and Rosenfeld, 1988). \n\nSharpe, W. (1984). Factor models, CAPMs and the APT. Journal of Portfolio Management, pages 21-25. \n", "award": [], "sourceid": 1114, "authors": [{"given_name": "Asriel", "family_name": "Levin", "institution": null}]}