{"title": "FINANCIAL APPLICATIONS OF LEARNING FROM HINTS", "book": "Advances in Neural Information Processing Systems", "page_first": 411, "page_last": 418, "abstract": null, "full_text": "FINANCIAL APPLICATIONS OF \n\nLEARNING FROM HINTS \n\nYaser s. Abu-Mostafa \n\nCalifornia Institute of Technology \n\nand \n\nNeuroDollars, Inc. \n\ne-mail: yaser@caltech.edu \n\nAbstract \n\nThe basic paradigm for learning in neural networks is 'learning from \nexamples' where a training set of input-output examples is used to \nteach the network the target function. Learning from hints is a gen(cid:173)\neralization of learning from examples where additional information \nabout the target function can be incorporated in the same learning \nprocess. Such information can come from common sense rules or \nspecial expertise. In financial market applications where the train(cid:173)\ning data is very noisy, the use of such hints can have a decisive \nadvantage. We demonstrate the use of hints in foreign-exchange \ntrading of the U.S. Dollar versus the British Pound, the German \nMark, the Japanese Yen, and the Swiss Franc, over a period of 32 \nmonths. We explain the general method of learning from hints and \nhow it can be applied to other markets. The learning model for \nthis method is not restricted to neural networks. \n\n1 \n\nINTRODUCTION \n\nWhen a neural network learns its target function from examples (training data), \nit knows nothing about the function except what it sees in the data. In financial \nmarket applications, it is typical to have limited amount of relevant training data, \nwith high noise levels in the data. The information content of such data is modest, \nand while the learning process can try to make the most of what it has, it cannot \ncreate new information on its own. This poses a fundamental limitation on the \n\n\f412 \n\nYaser S. Abu-Mostafa \n\nlearning approach, not only for neural networks, but for all other models as well. 
It is not uncommon to see simple rules such as the moving average outperforming an elaborate learning-from-examples system.

Learning from hints (Abu-Mostafa, 1990, 1993) is a value-added feature to learning from examples that boosts the information content in the data. The method allows us to use prior knowledge about the target function, coming from common sense or expertise, along with the training data in the same learning process. Different types of hints that may be available in a given application can be used simultaneously. In this paper, we give experimental evidence of the impact of hints on learning performance, and explain the method in some detail to enable readers to try their own hints in different markets.

Even simple hints can result in significant improvement in the learning performance. Figure 1 shows the learning performance for foreign exchange (FX) trading with and without the symmetry hint (see section 3), using only the closing price history. The plots are the Annualized Percentage Returns (cumulative daily, unleveraged, transaction cost included), for a sliding one-year test window in the period from April 1988 to November 1990, averaged over the four major FX markets with more than 150 runs per currency. The error bar in the upper left corner is 3 standard deviations long (based on 253 trading days, assuming independence between different runs). The plots establish a statistically significant differential in performance due to the use of hints. This differential holds for all four currencies.

[Figure 1 plot: annualized percentage return versus test day number, comparing the average with the hint against the average without the hint.]

Figure 1: Learning performance with and without hint

Since the goal of hints is to add information to the training data, the differential in performance is likely to be less dramatic if we start out with more informative training data. Similarly, an additional hint may not have a pronounced effect if we have already used a few hints in the same application. There is a saturation in performance in any market that reflects how well the future can be forecast from the past. (Believers in the Efficient Market Hypothesis consider this saturation to be at zero performance.) Hints will not make us forecast a market better than whatever that saturation level may be. They will, however, enable us to approach that level through learning.

This paper is organized as follows. Section 2 characterizes the notion of very noisy data by defining the '50% performance range'. We argue that the need for extra information in financial market applications is more pronounced than in other pattern recognition applications. In section 3, we discuss our method for learning from hints. We give examples of different types of hints, and explain how to represent hints to the learning process. Section 4 gives result details on the use of the symmetry hint in the four major FX markets. Section 5 provides experimental evidence that it is indeed the information content of the hint, rather than an incidental regularization effect, that results in the performance differential that we observe.

2 FINANCIAL DATA

This section provides a characterization of very noisy data that applies to the financial markets.
For a broad treatment of neural-network applications to the financial markets, the reader is referred to (Abu-Mostafa et al, 1994).

[Figure 2 diagram: the MARKET takes the input x together with other information and produces the target output y; the NEURAL NETWORK takes only x and produces the forecast ŷ.]

Figure 2: Illustration of the nature of noise in financial markets

Consider the market as a system that takes in a lot of information (fundamentals, news events, rumors, who bought what when, etc.) and produces an output y (say up/down price movement for simplicity). A model, e.g., a neural network, attempts to simulate the market (figure 2), but it takes an input x which is only a small subset of the information. The 'other information' cannot be modeled and plays the role of noise as far as x is concerned. The network cannot determine the target output y based on x alone, so it approximates it with its output ŷ. It is typical that this approximation will be correct only slightly more than half the time.

What makes us consider x 'very noisy' is that ŷ and y agree only 1/2 + ε of the time (50% performance range). This is in contrast to the typical pattern recognition application, such as optical character recognition, where ŷ and y agree 1 − ε of the time (100% performance range). It is not the poor performance per se that poses a problem in the 50% range, but rather the additional difficulty of learning in this range. Here is why.

In the 50% range, a performance of 1/2 + ε is good, while a performance of 1/2 − ε is disastrous. During learning, we need to distinguish between good and bad hypotheses based on a limited set of N examples. The problem with the 50% range is that the number of bad hypotheses that look good on N points is huge. This is in contrast to the 100% range, where a good performance is as high as 1 − ε. The number of bad hypotheses that look good here is limited.
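This counting argument can be made concrete with a small simulation (the sample size, thresholds, and trial counts below are our own illustrative choices, not figures from the paper), measuring how often a worthless coin-flip hypothesis looks good on N examples in each range:

```python
import random

random.seed(0)

def looks_good(true_acc, threshold, n_examples):
    """Simulate one hypothesis with true accuracy `true_acc` and check
    whether its observed accuracy on n_examples reaches `threshold`."""
    hits = sum(random.random() < true_acc for _ in range(n_examples))
    return hits / n_examples >= threshold

N = 100        # examples available for judging a hypothesis
trials = 10000 # worthless hypotheses examined

# 50% range: a worthless hypothesis (true accuracy 0.5) against a 'good'
# bar of 0.55 -- many worthless hypotheses clear it by luck.
lucky_50 = sum(looks_good(0.50, 0.55, N) for _ in range(trials)) / trials

# 100% range: the same worthless hypothesis against a 'good' bar of 0.95
# -- essentially none clear it.
lucky_100 = sum(looks_good(0.50, 0.95, N) for _ in range(trials)) / trials

print(lucky_50, lucky_100)
```

With these numbers, a sizable fraction of the coin-flip hypotheses look good in the 50% range, while essentially none do in the 100% range, which is the point of the argument above.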
Therefore, one can have much more confidence in a hypothesis that was learned in the 100% range than in one learned in the 50% range. It is not uncommon to see a random trading policy making good money for a few weeks, but it is very unlikely that a random character recognition system will read a paragraph correctly.

Of course this problem would diminish if we used a very large set of examples, because the law of large numbers would make it less and less likely that ŷ and y can agree 1/2 + ε of the time just by 'coincidence'. However, financial data has the other problem of non-stationarity. Because of the continuous evolution in the markets, old data may represent patterns of behavior that no longer hold. Thus, the relevant data for training purposes is limited to fairly recent times. Put together, noise and non-stationarity mean that the training data will not contain enough information for the network to learn the function. More information is needed, and hints can be the means of providing it.

3 HINTS

In this section, we give examples of different types of hints and discuss how to represent them to the learning process. We describe a simple way to use hints that allows the reader to try the method with minimal effort. For a more detailed treatment, please see (Abu-Mostafa, 1993).

As far as our method is concerned, a hint is any property that the target function is known to have. For instance, consider the symmetry hint in FX markets as it applies to the U.S. Dollar versus the German Mark (figure 3). This simple hint asserts that if a pattern in the price history implies a certain move in the market, then this implication holds whether you are looking at the market from the U.S. Dollar viewpoint or the German Mark viewpoint. Formally, in terms of normalized prices, the hint translates to invariance under inversion of these prices.

Is the symmetry hint valid?
The ultimate test for this is how the learning performance is affected by the introduction of the hint. The formulation of hints is an art. We use our experience, common sense, and analysis of the market to come up with a list of what we believe to be valid properties of this market. We then represent these hints in a canonical form, as we will see shortly, and proceed to incorporate them in the learning process. The improvement in performance will only be as good as the hints we put in.

[Figure 3 diagram: the U.S. Dollar versus the German Mark, with the same market viewed from either side.]

Figure 3: Illustration of the symmetry hint in FX markets

The canonical representation of hints is a more systematic task. The first step in representing a hint is to choose a way of generating 'virtual examples' of the hint. For illustration, suppose that the hint asserts that the target function y is an odd function of the input. An example of this hint would have the form y(−x) = −y(x) for a particular input x. One can generate as many virtual examples as needed by picking different inputs.

After a hint is represented by virtual examples, it is ready to be incorporated in the learning process along with the examples of the target function itself. Notice that an example of the function is learned by minimizing an error measure, say (ŷ(x) − y(x))², as a way of ultimately enforcing the condition ŷ(x) = y(x). In the same way, a virtual example of the oddness hint can be learned by minimizing (ŷ(x) + ŷ(−x))² as a way of ultimately enforcing the condition ŷ(−x) = −ŷ(x). This involves inputting both x and −x to the network and minimizing the difference between the two outputs. It is easy to show that this can be done using backpropagation (Rumelhart et al, 1986) twice.
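As a minimal sketch of this scheme (our own illustrative construction, not code from the paper: the one-hidden-layer network, the learning rate, and the artificial inputs are all assumed), gradient descent on virtual examples of the oddness hint drives the hint error (ŷ(x) + ŷ(−x))² toward zero without ever using a target value:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny net: f(x) = w2 . tanh(w1 * x + b1) + b2, scalar input x.
w1 = rng.normal(size=8)
b1 = rng.normal(size=8)
w2 = rng.normal(size=8)
b2 = float(rng.normal())

def f(x):
    return w2 @ np.tanh(w1 * x + b1) + b2

def param_grads(x):
    # Ordinary backpropagation: gradient of f(x) w.r.t. each parameter.
    h = np.tanh(w1 * x + b1)
    dh = w2 * (1.0 - h * h)          # d f / d (pre-activation)
    return dh * x, dh, h, 1.0        # grads w.r.t. w1, b1, w2, b2

def hint_step(x, lr=0.005):
    # One virtual example of the oddness hint: minimize e^2 with
    # e = f(x) + f(-x).  Backpropagation runs twice, at x and at -x,
    # and no target value y(x) is needed.
    global w1, b1, w2, b2
    ga, gb = param_grads(x), param_grads(-x)
    e = f(x) + f(-x)
    w1 = w1 - lr * 2 * e * (ga[0] + gb[0])
    b1 = b1 - lr * 2 * e * (ga[1] + gb[1])
    w2 = w2 - lr * 2 * e * (ga[2] + gb[2])
    b2 = b2 - lr * 2 * e * (ga[3] + gb[3])

# Artificial inputs serve as virtual examples of the hint.
xs = rng.uniform(-2.0, 2.0, size=200)
hint_error = lambda: float(np.mean([(f(x) + f(-x)) ** 2 for x in xs]))
before = hint_error()
for _ in range(200):
    for x in xs:
        hint_step(x)
after = hint_error()
print(before, after)   # the hint error shrinks as the network becomes odd
```

In an actual application, such hint steps would be interleaved with ordinary learning-from-examples steps on the training data, so that the example error and the hint error are minimized together.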
\nThe generation of a virtual example of the hint does not require knowing the value \nof the target function; neither y(x) nor y( -x) is needed to compute the error for \nthe oddness hint. In fact, x and -x can be artificial inputs. The fact that we do not \nneed the value of the target function is crucial, since it was the limited resource of \nexamples for which we know the value of the target function that got us interested \nin hints in the first place. On the other hand, for some hints, we can take the \nexamples of the target function that we have, and employ the hint to duplicate \nthese examples. For instance, an example y(x) = 1 can be used to infer a second \nexample y( -x) = -1 using the oddness hint. Representing the hint by duplicate \nexamples is an easy way to try simple hints using the same software that we use for \nlearning from examples. \n\n\f416 \n\nYaser S. Abu-Mostafa \n\nLet us illustrate how to represent two common types of hints. Perhaps the most \ncommon type is the invariance hint. This hint asserts that i)(x) = i)(x/) for certain \npairs x, x'. For instance, \"i) is shift-invariant\" is formalized by the pairs x, x' that \nare shifted versions of each other. To represent the invariance hint, an invariant \npair (x, x') is picked as a virtual example. The error associated with this example \nis (y(x) - y(x/\u00bb2. Another related type of hint is the monotonicity hint. The hint \nasserts for certain pairs x, x' that i)(x) :5 i)(x/). For instance, \"i) is monotonically \nnon decreasing in x\" is formalized by the pairs x, x' such that x < x'. One application \nwhere the monotonicity hint occurs is the extension of personal credit. If person A \nis identical to person B except that A makes less money than B, then the approved \ncredit line for A cannot exceed that of B. To represent the monotonicity hint, a \nmonotonic pair (X,X/) is picked as a virtual example. 
The error associated with this example is given by (ŷ(x) − ŷ(x′))² if ŷ(x) > ŷ(x′), and zero if ŷ(x) ≤ ŷ(x′).

4 FX TRADING

We applied the symmetry hint in the four FX markets of the U.S. Dollar versus the British Pound, the German Mark, the Japanese Yen, and the Swiss Franc. In each case, only the closing prices for the preceding 21 days were used for inputs. The objective (fitness) function we chose was the total return on the training set, and we used simple filtering methods on the inputs and outputs of the networks. In each run, the training set consisted of 500 days, and the test was done on the following 253 days.

All four currencies show an improved performance when the symmetry hint is used. Roughly speaking, we are in the market half the time, each trade takes 4 days, the hit rate is close to 50%, and the A.P.R. without the hint is 5% and with the hint is 10% (the returns are annualized, unleveraged, and include the transaction cost; spread and average slippage). Notice that having the return as the objective function resulted in a fairly good return with a modest hit rate.

5 CROSS CHECKS

In this final section, we report more experimental results aimed at validating our claim that the information content of the hint is the reason behind the improved performance. Why is this debatable? A hint plays an incidental role as a constraint on the neural network during learning, since it restricts the solutions the network may settle in. Because overfitting is a common problem in learning from examples, any restriction may improve the out-of-sample performance by reducing overfitting (Akaike, 1969; Moody, 1992). This is the idea behind regularization. To isolate the informative role from the regularizing role of the symmetry hint, we ran two experiments.
In the first experiment, we used an uninformative hint, or 'noise' hint, which provides a random target output for the same inputs used in the examples of the symmetry hint. Figure 4 contrasts the performance of the noise hint with that of the real symmetry hint, averaged over the four currencies. Notice that the performance with the noise hint is close to that without any hint (figure 1), which is consistent with the notion of an uninformative hint. The regularization effect seems to be negligible.

[Figure 4 plot: annualized percentage return versus test day number for the real hint and the noise hint.]

Figure 4: Performance of the real hint versus a noise hint

[Figure 5 plot: annualized percentage return versus test day number for the real hint and the false hint.]

Figure 5: Performance of the real hint versus a false hint

In the second experiment, we used a harmful hint, or 'false' hint, in place of the symmetry hint. The hint takes the same examples used in the symmetry hint and asserts antisymmetry instead. Figure 5 contrasts the performance of the false hint with that of the real symmetry hint.
As we see, the false hint had a detrimental effect on the performance. This is consistent with the hypothesis that the symmetry hint is valid, since its negation results in worse performance than no hint at all. Notice that the transaction cost is taken into consideration in all of these plots, which works as a negative bias and amplifies the losses of bad trading policies.

CONCLUSION

We have explained learning from hints, a systematic method for combining rules and data in the same learning process, and reported experimental results of a statistically significant improvement in performance in the four major FX markets that resulted from using a simple symmetry hint. We have described different types of hints and simple ways of using them in learning, to enable readers to try their own hints in different markets.

Acknowledgements

I would like to acknowledge Dr. Amir Atiya for his valuable input. I am grateful to Dr. Ayman Abu-Mostafa for his expert remarks.

References

Abu-Mostafa, Y. S. (1990), Learning from hints in neural networks, Journal of Complexity 6, pp. 192-198.

Abu-Mostafa, Y. S. (1993), A method for learning from hints, in Advances in Neural Information Processing Systems 5, S. Hanson et al (eds), pp. 73-80, Morgan Kaufmann.

Abu-Mostafa, Y. S. et al (eds) (1994), Proceedings of Neural Networks in the Capital Markets, Pasadena, California, November 1994.

Akaike, H. (1969), Fitting autoregressive models for prediction, Ann. Inst. Stat. Math. 21, pp. 243-247.

Moody, J. (1992), The effective number of parameters: An analysis of generalization and regularization in nonlinear learning systems, in Advances in Neural Information Processing Systems 4, J. Moody et al (eds), pp. 847-854, Morgan Kaufmann.

Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986), Learning internal representations by error propagation, in Parallel Distributed Processing 1, D. Rumelhart et al (eds), pp. 318-362, MIT Press.

Weigend, A., Rumelhart, D., and Huberman, B. (1991), Generalization by weight elimination with application to forecasting, in Advances in Neural Information Processing Systems 3, R. Lippmann et al (eds), pp. 875-882, Morgan Kaufmann.", "award": [], "sourceid": 930, "authors": [{"given_name": "Yaser", "family_name": "Abu-Mostafa", "institution": null}]}