{"title": "Catching heuristics are optimal control policies", "book": "Advances in Neural Information Processing Systems", "page_first": 1426, "page_last": 1434, "abstract": "Two seemingly contradictory theories attempt to explain how humans move to intercept an airborne ball. One theory posits that humans predict the ball trajectory to optimally plan future actions; the other claims that, instead of performing such complicated computations, humans employ heuristics to reactively choose appropriate actions based on immediate visual feedback. In this paper, we show that interception strategies appearing to be heuristics can be understood as computational solutions to the optimal control problem faced by a ball-catching agent acting under uncertainty. Modeling catching as a continuous partially observable Markov decision process and employing stochastic optimal control theory, we discover that the four main heuristics described in the literature are optimal solutions if the catcher has sufficient time to continuously visually track the ball. Specifically, by varying model parameters such as noise, time to ground contact, and perceptual latency, we show that different strategies arise under different circumstances. The catcher's policy switches between generating reactive and predictive behavior based on the ratio of system to observation noise and the ratio between reaction time and task duration. Thus, we provide a rational account of human ball-catching behavior and a unifying explanation for seemingly contradictory theories of target interception on the basis of stochastic optimal control.", "full_text": "Catching heuristics are optimal control policies\n\nBoris Belousov*, Gerhard Neumann*, Constantin A. Rothkopf**, Jan Peters*\n\n*Department of Computer Science, TU Darmstadt\n\n**Cognitive Science Center & Department of Psychology, TU Darmstadt\n\nAbstract\n\nTwo seemingly contradictory theories attempt to explain how humans move to\nintercept an airborne ball. 
One theory posits that humans predict the ball trajectory to optimally plan future actions; the other claims that, instead of performing such complicated computations, humans employ heuristics to reactively choose appropriate actions based on immediate visual feedback. In this paper, we show that interception strategies appearing to be heuristics can be understood as computational solutions to the optimal control problem faced by a ball-catching agent acting under uncertainty. Modeling catching as a continuous partially observable Markov decision process and employing stochastic optimal control theory, we discover that the four main heuristics described in the literature are optimal solutions if the catcher has sufficient time to continuously visually track the ball. Specifically, by varying model parameters such as noise, time to ground contact, and perceptual latency, we show that different strategies arise under different circumstances. The catcher's policy switches between generating reactive and predictive behavior based on the ratio of system to observation noise and the ratio between reaction time and task duration. Thus, we provide a rational account of human ball-catching behavior and a unifying explanation for seemingly contradictory theories of target interception on the basis of stochastic optimal control.

1 Introduction

Humans exhibit impressive abilities of intercepting moving targets as exemplified in sports such as baseball [6]. Despite the ubiquity of this visuomotor capability, explaining how humans manage to catch flying objects is a long-standing problem in cognitive science and human motor control. What makes this problem computationally difficult for humans are the involved perceptual uncertainties, high sensory noise, and long action delays compared to artificial control systems and robots. 
Thus, understanding action generation in human ball interception from a computational point of view may yield important insights on human visuomotor control. Surprisingly, there is no generally accepted model that explains empirical observations of human interception of airborne balls. McIntyre et al. [15] and Hayhoe et al. [13] claim that humans employ an internal model of the physical world to predict where the ball will hit the ground and how to catch it. Such internal models allow for planning and potentially optimal action generation, e.g., they enable optimal catching strategies where humans predict the interception point and move there as fast as mechanically possible to await the ball. Clearly, there exist situations where latencies of the catching task require such strategies (e.g., when a catcher moves the arm to receive the pitcher's ball). By contrast, Gigerenzer & Brighton [11] argue that the world is far too complex for sufficiently precise modeling (e.g., a catcher or an outfielder in baseball would have to take air resistance, wind, and spin of the ball into account to predict its trajectory). Thus, humans supposedly extract few simple but robust features that suffice for successful execution of tasks such as catching. Here, immediate feedback is employed to guide action generation instead of detailed modeling. Policies based on these features are called heuristics, and the claim is that humans possess a bag of such tricks, the "adaptive toolbox". For a baseball outfielder, a successful heuristic could be "Fix your gaze on the ball, start running, and adjust your running speed so that the angle of gaze remains constant" [10]. 
Thus, at the core, finding a unifying computational account of the human interception of moving targets also contributes to the long-lasting debate about the nature of human rationality [20].

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

In this paper, we propose that these seemingly contradictory views can be unified using a single computational model based on a continuous partially observable Markov decision process (POMDP). In this model, the intercepting agent is assumed to choose optimal actions that take uncertainty about future movement into account. This model prescribes that both the catcher and the outfielder act optimally for their respective situation and uncertainty. We show that an outfielder agent using a highly stochastic internal model for prediction will indeed resort to purely reactive policies resembling established heuristics from the literature. The intuitive reason for such short-sighted behavior being optimal is that ball predictions over sufficiently long time horizons with highly stochastic models effectively become guessing. Similarly, our model will yield optimally planned actions based on predictions if the uncertainty encountered by the catcher agent is low while the latency is non-negligible in comparison to the movement duration. Moreover, we identify catching scenarios where the only strategy to intercept the ball requires turning away from it and running as fast as possible. While such strategies cannot be explained by the heuristics proposed so far, the optimal control approach yields a plausible policy exhibiting both reactive and feedforward behavior. While other motor tasks (e.g., reaching movements [9, 22], locomotion [1]) have been explained in terms of stochastic optimal control theory, to the best of our knowledge this paper is the first to explain ball catching within this computational framework. 
We show that the four previously described empirical heuristics are actually optimal control policies. Moreover, our approach allows predictions for settings that cannot be explained by heuristics and have not been studied before. As catching behavior has previously been described as a prime example of humans not following complex computations but using simple heuristics, this study opens an important perspective on the fundamental question of human rationality.

2 Related work

A number of heuristics have been proposed to explain how humans catch balls, see [27, 8, 16] for an overview. We focus on three theories well-supported by experiments: Chapman's theory, the generalized optic acceleration cancellation (GOAC) theory, and the linear optical trajectory (LOT) theory.

Chapman [6] considered a simple kinematic problem (see Figure 1) where the ball B follows a parabolic trajectory B_{0:N} while the agent C follows C_{0:N} to intercept it. Only the position of the agent is relevant; his gaze is always directed towards the ball. Angle α is the elevation angle; angle γ is the bearing angle with respect to direction C0B0 (or C2G2, which is parallel). Due to delayed reaction, the agent starts running when the ball is already in the air. Chapman proposed two heuristics: the optic acceleration cancellation (OAC), which prescribes maintaining d tan α/dt = const, and the constant bearing angle (CBA), which requires γ = const. However, Chapman did not explain how these heuristics cope with disturbances and observations. To incorporate visual observations, McLeod et al. [16] introduced the field of view of the agent into Chapman's theory and coupled the agent's running velocity to the location of the ball in the visual field. Instead of the CBA heuristic, a tracking heuristic is employed to form the generalized optic acceleration cancellation (GOAC) theory. 
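The angular quantities these heuristics control can be computed directly from catcher and ball positions. The following sketch illustrates the elevation angle α and the bearing angle γ; the helper functions are hypothetical and not part of the paper's implementation:

```python
import math

def elevation_tangent(catcher_xy, ball_xyz):
    """tan(alpha): ball height over the horizontal catcher-to-ball distance.
    OAC predicts this quantity grows linearly in time for a successful catch."""
    dx = ball_xyz[0] - catcher_xy[0]
    dy = ball_xyz[1] - catcher_xy[1]
    return ball_xyz[2] / math.hypot(dx, dy)

def bearing_angle(catcher_xy, ball_xy, reference_dir):
    """gamma: angle (radians) between the catcher-to-ball direction and a fixed
    reference direction (C0 -> B0 at launch). CBA predicts it stays constant."""
    vx, vy = ball_xy[0] - catcher_xy[0], ball_xy[1] - catcher_xy[1]
    rx, ry = reference_dir
    dot = (vx * rx + vy * ry) / (math.hypot(vx, vy) * math.hypot(rx, ry))
    return math.acos(max(-1.0, min(1.0, dot)))
```

Evaluating these two quantities along a trajectory is all that is needed to check OAC and CBA on simulated or recorded catches.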
This tracking heuristic allows reactions to uncertain observations. In our example in Figure 1, the agent might have moved from C0 to C2 while maintaining a constant γ. To keep fulfilling this heuristic, the ball needs to arrive at B2 at the same time. However, if the ball is already at B′2, the agent will see it falling into the right side of his field of view and he will speed up. Thus, the agent internally tracks the angle δ between CD and C0B0 and attempts to adjust δ to γ.

Figure 1: Well-known heuristics.

In Chapman's theory and the GOAC theory, the elevation angle α and the bearing angle γ are controlled independently. Since such separate control strategies are implausible, McBeath et al. [14] proposed the linear optical trajectory (LOT) heuristic that controls both angles jointly. LOT suggests that the catching agent runs such that the projection of the ball trajectory onto the plane perpendicular to the direction CD remains linear, which implies that ζ = ∠E2B0F2 remains constant. As tan ζ = tan α2 / tan β2 can be observed from the pyramid B0F2C2E2 with the right angles at F2, there exists a coupling between the elevation angle α and the horizontal optical angle β (defined as the angle between CB0 and CD), which can be used for directing the agent.

In contrast to the literature on the outfielder's catching in baseball, other strands of research in human motor control have focused on predictive models [17] and optimality of behavior [9, 22]. Tasks similar to the catcher's in baseball have yielded evidence for prediction. 
Humans were shown to anticipate where a tennis ball will hit the floor when thrown with a bounce [13], and humans also appear to use an internal model of gravity to estimate time-to-contact when catching balls [15]. Optimal control theory has been used to explain reaching movements (with cost functions such as minimum-jerk [9], minimum-torque-change [23] and minimum end-point variance [12]), motor coordination [22], and locomotion (as minimizing metabolic energy [1]).

3 Modeling ball catching under uncertainty as an optimal control problem

To parsimoniously model the catching agent, we rely on an optimal control formulation (Sec. 3.1) where the agent is described in terms of state transitions, observations and a cost function (Sec. 3.2).

3.1 Optimal control under uncertainty

In optimal control, the interaction of the agent with the environment is described by a stochastic dynamic model or system (e.g., describing ball flight and odometry). The system's state

    x_{k+1} = f(x_k, u_k) + ε_{k+1},    k = 0 … N − 1,    (1)

at the next time step k + 1 is given as a noisy function of the state x_k ∈ R^n and the action u_k ∈ R^m at the current time step k. The mean state dynamics f are perturbed by zero-mean stationary white Gaussian noise ε_k ∼ N(0, Q) with a constant system noise covariance matrix Q modeling the uncertainty in the system (e.g., the uncertainty in the agent's and ball's positions).

The state of the system is not always fully observed (e.g., the catching agent can only observe a ball when he looks at it), lower-dimensional than the system's state (e.g., only ball positions can directly be observed), and the observations are generally noisy (e.g., visuomotor noise affects ball position estimates). Thus, at every time step k, sensory input provides a noisy lower-dimensional measurement z_k ∈ R^p of the true underlying system state x_k ∈ R^n with p < n described by

    z_k = h(x_k) + δ_k,    k = 1 … N,    (2)

where h is a deterministic observation function and δ_k ∼ N(0, R_k) is zero-mean non-stationary white Gaussian noise with a state-dependent covariance matrix R_k = R(x_k). For catching, such state-dependency is crucial to modeling the effect of the human visual field. When the ball is at its center, measurements are least uncertain; whereas when the ball is outside the visual field, observations are maximally uncertain.

The agent obviously can only generate actions based on the observations collected so far, while affecting his and the environment's true next state. The history of observations allows forming probability distributions over the state at different time steps, called beliefs. Taking the uncertainty in (1) and (2) into account, the agent needs to plan and control in the belief space (i.e., the space of probability distributions over states) rather than in the state space. We approximate the belief b_k about the state of the system at time k by a Gaussian distribution with mean μ_k and variance Σ_k. For brevity, we write b_k = (μ_k, Σ_k), associating the belief with its sufficient statistics. The belief dynamics (b_{k−1}, u_{k−1}, z_k) → b_k are approximated by the extended Kalman filter [21, Chapter 3.3].

A cost function J can be a parsimonious description of the agent's objective. The agent will choose the next action by optimizing such a cost function with respect to all future actions at every time step. To make the resulting optimal control computations numerically tractable, future observations need to be assumed to coincide with their most likely values (see e.g., [19, 5]). 
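The belief update (b_{k−1}, u_{k−1}, z_k) → b_k can be made concrete with a scalar extended Kalman filter; this is an illustrative one-dimensional sketch, not the paper's multivariate implementation, and all function names are assumptions:

```python
def ekf_step(mu, var, u, z, f, h, f_x, h_x, q, r_of_x):
    """One belief update (mu, var) -> (mu', var') via a scalar EKF:
    linearize f and h around the current estimate. r_of_x models the
    state-dependent observation noise R(x) of Eq. (2)."""
    # Predict: push the mean through the dynamics, inflate the variance
    mu_pred = f(mu, u)
    var_pred = f_x(mu, u) ** 2 * var + q
    # Update: fuse measurement z; the gain trades prediction vs. observation
    H = h_x(mu_pred)
    r = r_of_x(mu_pred)
    K = var_pred * H / (H ** 2 * var_pred + r)
    mu_new = mu_pred + K * (z - h(mu_pred))
    var_new = (1.0 - K * H) * var_pred
    return mu_new, var_new
```

With a fully observed linear system and equal prior and observation variances, one update halves the uncertainty, matching the standard Kalman result.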
Thus, at every time step, the agent solves a constrained nonlinear optimization problem

    min_{u_{0:N−1}}  J(μ_{0:N}, Σ_{0:N}; u_{0:N−1})
    s.t.  u_k ∈ U_feasible,  k = 0 … N − 1,    (3)
          μ_k ∈ X_feasible,  k = 0 … N,

which returns an optimal sequence of controls u_{0:N−1} minimizing the objective function J. The agent executes the first action, obtains a new observation, and replans again; such an approach is known as model predictive control. The policy resulting from such computations is sub-optimal because of open-loop planning and the limited time horizon, but with growing time horizon it approaches the optimal policy. Reaction time τ_r can be incorporated by delaying the observations. An interesting property of this model is that the catching agent decides on his own in an optimal way when to gather information by looking at the ball and when to exploit already acquired knowledge, depending on the level of uncertainty he agrees to tolerate.

3.2 A computational model of the catching agent for belief-space optimal control

Here we explain the modeling assumptions concerning states, actions, state transitions, and observations. After that we describe the cost function that the agent has to minimize.

States and actions. The state of the system x consists of the location and velocity of the ball in 3D space, the location and velocity of the catching agent in the ground plane, and the agent's gaze direction represented by a unit 3D vector. The agent's actions u consist of the force applied to the center of mass and the rate of change of the gaze direction.

State transitions and observations. Several model components are essential to faithfully describe catching behavior. 
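The receding-horizon scheme of Sec. 3.1 (plan open-loop, execute the first action, observe, replan) can be sketched as follows; the exhaustive search over a tiny discrete control set is a stand-in for the paper's nonlinear trajectory optimizer, and all names are illustrative:

```python
from itertools import product

def plan(x0, horizon, controls, step, cost):
    """Open-loop plan: search control sequences over a short horizon and
    return the cheapest one (toy substitute for solving Problem (3))."""
    best_seq, best_J = None, float("inf")
    for seq in product(controls, repeat=horizon):
        x, J = x0, 0.0
        for u in seq:
            x = step(x, u)
            J += cost(x, u)
        if J < best_J:
            best_J, best_seq = J, seq
    return best_seq

def mpc(x0, n_steps, horizon, controls, step, cost):
    """Model predictive control: execute only the first planned action,
    observe the new state, and replan from there."""
    x, trajectory = x0, [x0]
    for _ in range(n_steps):
        u = plan(x, horizon, controls, step, cost)[0]
        x = step(x, u)
        trajectory.append(x)
    return trajectory
```

For a 1D point with dynamics x ← x + u and cost x² + 0.1|u|, the closed loop drives the state to the origin and stays there.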
First, the state transfer is described by the damped dynamics of the agent's center of mass r̈_c = F − λṙ_c, where r_c = [x, y] are the agent's Cartesian coordinates, F is the applied force resulting from the agent's actions, and λ is the damping coefficient. Damping ensures that the catching agent's velocity does not grow without bound when the maximum force is applied. The magnitude of the maximal force and the friction coefficient are chosen to fit Usain Bolt's sprint data¹. Second, the gaze vector's direction d is controlled through the first derivatives of the two angles that define it. These are the angle between d and its projection onto the xy-plane and the angle between d's projection onto the xy-plane and the x-axis. Such a parametrization of the actions allows for realistically fast changes of gaze direction. Third, the maximal running speed depends on the gaze direction, e.g., running backwards is slower than running forward or even sideways. This relationship can be incorporated through dependence of the maximal applicable force F_max on the direction d. It can be expressed by limiting the magnitude of the maximal applicable force |F_max(θ)| = F1 + F2 cos θ, where θ is the angle between F (i.e., the direction into which the catcher accelerates) and the projection of the catcher's gaze direction d onto the xy-plane. The parameters F1 and F2 are chosen to fit human data on forward and backwards running². The resulting continuous-time dynamics of agent and ball are converted into discrete-time state transfers using the classical Runge-Kutta method. Fourth, the observation uncertainty depends on the state, which reflects the fact that humans' visual resolution falls off across the visual field with increasing distance from the fovea. 
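The damped point-mass dynamics, the gaze-dependent force cap, and the Runge-Kutta discretization can be sketched in one dimension; the constants below are placeholders, not the values the paper fits to sprint and backwards-running records:

```python
import math

# Illustrative constants (the paper fits F1, F2, and the damping to human data)
F1, F2, LAM = 6.0, 4.0, 0.6

def max_force(theta):
    """|F_max(theta)| = F1 + F2*cos(theta): full force when accelerating along
    the gaze (theta = 0), reduced force when running backwards (theta = pi)."""
    return F1 + F2 * math.cos(theta)

def deriv(r, v, force):
    """Damped point-mass dynamics r'' = F - lambda * r' in one dimension."""
    return v, force - LAM * v

def rk4_step(r, v, force, dt):
    """Classical (fourth-order) Runge-Kutta discretization of the dynamics."""
    k1r, k1v = deriv(r, v, force)
    k2r, k2v = deriv(r + 0.5 * dt * k1r, v + 0.5 * dt * k1v, force)
    k3r, k3v = deriv(r + 0.5 * dt * k2r, v + 0.5 * dt * k2v, force)
    k4r, k4v = deriv(r + dt * k3r, v + dt * k3v, force)
    r += dt / 6.0 * (k1r + 2 * k2r + 2 * k3r + k4r)
    v += dt / 6.0 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return r, v
```

The damping term gives the bounded top speed the text describes: under a constant force F the velocity converges to F/λ instead of growing without bound.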
When the ball falls to the side of the agent's field of view, the uncertainty about the ball's position grows according to σ_o² = s (σ_max² (1 − cos Ω) + σ_min²), depending on the distance to the ball s and the angle Ω between the gaze direction d and the vector pointing from the agent towards the ball. The parameters {σ_min, σ_max} control the scale of the noise. The ball is modeled as a parabolic flight perturbed by Gaussian noise with variance σ_b².

Cost function. The catching agent has to trade off success (i.e., catching the ball) with effort. In other words, he aims at maximizing the probability of catching the ball with minimal effort. A ball is assumed to be caught if it is within reach, i.e., not further away from the catching agent than ε_threshold at the final time. Thus, the probability of catching the ball can be expressed as Pr(|μ_b − μ_c| ≤ ε_threshold), where μ_b and μ_c are the predicted positions of the ball and the agent at the final time (i.e., parts of the belief state of the agent). Since such beliefs are modeled as Gaussians, this probability has a unique global maximum at μ_b = μ_c and Σ_N → 0+. Therefore, a final cost J_final = w0 ‖μ_b − μ_c‖² + w1 tr Σ_N can approximate the negated log-probability of successfully catching the ball while rendering the optimal control problem solvable. The weights w0 and w1 are set to optimally approximate this negated log-probability. The desire of the agent to be energy efficient is encoded as a penalty on the control signals J_energy = τ Σ_{k=0}^{N−1} u_k^T M u_k with the fixed duration τ of the discretized time steps and a diagonal weight matrix M to trade off controls. Finally, we add a term that penalizes the agent's uncertainty at every time step, J_running = τ w2 Σ_{k=0}^{N−1} tr Σ_k, which encodes the agent's preference of certainty over uncertainty. It appears naturally in optimal control problems when the maximum likelihood observations assumption is relaxed [24] and captures how final uncertainty distributes over the preceding time steps, but has to be added explicitly within the model predictive control framework in order to account for replanning at every time step. The complete cost function is thus given by the sum

    J = J_final + J_running + J_energy
      = w0 ‖μ_b − μ_c‖² + w1 tr Σ_N        (final position and final uncertainty)
      + τ w2 Σ_{k=0}^{N−1} tr Σ_k          (running uncertainty)
      + τ Σ_{k=0}^{N−1} u_k^T M u_k        (total energy)

that the catching agent has to minimize in order to successfully intercept the ball.

¹ Usain Bolt's world record sprint data http://datagenetics.com/blog/july32013/index.html
² World records for backwards running http://www.recordholders.org/en/list/backwards-running.html

3.3 Implementation details

To solve Problem (3), we use the covariance-free multiple shooting method [18] for trajectory optimization [7, 3] in the belief space. Derivatives of the cost function are computed using CasADi [2]. Non-linear optimization is carried out by Ipopt [26] with L-BFGS and warm-starts.

4 Simulated experiments and results

In this section, we present the results of two simulated scenarios and a comparative evaluation. 
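The cost terms defined in Sec. 3.2 can be assembled as in the following sketch; the inputs are summarized quantities (tr Σ_k and u_k^T M u_k per step), and all weights are placeholders:

```python
def catching_cost(mu_b, mu_c, trace_sigmas, effort_terms, tau, w0, w1, w2):
    """Total cost J = J_final + J_running + J_energy.
    trace_sigmas[k] = tr(Sigma_k) for k = 0..N;
    effort_terms[k] = u_k^T M u_k for k = 0..N-1."""
    dist_sq = sum((b - c) ** 2 for b, c in zip(mu_b, mu_c))
    J_final = w0 * dist_sq + w1 * trace_sigmas[-1]   # final position + uncertainty
    J_running = tau * w2 * sum(trace_sigmas[:-1])    # accumulated uncertainty
    J_energy = tau * sum(effort_terms)               # total control effort
    return J_final + J_running + J_energy
```

The three summands directly mirror the terms of the displayed sum for J, so each behavioral trade-off (accuracy, certainty, effort) maps to one line.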
First, using the optimal control approach, we show that continuous tracking (where the ball always remains in the field of view of the outfielder) naturally leads to the heuristics from the literature [6, 16, 14] if the catching agent is sufficiently fast in comparison to the ball, independent of whether he is running forward, backwards, or sideways. Subsequently, we show that more complex behavior arises when the ball is too fast to be caught while running only sideways or backwards (e.g., as in soccer or long passes in American football). Here, tracking is interrupted as the agent needs to turn away from the ball to run forward. While the heuristics break, our optimal control formulation exhibits plausible strategies similar to those employed by human catchers. Finally, we systematically study the effects of noise and time delay on the agent's policy. The optimal control policies arising from our model switch between reactive and predictive behaviors depending on uncertainty and latency.

4.1 Continuous tracking of an outfielder: heuristics hold

To directly compare our model against empirical catching data that has been described as resulting from a heuristic, we reproduce the settings from [16] where a ball flew 15 m in 3 s and a human subject starting about 6 m away from the impact point had to intercept it. The optimal control policy can deal with such situations and yields the behavior observed by McLeod et al. [16]. In fact, even when doubling all distances, the reactive control policy exhibits all four major heuristics (OAC, GOAC, CBA and LOT) with approximately the same precision as in the original human experiments.

Figure 2: A typical simulated trajectory of a successful catch in the continuous tracking scenario as encountered by the outfielder. The uncertainty in the belief state is kept low by the agent by fixating the ball. Such empirically observed scenarios [6, 16, 14] have led to the proposition of the heuristics, which arise naturally from our optimal control formulation.

Figure 2 shows a typical simulated catch viewed from above. The ball and the agent's true trajectories are depicted in green (note that the ball is frequently hidden behind the belief state trajectory). The agent's observations and the mean belief trajectory of the ball are represented by magenta crosses and a magenta line, respectively. The belief uncertainty is indicated by the cyan ellipsoids that capture 95% of the probability mass. The gaze vectors of the agent are shown as red arrows. The catching agent starts sufficiently close to the interception point to continuously visually track the ball, therefore he is able to efficiently reduce his uncertainty about the ball's position and successfully intercept it while keeping it in sight. Note that the agent does not follow a straight trajectory but a curved one, in agreement with human experiments [16].

Figure 3 shows plots of the relevant angles over time to compare the behavior exhibited by human catchers to the optimal catching policy. The tangent of the elevation angle tan α grows linearly with time, as predicted by the optic acceleration cancellation heuristic (OAC). The bearing angle γ remains constant (within a 5 deg margin) as predicted by the constant bearing angle heuristic (CBA). The rotation angle δ oscillates around γ as predicted by the generalized optic acceleration cancellation theory (GOAC). The tangent of the horizontal optical angle tan β is proportional to tan α, as predicted by the linear optical trajectory theory (LOT). The small oscillations in the rotation angle and in the horizontal optical angle are due to reaction delay and uncertainty; they are also predicted by GOAC and LOT. 
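Agreement with a heuristic such as OAC can be quantified by fitting a line to tan α over time and inspecting the residual; the following helper is a hypothetical illustration of such a check, not the paper's evaluation code:

```python
def linear_fit_rmse(ts, ys):
    """Fit ys ~ a*ts + b by least squares and return the RMS residual.
    A small residual supports the OAC prediction that tan(alpha) is
    (approximately) linear in time during a successful catch."""
    n = len(ts)
    mt = sum(ts) / n
    my = sum(ys) / n
    sxx = sum((t - mt) ** 2 for t in ts)
    sxy = sum((t - mt) * (y - my) for t, y in zip(ts, ys))
    a = sxy / sxx
    b = my - a * mt
    return (sum((a * t + b - y) ** 2 for t, y in zip(ts, ys)) / n) ** 0.5
```

The same fit-and-residual pattern applies to CBA (fit a constant to γ) and LOT (fit a line of tan β against tan α).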
Thus, in this well-studied case, the model produces an optimal policy that exhibits behavior which is fully in accordance with the heuristics.

Figure 3: During simulations of successful catches for the continuous tracking scenario encountered by the outfielder (shown in Figure 2), the policies resulting from our optimal control formulation always fulfill the heuristics (OAC, GOAC, CBA, and LOT) from the literature with approximately the same precision as in the original human experiments.

4.2 Interrupted tracking during long passes: heuristics break but prediction is required

The competing theory to the heuristics claims that a predictive internal model allows humans to intercept the ball [15, 13]. Brancazio [4] points out that "the best outfielders can even turn their backs to the ball, run to the landing point, and then turn and wait for the ball to arrive". Similar behavior is observed in soccer and American football during long passes. To see whether predictions become necessary, we reproduced situations where the agent cannot catch the ball when acting purely reactively. For example, if the running time to the interception point when running backwards (i.e., the distance to the interception point divided by the maximal backwards running velocity) is substantially higher than the flight time of the ball, no backwards running strategy will be successful. Thus, by varying the initial conditions for the catching agent and the ball, new scenarios can be generated using our optimal control model. The agent's control policy can be tested for reliance on predictions as it is available in the form of a computational model, i.e., if the computed policy makes use of the belief states at future time steps, the agent clearly employs an internal model to predict the interception point. By choosing appropriate initial conditions for the ball and the agent, we can pursue such scenarios. For example, if the ball flies over the agent's head, he has to turn away from it for a moment in order to gain speed by running forward, instead of running backwards or sideways and looking at the ball all the time. Figure 4 shows such an interception plan where the agent decides to initially speed up and, when sufficiently close, turn around and track the ball while running sideways.

Figure 4: An interception plan that leads to a successful catch despite violating heuristics. Here, the agent would not be able to reach the interception point in time while running backwards and, thus, has to turn forward to run faster. The resulting optimal control policy relies on beliefs about the future generated by an internal model.

Figure 5: For initial conditions (positions of the ball and the agent) which do not allow the agent to reach the interception point by running backwards or sideways, the optimal policy will include running forward with maximal velocity (as shown in Figure 4). In this case, the agent cannot continuously visually track the ball and, expectedly, the heuristics do not hold.
Notice that the future belief uncertainty (i.e., the posterior uncertainty \u03a3\nreturned by the extended Kalman \ufb01lter), represented by red ellipses, grows when the catcher is not\nlooking at the ball and shrinks otherwise. The prior uncertainty (obtained by integrating out future\nobservations), shown in yellow, on the other hand, grows towards the end of the trajectory because\nfuture observations are not available at planning time. Similar to [5, 25], we can show for our model\npredictive control law that the sum of prior and posterior uncertainties (shown as green circles)\nequals the total system uncertainty obtained by propagating the belief state into the future without\nincorporating future observations. Figure 5 shows that the heuristics fail to explain this catch\u2014even\nin the \ufb01nal time steps where the catching agent is tracking the ball to intercept it. OAC deviates from\nlinearity, CBA is not constant, the tracking heuristic wildly deviates from the prediction, and LOT\nis highly non-linear. GOAC and LOT are affected more dramatically because they directly depend\non the catcher\u2019s gaze, in contrast to OAC and CBA. Since the heuristics were not meant to describe\nsuch situations, they predictably do not hold. Only an internal model can explain the reliance of the\noptimal policy on the future belief states.\n4.3 Switching behaviors when uncertainty and reaction time are varied\nThe previous experiment has pointed us towards policies that switch between predictive subpolicies\nbased on internal models and reactive policies based on current observations. To systematically\nstudy what behaviors arise, we use the scenario from Section 4.2 and vary two essential model\nparameters: system to observation noise ratio \u03b71 = log \u03c32\no and reaction time to task duration\nratio \u03b72 = \u03c4r/T , where T is the duration of the ball \ufb02ight. 
The system to observation noise ratio effectively determines whether predictions based on the internal model of the dynamics are sufficiently trustworthy for (partially) open-loop behavior or whether reactive control based on the observations of the current state of the system should be preferred. The reaction time to task duration ratio sets the time scale of the problem. For example, an outfielder in baseball may have about 3 s to catch a ball, so his reaction delay of about 200 ms is negligible, whereas a catcher in baseball often has to act within a fraction of a second, and, thus, the reaction latency becomes crucial.

[Figure 5 panels: tangent of the elevation angle (OAC), rotation and bearing angles (tracking heuristic), bearing angle (CBA), and optical angles (LOT) for this catch.]

We run the experiment at different noise levels and time delays and average the results over 10 trials. In all cases, the agent starts at the point (20, 5) looking towards the origin, while the ball flies from the origin towards the point (30, 15) in 3 s. All parameters are kept fixed apart from the reaction time and system noise; in particular, task duration and observation noise are kept fixed. Figure 6 shows how the agent's policy depends on the parameters. Boundaries correspond to contour lines of the function counting the number of times the agent turns towards the ball.
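Such a turn count can be computed from the gaze trajectory alone, e.g., as the number of times the gaze-to-ball angle drops below an alignment threshold. A simplified stand-in (the threshold and the synthetic angle sequences are made up for illustration):

```python
def count_turns_towards_ball(gaze_ball_angle, threshold=10.0):
    """Count how often the absolute angle between gaze and ball
    direction (degrees) drops below the threshold, i.e., how often
    the agent turns towards the ball."""
    turns, facing = 0, False
    for angle in gaze_ball_angle:
        now_facing = abs(angle) < threshold
        if now_facing and not facing:
            turns += 1
        facing = now_facing
    return turns

# Agent starts facing away, turns to the ball once, then keeps tracking.
print(count_turns_towards_ball([150, 120, 80, 40, 5, 3, 2, 4, 3]))  # -> 1

# A switching policy turns away to sprint and then back again.
print(count_turns_towards_ball([150, 5, 3, 90, 120, 6, 2]))  # -> 2
```

In a purely reactive regime the count is one (turn once, then track); switching regimes produce higher counts.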
We count turns by analyzing trajectories for gaze direction changes and reduction of uncertainty (e.g., in Figure 4 the agent turns once towards the ball).

Figure 6: Switches between reactive and feedforward policies are determined by uncertainties and latency.

When reaction delays are long and predictions are reliable, the agent turns towards the interception point and runs as fast as he can (purely predictive strategies; lower right corner in Figure 6). When predictions are not sufficiently trustworthy, the agent has to switch multiple times between a reactive policy to gather information and a predictive feedforward strategy to successfully fulfill the task (upper left corner). When reaction time and system noise become sufficiently large, the agent fails to intercept the ball (upper right grayed out area). Thus, seemingly substantially different behaviors can be explained by means of a single model. Note that in this figure a purely reactive strategy (as required for only using the heuristics) is not possible. However, if different initial conditions enabling the purely reactive strategy are used, the upper left corner is dominated by the purely reactive strategy.

5 Discussion and conclusion

We have presented a computational model of human interception of a moving target, such as an airborne ball, in the form of a continuous state-action partially observable Markov decision problem. Depending on initial conditions, the optimal control solver either generates continuous tracking behavior or directs the catching agent to turn away from the ball in order to speed up. Interception trajectories in the first case turn out to demonstrate all properties that were previously taken as evidence that humans avoid complex computations by employing simple heuristics. In the second case, we have shown that different regimes of switches between reactive and predictive behavior arise depending on relative uncertainty and latency.
When the agent has sufficient time to gather observations (bottom-left in Figure 6), he turns towards the ball as soon as possible and continuously tracks it until the end (e.g., an outfielder in baseball acts in this regime). If he is confident in the interception point prediction but the task duration is so short relative to the latency that he does not have sufficient time to gather observations (bottom-right), he will rely entirely on the internal model (e.g., a catcher in baseball may act in this regime). If the agent's interception point prediction is rather uncertain (e.g., due to system noise), the agent will gather observations more often regardless of time delays. Conclusions regarding the trade-off between reactive and predictive behaviors may well generalize beyond ball catching to various motor skills. Assuming an agent has an internal model of a task and gets noisy, delayed, partial observations, he has to tolerate a certain level of uncertainty; if, moreover, the agent has limited time to perform the task, he is compelled to act based on prediction instead of observations. As our optimal control policy can explain both reactive heuristics and predictive feedforward strategies, as well as switches between these two kinds of subpolicies, it can be viewed as a unifying explanation for the two seemingly contradictory theories of target interception. In this paper, we have provided a computational-level explanation for a range of observed human behaviors in ball catching. Importantly, while previous interpretations of whether human catching behavior is the result of complex computations or the result of simple heuristics have been inconclusive, here we have demonstrated that what looks like simple rules of thumb from a bag of tricks is actually the optimal solution to a continuous partially observable Markov decision problem.
This result therefore fundamentally contributes to our understanding of human rationality.

Acknowledgements

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 640554.

References

[1] F. C. Anderson and M. G. Pandy. Dynamic optimization of human walking. Journal of Biomechanical Engineering, 123(5):381–390, 2001.

[2] J. Andersson, J. Åkesson, and M. Diehl. CasADi: A symbolic package for automatic differentiation and optimal control. In Recent Advances in Algorithmic Differentiation, pages 297–307. Springer, 2012.

[3] J. T. Betts. Survey of numerical methods for trajectory optimization. Journal of Guidance, Control, and Dynamics, 21(2):193–207, 1998.

[4] P. J. Brancazio. Looking into Chapman's homer: The physics of judging a fly ball. American Journal of Physics, 53(9):849, 1985.

[5] A. Bry and N. Roy. Rapidly-exploring random belief trees for motion planning under uncertainty. In Proceedings - IEEE ICRA, pages 723–730, 2011.

[6] S. Chapman. Catching a baseball. American Journal of Physics, 36(10):868, 1968.

[7] M. Diehl, H. G. Bock, H. Diedam, and P. B. Wieber. Fast direct multiple shooting algorithms for optimal robot control. In Lecture Notes in Control and Information Sciences, volume 340, pages 65–93, 2006.

[8] P. W. Fink, P. S. Foo, and W. H. Warren. Catching fly balls in virtual reality: A critical test of the outfielder problem. Journal of Vision, 9(13):1–8, 2009.

[9] T. Flash and N. Hogan. The coordination of arm movements: An experimentally confirmed mathematical model. The Journal of Neuroscience, 5(7):1688–1703, 1985.

[10] G. Gigerenzer. Gut Feelings: The Intelligence of the Unconscious. Penguin, 2007.

[11] G. Gigerenzer and H. Brighton.
Homo heuristicus: Why biased minds make better inferences. Topics in Cognitive Science, 1(1):107–143, 2009.

[12] C. M. Harris and D. M. Wolpert. Signal-dependent noise determines motor planning. Nature, 394(6695):780–784, 1998.

[13] M. M. Hayhoe, N. Mennie, K. Gorgos, J. Semrau, and B. Sullivan. The role of prediction in catching balls. Journal of Vision, 4(8):156–156, 2004.

[14] M. McBeath, D. Shaffer, and M. Kaiser. How baseball outfielders determine where to run to catch fly balls. Science, 268(5210):569–573, 1995.

[15] J. McIntyre, M. Zago, A. Berthoz, and F. Lacquaniti. Does the brain model Newton's laws? Nature Neuroscience, 4(7):693–694, 2001.

[16] P. McLeod, N. Reed, and Z. Dienes. The generalized optic acceleration cancellation theory of catching. Journal of Experimental Psychology: Human Perception and Performance, 32(1):139–148, 2006.

[17] R. C. Miall and D. M. Wolpert. Forward models for physiological motor control. Neural Networks, 9(8):1265–1279, 1996.

[18] S. Patil, G. Kahn, M. Laskey, and J. Schulman. Scaling up Gaussian belief space planning through covariance-free trajectory optimization and automatic differentiation. In Algorithmic Foundations of Robotics XI, pages 515–533, 2015.

[19] R. Platt, R. Tedrake, L. Kaelbling, and T. Lozano-Perez. Belief space planning assuming maximum likelihood observations. In Robotics: Science and Systems, 2010.

[20] H. A. Simon. A behavioral model of rational choice. The Quarterly Journal of Economics, 69(1):99–118, 1955.

[21] S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. MIT Press, 2005.

[22] E. Todorov and M. I. Jordan. Optimal feedback control as a theory of motor coordination. Nature Neuroscience, 5(11):1226–1235, 2002.

[23] Y. Uno, M. Kawato, and R. Suzuki. Formation and control of optimal trajectory in human multijoint arm movement: Minimum torque-change model. Biological Cybernetics, 61(2):89–101, 1989.

[24] J.
van den Berg, S. Patil, and R. Alterovitz. Motion planning under uncertainty using iterative local optimization in belief space. The International Journal of Robotics Research, 31(11):1263–1278, 2012.

[25] M. P. Vitus and C. J. Tomlin. Closed-loop belief space planning for linear, Gaussian systems. In Proceedings - IEEE ICRA, pages 2152–2159, 2011.

[26] A. Wächter and L. T. Biegler. On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Mathematical Programming, 106:25–57, 2006.

[27] M. Zago, J. McIntyre, P. Senot, and F. Lacquaniti. Visuo-motor coordination and internal models for object interception. Experimental Brain Research, 192(4):571–604, 2009.