{"title": "A Neural Oscillator Model of Auditory Selective Attention", "book": "Advances in Neural Information Processing Systems", "page_first": 1213, "page_last": 1220, "abstract": null, "full_text": "A Neural Oscillator Model of Auditory \n\nSelective Attention\n\nDepartment of Computer Science, University of Sheffield, Regent Court, \n\nStuart N. Wrigley and Guy J. Brown\n\n211 Portobello Street, Sheffield S1 4DP, UK.\ns.wrigley@dcs.shef.ac.uk, g.brown@dcs.shef.ac.uk\n\nAbstract\n\nA model of auditory grouping is described in which auditory attention\nplays a key role. The model is based upon an oscillatory correlation\nframework, in which neural oscillators representing a single perceptual\nstream are synchronised, and are desynchronised from oscillators\nrepresenting other streams. The model suggests a mechanism by which\nattention can be directed to the high or low tones in a repeating sequence\nof tones with alternating frequencies. In addition, it simulates the\nperceptual segregation of a mistuned harmonic from a complex tone.\n\n1\n\nIntroduction\n\nIn virtually all listening situations, we are exposed to a mixture of sound energy from\nmultiple sources. Hence, the auditory system must separate an acoustic mixture in order to\ncreate a perceptual description of each sound source. It has been proposed that this process\nof auditory scene analysis (ASA) [2] takes place in two conceptual stages: segmentation\nin which the acoustic mixture is separated into its constituent \u2018atomic\u2019 units, followed by\ngrouping in which units that are likely to have arisen from the same source are\nrecombined. The perceptual \u2018object\u2019 produced by auditory grouping is called a stream.\nEach stream describes a single sound source.\nFew studies have investigated the role of attention in ASA; typically, ASA is seen as a\nprecursor to attentional mechanisms, which simply select one stream as the attentional\nfocus. 
Recently, however, it has been suggested that attention plays a much more prominent role in ASA. Carlyon et al. [4] investigated how attention influences auditory grouping with the use of a rapidly repeating sequence of high and low tones. It is known that high frequency separations and/or high presentation rates encourage the high tones and low tones to form separate streams, a phenomenon known as auditory streaming [2]. Carlyon et al. demonstrated that auditory streaming did not occur when listeners attended to an alternative stimulus presented simultaneously. However, when they were instructed to attend to the tone sequence, auditory streaming occurred as normal. From this, it was concluded that attention is required for stream formation and not only for stream selection.\nIt has been proposed that attention can be divided into two different levels [9]: low-level exogenous attention which groups acoustic elements to form streams, and a higher-level endogenous mechanism which performs stream selection. Exogenous attention may overrule conscious (endogenous) selection (e.g. in response to a sudden loud bang). The work presented here incorporates these two types of attention into a model of auditory grouping (Figure 1). The model is based upon the oscillatory correlation theory [10], which suggests that neural oscillations encode auditory grouping. Oscillators corresponding to grouped auditory elements are synchronised, and are desynchronised from oscillators encoding other groups. This theory is supported by neurobiological findings that report synchronised oscillations in the auditory system [6].\n\nFigure 1: Schematic diagram of the model (the attentional leaky integrator is labelled ALI). [Processing stages: Signal -> Cochlear Filtering -> Hair cell -> Correlogram -> Cross-Channel Correlation -> Neural Oscillator Network -> ALI -> Attentional Stream.]
Within the oscillatory correlation framework, attentional selection can be implemented by synchronising attentional activity with the stream of interest.\n\n2 The model\n\n2.1 Auditory periphery\n\nCochlear filtering is modelled by a bank of 128 gammatone filters with centre frequencies equally spaced on the equivalent rectangular bandwidth (ERB) scale between 50 Hz and 2.5 kHz [3]. Auditory nerve firing rate is approximated by half-wave rectifying and square root compressing the output of each filter. Input to the model is sampled at a rate of 8 kHz.\n\n2.2 Pitch and harmonicity analysis\n\nIt is known that a difference in fundamental frequency (F0) can assist the perceptual segregation of complex sounds [2]. Accordingly, the second stage of the model extracts pitch information from the simulated auditory nerve responses. This is achieved by computing the autocorrelation of the activity in each channel to form a correlogram [3]. At time t, the autocorrelation of channel i with lag \u03c4 is given by:\n\nA(i, t, \u03c4) = \u2211_{k=0}^{P-1} r(i, t-k) r(i, t-k-\u03c4) w(k)   (1)\n\nHere, r is the auditory nerve activity. The autocorrelation for channel i is computed using a 25 ms rectangular window w (P = 200) with lag steps equal to the sampling period, up to a maximum lag of 20 ms. When the correlogram is summed across frequency, the resulting \u2018summary correlogram\u2019 exhibits a large peak at the lag corresponding to the fundamental period of the stimulus. An accurate estimate of the F0 is found by fitting a parabolic curve to the three samples centred on the summary peak.\nThe correlogram may also be used to identify formant and harmonic regions due to their similar patterns of periodicity [11].
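The autocorrelation of eq. (1) and the parabolic peak refinement can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are ours, and the periodic test signal used below stands in for a gammatone/hair-cell channel.

```python
def correlogram_channel(r, t, P=200, max_lag=160):
    """Running autocorrelation A(i, t, tau) of eq. (1) for one channel.

    r is the firing-rate sequence of the channel and t the current sample.
    P is the 25 ms rectangular window length and max_lag the 20 ms maximum
    lag, both in samples at the 8 kHz model rate; w(k) = 1 throughout.
    """
    return [sum(r[t - k] * r[t - k - tau] for k in range(P))
            for tau in range(max_lag)]

def refine_peak(summary, lag):
    """Parabolic fit through the three samples centred on the summary peak."""
    y0, y1, y2 = summary[lag - 1], summary[lag], summary[lag + 1]
    return lag + 0.5 * (y0 - y2) / (y0 - 2.0 * y1 + y2)
```

The fundamental period appears as the largest peak away from zero lag; dividing the sample rate by the refined lag gives the F0 estimate.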
This is achieved by computing the correlations between adjacent channels of the correlogram as follows:\n\nC(i) = (1/L) \u2211_{\u03c4=0}^{L-1} \u00c2(i, t, \u03c4) \u00c2(i+1, t, \u03c4)   (2)\n\nHere, \u00c2(i, t, \u03c4) is the autocorrelation function of (1) which has been normalised to have zero mean and unity variance; L is the maximum autocorrelation lag in samples (L = 160).\n\n2.3 Neural oscillator network\n\nThe network consists of 128 oscillators and is based upon the two-dimensional locally excitatory globally inhibitory oscillator network (LEGION) of Wang [10], [11]. Within LEGION, oscillators are synchronised by placing local excitatory links between them. Additionally, a global inhibitor receives excitation from each oscillator, and inhibits every oscillator in the network. This ensures that only one block of synchronised oscillators can be active at any one time. Hence, separate blocks of synchronised oscillators - which correspond to the notion of a segment in ASA - arise through the action of local excitation and global inhibition.\nThe model described here differs from Wang\u2019s approach [10] in three respects. Firstly, the network is one-dimensional rather than two-dimensional; we argue that this is more plausible. Secondly, excitatory links can be global as well as local; this allows harmonically-related segments to be grouped. Finally, we introduce an attentional leaky integrator (ALI), which selects one block of oscillators to become the attentional stream (i.e., the stream which is in the attentional foreground).\nThe building block of the network is a single oscillator, which consists of a reciprocally connected excitatory unit and inhibitory unit whose activities are represented by x and y respectively:\n\nx\u0307 = 3x - x\u00b3 + 2 - y + Io   (3a)\n\ny\u0307 = \u03b5 [\u03b3 (1 + tanh(x / \u03b2)) - y]   (3b)\n\nHere, \u03b5, \u03b3 and \u03b2 are parameters.
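A forward-Euler sketch of the single oscillator of eqs. (3a)-(3b), using the paper's parameter values (epsilon = 0.4, gamma = 6.0, beta = 0.1). The constant drive Io, the initial state and the step size dt are our illustrative choices, not values from the paper.

```python
import math

def simulate_oscillator(Io=0.2, eps=0.4, gamma=6.0, beta=0.1,
                        dt=0.01, steps=20000):
    """Integrate eqs. (3a)-(3b) with forward Euler; returns the x trace."""
    x, y = -1.0, 1.0          # arbitrary initial state on the silent branch
    xs = []
    for _ in range(steps):
        dx = 3.0 * x - x ** 3 + 2.0 - y + Io                   # eq. (3a)
        dy = eps * (gamma * (1.0 + math.tanh(x / beta)) - y)   # eq. (3b)
        x += dt * dx
        y += dt * dy
        xs.append(x)
    return xs
```

With Io > 0 the trace cycles between the active and silent phases; with the paper's Ilow = -5.0 the oscillator settles on the silent branch and stays there.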
Oscillations are stimulus dependent; they are only observed when Io > 0, which corresponds to a periodic solution to (3) in which the oscillator cycles between an \u2018active\u2019 phase and a \u2018silent\u2019 phase. The system may be regarded as a model for the behaviour of a single neuron, or as a mean field approximation to a group of connected neurons. The input Io to oscillator i is a combination of three factors: external input Ir, network activity and global inhibition as follows:\n\nIo = Ir - Wz S(z, \u03b8z) + \u2211_{k \u2260 i} Wik S(xk, \u03b8x)   (4)\n\nHere, Wik is the connection strength between oscillators i and k; xk is the activity of oscillator k. The parameter \u03b8x is a threshold above which an oscillator can affect others in the network and Wz is the weight of inhibition from the global inhibitor z. Similar to \u03b8x, \u03b8z acts as a threshold above which the global inhibitor can affect an oscillator. S is a squashing function which compresses oscillator activity to be within a certain range:\n\nS(n, \u03b8) = 1 / (1 + e^{-K(n - \u03b8)})   (5)\n\nHere, K determines the sharpness of the sigmoidal function. The activity of the global inhibitor is defined as\n\nz\u0307 = H(\u2211_k S(xk, \u03b8x) - 0.1) - z   (6)\n\nwhere H is the Heaviside function (H(n) = 1 for n \u2265 0, zero otherwise).\n\n2.3.1 Segmentation\n\nA block of channels is deemed to constitute a segment if the cross-channel correlation (2) is greater than 0.3 for every channel in the block. Cross-correlations are weighted by the energy of each channel in order to increase the contrast between spectral peaks and spectral dips. These segments are encoded by a binary mask, which is unity when a channel contributes to a segment and zero otherwise.
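The coupling terms of eqs. (4)-(6) can be sketched as follows. The thresholds and weights (Wz = 0.5, theta_z = 0.1, theta_x = -0.5, K = 50) follow the parameter list given later in the paper; the function names and the toy two-oscillator configuration in the test are ours.

```python
import math

def S(n, theta, K=50.0):
    """Squashing function of eq. (5)."""
    return 1.0 / (1.0 + math.exp(-K * (n - theta)))

def oscillator_input(i, Ir, x, W, z, Wz=0.5, theta_x=-0.5, theta_z=0.1):
    """Input Io to oscillator i (eq. 4): external drive, minus global
    inhibition, plus excitation from every other oscillator k."""
    coupling = sum(W[i][k] * S(x[k], theta_x)
                   for k in range(len(x)) if k != i)
    return Ir - Wz * S(z, theta_z) + coupling

def inhibitor_derivative(x, z, theta_x=-0.5):
    """Right-hand side of the global inhibitor dynamics (eq. 6)."""
    drive = 1.0 if sum(S(xk, theta_x) for xk in x) - 0.1 >= 0.0 else 0.0
    return drive - z
```

Because K is large, S acts almost as a hard threshold: only oscillators above theta_x excite their neighbours and the global inhibitor.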
To improve the resolution and separation of adjacent segments, the cross-frequency spread of a segment is restricted to 3 channels. Oscillators within a segment are synchronized by excitatory connections. The external input (Ir) of an oscillator whose channel is a member of a segment is set to Ihigh; otherwise it is set to Ilow.\n\n2.3.2 Harmonicity grouping\n\nExcitatory connections are made between segments if they are consistent with the current F0 estimate. A segment is classed as consistent with the F0 if a majority of its corresponding correlogram channels exhibit a significant peak at the fundamental period (ratio of peak height to channel energy greater than 0.46). A single connection is made between the centres of harmonically related segments subject to old-plus-new constraints.\nThe old-plus-new heuristic [2] refers to the auditory system\u2019s preference to \u2018interpret any part of a current group of acoustic components as a continuation of a sound that just occurred\u2019. This is incorporated into the model by attaching \u2018age trackers\u2019 to each channel of the network. Excitatory links are placed between harmonically related segments only if the two segments are of similar age. The age trackers are leaky integrators:\n\nB\u0307k = g ([d (Mk - Bk)]+ - (1 - H(Mk - Bk)) c Bk)   (7)\n\nHere, [n]+ = n if n \u2265 0 and [n]+ = 0 otherwise. Mk is the (binary) value of the segment mask at channel k; small values of c and d result in a slow rise (d) and slow decay (c) for the integrator. g is a gain factor.\nConsider two segments that start at the same time; the age trackers for their constituent channels receive the same input, so the values of Bk will be the same.
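A sketch of one age tracker driven by a binary mask sequence, using the paper's values d = 0.001, c = 5 and g = 3. The Euler step dt and our reading of the extraction-garbled eq. (7) (rise term [d(Mk - Bk)]+, decay term (1 - H(Mk - Bk)) c Bk) are assumptions.

```python
def age_tracker(mask, B0=0.0, d=0.001, c=5.0, g=3.0, dt=0.01):
    """Integrate the age tracker B_k over a sequence of binary mask values."""
    B, trace = B0, []
    for M in mask:
        H = 1.0 if M - B >= 0 else 0.0      # Heaviside of (Mk - Bk)
        rise = max(d * (M - B), 0.0)        # [d(Mk - Bk)]+  : slow charge
        decay = (1.0 - H) * c * B           # leak once the mask switches off
        B += dt * g * (rise - decay)
        trace.append(B)
    return trace
```

An earlier-starting segment therefore always carries a larger age value than a later one, which is exactly the difference the harmonicity-grouping stage inspects.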
However, if two segments start at different times, the age trackers for the earlier segment will have already increased to a non-zero value when the second segment starts. This \u2018age difference\u2019 will dissipate over time, as the values of both sets of leaky integrators approach unity.\n\n2.3.3 Attentional leaky integrator (ALI)\n\nEach oscillator is connected to the attentional leaky integrator (ALI) by excitatory links; the strength of these connections is modulated by endogenous attention. Input to the ALI is given by:\n\na\u0307li = H(\u2211_k S(xk, \u03b8x) Tk - \u03b8ALI) - ali   (8)\n\nHere, \u03b8ALI is a threshold above which network activity can influence the ALI. Tk is an attentional weighting which is related to the endogenous interest at frequency k:\n\nTk = 1 - (1 - Ak) L   (9)\n\nHere, Ak is the endogenous interest at frequency k and L is the leaky integrator defined as:\n\nL\u0307 = b ([a (R - L)]+ - (1 - H(R - L)) f L)   (10)\n\nSmall values of f and a result in a slow rise (a) and slow decay (f) for the integrator. b is a gain factor. R = H(xmax), where xmax is the largest output activity of the network. The build-up of attentional interest is therefore stimulus dependent. The attentional interest itself is modelled as a Gaussian according to the gradient model of attention [7]:\n\nAk = maxAk e^{-(k - p)\u00b2 / (2\u03c3\u00b2)}   (11)\n\nHere, Ak is the normalised attentional interest at frequency channel k and maxAk is the maximum value that Ak can attain.
p is the channel at which the peak of attentional interest occurs, and \u03c3 determines the width of the peak.\nA segment or group of segments is said to be attended to if its oscillatory activity coincides temporally with a peak in the ALI activity. Initially, the connection weights between the oscillator array and the ALI are strong: all segments feed excitation to the ALI, so all segments are attended to. During sustained activity, these weights relax toward the Ak interest vector such that strong weights exist for channels of high attentional interest and low weights exist for channels of low attentional interest. ALI activity will only coincide with activity of the channels within the attentional interest peak and any harmonically related (synchronised) activity outside the Ak peak. All other activity will occur within a trough of ALI activity. This behaviour allows both individual tones and harmonic complexes to be attended to using only a single Ak peak.\nThe parameters for all simulations reported here were \u03b5 = 0.4, \u03b3 = 6.0, \u03b2 = 0.1, Wz = 0.5, \u03b8z = 0.1, \u03b8x = -0.5, K = 50, d = 0.001, c = 5, g = 3, a = 0.0005, f = 5, b = 3, maxAk = 1, \u03c3 = 3, \u03b8ALI = 1.5, Ilow = -5.0 and Ihigh = 0.2. The inter- and intra-segment connections have equal weights of 1.1.\n\n3 Evaluation\n\nThroughout this section, output from the model is represented by a \u2018pseudospectrogram\u2019 with time on the abscissa and frequency channel on the ordinate. Three types of information are superimposed on each plot. A gray pixel indicates the presence of a segment at a particular frequency channel, which is also equivalent to the external input to the corresponding oscillator: gray signifies Ihigh (causing the oscillator to be stimulated) and white signifies Ilow (causing the oscillator to be unstimulated).
Black pixels represent active oscillators (i.e. oscillators whose x value exceeds a threshold value). At the top of each figure, ALI activity is shown. Any oscillators which are temporally synchronised with the ALI are considered to be in the attentional foreground.\n\n3.1 Segregation of a component from a harmonic complex\n\nDarwin et al. [5] investigated the effect of a mistuned harmonic upon the pitch of a 12 component complex tone. As the degree of mistuning of the fourth harmonic increased towards 4%, the shift in the perceived pitch of the complex also increased. This effect was less pronounced for mistunings of more than 4%; beyond 8% mistuning, little pitch shift was observed. Apparently, the pitch of a complex tone is calculated using only those channels which belong to the corresponding stream. When the harmonic is subject to mistunings below 8%, it is grouped with the rest of the complex and so can affect the pitch percept. Mistunings of greater than 8% cause the harmonic to be segregated into a second stream, and so it is excluded from the pitch percept.\n\nFigure 2: A,B,C: Network response to mistuning of the fourth harmonic of a 12 harmonic complex (0%, 6% and 8% respectively). ALI activity is shown at the top of each plot. Gray areas denote the presence of a segment and black areas denote oscillators in the active phase. Arrows show the focus of attentional interest. D: Pitch shift versus degree of mistuning. A Gaussian derivative is fitted to each data set.
Figure 3: Captor tones preceding the complex capture the fourth harmonic into a separate stream. ALI activity (top) shows that this harmonic is the focus of attention and would be \u2018heard out\u2019. The attentional interest vector (Ak) is shown to the right of the figure.\n\nThis behaviour is reproduced by our model (Figure 2). All the oscillators at frequency channels corresponding to harmonics are temporally synchronised for mistunings up to 8% (plots A and B), signifying that the harmonics belong to the same perceptual group. Mistunings beyond 8% cause the mistuned harmonic to become desynchronised from the rest of the complex (plot C) - two distinct perceptual groups are now present: one containing the fourth harmonic and the other containing the remainder of the complex tone. A comparison of the pitch shifts found by Darwin et al. and the shifts predicted by the model is shown in plot D. The pitch of the complex was calculated by creating a summary correlogram (similar to that used in section 2.2) using frequency channels contained within the complex tone group. Only segment channels below 1.1 kHz were used for this summary since low frequency (resolved) harmonics are known to dominate the pitch percept [8].\nDarwin et al. also showed that the effect of mistuning was diminished when the fourth harmonic was \u2018captured\u2019 from the complex by four preceding tones at the same frequency. In this situation, no matter how small the mistuning, the harmonic is segregated from the complex and does not influence the pitch percept. Figure 3 shows the capture of the harmonic with no mistuning. Attentional interest is focused on the fourth harmonic: oscillator activity for the captor tone segments is synchronised with the ALI activity.
During the 550 ms before the complex tone onset, the age tracker activities for the captor tone channels build up. When the complex tone begins, there is a significant age difference between the frequency channels stimulated by the fourth harmonic and those stimulated by the remainder of the complex. Such a difference prevents excitatory harmonicity connections from being made between the fourth harmonic and the remaining harmonics. This behaviour is consistent with the old-plus-new heuristic; a current acoustic event is interpreted as a continuation of a previous stimulus.\nThe old-plus-new heuristic can be further demonstrated by starting the fourth harmonic before the rest of the complex. Figure 4 shows the output of the model when the fourth harmonic is subject to a 50 ms onset asynchrony. During this time, the age trackers of channels excited by the fourth harmonic increase to a significantly higher value than those of the remaining harmonics. Once again, this prevents excitatory connections being made between the fourth harmonic and the other harmonically related segments. The early harmonic is desynchronised from the rest of the complex: two streams are formed. However, after a period of time, the importance of the onset asynchrony decreases as the channel ages approach their maximal values. Once this occurs, there is no longer any evidence to prevent excitatory links from being made between the fourth harmonic and the rest of the complex. Grouping by harmonicity then occurs for all segments: the complex and the early harmonic synchronise to form a single stream.\n\n3.2 Auditory streaming\n\nWithin the framework presented here, auditory streaming is an emergent property; all events which occur over time, and are subject to attentional interest, are implicitly grouped. Two temporally separated events at different frequencies must both fall under the Ak peak to be grouped.
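The attentional weighting that produces this grouping behaviour (eqs. 9-11) can be sketched as follows. Sigma = 3 and maxAk = 1 follow the paper; the channel indices in the test are illustrative. Before any build-up (L = 0) every channel is weighted fully, so everything is attended to; once L has built up, only channels near the interest peak p keep a strong connection to the ALI.

```python
import math

def interest(k, p, sigma=3.0, maxAk=1.0):
    """Gaussian attentional interest Ak at channel k, peaked at p (eq. 11)."""
    return maxAk * math.exp(-((k - p) ** 2) / (2.0 * sigma ** 2))

def weight(k, p, L):
    """ALI connection weight Tk of eq. (9); L is the stimulus-driven
    build-up integrator of eq. (10), ranging from 0 to 1."""
    return 1.0 - (1.0 - interest(k, p)) * L
```

Two tones both close to p keep near-unity weights and fuse into one stream; a tone far from p loses its influence on the ALI as L grows, which is the model's account of the build-up of streaming.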
It is the width of the Ak peak that determines frequency separation-dependent streaming, rather than local connections between oscillators as in [10]. The build-up of streaming [1] is modelled by the leaky integrator in (9). Figure 5 shows the effect of two different frequency separations on the ability of the network to perform auditory streaming, and shows a good match to experimental findings [1], [4]. At low frequency separations, both the high and low frequency segments fall under the attentional interest peak; this allows the oscillator activities of both frequency bands to influence the ALI and hence they are considered to be in the attentional foreground. At higher frequency separations, one of the frequency bands falls outside of the attentional peak (in this example, the high frequency tones fall outside) and hence it cannot influence the ALI. Such behaviour is not seen immediately, because the attentional interest vector is subject to a build-up effect as described in (9). Initially the attentional interest is maximal across all frequencies; as the leaky integrator value increases, the interest peak begins to dominate and interest in other frequencies tends toward zero.\n\n4 Discussion\n\nA model of auditory attention has been presented which is based on previous neural oscillator work by Wang and colleagues [10], [11] but differs in two important respects. Firstly, our network is unidimensional; in contrast, Wang\u2019s approach employs a two-dimensional time-frequency grid for which there is weak physiological justification. Secondly, our model regards attention as a key factor in the stream segregation process. In our model, attentional interest may be consciously directed toward a particular stream, causing that stream to be selected as the attentional foreground.\nFew auditory models have incorporated attentional effects in a plausible manner.
For example, Wang\u2019s \u2018shifting synchronisation\u2019 theory [3] suggests that attention is directed towards a stream when its constituent oscillators reach the active phase. This contradicts experimental findings, which suggest that attention selects a single stream whose salience is increased for a sustained period of time [2]. Additionally, Wang\u2019s model fails to account for exogenous reorientation of attention to a sudden loud stimulus; the shifting synchronisation approach would multiplex it as normal with no attentional emphasis. By ensuring that the minimum Ak value for the attentional interest is always non-zero, it is possible to weight activity outside of the attentional interest peak and force it to influence the ALI. Such weighting could be derived from a measure of the sound intensity present in each frequency channel.\nWe have demonstrated the model\u2019s ability to accurately simulate a number of perceptual phenomena. The time course of perception is well simulated, showing how factors such as mistuning and onset asynchrony can cause a harmonic to be segregated from a complex tone. It is interesting to note that a good match to Darwin\u2019s pitch shift data (Figure 2D) was only found when harmonically related segments below 1.1 kHz were used. The dominance of lower (resolved) harmonics on pitch is well known [8], and our findings suggest that the correlogram does not accurately model this aspect of pitch perception.\n\nFigure 4: Asynchronous onset of the fourth harmonic causes it to segregate into a separate stream.
The attentional interest vector (Ak) is shown to the right of the figure.\n\nFigure 5: Auditory streaming at frequency separations of 5 semitones (left) and 3 semitones (right). Streaming occurs at the higher separation. The timescale of adaptation for the attentional interest has been reduced to aid the clarity of the figures.\n\nThe simulation of two tone streaming shows how the proposed attentional mechanism and its cross-frequency spread accounts for grouping of sequential events according to their proximity in frequency. A sequence of two tones will only stream if one set of tones falls outside of the peak of attentional interest. Frequency separations for streaming to occur in the model (greater than 3 to 4 semitones) are in agreement with experimental data, as is the timescale for the build-up of the streaming effect [1].\nIn summary, we have proposed a physiologically plausible model in which auditory streams are encoded by a unidimensional neural oscillator network. The network creates auditory streams according to grouping factors such as harmonicity, frequency proximity and common onset, and selects one stream as the attentional foreground. Current work is concentrating on expanding the system to include binaural effects, such as inter-ear attentional competition [4].\n\nReferences\n\n[1] Anstis, S. & Saida, S. (1985) Adaptation to auditory streaming of frequency-modulated tones. J. Exp. Psychol. Human 11, 257-271.\n[2] Bregman, A. S. (1990) Auditory Scene Analysis. Cambridge, MA: MIT Press.\n[3] Brown, G. J. & Cooke, M. (1994) Computational auditory scene analysis. Comput. Speech Lang. 8, 297-336.\n[4] Carlyon, R. P., Cusack, R., Foxton, J. M. & Robertson, I. H. (2001) Effects of attention and unilateral neglect on auditory stream segregation. J. Exp. Psychol.
Human 27(1), 115-127.\n[5] Darwin, C. J., Hukin, R. W. & Al-Khatib, B. Y. (1995) Grouping in pitch perception: Evidence for sequential constraints. J. Acoust. Soc. Am. 98(2) Pt 1, 880-885.\n[6] Joliot, M., Ribary, U. & Llin\u00e1s, R. (1994) Human oscillatory brain activity near 40 Hz coexists with cognitive temporal binding. Proc. Natl. Acad. Sci. USA 91, 11748-11751.\n[7] Mondor, T. A. & Bregman, A. S. (1994) Allocating attention to frequency regions. Percept. Psychophys. 56(3), 268-276.\n[8] Moore, B. C. J. (1997) An Introduction to the Psychology of Hearing. Academic Press.\n[9] Spence, C. J. & Driver, J. (1994) Covert spatial orienting in audition: exogenous and endogenous mechanisms. J. Exp. Psychol. Human 20(3), 555-574.\n[10] Wang, D. L. (1996) Primitive auditory segregation based on oscillatory correlation. Cognitive Sci. 20, 409-456.\n[11] Wang, D. L. & Brown, G. J. (1999) Separation of speech from interfering sounds based on oscillatory correlation. IEEE Trans. Neural Networks 10, 684-697.", "award": [], "sourceid": 2110, "authors": [{"given_name": "Stuart", "family_name": "Wrigley", "institution": null}, {"given_name": "Guy", "family_name": "Brown", "institution": null}]}