Activation of Human Auditory Cortex in Retrieval Experiments: An fMRI Study

In a previous functional magnetic resonance imaging (fMRI) study, the human auditory cortex was subdivided into four distinct territories. One territory (T1a) exhibited functional specialization for a foreground-background decomposition task involving matching-to-sample monitoring of tone sequences. The present study determined more specifically whether memory-guided analysis of tone sequences is part of the T1a specialization. During the encoding periods, an arbitrary and unfamiliar four-tone sequence (melody) played by one instrument was presented. The melody-instrument combination was different in each period. During subsequent retrieval periods, learned and additional combinations were presented, and the task was to detect either the target melodies (experiment I) or the target instruments (experiment II). T1a showed larger activation during melody retrieval. The results suggest that (1) activation of T1a during retrieval is determined less by the sound material than by the executed task, and (2) more specifically, memory-guided sequential analysis in T1a is dominant over recognition of characteristic complex sounds.


INTRODUCTION
In a recent functional magnetic resonance imaging (fMRI) study of the human auditory cortex (AC), we analyzed activation during sequential matching-to-sample tasks with tones or notes played by musical instruments and distinguished three separate territories of activation (T1, T2, T3) on the supratemporal plane (Scheich, 1998). By adding a distractive high-level auditory background (4 Hz continuous broad band frequency modulations), we identified a functional subdivision of T1 (secondary cortex field T1a). The activation of this subdivision was resistant to the masking influence of the background. Namely, the activation was higher than that produced by the background alone, in contrast to all other territories, including the primary auditory cortex, in which the activity was saturated by this background. This finding, in conjunction with other unique aspects of the response, led to the conclusion that T1a is relatively specialized for selective listening under conditions of complex masking, presumably reflecting a process of foreground-background decomposition.
Not distinguished in these experiments were the possibilities that T1a contained additional specializations for auditory tonal matching-to-sample tasks or, more generally speaking, for tasks involving tone sequence (melody) perception (Dowling, 1986; Zatorre, 1994).
The issue of tone sequence processing in the auditory cortex has been tackled in the awake monkey auditory cortex with single unit recording (Hocherman, 1976; Hocherman, 1981; Vaadia, 1982; Gilat, 1984; Gottlieb, 1989) and with bilateral ablation experiments (see Neff, 1975). Specific unit responses related to a delayed matching-to-sample task were preferentially found in belt areas around the primary AC. Conversely, ablation impaired the performance in such tasks when belt areas were included. A recent study in the monkey cortex searched more specifically for mechanisms that permit the temporal integration of sequences of different tones (Brosch, 1998a). Forward and backward interactions between the responses to consecutive tones were found that differ from the masking effects seen over short intervals.
(C) Freund and Pettman, U.K., 1998
[Table: experimental periods. "Silence" (30 s). "Encoding" (30 s): presentation of one melody played by the target instrument. "Retrieval" (1 min): presentation of different instruments (including the target instrument) playing the same melody as in the preceding encoding periods.]
The present experiments address this issue with respect to learning by using a melody retrieval task. Arbitrary four-note sequences (melodies) played by different instruments were encoded, keeping either the melody or the instrument constant. In the subsequent retrieval period, various melody-instrument combinations were presented, and the task was to retrieve either the encoded instrument (control) or the encoded melody.

EXPERIMENTAL
Subjects
Seven right-handed subjects (6 males, 11 females, age range from 21 to 45 yr, mean age 26.8 yr) with normal hearing participated in this investigation and gave written consent. The subjects were familiar with the laboratory and had participated frequently in fMRI experiments, but they were unfamiliar with the special study design. The study was approved by the Ethics Committee of the Otto-von-Guericke University, Magdeburg.

Machine noise and acoustic stimulation
In a multi-step procedure, the gradient noise of the 3T scanner was reduced by >50 dB at the expense of slower imaging. The headphone system (see below) gave >20 dB suppression of background noise for frequencies above 0.5 kHz (2 kHz: >30 dB). The use of a FLASH sequence offers the possibility to slow down the gradient switching (in this study by a factor of 40, from 150 µs to 6 ms). Together with an optimized excitation pulse and modified spoiler gradients, the noise level was reduced by >30 dB below 500 Hz. The final imaging sequence, focusing on a few slices, produced a noise peak level of 48 dB SPL at the position of the ear. This "low noise" protocol ensured that unwanted auditory foreground-background interaction (stimuli versus acoustic machine noise) was diminished as far as possible in the fMRI experiments. The stimuli were presented through modified electrodynamic headphones that are integrated into ear muffs (Baumgart, 1998).
For each of the six cycles, the particular target melody and target instrument were changed. In experiment I, 10 different melodies (sequences of four arbitrary harmonic tones representing no familiar tune) were presented. In experiment II, 21 different musical instruments were presented. The difference between the total numbers of melodies and instruments was chosen to match the degree of difficulty of the two tasks (similar performance). In the encoding periods, the target was presented 8-12 times, whereas a retrieval period contained 16-21 events with 5-6 targets. Detection of targets in the retrieval period was indicated by a key press (hit rate in both experiments 75% to 85%, no significant change during runs).

Data acquisition
Subjects were scanned in a BRUKER BIOSPEC 3T/60 cm system with a birdcage head coil and an asymmetric gradient system (30 mT/m). Functional images were collected using a conventional gradient echo sequence with a repetition time of 188 ms per slice, an echo time of 40 ms, and a low flip angle (8°) to avoid functional mismatches from inflow artifacts (Frahm, 1994). High T1-contrast anatomical imaging (MDEFT) to obtain landmarks followed the fMRI. Functional images of three contiguous, nearly horizontal slices with a thickness of 8 mm each and an in-plane resolution of 2.5 mm (64 x 40 voxels, 16 x 16 cm² field of view) were scanned. For each slice, 96 functional images were acquired in a total time of 12 min. The orientation of an ideal "horizontal" plane was individually adjusted, starting at the left Sylvian fissure in parasagittal anatomical images, and was readjusted in coronal images to cover the superior temporal plane in both hemispheres.
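The timing figures stated so far can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes that each of the six cycles consists of exactly one silence, one encoding, and one retrieval period with the durations listed in the period table (30 s, 30 s, 1 min), and that the 96 images per slice are spread evenly over the 12 min run. The variable names are hypothetical and not taken from the study.

```python
# Consistency check of the run timing, using values stated in the text.
# Assumption: one cycle = silence + encoding + retrieval, repeated six times.
SILENCE_S = 30      # "silence" period (s)
ENCODING_S = 30     # "encoding" period (s)
RETRIEVAL_S = 60    # "retrieval" period (1 min)
N_CYCLES = 6        # six cycles per run
N_IMAGES = 96       # functional images per slice
RUN_MIN = 12        # total acquisition time (min)

cycle_s = SILENCE_S + ENCODING_S + RETRIEVAL_S   # duration of one cycle (s)
run_s = N_CYCLES * cycle_s                       # duration of all cycles (s)

# Assumption: images are evenly spaced over the whole run.
seconds_per_image = RUN_MIN * 60 / N_IMAGES
images_per_period = {
    "silence": SILENCE_S / seconds_per_image,
    "encoding": ENCODING_S / seconds_per_image,
    "retrieval": RETRIEVAL_S / seconds_per_image,
}

print("cycle duration (s):", cycle_s)
print("six cycles (s):", run_s, "vs. stated run (s):", RUN_MIN * 60)
print("seconds per image:", seconds_per_image)
print("images per period:", images_per_period)
```

Under these assumptions the six cycles fill the 12 min run exactly, and each image covers 7.5 s, so a silence or encoding period spans four images and a retrieval period spans eight.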
To prevent motion artifacts, the head of the subject was fixed with a vacuum cushion, which enclosed the ear muffs and sealed them to the skin by pressure.

Data analysis
In each slice, the area of interest in both hemispheres was analyzed by correlation analysis (Bandettini, 1993) to obtain a statistical parametric map. In all experiments, activated voxels (p < 0.05 over all cycles) in each subject were attributed to one of three territories (T1 (T1a and T1b), T2, T3) by the following procedure. On the basis of a previous study (Scheich, 1998), landmarks in each individual brain served for the definition of territories in conjunction with anatomical accounts (Galaburda & Sanides, 1980; Steinmetz, 1989). In consecutive horizontal slices, the wedge- or y-shaped Heschl's gyrus (T1, red; including primary cortex, see Fig. 1) was readily distinguishable by its caudal boundary, that is, the second transverse sulcus or Heschl's sulcus, which laterally cuts into the outer rim of the gyrus temporalis superior. The rostrocaudal parcellation of T1 into T1a and T1b was based on the independent activations in the previous study (Scheich, 1998). The intersection of the first transverse sulcus and the insular sulcus at the bend of the latter served as an approximate landmark. Adjacent to Heschl's sulcus, already on the anterior planum temporale, T2 activity (green) was allocated. T3 activity (blue) on the intermediate and caudal parts of the planum temporale was delimited functionally from T2 by a non-activated area. From these territories, absolute intensity weighted volumes (IWV) (product of the activated volume and the average signal change of the activated voxels in each territory in all slices) were determined. The IWV for each territory was tested for differences between experiments I and II (two independent groups of subjects, one-tailed Mann-Whitney U test, p < 0.05). The p-values are given in the text.
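The IWV definition above (activated volume times average signal change of the activated voxels) can be sketched in a few lines. This is not the authors' analysis code. It makes several assumptions that the text does not specify: a binary boxcar reference function for the correlation analysis, a fixed correlation threshold standing in for the p < 0.05 criterion, and signal change expressed in percent of the baseline. The voxel volume of 50 mm³ follows from the stated 2.5 mm in-plane resolution and 8 mm slice thickness. All function names and the toy time series are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def percent_signal_change(series, reference):
    """Mean signal during task images relative to baseline images, in percent."""
    on = [s for s, r in zip(series, reference) if r == 1]
    off = [s for s, r in zip(series, reference) if r == 0]
    baseline = sum(off) / len(off)
    return 100.0 * (sum(on) / len(on) - baseline) / baseline

def intensity_weighted_volume(voxel_series, reference, r_threshold, voxel_mm3):
    """IWV = activated volume (mm^3) x mean signal change of activated voxels."""
    # Assumption: a voxel counts as activated if r exceeds a fixed threshold.
    changes = [
        percent_signal_change(series, reference)
        for series in voxel_series
        if pearson_r(series, reference) > r_threshold
    ]
    if not changes:
        return 0.0
    activated_volume = len(changes) * voxel_mm3
    return activated_volume * (sum(changes) / len(changes))

# Toy data (invented): 0 = baseline image, 1 = task image.
reference = [0, 0, 1, 1, 0, 0, 1, 1]
territory_voxels = [
    [100, 100, 102, 102, 100, 100, 102, 102],  # follows the task, 2 % change
    [100, 101, 100, 101, 100, 101, 100, 101],  # unrelated fluctuation
    [200, 200, 202, 202, 200, 200, 202, 202],  # follows the task, 1 % change
]
VOXEL_MM3 = 2.5 * 2.5 * 8  # in-plane resolution x slice thickness

iwv = intensity_weighted_volume(territory_voxels, reference, 0.5, VOXEL_MM3)
print("IWV (mm^3 x % signal change):", iwv)
```

In this toy territory two of three voxels pass the threshold, so the activated volume is 100 mm³ and the mean change is 1.5 %. The product shows how the IWV rewards both a larger activated area and a stronger signal change.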

RESULTS
Experiments I and II led to an activation in four distinct territories (T1a, T1b, T2, T3), as described in the previous study (Scheich, 1998) (see Fig. 1).
As a measure of activation, the absolute intensity weighted volume (IWV) of T1a was significantly larger (p=0.045) in the melody retrieval experiment I than in the instrument retrieval experiment II (see Fig. 2). In contrast, the activation in territories T1b (p=0.43), which includes primary auditory cortex, and T2 (p=0.1), mainly on the rostral planum temporale, did not differ between the two experiments. In territory T3 on the intermediate planum temporale there was a slight tendency for a higher absolute IWV in experiment II, but it did not reach significance.
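A group comparison of this kind (two independent groups, one-tailed Mann-Whitney U test) can be sketched as follows. The text does not state whether an exact or an approximate p-value was used. The sketch assumes an exact test obtained by enumerating all possible group assignments, which is feasible for small groups. The IWV values are invented for illustration and are not the study's data. The function names are hypothetical.

```python
from itertools import combinations

def mann_whitney_u(sample_a, sample_b):
    """U for sample_a: number of (a, b) pairs with a > b; ties count as 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

def exact_one_tailed_p(sample_a, sample_b):
    """P(U >= observed U) under the null hypothesis, by full enumeration.

    Every way of assigning len(sample_a) of the pooled values to group A is
    treated as equally likely (assumption: exact permutation approach).
    """
    pooled = list(sample_a) + list(sample_b)
    observed = mann_whitney_u(sample_a, sample_b)
    at_least_as_extreme = 0
    total = 0
    for chosen in combinations(range(len(pooled)), len(sample_a)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        if mann_whitney_u(group_a, group_b) >= observed:
            at_least_as_extreme += 1
        total += 1
    return at_least_as_extreme / total

# Invented IWV values for one territory (NOT data from the study).
iwv_experiment_1 = [6.1, 6.3, 7.2, 8.4]   # hypothetical melody retrieval group
iwv_experiment_2 = [3.0, 4.2, 5.5, 6.0]   # hypothetical instrument retrieval group

u_stat = mann_whitney_u(iwv_experiment_1, iwv_experiment_2)
p_value = exact_one_tailed_p(iwv_experiment_1, iwv_experiment_2)
print("U =", u_stat, " one-tailed p =", round(p_value, 4))
```

The one-tailed form tests the directional hypothesis that the first group has larger values, matching the expectation of larger T1a activation during melody retrieval.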
The high level of activation of T1a in both experiments becomes even more evident in comparison with the pure tone matching-to-sample task of the recent study (Scheich, 1998). The absolute IWV of territory T1a was significantly larger in both experiments of the present study (p < 0.012).
[Figure caption fragment: ... is split into T1a and T1b. The number of subjects participating in each experiment is indicated in parentheses. Absolute IWV in T1a significantly decreased from experiment I to II (p=0.045). There was no significant difference between experiment I and II in the other territories.]

DISCUSSION
The performance of the subjects in experiment II (retrieval among 21 instruments) was comparable to that in experiment I (retrieval among 10 melodies). In spite of this, the results of the present experiments showed that, on average, the retrieval of melodies independent of the playing instrument activates territory T1a more than the retrieval of instruments independent of the melody they are playing. This suggests that T1a was activated more specifically by memory processes involving sequential analysis of frequencies than by analysis of characteristic sounds.
Neurophysiological studies have addressed the question of specializations in awake monkey AC with respect to tone sequence processing using matching-to-sample tasks. In one study (Gottlieb, 1989), a high proportion of units, especially in belt areas around primary AC, showed special firing patterns during the delays between tones. Bilateral ablation of AC in various monkey species (see Neff, 1975) impaired the performance of matching-to-sample tasks when belt areas were included. Special mechanisms relating to tone sequence processing are frequently found among neuronal response properties, even in the anesthetized monkey. In the search for mechanisms that permit the integration of consecutive tones, response properties of units in the primary AC and especially in the belt areas were found that act forwards and quasi backwards in time (Brosch, 1998a; Brosch, 1998b). The late response to a tone could be backward suppressed by a consecutive different tone, whereas the first response to a tone could be forward facilitated by a preceding different tone. Likewise, McKenna et al. (1989) showed that the responses of single neurons in the auditory cortex of the cat often depend on the serial position of a tone in a sequence. In view of these frequently encountered neurons, fields for even more specialized aspects of sequential tone processing may be assumed in AC.
T1a was also activated in the previous experiment using pair-wise matching-to-sample tasks with randomly varied pure tones or notes played by different musical instruments (Scheich, 1998). This effect even survived the masking influence of a jamming background sound (continuous frequency modulation). Both the present and the previous experiments are compatible with the idea that T1a is specialized for sequential analysis of frequencies.
The present experiment used a short term memory task (retrieval over seconds), in contrast to the continuous pair-wise matching-to-sample tasks. The latter involve only continuous pair-wise same-different comparisons (working memory), whereas melody retrieval requires recognition of a "temporal Gestalt". Therefore, it may be assumed that memory-guided processing in T1a is of a more general type related to sequential analysis and applies to both. In this context it should be noted that true sequential analysis by definition requires at least a short term memory process. It could be argued that T1a activation is explained mainly by differential involvement of filters specialized for stimulus properties. In the present experiments, however, melody retrieval as well as instrument retrieval used melodies as a common stimulus aspect. Thus, the difference in T1a activation between the two experiments may not be due to the stimulus material but rather to the selectively memory-guided type of processing that is performed in T1a on the given stimulus material. This is a "top down" characteristic that has also been attributed to the late auditory evoked potentials (AEP), such as the N400 wave, which are sometimes called endogenous responses (see Kraus, 1992). Late AEP are typically found to extend from the region of the primary AC anteriorly or posteriorly over the cortex.
The present experiment extends the previous characterization of a secondary human AC field as being involved in auditory foreground-background decomposition (Scheich, 1998). In this previous study, which tried to construct some essential components of the so-called Cocktail Party effect (Cherry, 1953), sequences of tones were used as a foreground while a continuous frequency modulation served as a background. The specificity of T1a activity with respect to the decomposition, that is, the separation of the two simultaneous sound patterns, was demonstrated. The present analysis attributes specificity to T1a also with respect to the sequential listening aspect, which is characteristic of the Cocktail Party effect. Thus, it is not unlikely that T1a contains mechanisms that relate to various components of selective listening capabilities under jamming conditions, which are biologically of fundamental importance.