Four male Long-Evans rats (Harlan; Mississauga, Canada), 6-10 months old at the beginning of behavioral training, were first habituated to wearing an LED backpack, used for video tracking during behavioral sessions. During this first week, rats were food-restricted such that they gradually approached approximately 90%, but never fell below 85%, of their free-feeding weight. Next, rats were introduced to the apparatus, an elevated linear track 1.8m in length with food pellet reward receptacles at both ends (Figure 1). On this track, different audio cues were associated with different reward outcome distributions (described in detail in the next section). Reward receptacles were equipped with photobeams, such that a delay could be imposed between the rat's nosepoke into the receptacle and reward delivery. As rats learned the task, this delay was gradually increased to 500ms.
Once rats reliably ran >100 trials in a daily 40-minute session (average 11.25 daily sessions from start of track training, range 5-21 sessions), they were surgically implanted with an array of tetrodes targeting the ventral striatum. Following recovery, rats were re-trained on the task. Once they were again reliably running >100 trials and the recording electrodes had reached their targets, neural data acquisition during behavior commenced (average 11.5 sessions after surgery, range 6-18 sessions); at this stage video tracking relied on LEDs attached to the recording headstage rather than the backpack.
Recording sessions in which (1) at least one recording electrode was located in the vStr, and (2) rats ran at least 100 trials, were eligible for initial behavioral analysis (25 sessions total: 3 from R014, 7 from R016, 8 from R018, and 7 from R020, where R014-020 are subject IDs). As described in the Results, in 16 of these 25 sessions there was behavioral evidence for successful discrimination between the reward-predictive cues; only neural data from these 16 sessions were analyzed further. All procedures were pre-approved by the University of Waterloo Animal Care Committee, and performed in accordance with Canadian Council for Animal Care (CCAC) guidelines.
The behavioral task design had two objectives: first, to elicit behavioral evidence that rats distinguished between different reward outcomes, and second, to include a stereotyped period during which neural signals could be compared without confounding overt behavioral differences. To accomplish both in a setting in which vStr gamma oscillations have been previously found, we constructed an elevated linear track from wood, painted matte black, 1.8m in length and 10cm wide. The ends of the linear track were equipped with custom-built food pellet reward receptacles, into which rats could nosepoke to break an infrared photobeam (Coulbourn; Figure 1A).
To trigger reward delivery, rats had to hold the nosepoke for 500ms, at which point an automated pellet dispenser (Coulbourn) released a number of food pellets (described below; pellets were 45mg Test Diet 5TUL). The first pellet arrived in the receptacle between 750 and 1000ms after reward delivery was triggered, resulting in a period of at least 1250ms (the 500ms hold plus at least 750ms of dispenser latency) during which rats awaited reward delivery while stationary at the reward site. A run from one receptacle to the other was defined as a trial, which could be successful (nosepoke held for at least 500ms) or an error (no nosepoke made, or nosepoke withdrawn before 500ms; no reward dispensed). Only successful trials were included for analysis.
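The trial outcome rule and the minimum stationary waiting period described above can be written out as a minimal sketch. The function names and the convention of passing `None` when no nosepoke was made are illustrative assumptions, not part of the original task software; only the 500ms hold and the 750-1000ms dispenser latency come from the text.

```python
# Timing constants taken from the task description.
HOLD_MS = 500                 # required nosepoke hold before reward is triggered
PELLET_LAG_MS = (750, 1000)   # first pellet arrives this long after triggering

def classify_trial(hold_ms):
    """Label a trial as 'success' or 'error'.

    hold_ms: nosepoke duration in ms, or None if no nosepoke was made
    (hypothetical convention for this sketch).
    """
    if hold_ms is not None and hold_ms >= HOLD_MS:
        return "success"
    return "error"

def min_wait_ms():
    """Shortest interval between nosepoke onset and first pellet arrival."""
    return HOLD_MS + PELLET_LAG_MS[0]

# Example: a missed poke, an early withdrawal, and a completed hold.
labels = [classify_trial(h) for h in (None, 320, 500)]
```

Here `labels` is `['error', 'error', 'success']`, and `min_wait_ms()` returns 1250, matching the waiting period stated in the text.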
Figure 1: Behavioral apparatus. A: Rats shuttled back and forth on a 1.8m linear track, with food pellet reward receptacles at each end. To obtain reward, rats were required to hold a nosepoke for 500ms. The number of pellets received was signaled by audio cues, presented when rats traversed a specific location near the center of the track (jittered by a random distance of up to 15cm on a trial-by-trial basis, to prevent cue onset from being predictable to the rats), and played from a speaker placed behind the currently rewarded receptacle.
The number of pellets delivered on a given trial was signaled by one of five audio cues, triggered when rats entered the center zone of the track (Figure 1A). On each trial, the position of the cue trigger zone was shifted by a random jitter between -15 and +15cm, to prevent cue onset from being predictable to the rats. The five audio cues were:
Cues were played from a speaker placed behind the currently armed receptacle, such that the average sound intensity measured at the center of the track was 75 dB. Cues remained on until (1) an unsuccessful nosepoke (early withdrawal) was made, (2) one second had elapsed after a successful nosepoke, or (3) the rat re-entered the trigger zone in the center of the track. Each cue was associated with a different reward outcome distribution:
The mapping from audio cues to outcome distributions was counterbalanced between subjects to ensure that differences in behavior between cues could not be the result of intrinsic salience or unconditioned responding to specific cues. To determine whether rats learned the association between cue and outcome distribution, we computed their running speed in the "run" epoch between cue onset and nosepoke; based on classic results (e.g. Crespi 1942) we expected rats to run faster in response to the 5-pellet cue than to the 1-pellet cue.
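As an illustration of the running speed measure, the sketch below computes mean speed over the "run" epoch from video tracking samples as path length divided by elapsed time. The exact speed estimator used in the study is not specified here, so this definition, the function name, and the argument layout are assumptions made for the example.

```python
def run_speed_cm_per_s(times_s, positions_cm, cue_onset_s, nosepoke_s):
    """Mean speed (cm/s) between cue onset and nosepoke.

    times_s / positions_cm: tracking samples along the track axis,
    with strictly increasing timestamps (assumed for this sketch).
    """
    # Keep only samples that fall inside the run epoch.
    samples = [(t, x) for t, x in zip(times_s, positions_cm)
               if cue_onset_s <= t <= nosepoke_s]
    if len(samples) < 2:
        raise ValueError("need at least two tracking samples in the run epoch")
    # Path length: summed absolute displacement between consecutive samples.
    path_cm = sum(abs(x1 - x0)
                  for (_, x0), (_, x1) in zip(samples, samples[1:]))
    duration_s = samples[-1][0] - samples[0][0]
    return path_cm / duration_s

# Toy example: the rat covers 50 cm in the 2 s between cue and nosepoke.
speed = run_speed_cm_per_s([0, 1, 2, 3], [0, 10, 30, 60],
                           cue_onset_s=1, nosepoke_s=3)
```

In this toy example `speed` is 25.0 cm/s. Averaging such per-trial values separately for each cue would give the comparison between the 5-pellet and 1-pellet cues described above.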
Daily training and recording sessions included two 20-minute blocks: a "value" block and a "risk" block. During the "value" block, outcomes with certain reward of 1 (low value), 3, and 5 (high value) pellets were pseudorandomly assigned to trials with frequencies of 0.4, 0.2 and 0.4 respectively (i.e. of 100 total trials, 40 are 1-pellet, 20 are 3-pellet, and 40 are 5-pellet), with the constraint that the same cue could not occur more than twice in succession. Similarly, the "risk" block consisted of low-risk (2 or 4 pellets, frequency 0.4), no-risk (certain 3 pellets, frequency 0.2) and high-risk (1 or 5 pellets, frequency 0.4) trials. The certain 3-pellet cue was included in both blocks to provide a consistent reference point for tracking possible changes in behavior across blocks; the comparisons of interest are between low and high value, and between low and high risk. Recording sessions additionally included 5 minutes of "off-task" recording in a separate container (a terra cotta flower pot filled with towels) before and after running on the track.
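One way to generate a trial sequence satisfying the constraints above (fixed cue frequencies, no cue more than twice in succession) is sketched below. This is not the authors' task code: the function name, the draw-without-replacement strategy with restarts, and the fixed 100-trial block length are assumptions for illustration, since the actual number of trials in a 20-minute block depended on the rat.

```python
import random

def make_block(counts, max_run=2, rng=None, max_tries=1000):
    """Return a pseudorandom cue sequence with the given per-cue counts
    in which no cue appears more than `max_run` times in a row."""
    rng = rng or random.Random()
    total = sum(counts.values())
    for _ in range(max_tries):
        remaining = dict(counts)
        seq = []
        while len(seq) < total:
            # Ban the last cue if it already fills a run of length max_run.
            tail = seq[-max_run:]
            banned = tail[0] if len(tail) == max_run and len(set(tail)) == 1 else None
            options = [c for c, n in remaining.items() if n > 0 and c != banned]
            if not options:
                break  # dead end: only the banned cue is left, so restart
            # Draw in proportion to how many trials of each cue remain.
            weights = [remaining[c] for c in options]
            choice = rng.choices(options, weights=weights)[0]
            seq.append(choice)
            remaining[choice] -= 1
        else:
            return seq  # loop finished without hitting a dead end
    raise RuntimeError("could not build a valid sequence")

# "Value" block with the 0.4 / 0.2 / 0.4 frequencies from the text.
value_block = make_block({"1-pellet": 40, "3-pellet": 20, "5-pellet": 40},
                         rng=random.Random(0))
```

The same function could be applied to the "risk" block by passing counts for the low-risk, no-risk, and high-risk cues; the pellet number on each risky trial would then be drawn separately.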