Cholecystokinin Attenuates Incentive Learning in Rats

Behavioral Neuroscience 1995, Vol. 109, No. 2, 312-319

Copyright 1995 by the American Psychological Association, Inc. 0735-7044/95/$3.00

Bernard Balleine, Alison Davies, and Anthony Dickinson
University of Cambridge

The hypothesis that endogenous cholecystokinin (CCK) reduces the incentive value assigned to food was examined by training undeprived rats to lever press and chain pull, with one action earning food pellets and the other maltodextrin solution. All animals were then food deprived and reexposed to one outcome after an injection of CCK-8 and the other after an injection of vehicle (VEH). When maintained food deprived and given a choice between the lever and chain in an extinction test, the rats performed fewer of the action trained with the outcome that was reexposed under CCK whether tested under CCK or VEH. In a subsequent experiment, this preference was attenuated by coadministration of the 5-hydroxytryptamine1A (5-HT1A) agonist 8-hydroxy-2-(di-n-propylamino)tetralin during reexposure, suggesting that CCK interacts with 5-HT to modify incentive value.

Although for many years instrumental conditioning was thought to be mediated by basic stimulus-response processes, it is now generally accepted that instrumental actions reflect learning about the contingency between the action and reinforcer or outcome, whether this learning is conceptualized in terms of a belief concerning the consequences of an action or in terms of the formation of an action-outcome association (cf. Dickinson, 1989). In addition, it has become clear that the experienced incentive value of the outcome (i.e., its affective and motivationally relevant properties) controls the expression of instrumental learning in performance (Balleine, 1992; Balleine & Dickinson, 1994; Dickinson & Balleine, 1994). The role of the animal's evaluation of the instrumental outcome is particularly relevant to understanding the process by which instrumental performance is controlled by primary motivational states, such as hunger and thirst. Rather than being directly sensitive to changes in motivational state, instrumental performance is largely controlled by the way these states affect the incentive value of the outcome. Thus, a shift in the degree of primary motivation often influences instrumental performance only if animals are allowed consummatory contact with the instrumental outcome after the shift, thereby allowing an evaluation of the effect of the new motivational state on the incentive properties of the outcome. We have referred to the process by which motivational states determine and control outcome value as incentive learning (Balleine, 1992; Dickinson & Balleine, 1994).

Evidence that incentive learning plays a role in the way primary motivational states control instrumental performance is derived from experiments assessing the effects of a posttraining shift in food deprivation. In one such study, for example, Balleine and Dickinson (1994) trained food-deprived rats to perform two actions (lever pressing and chain pulling) on a concurrent schedule, with one action earning access to food pellets and the other, maltodextrin solution. After training, the rats were given a number of reexposure sessions in which they were allowed to consume one of the instrumental outcomes in the training state (i.e., food deprived) and the other following a period of free feeding (i.e., undeprived). The rats were then given a single choice test between the chain and the lever when undeprived, with the test conducted in extinction. It was found that the animals performed significantly fewer of the action that, in training, had delivered the outcome to which they were subsequently reexposed in the undeprived state. Balleine and Dickinson (1994) interpreted this result as indicating that, during reexposure, the animals assigned a low incentive value to the outcome reexposed in the undeprived state, a differential evaluation that was then manifest in subsequent instrumental performance during the test. As such, this experiment suggests that, in instrumental conditioning, animals learn about the consequences of their actions and that motivational control of performance is then a matter of the way in which changes in deprivation act to modify the incentive value of those consequences.

An investigation of the physiological bases of incentive learning has focused on the role of the gut peptide cholecystokinin (CCK) in the changes in the incentive value of foods found to accompany a reduction in food deprivation (cf. Balleine & Dickinson, 1994).
The finding that nutritive substances act as a stimulus for the release of CCK, when coupled with the demonstrated anorectic effect of exogenously administered CCK, argues for a role of CCK in the short-term satiety that follows a period of feeding, the so-called CCK-satiety hypothesis (Gibbs, Young, & Smith, 1973a, 1973b; Smith & Gibbs, 1992). We (Balleine & Dickinson, 1994) extended this hypothesis by suggesting that CCK may also play a role in the determination of the low incentive value assigned to a food outcome when it is experienced in the undeprived state. Support for this proposal was gained initially through

Bernard Balleine, Alison Davies, and Anthony Dickinson, Department of Experimental Psychology, University of Cambridge, Cambridge, England. This research was supported by a Research Fellowship from Jesus College, Cambridge, to Bernard Balleine and by a Biotechnology and Biological Sciences Research Council grant. We thank Gerry Dawson of Merck Sharp & Dohme for the gift of the cholecystokinin-8 and Colin Dourish of Wyeth Research UK Ltd for the gift of the 8-OH-DPAT used in this study. Correspondence concerning this article should be addressed to Bernard Balleine, Jesus College, Cambridge, CB5 8BL, England.


investigating whether the CCK-A antagonist devazepide would block the assignment of a low incentive value when administered during the reexposure stage of a motivational shift study (Balleine & Dickinson, 1994, Experiment 4). As in the experiment described previously, hungry rats were trained to lever press and chain pull for food pellets and maltodextrin solution before receiving reexposure sessions to both outcomes when they were undeprived. This treatment should ordinarily be expected to produce an equivalent reduction in the incentive value of both outcomes. In this experiment, however, reexposure to one of the outcomes in the undeprived state was preceded by an injection of devazepide. Access to the other outcome was given undeprived after an injection of vehicle. If endogenous CCK is a signal released in the feeding system that reduces the incentive value of foods when animals are undeprived, devazepide should block the action of circulating CCK in these animals and so should act to increase the incentive value of foods reexposed after its injection. As a consequence, in a choice test conducted in extinction, undeprived animals should have performed more of that action that, in training, delivered the outcome subsequently reexposed under devazepide. This is exactly what Balleine and Dickinson (1994) found. In summary, CCK appears to act as a signal within the feeding system that reduces the incentive value of foods after a period of feeding. The first experiment reexamined this hypothesis by investigating the capacity of exogenous CCK to modulate the determination of incentive value.

Experiment 1

Incentive learning mediates not only the impact of a posttraining reduction in food deprivation on instrumental performance but also that of a posttraining increase in food deprivation (cf. Balleine, 1992, Experiment 2). Balleine (1992) trained rats to lever press and chain pull for food pellets and maltodextrin solution when undeprived before they were then given a choice between the lever and the chain when food deprived. Before this extinction test, the rats were given an incentive learning treatment, with one outcome exposed when the rats were food deprived and the other exposed when they were undeprived. According to the incentive learning account, this exposure treatment should have resulted in the assignment of a higher incentive value to the outcome contacted under food deprivation than to the outcome contacted undeprived, a difference that should then be manifest in the subsequent choice extinction test. In accord with this prediction, Balleine (1992) found that, on testing, the animals performed more of the action that, in training, had delivered the outcome to which they had been previously exposed when in the food-deprived state than the other action.

Experiment 1 utilized Balleine's (1992) basic design (see Figure 1). Rats were trained undeprived to lever press and chain pull for food pellets and maltodextrin. After training, animals were food deprived and given a series of reexposure sessions to both of the instrumental outcomes. In addition, animals received an injection of CCK-8 before reexposure to one outcome and an injection of saline vehicle (VEH) before reexposure to the other outcome. Reexposure to an outcome

TRAINING: L: A1 -> O1; A2 -> O2
REEXPOSURE: H: (CCK) O1; H: (VEH) O2
TEST: H: (VEH or CCK) A1 vs. A2

Figure 1. A schematic representation of the design of Experiment 1. A1 and A2 refer to the lever press and chain pull actions and O1 and O2 refer to the pellet and maltodextrin outcomes. L = low (or undeprived) food deprivation; H = high food deprivation; CCK = cholecystokinin; VEH = vehicle.

when food deprived should, according to incentive learning theory, cause animals to assign a higher incentive value to that outcome. If, however, exogenously administered CCK acts in similar fashion to endogenous CCK, the hypothesis that CCK acts to reduce the incentive value of nutritive commodities predicts that an injection of CCK before reexposure should attenuate the high incentive value ordinarily induced by food deprivation. Thus, if animals are subsequently given an extinction test food deprived, they should perform fewer of the action that, in training, delivered the outcome reexposed under CCK.

It may be argued, however, that reexposure to an outcome in a food-deprived state yields a high incentive value in spite of the presence of exogenous CCK but that this value becomes conditional on a CCK-induced state. From this state-dependency account, a test conducted in the absence of the CCK state predicts the same pattern of results as the incentive learning hypothesis. In contrast to this hypothesis, however, the state-dependency account further predicts that, if the test is conducted under CCK, the high value of the outcome conditional on the CCK state should then be expressed in performance. From this perspective, therefore, the action whose training outcome was reexposed under CCK should be performed at least as much if not more than the other action. If, however, as Balleine and Dickinson (1994) argued, CCK acts only to attenuate the incentive value of nutrient outcomes when they are contacted in the food-deprived state, the performance of the action that, in training, delivered the outcome reexposed under CCK should be reduced irrespective of whether the extinction test follows VEH or CCK injections. To assess this state-dependency account, half of the animals were tested after an injection of CCK, whereas the remaining animals were tested after an injection of VEH.

Method

Subjects and apparatus.
Sixteen naive adult male hooded Lister rats were used in Experiment 1. They were housed in squads of four and maintained on rat and mouse no. 1 (modified) low-protein, high-fiber, expanded pellets (Special Diet Services, UK). Training and testing took place in four Campden Instruments (Manchester, UK) operant chambers housed in sound- and light-resistant shells. Each chamber was equipped with a pellet dispenser, which delivered a 45-mg Noyes pellet (Formula A; Noyes, Lancaster, NH) into a recessed magazine that the rats could enter using a flap door positioned in the center of the front wall. In addition to the pellet dispenser, each chamber was also equipped with a dipper, which could deliver 0.05 ml of a 20% solution of Snowflake maltodextrin (Cerestar,


Manchester, UK), a complex polysaccharide flavored with 3% lemon juice, into the same recessed magazine. A lever was located to the right of the magazine flap door, and a chain was lowered through the roof from a microswitch so that it hung to the left of the flap door 3.5 cm from the front wall. Thus, the lever and chain were positioned symmetrically to the right- and left-hand sides of the magazine flap door, respectively. Each chamber was illuminated by a 3-W, 24-V houselight mounted in the center of the front wall above the flap door. A BBC microcomputer equipped with the SPIDER extension for on-line control (Paul Fray Ltd., Cambridge, UK) controlled the equipment and recorded the lever presses and chain pulls. The reexposure phase was conducted in four feeding cages. These were plastic boxes (30 cm long x 13 cm wide x 30 cm high) with wire mesh ceilings that were equipped with four small glass food dishes or with graduated plastic drinking tubes as appropriate. Procedure—instrumental training. The experiment was conducted in three stages: training, reexposure, and testing (see Figure 1). The rats were maintained undeprived throughout training with free access to tap water and their maintenance diet in their home cage. Each instrumental training session started with the onset of the houselight and ended with its offset. Initially, all subjects received two sessions of magazine training, in each of which 15 presentations of the food pellets and 15 presentations of the maltodextrin solution were delivered on a random-time 60-s schedule with the levers and chains withdrawn. In the first instrumental training session, the animals were trained to chain pull in the absence of the lever and then, in the subsequent session, to lever press in the absence of the chain. Instrumental responses were reinforced on a random-interval (RI) 2-s schedule during both sessions until 30 outcomes had been earned. 
For eight of the subjects, pressing the lever delivered the maltodextrin outcome and pulling the chain delivered the pellet outcome; the remaining eight subjects received the opposite action-outcome assignment. Rats were hand shaped where necessary until both instrumental actions were acquired. During concurrent training with both the lever and the chain, the pellet and the maltodextrin outcomes became available on a single RI schedule, with the available outcome specified as either the food pellet or maltodextrin with equal probability. The subroutine controlling the schedule was suspended whenever an outcome became available and restarted only once it had been delivered. During this training, 30 outcomes were delivered in each session; 15 contingent on one action and 15 contingent on the other. As soon as animals had obtained 15 of one type of outcome, this type was no longer delivered and the schedule immediately programmed the interval to the other outcome. The session continued until 15 of the other outcome had been delivered, at which point it ended. This type of concurrent schedule ensures that, on average, the outcomes are evenly distributed to the two actions across the session whatever the relative distribution of performance of the two actions. The nominal RI schedule value was 2 and 7 s for the first two sessions of concurrent training, respectively, and 15 s for each of the next five sessions. After the final session of training, the animals were assigned to conditions for reexposure and groups for testing. Reexposure and test phases. Immediately after the final training session, the animals were placed on a 22.5-hr food-deprivation schedule and were maintained on this deprivation schedule for the remainder of the experiment. On each of the next 4 days, the animals received reexposure to one of the outcomes in the feeding cages. The type of outcome was alternated across days so that two reexposures were given to each type of outcome. 
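The concurrent schedule described under instrumental training can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the original SPIDER control program: time is discretized into 1-s ticks, the random-interval timer arms with probability 1/RI on each tick, and the hypothetical "rat" makes exactly one response per tick, choosing the lever with probability `p_lever`. The function name, its parameters, and the fixed action-outcome mapping are all illustrative.

```python
import random


def simulate_concurrent_session(ri_seconds, p_lever=0.5, per_outcome=15, seed=0):
    """Simulate one concurrent-training session in 1-s ticks (illustrative only)."""
    if not 0.0 < p_lever < 1.0:
        raise ValueError("p_lever must lie strictly between 0 and 1")
    rng = random.Random(seed)
    # One of the two counterbalanced action-outcome assignments.
    action_for = {"pellet": "chain", "maltodextrin": "lever"}
    delivered = {"pellet": 0, "maltodextrin": 0}
    armed = None  # outcome currently available; None while the RI timer runs
    ticks = 0
    while any(n < per_outcome for n in delivered.values()):
        ticks += 1
        if armed is None:
            # The timer runs only while no outcome is pending (schedule suspended otherwise).
            if rng.random() < 1.0 / ri_seconds:
                # Equal probability while both outcomes remain; once 15 of one
                # type have been delivered, only the other type can be armed.
                remaining = [o for o, n in delivered.items() if n < per_outcome]
                armed = rng.choice(remaining)
        # Assumed response model: one response per tick.
        response = "lever" if rng.random() < p_lever else "chain"
        if armed is not None and response == action_for[armed]:
            delivered[armed] += 1
            armed = None  # restart the interval only after delivery
    return delivered, ticks
```

Calling `simulate_concurrent_session(15, p_lever=0.9)` with a strongly biased response model still ends with 15 outcomes of each type, which is the property the text attributes to this schedule: the distribution of outcomes across the two actions does not depend on the relative rates of the actions, only the session duration does.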
One outcome was presented after intraperitoneal injection of 4 µg/kg CCK-8 (sulphated), dissolved in physiological saline vehicle (4 µg/ml), and the other outcome was presented after an equivolume injection of saline vehicle (VEH). In all cases, injections were given 20 min before reexposure. For half of the animals assigned to each action-outcome condition, the maltodextrin

was reexposed under CCK and the pellets were reexposed under VEH. The remaining animals received the opposite drug-outcome assignment. The order of injections was counterbalanced so that half the animals in each drug-outcome assignment received a CCK-VEH-CCK-VEH injection order, with the remainder receiving a VEH-CCK-VEH-CCK injection order. During each reexposure session, the animals were placed in the feeding cages for 15 min and were allowed either to consume 30 pellets or to drink maltodextrin solution for the final 5 min of the reexposure session as appropriate. On the day after the final reexposure session, performance on the lever-press and chain-pull actions was assessed in a single 15-min choice test in the absence of the outcomes. Half the animals in each of the action-outcome and drug-outcome counterbalancing conditions were tested under CCK (Group CCK). The remaining animals were tested under VEH (Group VEH). The appropriate injections were given 20 min before testing. An α level of .05 was used to assess the statistical significance of the data analyses in each of the experiments.
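The counterbalancing described in the Method can be laid out explicitly. The sketch below builds one assignment of the 16 rats that satisfies the stated constraints by fully crossing four two-level factors (action-outcome mapping, outcome reexposed under CCK, injection order, and test group). The full crossing is an assumption: the text specifies the halving at each step but does not say that injection order was crossed with every other factor. The names `build_design` and `reexposure_days` are hypothetical.

```python
from collections import Counter
from itertools import product

ACTION_OUTCOME = (
    "lever->maltodextrin, chain->pellet",
    "lever->pellet, chain->maltodextrin",
)
CCK_OUTCOME = ("maltodextrin", "pellet")  # outcome reexposed under CCK
INJECTION_ORDER = ("CCK-VEH-CCK-VEH", "VEH-CCK-VEH-CCK")
TEST_GROUP = ("CCK", "VEH")


def build_design():
    """Return one row per rat; assumes a fully crossed 2 x 2 x 2 x 2 layout."""
    cells = product(ACTION_OUTCOME, CCK_OUTCOME, INJECTION_ORDER, TEST_GROUP)
    return [
        {
            "rat": i + 1,
            "action_outcome": a,
            "cck_outcome": c,
            "injection_order": o,
            "test_group": t,
        }
        for i, (a, c, o, t) in enumerate(cells)
    ]


def reexposure_days(row):
    """Return the (drug, outcome) pair for each of the four reexposure days."""
    other = "pellet" if row["cck_outcome"] == "maltodextrin" else "maltodextrin"
    return [
        (drug, row["cck_outcome"] if drug == "CCK" else other)
        for drug in row["injection_order"].split("-")
    ]


if __name__ == "__main__":
    design = build_design()
    print(Counter(r["test_group"] for r in design))
    print(reexposure_days(design[0]))
```

Because the drug alternates across days and each drug is tied to one outcome, the outcome type also alternates across the 4 days, giving two reexposures to each outcome as the procedure requires.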

Results and Discussion

The results from the extinction test of Experiment 1 are presented in Figure 2. This figure shows the performance of each action in the test, divided into 3-min bins, for each group. The results from animals in Group CCK are presented in the right panel, and those from Group VEH are presented in the left panel. In general, inspection of Figure 2 suggests that animals performed fewer of that action that, in training, had delivered the outcome subsequently reexposed under CCK than of the other action. This effect was evident whether the animals were tested under VEH or under CCK. To assess this description of the data, we conducted a two-way mixed analysis of variance (ANOVA) using a between-subjects factor of test, separating animals tested under CCK from those tested under VEH, and a within-subjects factor of reexposure, which distinguished the performance of the action whose training outcome was reexposed under CCK from that