Operant conditioning
Operant conditioning, so named by psychologist B. F. Skinner, is the modification of behavior (the actions of animals) brought about by the consequences that follow the behavior. In simple terms, behavior operates on the environment, producing various effects. The phrase operant conditioning draws out a crucial distinction from Pavlovian conditioning, which Skinner termed respondent conditioning: respondent behavior, like the dog's salivation or the knee-jerk, neither has much effect on the environment nor is changed in its occurrence by its effectiveness or ineffectiveness in the environment. The two types of conditioning are also conceptually different, as their names imply: operant conditioning is explained by its consequences (that is, functionally), while respondent conditioning is explained by its antecedents (that is, causally).
This distinction parallels the often-overlooked one between involuntary behavior (reflexes) and voluntary behavior (acts). The former occur essentially no matter what, given some stimulus, and nothing ensures that they act on the rest of the world; the latter are affected by how well or poorly they work, and hence are much more likely to do useful work for the animal in the world.
Operant conditioning, sometimes called instrumental conditioning or instrumental learning, was first extensively studied by Edward L. Thorndike (1874-1949). Thorndike's most famous work investigated the behavior of cats trying to escape from various home-made puzzle boxes. When first confined in the boxes, the cats took a long time to escape from each. With experience, however, ineffective responses occurred less frequently and successful responses occurred more quickly, enabling the cats to escape in less and less time over successive trials. In his Law of Effect, Thorndike theorized that successful responses, those producing satisfying consequences, were "stamped in" by the experience and thus occurred more frequently. Unsuccessful responses, those producing annoying consequences, were "stamped out" and subsequently occurred less frequently. In short, some consequences strengthened behavior and some consequences weakened it. This effect was (and sometimes still is) described as a strengthening of the association between the response and its effect, suggesting some kind of parallel to Pavlovian conditioning.
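The "stamping in" and "stamping out" described above can be illustrated with a toy simulation. This is only a minimal sketch, not Thorndike's own model: the response names, the multiplicative update rule, and the parameters `trials`, `step`, and `seed` are all assumptions made for illustration.

```python
import random


def simulate_puzzle_box(trials=30, step=0.2, seed=0):
    """Toy Law of Effect model (illustrative assumptions throughout).

    Returns (attempts_per_trial, strengths), where attempts_per_trial[i]
    is the number of responses emitted before escaping on trial i.
    """
    rng = random.Random(seed)
    # Hypothetical response repertoire; only "pull_loop" opens the box.
    strengths = {"scratch": 1.0, "meow": 1.0, "push_bars": 1.0, "pull_loop": 1.0}
    attempts_per_trial = []
    for _ in range(trials):
        attempts = 0
        while True:
            attempts += 1
            responses = list(strengths)
            weights = [strengths[r] for r in responses]
            # A response is emitted with probability proportional to its strength.
            choice = rng.choices(responses, weights=weights)[0]
            if choice == "pull_loop":
                strengths[choice] *= 1 + step  # satisfying consequence: "stamped in"
                break
            strengths[choice] *= 1 - step      # annoying consequence: "stamped out"
        attempts_per_trial.append(attempts)
    return attempts_per_trial, strengths


if __name__ == "__main__":
    attempts, strengths = simulate_puzzle_box()
    print("attempts per trial:", attempts)
    print("final strengths:", strengths)
```

Under these assumptions the successful response gains strength on every trial while the ineffective ones lose it, so the number of attempts per trial tends to fall with experience, loosely mirroring the shortening escape times Thorndike reported.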
The idea behind the Law of Effect is described in Skinner's terms by the notion of reinforcers. Reinforcers are those events that strengthen a response, i.e., whose rate controls the rate of that response. This neatly sidestepped Thorndike's notion of satisfaction, resulting in a term which was less theoretical and more simply descriptive: any event whose presence and absence control how often a response occurs is by definition a reinforcer for that response. The problem became not what "satisfying" meant, but the better-defined question of which events would reinforce which responses of which animals under which conditions. Skinner also introduced new definitions of stimulus and response, which were similarly adapted to the behavior actually observed. To Skinner, the discriminative stimulus (SD) was not a single physically defined kind of event, but an entire class of events (possibly quite different physically) which occasioned the same response. (In contrast with the reflex notion of a stimulus that elicits a response, a discriminative stimulus was held to increase the probability of the response.) Skinner's notion of the operant-conditioning response, called an operant, was similarly distinct from the physiologically defined reflex and classically conditioned responses, being a class of responses which share a consequence, e.g., depressing a lever, which rats commonly do in several distinct but functionally equivalent ways. The relation between the discriminative stimulus, the operant response, and the reinforcer has often been called the three-term contingency: under these (functional) conditions, this (functional) response will yield this reinforcer.
The two kinds of reinforcement are positive reinforcement and negative reinforcement. Positive reinforcement occurs when a behavior (response) is followed by a pleasant stimulus that rewards it. In the Skinner box experiment, an example of positive reinforcement is the rat pressing a lever and receiving a food reward. Negative reinforcement occurs when a behavior (response) is followed by the removal of an unpleasant stimulus. In the Skinner box experiment, an example of negative reinforcement is a loud noise sounding continuously inside the rat's cage until the rat presses the lever, at which point the noise ceases. In both kinds of reinforcement, the frequency of the response or behavior increases.
According to Skinner's theory of operant conditioning, there are two methods of decreasing a behavior or response: punishment and extinction. Punishment occurs when a behavior (response) is followed by the addition of an unpleasant stimulus or the removal of a pleasant stimulus. In the Skinner box experiment, an example is the rat pushing the lever and receiving a painful electric shock directly afterward. Extinction occurs when a behavior (response) that had previously been followed by a pleasant stimulus is followed by no stimulus at all. In the Skinner box experiment, an example is the rat being rewarded with a food pellet for pushing the lever several times, and then never receiving a food pellet for pushing the lever again. Eventually the rat would learn that no food would come, and would cease pushing the lever.
Both punishment and extinction serve to decrease behaviors, although Skinner stressed that extinction was the more powerful of the two. In real-life situations there are often other factors that cannot simply be eliminated, and punishments are not a great enough deterrent to prevent particular responses, because rewards are still associated with those behaviors. According to Skinner, only by completely eliminating the rewards (positive reinforcements) that follow particular behaviors will people (or animals) be sufficiently discouraged from repeating those behaviors. For example, this is one reason why many convicted felons reoffend: prison sentences are forms of punishment, but sometimes the punishment is not enough. The felons go back to the crimes they previously committed because they receive the same rewards they previously received for committing those crimes, and that behavior-reward connection is a greater motivator than the punishment-behavior connection is a deterrent.
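The definitions of reinforcement, punishment, and extinction given above can be summarized as a small lookup. The following is a minimal sketch for illustration only; the function name `classify_consequence` and its string arguments are hypothetical, and the `"none"` case assumes the response had previously been followed by a pleasant stimulus.

```python
def classify_consequence(change, kind=None):
    """Map what follows a response to (procedure, effect on the behavior).

    change: "added", "removed", or "none" (what happens to a stimulus)
    kind:   "pleasant" or "unpleasant" (ignored when change is "none")
    """
    if change == "none":
        # Assumes the response was previously reinforced (see lead-in).
        return ("extinction", "decreases")
    table = {
        ("added", "pleasant"): ("positive reinforcement", "increases"),
        ("removed", "unpleasant"): ("negative reinforcement", "increases"),
        ("added", "unpleasant"): ("punishment", "decreases"),
        ("removed", "pleasant"): ("punishment", "decreases"),
    }
    if (change, kind) not in table:
        raise ValueError(f"unknown combination: {change!r}, {kind!r}")
    return table[(change, kind)]


if __name__ == "__main__":
    # Skinner box examples from the text above.
    print(classify_consequence("added", "pleasant"))      # lever press -> food
    print(classify_consequence("removed", "unpleasant"))  # lever press -> noise stops
    print(classify_consequence("added", "unpleasant"))    # lever press -> shock
    print(classify_consequence("none"))                   # lever press -> nothing
```

The table makes the symmetry explicit: both kinds of reinforcement increase the behavior, while both forms of punishment, like extinction, decrease it.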
References
- Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. Acton, MA: Copley.
- Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.
- Skinner, B. F. (1957). Verbal behavior. Englewood Cliffs, NJ: Prentice Hall.
- Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Psychological Review Monograph Supplement, 2, 1-109.
External Links
- Journal of the Experimental Analysis of Behavior (http://www.envmed.rochester.edu/wwwrap/behavior/jeab/jeabhome.htm)
- Journal of Applied Behavior Analysis (http://www.envmed.rochester.edu/wwwrap/behavior/jaba/jabahome.htm)