This is actually sort of an addendum to Part 4 of this series, some thoughts I had quite a bit after writing the original series.
Random Reinforcement is Not Reinforcing Randomly
This is just one of those little pedantic notes both to illustrate a different concept and to make sure that I didn’t give people the wrong idea about what I meant by writing confusingly. In Because We Let Them Part 1, I talked about how, once a behavior is established, moving to a random reinforcement schedule tends to reinforce it further. Basically, you end up teaching that it’s worth doing the behavior ‘just in case’ a reward is coming.
And, again, you do this only after reinforcing a behavior (such as “sit” with a dog or whatever you’re trying to get a human to do) consistently enough for it to become a normal behavior (so you reinforce the absolute hell out of it initially).
Then instead of rewarding every time they do it right, you start reinforcing on some random schedule. But the key is that you’re reinforcing a specific behavior that you want to increase the frequency of on a random schedule. What you’re not doing is reinforcing at random for non-specific behaviors. And this distinction will make sense below.
So here’s an example. As I mentioned, I taught Alfie “Red light” to mean “sit and stay until I give you the command to move” (which is “Green light”). He already knew sit, mind you, so it was easy to change this into “Red light” (I’ll explain at a later date how to get a dog to go from sitting briefly to sitting and staying). And initially I’d give him a high pitched “Yes” and a treat every time he sat on the “Red light” command. Every time.
And once it was a well established behavior, I moved to a random reinforcement schedule to reinforce it further. Except that I’m too anal compulsive to do truly random. I do a structured random schedule for as little sense as that makes. That might mean initially treating him every other time he did it. If we were practicing, I’d always treat the first one (make sure to get it reinforced) and get him to do it again but not treat it (he always gets the “Yes” mark, something else I’ll explain later).
And he’d sort of have this “WTF, where’s my treat man?” look on his face the first time he did it and didn’t get treated. Because up until that point he was used to getting a treat every time. So I’d do a third one and treat him to reassure him that treats were still forthcoming, at least sometimes. And not treat the fourth. I think you get the idea. He was getting enough rewards to do it “just in case” but just not every time.
And the next time we worked on it I might treat every third time I asked for “Red light” or whatever. I always treat the first and last (start and end on a high point) but in the middle it just depends. And now I’ve got him on a true random schedule for the really well established behaviors. New things I’m teaching him get reinforced more for what should be obvious reasons and I gradually phase out the treat over time.
Realistically he might get treated once every 10 times when we are out walking for extremely well established behaviors (I “red light” him at EVERY intersection so that I can stop him from running into traffic if he ever gets off leash) but he might still get a treat every time for new things and every other time for stuff in the middle. Enough so that he doesn’t lose interest in it totally but not so often as to use up all my treats.
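For anyone who thinks better in code, here is a rough sketch of that kind of structured schedule. The stage names and treat probabilities are my own illustrative assumptions pulled from the rough numbers above (every time for new stuff, about every other time in the middle, about 1 in 10 for well established behaviors), and plan_session is a made-up helper, not any kind of official training protocol.

```python
import random

# Illustrative assumptions only: rough treat odds per stage of learning.
TREAT_PROBABILITY = {"new": 1.0, "intermediate": 0.5, "established": 0.1}

def plan_session(stage, n_trials, rng=None):
    """Return one boolean per repetition: True means give a treat.

    The verbal "Yes" mark is assumed to be given every time regardless,
    so only the treat itself is scheduled here.
    """
    rng = rng or random.Random()
    p = TREAT_PROBABILITY[stage]
    # Each repetition is treated with probability p, independently.
    plan = [rng.random() < p for _ in range(n_trials)]
    if n_trials > 0:
        plan[0] = True   # always treat the first one to get it reinforced
        plan[-1] = True  # always end on a high point
    return plan

if __name__ == "__main__":
    for stage in TREAT_PROBABILITY:
        print(stage, plan_session(stage, 10))
```

The key point is that the randomness only decides when the treat shows up. The behavior being rewarded is always the same specific one.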
So that’s point one, just explaining a random reinforcement schedule a bit more. Now let’s look at how this is different from randomly reinforcing behaviors, which sounds the same but means something completely different.
This was something BF Skinner (the “father” of behaviorism) experimented with using his pigeons. Rather than rewarding them (or not) for individual behaviors (they would, for example, have to peck a bar to get a treat, and over time they’d start pecking the bar more), he set his machine to reward them totally at random. A treat would just drop out of the sky whenever. And he let it run to see what would develop.
And what he came back to were a bunch of pigeons acting crazy (even by pigeon standards). One would be balancing on one leg, another would be spinning in circles, a third swaying back and forth. What had happened is that with rewards coming at random, the birds had no clue what they were being rewarded for.
So invariably they just assumed that the “last thing they had been doing” before the reward must have been what got them the reward. So if one was standing on one leg when the treat came, it learned that “standing on one leg” got it the treat. If it was spinning around then clearly that got it the reward. Better spin some more.
I’d note that the same would hold for random punishments. If you punish something at random, it never learns what it’s being punished for. It just learns that any time, out of the blue, punishment might occur. This not only causes massive stress (for example, people in cities being bombed nightly are actually LESS stressed than those bombed at random; the ones being consistently bombed know it’s coming every night) but generates crazy-assed behaviors. Which is great if you want to generate crazy-assed behaviors. It’s not so good for actual behavioral change.
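If you want to play with the idea, here is a toy simulation of it. To be clear, this is my own simplified model and not Skinner’s actual apparatus or data: the behavior names, the reward_chance and boost numbers, and the simulate_pigeon function are all assumptions made up for illustration. The bird picks what to do in proportion to how much each behavior has paid off, the feeder fires at random no matter what the bird is doing, and whatever it happened to be doing at that moment gets the credit.

```python
import random

def simulate_pigeon(behaviors, steps=5000, reward_chance=0.05, boost=5.0, seed=None):
    """Toy model: rewards arrive at random, the last behavior gets the credit.

    Returns each behavior's share of the bird's final preference weights.
    """
    rng = random.Random(seed)
    weights = {b: 1.0 for b in behaviors}  # no preference to begin with
    for _ in range(steps):
        # The bird picks a behavior in proportion to its current weights.
        names = list(weights)
        current = rng.choices(names, weights=[weights[n] for n in names])[0]
        # The feeder fires independently of what the bird is doing.
        if rng.random() < reward_chance:
            weights[current] += boost  # whatever it was doing gets reinforced
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}

if __name__ == "__main__":
    birds = ["stand on one leg", "spin in circles", "sway back and forth"]
    for seed in range(3):
        print(seed, simulate_pigeon(birds, seed=seed))
```

Run it with a few different seeds and the shares usually end up lopsided, with different behaviors coming out on top for different “birds”. Nothing in the feeder favors any behavior; the early accidental pairings just snowball.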
Anyhow, why is this distinction between random reinforcement schedules and randomly reinforcing so important? First it goes to my point about not rewarding dogs (or humans) for completely random things. When you do that all it teaches them is that rewards come for no particular reason.
And no I’m not saying you can’t do nice stuff for people out of the blue. That’s part of being a decent human. I’m talking about behavior change stuff.
Remember NILIF? Nothing in life is free. Do something to get something. Give them something for nothing and they either learn that they have to do nothing and they still get rewarded (brat syndrome), or they don’t know what you expect of them (and they don’t know how to change their behavior). To actively change behavior X, you have to actively reward or punish behavior X or there is no connection between the reward (or punishment) and the behavior.
There is another amusing consequence that comes out of this: sports superstition. Consider a baseball player in the worst slump of his life, or an athlete trying to break through in competition. Fed up with what they have been doing, they try something totally random. Or something random just happens for no particular reason because that’s what random means.
They wash their socks, or don’t wash their socks, or grow a beard or shave it off (or in the case of Derek Parra, eat a sleeve of Fig Newtons the night before) and then play the game/have the competition of their life. Boom, a superstition is developed in a single go. Let’s look at why.
First and foremost, consider the reward. For many athletes, fixing a slump or breaking through to the next level of performance is about the most rewarding thing that could exist in their life.
When that happens, something major is getting reinforced. It’s not like giving a dog a tasty treat, nice but nothing life changing. For an athlete, this is a damn near life changing event. The reward was huge.
For giggles, consider how this might apply to the development of sexual fetishes. How the environment that the person was in when they had their first toe curling orgasm programs them to get off on that for the rest of their life. A single life altering experience can embed a behavior permanently (which is really more about imprinting but I’m not getting into that).
But what got reinforced in this case? Usually it was whatever random-ass (but irrelevant) thing happened to occur right before. It’s just an extreme example of a random reinforcement occurring that generates an absolutely massive reward. And that superstition will be locked in for pretty much as long as the athlete competes. Because even though conscious smart humans should know better, we are responsive to this stuff.
Clearly wearing different socks didn’t cause the player to play better, and neither did dirty socks or shaving or not shaving or Fig Newtons. Maybe it was just their time to break through, maybe they finally adapted to their training, maybe it was just one of those random statistical anomalies that explain both hot and cold streaks but don’t mean anything real. It doesn’t matter to the athlete.
Because on some level, the athlete figures it might have played a role and he’ll hedge his bets from then on, never playing without his lucky socks or whatever. This is especially the case if the random behavior has no real consequences and can’t “hurt” to engage in. Athletes won’t usually repeat behaviors that do them harm even if they might link up with the improved performance in some vague way.
And while silly as hell, this really isn’t any big deal in the big scheme of things with one major caveat: the athlete who is just convinced that the socks/no socks or beard/no beard or Fig Newtons are key to their performance will invariably perform like absolute crap if their mojo isn’t present.
Forget to take the magic socks on the road trip and the baseball player doesn’t get a hit; no time to grow a beard, the athlete performs for crap. Derek Parra used to pack boxes of Fig Newtons for big international travel trips to make sure he had them for every race. Imagine what would have happened if he forgot or customs wouldn’t let them through.
When you’ve learned that X is required for Y, Y won’t happen if X isn’t present (again, think about this in the context of true sexual fetishes where a person can’t perform at all without the fetish present).
Mojo and belief are powerful things; you mess with them at your own risk. It’s why I always face my weight plates in. But it’s just a behavior that got randomly reinforced in behaviorist terms. Which isn’t the same as a random reinforcement schedule.
And that brings me back to another quick comment on timing.
Timing of Rewards and Punishment
I mentioned in Because We Let Them Part 2 that, for dogs at least, reward or punishment has to come within about 1 second for it to do any good. Dogs have no real concept of long-term and the only way to have any impact on their behavior is to modify it in some way almost immediately.
If more than 1 second goes by and you punish or reward them, it’s just random noise; they have no idea what they are getting rewarded or punished for. For creatures without complex thinking skills, all they can relate is basically the last thing they did and the reward or punishment. Anything else doesn’t come into play.
Humans, at least, are marginally more complex. For example, consider someone who goes out for dinner and then to the opera on a Friday and wakes up Saturday sick as a dog. Strict behaviorism would predict that they’d link the opera with being sick since it was the closest event temporally. But almost nobody would do that (unless the opera really sucked I guess).
Rather, we’d assume food poisoning and link it with the food we ate. Because we can logically look at it and see that while it’s logical for something we ate to make us sick, going to the opera is unlikely to do so (again, unless you really hate opera).
Note: as pointed out in detail by a comment below the above is not a perfect example. Read Auggie’s comment below for more of an explanation. I was mainly trying to make a point about humans being able to separate two temporally unrelated actions on top of being able to conceptualize two things that might go together versus two that don’t.
Which is why you can technically explain to a human that “What you did last week was good or bad and you’re getting rewarded/punished for it now” and, in theory anyhow, they can link the two. They can process complex thoughts in a way animals can’t. Unlike the dog, the reward or punishment and the action being rewarded or punished don’t have to be immediately temporally related in practice.
But in principle, it’s probably still better to immediately reward or punish (and I’m using the terms here generally, read the main series for the details) under most circumstances if it’s possible. Giving an athlete a “good job” or a pat on the back (whatever they respond to in terms of reward) right after the effort is likely to have a bigger impact in terms of reinforcement (or extinction) of a given behavior than waiting. This is especially true for interpersonal relationships, especially when someone is doing something you don’t like.
If you let it happen and then wait a week to bring it up, the lesson you’re really teaching is ‘There are no immediate consequences’. Same as if you let it happen multiple times before addressing it. The lesson you think you’re sending, “This is unacceptable behavior and it will stop”, isn’t the lesson you’re really sending, which is ‘This is behavior that I’m going to let you get away with 2-3 times or wait a week to bring up’. If possible, reinforce or punish through whatever means as close in time to the behavior you want to modify as possible.
A Final Option When You Give the Ultimatum
The punchline of the over-written Because We Let Them series is that, as wonderful as it is to think that you can always modify behaviors to your liking, sometimes (and this is always in the case of someone doing something you dislike) it can’t be changed.
And it reaches a point where you have to sit the person down and give them the ultimatum “This changes or these are the consequences.” And I gave like 3 potential outcomes of that conversation although, really there are only two (either they change or they don’t, and if they don’t you enforce your consequences or you don’t).
Well, I left one out. And while it doesn’t change anything, my own obsessive tendencies make it important for me to spell this one out. The three I covered were:
- The person goes “You’re right, it won’t happen again” and the behavior stops.
- The person goes “I’ll try” and gives it 2-3 weeks of effort before reverting to their old behavior.
- The person goes “No deal, I’m not changing”.
And 2 and 3 are effectively the same end result: the behavior didn’t change and you enforce your consequences. Or you don’t and then you have no right to complain because you’re letting it happen. But again, there’s a fourth option I forgot to write about which is this: the person engages in the exact behavior that you told them bugs you EVEN MORE.
I’d describe this as the human behavioral equivalent of the oppositional reflex. Dogs don’t pull against a tight leash because they like to. They pull against us because we’re pulling against them. And some humans, in response to being told “I don’t like it when you do X” will do X even more just to be a pain in the ass. When you push, they push back even harder just to prove a point/show who’s really in charge (hence the comment about “The one willing to walk away is the one in control.”)
Basically, the more you bug them about not doing something the more likely they are to do it, and vice versa. Parents learn this the hard way: the boy that they FORBID their daughter from seeing is the guy she pursues more. When you tell a teenager NOT to do something, they are going to do it just to piss you off. Smart parents learn to use subtle reverse psychology on their kids.
Got a teenaged daughter dating jackasses because she knows you dislike it and she wants to piss you off? Tell the girl that you just love her new boyfriend, that you hope they get married and give you lots of grandchildren. She’ll drop him in a heartbeat because you’ve sidestepped her primary motivation in dating the guy which is to piss you off/get back at you.
She doesn’t want to date a guy you like any more than she wants to listen to music that you like. So tell her you just love her new Emo bullshit album and watch her trash it. Tell your wannabe headbanger kid that you love the new Slayer album. He’ll be listening to boy bands in no time. Maybe. Or you’ll be taking them to the concert.
And people do it in other situations. An athlete who is a prima donna and can’t make practice on time will get called on the carpet by the coach and told “Show up on time or else.” And the athlete will deliberately start showing up even later to try and show that they are the one in control/just to piss the coach off.
And it works for athletes who have coaches who won’t bench them/kick them off the team/etc. That is, when they know that they can’t or won’t be punished for their opposition, they will do what the coach tells them not to just to make a point.
But ultimately this is really the same as #2 and #3 above. Whether they were unable to change the behavior, said they were unwilling to change the behavior, or went the full-blown oppositional route and started doing it more just to piss you off, the end result is the same: the behavior didn’t change and you enforce your consequences. I’m only writing this up because it’s something that happens and I’m obsessive compulsive like that.
And if you don’t enforce them, not only do you not get to complain, you’ve just made the situation worse by bluffing. The behavior will continue or even get worse for the same reason: Because you let it happen.