
In the hands of masters of the art, dog training
through the use of the principles of operant conditioning looks gloriously
effective and deceptively easy.
So how come all of our dogs aren’t getting high scores in competitions? Why
do some dogs that are trained with nothing but positive motivation and
positive reinforcement refuse to obey their handler at a trial? How come
some dogs trained only in positive-motivation obedience classes will
come when called or hold a down-stay reliably, and others won’t?
We are firm believers in the powers of positive thinking. But like any
training tool, misuse can cause more problems than you originally started
with, and you can very quickly find yourself in a very large hole. We’d
like to present some theories on the nature of positive failures, and
suggest some things that might help you out of that hole.
Let’s start with human factors. Many people’s training attitudes become
factors in their dog’s training. People become devotees of the positive
approach after surviving traumas associated with poorly-used or abusive
negative techniques. They vow never to hurt their dogs again in the name of
the sport of obedience, and by God, they will stick to the rules of operant
conditioning, and simply ignore unwanted behavior, and wait to reward
correct behavior. This militant attitude puts a lot of limitations on the
trainer’s options for methods of communication, while offering no limits to
the dog. This is not as good for the dog as you might think, especially if
the trainer actually has a very specific behavior in mind at the time of the
training session, and the dog has no clue what that behavior might be. The
trainer may have to wait for a long time for the correct behavior to be
offered, and the dog may get frustrated and quit before the happy accident
occurs. Then you have both dog and trainer immobilized. This part of the
syndrome can be worse with a previously trained dog than it is with a new
puppy who is more than thrilled to offer random behaviors at high speed.
With no negative consequence for the unwanted behavior, you may inadvertently
be shaping an unwanted behavior.
Our suggestion is to take the word “rules” out of your association with
operant conditioning techniques. Operant conditioning is a broad science
which is based on some generalizations about behavior patterns that have
been observed. For example, “behaviors that a subject finds rewarding will
tend to be repeated” (our italics). It’s an informative generalization, but
it is not a rule. You can tell it is not a rule by the number of dogs that
have been repeatedly rewarded for a behavior, but that don’t repeat the
behavior. Ah, but there’s a reason for that, found in another part of
operant conditioning theory that is a rule but is usually perceived as a
definition. This is the part that says, “A positive reinforcer is something
which the subject finds rewarding enough to cause the behavior associated
with it to tend to be repeated.” The unspoken part of this little rule is that
the hot dog that your subject finds rewarding enough to cause repetition of
behavior in your back yard does not compare with the rewards of gopher
hunting in the grass at the park. This is where your fanatic devotion to
purely positive reinforcement will be severely challenged, because you will
have to either associate yourself, and the behavior you want the dog to perform,
with something more rewarding than gopher hunting, or you will
have to find a way to make gopher hunting less rewarding than the reward
that you are prepared to produce. Now you could go the distance and put a
gopher down the front of your shirt so that your dog will now associate you
with hunting gophers. Or you could consider applying enough negative
reinforcers and/or punishment to lessen the attractiveness of gopher hunting
for the dog, and make heeling seem like a great deal of fun by comparison.
Ideally, this would be a really large nasty gopher that would bite your dog
on the nose (a punishment), and convince your dog to forego gopher hunting
for the rest of its life. Then you would be able to reward your dog for
avoiding gophers (positive reinforcement). In the absence of large
aggressive gophers, you may have to resort to a collar pop to interrupt the
gopher hunt as the most practical action. If the collar pop stops the gopher
hunt and makes the dog decide to heel instead, it is a negative reinforcer.
Accept that there are a lot of gray areas surrounding the differences
between “punishment” and “negative reinforcement”. You need to understand
the definitions of these words as they are used in the context of the
operant conditioning theories. Basically, punishment is understood to be a
consequence to a behavior, which the subject finds so annoying, distasteful,
painful, or otherwise de-motivating that the behavior associated with the
consequence tends not to occur again. On the other hand, a negative
reinforcer is a stimulus which the subject also finds annoying, distasteful,
painful, or otherwise undesirable, and tends to make a subject repeat a
behavior because it knows that the stimulus can be stopped by performing
this behavior. Punishment is a constant consequence of a behavior; a
negative reinforcer is something which the subject has the power to prevent
or stop. The prevention or cessation of the reinforcer is the reward.
The general definitions are pretty simple, but when you start thinking of
specific dog trainers’ behaviors, it’s not possible to make two distinct
lists of actions that would be considered punishment, and therefore to be
avoided at all costs, versus negative reinforcers which could be useful to
enhance the dog’s perception of the rewarding aspects of a particular behavior.
In the gopher hunting scenario, a pop on the collar could be considered a
punishment for gopher hunting, or a negative reinforcer for heeling, or doing
a recall, or whatever behavior the dog should have been doing instead of
gopher hunting. On the next try the dog has the power to avoid the collar pop
by choosing to perform the desired exercise.
Now the theory that “purely positive” followers would probably put forth is
that you could make the heeling or recall more fun. Do something unexpected
just prior to the gopher hunt to motivate your dog to make the right choice.
We agree that this approach has merit, but it requires a discerning, gifted
trainer who can recognize the “just prior” factor. If you wait until the
dog is in the hunting mode, and then you perform a motivational song and
dance, you may find your dog hunting phantom gophers when he’s bored just to
get you to do that cute thing again. We have actually seen dogs learn to
look away from their owners in order to get the trainers to bring the food
down to the dogs’ noses, or to break away and start a game more exciting
than sitting and watching the trainers. Even collar pops can become signals
to a dog that a game is about to start, which causes some dogs to repeat
behaviors that elicit collar pops from their trainers, immediately followed
by a game and praise. If you’re too fast with the praise, the dog cannot see
a clear contrast between the negativity he should strive to avoid and the
positive reinforcement he should be getting for offering and maintaining a
behavior.
This bit of confusion is why many trainers will limit themselves to praising
and rewarding good behaviors. Rather than misuse negative stimuli, the
trainer will choose to avoid them at all costs. But this cuts the amount of
information you can give to your dog in half. Ideally there is always a
positive and negative feedback loop in learning behaviors. We believe that
if you want to avoid physical aversives, you must find another way to inform
your dog that it is doing something wrong. The more advanced the training,
the more important this conditioned negative reinforcer is in helping your
dog to avoid tremendous amounts of mental stress.
“No,” you say, “I will do nothing negative. I will only praise and reward
good behavior until it becomes habitual.” But here’s the riddle that gets us
all: When, Grasshopper, is praise or reward an aversive stimulus? The answer:
When the praise is withdrawn.
A guide dog for the blind must not cross a street if a car is passing by; a
police dog must be recalled in the middle of a pursuit if the criminal gives
up; and a bomb detection dog must sit quietly to indicate a find. What the
training of these dogs has in common is that they must be reliable, 100% of
the time if possible. If you have visited Marine World (all positive training)
and watched a dolphin show, you might have heard an announcement like this
one: “The dolphins do not want to perform right now; please come back for our
next show in 1 hour.” We believe that positive reinforcement, positive
punishment, negative reinforcement, and negative punishment are all useful
training tools to achieve a reliable, well-trained dog.
We have classes in which we teach the dogs how to perform tricks; these
classes are based on positive reinforcement only, since after all it is not
that important if the dog does not perform a trick at that instant. In our
obedience classes the goal is to teach you how to achieve reliability
equivalent to the guide dog, the police dog or any service dog. When you say
“come” your dog comes, when you say “stay” your dog stays, no matter what.
In our classes we use our experience in training service dogs, police dogs,
and dogs that have won world championship competitions. Obedience training
is not about learning tricks; obedience training allows the owner and dog to
communicate and understand one another.
by Ivan Balabanov and Carrie Silva
The Doghouse, LLC