Re: Misc comments on categorisation, problem solving, etc (Was Re: Aaron Somon's....)

From: Allan C Cybulskie (allan.c.cybulskie_at_yahoo.ca)
Date: 08/07/04


Date: Sat, 7 Aug 2004 12:03:24 -0400


"Wolf Kirchmeir" <wwolfkir@sympatico.ca> wrote in message
news:po7Qc.33622$Vm1.688467@news20.bellglobal.com...
> Neil W Rickert wrote:
>
> > Wolf Kirchmeir <wwolfkir@sympatico.ca> writes:
> >

> > I guess that makes me a behaviorist of sorts. But I am not a
> > radically stupid behaviorist. As far as I can tell, the "operant
> > conditioning" account cannot adequately explain the acquired ability
> > to solve original problems.
>
> That's because
> a) you persist in thinking of behaviour as wholly external, and ignore
> private behaviors (yet you use signs of private behaviours as criteria
> for "understanding"!);

If you elevate private behaviour to a high role in such behaviours, then you
are not contradicting mentalists, folk psychologists, and cognitivists at
all, since that's the only premise they really disagree with behaviourists
on, other than terminology.

Insist that internal speech and private behaviour are important and
different in critical ways, and you remove the major conceptual difference
between behaviourism (classically) and everything else.

> b) you fail to keep in mind that only behaviours that are performed can
> be conditioned;
> c) you forget the reward system that operates on "thinking." (Recall
> Archimedes' "Eureka!")

Thinking does not necessarily require a reward system. I mean, how many
people would do a scientific investigation for the feeling of satisfaction
at the end of it? And doesn't the feeling of satisfaction only come about
as a result OF the achievement of the goal as determined by the agent? It
sounds like when you talk about the reward you make it a critical part OF
the learning, instead of what it really is, which is merely the result of the
learning itself. You can learn, as I have said, without any feeling of reward
at all, or any aim for one.

> d) you forget about the experimentation that occurs during "solving an
> original problem", and don't seem to see that we pursue those
> experiments that are "rewarding" (to use a phrase I've encountered more
> than once in books about the pleasures of math.) **

But the reward is based entirely on the goal of the agent, and the
experimentation is not guided by the feeling, but the feeling is caused by
outcomes that seem to further the goal.

>
> I referred to poetry writing programs in another post: such programs use
> a mix of rules and random selection to produce phrases. Some of these are
> "striking" and "interesting" metaphors. Poets have reported that they
> experiment with variations on the phrases that "come to mind", or
> construct such variants according to some rules. Manuscript evidence
> supports their reports, by showing how many variants were tried before
> the draft was finalised. Eventually, the poet _selects_ the variants
> that "sound right," ie, the ones that provide the reward they are
> seeking. If the poet gets positive feedback from an audience, (s)he'll
> repeat that behaviour of "composing a poem." The whole process
> illustrates operant conditioning, IMO.

But it does not in a very important way: "positive" feedback is only useful
in shaping behaviour if that feedback is in accordance with the goals of the
agent. To take your own example, if you assumed that positive feedback in
this case was applause or appreciation, then it may not actually get them to
repeat the behaviour. For example, the poet might have been challenged to
prove that his audience didn't just approve any poem he made, no matter how
bad it was, and might have actually tried to write a poor poem. If the
audience approved of it, not only would the poet not have had the reward you
assume as part of the process, but he wouldn't have repeated the actions
taken in composing a poem that you assert he might have.

The more I read about this whole behaviourism debate, the more I
realize that the key to this entire picture for intelligent behaviour is the
goals and desires of the agent, and the more I realize that you and Glen --
perhaps unintentionally -- seem to minimize that role, which leads your views
to not ring totally true.

Intelligent beings set, for the most part, their own rewards and
punishments. There are few, if any, biological or social constraints that
cannot be turned by the views of the agent against what their intention
was. Pain can be considered a reward for some people. Hunger can be a
reward for people who are on a diet, and encourage the behaviour even though
it's supposed to discourage it. Pleasure can be a negative reinforcement
for someone trying to avoid it. And so on, and so forth.

In order to interpret the behaviour of the agent, and affect it, you have to
understand their goals and desires, or else they'll turn your attempts at
conditioning completely around on you. In short, an agent will only react
to the rewards and punishments offered IF THEY WANT TO, at least for
behaviours that should be intelligent. Intelligent conditioning is
conditioning in sync with the desires of the agent; brainwashing is an attempt
to condition without that synchronicity. And simple behaviourist techniques
don't necessarily promote a sync between desires and outcomes.

>
> ** I personally like thinking about the simpler math problems (I'm not a
> trained mathematician.) The pleasure I feel when arriving at some
> theorem is a reinforcer for this behaviour. The pleasure I feel when I
> confirm that my solution is correct is a further reinforcer. You see, my
> "private behaviours" are under the same constraints and shaping
> influences as my public ones.

But it seems to me that here your private behaviour and goals ARE the
constraints and shaping influence. If you didn't WANT to get right answers,
then solving them and being right wouldn't be any sort of reinforcer.

>
> Or consider a child that's "frustrated" with a math problem. Frustration
> is a powerful conditioner - allow the frustration to continue or repeat,
> and the child may well be turned off math altogether. So what the good
> teacher does? Provide some reinforcement for the problem solving
> behavior, or reconstruct the problem into smaller chunks, so that the
> child will continue to "try" -- until the problem is solved. At which
> point the child feels more or less satisfaction, which tends to
> condition the child to continue to work on math.

But, again, the frustration is only caused by the goals and wants of the
agent themselves. Some people really like doing complicated math problems,
and a stereotypical joke is about the people who get all ramped up and
frustrated over a problem, and then express pleasure or happiness because
the frustration indicates a challenge, which is precisely what they want.
For the child, there are USUALLY two components to the
frustration: 1) the feeling that they'll never solve the problem and 2) the
feeling that they could be doing something much more fun than trying to solve
the math problem. So breaking it into smaller chunks lets them feel that
the problem is solvable and gives them an idea that they are actually making
progress on the problem. Promising them a reward for solving the problem
can also help -- not because it is a behaviourist strategy -- but because it
may make the process of doing it worthwhile.

As a moderate procrastinator, I can tell you that I have a tendency to not be
able to complete tasks very well if I think that I could be doing more
useful things [grin].


