Re: An Intelligent Toy

From: Curt Welch (curt_at_kcwc.com)
Date: 06/14/04


Date: 14 Jun 2004 23:24:56 GMT

lesterDELzick@worldnet.att.net (Lester Zick) wrote:

> I appreciate the compliment, Curt, but I think you're missing the
> point of the exercise. The toy doesn't know what it should be doing.
> It doesn't know what it means to be intelligent; it doesn't know about
> behavior, risks, or rewards. It doesn't know anything. Given the data
> and at least a compare or xor instruction, it operates on the data. So
> what does it do?

So, you want me to explain how to build AI?

Or do you want me to give you insight into how to approach the problem?

The machine doesn't "know" anything. It's just a machine. But I believe
that's what we are as well. Our neurons and the rest of the stuff that
makes us up do not "know" anything. It's just matter interacting with
other matter.

When I talk about how a computer works, I talk about what it does. I talk
about what it does externally, and I can talk about what each of the parts
does, and I can talk about how they work together to create the system-wide
effects (behavior) that it produces.

I can talk about all these things for what I believe it takes to make an
intelligent machine (toy or otherwise). Do you want me to do that some more
or not?

Because you end your words above with "so what does it do", it makes me
wonder if you are confusing the machine with the mind in your
reference to "it" and in your reference to "do". And because of how you
use the words "it" and "do" you don't understand that I have already
answered your question. And if I were to talk more about what the machine
does and what all its parts do, you would still not understand that I had
answered your question. But maybe that's not why you are asking again.

I of course do not have human level intelligent machines working. So, I
clearly have some important details of what it does missing from the
picture. But, I can still talk for days about the details which I believe
are not missing from the picture.

Because you talk about data and an xor or compare instruction, maybe you
want me to talk about what actually happens at the low level. I'll do more
of that now in case that is what you were getting at.

Remember that I believe the answer to AI comes from building a
behavior-learning machine, so this is what I am describing.

First, the machine has to have sensory inputs and effector outputs. And it
needs hardware to process the data in the middle. You need more than
just an xor gate however because you need enough hardware to create
persistent internal state (i.e., memory). And you need hardware that can
mediate conflict. You can't do either of those with just an xor gate. If
you selected a NAND gate instead of an xor, then you could create memory
and all logical operations but you would still not be able to mediate
conflict (keeping independent signals from fighting each other for
control). This is done on a digital computer with the clock circuits and
other special hardware that force the operation of all the independent
logic gates to take turns and "play nice" together. It's what allows the
machine to follow a well defined sequence of events.
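
To make the NAND point concrete, here is a minimal sketch in Python of one bit of memory built from two cross-coupled NAND gates (an SR latch), plus an xor built from nothing but NANDs. The names nand, NandLatch and xor_from_nand are made up for this illustration, and the fixed settling loop stands in for the clocking hardware mentioned above.

```python
def nand(a: int, b: int) -> int:
    """Two-input NAND gate on bits (0 or 1)."""
    return 0 if (a and b) else 1


def xor_from_nand(a: int, b: int) -> int:
    """xor built from four NAND gates, showing NAND can express other logic."""
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


class NandLatch:
    """One bit of memory: two cross-coupled NAND gates (active-low inputs)."""

    def __init__(self) -> None:
        self.q, self.q_bar = 0, 1  # assumed power-on state

    def step(self, set_n: int, reset_n: int) -> int:
        # Re-evaluate both gates a few times so the feedback loop settles.
        for _ in range(4):
            q = nand(set_n, self.q_bar)
            q_bar = nand(reset_n, q)
            self.q, self.q_bar = q, q_bar
        return self.q


latch = NandLatch()
latch.step(0, 1)          # pull "set" low: the latch stores a 1
print(latch.step(1, 1))   # both inputs idle: the stored bit is still there
```

With both inputs held at 1 the latch simply keeps whatever it stored last, which is the persistent internal state the paragraph above asks for.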

In addition, you need the hardware built in such a fashion that it can change
its behavior in response to input data. That is, you need a compare
instruction on the computer.

In addition to the inputs, the outputs, and the persistent memory, the machine
needs a concept of time if it wants to be a real-time toy (which is the
really interesting form of intelligent machine in my mind). A Turing machine
has no concept of time (or sensory connections to the external world). It
only has a concept of a sequence of instructions. Real computers however can
deal with time. The issue is that if you receive two inputs, like a user
pressing keys on the keyboard, the intelligent machine needs to know how
much time has passed between these two events in terms of whether it's more
time, or less time, than the amount of time between two other past events.

If you connect the intelligent toy to a virtual world, then time can be
completely virtual as well. But if you want the intelligent toy to interact
with the real world, then it needs access to some hardware which it can use
to time events.
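
The timing requirement can be sketched in a few lines of Python. This is only an illustration: EventTimer and its method names are invented here, time.monotonic stands in for the timing hardware, and a made-up virtual clock can be passed in for the virtual-world case.

```python
import time
from typing import Callable, List


class EventTimer:
    """Records event timestamps and compares the gaps between them."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock            # swap in a virtual clock for a virtual world
        self.stamps: List[float] = []

    def mark(self) -> None:
        """Call this whenever an input event (e.g. a key press) arrives."""
        self.stamps.append(self.clock())

    def last_gap_vs_previous(self) -> int:
        """Return 1 if the newest gap is longer than the one before it,
        -1 if it is shorter, 0 if equal. Needs at least three marked events."""
        a, b, c = self.stamps[-3:]
        newest, previous = c - b, b - a
        return (newest > previous) - (newest < previous)


# Virtual time: the "clock" just replays a scripted list of timestamps.
ticks = iter([0.0, 1.0, 3.0])
timer = EventTimer(clock=lambda: next(ticks))
for _ in range(3):
    timer.mark()
print(timer.last_gap_vs_previous())  # gap of 2.0 after a gap of 1.0
```

The machine never needs the absolute time here, only the "more or less than before" comparison described above.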

But, once you have all that, you basically have a machine of the same power
as our computers and robots and I can now move on to talking about the AI
algorithm.

When you look at the problem from this low level, you find yourself asking
the question you asked above. What does it "do"? Well, the machine needs
to create outputs, so that's what it does. But why does it choose to
create one output instead of another? Why is one output "right" when
another is "wrong"?

This problem is answered by adding the reward input to the learning
algorithm. To the learning algorithm, the reward input is what gives the
machine purpose. It is what motivates the machine to produce one output
instead of some other. It is also the seed from which all the machine's
morals will grow. It is the only thing of "value" to the machine. The
purpose of the machine is to create as much value as possible - to create
as much reward as possible. It is allowed to produce any sequence of
outputs it wants, but its goal, and purpose, is to always produce the
sequence of outputs which produces the most reward at any point in time.

It does this by using statistics to analyze all its actions and to
correlate the amount of reward received with the actions that preceded the
reward, in an attempt to assign a value to all possible actions. The
machine then learns, over time, which actions produce the most value, and
will then choose the actions which produce the most value more often
than the actions which produce less value.
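
As a rough sketch of the bookkeeping this implies, the Python below keeps a running average of the reward that followed each action. It is a simplification: the class name ActionValues is invented, and it credits a reward only to the single action that came right before it, whereas a real system would have to spread credit over a longer run of preceding actions.

```python
from collections import defaultdict


class ActionValues:
    """Running average of the reward that followed each action."""

    def __init__(self) -> None:
        self.count = defaultdict(int)    # how often each action was tried
        self.value = defaultdict(float)  # average reward seen after it

    def record(self, action: str, reward: float) -> None:
        self.count[action] += 1
        # Incremental mean: nudge the estimate toward the newest reward.
        self.value[action] += (reward - self.value[action]) / self.count[action]


stats = ActionValues()
stats.record("left", 1.0)
stats.record("left", 0.0)
stats.record("right", 1.0)
print(dict(stats.value))
```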

However, since the environment is always too complex to fully predict, and
can be expected to be dynamic (something that worked in the past to produce
value may stop working in the future), and because the machine is learning,
and acting, at the same time, it should never choose only the action with
the highest value. One of the other actions may, over time, prove to be
better than the current high-value action, and if the machine never tries
it, it will never learn the true value of the other actions. So, there
must be some mapping from value to choice of action which favors the
actions with the highest current value, but yet continues to try other
actions. There are many ways to do that.
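
One of those many ways is softmax (Boltzmann) selection, sketched below in Python. The function name choose_action and the temperature parameter are assumptions made for this example: a low temperature almost always picks the current best action, a high one spreads choices more evenly.

```python
import math
import random
from typing import Dict, Optional


def choose_action(values: Dict[str, float], temperature: float = 0.5,
                  rng: Optional[random.Random] = None) -> str:
    """Pick an action at random, weighted toward higher current value."""
    rng = rng or random.Random()
    actions = list(values)
    best = max(values.values())
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp((values[a] - best) / temperature) for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)
picks = [choose_action({"a": 1.0, "b": 0.0}, rng=rng) for _ in range(1000)]
print(picks.count("a"), picks.count("b"))  # "a" dominates, "b" is still tried
```

The lower-valued action keeps getting tried occasionally, so its estimate can still be corrected if the environment changes.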

The inputs are used to define the context of all the action decisions. The
machine must have the power to statistically track actions, and choose
actions, based on the current context as defined by the current, and recent
past, inputs.

The above is enough to explain how to build my version of an intelligent
machine, and I think should answer your low level questions about "what does
it do" with the data. If it doesn't, I can talk in more detail and even
define it in C code if that helps you. The above is also a fairly standard
description of a reinforcement learning algorithm and there have been working
versions of these types of algorithms for over 50 years. Yet none of them
so far has shown much promise in producing "intelligence". This is why not
a lot of work has been done with them. They have attracted the attention
of many people over the years, yet no one has produced an algorithm strong
enough to be very interesting. Yet, at the same time, steady progress has
been made at improving our knowledge about how to build this class of
algorithms.

These algorithms have much in common with compression algorithms where
there is no "ultimate goal". You never know how much better you might be
able to make the algorithm until you actually do so.

These algorithms tend to be simple by nature. They have simple formulas
for defining the "context" from the inputs, and simple formulas for
tracking the statistics (value of each possible action), and simple
formulas for selecting actions based on their value. When they are written
in pseudo code, they tend to be about 10 or 20 lines long and that's all
there is to them. When you write real implementations, the details tend to
turn the code into hundreds of lines of code, but the concept is still very
simple.
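
To give a feel for how short such an algorithm can be, here is a minimal sketch in Python that puts the three pieces together: context from recent inputs, a value statistic per (context, action), and a selection rule that mostly exploits but sometimes explores. Everything named here (TinyLearner, epsilon, rate, the toy environment) is made up for illustration, and it assumes the reward arrives immediately after each action.

```python
import random
from collections import defaultdict, deque


class TinyLearner:
    def __init__(self, actions, history=2, epsilon=0.1, rate=0.1, seed=0):
        self.actions = list(actions)
        self.recent = deque(maxlen=history)  # recent inputs define the context
        self.value = defaultdict(float)      # (context, action) -> estimated reward
        self.epsilon, self.rate = epsilon, rate
        self.rng = random.Random(seed)
        self.last = None                     # (context, action) awaiting its reward

    def act(self, observation):
        self.recent.append(observation)
        context = tuple(self.recent)
        if self.rng.random() < self.epsilon:  # occasionally explore
            action = self.rng.choice(self.actions)
        else:                                 # otherwise use the best estimate
            action = max(self.actions, key=lambda a: self.value[(context, a)])
        self.last = (context, action)
        return action

    def reward(self, amount):
        key = self.last
        self.value[key] += self.rate * (amount - self.value[key])


# Toy environment: the "right" output is simply to echo the latest input bit.
learner, env, hits = TinyLearner(actions=[0, 1]), random.Random(2), 0
for step in range(2000):
    bit = env.randint(0, 1)
    r = 1.0 if learner.act(bit) == bit else 0.0
    learner.reward(r)
    hits += r if step >= 1000 else 0
print(hits)  # rewards collected in the second half, out of 1000
```

Nothing in the learner knows what the toy environment wants; the only thing steering it is the reward signal.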

Some of the most interesting areas for improvement in these algorithms are in
finding better ways to define the context for decision making from the raw
inputs. Much of the past history of these algorithms and techniques has
been based on the assumption that the "state" of the context is both well
known, and small - like the state of a tic-tac-toe game. There are only a
very small and finite number of states the game can be in, and only a very
small and limited number of possible decisions. So tracking statistics for
different contexts (every possible move based on every possible game
board) was easy to do on our current computers.
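
How small is "small"? The Python sketch below simply enumerates every tic-tac-toe board reachable by legal play (X moves first, play stops at a win or a full board). The board encoding as a 9-character string is an arbitrary choice for this example.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def wins(board: str, player: str) -> bool:
    """True if player holds all three cells of some row, column or diagonal."""
    return any(all(board[i] == player for i in line) for line in LINES)


def reachable_positions() -> set:
    """Every board that can occur in legal play, starting from the empty one."""
    seen = set()
    stack = [("." * 9, "X")]
    while stack:
        board, player = stack.pop()
        if board in seen:
            continue
        seen.add(board)
        if wins(board, "X") or wins(board, "O") or "." not in board:
            continue  # game over: no further moves from here
        nxt = "O" if player == "X" else "X"
        for i, cell in enumerate(board):
            if cell == ".":
                stack.append((board[:i] + player + board[i + 1:], nxt))
    return seen


print(len(reachable_positions()), "positions, upper bound", 3 ** 9)
```

A few thousand positions with at most nine moves each is a table any computer can hold, which is why one value statistic per (board, move) works for this game.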

But real world AI problems have far too many states to track. Even for a
simple game like checkers it becomes impossible to fully statistically track
all moves from all possible board positions. When you move up to trying
to analyze something interesting like image data, using simplistic state
definition systems becomes totally unworkable.

So, part of the hard problem with this approach is finding strong
algorithms which map input data into a finite number of "decision states"
which can be individually tracked for their value. If the entire game tree
for checkers has 10^20 possible forks in the game tree (that number is just
an example, I have no idea how large the game tree really is), how do you
play "good checkers" when you have hardware which can only track and
analyze the value of 10^6 different actions? And likewise, on the output
side, if you want the machine to be able to produce 10^20 different types
of "behavior", how do you do this with hardware that can only statistically
track 10^6 different decisions?
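
The crudest possible answer, shown only to make the bottleneck concrete, is to hash every raw input into one of 10^6 slots. In the Python sketch below, decision_state, update and TABLE_SIZE are invented names. Hashing lumps unrelated inputs together at random, so this is not a good mapping; the hard problem described above is finding one that groups inputs which really should share a value.

```python
import zlib

TABLE_SIZE = 10**6  # how many value statistics the hardware can hold


def decision_state(raw_input: bytes) -> int:
    """Fold an arbitrary input into one of TABLE_SIZE decision states."""
    return zlib.crc32(raw_input) % TABLE_SIZE


values = [0.0] * TABLE_SIZE  # one tracked value per decision state


def update(raw_input: bytes, reward: float, rate: float = 0.1) -> None:
    """Move the value of the input's decision state toward the reward."""
    slot = decision_state(raw_input)
    values[slot] += rate * (reward - values[slot])


update(b"some board position", 1.0)
print(values[decision_state(b"some board position")])
```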

This is the hard part of reinforcement learning algorithms that is
currently being worked on.

The TD-Gammon software is one example of how this can be done. It's a
backgammon playing program which used a reinforcement learning algorithm of
this type to learn, on its own, how to play a good game of backgammon.
From what I understand, it's the strongest playing backgammon program ever
written, and plays at the level of the top human players.

The number of board positions for backgammon is far too large to allow a
system using traditional reinforcement learning algorithms to simply track
the value of every possible move from every board position. So, instead of
using a memory system which could track 10^20 (or whatever the game tree
size was) move values, the memory was replaced with a function that
calculated the value for every game position. The function took as input
the current game position, and for output returned the value of that
position. The non-learning part of the algorithm then was hard-coded to
check the value this function produced for each possible move and see which
one changed the board into the position with the highest value. Through
traditional learning algorithms, the value of each game position was
adjusted by adjusting the function definition.

The function which mapped game position to value was a neural network and
was trained (adjusted) using standard neural net training algorithms.
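
The same idea can be sketched at toy scale in Python. This is not TD-Gammon's code: the "game" is a made-up walk along positions 0..N, the value function is a two-weight linear formula rather than a neural network, training uses random moves instead of self-play, and the update is plain TD(0). What it keeps is the structure described above: a function replaces the table, a hard-coded part picks the move leading to the highest-valued position, and learning adjusts the function.

```python
import random

N = 6  # positions 0..N; reaching N wins (1.0), reaching 0 loses (0.0)


def features(pos):
    """Describe a position with two numbers: a constant and the scaled position."""
    return [1.0, pos / N]


def value(weights, pos):
    """Estimated chance of winning from pos: a weighted sum of its features."""
    return sum(w * f for w, f in zip(weights, features(pos)))


def best_move(weights, pos):
    """Non-learning part: pick the move whose resulting position scores highest."""
    return max((pos - 1, pos + 1), key=lambda nxt: value(weights, nxt))


def train(episodes=2000, rate=0.05, seed=0):
    rng = random.Random(seed)
    weights = [0.0, 0.0]
    for _ in range(episodes):
        pos = N // 2
        while 0 < pos < N:
            nxt = pos + rng.choice((-1, 1))  # random play, for simplicity
            if nxt == N:
                target = 1.0
            elif nxt == 0:
                target = 0.0
            else:
                target = value(weights, nxt)  # bootstrap from the next estimate
            error = target - value(weights, pos)
            # Adjust the function itself instead of a per-position table entry.
            weights = [w + rate * error * f
                       for w, f in zip(weights, features(pos))]
            pos = nxt
    return weights


w = train()
print(w, best_move(w, 3))
```

Two weights end up standing in for a value at every position, which is the same trade TD-Gammon makes at a vastly larger scale.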

Much of how this was configured was optimized to fit the needs of
backgammon, and would not directly map to other domains. But, what it
demonstrates very nicely is how a very large problem space which seems to
need an intractably large number of "value" statistics, can be very nicely
replaced with a small network of numbers (I think its author used something
on the order of only 80x198 weights in the network). So instead of memory
that tracked 10^20 different values, he was able to make do with about 10^4
values and still play the game as well as the best human players.

Finding a strong general purpose algorithm to map complex input state to a
small number of tracking values is key to building strong AI. My current
network algorithms show great promise in doing this.

In the backgammon game, the number of "behaviors" is very limited. There
are only a small number of moves (less than 100) available for each turn.
So it's easy for the algorithm to check the value of each possible move and
pick one. In the full AI problem, the number of possible outputs is just
as intractable as the number of possible input states. A system with only
20 binary outputs for example has 2^20 possible "moves" it can make at any
point in time. So you can't track the "value" of every possible move for a
given context any more than you can track the value of every possible
context. So the same type of system of replacing 2^20 output "states", with
a much smaller number of "value" statistics has to be created to solve
complex output behavior.

Neural networks do this. But they tend to be used only in supervised learning
problems where the correct output is always known (as was the case for the
network's use in TD-Gammon to replace the value memory system). The trick is
to allow each possible output behavior to be a function of not just one
internal value judgment, but to instead be based on a combination of value
judgments.

For the 20 binary outputs, a simple way to change 2^20 value statistics
into 40 value statistics is to make each output independent. Each binary
output has two value functions, one for defining the value of outputting a 1
in the current input context, and one for outputting a 0 in that same
context.

This however would fail to correctly cross-connect or synchronize the
actions of each output. It would make the system act with 20 different and
independent personalities. They would have no ability to act in unison for
the same "purpose".
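
The independent-output scheme from the last two paragraphs can be sketched in Python as follows. The class IndependentOutputs and its methods are hypothetical names, and the sketch is purely greedy (no exploration) to keep it short.

```python
from collections import defaultdict

N_OUTPUTS = 20


class IndependentOutputs:
    """Each output bit keeps its own two value statistics per context."""

    def __init__(self, rate: float = 0.1) -> None:
        # value[(context, output_index, bit)] -> estimated reward
        self.value = defaultdict(float)
        self.rate = rate

    def act(self, context):
        """Each output decides alone: emit 1 only if 1 currently looks better."""
        return tuple(
            int(self.value[(context, i, 1)] > self.value[(context, i, 0)])
            for i in range(N_OUTPUTS)
        )

    def learn(self, context, bits, reward: float) -> None:
        """Every output credits the shared reward to the bit it emitted."""
        for i, bit in enumerate(bits):
            key = (context, i, bit)
            self.value[key] += self.rate * (reward - self.value[key])


agent = IndependentOutputs()
agent.act("ctx")
print(len(agent.value), "statistics instead of", 2 ** N_OUTPUTS)
```

The weakness shows up as soon as the reward depends on a combination of outputs: each output only sees the reward averaged over whatever the other 19 happened to do, so no statistic anywhere records that two outputs should move together.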

The grand solution has to come from a double ended mesh network function
where each internal persistent "value" statistic is used as part of many
different input states, and many different output behaviors.

The current "state" of the machine (which represents the entire "important"
state of the world to the machine) will be represented by the current
subset of active values, and the output generated will likewise be a
combined function of the behavior created by all the active values.

If a machine of this class can play human skill level backgammon using only
15840 (80*198) reinforcement learning values which were self taught by
allowing the machine to play itself, what would a general purpose network
be able to do with 10,000,000,000 values (which could easily be built
today) if the right algorithm were found for mapping input states to subsets
of these values and mapping that to different outputs? To me, human level
intelligence just doesn't feel that far away.

My current network works just like this. And that's why I have high
hopes for this type of design - it seems to be a design that has exactly
the powers to do the things that it needs to in order to create a very
strong learning algorithm. But, I've not spent enough time with this
design to really know how well it will perform.

So, there you have it. This is what an intelligent machine needs to do in
my opinion to have human-level skills.

There's still a big step from understanding how this type of low level
machine behavior could possibly create all the high level thought, and
perception, and desires we think of as "human conscious behavior". But
I've spent enough time thinking about that to believe this is the correct
path to get there. I don't care if the rest of the world believes it or not.
If I'm correct, we will be there before long, and then the rest of the
world can figure out for itself how we got there.

But since it seemed like you were asking about what the low-level hardware
would actually do, I wrote all the above to give you an idea of my vision.

If that is not enough for you to grasp what the hardware is doing, I can
explain in more detail. Just tell me what is not clear.

More likely however, your real problem will be in believing that this type
of hardware has anything to do with our intelligence and conscious
existence. I can talk more to that, but mostly it's a leap of faith that
few are willing to take. Though I have found many people that believe this
approach is interesting, I've come across no one that believes this could
possibly be "all that is needed" like I do. Even the people that find the
ideas interesting don't hold out any real hope that it will be the answer
to AI.

-- 
Curt Welch                                            http://CurtWelch.Com/
curt@kcwc.com                          Webmaster for http://NewsReader.Com/

