It’s alive… ALIVE!
Not really, but it does learn.
Over the past few days I have been working moderately in getting
my Go5 learning element up and working, and today I confidently exclaim
success (Javadocs found here).
The learning engine is based on the Specialization/Generalization
(candidate elimination) algorithm. Essentially, it learns by observing
positive and negative examples and inferring from them the information
that is important for classifying similar examples in the future. I
won’t go into the implementation of the algorithm itself at this time (I
defer that until the paper is completed), however I will give an example
of how it works.
Envision a scenario where I go to my local comic shop
searching for any Cerebus books in near mint condition only. This shop
is run entirely by robots that use the candidate elimination algorithm
to help customers find what they are looking for. All comics at the shop
have the following attributes: Title, Number, Price, Format, Condition,
and of course its desirability. In order for the clerk robot to figure
out what I’m looking for using the candidate elimination algorithm, it
needs to juggle two sets, G and S. G refers to the most general search
criteria, while S refers to the most refined. Using the CE algorithm,
the learning element is considered to have figured out what I’m looking
for when S and G are the same.
Starting off with the comic [Cerebus, 100, $5, TPB, NM]
which I accept as desirable:
Generalization initialized to
[[?v, ?v, ?v, ?v, ?v]]
Specialization initialized to [[Cerebus,
100, $5, TPB, NM]]
As you can see, G is filled
entirely with variables and means that based on it, I am looking for
anything. S on the other hand is set to the example itself, which means
that I may be looking only for this precise comic.
The
robot then pulls out another comic [Cerebus, 200, $3, Pamphlet, VF],
which I do not like:
G = [[?v, 100, ?v, ?v, ?v], [?v, ?v,
$5, ?v, ?v], [?v, ?v, ?v, TPB, ?v], [?v, ?v, ?v, ?v, NM]]
S =
[[Cerebus, 100, $5, TPB, NM]]
The G set grows
because it is trying to move toward the S set (which was the only
acceptable comic at this point).
The robot then pulls out
another comic [Cerebus, 200, $5, Reprint, NM], which I accept:
G = [[?v, ?v, $5, ?v, ?v], [?v, ?v, ?v, ?v, NM]]
S =
[[Cerebus, ?v, $5, ?v, NM]]
The G set this time
shrinks because the latest comic only matches in two places ($5 and NM)
with the S set. The S set on the other hand starts moving towards the G
set by substituting variables for attributes that were unmatched to the
latest acceptable comic (100 to 200 and TPB to Reprint).
Another comic is presented [FromHell, 1, $5, TPB, NM], and is
absolutely unacceptable:
G = [[Cerebus, ?v, $5, ?v, ?v],
[Cerebus, ?v, ?v, ?v, NM]]
S = [[Cerebus, ?v, $5, ?v, NM]]
Again, the S and G sets converge closer to one another.
Finally, a comic is presented [Cerebus, 150, $4, TPB, NM],
and is acceptable:
G = [[Cerebus, ?v, ?v, ?v, NM]]
S =
[[Cerebus, ?v, ?v, ?v, NM]]
This final comic
provides a eureka moment for the robot because it has finally figured
out that I am looking for NM Cerebus books – all other attributes are
irrelevant.
This process will provide the core of the Go5 program. Wish
me luck.
-m