It’s alive… ALIVE!

Mar 12, 2004

It’s alive… ALIVE!
Not really, but it does learn.

Over the past few days I have been working moderately in getting my Go5 learning element up and working, and today I confidently exclaim success (Javadocs found here). The learning engine is based on the Specialization/Generalization (candidate elimination) algorithm. Essentially, it learns by observing positive and negative examples and inferring from them the information that is important for classifying similar examples in the future. I won’t go into the implementation of the algorithm itself at this time (I defer that until the paper is completed), however I will give an example of how it works.

Envision a scenario where I go to my local comic shop searching for any Cerebus books in near mint condition only. This shop is run entirely by robots that use the candidate elimination algorithm to help customers find what they are looking for. All comics at the shop have the following attributes: Title, Number, Price, Format, Condition, and of course its desirability. In order for the clerk robot to figure out what I’m looking for using the candidate elimination algorithm, it needs to juggle two sets, G and S. G refers to the most general search criteria, while S refers to the most refined. Using the CE algorithm, the learning element is considered to have figured out what I’m looking for when S and G are the same.

Starting off with the comic [Cerebus, 100, \$5, TPB, NM] which I accept as desirable: ``` Generalization initialized to [[?v, ?v, ?v, ?v, ?v]] Specialization initialized to [[Cerebus, 100, \$5, TPB, NM]]```

As you can see, G is filled entirely with variables and means that based on it, I am looking for anything. S on the other hand is set to the example itself, which means that I may be looking only for this precise comic.

The robot then pulls out another comic [Cerebus, 200, \$3, Pamphlet, VF], which I do not like:``` G = [[?v, 100, ?v, ?v, ?v], [?v, ?v, \$5, ?v, ?v], [?v, ?v, ?v, TPB, ?v], [?v, ?v, ?v, ?v, NM]] S = [[Cerebus, 100, \$5, TPB, NM]]```

The G set grows because it is trying to move toward the S set (which was the only acceptable comic at this point).

The robot then pulls out another comic [Cerebus, 200, \$5, Reprint, NM], which I accept:``` G = [[?v, ?v, \$5, ?v, ?v], [?v, ?v, ?v, ?v, NM]] S = [[Cerebus, ?v, \$5, ?v, NM]]```

The G set this time shrinks because the latest comic only matches in two places (\$5 and NM) with the S set. The S set on the other hand starts moving towards the G set by substituting variables for attributes that were unmatched to the latest acceptable comic (100 to 200 and TPB to Reprint).

Another comic is presented [FromHell, 1, \$5, TPB, NM], and is absolutely unacceptable:``` G = [[Cerebus, ?v, \$5, ?v, ?v], [Cerebus, ?v, ?v, ?v, NM]] S = [[Cerebus, ?v, \$5, ?v, NM]]```

Again, the S and G sets converge closer to one another.

Finally, a comic is presented [Cerebus, 150, \$4, TPB, NM], and is acceptable:``` G = [[Cerebus, ?v, ?v, ?v, NM]] S = [[Cerebus, ?v, ?v, ?v, NM]]```

This final comic provides a eureka moment for the robot because it has finally figured out that I am looking for NM Cerebus books — all other attributes are irrelevant.

This process will provide the core of the Go5 program. Wish me luck.

-m