GNA uses a set of scripts to do AutomatedTopicClassification. Classification is a two-step process. The first step uses keyword matching to guess the general topic. Once the general topic is guessed, a Bayesian analyzer is used to refine the keyword classification.
The system works, but there are two problems with it.
* the first is that we are not sure how it works, or how to make it work better
* the second is that it is very tightly bound to the GNA data entry tables
What we found is that naive Bayesian classification does not work very well. What seems to work much better is to "add" the scores instead of multiplying them together.
-- JosephWang
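The difference between multiplying and "adding" scores can be sketched as follows. This is only an illustration in Python, not the code in Guess.pm: the score table, the topic names, and the `combine` and `rank` helpers are all hypothetical, and the numbers are invented. It shows one way the two rules differ: a single zero score eliminates a topic under multiplication, but not under addition.

```python
from math import prod

# Hypothetical per-word scores (invented numbers): how strongly each
# title word points to each topic. 0.0 means the word was never seen
# with that topic.
WORD_SCORES = {
    "business": {"Business": 0.60, "Languages": 0.05},
    "german": {"Business": 0.00, "Languages": 0.70},
}
TOPICS = ("Business", "Languages")


def combine(words, topic, how):
    """Combine the scores of the known words for one topic."""
    scores = [WORD_SCORES[w][topic] for w in words if w in WORD_SCORES]
    if not scores:
        return 0.0
    # "multiply" is the naive Bayes style product, "add" is the plain sum.
    return prod(scores) if how == "multiply" else sum(scores)


def rank(words, how):
    """Return a dict mapping each topic to its combined score."""
    return {topic: combine(words, topic, how) for topic in TOPICS}


if __name__ == "__main__":
    words = ["business", "german"]
    print("multiply:", rank(words, "multiply"))
    print("add:     ", rank(words, "add"))
```

With these made-up numbers, multiplication drops Business to zero because of the one zero score for "german", while addition keeps both topics in play with Languages slightly ahead.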
The Automated Topic Classification script is located at
http://www.gnacademy.org/src/bin/assign.topics and requires a number
of libraries at http://www.gnacademy.org/src/lib as well as an
index generation script at http://www.gnacademy.org/src/bin/make.wordtable
The main Perl module is at
http://www.gnacademy.org/src/lib/GnaCatalog/Guess.pm
The name needs to be changed to fit CPAN standards, probably to Text::Classifier.
There are a number of support files scattered throughout the site.
Please contact us if you are interested in any of this.
The system first looks at the class id and
attempts to guess a general topic. For example, sci probably
indicates some sort of science.
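A minimal sketch of this first pass, in Python, might look like the following. Only the sci example comes from the description above. The other keywords, the topic names, the class id format, and the `guess_general_topic` function are assumptions made for illustration, and the real script may match keywords differently.

```python
# Hypothetical keyword table: class-id fragment -> general topic.
# Only "sci" is taken from the text; the other entries are made up.
KEYWORD_TOPICS = {
    "sci": "Science",
    "bus": "Business",
    "lang": "Languages",
}


def guess_general_topic(class_id):
    """Return the topic of the first keyword found in the class id, or None."""
    lowered = class_id.lower()
    for keyword, topic in KEYWORD_TOPICS.items():
        if keyword in lowered:
            return topic
    return None  # no keyword matched; leave the general topic undecided


if __name__ == "__main__":
    print(guess_general_topic("SCI-101"))
    print(guess_general_topic("xyz-200"))
```

Returning None when nothing matches leaves the decision to the second pass instead of forcing a wrong general topic.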
After the first pass, it does a Bayesian analysis using the titles
of the courses already in the database. For example, for the course
Business German, it finds which existing courses have the words German
and Business in their titles and then puts the new course in the same
category as those courses.
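The second pass can be sketched in Python along these lines. The catalog entries, the category names, and the `build_word_table` and `classify` functions are hypothetical. The real word table is produced by make.wordtable and its format is not described here. The sketch adds per-word scores rather than multiplying them, which is the variant reported above to work better.

```python
from collections import Counter, defaultdict

# Hypothetical catalog of already-classified courses: (title, category).
CATALOG = [
    ("Introductory German", "Languages"),
    ("German Conversation", "Languages"),
    ("Business Accounting", "Business"),
    ("Business French", "Languages"),
]


def build_word_table(catalog):
    """Count, for each title word, how often it appears in each category."""
    table = defaultdict(Counter)
    for title, category in catalog:
        for word in title.lower().split():
            table[word][category] += 1
    return table


def classify(title, table):
    """Pick the category with the highest summed per-word score, or None."""
    totals = Counter()
    for word in title.lower().split():
        counts = table.get(word)
        if not counts:
            continue  # word never seen in an existing title
        seen = sum(counts.values())
        for category, n in counts.items():
            # Score = share of the titles containing this word that
            # belong to this category; scores are added across words.
            totals[category] += n / seen
    return totals.most_common(1)[0][0] if totals else None


if __name__ == "__main__":
    word_table = build_word_table(CATALOG)
    print(classify("Business German", word_table))
```

In this made-up catalog, "german" appears only in Languages titles and "business" is split between the two categories, so Business German lands in Languages.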