AutomatedTopicClassification (r3, 17 Apr 2005, 20:50, Main.joe)


GNA uses a set of scripts to do AutomatedTopicClassification. Classification is a two-step process. The first step uses a set of keyword rules to guess the general topic. Once the general topic is guessed, a Bayesian analyzer is used to refine the keyword classification.

The system works, but there are two problems with it.

* The first is that we are not sure how it works or how to make it work better.
* The second is that it is very tightly bound to the GNA data entry tables.

What we found is that naive Bayesian classification, which multiplies the per-word scores together, does not work very well. What seems to work much better is to "add" the scores instead of multiplying them.
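As a rough illustration of the difference between multiplying and adding scores, here is a minimal Python sketch. It is not taken from the GNA scripts: the function names, the topics, and the per-word scores are all made up for the example. It only shows that the two ways of combining scores can pick different topics, because a single small factor can dominate a product.

```python
import math

def product_score(word_scores):
    """Naive-Bayes-style combination: multiply the per-word scores (via logs)."""
    return math.exp(sum(math.log(p) for p in word_scores))

def additive_score(word_scores):
    """Additive combination: simply sum the per-word scores."""
    return sum(word_scores)

def best_topic(scores_by_topic, combine):
    """Return the topic whose combined score is highest."""
    return max(scores_by_topic, key=lambda topic: combine(scores_by_topic[topic]))

# Hypothetical scores for the words "Business" and "German" under two topics.
# These numbers are invented for illustration only.
scores = {
    "languages": [0.01, 0.90],  # "Business" scores low, "German" scores high
    "business": [0.30, 0.05],
}

print(best_topic(scores, product_score))   # the low 0.01 factor drags "languages" down
print(best_topic(scores, additive_score))  # the strong 0.90 match dominates the sum
```

With these invented numbers the product favors "business" (0.015 against 0.009), while the sum favors "languages" (0.91 against 0.35).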

-- JosephWang

The Automated Topic Classification script is located at http://www.gnacademy.org/src/bin/assign.topics. It requires a number of libraries at http://www.gnacademy.org/src/lib, as well as an index generation script at http://www.gnacademy.org/src/bin/make.wordtable

The main Perl module is at http://www.gnacademy.org/src/lib/GnaCatalog/Guess.pm, but the name needs to be changed to fit CPAN standards, probably to Text::Classifier.

There are a number of support files scattered throughout the site. Please contact us if you are interested in any of this.

The system first looks at the class id and attempts to guess a general topic. For example, sci probably indicates some sort of science.
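A first pass of this kind could be sketched as a simple keyword lookup. The Python below is only an assumption about the general shape, not the logic in assign.topics: the table `KEYWORD_TOPICS`, the function `guess_general_topic`, and every keyword except sci are hypothetical.

```python
# Hypothetical keyword table: substring of the class id -> general topic.
# Only "sci" comes from the text above; the other entries are invented.
KEYWORD_TOPICS = {
    "sci": "science",
    "bus": "business",
    "lang": "languages",
}

def guess_general_topic(class_id):
    """Return the first general topic whose keyword occurs in class_id, else None."""
    lowered = class_id.lower()
    for keyword, topic in KEYWORD_TOPICS.items():
        if keyword in lowered:
            return topic
    return None  # no keyword matched; leave the topic undecided

print(guess_general_topic("SCI-101"))
print(guess_general_topic("xyz"))
```

Returning None when nothing matches leaves the decision to the second pass rather than forcing a guess.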

After the first pass, it does a Bayesian analysis using the titles of the courses already in the database. For example, for the course Business German, it looks up which existing courses have the word German and which have the word Business in their titles, and then puts the new course in the same category as those courses.
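The title-matching pass can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a port of Guess.pm: the word table layout, the function names `build_word_table` and `classify_title`, and the sample courses are all hypothetical. It adds the per-word scores, in line with the observation above that adding seems to work better than multiplying.

```python
from collections import Counter, defaultdict

def build_word_table(courses):
    """Map each lowercase title word to a Counter of the categories it appears in."""
    table = defaultdict(Counter)
    for title, category in courses:
        for word in set(title.lower().split()):
            table[word][category] += 1
    return table

def classify_title(title, table):
    """Sum, per category, each title word's share of occurrences; return the top category."""
    totals = Counter()
    for word in set(title.lower().split()):
        counts = table.get(word)
        if not counts:
            continue  # word never seen in an existing title
        seen = sum(counts.values())
        for category, n in counts.items():
            totals[category] += n / seen  # add scores rather than multiply
    return totals.most_common(1)[0][0] if totals else None

# Invented sample data standing in for the courses already in the database.
existing = [
    ("Introductory German", "languages"),
    ("German Literature", "languages"),
    ("German for Business Travel", "languages"),
    ("Business Administration", "business"),
]
table = build_word_table(existing)
print(classify_title("Business German", table))
```

With this sample data, the word German points only to "languages" and the word Business is split evenly, so the new course lands in "languages".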

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors and is licensed under the terms of the GNU Free Documentation License.