Globewide Network Academy

Founded 1993
Home page: http://www.gnacademy.org/ E-mail: gna@gnacademy.org

Tuesday, 2008 October 7 17:23:10 GMT. Our catalog has 32123 courses and 6224 programs.

All of the code mentioned in this white paper is available for free download from this site.

How we process listings

Right now our main effort is getting people to send us their information in spreadsheets. Spreadsheets are convenient because the information is already divided into fields, and converting their fields to our fields is usually trivial.
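To show why the spreadsheet conversion is usually trivial, here is a minimal sketch in Python, assuming the spreadsheet arrives as CSV. The column headers and our field names in `FIELD_MAP` are hypothetical, made up for illustration; each provider's headers differ, so in practice each would get its own small mapping.

```python
import csv
import io

# Hypothetical mapping from one provider's spreadsheet headers to our field names.
FIELD_MAP = {
    "Course Title": "title",
    "Course No.": "course_id",
    "Instructor": "teacher",
    "Description": "description",
}

def convert_rows(csv_text, field_map):
    """Rename each mapped spreadsheet column to our field name; drop unmapped columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        record = {
            field_map[col]: (value or "").strip()
            for col, value in row.items()
            if col in field_map
        }
        records.append(record)
    return records

if __name__ == "__main__":
    sample = (
        "Course Title,Course No.,Instructor,Fee\n"
        "Introduction to Physics,PHYS 101,John Smith,$300\n"
    )
    for rec in convert_rows(sample, FIELD_MAP):
        print(rec)
```

The provider's columns already carry the meaning, so the only work is renaming; a column with no counterpart in our schema (here `Fee`) is simply dropped.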

If the data is not structured

Then it gets really tough. What we do is

What didn't work

We use this system after trying other mechanisms for several years and finding that they didn't work.

Web interfaces didn't work because people didn't submit information often enough to remember a password. Right now, all interaction with our listers occurs through e-mail attachments.

We tried machine parsing for a while. In contrast to topic classification, where machine learning works beautifully, we have mostly given up on machine parsing of raw data. The idea was that we would write a parser that used Perl regular expressions to extract information from a web page. Then, if the web page changed, we would just run the parser again and get the new data.
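The kind of page parser described above can be sketched as follows. This version uses Python's `re` module rather than Perl, and the page layout it expects (a title in `<h3>` followed by a bold "Instructor:" label) is a hypothetical example, not any real provider's page.

```python
import re

# Hypothetical layout: each course appears as
#   <h3>TITLE</h3> ... <b>Instructor:</b> NAME
# Every provider page needs its own pattern, and the pattern must be
# rewritten whenever the page layout changes.
COURSE_RE = re.compile(
    r"<h3>(?P<title>.*?)</h3>.*?<b>Instructor:</b>\s*(?P<teacher>[^<]+)",
    re.DOTALL,
)

def parse_page(html):
    """Return one dict per course found in the page."""
    return [
        {"title": m.group("title").strip(), "teacher": m.group("teacher").strip()}
        for m in COURSE_RE.finditer(html)
    ]

if __name__ == "__main__":
    page = (
        "<h3>Introduction to Physics</h3><p>Fall term</p>"
        "<b>Instructor:</b> John Smith<br>\n"
        "<h3>World History</h3><b>Instructor:</b> Jane Doe</p>"
    )
    print(parse_page(page))
```

Even this small pattern shows the fragility: if one course on the page omitted its instructor, the non-greedy `.*?` would run on and attach the next course's instructor to it, and spotting and fixing that requires someone fluent in the pattern language.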

The problem was that writing a parser required someone who understood Perl's pattern-matching language; for someone who did understand Perl, writing a parser wasn't particularly time-consuming.

How we assign topics

First of all, the topic for each entry is denoted by a string that looks like Society;History;Countries;United States.
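Because the topic string runs from the most general level to the most specific, it can be treated as a path in a tree. A minimal Python sketch, assuming only the semicolon-separated format shown above (the function names are hypothetical):

```python
def parse_topic(topic):
    """Split 'Society;History;Countries;United States' into its levels, general to specific."""
    return [part.strip() for part in topic.split(";") if part.strip()]

def is_under(topic, ancestor):
    """True if `topic` lies at or below `ancestor` in the topic hierarchy."""
    t, a = parse_topic(topic), parse_topic(ancestor)
    return t[:len(a)] == a

if __name__ == "__main__":
    print(parse_topic("Society;History;Countries;United States"))
    print(is_under("Society;History;Countries;United States", "Society;History"))
```

A prefix comparison like `is_under` is enough to list every course filed anywhere under a broader heading such as `Society;History`.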

Topic classification is a two-step process. First we look at the identifier of the class and use a lookup table to make a first guess at its topic. This lookup table is included in our database download package, or you can look at it here.
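The first step can be sketched as a dictionary lookup. This assumes, hypothetically, that class identifiers begin with a department prefix such as `PHYS` in `PHYS 101`; the table entries below are invented for illustration and are not taken from the real lookup table in the download package.

```python
import re

# Hypothetical lookup table keyed on the alphabetic prefix of a class identifier.
PREFIX_TOPICS = {
    "PHYS": "Science;Physics",
    "HIST": "Society;History",
    "CS": "Science;Computer Science",
}

def first_guess(class_id):
    """Return a first-guess topic for an identifier like 'PHYS 101', or None if unknown."""
    m = re.match(r"\s*([A-Za-z]+)", class_id)
    if m is None:
        return None  # identifier has no alphabetic prefix
    return PREFIX_TOPICS.get(m.group(1).upper())

if __name__ == "__main__":
    for cid in ["PHYS 101", "hist-210", "ART 100"]:
        print(cid, "->", first_guess(cid))
```

Returning `None` for an unknown prefix leaves the decision to the second, title-based step rather than forcing a wrong guess.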

The next step is to pattern-match against the titles of courses already in the database and then guess the best match using Bayesian statistics. The code to do this is in the file topics.pl in our library package, or you can also look at it here. I think the main procedure is refine_topic1.
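The title-matching step can be illustrated with a multinomial naive Bayes classifier trained on titles already in the database. This is a generic Python sketch of that technique, not a translation of `refine_topic1` in topics.pl; the class name, word tokenizer, add-one smoothing, and the sample training titles are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def words(title):
    """Lower-case alphabetic words in a title."""
    return re.findall(r"[a-z]+", title.lower())

class TitleClassifier:
    """Multinomial naive Bayes over title words, with add-one (Laplace) smoothing."""

    def __init__(self, labeled_titles):
        # labeled_titles: iterable of (title, topic) pairs already in the database
        self.topic_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for title, topic in labeled_titles:
            self.topic_counts[topic] += 1
            for w in words(title):
                self.word_counts[topic][w] += 1
                self.vocab.add(w)
        self.total = sum(self.topic_counts.values())

    def log_score(self, title, topic):
        """log P(topic) + sum of log P(word | topic) over the title's words."""
        counts = self.word_counts[topic]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.topic_counts[topic] / self.total)
        for w in words(title):
            score += math.log((counts[w] + 1) / denom)
        return score

    def best_topic(self, title):
        """Topic with the highest posterior score for this title."""
        return max(self.topic_counts, key=lambda t: self.log_score(title, t))

if __name__ == "__main__":
    clf = TitleClassifier([
        ("Introduction to Physics", "Science;Physics"),
        ("Quantum Physics", "Science;Physics"),
        ("American History", "Society;History;Countries;United States"),
        ("History of the Civil War", "Society;History;Countries;United States"),
    ])
    print(clf.best_topic("Physics for Engineers"))
```

The classifier knows nothing about physics; it only knows that the word "physics" has appeared in titles already filed under a physics topic. That is the sense in which the existing database supplies the knowledge.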

It's interesting to consider why we have more or less successfully automated topic assignment but not field assignment. Basically, our computer systems "know" that "Introduction to Physics" is a physics class because courses with similar names have already been associated with a topic in the database. By contrast, our program doesn't know that "John Smith" is a teacher's name rather than a course description, because that information isn't stored anywhere. In essence, we are using the database itself as a rudimentary neural net.

What didn't work

One thing we learned early is that we cannot have submitters assign topics. The reason is that an individual submitter has no idea of the global topic scheme.

Data model

Barcodes and Part Numbers

One thing that makes education challenging from a data model perspective is that it is impossible to assign part numbers.

Why the IMSproject/LOM data model is broken

Collaboratively Exchanged Data Models