Keywords: wrapper induction, screen scraping
See also: GnaLabs and AutomatedTopicClassification
The Automated Data Extraction code has been modularized and
uploaded to CPAN as the module WWW::Extractor.
= Where to get it =
The source for the module is at
http://www.gnacademy.org/src/lib/WWW/Extractor.pm
with a Perl package at
http://www.gnacademy.org/beta/WWW/Extractor
The learn.wrapper script is still available at
https://www.gnacademy.org/src/bin/learn.wrapper
POD documentation is also available.
= How it works =
There are a few original ideas in the design of WWW::Extractor.
Calculating edit distance: Dynamic Programming Algorithm (DPA) for Edit-Distance
http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Edit.html
-- TWikiGuest - 24 Apr 2005
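For readers unfamiliar with the technique linked above, here is a minimal sketch of the standard dynamic programming edit distance (Levenshtein distance with unit costs). It is written in Python for illustration only and is not taken from WWW::Extractor; the function name edit_distance is hypothetical, and this page does not say how the module applies edit distance internally. Running it on token lists as well as strings is an assumption about how such a measure might be used on markup.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists).

    Illustrative sketch only; not the WWW::Extractor implementation.
    """
    # prev[j] is the distance between the items of `a` processed so far
    # (initially none) and the first j items of `b`.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        # Turning the first i items of `a` into an empty sequence takes i deletions.
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # match, or substitute x with y
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # 3
    # Assumed usage: comparing sequences of markup tokens instead of characters.
    row_a = ["<tr>", "<td>", "</td>", "</tr>"]
    row_b = ["<tr>", "<td>", "<td>", "</td>", "</tr>"]
    print(edit_distance(row_a, row_b))  # 1
```

Only two rows of the table are kept at a time, so memory use grows with the length of the second sequence while running time is proportional to the product of the two lengths.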