Commandline tools for training Fathom rulesets
This is the commandline trainer for Fathom, a supervised-learning system for recognizing parts of web pages. It also includes other commandline tools for ruleset development, such as fathom-unzip, fathom-pick, and fathom-list. See the Fathom documentation for details on the trainer.
- Add fathom-list tool.
- Further optimize trainer: about 17x faster for a 60-sample corpus, with superlinear improvements for larger ones.
- Move to Fathom repo.
- Add fathom-unzip and fathom-pick.
- Switch to the Adam optimizer, which is significantly more turn-key, to the point where it doesn’t need its learning-rate decay set manually.
- Tolerate pages for which no candidate nodes were collected.
- Add 95% CI for per-page training accuracy.
- Add validation-guided early stopping.
- Revise per-page accuracy calculation and display.
- Shuffle training samples before training.
- Add false-positive and false-negative numbers to per-tag metrics.
- First release, intended for use with Fathom 3.0 or later.
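
The changelog mentions a 95% CI for per-page training accuracy but does not say how it is computed. As a rough illustration only, one common approach is the normal-approximation (Wald) interval for a proportion. The sketch below assumes that approach; the function name and parameters are hypothetical and are not taken from the trainer's code.

```python
import math


def accuracy_confidence_interval(successes, total, z=1.96):
    """Return a (low, high) interval for an accuracy of successes/total.

    Hypothetical helper: uses the normal approximation to the binomial,
    with z=1.96 for a 95% interval. The trainer itself may use a
    different method.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clamp to [0, 1], since an accuracy cannot fall outside that range.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Example: 54 of 60 pages classified correctly.
low, high = accuracy_confidence_interval(54, 60)
print("accuracy 0.90, 95% CI ({:.3f}, {:.3f})".format(low, high))
```

The normal approximation is weak for small corpora or accuracies near 0 or 1 (at exactly 100% accuracy the interval collapses to a single point), so a Wilson interval may be a better choice in those cases.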