Commandline tools for training Fathom rulesets
Project description
This is the commandline trainer for Fathom, which itself is a supervised-learning system for recognizing parts of web pages. It also includes other commandline tools for ruleset development, like fathom-unzip, fathom-pick, and fathom-list. The trainer is covered in the Fathom documentation.
Version History
- 3.1
  - Add fathom-list tool.
  - Further optimize trainer: about 17x faster for a 60-sample corpus, with superlinear improvements for larger ones.
- 3.0
  - Move to Fathom repo.
  - Add fathom-unzip and fathom-pick.
  - Switch to the Adam optimizer, which is significantly more turn-key, to the point where it doesn’t need its learning-rate decay set manually.
  - Tolerate pages for which no candidate nodes were collected.
  - Add 95% CI for per-page training accuracy.
  - Add validation-guided early stopping.
  - Revise per-page accuracy calculation and display.
  - Shuffle training samples before training.
  - Add false-positive and false-negative numbers to per-tag metrics.
- 3.0a1
  - First release, intended for use with Fathom 3.0 or later.
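The 3.0 notes mention validation-guided early stopping without describing how the trainer implements it. As a rough illustration of the general technique only (not the trainer's actual code), the sketch below stops once the validation loss has failed to improve for a set number of iterations. The names `train_with_early_stopping`, `train_step`, `validation_loss`, and `patience` are hypothetical and chosen for this example.

```python
def train_with_early_stopping(train_step, validation_loss,
                              max_iterations=1000, patience=10):
    """Call train_step() repeatedly until validation loss stops improving.

    train_step and validation_loss are caller-supplied callables
    (hypothetical here; they stand in for whatever the real trainer does).
    Returns (best_iteration, best_loss).
    """
    best_loss = float("inf")
    best_iteration = -1
    for iteration in range(max_iterations):
        train_step()              # one optimization step on the training set
        loss = validation_loss()  # loss measured on held-out validation pages
        if loss < best_loss:
            # Validation improved: remember this point.
            best_loss = loss
            best_iteration = iteration
        elif iteration - best_iteration >= patience:
            # No improvement for `patience` iterations: stop early.
            break
    return best_iteration, best_loss
```

A real trainer would typically also save the model weights at the best iteration and restore them after stopping; that step is omitted here for brevity.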
File details
Details for the file fathom-web-3.1.tar.gz.
File metadata
- Download URL: fathom-web-3.1.tar.gz
- Upload date:
- Size: 7.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6da9cd24b854d0bcf1ac8f33c19644a8ed3795511a941b6421e6d367a7bf893d
MD5 | 3f8233da3a944e0b26f6b4ada6232da9
BLAKE2b-256 | 7566812d1a392d5a29cfde24fff166e5ca5dc720727d62f2b7cc0bc540dde111
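A downloaded file can be checked against the published digest before installing. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_expected` are made up for this example, and the path passed in is assumed to point at wherever the file was saved locally.

```python
import hashlib

# SHA256 digest listed above for fathom-web-3.1.tar.gz.
EXPECTED_SHA256 = "6da9cd24b854d0bcf1ac8f33c19644a8ed3795511a941b6421e6d367a7bf893d"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("fathom-web-3.1.tar.gz")` should return `True` for an intact copy of the source distribution saved in the current directory.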
File details
Details for the file fathom_web-3.1-py2.py3-none-any.whl.
File metadata
- Download URL: fathom_web-3.1-py2.py3-none-any.whl
- Upload date:
- Size: 10.5 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | ecefaca5fa1dd9538985f9fbd14b02b92dedaa4a17e0be53948a164b1c1122b2
MD5 | 47c39fffa87688dfc5f46403dcdb935d
BLAKE2b-256 | 37d99515fde11cc6ee8be47833a26cfe19dc901e2ebd0f8116ea72ddefe70706