Commandline tools for training Fathom rulesets

Project description

This is the commandline trainer for Fathom, a supervised-learning system for recognizing parts of web pages. It also includes other commandline tools for ruleset development, such as fathom-unzip, fathom-pick, and fathom-list. See the Fathom documentation for details on the trainer.

Version History

3.1
  • Add fathom-list tool.

  • Further optimize trainer: about 17x faster for a 60-sample corpus, with superlinear improvements for larger ones.

3.0
  • Move to Fathom repo.

  • Add fathom-unzip and fathom-pick.

  • Switch to the Adam optimizer, which is significantly more turn-key, to the point where it doesn’t need its learning-rate decay set manually.

  • Tolerate pages for which no candidate nodes were collected.

  • Add 95% CI for per-page training accuracy.

  • Add validation-guided early stopping.

  • Revise per-page accuracy calculation and display.

  • Shuffle training samples before training.

  • Add false-positive and false-negative numbers to per-tag metrics.
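As an illustration of what validation-guided early stopping generally means, here is a minimal sketch in plain Python. It is not taken from the Fathom trainer: the function name, the `patience` parameter, and the two callbacks are hypothetical, and the trainer's actual stopping criterion may differ.

```python
def train_with_early_stopping(train_step, validation_loss, max_epochs=1000, patience=10):
    """Run training until validation loss stops improving.

    train_step(epoch) and validation_loss(epoch) are hypothetical callbacks
    supplied by the caller; they are not part of any Fathom API.
    Returns (best_epoch, best_loss).
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch in range(max_epochs):
        train_step(epoch)                # one pass over the training samples
        loss = validation_loss(epoch)    # loss on held-out validation samples
        if loss < best_loss:
            # Validation improved: remember this epoch as the best so far.
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs in a row: stop training.
            break
    return best_epoch, best_loss
```

In practice a caller would also keep a copy of the model weights from the best epoch, so the returned epoch number can be mapped back to a usable model.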

3.0a1
  • First release, intended for use with Fathom 3.0 or later.

Download files

Source Distribution

fathom-web-3.1.tar.gz (7.9 kB)

Uploaded Source

Built Distribution

fathom_web-3.1-py2.py3-none-any.whl (10.5 kB)

Uploaded Python 2 Python 3

File details

Details for the file fathom-web-3.1.tar.gz.

File metadata

  • Download URL: fathom-web-3.1.tar.gz
  • Size: 7.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3

File hashes

Hashes for fathom-web-3.1.tar.gz
Algorithm    Hash digest
SHA256       6da9cd24b854d0bcf1ac8f33c19644a8ed3795511a941b6421e6d367a7bf893d
MD5          3f8233da3a944e0b26f6b4ada6232da9
BLAKE2b-256  7566812d1a392d5a29cfde24fff166e5ca5dc720727d62f2b7cc0bc540dde111

See more details on using hashes here.
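A downloaded file can be checked against the published digests with Python's standard `hashlib` module. The sketch below is one way to do it; the helper names `file_digests` and `matches_published` are made up for illustration, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest published above for fathom-web-3.1.tar.gz.
TARBALL_SHA256 = "6da9cd24b854d0bcf1ac8f33c19644a8ed3795511a941b6421e6d367a7bf893d"


def file_digests(path, chunk_size=65536):
    """Return the SHA256 and BLAKE2b-256 hex digests of the file at `path`."""
    sha256 = hashlib.sha256()
    # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
    blake2b_256 = hashlib.blake2b(digest_size=32)
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256.update(chunk)
            blake2b_256.update(chunk)
    return {
        "sha256": sha256.hexdigest(),
        "blake2b-256": blake2b_256.hexdigest(),
    }


def matches_published(path, expected_sha256):
    """True if the file's SHA256 equals the expected hex digest (case-insensitive)."""
    return file_digests(path)["sha256"] == expected_sha256.lower()
```

For example, `matches_published("fathom-web-3.1.tar.gz", TARBALL_SHA256)` should return `True` for an intact copy of the source distribution saved under that name in the current directory.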

File details

Details for the file fathom_web-3.1-py2.py3-none-any.whl.

File metadata

  • Download URL: fathom_web-3.1-py2.py3-none-any.whl
  • Size: 10.5 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3

File hashes

Hashes for fathom_web-3.1-py2.py3-none-any.whl
Algorithm    Hash digest
SHA256       ecefaca5fa1dd9538985f9fbd14b02b92dedaa4a17e0be53948a164b1c1122b2
MD5          47c39fffa87688dfc5f46403dcdb935d
BLAKE2b-256  37d99515fde11cc6ee8be47833a26cfe19dc901e2ebd0f8116ea72ddefe70706
