
NNTP client and newsgroup harvesting toolkit for Python.


usenet

NNTP client and newsgroup-harvesting toolkit for Python. Read and post articles, discover public servers, and harvest groups into a text corpus.

Works on Python 3.9–3.13. nntplib was deprecated in 3.11 and removed from the standard library in 3.13 (PEP 594), so on 3.13 the standard-nntplib backport is installed automatically as a dependency.
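To check which nntplib an environment actually resolves, a small diagnostic like the one below can help. It is a sketch, not part of the usenet API: the `nntplib_source` helper is hypothetical, and it assumes the backport installs a top-level `nntplib` module under its original name.

```python
import sys
import warnings


def nntplib_source() -> str:
    """Report where `import nntplib` comes from on this interpreter.

    Hypothetical helper. Python 3.9-3.12 ship nntplib in the standard
    library; on 3.13+ it only exists if a backport is installed.
    """
    try:
        # nntplib emits a DeprecationWarning on import in 3.11 and 3.12.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", DeprecationWarning)
            import nntplib  # noqa: F401
    except ImportError:
        return "missing"
    if sys.version_info >= (3, 13):
        return "backport"
    return "stdlib"


print(nntplib_source())
```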

Install

pip install usenet
# optional: anti-bot + Wayback transport for the legacy server-list scrapers
# (quote the extra so shells like zsh don't expand the brackets)
pip install "usenet[scrape]"

Quickstart

Read a group:

from datetime import timedelta
from usenet import UsenetServer

with UsenetServer("news.eternal-september.org") as server:
    for article in server.get_new_news("comp.lang.python", since=timedelta(days=7)):
        print(article.subject, article.author, article.date)
        print(article.text)
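For readers curious how fields such as `subject`, `author`, `date` and `text` can be recovered from a raw article, here is a stdlib-only sketch. It assumes the input is the list of raw byte lines (line endings already stripped) that stdlib nntplib's `NNTP.article()` returns in `ArticleInfo.lines`; `parse_article` and `is_recent` are hypothetical helpers, not the package's own parser.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_bytes, policy
from email.utils import parsedate_to_datetime
from typing import List, Optional


def parse_article(lines: List[bytes]) -> dict:
    """Turn raw article lines (no line endings) into a small dict. Hypothetical."""
    raw = b"\r\n".join(lines)
    # policy.default decodes RFC 2047 encoded words in headers.
    msg = message_from_bytes(raw, policy=policy.default)
    date = parsedate_to_datetime(str(msg["Date"]))
    if date.tzinfo is None:
        # A "-0000" zone parses as naive; treat it as UTC so comparisons work.
        date = date.replace(tzinfo=timezone.utc)
    return {
        "subject": str(msg["Subject"]),
        "author": str(msg["From"]),
        "date": date,
        "text": msg.get_content().replace("\r\n", "\n"),
    }


def is_recent(date: datetime, since: timedelta, now: Optional[datetime] = None) -> bool:
    """True if `date` falls inside the window `since` that ends at `now`."""
    now = now or datetime.now(timezone.utc)
    return date >= now - since


sample = [
    b"From: Alice <alice@example.com>",
    b"Newsgroups: comp.lang.python",
    b"Subject: =?utf-8?q?caf=C3=A9?= question",
    b"Date: Mon, 01 Jan 2024 12:00:00 +0000",
    b"",
    b"Hello,",
    b"world",
]
article = parse_article(sample)
print(article["author"], article["date"].isoformat())
```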

Post an article (most servers require an account to post; many offer one for free):

from usenet import UsenetServer

with UsenetServer("news.eternal-september.org", user="login", pswd="secret") as server:
    server.post("this is a test", subject="hello", group="misc.test")
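Under the hood, a post is a header block, a blank line, and the body, sent as an RFC 3977 multi-line data block. The sketch below shows one way to frame that; `build_article` and `to_wire` are hypothetical, only the minimal From, Newsgroups and Subject headers are included, and ASCII header values are assumed. Stdlib nntplib's `NNTP.post()` accepts bytes and performs the dot-stuffing and terminating "." itself, so `build_article`'s output is what you would hand it; `to_wire` only makes that transformation visible.

```python
def build_article(body: str, subject: str, group: str, sender: str) -> bytes:
    """Headers, blank line, body; CRLF line endings. Hypothetical helper."""
    headers = [
        f"From: {sender}",
        f"Newsgroups: {group}",
        f"Subject: {subject}",
    ]
    lines = headers + [""] + body.splitlines()
    return ("\r\n".join(lines) + "\r\n").encode("utf-8")


def to_wire(article: bytes) -> bytes:
    """Dot-stuff lines starting with '.' and append the '.' terminator line."""
    out = []
    # The article ends with CRLF, so the last split piece is empty; drop it.
    for line in article.split(b"\r\n")[:-1]:
        if line.startswith(b"."):
            line = b"." + line
        out.append(line + b"\r\n")
    out.append(b".\r\n")
    return b"".join(out)


art = build_article("line one\n.hidden", "hello", "misc.test", "me@example.com")
wire = to_wire(art)
```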

Server discovery

A curated, offline list ships with the package:

from usenet import get_known_servers

for s in get_known_servers():
    print(s.url, s._can_post)
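The bundled list records whether each server accepts posts. One signal such a flag could be derived from, sketched here as an assumption rather than how usenet actually computes `_can_post`, is the NNTP greeting: RFC 3977 uses code 200 when posting is allowed and 201 when it is prohibited, and stdlib nntplib's `NNTP.getwelcome()` returns that greeting line. A server may change its answer after authentication, so treat the result as a hint. `posting_allowed` is a hypothetical helper.

```python
def posting_allowed(greeting: str) -> bool:
    """Interpret an NNTP greeting line (RFC 3977: 200 = posting, 201 = no posting)."""
    code = greeting[:3]
    if code == "200":
        return True
    if code == "201":
        return False
    # 400/502 mean the service is unavailable; anything else is unexpected.
    raise ValueError(f"unexpected NNTP greeting: {greeting!r}")


print(posting_allowed("200 news.example.org ready - posting allowed"))
```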

The legacy directory scrapers in usenet.scrappers are a secondary, "refresh once" path. Most of their source sites are now dead, so when the usenet[scrape] extra is installed, their HTTP requests fall back to the Wayback Machine. See examples/.
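The package's own Wayback transport and anti-bot layer are not documented here; the sketch below only illustrates the general fallback idea with the stdlib and the public `https://web.archive.org/web/<timestamp>/<url>` form (with no timestamp, the Wayback Machine typically redirects to a recent capture). `wayback_url`, `urllib_fetch` and `fetch_with_fallback` are hypothetical names, and no anti-bot handling is shown.

```python
import urllib.request
from typing import Callable


def wayback_url(url: str, timestamp: str = "") -> str:
    """Build a Wayback Machine URL for `url`, optionally pinned to a timestamp."""
    if timestamp:
        return f"https://web.archive.org/web/{timestamp}/{url}"
    return f"https://web.archive.org/web/{url}"


def urllib_fetch(url: str, timeout: float = 30.0) -> str:
    """Plain stdlib fetch; failures raise urllib.error.URLError, an OSError subclass."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_with_fallback(url: str, fetch: Callable[[str], str] = urllib_fetch) -> str:
    """Try the live page first; on a network or HTTP error, try the archived copy."""
    try:
        return fetch(url)
    except OSError:
        return fetch(wayback_url(url))


# Offline demo: a stand-in fetcher where the live site is dead but the archive answers.
def fake_fetch(url: str) -> str:
    if url.startswith("https://web.archive.org/"):
        return "<html>archived server list</html>"
    raise OSError(f"cannot reach {url}")


page = fetch_with_fallback("http://example.com/servers.html", fake_fetch)
```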

Dataset

dataset.py harvests a newsgroup into a JSONL corpus (one article per line) for publishing to the Hugging Face Hub. See docs/dataset.md.

python dataset.py comp.lang.python --days 30 --out comp.lang.python.jsonl
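The JSONL layout itself is simple: one JSON object per line, UTF-8, with embedded newlines escaped so each article stays on a single line. The sketch below is an illustration under assumptions, not dataset.py's code; the field names (`group`, `subject`, `author`, `date`, `text`) and the `write_jsonl`/`read_jsonl` helpers are hypothetical, and dates are stored as ISO 8601 strings.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable, Iterator


def _to_json(value):
    """json.dumps fallback: serialize datetimes as ISO 8601 strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"not JSON serializable: {type(value).__name__}")


def write_jsonl(records: Iterable[dict], path: Path) -> int:
    """Write one JSON object per line; return the number of records written."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False, default=_to_json) + "\n")
            count += 1
    return count


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one dict per non-blank line."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


articles = [{
    "group": "comp.lang.python",
    "subject": "hello",
    "author": "Alice <alice@example.com>",
    "date": datetime(2024, 1, 1, 12, tzinfo=timezone.utc),
    "text": "first line\nsecond line",
}]

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "comp.lang.python.jsonl"
    written = write_jsonl(articles, out)
    lines = out.read_text(encoding="utf-8").splitlines()
    restored = list(read_jsonl(out))
```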

Testing

pip install -e ".[test]"
pytest test/

The unit tests are offline; they exercise article parsing, post framing, the bundled server list, and scraper HTML parsing against fixtures.
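As an illustration of the fixture-based pattern (hypothetical code, not taken from usenet's test suite or its scrapers), the sketch below extracts server hostnames from `news://` and `nntp://` links with the stdlib `html.parser` and checks the result against a small inline fixture in a pytest-style test. The HTML layout is invented for the example; real directory pages will differ.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

NNTP_SCHEMES = ("news://", "nntp://")


class ServerLinkParser(HTMLParser):
    """Collect unique hostnames from news:// and nntp:// links, in page order."""

    def __init__(self) -> None:
        super().__init__()
        self.hosts: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        for scheme in NNTP_SCHEMES:
            if href.startswith(scheme):
                host = href[len(scheme):].split("/", 1)[0]
                if host and host not in self.hosts:
                    self.hosts.append(host)


def extract_hosts(html: str) -> List[str]:
    parser = ServerLinkParser()
    parser.feed(html)
    parser.close()
    return parser.hosts


# In a real suite this string would live in a fixture file next to the tests.
FIXTURE_HTML = """
<table>
  <tr><td><a href="news://news.example.org/">news.example.org</a></td></tr>
  <tr><td><a href="nntp://reader.example.net/comp.lang.python">reader</a></td></tr>
  <tr><td><a href="https://example.com/">not a news server</a></td></tr>
  <tr><td><a href="news://news.example.org/">duplicate entry</a></td></tr>
</table>
"""


def test_extract_hosts_from_fixture() -> None:
    assert extract_hosts(FIXTURE_HTML) == ["news.example.org", "reader.example.net"]
```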

License

Apache-2.0



Download files


Source Distribution

usenet-0.1.0a2.tar.gz (14.2 kB)

Uploaded Source

Built Distribution


usenet-0.1.0a2-py3-none-any.whl (12.7 kB)

Uploaded Python 3

File details

Details for the file usenet-0.1.0a2.tar.gz.

File metadata

  • Download URL: usenet-0.1.0a2.tar.gz
  • Upload date:
  • Size: 14.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for usenet-0.1.0a2.tar.gz
Algorithm    Hash digest
SHA256       bcdfec3e47b0a57c4bdf1adc81b7faf5a682425d29e6901d47c523afc1c54298
MD5          0b6b773795008c6f88eaea52ce71dc6c
BLAKE2b-256  b6a0b3b3a4e464298a31ccd8839b7f83020912d9719a99d29d58635dfad49121


File details

Details for the file usenet-0.1.0a2-py3-none-any.whl.

File metadata

  • Download URL: usenet-0.1.0a2-py3-none-any.whl
  • Upload date:
  • Size: 12.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for usenet-0.1.0a2-py3-none-any.whl
Algorithm    Hash digest
SHA256       af168191638b754589df375519e278ac3406178ed37d4ad7a3cfbdf04a160df7
MD5          9816ee23ae0d1d0bd95754c4cc1247e0
BLAKE2b-256  245986bf52ffee08dfc310e455be435605452afe002e6277e0e5621e73624ef8

