usenet
NNTP client and newsgroup-harvesting toolkit for Python. Read and post articles, discover public servers, and harvest groups into a text corpus.
Works on Python 3.9–3.13. nntplib was removed from the standard library in
3.13 (PEP 594), so on that version the standard-nntplib backport is installed
automatically as a dependency.
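To check which nntplib a given interpreter would pick up, something like the following can help. It is a minimal sketch, not part of this package: `nntplib_source` is a hypothetical helper, and it assumes the backport installs under the same `nntplib` module name as the old standard-library module.

```python
import importlib.util
import sys


def nntplib_source() -> str:
    """Report where an importable nntplib would come from (hypothetical helper)."""
    # find_spec only locates the module; it does not import it.
    if importlib.util.find_spec("nntplib") is None:
        return "missing"
    # Before 3.13 the module ships with Python; from 3.13 on it can only
    # come from an installed backport (assumption: same module name).
    return "stdlib" if sys.version_info < (3, 13) else "backport"


print(nntplib_source())
```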
Install
pip install usenet
# optional: anti-bot + Wayback transport for the legacy server-list scrapers
pip install "usenet[scrape]"
Quickstart
Read a group:
from datetime import timedelta
from usenet import UsenetServer

with UsenetServer("news.eternal-september.org") as server:
    for article in server.get_new_news("comp.lang.python", since=timedelta(days=7)):
        print(article.subject, article.author, article.date)
        print(article.text)
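For orientation, a relative window such as `since=timedelta(days=7)` has to become an absolute cutoff before it can be sent to a server. The sketch below shows one way to turn it into the date and time arguments of the NNTP NEWNEWS command (RFC 3977 uses `yyyymmdd` and `hhmmss`, interpreted as UTC when the `GMT` keyword is sent). `newnews_args` is a hypothetical helper; whether this package uses NEWNEWS internally is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def newnews_args(since: timedelta, now: Optional[datetime] = None) -> Tuple[str, str]:
    """Turn a relative window into NEWNEWS date/time arguments (hypothetical helper)."""
    # `now` should be timezone-aware; it defaults to the current UTC time.
    now = now or datetime.now(timezone.utc)
    cutoff = (now - since).astimezone(timezone.utc)
    # RFC 3977 formats: date as yyyymmdd, time as hhmmss.
    return cutoff.strftime("%Y%m%d"), cutoff.strftime("%H%M%S")


fixed_now = datetime(2024, 1, 8, 12, 0, 0, tzinfo=timezone.utc)
print(newnews_args(timedelta(days=7), now=fixed_now))
```

Passing `now` explicitly keeps the helper deterministic, which makes it easy to test offline.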
Post an article (most servers need a free account):
from usenet import UsenetServer

with UsenetServer("news.eternal-september.org", user="login", pswd="secret") as server:
    server.post("this is a test", subject="hello", group="misc.test")
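As background, a posted article travels over NNTP as headers, a blank line, and the body, with CRLF line endings, dot-stuffing for lines that begin with ".", and a lone "." as terminator (RFC 3977). The following is a minimal sketch of that framing under stated assumptions: `frame_article` and the `sender` parameter are hypothetical, not this package's API, and real servers typically require or add further headers such as Date and Message-ID.

```python
def frame_article(body: str, subject: str, group: str, sender: str) -> bytes:
    """Build the wire form of a minimal article (hypothetical helper)."""
    headers = [
        f"From: {sender}",
        f"Newsgroups: {group}",
        f"Subject: {subject}",
    ]
    # Headers and body are separated by one empty line.
    lines = headers + [""] + body.splitlines()
    # Dot-stuffing: a line starting with "." gets an extra leading ".".
    stuffed = ["." + line if line.startswith(".") else line for line in lines]
    # CRLF line endings, then a lone "." line marks the end of the article.
    return ("\r\n".join(stuffed) + "\r\n.\r\n").encode("utf-8")


print(frame_article("this is a test", "hello", "misc.test", "user@example.invalid"))
```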
Server discovery
A curated, offline list ships with the package:
from usenet import get_known_servers

for s in get_known_servers():
    print(s.url, s._can_post)
The legacy directory scrapers in usenet.scrappers are a secondary "refresh
once" path. Their source sites are mostly dead, so when usenet[scrape] is
installed, HTTP requests fall back to the Wayback Machine. See examples/.
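To illustrate what a Wayback Machine fallback amounts to, the sketch below rewrites a dead URL into a snapshot URL of the form `https://web.archive.org/web/<timestamp>/<url>`. `wayback_url` is a hypothetical helper and the example address is a placeholder; how this package's scrape transport actually builds its requests is not shown here.

```python
def wayback_url(url: str, timestamp: str) -> str:
    """Return a Wayback Machine snapshot URL for `url` (hypothetical helper).

    `timestamp` is a yyyymmddhhmmss string; the archive serves the capture
    closest to it.
    """
    if not timestamp.isdigit():
        raise ValueError("timestamp must contain digits only")
    return f"https://web.archive.org/web/{timestamp}/{url}"


# Placeholder address, not one of the real directory sites.
print(wayback_url("http://example.com/servers.html", "20150101000000"))
```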
Dataset
dataset.py harvests a newsgroup into a JSONL corpus (one article per line) for
publishing to the Hugging Face Hub. See docs/dataset.md.
python dataset.py comp.lang.python --days 30 --out comp.lang.python.jsonl
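The JSONL layout (one JSON object per line) can be sketched as follows. This is an illustration only: `write_jsonl` and `read_jsonl` are hypothetical helpers, and the field names are borrowed from the article attributes used in the Quickstart, not from the actual schema in docs/dataset.md.

```python
import io
import json
from typing import Iterable, Iterator


def write_jsonl(records: Iterable[dict], fp) -> int:
    """Write one JSON object per line; return the number of records written."""
    count = 0
    for record in records:
        # json.dumps escapes embedded newlines, so each record stays on one line.
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


def read_jsonl(fp) -> Iterator[dict]:
    """Yield one dict per non-empty line."""
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)


# Assumed field names, for illustration only.
sample = [{"subject": "hello", "author": "someone", "text": "line one\nline two"}]
buffer = io.StringIO()
write_jsonl(sample, buffer)
buffer.seek(0)
print(list(read_jsonl(buffer)) == sample)
```

Because every line parses independently, a corpus in this form can be streamed or split without loading the whole file.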
Testing
pip install -e ".[test]"
pytest test/
The unit tests are offline; they exercise article parsing, post framing, the bundled server list, and scraper HTML parsing against fixtures.
License
Apache-2.0
Download files
File details
Details for the file usenet-0.1.0a2.tar.gz.
File metadata
- Download URL: usenet-0.1.0a2.tar.gz
- Upload date:
- Size: 14.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bcdfec3e47b0a57c4bdf1adc81b7faf5a682425d29e6901d47c523afc1c54298 |
| MD5 | 0b6b773795008c6f88eaea52ce71dc6c |
| BLAKE2b-256 | b6a0b3b3a4e464298a31ccd8839b7f83020912d9719a99d29d58635dfad49121 |
File details
Details for the file usenet-0.1.0a2-py3-none-any.whl.
File metadata
- Download URL: usenet-0.1.0a2-py3-none-any.whl
- Upload date:
- Size: 12.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | af168191638b754589df375519e278ac3406178ed37d4ad7a3cfbdf04a160df7 |
| MD5 | 9816ee23ae0d1d0bd95754c4cc1247e0 |
| BLAKE2b-256 | 245986bf52ffee08dfc310e455be435605452afe002e6277e0e5621e73624ef8 |