Shared library for public-inbox / lore.kernel.org access
liblore
A Python library for working with public-inbox servers, particularly lore.kernel.org. It fetches email threads, parses mbox files, and provides utilities for working with email messages from mailing list archives.
Requirements
- Python 3.11 or newer
- requests >= 2.31
Installation
Install from PyPI:
pip install liblore
Or install from source:
pip install .
Quick Start
The main entry point is the LoreNode class. It connects to a public-inbox
endpoint and lets you fetch threads, search for messages, and work with raw
mbox data. Use it as a context manager so the underlying HTTP session is
cleaned up automatically:
from liblore import LoreNode
with LoreNode("https://lore.kernel.org/all") as node:
    msgs = node.get_thread_by_msgid(
        "20250101-example@kernel.org",
        sort=True,
    )
    for msg in msgs:
        print(msg["Subject"])
If you omit the URL, it defaults to https://lore.kernel.org/all.
API Reference
LoreNode
from liblore import LoreNode
node = LoreNode(url="https://lore.kernel.org/all")
Fetching Threads
node.get_thread_by_msgid(msgid, *, strict=True, sort=False, since=None)
Fetch a thread by its message ID. This is the highest-level method and the one you will reach for most often.
- strict (default True) -- filter results to only messages that belong to the thread rooted at msgid. When a query returns messages from unrelated threads (common with broad date ranges), strict mode discards them.
- sort -- sort the returned messages by their Received header timestamp.
- since -- a date string appended as a d: filter. This uses public-inbox's approxidate syntax, so you can write things like "20240115", "2.weeks.ago", or "last.month".
Returns a list[EmailMessage]. Raises LookupError if no messages match.
with LoreNode() as node:
    # Fetch a thread, sorted by date, only looking at recent messages
    msgs = node.get_thread_by_msgid(
        "20250101-example@kernel.org",
        strict=True,
        sort=True,
        since="20250101",
    )
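To make the sort option more concrete, here is a minimal stdlib-only sketch of ordering messages by the timestamp in their topmost Received header, falling back to Date. The names received_time and sort_by_received_sketch are hypothetical, and this is an illustration of the idea rather than liblore's actual implementation.

```python
from datetime import datetime, timezone
from email.message import EmailMessage
from email.utils import parsedate_to_datetime


def received_time(msg: EmailMessage) -> datetime:
    """Best-effort arrival time: topmost Received header, else Date."""
    candidates = []
    received = msg.get_all("Received") or []
    if received:
        # The timestamp follows the last ";" in a Received header.
        candidates.append(str(received[0]).rsplit(";", 1)[-1].strip())
    candidates.append(str(msg.get("Date", "")))
    for text in candidates:
        try:
            stamp = parsedate_to_datetime(text)
        except (TypeError, ValueError):
            continue  # unparseable, try the next candidate
        if stamp.tzinfo is None:
            stamp = stamp.replace(tzinfo=timezone.utc)  # treat naive as UTC
        return stamp
    # Messages with no usable timestamp sort first.
    return datetime.min.replace(tzinfo=timezone.utc)


def sort_by_received_sketch(msgs):
    """Return a new list ordered by received_time (hypothetical helper)."""
    return sorted(msgs, key=received_time)
```

Sorting on the Received timestamp rather than Date reflects when the archive saw the message, which is usually more trustworthy than a sender-controlled clock.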
node.get_thread_by_query(query)
Run a search query and return a deduplicated list[EmailMessage]. The query
uses public-inbox's
Xapian search syntax, which supports
prefixes like msgid:, s: (subject), f: (from), d: (date range), and
more.
with LoreNode() as node:
    # Find all messages from a sender in the last month
    msgs = node.get_thread_by_query("f:alice@example.com d:last.month..")
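If you assemble queries programmatically, a tiny helper can keep the prefixes in one place. The sketch below is hypothetical (build_query_sketch is not part of liblore) and only joins single-word terms using the s:, f:, and d: prefixes mentioned above; it does no quoting or escaping.

```python
def build_query_sketch(*, subject=None, sender=None, date_range=None):
    """Join optional search terms into one query string (illustrative only)."""
    terms = []
    if subject:
        terms.append(f"s:{subject}")      # subject prefix
    if sender:
        terms.append(f"f:{sender}")       # from prefix
    if date_range:
        terms.append(f"d:{date_range}")   # date range prefix
    return " ".join(terms)


query = build_query_sketch(sender="alice@example.com", date_range="last.month..")
```

The resulting string could then be passed to node.get_thread_by_query(query).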
Batch Fetching
When you need to fetch multiple threads, the batch methods handle the loop for you and add a 100 ms cooldown between requests so you're being a good citizen to the server.
node.batch_get_thread_by_msgid(msgids, *, strict=True, sort=False, since=None)
Fetch threads for a list of message IDs. Calls get_thread_by_msgid() for
each one with a brief pause between requests. Returns a
list[list[EmailMessage]] in the same order as the input.
with LoreNode() as node:
    threads = node.batch_get_thread_by_msgid(
        ["msg1@example.com", "msg2@example.com", "msg3@example.com"],
        sort=True,
        since="2.weeks.ago",
    )
    for thread in threads:
        print(f"Thread with {len(thread)} messages")
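The batch pattern described above amounts to a loop with a pause between calls. Here is a generic sketch of that pattern, assuming a caller-supplied fetch function; batch_with_cooldown and its sleep parameter are hypothetical names used for illustration, not liblore internals.

```python
import time


def batch_with_cooldown(fetch, items, cooldown=0.1, sleep=time.sleep):
    """Call fetch(item) for each item, pausing `cooldown` seconds between calls."""
    results = []
    for index, item in enumerate(items):
        if index:
            # Pause before every request except the first one.
            sleep(cooldown)
        results.append(fetch(item))
    # Results come back in the same order as the input.
    return results
```

Passing sleep as a parameter keeps the loop easy to test without actually waiting.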
node.batch_get_thread_by_query(queries)
Run multiple search queries. Same pattern -- calls get_thread_by_query() per
query with a 100 ms cooldown. Returns a list[list[EmailMessage]].
with LoreNode() as node:
    results = node.batch_get_thread_by_query([
        "s:fix f:alice@example.com",
        "s:feature f:bob@example.com",
    ])
Raw Mbox Access
These methods return raw mbox bytes rather than parsed messages. They are useful when you need the unprocessed data, or when you want to feed the output into your own parser.
node.get_mbox_by_msgid(msgid) -- fetch a thread's mbox by message ID.
node.get_mbox_by_query(query) -- run a search query and return the
matching mbox.
with LoreNode() as node:
    raw = node.get_mbox_by_msgid("20250101-example@kernel.org")
    with open("thread.mbox", "wb") as f:
        f.write(raw)
Single Messages
node.get_message_by_msgid(msgid) -- fetch a single raw message (bytes)
by its message ID. Useful when you need exactly one message rather than an
entire thread.
Session Configuration
node.set_user_agent(app_name, version, plus=None) -- set a custom
User-Agent header. Being a good citizen of public infrastructure means
identifying your tool:
node.set_user_agent("my-tool", "1.0")
# User-Agent: my-tool/1.0
node.set_requests_session(session) -- inject your own
requests.Session. Handy when you need custom timeouts, proxies, or
authentication. Note that the session's User-Agent is not overwritten
when you provide your own.
node.validate() -- check that the configured URL actually points to a
public-inbox server. Raises RemoteError if it does not.
node.close() -- close the HTTP session. Called automatically when
using LoreNode as a context manager.
How the API Layers Fit Together
The methods build on each other in layers, from raw bytes up to filtered, sorted thread views:
get_mbox_by_msgid / get_mbox_by_query -> raw mbox bytes
|
get_thread_by_query -> split + dedupe -> list[EmailMessage]
|
get_thread_by_msgid -> strict + sort -> list[EmailMessage]
|
batch_get_thread_by_msgid / batch_get_... -> rate-limited loop -> list[list[EmailMessage]]
You can tap into whichever layer suits your needs. Need raw bytes for
archiving? Use the get_mbox_* methods. Need parsed messages with
deduplication? Use get_thread_by_query. Want the full convenience of
strict filtering and date sorting? Use get_thread_by_msgid.
Utility Functions
The liblore.utils module provides lower-level helpers for parsing and
inspecting email messages.
Header Handling
from liblore.utils import clean_header, get_clean_msgid
# Decode RFC 2047 encoded headers
decoded = clean_header("=?utf-8?q?Re=3A_Some_Subject?=")
# Extract a clean message ID (without angle brackets) from a message
msgid = get_clean_msgid(msg) # reads Message-Id by default
msgid = get_clean_msgid(msg, "In-Reply-To") # or any other header
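For readers curious what this kind of header cleanup involves, the following stdlib-only sketch decodes an RFC 2047 encoded header and strips angle brackets from a message ID. The function names are hypothetical and the real helpers may do more (for example, whitespace normalization).

```python
from email.header import decode_header, make_header


def decode_header_sketch(value: str) -> str:
    """Decode RFC 2047 encoded words such as =?utf-8?q?...?= into text."""
    return str(make_header(decode_header(value)))


def clean_msgid_sketch(value: str) -> str:
    """Drop surrounding whitespace and angle brackets from a message ID."""
    return value.strip().strip("<>")


subject = decode_header_sketch("=?utf-8?q?Re=3A_Some_Subject?=")
```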
Parsing Messages
from liblore.utils import parse_message
# Parse raw email bytes into an EmailMessage
msg = parse_message(raw_bytes)
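As a point of comparison, the standard library can produce an EmailMessage from raw bytes when given the modern policy. This sketch (parse_message_sketch is a hypothetical name) shows that approach; it is an assumption that liblore does something similar.

```python
import email
from email import policy
from email.message import EmailMessage


def parse_message_sketch(raw: bytes) -> EmailMessage:
    """Parse raw bytes; policy.default yields EmailMessage, not legacy Message."""
    return email.message_from_bytes(raw, policy=policy.default)


raw_example = b"From: Alice <alice@example.com>\nSubject: Hello\n\nBody text\n"
parsed = parse_message_sketch(raw_example)
```

Without policy=policy.default the parser returns the legacy Message class, which lacks conveniences such as structured address headers.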
Extracting Message Content
from liblore.utils import (
    msg_get_subject,
    msg_get_author,
    msg_get_payload,
    msg_get_recipients,
)
# Get the decoded subject line
subject = msg_get_subject(msg)
# Strip [PATCH v3 2/5] and Re: prefixes to get the bare subject
bare = msg_get_subject(msg, strip_prefixes=True)
# Get the author as a (name, email) tuple
name, addr = msg_get_author(msg)
# Get the plain-text body, stripping the signature
body = msg_get_payload(msg)
# Get the body without quoted lines or signature
body = msg_get_payload(msg, strip_quoted=True, strip_signature=True)
# Get all recipient email addresses (To + Cc + From)
recipients = msg_get_recipients(msg)
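The prefix and quote stripping shown above can be approximated with a few lines of plain Python. This is a simplified sketch under stated assumptions: bracketed tags and "Re:" markers appear only at the start of the subject, quoted lines start with ">", and the signature begins at a line containing exactly "-- ". Both function names are hypothetical.

```python
import re

# Matches one leading "Re:" marker or one bracketed tag such as "[PATCH v3 2/5]".
_PREFIX = re.compile(r"^\s*(?:re:|\[[^\]]*\])\s*", re.IGNORECASE)


def strip_subject_prefixes(subject: str) -> str:
    """Repeatedly remove leading Re: markers and [bracketed] tags."""
    previous = None
    while previous != subject:
        previous = subject
        subject = _PREFIX.sub("", subject)
    return subject.strip()


def strip_quoted_and_signature(body: str) -> str:
    """Drop lines starting with ">" and everything after the "-- " delimiter."""
    kept = []
    for line in body.splitlines():
        if line == "-- ":
            break  # conventional signature delimiter
        if line.startswith(">"):
            continue  # quoted text from an earlier message
        kept.append(line)
    return "\n".join(kept).strip()
```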
Sorting and Threading
from liblore.utils import sort_msgs_by_received, get_strict_thread
# Sort messages by their Received timestamp (falls back to Date)
sorted_msgs = sort_msgs_by_received(msgs)
# Filter a list of messages to only those in a specific thread
thread = get_strict_thread(msgs, "20250101-example@kernel.org")
# Break the thread at msgid, ignoring its parent references
thread = get_strict_thread(msgs, msgid, noparent=True)
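To illustrate what strict thread filtering means, here is a simplified sketch that keeps the message with the given ID plus every message that replies, directly or indirectly, to a message already kept. It only follows In-Reply-To and References downward from the root and does not model the noparent option; strict_thread_sketch is a hypothetical name, not the library's algorithm.

```python
import re
from email.message import EmailMessage

_ID = re.compile(r"<([^<>\s]+)>")


def _ids(value) -> set:
    """Extract all <message-id> tokens from a header value (or None)."""
    return set(_ID.findall(str(value or "")))


def strict_thread_sketch(msgs, msgid):
    members = {msgid}  # message IDs known to belong to the thread
    picked = set()     # indexes of messages already accepted
    changed = True
    while changed:     # repeat until no new message joins the thread
        changed = False
        for index, msg in enumerate(msgs):
            if index in picked:
                continue
            own = _ids(msg["Message-ID"])
            refs = _ids(msg["In-Reply-To"]) | _ids(msg["References"])
            if msgid in own or refs & members:
                picked.add(index)
                members |= own
                changed = True
    # Preserve the input order of the surviving messages.
    return [m for i, m in enumerate(msgs) if i in picked]
```

The fixed-point loop matters because a reply can appear in the list before the message it answers.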
Mbox Splitting
from liblore.utils import split_mbox, split_and_dedupe
# Split mboxrd bytes into a list of EmailMessage objects
msgs = split_mbox(mbox_bytes)
# Split and deduplicate by Message-ID (first occurrence wins)
msgs = split_and_dedupe(mbox_bytes)
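For intuition, splitting mboxrd data and deduplicating by Message-ID can be sketched with the standard library as follows. The function names are hypothetical and the approach is an assumption about the general technique, not a copy of liblore's code: messages are separated at lines starting with "From ", and one level of ">" escaping is removed from body lines matching ">From ".

```python
import email
import re
from email import policy


def split_mboxrd_sketch(data: bytes):
    """Split mboxrd bytes at "From " separator lines and parse each message."""
    msgs = []
    for chunk in re.split(rb"^From .*\r?\n", data, flags=re.MULTILINE):
        if not chunk.strip():
            continue  # nothing before the first separator
        # Undo mboxrd escaping: ">From " becomes "From ", ">>From " becomes ">From ".
        chunk = re.sub(rb"^>(>*From )", rb"\1", chunk, flags=re.MULTILINE)
        msgs.append(email.message_from_bytes(chunk, policy=policy.default))
    return msgs


def dedupe_by_msgid_sketch(msgs):
    """Keep the first message seen for each Message-ID."""
    seen = set()
    unique = []
    for msg in msgs:
        key = str(msg["Message-ID"] or "").strip()
        if key and key in seen:
            continue  # duplicate of an earlier message
        seen.add(key)
        unique.append(msg)
    return unique
```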
URL Helpers
from liblore.utils import get_msgid_from_url
# Extract a message ID from a lore URL
msgid = get_msgid_from_url("https://lore.kernel.org/all/20250101-example@kernel.org/")
# -> "20250101-example@kernel.org"
# Also works with bare message IDs
msgid = get_msgid_from_url("<20250101-example@kernel.org>")
# -> "20250101-example@kernel.org"
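A rough equivalent can be written with urllib.parse. This sketch assumes the URL path contains the message ID as the first segment with an "@" in it, and treats anything without a scheme as a bare message ID; msgid_from_url_sketch is a hypothetical name.

```python
from urllib.parse import unquote, urlparse


def msgid_from_url_sketch(value: str) -> str:
    """Return the message ID found in a lore-style URL or a bare <msgid>."""
    value = value.strip()
    if "://" not in value:
        # Not a URL: treat it as a message ID, with or without angle brackets.
        return value.strip("<>")
    for segment in urlparse(value).path.split("/"):
        segment = unquote(segment)  # message IDs may be percent-encoded
        if "@" in segment:
            return segment
    raise ValueError(f"no message ID found in {value!r}")
```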
Exceptions
All exceptions inherit from LibloreError, so you can catch them broadly or
handle specific cases:
from liblore import LibloreError, RemoteError, PublicInboxError
try:
    msgs = node.get_thread_by_msgid("nonexistent@example.com")
except RemoteError:
    # HTTP request failed (server error, network issue, etc.)
    ...
except PublicInboxError:
    # Something went wrong with the public-inbox operation
    ...
except LibloreError:
    # Catch-all for any liblore error
    ...
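The order of the except clauses matters: Python picks the first clause that matches, so the base class has to come last. The sketch below demonstrates this with stand-in classes; the exact relationship between RemoteError and PublicInboxError is an assumption here (both are modeled as direct subclasses of the base), and none of these names are liblore's own.

```python
class BaseErrorSketch(Exception):
    """Stand-in for a library-wide base exception."""


class RemoteErrorSketch(BaseErrorSketch):
    """Stand-in for a failed HTTP request."""


class InboxErrorSketch(BaseErrorSketch):
    """Stand-in for a failed public-inbox operation."""


def classify(exc: Exception) -> str:
    """Show which except clause handles a given exception."""
    try:
        raise exc
    except RemoteErrorSketch:
        return "remote"
    except InboxErrorSketch:
        return "inbox"
    except BaseErrorSketch:
        # Reached only for errors not matched by a more specific clause.
        return "other"
```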
Development
Install with development dependencies:
pip install -e ".[dev]"
Run the test suite:
pytest
Type checking:
mypy src/liblore/ --strict
Linting:
ruff check src/liblore/
Bug Reports
Send bug reports and patches to tools@kernel.org.
Licence
GPL-2.0-or-later. See LICENSES/GPL-2.0-or-later.txt for the full text.
Copyright The Linux Foundation.