Shared library for public-inbox / lore.kernel.org access

liblore

A Python library for working with public-inbox servers, particularly lore.kernel.org. It fetches email threads, parses mbox files, and provides utilities for working with email messages from mailing list archives.

Requirements

Python 3.9 or newer
requests >= 2.31
authheaders >= 0.15 (optional, for DKIM/DMARC/ARC verification)

Installation

Install from PyPI:

pip install liblore

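To include optional email authentication support (DKIM, DMARC, ARC), install the auth extra. Quoting it stops shells such as zsh from treating the brackets as a glob pattern:

pip install "liblore[auth]"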

Or install from source:

pip install .

Quick Start

The main entry point is the LoreNode class. It connects to a public-inbox endpoint and lets you fetch threads, search for messages, and work with raw mbox data. Use it as a context manager so the underlying HTTP session is cleaned up automatically:

from liblore import LoreNode

with LoreNode("https://lore.kernel.org/all") as node:
    msgs = node.get_thread_by_msgid(
        "20250101-example@kernel.org",
        sort=True,
    )
    for msg in msgs:
        print(msg["Subject"])

If you omit the URL, it defaults to https://lore.kernel.org/all.

API Reference

LoreNode

from liblore import LoreNode

node = LoreNode(url="https://lore.kernel.org/all")

Caching

LoreNode can optionally cache raw mbox bytes on disk. Pass cache_dir to enable it:

with LoreNode(cache_dir="/tmp/liblore-cache", cache_ttl=600) as node:
    # First call fetches from the network and writes a cache file
    msgs = node.get_thread_by_msgid("20250101-example@kernel.org")
    # Second call reads from cache (if within TTL)
    msgs = node.get_thread_by_msgid("20250101-example@kernel.org")

cache_dir -- directory for cache files (None to disable, the default)
cache_ttl -- time-to-live in seconds (default 600 = 10 minutes)

Caching applies to get_mbox_by_msgid, get_mbox_by_query, and get_message_by_msgid. Polling methods (get_thread_updates_since) are intentionally not cached. The TTL is checked on every read, so an entry older than cache_ttl is never served, even in long-running processes.

Pass nocache=True to any cached method to bypass the cache for that call (the response is still written back to refresh the entry). Call node.clear_cache() to remove all cached entries.
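The sketch below is a small stand-alone cache that behaves the way this section describes. The TTL is checked on every read. nocache skips the read but still writes the fresh response back. clear_cache() empties the directory. This is not liblore's implementation. The TTLByteCache class, its method names, and the choice to name cache files by a SHA-256 hash of the key are all hypothetical.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable, Optional


class TTLByteCache:
    """Hypothetical on-disk cache for raw bytes (not liblore's actual code)."""

    def __init__(self, cache_dir: str, ttl: int = 600) -> None:
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)
        self.ttl = ttl

    def _path(self, key: str) -> Path:
        # Hash the key (e.g. a request URL) into a filesystem-safe name
        return self.dir / hashlib.sha256(key.encode("utf-8")).hexdigest()

    def get(self, key: str) -> Optional[bytes]:
        path = self._path(key)
        try:
            age = time.time() - path.stat().st_mtime
        except FileNotFoundError:
            return None
        # The TTL is checked on every read, so expired entries are never served
        if age > self.ttl:
            return None
        return path.read_bytes()

    def put(self, key: str, data: bytes) -> None:
        self._path(key).write_bytes(data)

    def fetch(self, key: str, loader: Callable[[], bytes], nocache: bool = False) -> bytes:
        if not nocache:
            cached = self.get(key)
            if cached is not None:
                return cached
        data = loader()
        # Write back even when nocache=True so the entry is refreshed
        self.put(key, data)
        return data

    def clear(self) -> None:
        for path in self.dir.iterdir():
            if path.is_file():
                path.unlink()
```

Because each entry's age comes from its file modification time, a long-running process needs no in-memory bookkeeping to notice that an entry has expired.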

Fetching Threads

node.get_thread_by_msgid(msgid, *, strict=True, sort=False, since=None)

Fetch a thread by its message ID. This is the highest-level method and the one you will reach for most often.

strict (default True) -- filter results to only messages that belong to the thread rooted at msgid. When a query returns messages from unrelated threads (common with broad date ranges), strict mode discards them.
sort (default False) -- sort the returned messages by their Received header timestamp.
since -- a date string appended as a d: filter. This uses public-inbox's approxidate syntax, so you can write things like "20240115", "2.weeks.ago", or "last.month".

Returns a list[EmailMessage]. Raises LookupError if no messages match.

with LoreNode() as node:
    # Fetch a thread, sorted by date, only looking at recent messages
    msgs = node.get_thread_by_msgid(
        "20250101-example@kernel.org",
        strict=True,
        sort=True,
        since="20250101",
    )
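Strict mode, like the get_strict_thread() helper under Sorting and Threading, keeps only messages that descend from the root message ID. Below is a minimal standard-library sketch of that idea. It assumes thread membership is decided from the References and In-Reply-To headers. strict_thread() and its helpers are hypothetical names, and liblore's own rules (for example noparent handling) may differ.

```python
import re
from email.message import Message
from typing import Optional

_MSGID_RE = re.compile(r"<([^>]+)>")


def _own_id(msg: Message) -> Optional[str]:
    found = _MSGID_RE.findall(str(msg.get("Message-ID", "")))
    return found[0] if found else None


def _parent_ids(msg: Message) -> list[str]:
    # References lists ancestors; In-Reply-To names the direct parent
    refs = _MSGID_RE.findall(str(msg.get("References", "")))
    refs += _MSGID_RE.findall(str(msg.get("In-Reply-To", "")))
    return refs


def strict_thread(msgs: list[Message], root: str) -> list[Message]:
    """Return the messages from msgs that belong to the thread rooted at root."""
    members = {root}
    changed = True
    # Repeat until nothing changes, so replies that only carry In-Reply-To
    # are picked up even if they appear before their parent in the input
    while changed:
        changed = False
        for msg in msgs:
            mid = _own_id(msg)
            if mid and mid not in members and any(p in members for p in _parent_ids(msg)):
                members.add(mid)
                changed = True
    # Preserve input order; messages without a Message-ID are dropped
    return [m for m in msgs if _own_id(m) in members]
```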

node.get_thread_updates_since(msgid, since, *, strict=True, sort=False)

Check whether a thread has new messages since a given point in time. This is handy for polling use cases where you want to know if anything new has arrived.

since -- a datetime object. Converted to a UTC epoch timestamp internally and matched against the server-set Received header (rt: prefix), which is more reliable than the client-set Date header.
strict (default True) -- filter results to only messages belonging to the thread rooted at msgid.
sort (default False) -- sort the returned messages by their Received header timestamp.

Returns a list[EmailMessage]. Returns an empty list (rather than raising) when there are no updates.

from datetime import datetime, timedelta, timezone

with LoreNode() as node:
    cutoff = datetime.now(timezone.utc) - timedelta(hours=24)
    updates = node.get_thread_updates_since(
        "20250101-example@kernel.org", cutoff,
    )
    if updates:
        print(f"{len(updates)} new message(s)")
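The since value is compared against server-set Received times, so what counts is the instant it represents, not its local time zone. The sketch below normalises a datetime to a UTC epoch timestamp. to_utc_epoch() and rt_range() are hypothetical helpers. Treating naive datetimes as UTC is an assumption of this sketch. The rt:<epoch>.. string only illustrates an open-ended range and is not necessarily the query liblore builds. Passing a timezone-aware datetime, as the example above does, avoids ambiguity.

```python
from datetime import datetime, timezone


def to_utc_epoch(since: datetime) -> int:
    """Convert a datetime to integer seconds since the Unix epoch (UTC)."""
    # Assumption for this sketch: a naive datetime is treated as UTC
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    return int(since.timestamp())


def rt_range(since: datetime) -> str:
    # Hypothetical open-ended received-time filter; liblore's exact string may differ
    return f"rt:{to_utc_epoch(since)}.."
```

Two datetimes for the same instant in different zones yield the same epoch value. For example, midnight UTC and 01:00 at UTC+1 on 2025-01-01 both map to 1735689600.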

node.get_thread_by_query(query, *, full_threads=False)

Run a search query and return a deduplicated list[EmailMessage]. The query uses public-inbox's Xapian search syntax, which supports prefixes like msgid:, s: (subject), f: (from), d: (date range), and more.

When full_threads is True, the server expands results to include the full thread for every matching message. This is useful when searching by patch-id or change-id and you need the complete surrounding thread, not just the matching messages.

with LoreNode() as node:
    # Find all messages from a sender in the last month
    msgs = node.get_thread_by_query("f:alice@example.com d:last.month..")

    # Search by patch-id and fetch the full threads
    msgs = node.get_thread_by_query("patchid:abc123", full_threads=True)

Batch Fetching

When you need to fetch multiple threads, the batch methods handle the loop for you and insert a 100 ms cooldown between requests to avoid overloading the shared server.

node.batch_get_thread_by_msgid(msgids, *, strict=True, sort=False, since=None)

Fetch threads for a list of message IDs. Calls get_thread_by_msgid() for each one with a brief pause between requests. Returns a list[list[EmailMessage]] in the same order as the input.

with LoreNode() as node:
    threads = node.batch_get_thread_by_msgid(
        ["msg1@example.com", "msg2@example.com", "msg3@example.com"],
        sort=True,
        since="2.weeks.ago",
    )
    for thread in threads:
        print(f"Thread with {len(thread)} messages")

node.batch_get_thread_by_query(queries, *, full_threads=False)

Run multiple search queries. Same pattern -- calls get_thread_by_query() per query with a 100 ms cooldown. Returns a list[list[EmailMessage]].

with LoreNode() as node:
    results = node.batch_get_thread_by_query([
        "s:fix f:alice@example.com",
        "s:feature f:bob@example.com",
    ])
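Both batch methods are described as calling the single-item method once per input, keeping results in input order, and sleeping 100 ms between calls. The sketch below shows that pattern as a generic helper. batch_call() is hypothetical, not liblore's code.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batch_call(func: Callable[[T], R], items: Iterable[T], cooldown: float = 0.1) -> list[R]:
    """Call func on each item in order, pausing `cooldown` seconds between calls."""
    results: list[R] = []
    for index, item in enumerate(items):
        if index:
            # Sleep between requests only, not before the first one
            time.sleep(cooldown)
        results.append(func(item))
    return results
```

With a LoreNode, batch_call(functools.partial(node.get_thread_by_msgid, sort=True), msgids) would roughly mirror batch_get_thread_by_msgid(msgids, sort=True). The built-in method already does this for you.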

Raw Mbox Access

These methods return raw mbox bytes rather than parsed messages. They are useful when you need the unprocessed data, or when you want to feed the output into your own parser.

node.get_mbox_by_msgid(msgid, *, nocache=False) -- fetch a thread's mbox by message ID.

node.get_mbox_by_query(query, *, full_threads=False, nocache=False) -- run a search query and return the matching mbox. Pass full_threads=True to expand results to include full threads.

with LoreNode() as node:
    raw = node.get_mbox_by_msgid("20250101-example@kernel.org")
    with open("thread.mbox", "wb") as f:
        f.write(raw)

Single Messages

node.get_message_by_msgid(msgid, *, nocache=False) -- fetch a single raw message (bytes) by its message ID. Useful when you need exactly one message rather than an entire thread.

Session Configuration

node.set_user_agent(app_name, version, plus=None) -- set a custom User-Agent header. Being a good citizen of public infrastructure means identifying your tool:

node.set_user_agent("my-tool", "1.0")
# User-Agent: my-tool/1.0

node.set_requests_session(session) -- inject your own requests.Session. Handy when you need custom timeouts, proxies, or authentication. Note that the session's User-Agent is not overwritten when you provide your own.

node.validate() -- check that the configured URL actually points to a public-inbox server. Raises RemoteError if it does not.

node.close() -- close the HTTP session. Called automatically when using LoreNode as a context manager.

Message Authentication

LoreNode can optionally verify DKIM signatures, DMARC alignment, and ARC chains on every message it retrieves. This requires the authheaders package (install with pip install liblore[auth]).

with LoreNode(add_auth_headers=True) as node:
    msgs = node.get_thread_by_msgid("20250101-example@kernel.org")
    for msg in msgs:
        print(msg["Authentication-Results"])
        # liblore; dkim=pass header.d=kernel.org; ...

When enabled, each returned EmailMessage gets an Authentication-Results header added by the authheaders library. SPF is not checked because archived messages don't carry the SMTP transaction info (client IP, MAIL FROM, HELO) that SPF requires.

If add_auth_headers=True is set but authheaders is not installed, a LibloreError is raised immediately on construction.

How the API Layers Fit Together

The methods build on each other in layers, from raw bytes up to filtered, sorted thread views:

get_mbox_by_msgid / get_mbox_by_query      ->  raw mbox bytes
        |
get_thread_by_query                        ->  split + dedupe -> list[EmailMessage]
        |
get_thread_by_msgid                        ->  strict + sort  -> list[EmailMessage]
        |
get_thread_updates_since                   ->  poll for new   -> list[EmailMessage]
        |
batch_get_thread_by_msgid / batch_get_...  ->  rate-limited loop -> list[list[EmailMessage]]

You can tap into whichever layer suits your needs. Need raw bytes for archiving? Use the get_mbox_* methods. Need parsed messages with deduplication? Use get_thread_by_query. Want the full convenience of strict filtering and date sorting? Use get_thread_by_msgid. Need to poll for new messages? Use get_thread_updates_since.

Utility Functions

The liblore.utils module provides lower-level helpers for parsing and inspecting email messages.

Header Handling

from liblore.utils import clean_header, get_clean_msgid

# Decode RFC 2047 encoded headers
decoded = clean_header("=?utf-8?q?Re=3A_Some_Subject?=")

# Extract a clean message ID (without angle brackets) from a message
msgid = get_clean_msgid(msg)               # reads Message-Id by default
msgid = get_clean_msgid(msg, "In-Reply-To")  # or any other header
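Both helpers map onto well-known standard library operations. The sketch below gives stdlib approximations for comparison. decode_rfc2047() and clean_msgid() are hypothetical names. liblore's versions may handle more edge cases, such as malformed encoded words or several IDs in one header.

```python
import re
from email.header import decode_header, make_header
from email.message import Message
from typing import Optional


def decode_rfc2047(value: str) -> str:
    """Decode RFC 2047 encoded words and collapse folding whitespace."""
    decoded = str(make_header(decode_header(value)))
    return " ".join(decoded.split())


def clean_msgid(msg: Message, header: str = "Message-Id") -> Optional[str]:
    """Return the first <...> ID from a header, without angle brackets."""
    value = msg.get(header)
    if value is None:
        return None
    match = re.search(r"<([^>]+)>", str(value))
    if match:
        return match.group(1)
    return str(value).strip() or None
```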

Parsing Messages

from liblore.utils import parse_message

# Parse raw email bytes into an EmailMessage
msg = parse_message(raw_bytes)

Extracting Message Content

from liblore.utils import (
    msg_get_subject,
    msg_get_author,
    msg_get_payload,
    msg_get_recipients,
)

# Get the decoded subject line
subject = msg_get_subject(msg)

# Strip [PATCH v3 2/5] and Re: prefixes to get the bare subject
bare = msg_get_subject(msg, strip_prefixes=True)

# Get the author as a (name, email) tuple
name, addr = msg_get_author(msg)

# Get the plain-text body, stripping the signature
body = msg_get_payload(msg)

# Get the body without quoted lines or signature
body = msg_get_payload(msg, strip_quoted=True, strip_signature=True)

# Get all recipient email addresses (To + Cc + From)
recipients = msg_get_recipients(msg)
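Here is a simplified stand-alone sketch of what strip_prefixes, strip_quoted and strip_signature roughly involve. strip_subject_prefixes() and strip_body() are hypothetical helpers. The prefix set (Re:, Fw:, Fwd:, bracketed tags) and the use of the conventional "-- " signature separator are assumptions, and liblore's own rules may be broader.

```python
import re

# Leading "Re:", "Fw:", "Fwd:" markers or bracketed tags such as "[PATCH v3 2/5]"
_PREFIX_RE = re.compile(r"^\s*(?:(?:re|fwd?)\s*:|\[[^\]]*\])\s*", re.IGNORECASE)


def strip_subject_prefixes(subject: str) -> str:
    """Repeatedly remove reply markers and bracketed tags from the start."""
    previous = None
    while previous != subject:
        previous = subject
        subject = _PREFIX_RE.sub("", subject, count=1)
    return subject.strip()


def strip_body(body: str, strip_quoted: bool = True, strip_signature: bool = True) -> str:
    """Drop quoted lines and everything after a "-- " signature separator."""
    lines = body.splitlines()
    if strip_signature and "-- " in lines:
        lines = lines[: lines.index("-- ")]
    if strip_quoted:
        lines = [line for line in lines if not line.startswith(">")]
    return "\n".join(lines).strip()
```

For example, strip_subject_prefixes("Re: [PATCH v3 2/5] mm: fix leak") returns "mm: fix leak". The subsystem prefix "mm:" survives because it is not a reply marker.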

Email Serialization

These functions replace the standard library's as_bytes(), which mishandles some messages. They correctly handle RFC 2047 header encoding, line wrapping, and non-ASCII display names.

from liblore.utils import format_addrs, wrap_header, get_msg_as_bytes

# Format (name, email) pairs into an RFC 5322 address string
formatted = format_addrs([
    ("", "foo@example.com"),
    ("Foo Bar", "bar@example.com"),
])
# -> 'foo@example.com, Foo Bar <bar@example.com>'

# Wrap and RFC 2047-encode a header for SMTP
hdr_bytes = wrap_header(("Subject", "Hello world"))

# Serialize a full message to bytes with proper encoding
msg_bytes = get_msg_as_bytes(msg)            # \n line endings (dry-run)
msg_bytes = get_msg_as_bytes(msg, nl="\r\n") # \r\n for SMTP
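For simple cases, the standard library's email.utils.formataddr() gives the same output as the format_addrs() example above. The sketch below is only a stdlib point of comparison, and format_addr_list() is a hypothetical name. It does not cover the header wrapping or full-message serialization that wrap_header() and get_msg_as_bytes() handle.

```python
from email.utils import formataddr


def format_addr_list(pairs: list[tuple[str, str]]) -> str:
    """Join (name, email) pairs into a comma-separated address header value."""
    # formataddr() omits the angle brackets when the name is empty and
    # RFC 2047-encodes non-ASCII display names
    return ", ".join(formataddr(pair) for pair in pairs)
```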

Sorting and Threading

from liblore.utils import sort_msgs_by_received, get_strict_thread

# Sort messages by their Received timestamp (falls back to Date)
sorted_msgs = sort_msgs_by_received(msgs)

# Filter a list of messages to only those in a specific thread
thread = get_strict_thread(msgs, "20250101-example@kernel.org")

# Break the thread at msgid, ignoring its parent references
thread = get_strict_thread(msgs, msgid, noparent=True)
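sort_msgs_by_received() is described as sorting by the Received timestamp and falling back to Date. Below is a minimal stdlib sketch of that ordering. received_time() and sort_by_received() are hypothetical. Reading the top-most Received header, which is the most recent hop, is an assumption about which hop matters.

```python
from datetime import datetime, timezone
from email.message import Message
from email.utils import parsedate_to_datetime


def received_time(msg: Message) -> datetime:
    """Timestamp from the top-most Received header, falling back to Date."""
    candidates = []
    received = msg.get_all("Received") or []
    if received:
        # The date in a Received header follows its final ";"
        candidates.append(str(received[0]).rsplit(";", 1)[-1].strip())
    candidates.append(str(msg.get("Date", "")))
    for value in candidates:
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            continue  # missing or unparseable; try the next candidate
        # Treat timestamps without zone information as UTC so they compare cleanly
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        return when
    # Messages with no usable date sort first
    return datetime.min.replace(tzinfo=timezone.utc)


def sort_by_received(msgs: list[Message]) -> list[Message]:
    return sorted(msgs, key=received_time)
```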

Thread Minimization

from liblore.utils import minimize_thread

# Strip excessive quoting and non-essential headers for compact display
minimized = minimize_thread(msgs)

# Customize which headers to keep
minimized = minimize_thread(msgs, keep_headers=("From", "Subject", "Date"))

# Aggressively reduce long quotes to just the last paragraph
minimized = minimize_thread(msgs, reduce_quote_context=True)

minimize_thread() creates lightweight copies of each message: it keeps only essential headers (From, To, Cc, Subject, Date, Message-ID, Reply-To, In-Reply-To by default), strips multi-level quotes and trailing quoted blocks, and drops messages that become empty after processing. Messages containing diffs or diffstats are preserved as-is.

When reduce_quote_context=True, long quoted blocks preceding a reply are trimmed to just the last paragraph, with earlier content replaced by a > [... skip NN lines ...] marker. This only applies when more than 5 lines would be skipped.
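The reduce_quote_context=True behaviour can be approximated as follows. This minimal sketch works on a list of body lines and makes its own assumptions. A quoted block is a run of lines starting with ">". Paragraphs inside it are separated by quoted blank lines. "Preceding a reply" means some unquoted, non-blank text follows. The function name and these rules are illustrative, not liblore's code.

```python
import re

_QUOTED_BLANK = re.compile(r"^>\s*$")


def reduce_quote_context(lines: list[str], min_skip: int = 5) -> list[str]:
    """Trim long quoted blocks that precede a reply to their last paragraph."""
    out: list[str] = []
    i = 0
    while i < len(lines):
        if not lines[i].startswith(">"):
            out.append(lines[i])
            i += 1
            continue
        # Collect one contiguous quoted block
        j = i
        while j < len(lines) and lines[j].startswith(">"):
            j += 1
        block = lines[i:j]
        # Ignore trailing quoted blank lines when locating the last paragraph
        end = len(block)
        while end > 0 and _QUOTED_BLANK.match(block[end - 1]):
            end -= 1
        start = 0
        for k in range(end):
            if _QUOTED_BLANK.match(block[k]):
                start = k + 1
        # Only trim when unquoted reply text follows the block
        has_reply = any(line.strip() and not line.startswith(">") for line in lines[j:])
        if has_reply and start > min_skip:
            out.append(f"> [... skip {start} lines ...]")
            out.extend(block[start:])
        else:
            out.extend(block)
        i = j
    return out
```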

Mbox Splitting

from liblore.utils import split_mbox, split_and_dedupe

# Split mboxrd bytes into a list of EmailMessage objects
msgs = split_mbox(mbox_bytes)

# Split and deduplicate by Message-ID (first occurrence wins)
msgs = split_and_dedupe(mbox_bytes)

When you need raw message bytes without the cost of parsing, use the _as_bytes variants:

from liblore.utils import split_mbox_as_bytes, split_and_dedupe_as_bytes

# Split mboxrd bytes into a list of raw message byte strings
chunks = split_mbox_as_bytes(mbox_bytes)

# Split, deduplicate, and return raw bytes (no email parsing)
chunks = split_and_dedupe_as_bytes(mbox_bytes)

The _as_bytes functions perform mboxrd unescaping and (for dedupe) Message-ID/List-Id extraction directly on raw bytes, so they skip the email parser entirely. The regular split_mbox and split_and_dedupe are thin wrappers that parse the results.
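The byte-level approach can be sketched with the standard library alone. This simplified illustration is not liblore's implementation, and split_mboxrd() and dedupe_by_msgid() are hypothetical. Messages are split on lines beginning with "From ". mboxrd unescaping removes one ">" from each ">From " or ">>From " line. Only Message-ID is used for deduplication; the List-Id handling mentioned above is left out.

```python
import re

# mboxrd quotes any body line matching '^>*From ' by prepending ">"; undo one level
_ESCAPED_FROM = re.compile(rb"^>(>*From )", re.MULTILINE)
_MSGID = re.compile(rb"^Message-ID:\s*(<[^>]+>)", re.IGNORECASE | re.MULTILINE)


def split_mboxrd(mbox: bytes) -> list[bytes]:
    """Split mboxrd bytes into raw messages, dropping the "From " separators."""
    chunks: list[bytes] = []
    current: list[bytes] = []
    for line in mbox.splitlines(keepends=True):
        if line.startswith(b"From "):
            if current:
                chunks.append(b"".join(current))
            current = []
            continue
        current.append(line)
    if current:
        chunks.append(b"".join(current))
    return [_ESCAPED_FROM.sub(rb"\1", chunk) for chunk in chunks]


def dedupe_by_msgid(chunks: list[bytes]) -> list[bytes]:
    """Keep the first message for each Message-ID; keep messages without one."""
    seen: set[bytes] = set()
    kept: list[bytes] = []
    for chunk in chunks:
        # Only search the header block (everything before the first blank line)
        headers = re.split(rb"\r?\n\r?\n", chunk, maxsplit=1)[0]
        match = _MSGID.search(headers)
        if match:
            if match.group(1) in seen:
                continue
            seen.add(match.group(1))
        kept.append(chunk)
    return kept
```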

URL Helpers

from liblore.utils import get_msgid_from_url

# Extract a message ID from a lore URL
msgid = get_msgid_from_url("https://lore.kernel.org/all/20250101-example@kernel.org/")
# -> "20250101-example@kernel.org"

# Also works with bare message IDs
msgid = get_msgid_from_url("<20250101-example@kernel.org>")
# -> "20250101-example@kernel.org"
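Here is a rough stdlib sketch of the URL case. It assumes the message ID is the first path segment that contains "@", as in https://lore.kernel.org/all/<msgid>/ and the /T/ thread-view URLs. msgid_from_url() is hypothetical and may not match get_msgid_from_url() on every URL shape.

```python
from urllib.parse import unquote, urlparse


def msgid_from_url(value: str) -> str:
    """Return a bare message ID from a lore-style URL or a <msgid> string."""
    value = value.strip()
    if "://" not in value:
        # Not a URL: treat it as a message ID, dropping any angle brackets
        return value.strip("<>")
    # Split the path before unquoting so a %2F inside a message ID survives
    for segment in urlparse(value).path.split("/"):
        segment = unquote(segment)
        if "@" in segment:
            return segment.strip("<>")
    raise ValueError(f"no message ID found in {value!r}")
```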

Exceptions

All liblore exceptions inherit from LibloreError, so you can catch them broadly or handle specific cases. get_thread_by_msgid() raises LookupError when no messages match, so the example handles that case first:

from liblore import LibloreError, LoreNode, PublicInboxError, RemoteError

with LoreNode() as node:
    try:
        msgs = node.get_thread_by_msgid("nonexistent@example.com")
    except LookupError:
        # No messages matched the message ID
        ...
    except RemoteError:
        # HTTP request failed (server error, network issue, etc.)
        ...
    except PublicInboxError:
        # Something went wrong with the public-inbox operation
        ...
    except LibloreError:
        # Catch-all for any liblore error
        ...

Development

Install with development dependencies:

pip install -e ".[dev]"

Run the test suite:

pytest

Type checking:

mypy src/liblore/ --strict

Linting:

ruff check src/liblore/

Bug Reports

Send bug reports and patches to tools@kernel.org.

Licence

GPL-2.0-or-later. See LICENSES/GPL-2.0-or-later.txt for the full text.

Release history

0.7.1 (Apr 9, 2026)
0.7.0 (Apr 8, 2026)
0.6.1 (Apr 8, 2026)
0.6.0 (Apr 8, 2026)
0.5.0 (Apr 7, 2026) -- this version
0.4.0 (Mar 5, 2026)
0.3.1 (Mar 4, 2026)
0.3.0 (Mar 4, 2026)
0.2.0 (Mar 4, 2026)
0.1.0 (Mar 3, 2026)

Download files

Source distribution: liblore-0.5.0.tar.gz (44.2 kB), uploaded Apr 7, 2026
Built distribution: liblore-0.5.0-py3-none-any.whl (27.3 kB, tag: Python 3), uploaded Apr 7, 2026

Both files were uploaded via twine/6.1.0 CPython/3.13.12, without Trusted Publishing.

Hashes for liblore-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`de858a6716c3d98fb2ac1c3d9229e6d716910626b95f12408d78546c8e79dcf9`
MD5	`3f86304949e5a3b2c01582adf8f9e30d`
BLAKE2b-256	`1cf7b61f2a6e53e47a1524cf23f0c1bd25bf1ee6feca184dba3ee06293d12287`

Hashes for liblore-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ddfe54d23417974d424a463e160698eba8dce4677f855abb53aca1c9bc23a4c3`
MD5	`d0c536f74dfc9ea09c4ae8de8929fdf9`
BLAKE2b-256	`138c97d886783f6c358260858acb77ebae25970ffbe84dee6a0cbd03c1fb6289`
