citerra

BibTeX parser for Python.

citerra parses, validates, edits, and writes BibTeX documents. It supports strict parsing by default, opt-in tolerant recovery, diagnostics with source locations, raw-text retention, source-preserving writes, name/date/identifier helpers, and plain-record projection for application code.

The package is distributed as abi3 (stable ABI) wheels for CPython 3.8 and newer.

Performance Snapshot

Measured on 2026-05-13 with Python 3.11.14 on an AMD Ryzen 5 5600G (6 cores / 12 threads). The input was tests/fixtures/tugboat.bib: 2,701,551 bytes, 73,993 lines, and 3,644 entries. Throughput is the input size divided by the median time.

The comparison used citerra 0.2.2, bibtexparser 1.4.4, bibtexparser 2.0.0b9, and pybtex 0.26.1.

The "citerra structured parse" mode disables source capture and raw preservation, which makes it the closest match to what the other parsers produce. The "citerra source-preserving parse" mode additionally retains raw source text, source locations, diagnostics, and source-order blocks.

Python parser / mode               Version    Median parse time   Throughput     Relative time
citerra structured parse           0.2.2      0.058 s             44.3 MiB/s     1.0x
citerra source-preserving parse    0.2.2      0.065 s             39.9 MiB/s     1.1x
bibtexparser parse                 2.0.0b9    0.372 s             6.9 MiB/s      6.4x
pybtex parse                       0.26.1     0.859 s             3.0 MiB/s      14.8x
bibtexparser parse                 1.4.4      10.483 s            0.2 MiB/s      180.1x

Python writer / mode               Version    Median write time   Throughput     Relative time
citerra raw-preserving write       0.2.2      0.003 s             953.2 MiB/s    1.0x
citerra normalized write           0.2.2      0.014 s             181.3 MiB/s    5.3x
bibtexparser write                 1.4.4      0.106 s             24.3 MiB/s     39.2x
bibtexparser write                 2.0.0b9    0.493 s             5.2 MiB/s      182.2x
pybtex write                       0.26.1     3.942 s             0.7 MiB/s      1458.5x

Reproduction commands are listed in Reproducing Benchmarks.
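
As a rough check on how the throughput and relative-time columns relate to the median times, the arithmetic can be sketched as below. The function names are illustrative and not part of citerra. The tables were presumably computed from unrounded timings, so results derived from the rounded values shown here can differ in the last digit.

```python
MIB = 1024 * 1024  # bytes per MiB


def throughput_mib_per_s(size_bytes: int, seconds: float) -> float:
    """Input size in MiB divided by the elapsed time in seconds."""
    return size_bytes / MIB / seconds


def relative_time(seconds: float, baseline_seconds: float) -> float:
    """How many times slower a run is than the baseline run."""
    return seconds / baseline_seconds


fixture_bytes = 2_701_551      # size of tests/fixtures/tugboat.bib
structured_parse = 0.058       # rounded median from the parser table
bibtexparser_v2_parse = 0.372  # rounded median from the parser table

print(f"{throughput_mib_per_s(fixture_bytes, structured_parse):.1f} MiB/s")
print(f"{relative_time(bibtexparser_v2_parse, structured_parse):.1f}x")
```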

Install

pip install citerra

The distribution name and import name are both citerra:

import citerra

Parse

import citerra

document = citerra.parse(
    '@article{paper, author = "Jane Doe", title = "Example Paper", year = 2026}',
    expand_values=True,
)

entry = document.entry("paper")
assert entry is not None
assert entry.entry_type == "article"
assert entry.get("title") == "Example Paper"
assert entry.date_parts().year == 2026

File helpers are available:

from pathlib import Path
import citerra

document = citerra.parse_path("references.bib", tolerant=True)
Path("normalized.bib").write_text(citerra.dumps(document), encoding="utf-8")

File-like helpers are also available:

with open("references.bib", encoding="utf-8") as handle:
    document = citerra.load(handle, tolerant=True)

text = citerra.dumps(document)

Document Model

  • Document contains entries, comments, preambles, string definitions, source-order blocks, diagnostics, and validation helpers.
  • Entry exposes the citation key, entry type, fields, source text, semantic helpers, and field mutation methods.
  • Field exposes the original field name, parsed value, optional raw source text, and optional source location.
  • Value represents string literals, numbers, variables, and concatenations.
  • Diagnostic reports parse or validation problems with stable codes and source locations when available.
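
To make the four kinds of Value more concrete, here is a toy model of how a BibTeX value such as jan # " " # 13 expands once string definitions are known. The classes and the expand function are illustrative stand-ins written for this sketch; they are not citerra's types, and the sketch does no cycle detection.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Literal:
    text: str


@dataclass(frozen=True)
class Number:
    value: int


@dataclass(frozen=True)
class Variable:
    name: str


@dataclass(frozen=True)
class Concat:
    parts: Tuple["Value", ...]


Value = Union[Literal, Number, Variable, Concat]


def expand(value: Value, strings: Dict[str, Value]) -> str:
    """Resolve a value to plain text using @string definitions."""
    if isinstance(value, Literal):
        return value.text
    if isinstance(value, Number):
        return str(value.value)
    if isinstance(value, Variable):
        # BibTeX macro names are case-insensitive, so look up in lowercase.
        return expand(strings[value.name.lower()], strings)
    # Concat: expand each part and join the results with no separator.
    return "".join(expand(part, strings) for part in value.parts)


# @string{jan = "January"} followed by: month = jan # " " # 13
strings = {"jan": Literal("January")}
month = Concat((Variable("jan"), Literal(" "), Number(13)))
print(expand(month, strings))  # January 13
```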

Tolerant Parsing And Diagnostics

# `text` is the BibTeX source, already read into a str.
document = citerra.parse(
    text,
    tolerant=True,
    capture_source=True,
    preserve_raw=True,
    source="refs/main.bib",
)

if document.status != "ok":
    for diagnostic in document.diagnostics:
        span = diagnostic.source
        if span is None:
            print(diagnostic.code, diagnostic.message)
        else:
            print(diagnostic.code, span.line, span.column, diagnostic.message)
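
The loop above can be wrapped in a small formatter that produces compiler-style path:line:column messages. This is a sketch that relies only on the attributes used above (code, message, source, line, column). The objects below are hypothetical stand-ins so the example runs without a parsed document, and the codes and messages are made-up placeholders, not real citerra diagnostic codes.

```python
from types import SimpleNamespace


def format_diagnostic(diagnostic, path="<input>"):
    """Render one diagnostic as `path:line:column: code: message`."""
    span = diagnostic.source
    if span is None:
        # No location available: fall back to the path alone.
        return f"{path}: {diagnostic.code}: {diagnostic.message}"
    return f"{path}:{span.line}:{span.column}: {diagnostic.code}: {diagnostic.message}"


# Hypothetical stand-ins with the same attribute names as in the loop above.
located = SimpleNamespace(
    code="X001",
    message="placeholder message",
    source=SimpleNamespace(line=12, column=5),
)
unlocated = SimpleNamespace(code="X002", message="placeholder message", source=None)

for item in (located, unlocated):
    print(format_diagnostic(item, "refs/main.bib"))
```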

Raw Text And Source-Preserving Writes

document = citerra.parse(
    text,
    tolerant=True,
    capture_source=True,
    preserve_raw=True,
)

entry = document.entry("paper")
if entry is not None:
    print(entry.raw)
    print(entry.field("title").raw_value)

Use WriterConfig(preserve_raw=True) for low-churn output that reuses retained source text where possible. Use WriterConfig(preserve_raw=False) for normalized structured output.

document.rename_key("paper", "paper-v2")
document.set_field("paper-v2", "note", "accepted")
document.remove_export_fields(["abstract", "keywords"])

config = citerra.WriterConfig(
    preserve_raw=True,
    trailing_comma=True,
)
output = document.write(config)
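
The difference between the two writer modes can be pictured with a toy model: untouched entries are emitted from their retained source text, while edited entries fall back to normalized output. This is a conceptual sketch only and not a description of citerra's internals, which may reuse raw text at a finer granularity. ToyEntry and write_entry are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ToyEntry:
    entry_type: str
    key: str
    fields: Dict[str, str]
    raw: Optional[str] = None  # retained source text, if any

    def set_field(self, name: str, value: str) -> None:
        self.fields[name] = value
        self.raw = None  # the retained text no longer matches the entry


def write_entry(entry: ToyEntry, preserve_raw: bool) -> str:
    # Reuse the original text when allowed and still valid.
    if preserve_raw and entry.raw is not None:
        return entry.raw
    # Otherwise emit a normalized form with one field per line.
    body = "".join(f"  {name} = {{{value}}},\n" for name, value in entry.fields.items())
    return f"@{entry.entry_type}{{{entry.key},\n{body}}}"


raw = '@ARTICLE{paper,   title="Example Paper" }'
entry = ToyEntry("article", "paper", {"title": "Example Paper"}, raw=raw)

print(write_entry(entry, preserve_raw=True))   # original text, byte for byte
print(write_entry(entry, preserve_raw=False))  # normalized layout

entry.set_field("note", "accepted")
print(write_entry(entry, preserve_raw=True))   # edited, so normalized
```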

Plain Records

Some application code wants ordinary dictionaries for filtering, indexing, or bulk transforms. citerra provides explicit helpers for that shape without changing the document model:

document = citerra.parse_path("references.bib")
records = citerra.document_to_dicts(document)

selected = [record for record in records if record.get("year") == "2026"]
text = citerra.write_entries(
    selected,
    field_order=["author", "title", "journal", "year", "doi"],
    sort_by=["ID"],
    trailing_comma=True,
)

Plain records use ENTRYTYPE and ID keys for the entry type and citation key.
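
Because plain records are ordinary dictionaries, standard-library tools apply directly. The sketch below uses hand-written records in the shape described above instead of calling citerra. The field values are invented for illustration, and storing the year as a string is an assumption based on the comparison in the earlier snippet.

```python
from collections import Counter

# Hand-written records in the ENTRYTYPE / ID shape; values are made up.
records = [
    {"ENTRYTYPE": "article", "ID": "smith2026", "title": "Second Paper", "year": "2026"},
    {"ENTRYTYPE": "book", "ID": "doe2024", "title": "A Book", "year": "2024"},
    {"ENTRYTYPE": "article", "ID": "adams2026", "title": "First Paper", "year": "2026"},
]

# Filter by year, then sort by citation key.
recent = sorted(
    (record for record in records if record.get("year") == "2026"),
    key=lambda record: record["ID"],
)

# Index by citation key for constant-time lookup.
by_id = {record["ID"]: record for record in records}

# Count entries per entry type.
type_counts = Counter(record["ENTRYTYPE"] for record in records)

print([record["ID"] for record in recent])
print(by_id["doe2024"]["title"])
print(dict(type_counts))
```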

Helpers

assert citerra.normalize_doi("https://doi.org/10.1000/XYZ.") == "10.1000/xyz"
assert citerra.latex_to_unicode("Jos\\'e") == "José"

names = citerra.parse_names("Jane Doe and {Research Group}")
assert names[1].literal == "Research Group"

date = citerra.parse_date("2026-05-13")
assert (date.year, date.month, date.day) == (2026, 5, 13)

Reproducing Benchmarks

The comparison script uses whichever optional packages are installed in the active environment:

python python/benchmarks/compare_parsers.py tests/fixtures/tugboat.bib
python python/benchmarks/compare_parsers.py tests/fixtures/tugboat.bib --write
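
For a quick measurement outside the comparison script, a median over repeated runs can be taken with the standard library alone. This is a minimal sketch and not the project's compare_parsers.py. The repeat count and the stand-in workload are assumptions; replace the workload with a call to the parser under test.

```python
import statistics
import time


def median_seconds(func, repeats=5):
    """Call func repeatedly and return the median wall-clock time."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def stand_in_workload():
    # Placeholder work so the sketch runs anywhere; swap in a real parse call.
    text = "@article{k, title = {T}}\n" * 10_000
    return sum(len(line) for line in text.splitlines())


print(f"median: {median_seconds(stand_in_workload):.4f} s")
```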

Implementation

citerra is implemented as a native extension. Wheels include the parser engine, so ordinary Python installs do not require a Rust toolchain.

Local Build

Use the project manifest for local development:

guix shell -m manifest.scm -- maturin build --release --out target/wheels

For local tests without installing into the user environment, unpack the built wheel into a temporary import directory and run pytest with that directory on PYTHONPATH:

rm -rf target/python-test
python3 - <<'PY'
from pathlib import Path
from zipfile import ZipFile

wheel = sorted(Path("target/wheels").glob("citerra-*.whl"))[-1]
target = Path("target/python-test")
target.mkdir(parents=True, exist_ok=True)
with ZipFile(wheel) as archive:
    archive.extractall(target)
PY
guix shell -m manifest.scm -- env PYTHONPATH=target/python-test python3 -m pytest tests/python

License

Licensed under either of Apache-2.0 or MIT, at your option.
