BibTeX parser for Python

citerra

BibTeX parser for Python.

citerra parses, validates, edits, and writes BibTeX documents. It supports strict parsing by default, opt-in tolerant recovery, diagnostics with source locations, raw-text retention, source-preserving writes, name/date/identifier helpers, and plain-record projection for application code.

The package is distributed as ABI3 wheels for Python 3.8 and newer.

Performance Snapshot

Measured on tests/fixtures/tugboat.bib (2,701,551 bytes, 73,993 lines, 3,644 entries) on an AMD Ryzen 5 5600G (6 cores / 12 threads) with Python 3.11.14 on 2026-05-13. Throughput is normalized to input size.

For a BibTeX parser, the relevant speed measurements are:

| Workload | Why it matters | citerra result |
|---|---|---|
| Structured parse | Load a bibliography into entries and fields for application logic | 0.058 s, 44.3 MiB/s |
| Source-preserving parse | Keep raw text, source locations, diagnostics, and source-order blocks for tools | 0.065 s, 39.9 MiB/s |
| Raw-preserving write | Write retained source text after low-churn edits | 0.003 s, 953.2 MiB/s |
| Normalized write | Serialize structured data with configured formatting | 0.014 s, 181.3 MiB/s |
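The throughput figures can be sanity-checked from the fixture size and a median time. The sketch below assumes throughput means input bytes divided by median seconds, expressed in MiB/s; the function names are illustrative only. Because the published times are rounded to milliseconds, recomputed values can differ from the table in the last digit.

```python
FIXTURE_BYTES = 2_701_551   # size of tests/fixtures/tugboat.bib
FIXTURE_ENTRIES = 3_644     # entries in the same fixture
MIB = 1024 * 1024


def throughput_mib_s(median_seconds: float) -> float:
    """Input size in MiB divided by the median wall-clock time."""
    return FIXTURE_BYTES / MIB / median_seconds


def entries_per_second(median_seconds: float) -> float:
    """Entries in the fixture divided by the median wall-clock time."""
    return FIXTURE_ENTRIES / median_seconds


# 0.058 s is the rounded structured-parse median reported above.
print(f"{throughput_mib_s(0.058):.1f} MiB/s")
print(f"{entries_per_second(0.058) / 1000:.1f}k entries/s")
```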

The comparison used citerra 0.2.3, bibtexparser 1.4.4, bibtexparser 2.0.0b9, and pybtex 0.26.1. citerra structured parse disables source capture and raw preservation for the closest parser-output comparison. Relative time is normalized to the first row in each table.

| Python parser / mode | Version | Output retained | Median parse time | Throughput | Approx. entries/s | Relative time |
|---|---|---|---|---|---|---|
| citerra structured parse | 0.2.3 | Entries, fields, strings, comments, preambles | 0.058 s | 44.3 MiB/s | 62.8k | 1.0x |
| citerra source-preserving parse | 0.2.3 | Structured data, raw text, locations, diagnostics | 0.065 s | 39.9 MiB/s | 56.1k | 1.1x |
| bibtexparser parse | 2.0.0b9 | Entries/library model | 0.372 s | 6.9 MiB/s | 9.8k | 6.4x |
| pybtex parse | 0.26.1 | Bibliography data | 0.859 s | 3.0 MiB/s | 4.2k | 14.8x |
| bibtexparser parse | 1.4.4 | Entries/database model | 10.483 s | 0.2 MiB/s | 0.35k | 180.1x |

| Python writer / mode | Version | Median write time | Throughput | Relative time |
|---|---|---|---|---|
| citerra raw-preserving write | 0.2.3 | 0.003 s | 953.2 MiB/s | 1.0x |
| citerra normalized write | 0.2.3 | 0.014 s | 181.3 MiB/s | 5.3x |
| bibtexparser write | 1.4.4 | 0.106 s | 24.3 MiB/s | 39.2x |
| bibtexparser write | 2.0.0b9 | 0.493 s | 5.2 MiB/s | 182.2x |
| pybtex write | 0.26.1 | 3.942 s | 0.7 MiB/s | 1458.5x |

The workflow table below sums the median parse and write measurements from the same run. It is a round-trip estimate for parse-edit-write workloads, not a separate end-to-end benchmark.

| Workflow | Median time | Throughput | Relative time |
|---|---|---|---|
| citerra source-preserving parse + raw-preserving write | 0.068 s | 37.9 MiB/s | 1.0x |
| citerra structured parse + normalized write | 0.072 s | 35.8 MiB/s | 1.1x |
| bibtexparser 2.0.0b9 parse + write | 0.865 s | 3.0 MiB/s | 12.7x |
| pybtex 0.26.1 parse + write | 4.801 s | 0.5 MiB/s | 70.6x |
| bibtexparser 1.4.4 parse + write | 10.589 s | 0.2 MiB/s | 155.7x |
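The round-trip estimates are plain sums, which the following sketch reproduces from the published medians. The dictionary labels are shorthand chosen for this example, and the relative time is taken against the first workflow row, as in the table.

```python
# Median times in seconds, copied from the parser and writer tables.
PARSE = {
    "citerra source-preserving": 0.065,
    "citerra structured": 0.058,
    "bibtexparser 2.0.0b9": 0.372,
    "pybtex 0.26.1": 0.859,
    "bibtexparser 1.4.4": 10.483,
}
WRITE = {
    "citerra source-preserving": 0.003,  # paired with the raw-preserving write
    "citerra structured": 0.014,         # paired with the normalized write
    "bibtexparser 2.0.0b9": 0.493,
    "pybtex 0.26.1": 3.942,
    "bibtexparser 1.4.4": 0.106,
}


def round_trip(name: str) -> float:
    """Sum of the median parse and write times for one workflow."""
    return round(PARSE[name] + WRITE[name], 3)


baseline = round_trip("citerra source-preserving")
for name in PARSE:
    total = round_trip(name)
    print(f"{name}: {total:.3f} s, {total / baseline:.1f}x")
```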

Reproduction commands are listed in Reproducing Benchmarks.

Install

pip install citerra

The distribution name and import name are both citerra:

import citerra

Parse

import citerra

document = citerra.parse(
    '@article{paper, author = "Jane Doe", title = "Example Paper", year = 2026}',
    expand_values=True,
)

entry = document.entry("paper")
assert entry is not None
assert entry.entry_type == "article"
assert entry.get("title") == "Example Paper"
assert entry.date_parts().year == 2026

File helpers are available:

from pathlib import Path
import citerra

document = citerra.parse_path("references.bib", tolerant=True)
Path("normalized.bib").write_text(citerra.dumps(document), encoding="utf-8")

File-like helpers are also available:

with open("references.bib", encoding="utf-8") as handle:
    document = citerra.load(handle, tolerant=True)

text = citerra.dumps(document)

Document Model

  • Document contains entries, comments, preambles, string definitions, source-order blocks, diagnostics, and validation helpers.
  • Entry exposes the citation key, entry type, fields, source text, semantic helpers, and field mutation methods.
  • Field exposes the original field name, parsed value, optional raw source text, and optional source location.
  • Value represents string literals, numbers, variables, and concatenations.
  • Diagnostic reports parse or validation problems with stable codes and source locations when available.
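To make the four kinds of Value concrete, here is a rough mental model written with standard-library dataclasses. The class names (Literal, Number, Variable, Concat) and the expand function are hypothetical and are not citerra's actual types; the sketch only illustrates how a BibTeX concatenation such as jan # " 2026" could resolve against an @string definition.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Literal:
    text: str                      # quoted or braced string


@dataclass(frozen=True)
class Number:
    value: int                     # bare number such as 2026


@dataclass(frozen=True)
class Variable:
    name: str                      # reference to an @string definition


@dataclass(frozen=True)
class Concat:
    parts: Tuple["Value", ...]     # pieces joined with '#'


Value = Union[Literal, Number, Variable, Concat]


def expand(value: Value, strings: Dict[str, str]) -> str:
    """Resolve a value to plain text using the given @string definitions."""
    if isinstance(value, Literal):
        return value.text
    if isinstance(value, Number):
        return str(value.value)
    if isinstance(value, Variable):
        return strings[value.name]  # raises KeyError for an undefined name
    return "".join(expand(part, strings) for part in value.parts)


# month = jan # " 2026", with @string{jan = "January"}
month = Concat((Variable("jan"), Literal(" 2026")))
print(expand(month, {"jan": "January"}))  # prints: January 2026
```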

Tolerant Parsing And Diagnostics

import citerra

text = '''
@article{ok, title = "Good"}
@article{bad, title = "Missing close"
@book{recovered, title = "Recovered"}
'''

document = citerra.parse(
    text,
    tolerant=True,
    capture_source=True,
    preserve_raw=True,
    source="refs/main.bib",
)

if document.status != "ok":
    for diagnostic in document.diagnostics:
        span = diagnostic.source
        if span is None:
            print(diagnostic.code, diagnostic.message)
        else:
            print(diagnostic.code, span.line, span.column, diagnostic.message)
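For tools that report problems in the familiar path:line:column style, the loop above can be wrapped in a small formatter. This sketch relies only on the attributes already used in the example (code, message, source, line, column). The format_diagnostic name is hypothetical, and the stand-in objects and their values are invented so the snippet runs without a parsed document.

```python
from types import SimpleNamespace


def format_diagnostic(diagnostic, path="<input>"):
    """Render one diagnostic as 'path:line:column: code: message'."""
    span = diagnostic.source
    if span is None:
        # No location available: fall back to the path alone.
        return f"{path}: {diagnostic.code}: {diagnostic.message}"
    return f"{path}:{span.line}:{span.column}: {diagnostic.code}: {diagnostic.message}"


# Stand-ins for real diagnostics; the code and message are placeholders.
located = SimpleNamespace(
    code="example-code",
    message="example message",
    source=SimpleNamespace(line=3, column=1),
)
unlocated = SimpleNamespace(code="example-code", message="example message", source=None)

print(format_diagnostic(located, "refs/main.bib"))
# prints: refs/main.bib:3:1: example-code: example message
print(format_diagnostic(unlocated, "refs/main.bib"))
# prints: refs/main.bib: example-code: example message
```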

Raw Text And Source-Preserving Writes

import citerra

text = '@article{paper, title = "Example Paper"}'

document = citerra.parse(
    text,
    tolerant=True,
    capture_source=True,
    preserve_raw=True,
)

entry = document.entry("paper")
if entry is not None:
    print(entry.raw)
    print(entry.field("title").raw_value)

Use WriterConfig(preserve_raw=True) for low-churn output that reuses retained source text where possible. Use WriterConfig(preserve_raw=False) for normalized structured output.

document.rename_key("paper", "paper-v2")
document.set_field("paper-v2", "note", "accepted")
document.remove_export_fields(["abstract", "keywords"])

config = citerra.WriterConfig(
    preserve_raw=True,
    trailing_comma=True,
)
output = document.write(config)
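The idea behind a raw-preserving write can be pictured with a toy model: untouched blocks are emitted from their retained source text, and only edited blocks are re-serialized. Everything in this sketch (the Block class, the dirty flag, the serialize and write functions) is hypothetical and much simpler than a real writer; it does not describe citerra's actual rules for when raw text is reused.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Block:
    entry_type: str
    key: str
    fields: Dict[str, str]
    raw: Optional[str] = None   # retained source text, if any
    dirty: bool = False         # set when the block was edited after parsing


def serialize(block: Block) -> str:
    """Normalized output: one field per line, braced values, trailing comma."""
    body = "".join(f"  {name} = {{{value}}},\n" for name, value in block.fields.items())
    return f"@{block.entry_type}{{{block.key},\n{body}}}"


def write(blocks, preserve_raw: bool) -> str:
    """Reuse raw text for unedited blocks when asked; otherwise serialize."""
    out = []
    for block in blocks:
        if preserve_raw and block.raw is not None and not block.dirty:
            out.append(block.raw)
        else:
            out.append(serialize(block))
    return "\n".join(out)


untouched = Block("article", "a", {"title": "A"}, raw='@article{a, title = "A"}')
edited = Block("book", "b", {"title": "B", "note": "accepted"},
               raw='@book{b, title = "B"}', dirty=True)

print(write([untouched, edited], preserve_raw=True))
```

With preserve_raw=True the first block comes out byte-for-byte as it was read, while the edited block is rewritten; with preserve_raw=False both blocks go through serialize.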

Plain Records

Some application code wants ordinary dictionaries for filtering, indexing, or bulk transforms. citerra provides explicit helpers for that shape without changing the document model:

import citerra

document = citerra.parse_path("references.bib")
records = citerra.document_to_dicts(document)

selected = [record for record in records if record.get("year") == "2026"]
text = citerra.write_entries(
    selected,
    field_order=["author", "title", "journal", "year", "doi"],
    sort_by=["ID"],
    trailing_comma=True,
)

Plain records use ENTRYTYPE and ID keys for the entry type and citation key.
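Once entries are ordinary dictionaries, indexing and grouping need nothing beyond the standard library. The records below are hand-written in the shape described above rather than produced by citerra, and they assume field values are strings, as the year comparison in the earlier example suggests.

```python
from collections import defaultdict

# Hand-written records using the ENTRYTYPE and ID keys.
records = [
    {"ENTRYTYPE": "article", "ID": "doe2026", "author": "Jane Doe", "year": "2026"},
    {"ENTRYTYPE": "book", "ID": "roe2024", "author": "Richard Roe", "year": "2024"},
    {"ENTRYTYPE": "article", "ID": "poe2026", "author": "Pat Poe", "year": "2026"},
]

# Index by citation key for direct lookup.
by_key = {record["ID"]: record for record in records}

# Group citation keys by year; entries without a year land under "unknown".
by_year = defaultdict(list)
for record in records:
    by_year[record.get("year", "unknown")].append(record["ID"])

# Select the keys of all articles, sorted for stable output.
article_keys = sorted(r["ID"] for r in records if r["ENTRYTYPE"] == "article")

print(by_key["roe2024"]["author"])  # prints: Richard Roe
print(by_year["2026"])              # prints: ['doe2026', 'poe2026']
print(article_keys)                 # prints: ['doe2026', 'poe2026']
```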

Helpers

import citerra

assert citerra.normalize_doi("https://doi.org/10.1000/XYZ.") == "10.1000/xyz"
assert citerra.latex_to_unicode("Jos\\'e") == "José"

names = citerra.parse_names("Jane Doe and {Research Group}")
assert names[1].literal == "Research Group"

date = citerra.parse_date("2026-05-13")
assert (date.year, date.month, date.day) == (2026, 5, 13)
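To show what the DOI assertion above implies, the following is a minimal standard-library sketch of that kind of normalization: strip a resolver or doi: prefix, drop trailing punctuation, and lowercase the result. The normalize_doi_sketch function is a hypothetical stand-in, not citerra's implementation, and the real helper may handle more cases.

```python
import re

# Resolver URL or "doi:" prefix, matched case-insensitively.
_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi_sketch(text: str) -> str:
    """Reduce a DOI reference to a bare, lowercase identifier."""
    doi = _PREFIX.sub("", text.strip())
    # Trailing sentence punctuation is a common copy-and-paste artifact.
    return doi.rstrip(".,;").lower()


print(normalize_doi_sketch("https://doi.org/10.1000/XYZ."))  # prints: 10.1000/xyz
print(normalize_doi_sketch("doi:10.1000/Abc"))               # prints: 10.1000/abc
```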

Reproducing Benchmarks

The comparison script uses whichever optional packages are installed in the active environment:

python python/benchmarks/compare_parsers.py tests/fixtures/tugboat.bib
python python/benchmarks/compare_parsers.py tests/fixtures/tugboat.bib --write

Implementation

citerra is implemented as a native extension. Wheels include the parser engine, so ordinary Python installs do not require a Rust toolchain.

Rust Crate

The Rust crate is published as bibtex-parser on crates.io:

[dependencies]
bibtex-parser = "0.2"

See RUST.md for Rust usage.

Local Build

Use the project manifest for local development:

guix shell -m manifest.scm -- maturin build --release --out target/wheels

For local tests without installing into the user environment, unpack the built wheel into a temporary import directory and run pytest with that directory on PYTHONPATH:

rm -rf target/python-test
python3 - <<'PY'
from pathlib import Path
from zipfile import ZipFile

wheel = sorted(Path("target/wheels").glob("citerra-*.whl"))[-1]
target = Path("target/python-test")
target.mkdir(parents=True, exist_ok=True)
with ZipFile(wheel) as archive:
    archive.extractall(target)
PY
guix shell -m manifest.scm -- env PYTHONPATH=target/python-test python3 -m pytest tests/python

License

Licensed under either of Apache-2.0 or MIT, at your option.
