citerra

BibTeX parser for Python.

citerra parses, validates, edits, and writes BibTeX documents. It supports strict parsing by default, opt-in tolerant recovery, diagnostics with source locations, raw-text retention, source-preserving writes, name/date/identifier helpers, and plain-record projection for application code.

The package is distributed as ABI3 wheels for Python 3.8 and newer.
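
As a rough illustration of what the ABI3 tag means, the sketch below splits a wheel file name into its tags and checks whether a given CPython version meets the minimum. The helper names are hypothetical and not part of citerra, and the parsing assumes a five-part file name with no build tag.

```python
import sys


def parse_wheel_tags(filename):
    """Split a wheel file name into its dash-separated components.

    Assumes the common five-part form without an optional build tag.
    """
    stem = filename[: -len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def abi3_supports(filename, python_version):
    """Return True if an abi3 wheel's minimum CPython version is satisfied."""
    tags = parse_wheel_tags(filename)
    if tags["abi"] != "abi3" or not tags["python"].startswith("cp3"):
        return False
    # "cp38" means CPython 3.8 is the oldest supported interpreter.
    minimum = (3, int(tags["python"][3:]))
    return tuple(python_version[:2]) >= minimum


wheel = "citerra-0.3.1-cp38-abi3-win_amd64.whl"
print(parse_wheel_tags(wheel)["python"], abi3_supports(wheel, sys.version_info))
```

This checks only the Python and ABI tags; a real installer also matches the platform tag.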

Performance Snapshot

Measured on 2026-05-14 with Python 3.11.14 on an AMD Ryzen 5 5600G (6 cores / 12 threads), using tests/fixtures/tugboat.bib: 2,701,551 bytes, 73,993 lines, and 3,644 entries. Throughput is the input size divided by the median time.
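
The throughput and entries-per-second figures can be approximated from the fixture size and a median time. The sketch below assumes 1 MiB = 1,048,576 bytes and that throughput is input bytes divided by median seconds; the helper names are illustrative and not part of citerra. The published times are rounded to milliseconds, so the results only approximate the reported values.

```python
INPUT_BYTES = 2_701_551  # size of tests/fixtures/tugboat.bib
ENTRIES = 3_644          # entries in the fixture
MIB = 1024 * 1024        # assumption: binary mebibytes


def throughput_mib_s(input_bytes, seconds):
    """Input size in MiB divided by the elapsed time in seconds."""
    return input_bytes / MIB / seconds


def entries_per_second(entries, seconds):
    """Number of parsed entries divided by the elapsed time in seconds."""
    return entries / seconds


# 0.008 s is the rounded median for the structured parse.
print(f"{throughput_mib_s(INPUT_BYTES, 0.008):.1f} MiB/s")
print(f"{entries_per_second(ENTRIES, 0.008) / 1000:.1f}k entries/s")
```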

For a BibTeX parser, the relevant speed measurements are:

| Workload | Why it matters | citerra result |
| --- | --- | --- |
| Structured parse | Load a bibliography into entries and fields for application logic | 0.008 s, 322.2 MiB/s |
| Source-preserving parse | Keep raw text, source locations, diagnostics, and source-order blocks for tools | 0.011 s, 232.8 MiB/s |
| Raw-preserving write | Write retained source text after low-churn edits | 0.002 s, 1149.8 MiB/s |
| Normalized write | Serialize structured data with configured formatting | 0.010 s, 259.7 MiB/s |

The comparison used citerra 0.3.0, bibtexparser 1.4.4, bibtexparser 2.0.0b9, and pybtex 0.26.1. citerra structured parse disables source capture and raw preservation for the closest parser-output comparison. Relative time is normalized to the first row in each table.

| Python parser / mode | Version | Output retained | Median parse time | Throughput | Approx. entries/s | Relative time |
| --- | --- | --- | --- | --- | --- | --- |
| citerra structured parse | 0.3.0 | Entries, fields, strings, comments, preambles | 0.008 s | 322.2 MiB/s | 455.7k | 1.0x |
| citerra source-preserving parse | 0.3.0 | Structured data, raw text, locations, diagnostics | 0.011 s | 232.8 MiB/s | 329.3k | 1.4x |
| bibtexparser parse | 2.0.0b9 | Entries/library model | 0.367 s | 7.0 MiB/s | 9.9k | 45.9x |
| pybtex parse | 0.26.1 | Bibliography data | 0.863 s | 3.0 MiB/s | 4.2k | 107.9x |
| bibtexparser parse | 1.4.4 | Entries/database model | 10.758 s | 0.2 MiB/s | 0.34k | 1345.2x |

| Python writer / mode | Version | Median write time | Throughput | Relative time |
| --- | --- | --- | --- | --- |
| citerra raw-preserving write | 0.3.0 | 0.002 s | 1149.8 MiB/s | 1.0x |
| citerra normalized write | 0.3.0 | 0.010 s | 259.7 MiB/s | 4.4x |
| bibtexparser write | 1.4.4 | 0.106 s | 24.3 MiB/s | 47.4x |
| bibtexparser write | 2.0.0b9 | 0.497 s | 5.2 MiB/s | 222.0x |
| pybtex write | 0.26.1 | 3.790 s | 0.7 MiB/s | 1691.3x |

The workflow table below sums the median parse and write measurements for each parser/version. It is a round-trip estimate for parse-edit-write workloads, not a separate end-to-end benchmark.

| Workflow | Median time | Throughput | Relative time |
| --- | --- | --- | --- |
| citerra source-preserving parse + raw-preserving write | 0.013 s | 193.6 MiB/s | 1.0x |
| citerra structured parse + normalized write | 0.018 s | 143.8 MiB/s | 1.3x |
| bibtexparser 2.0.0b9 parse + write | 0.864 s | 3.0 MiB/s | 65.0x |
| pybtex 0.26.1 parse + write | 4.653 s | 0.6 MiB/s | 349.7x |
| bibtexparser 1.4.4 parse + write | 10.864 s | 0.2 MiB/s | 816.5x |
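
The round-trip estimates are plain sums of the published parse and write medians, which the following sketch reproduces. The dictionary and function names are illustrative only. Ratios computed from these rounded medians differ slightly from the published relative times, presumably because those were derived from unrounded timings.

```python
# (median parse seconds, median write seconds), copied from the tables above.
MEDIANS = {
    "citerra source-preserving parse + raw-preserving write": (0.011, 0.002),
    "citerra structured parse + normalized write": (0.008, 0.010),
    "bibtexparser 2.0.0b9 parse + write": (0.367, 0.497),
    "pybtex 0.26.1 parse + write": (0.863, 3.790),
    "bibtexparser 1.4.4 parse + write": (10.758, 0.106),
}


def round_trip_seconds(parse_s, write_s):
    """Estimate a parse-edit-write cycle as parse time plus write time."""
    return round(parse_s + write_s, 3)


totals = {name: round_trip_seconds(*pair) for name, pair in MEDIANS.items()}
baseline = totals["citerra source-preserving parse + raw-preserving write"]

for name, total in totals.items():
    # The ratio is approximate because the inputs are rounded to milliseconds.
    print(f"{name}: {total:.3f} s, about {total / baseline:.1f}x")
```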

Reproduction commands are listed in Reproducing Benchmarks.

Install

pip install citerra

The distribution name and import name are both citerra:

import citerra

Parse

import citerra

document = citerra.parse(
    '@article{paper, author = "Jane Doe", title = "Example Paper", year = 2026}',
    expand_values=True,
)

entry = document.entry("paper")
assert entry is not None
assert entry.entry_type == "article"
assert entry.get("title") == "Example Paper"
assert entry.date_parts().year == 2026

File helpers are available:

from pathlib import Path
import citerra

document = citerra.parse_path("references.bib", tolerant=True)
Path("normalized.bib").write_text(citerra.dumps(document), encoding="utf-8")

File-like helpers are also available:

import citerra

with open("references.bib", encoding="utf-8") as handle:
    document = citerra.load(handle, tolerant=True)

text = citerra.dumps(document)

Document Model

  • Document contains entries, comments, preambles, string definitions, source-order blocks, diagnostics, and validation helpers.
  • Entry exposes the citation key, entry type, fields, source text, semantic helpers, and field mutation methods.
  • Field exposes the original field name, parsed value, optional raw source text, and optional source location.
  • Value represents string literals, numbers, variables, and concatenations.
  • Diagnostic reports parse or validation problems with stable codes and source locations when available.

Tolerant Parsing And Diagnostics

import citerra

text = '''
@article{ok, title = "Good"}
@article{bad, title = "Missing close"
@book{recovered, title = "Recovered"}
'''

document = citerra.parse(
    text,
    tolerant=True,
    capture_source=True,
    preserve_raw=True,
    source="refs/main.bib",
)

if document.status != "ok":
    for diagnostic in document.diagnostics:
        span = diagnostic.source
        if span is None:
            print(diagnostic.code, diagnostic.message)
        else:
            print(diagnostic.code, span.line, span.column, diagnostic.message)

Raw Text And Source-Preserving Writes

import citerra

text = '@article{paper, title = "Example Paper"}'

document = citerra.parse(
    text,
    tolerant=True,
    capture_source=True,
    preserve_raw=True,
)

entry = document.entry("paper")
if entry is not None:
    print(entry.raw)
    print(entry.field("title").raw_value)

Use WriterConfig(preserve_raw=True) for low-churn output that reuses retained source text where possible. Use WriterConfig(preserve_raw=False) for normalized structured output.

document.rename_key("paper", "paper-v2")
document.set_field("paper-v2", "note", "accepted")
document.remove_export_fields(["abstract", "keywords"])

config = citerra.WriterConfig(
    preserve_raw=True,
    trailing_comma=True,
)
output = document.write(config)

Plain Records

Some application code wants ordinary dictionaries for filtering, indexing, or bulk transforms. citerra provides explicit helpers for that shape without changing the document model:

import citerra

document = citerra.parse_path("references.bib")
records = citerra.document_to_dicts(document)

selected = [record for record in records if record.get("year") == "2026"]
text = citerra.write_entries(
    selected,
    field_order=["author", "title", "journal", "year", "doi"],
    sort_by=["ID"],
    trailing_comma=True,
)

Plain records use ENTRYTYPE and ID keys for the entry type and citation key.
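
To make the record shape concrete, here is a minimal sketch that uses hand-written dictionaries in place of the output of citerra.document_to_dicts. The sample entries are invented, and the sketch assumes field values are plain strings, as the year comparison above suggests.

```python
# Hypothetical records in the plain-record shape: ENTRYTYPE and ID are
# reserved keys, and every other key is a BibTeX field name.
records = [
    {"ENTRYTYPE": "book", "ID": "smith2024", "author": "Alex Smith",
     "title": "A Book", "year": "2024"},
    {"ENTRYTYPE": "article", "ID": "doe2026", "author": "Jane Doe",
     "title": "Example Paper", "year": "2026"},
    {"ENTRYTYPE": "article", "ID": "adams2026", "author": "Sam Adams",
     "title": "Another Paper", "year": "2026"},
]

# Index by citation key for direct lookup.
by_key = {record["ID"]: record for record in records}

# Filter on a field, then sort by citation key.
selected = sorted(
    (record for record in records if record.get("year") == "2026"),
    key=lambda record: record["ID"],
)

# Count entries per entry type.
counts = {}
for record in records:
    counts[record["ENTRYTYPE"]] = counts.get(record["ENTRYTYPE"], 0) + 1

print([record["ID"] for record in selected], counts)
```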

Helpers

import citerra

assert citerra.normalize_doi("https://doi.org/10.1000/XYZ.") == "10.1000/xyz"
assert citerra.latex_to_unicode("Jos\\'e") == "José"

names = citerra.parse_names("Jane Doe and {Research Group}")
assert names[1].literal == "Research Group"

date = citerra.parse_date("2026-05-13")
assert (date.year, date.month, date.day) == (2026, 5, 13)

Reproducing Benchmarks

The comparison script uses whichever optional packages are installed in the active environment. bibtexparser 1.x and 2.x use the same package name, so their rows are measured in separate environments.

python python/benchmarks/compare_parsers.py tests/fixtures/tugboat.bib --iterations 15 --warmups 3
python python/benchmarks/compare_parsers.py tests/fixtures/tugboat.bib --write --iterations 15 --warmups 3
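
Since only one bibtexparser version can be installed per environment, it can help to record which optional packages are present before a run. The sketch below uses the standard library's importlib.metadata; the function name and the package list are illustrative and not part of the benchmark script.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


# Distribution names taken from the comparison tables above.
for name, version in installed_versions(["citerra", "bibtexparser", "pybtex"]).items():
    print(f"{name}: {version or 'not installed'}")
```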

Implementation

citerra is implemented as a native extension. Wheels include the parser engine, so ordinary Python installs do not require a Rust toolchain.

Rust Crate

The Rust crate is published as bibtex-parser on crates.io:

[dependencies]
bibtex-parser = "0.2"

See RUST.md for Rust usage.

Local Build

Use the project manifest for local development:

guix shell -m manifest.scm -- maturin build --release --out target/wheels

For local tests without installing into the user environment, unpack the built wheel into a temporary import directory and run pytest with that directory on PYTHONPATH:

rm -rf target/python-test
python3 - <<'PY'
from pathlib import Path
from zipfile import ZipFile

wheel = sorted(Path("target/wheels").glob("citerra-*.whl"))[-1]
target = Path("target/python-test")
target.mkdir(parents=True, exist_ok=True)
with ZipFile(wheel) as archive:
    archive.extractall(target)
PY
guix shell -m manifest.scm -- env PYTHONPATH=target/python-test python3 -m pytest tests/python

License

Licensed under either of Apache-2.0 or MIT, at your option.
