BibTeX parser for Python

citerra

citerra parses, validates, edits, and writes BibTeX documents. It supports strict parsing by default, opt-in tolerant recovery, diagnostics with source locations, raw-text retention, source-preserving writes, name/date/identifier helpers, and plain-record projection for application code.

The package is distributed as ABI3 wheels for Python 3.8 and newer.

Performance Snapshot

Measured on 2026-05-14 with Python 3.11.14 against tests/fixtures/tugboat.bib (2,701,551 bytes, 73,993 lines, 3,644 entries). Hardware was an AMD Ryzen 5 5600G with 6 cores / 12 threads. Throughput is normalized by input size.
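
As a rough check on how the figures in this section relate to each other, throughput can be read as the fixture size divided by the median time, and entries per second as the entry count divided by the same time. The sketch below recomputes both from the rounded 0.008 s structured-parse time. The helper names are illustrative and not part of citerra, and because the published times are rounded to milliseconds the results only approximate the table values.

```python
FIXTURE_BYTES = 2_701_551   # size of tests/fixtures/tugboat.bib
FIXTURE_ENTRIES = 3_644     # entries in the fixture
MIB = 1024 * 1024


def throughput_mib_s(size_bytes: int, seconds: float) -> float:
    """Input size in MiB divided by elapsed seconds."""
    return size_bytes / MIB / seconds


def entries_per_second(entries: int, seconds: float) -> float:
    """Entry count divided by elapsed seconds."""
    return entries / seconds


# Rounded median from the table, so both results are approximate.
median_parse_s = 0.008
print(f"{throughput_mib_s(FIXTURE_BYTES, median_parse_s):.1f} MiB/s")
print(f"{entries_per_second(FIXTURE_ENTRIES, median_parse_s) / 1000:.1f}k entries/s")
```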

For a BibTeX parser, the relevant speed measurements are:

| Workload | Why it matters | citerra result |
| --- | --- | --- |
| Structured parse | Load a bibliography into entries and fields for application logic | 0.008 s, 322.2 MiB/s |
| Source-preserving parse | Keep raw text, source locations, diagnostics, and source-order blocks for tools | 0.011 s, 232.8 MiB/s |
| Raw-preserving write | Write retained source text after low-churn edits | 0.002 s, 1149.8 MiB/s |
| Normalized write | Serialize structured data with configured formatting | 0.010 s, 259.7 MiB/s |

The comparison used citerra 0.3.0, bibtexparser 1.4.4, bibtexparser 2.0.0b9, and pybtex 0.26.1. citerra structured parse disables source capture and raw preservation for the closest parser-output comparison. Relative time is normalized to the first row in each table.

| Python parser / mode | Version | Output retained | Median parse time | Throughput | Approx. entries/s | Relative time |
| --- | --- | --- | --- | --- | --- | --- |
| citerra structured parse | 0.3.0 | Entries, fields, strings, comments, preambles | 0.008 s | 322.2 MiB/s | 455.7k | 1.0x |
| citerra source-preserving parse | 0.3.0 | Structured data, raw text, locations, diagnostics | 0.011 s | 232.8 MiB/s | 329.3k | 1.4x |
| bibtexparser parse | 2.0.0b9 | Entries/library model | 0.367 s | 7.0 MiB/s | 9.9k | 45.9x |
| pybtex parse | 0.26.1 | Bibliography data | 0.863 s | 3.0 MiB/s | 4.2k | 107.9x |
| bibtexparser parse | 1.4.4 | Entries/database model | 10.758 s | 0.2 MiB/s | 0.34k | 1345.2x |

| Python writer / mode | Version | Median write time | Throughput | Relative time |
| --- | --- | --- | --- | --- |
| citerra raw-preserving write | 0.3.0 | 0.002 s | 1149.8 MiB/s | 1.0x |
| citerra normalized write | 0.3.0 | 0.010 s | 259.7 MiB/s | 4.4x |
| bibtexparser write | 1.4.4 | 0.106 s | 24.3 MiB/s | 47.4x |
| bibtexparser write | 2.0.0b9 | 0.497 s | 5.2 MiB/s | 222.0x |
| pybtex write | 0.26.1 | 3.790 s | 0.7 MiB/s | 1691.3x |

The workflow table below sums the median parse and write measurements for each parser/version. It is a round-trip estimate for parse-edit-write workloads, not a separate end-to-end benchmark.

| Workflow | Median time | Throughput | Relative time |
| --- | --- | --- | --- |
| citerra source-preserving parse + raw-preserving write | 0.013 s | 193.6 MiB/s | 1.0x |
| citerra structured parse + normalized write | 0.018 s | 143.8 MiB/s | 1.3x |
| bibtexparser 2.0.0b9 parse + write | 0.864 s | 3.0 MiB/s | 65.0x |
| pybtex 0.26.1 parse + write | 4.653 s | 0.6 MiB/s | 349.7x |
| bibtexparser 1.4.4 parse + write | 10.864 s | 0.2 MiB/s | 816.5x |
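
The round-trip times are plain sums of the median parse and write times reported earlier, which the following sketch reproduces. The dictionary labels are illustrative. The relative-time column is not recomputed here, since it appears to come from unrounded timings that the rounded table values cannot reproduce exactly.

```python
# Median times in seconds, copied from the parser and writer tables.
parse_s = {
    "citerra source-preserving": 0.011,
    "citerra structured": 0.008,
    "bibtexparser 2.0.0b9": 0.367,
    "pybtex 0.26.1": 0.863,
    "bibtexparser 1.4.4": 10.758,
}
write_s = {
    "citerra source-preserving": 0.002,  # paired with the raw-preserving write
    "citerra structured": 0.010,         # paired with the normalized write
    "bibtexparser 2.0.0b9": 0.497,
    "pybtex 0.26.1": 3.790,
    "bibtexparser 1.4.4": 0.106,
}

# Round to milliseconds to match the precision of the published tables.
round_trip_s = {name: round(parse_s[name] + write_s[name], 3) for name in parse_s}

for name, seconds in round_trip_s.items():
    print(f"{name}: {seconds:.3f} s")
```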

Reproduction commands are listed in Reproducing Benchmarks.

Install

pip install citerra

The distribution name and import name are both citerra:

import citerra

Parse

import citerra

document = citerra.parse(
    '@article{paper, author = "Jane Doe", title = "Example Paper", year = 2026}',
    expand_values=True,
)

entry = document.entry("paper")
assert entry is not None
assert entry.entry_type == "article"
assert entry.get("title") == "Example Paper"
assert entry.date_parts().year == 2026

File helpers are available:

from pathlib import Path
import citerra

document = citerra.parse_path("references.bib", tolerant=True)
Path("normalized.bib").write_text(citerra.dumps(document), encoding="utf-8")

File-like helpers are also available:

import citerra

with open("references.bib", encoding="utf-8") as handle:
    document = citerra.load(handle, tolerant=True)

text = citerra.dumps(document)

Document Model

  • Document contains entries, comments, preambles, string definitions, source-order blocks, diagnostics, and validation helpers.
  • Entry exposes the citation key, entry type, fields, source text, semantic helpers, and field mutation methods.
  • Field exposes the original field name, parsed value, optional raw source text, and optional source location.
  • Value represents string literals, numbers, variables, and concatenations.
  • Diagnostic reports parse or validation problems with stable codes and source locations when available.
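
To make the Value bullet concrete: in BibTeX, a field value can be a quoted or braced literal, a bare number, a reference to an @string variable, or several of those joined with #. The sketch below models that with hypothetical stand-in classes (Literal, Number, Variable, and Concat are illustrative names, not citerra's actual classes) and shows what expanding a value against a set of string definitions means.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Literal:
    text: str


@dataclass(frozen=True)
class Number:
    value: int


@dataclass(frozen=True)
class Variable:
    name: str


@dataclass(frozen=True)
class Concat:
    parts: Tuple["Value", ...]


Value = Union[Literal, Number, Variable, Concat]


def expand(value: Value, strings: Dict[str, Value]) -> str:
    """Resolve variables and concatenations into one plain string."""
    if isinstance(value, Literal):
        return value.text
    if isinstance(value, Number):
        return str(value.value)
    if isinstance(value, Variable):
        # Definitions are stored under lowercase names in this sketch;
        # an undefined name raises KeyError.
        return expand(strings[value.name.lower()], strings)
    return "".join(expand(part, strings) for part in value.parts)


# Models: @string{jan = "January"} and a field written as jan # " " # 13
strings = {"jan": Literal("January")}
month = Concat((Variable("jan"), Literal(" "), Number(13)))
print(expand(month, strings))  # January 13
```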

Tolerant Parsing And Diagnostics

import citerra

text = '''
@article{ok, title = "Good"}
@article{bad, title = "Missing close"
@book{recovered, title = "Recovered"}
'''

document = citerra.parse(
    text,
    tolerant=True,
    capture_source=True,
    preserve_raw=True,
    source="refs/main.bib",
)

if document.status != "ok":
    for diagnostic in document.diagnostics:
        span = diagnostic.source
        if span is None:
            print(diagnostic.code, diagnostic.message)
        else:
            print(diagnostic.code, span.line, span.column, diagnostic.message)

Raw Text And Source-Preserving Writes

import citerra

text = '@article{paper, title = "Example Paper"}'

document = citerra.parse(
    text,
    tolerant=True,
    capture_source=True,
    preserve_raw=True,
)

entry = document.entry("paper")
if entry is not None:
    print(entry.raw)
    print(entry.field("title").raw_value)

Use WriterConfig(preserve_raw=True) for low-churn output that reuses retained source text where possible. Use WriterConfig(preserve_raw=False) for normalized structured output.

document.rename_key("paper", "paper-v2")
document.set_field("paper-v2", "note", "accepted")
document.remove_export_fields(["abstract", "keywords"])

config = citerra.WriterConfig(
    preserve_raw=True,
    trailing_comma=True,
)
output = document.write(config)
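
The difference between the two writer modes can be pictured without citerra at all. In the hypothetical model below (Block, serialize, and write_blocks are illustrative names, not citerra's API), each block keeps the source text it was parsed from. A raw-preserving write reuses that text for untouched blocks and re-serializes only the edited ones, while a normalized write re-serializes everything.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Block:
    entry_type: str
    key: str
    fields: Dict[str, str]
    raw: Optional[str] = None  # source text retained at parse time, if any
    dirty: bool = False        # set when the block is edited after parsing

    def set_field(self, name: str, value: str) -> None:
        self.fields[name] = value
        self.dirty = True


def serialize(block: Block) -> str:
    """Emit a block in one fixed, normalized layout."""
    body = "".join(f"  {name} = {{{value}}},\n" for name, value in block.fields.items())
    return f"@{block.entry_type}{{{block.key},\n{body}}}"


def write_blocks(blocks: List[Block], preserve_raw: bool) -> str:
    parts = []
    for block in blocks:
        if preserve_raw and block.raw is not None and not block.dirty:
            parts.append(block.raw)         # reuse the original text verbatim
        else:
            parts.append(serialize(block))  # regenerate from structured data
    return "\n\n".join(parts) + "\n"


untouched = Block("article", "a", {"title": "A"}, raw='@article{a, title = "A"}')
edited = Block("book", "b", {"title": "B"}, raw='@book{b,title="B"}')
edited.set_field("note", "accepted")

print(write_blocks([untouched, edited], preserve_raw=True))
```

In this model only the edited book entry changes shape in the output, which is the low-churn property that matters when a bibliography lives under version control.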

Plain Records

Some application code wants ordinary dictionaries for filtering, indexing, or bulk transforms. citerra provides explicit helpers for that shape without changing the document model:

import citerra

document = citerra.parse_path("references.bib")
records = citerra.document_to_dicts(document)

selected = [record for record in records if record.get("year") == "2026"]
text = citerra.write_entries(
    selected,
    field_order=["author", "title", "journal", "year", "doi"],
    sort_by=["ID"],
    trailing_comma=True,
)

Plain records use ENTRYTYPE and ID keys for the entry type and citation key.
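
As an illustration of that shape, the records below are written by hand rather than produced by citerra. Apart from ENTRYTYPE and ID, the exact keys and value types are assumptions based on the example above, where year is compared as a string. Because the records are ordinary dictionaries, filtering and sorting need nothing beyond the standard library.

```python
# Hand-written records in the assumed plain-record shape.
records = [
    {"ENTRYTYPE": "article", "ID": "zeta2026", "title": "Example Paper", "year": "2026"},
    {"ENTRYTYPE": "book", "ID": "alpha2024", "title": "Older Book", "year": "2024"},
    {"ENTRYTYPE": "article", "ID": "beta2026", "title": "Another Paper", "year": "2026"},
]

# Keep the 2026 records and order them by citation key.
selected = sorted(
    (record for record in records if record.get("year") == "2026"),
    key=lambda record: record["ID"],
)
print([record["ID"] for record in selected])  # ['beta2026', 'zeta2026']
```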

Helpers

import citerra

assert citerra.normalize_doi("https://doi.org/10.1000/XYZ.") == "10.1000/xyz"
assert citerra.latex_to_unicode("Jos\\'e") == "José"

names = citerra.parse_names("Jane Doe and {Research Group}")
assert names[1].literal == "Research Group"

date = citerra.parse_date("2026-05-13")
assert (date.year, date.month, date.day) == (2026, 5, 13)
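
BibTeX separates the names in an author or editor field with the word and, but only where it appears outside braces, which is why {Research Group} above survives as a single literal name. The function below is a minimal stand-alone sketch of that splitting rule. split_names is a hypothetical helper, not how citerra.parse_names is implemented, and it only recognizes and when surrounded by single spaces.

```python
from typing import List

SEPARATOR = " and "


def split_names(field: str) -> List[str]:
    """Split a BibTeX name list on ' and ' that occurs outside braces."""
    names: List[str] = []
    depth = 0  # current brace nesting level
    start = 0  # index where the current name begins
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        elif depth == 0 and field[i:i + len(SEPARATOR)].lower() == SEPARATOR:
            # Separator at top level: close the current name and skip past it.
            names.append(field[start:i].strip())
            i += len(SEPARATOR)
            start = i
            continue
        i += 1
    names.append(field[start:].strip())
    return names


print(split_names("Jane Doe and {Research Group}"))
# ['Jane Doe', '{Research Group}']
print(split_names("{Barnes and Noble} and Donald E. Knuth"))
# ['{Barnes and Noble}', 'Donald E. Knuth']
```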

Reproducing Benchmarks

The comparison script uses whichever optional packages are installed in the active environment. bibtexparser 1.x and 2.x use the same package name, so their rows are measured in separate environments.

python python/benchmarks/compare_parsers.py tests/fixtures/tugboat.bib --iterations 15 --warmups 3
python python/benchmarks/compare_parsers.py tests/fixtures/tugboat.bib --write --iterations 15 --warmups 3

Implementation

citerra is implemented as a native extension. Wheels include the parser engine, so ordinary Python installs do not require a Rust toolchain.

Rust Crate

The Rust crate is published as bibtex-parser on crates.io:

[dependencies]
bibtex-parser = "0.2"

See RUST.md for Rust usage.

Local Build

Use the project manifest for local development:

guix shell -m manifest.scm -- maturin build --release --out target/wheels

For local tests without installing into the user environment, unpack the built wheel into a temporary import directory and run pytest with that directory on PYTHONPATH:

rm -rf target/python-test
python3 - <<'PY'
from pathlib import Path
from zipfile import ZipFile

wheel = sorted(Path("target/wheels").glob("citerra-*.whl"))[-1]
target = Path("target/python-test")
target.mkdir(parents=True, exist_ok=True)
with ZipFile(wheel) as archive:
    archive.extractall(target)
PY
guix shell -m manifest.scm -- env PYTHONPATH=target/python-test python3 -m pytest tests/python

License

Licensed under either of Apache-2.0 or MIT, at your option.
