citerra

BibTeX parser for Python.

citerra parses, validates, edits, and writes BibTeX documents. It supports strict parsing by default, opt-in tolerant recovery, diagnostics with source locations, raw-text retention, source-preserving writes, name/date/identifier helpers, and plain-record projection for application code.

The package is distributed as ABI3 wheels for Python 3.8 and newer.

Performance Snapshot

Measured on 2026-05-13 with Python 3.11.14 on an AMD Ryzen 5 5600G (6 cores / 12 threads), using tests/fixtures/tugboat.bib: 2,701,551 bytes, 73,993 lines, and 3,644 entries. Throughput is normalized to input size, that is, fixture size divided by the measured time.

For a BibTeX parser, the relevant speed measurements are:

Workload	Why it matters	`citerra` result
Structured parse	Load a bibliography into entries and fields for application logic	0.058 s, 44.3 MiB/s
Source-preserving parse	Keep raw text, source locations, diagnostics, and source-order blocks for tools	0.065 s, 39.9 MiB/s
Raw-preserving write	Write retained source text after low-churn edits	0.003 s, 953.2 MiB/s
Normalized write	Serialize structured data with configured formatting	0.014 s, 181.3 MiB/s

The comparison used citerra 0.2.3, bibtexparser 1.4.4, bibtexparser 2.0.0b9, and pybtex 0.26.1. For the closest parser-output comparison, the citerra structured parse disables source capture and raw preservation. Relative time is normalized to the first row of each table.

Python parser / mode	Version	Output retained	Median parse time	Throughput	Approx. entries/s	Relative time
`citerra` structured parse	0.2.3	Entries, fields, strings, comments, preambles	0.058 s	44.3 MiB/s	62.8k	1.0x
`citerra` source-preserving parse	0.2.3	Structured data, raw text, locations, diagnostics	0.065 s	39.9 MiB/s	56.1k	1.1x
`bibtexparser` parse	2.0.0b9	Entries/library model	0.372 s	6.9 MiB/s	9.8k	6.4x
`pybtex` parse	0.26.1	Bibliography data	0.859 s	3.0 MiB/s	4.2k	14.8x
`bibtexparser` parse	1.4.4	Entries/database model	10.483 s	0.2 MiB/s	0.35k	180.1x

Python writer / mode	Version	Median write time	Throughput	Relative time
`citerra` raw-preserving write	0.2.3	0.003 s	953.2 MiB/s	1.0x
`citerra` normalized write	0.2.3	0.014 s	181.3 MiB/s	5.3x
`bibtexparser` write	1.4.4	0.106 s	24.3 MiB/s	39.2x
`bibtexparser` write	2.0.0b9	0.493 s	5.2 MiB/s	182.2x
`pybtex` write	0.26.1	3.942 s	0.7 MiB/s	1458.5x

The workflow table below sums the median parse and write measurements from the same run. It is a round-trip estimate for parse-edit-write workloads, not a separate end-to-end benchmark.

Workflow	Median time	Throughput	Relative time
`citerra` source-preserving parse + raw-preserving write	0.068 s	37.9 MiB/s	1.0x
`citerra` structured parse + normalized write	0.072 s	35.8 MiB/s	1.1x
`bibtexparser` 2.0.0b9 parse + write	0.865 s	3.0 MiB/s	12.7x
`pybtex` 0.26.1 parse + write	4.801 s	0.5 MiB/s	70.6x
`bibtexparser` 1.4.4 parse + write	10.589 s	0.2 MiB/s	155.7x
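
The throughput and relative-time columns can be rederived from the fixture size and the median times. The sketch below uses only numbers from the tables above; the helper name throughput_mib_s is illustrative and is not part of the benchmark script.

```python
# Rederive two cells of the round-trip table from the published figures.
FIXTURE_BYTES = 2_701_551   # size of tests/fixtures/tugboat.bib
MIB = 1024 * 1024


def throughput_mib_s(seconds: float) -> float:
    """Input-size-normalized throughput in MiB/s."""
    return FIXTURE_BYTES / MIB / seconds


# Round trip = median parse time + median write time, in seconds.
citerra_round_trip = 0.065 + 0.003   # source-preserving parse + raw-preserving write
pybtex_round_trip = 0.859 + 3.942    # pybtex 0.26.1 parse + write

print(round(throughput_mib_s(citerra_round_trip), 1))     # 37.9
print(round(pybtex_round_trip / citerra_round_trip, 1))   # 70.6
```

The published times are rounded to three decimals, so ratios recomputed this way can differ from the tables in the last digit.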

Reproduction commands are listed in Reproducing Benchmarks.

Install

pip install citerra

The distribution name and import name are both citerra:

import citerra

Parse

import citerra

document = citerra.parse(
    '@article{paper, author = "Jane Doe", title = "Example Paper", year = 2026}',
    expand_values=True,
)

entry = document.entry("paper")
assert entry is not None
assert entry.entry_type == "article"
assert entry.get("title") == "Example Paper"
assert entry.date_parts().year == 2026

File helpers are available:

from pathlib import Path
import citerra

document = citerra.parse_path("references.bib", tolerant=True)
Path("normalized.bib").write_text(citerra.dumps(document), encoding="utf-8")

File-like helpers are also available:

import citerra

with open("references.bib", encoding="utf-8") as handle:
    document = citerra.load(handle, tolerant=True)

text = citerra.dumps(document)

Document Model

Document contains entries, comments, preambles, string definitions, source-order blocks, diagnostics, and validation helpers.
Entry exposes the citation key, entry type, fields, source text, semantic helpers, and field mutation methods.
Field exposes the original field name, parsed value, optional raw source text, and optional source location.
Value represents string literals, numbers, variables, and concatenations.
Diagnostic reports parse or validation problems with stable codes and source locations when available.
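
To make the Value description concrete: a BibTeX field value is a quoted or braced literal, a bare number, a reference to an @string variable, or several of these joined with #. The sketch below models that with plain tuples and shows what expanding a value means. The tuple encoding and the expand function are illustrative only and are not citerra's Value API.

```python
from typing import Dict, List, Tuple

# Each part is (kind, text), where kind is "literal", "number", or "variable".
# A concatenation such as  jan # " " # 2026  becomes a list of parts.
Part = Tuple[str, str]


def expand(parts: List[Part], strings: Dict[str, str]) -> str:
    """Resolve variables against @string definitions and join the parts."""
    out = []
    for kind, text in parts:
        if kind == "variable":
            # BibTeX string names are case-insensitive, so look up in lowercase.
            out.append(strings[text.lower()])
        else:
            out.append(text)
    return "".join(out)


strings = {"jan": "January"}   # corresponds to @string{jan = "January"}
value = [("variable", "jan"), ("literal", " "), ("number", "2026")]
print(expand(value, strings))  # January 2026
```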

Tolerant Parsing And Diagnostics

import citerra

text = '''
@article{ok, title = "Good"}
@article{bad, title = "Missing close"
@book{recovered, title = "Recovered"}
'''

document = citerra.parse(
    text,
    tolerant=True,
    capture_source=True,
    preserve_raw=True,
    source="refs/main.bib",
)

if document.status != "ok":
    for diagnostic in document.diagnostics:
        span = diagnostic.source
        if span is None:
            print(diagnostic.code, diagnostic.message)
        else:
            print(diagnostic.code, span.line, span.column, diagnostic.message)
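
The reporting loop above can be folded into a small formatter. The sketch below relies only on the attributes used in that example (code, message, source, line, column). The stand-in objects built with SimpleNamespace, including the code string "E-EXAMPLE", are hypothetical and exist only so the snippet runs without a parsed document.

```python
from types import SimpleNamespace


def format_diagnostic(diagnostic) -> str:
    """Render one diagnostic as 'code line:column: message'."""
    span = diagnostic.source
    if span is None:
        # No source location available: omit the position.
        return f"{diagnostic.code}: {diagnostic.message}"
    return f"{diagnostic.code} {span.line}:{span.column}: {diagnostic.message}"


# Hypothetical stand-ins; real diagnostics come from document.diagnostics.
located = SimpleNamespace(
    code="E-EXAMPLE",
    message="unclosed entry",
    source=SimpleNamespace(line=3, column=1),
)
unlocated = SimpleNamespace(code="E-EXAMPLE", message="unclosed entry", source=None)

print(format_diagnostic(located))    # E-EXAMPLE 3:1: unclosed entry
print(format_diagnostic(unlocated))  # E-EXAMPLE: unclosed entry
```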

Raw Text And Source-Preserving Writes

import citerra

text = '@article{paper, title = "Example Paper"}'

document = citerra.parse(
    text,
    tolerant=True,
    capture_source=True,
    preserve_raw=True,
)

entry = document.entry("paper")
if entry is not None:
    print(entry.raw)
    print(entry.field("title").raw_value)

Use WriterConfig(preserve_raw=True) for low-churn output that reuses retained source text where possible. Use WriterConfig(preserve_raw=False) for normalized structured output.

document.rename_key("paper", "paper-v2")
document.set_field("paper-v2", "note", "accepted")
document.remove_export_fields(["abstract", "keywords"])

config = citerra.WriterConfig(
    preserve_raw=True,
    trailing_comma=True,
)
output = document.write(config)
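
The difference between the two writer modes can be pictured with a toy model: every block keeps its original text, and only blocks marked as edited are re-serialized. This is a conceptual sketch with made-up names (Block, serialize, write_blocks); it is not citerra's writer and says nothing about its internal data structures.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Block:
    key: str
    entry_type: str
    fields: Dict[str, str]
    raw: str              # original source text for this entry
    edited: bool = False  # set when fields change after parsing


def serialize(block: Block) -> str:
    """Normalized form: one field per line, braces around values."""
    body = "".join(f"  {name} = {{{value}}},\n" for name, value in block.fields.items())
    return f"@{block.entry_type}{{{block.key},\n{body}}}"


def write_blocks(blocks: List[Block], preserve_raw: bool) -> str:
    # Reuse retained text for untouched blocks when preserve_raw is on;
    # otherwise every block is written in normalized form.
    return "\n".join(
        b.raw if preserve_raw and not b.edited else serialize(b) for b in blocks
    )


untouched = Block("a", "article", {"title": "A"}, raw='@article{a, title = "A"}')
changed = Block("b", "book", {"title": "B", "note": "accepted"},
                raw='@book{b, title = "B"}', edited=True)

print(write_blocks([untouched, changed], preserve_raw=True))
```

In this model the untouched entry comes back byte-for-byte, which is what keeps diffs small after a minor edit, while the edited entry is emitted in normalized form.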

Plain Records

Some application code wants ordinary dictionaries for filtering, indexing, or bulk transforms. citerra provides explicit helpers for that shape without changing the document model:

import citerra

document = citerra.parse_path("references.bib")
records = citerra.document_to_dicts(document)

selected = [record for record in records if record.get("year") == "2026"]
text = citerra.write_entries(
    selected,
    field_order=["author", "title", "journal", "year", "doi"],
    sort_by=["ID"],
    trailing_comma=True,
)

Plain records use ENTRYTYPE and ID keys for the entry type and citation key.
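
Because plain records are ordinary dictionaries, standard-library tools apply to them directly. The sketch below uses hand-written records in the shape described above (ENTRYTYPE and ID plus field names). The sample data is invented, and field values are assumed to be strings, as the year comparison in the example suggests.

```python
from collections import defaultdict

# Hand-written stand-ins for the output of citerra.document_to_dicts(document).
records = [
    {"ENTRYTYPE": "article", "ID": "doe2026", "author": "Jane Doe", "year": "2026"},
    {"ENTRYTYPE": "book", "ID": "roe2024", "author": "Richard Roe", "year": "2024"},
    {"ENTRYTYPE": "article", "ID": "poe2026", "author": "Pat Poe", "year": "2026"},
]

# Index citation keys by year; entries without a year land under "".
keys_by_year = defaultdict(list)
for record in records:
    keys_by_year[record.get("year", "")].append(record["ID"])

print(sorted(keys_by_year["2026"]))   # ['doe2026', 'poe2026']

# Count entries per entry type.
counts = defaultdict(int)
for record in records:
    counts[record["ENTRYTYPE"]] += 1

print(dict(counts))                   # {'article': 2, 'book': 1}
```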

Helpers

import citerra

assert citerra.normalize_doi("https://doi.org/10.1000/XYZ.") == "10.1000/xyz"
assert citerra.latex_to_unicode("Jos\\'e") == "José"

names = citerra.parse_names("Jane Doe and {Research Group}")
assert names[1].literal == "Research Group"

date = citerra.parse_date("2026-05-13")
assert (date.year, date.month, date.day) == (2026, 5, 13)
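
For readers curious what DOI normalization involves, the sketch below reproduces the behavior asserted above using only the standard library: drop a resolver or doi: prefix, trim trailing punctuation, and lowercase. It is an approximation written for illustration, under the assumption that those three steps are enough for the example input; citerra's actual rules may differ.

```python
import re

# Matches a leading https://doi.org/, http://dx.doi.org/, or doi: prefix.
_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi_sketch(text: str) -> str:
    """Approximate DOI normalization; not citerra's implementation."""
    doi = _PREFIX.sub("", text.strip())
    doi = doi.rstrip(".,;")   # trailing sentence punctuation
    return doi.lower()        # DOI names are case-insensitive


print(normalize_doi_sketch("https://doi.org/10.1000/XYZ."))   # 10.1000/xyz
print(normalize_doi_sketch("doi:10.1000/Abc"))                # 10.1000/abc
```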

Reproducing Benchmarks

The comparison script uses whichever optional packages are installed in the active environment:

python python/benchmarks/compare_parsers.py tests/fixtures/tugboat.bib
python python/benchmarks/compare_parsers.py tests/fixtures/tugboat.bib --write

Implementation

citerra is implemented as a native extension. Wheels include the parser engine, so ordinary Python installs do not require a Rust toolchain.

Rust Crate

The Rust crate is published as bibtex-parser on crates.io:

[dependencies]
bibtex-parser = "0.2"

See RUST.md for Rust usage.

Local Build

Use the project manifest for local development:

guix shell -m manifest.scm -- maturin build --release --out target/wheels

For local tests without installing into the user environment, unpack the built wheel into a temporary import directory and run pytest with that directory on PYTHONPATH:

rm -rf target/python-test
python3 - <<'PY'
from pathlib import Path
from zipfile import ZipFile

wheel = sorted(Path("target/wheels").glob("citerra-*.whl"))[-1]
target = Path("target/python-test")
target.mkdir(parents=True, exist_ok=True)
with ZipFile(wheel) as archive:
    archive.extractall(target)
PY
guix shell -m manifest.scm -- env PYTHONPATH=target/python-test python3 -m pytest tests/python

License

Licensed under either of Apache-2.0 or MIT, at your option.

Release history

0.3.1 (May 13, 2026)
0.3.0 (May 13, 2026)
0.2.3 (May 13, 2026; this version)
0.2.2 (May 13, 2026)
0.2.1 (May 13, 2026)
0.2.0 (May 13, 2026)

Download files

Download the file for your platform.

Source Distribution

citerra-0.2.3.tar.gz (334.8 kB), uploaded May 13, 2026, Source

Built Distributions


File	Size	Uploaded	Interpreter	Platform
citerra-0.2.3-cp38-abi3-win_amd64.whl	384.0 kB	May 13, 2026	CPython 3.8+	Windows x86-64
citerra-0.2.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl	456.3 kB	May 13, 2026	CPython 3.8+	manylinux: glibc 2.17+ x86-64
citerra-0.2.3-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl	438.6 kB	May 13, 2026	CPython 3.8+	manylinux: glibc 2.17+ ARM64
citerra-0.2.3-cp38-abi3-macosx_11_0_arm64.whl	420.0 kB	May 13, 2026	CPython 3.8+	macOS 11.0+ ARM64
citerra-0.2.3-cp38-abi3-macosx_10_12_x86_64.whl	443.5 kB	May 13, 2026	CPython 3.8+	macOS 10.12+ x86-64

File details

Details for the file citerra-0.2.3.tar.gz.

File metadata

Download URL: citerra-0.2.3.tar.gz
Upload date: May 13, 2026
Size: 334.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for citerra-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`4c0fd20f177604d1f1ced7da720c186f7d4dd6b9c112b86fa288627065d1d5f3`
MD5	`092d51b0eff14246650603c59ed176ab`
BLAKE2b-256	`591102f2afbcb7966fe195a1c529e8fbfa4bed595f2a8126160dc5741121a375`
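
A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch; the local path is an assumption about where the file was saved, and the function name sha256_of is illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest published for citerra-0.2.3.tar.gz.
EXPECTED = "4c0fd20f177604d1f1ced7da720c186f7d4dd6b9c112b86fa288627065d1d5f3"

sdist = Path("citerra-0.2.3.tar.gz")   # assumed download location
if sdist.exists():
    print("match" if sha256_of(sdist) == EXPECTED else "MISMATCH")
```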

File details

Details for the file citerra-0.2.3-cp38-abi3-win_amd64.whl.

File metadata

Download URL: citerra-0.2.3-cp38-abi3-win_amd64.whl
Upload date: May 13, 2026
Size: 384.0 kB
Tags: CPython 3.8+, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for citerra-0.2.3-cp38-abi3-win_amd64.whl
Algorithm	Hash digest
SHA256	`a6b25664dd38335abdfcbb8557bad5d4ee92eb500767b82816fe62a51394a49b`
MD5	`23fb2fb09be1c771bfe169ebe9582a98`
BLAKE2b-256	`f8f82687ff09135c9402c0eff4c9b64b866a4dfc246fe04a867a37fd81d965df`

File details

Details for the file citerra-0.2.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: citerra-0.2.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: May 13, 2026
Size: 456.3 kB
Tags: CPython 3.8+, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for citerra-0.2.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`4d41dfe52c18cd035872854a08d7bf5bbc913f4e469038fa5863fa60c84be153`
MD5	`240bc5646681237ef59bdd985327f993`
BLAKE2b-256	`d59d06f55604b8e16f80a75b598e5a220e834b17c077052b557b177e348fc083`

File details

Details for the file citerra-0.2.3-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

Download URL: citerra-0.2.3-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Upload date: May 13, 2026
Size: 438.6 kB
Tags: CPython 3.8+, manylinux: glibc 2.17+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for citerra-0.2.3-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm	Hash digest
SHA256	`1f8d383e38ed26f55906ce250928a8ad84d0e03f0a9bb4b86ce6743dc042ed71`
MD5	`d920d4343c35f29cd21dd3ad9d9d5fc3`
BLAKE2b-256	`f133710c95967c956a0a399ce92f796e94afe71fa08a3d090780f1b575a3076c`

File details

Details for the file citerra-0.2.3-cp38-abi3-macosx_11_0_arm64.whl.

File metadata

Download URL: citerra-0.2.3-cp38-abi3-macosx_11_0_arm64.whl
Upload date: May 13, 2026
Size: 420.0 kB
Tags: CPython 3.8+, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for citerra-0.2.3-cp38-abi3-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`900ac5e9dd55515657142852091299e5d5d350aebaf0f25291a2064481bd79ab`
MD5	`57f41ad7efbac1f890e08a0347ca646a`
BLAKE2b-256	`996689f00069f26a3153cd3cb94dccf00c51898dddd66b615325392e2f3b5c8d`

File details

Details for the file citerra-0.2.3-cp38-abi3-macosx_10_12_x86_64.whl.

File metadata

Download URL: citerra-0.2.3-cp38-abi3-macosx_10_12_x86_64.whl
Upload date: May 13, 2026
Size: 443.5 kB
Tags: CPython 3.8+, macOS 10.12+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for citerra-0.2.3-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm	Hash digest
SHA256	`7869564373c8a757ea85115e5c3997b5783d61e20531b44f5a311f7328217e57`
MD5	`704968c295c9d4493aed094a41bbf1e3`
BLAKE2b-256	`44bac34ef151e07022ba75dc66921f0880d128ad3ba7bc71d3c02ff001001df7`
