BibTeX parser for Python
citerra
citerra parses, validates, edits, and writes BibTeX documents. It supports
strict parsing by default, opt-in tolerant recovery, diagnostics with source
locations, raw-text retention, source-preserving writes, name/date/identifier
helpers, and plain-record projection for application code.
The package is distributed as ABI3 wheels for Python 3.8 and newer.
Performance Snapshot
Measurements use tests/fixtures/tugboat.bib: 2,701,551 bytes, 73,993 lines, and
3,644 entries. The hardware was an AMD Ryzen 5 5600G (6 cores / 12 threads), and
the runs were taken on 2026-05-14 with Python 3.11.14. Throughput is normalized
to input size.
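As a rough check on how the size-normalized figures are derived, the sketch below recomputes throughput and entries per second from the fixture size and a rounded median time. The function names are illustrative and not part of citerra. Because the published medians are rounded to milliseconds, the results only approximate the table values.

```python
# Fixture figures quoted for tests/fixtures/tugboat.bib.
FIXTURE_BYTES = 2_701_551
FIXTURE_ENTRIES = 3_644
MIB = 1024 * 1024


def throughput_mib_per_s(size_bytes: int, seconds: float) -> float:
    """Input size in MiB divided by elapsed seconds."""
    return size_bytes / MIB / seconds


def entries_per_s(entries: int, seconds: float) -> float:
    """Parsed entries divided by elapsed seconds."""
    return entries / seconds


# Rounded median for the structured parse (assumption: the unrounded
# median behind the published figure differs slightly).
structured_parse_s = 0.008
print(round(throughput_mib_per_s(FIXTURE_BYTES, structured_parse_s), 1))
print(round(entries_per_s(FIXTURE_ENTRIES, structured_parse_s)))
```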
For a BibTeX parser, the relevant speed measurements are:
| Workload | Why it matters | citerra result |
|---|---|---|
| Structured parse | Load a bibliography into entries and fields for application logic | 0.008 s, 322.2 MiB/s |
| Source-preserving parse | Keep raw text, source locations, diagnostics, and source-order blocks for tools | 0.011 s, 232.8 MiB/s |
| Raw-preserving write | Write retained source text after low-churn edits | 0.002 s, 1149.8 MiB/s |
| Normalized write | Serialize structured data with configured formatting | 0.010 s, 259.7 MiB/s |
The comparison used citerra 0.3.0, bibtexparser 1.4.4,
bibtexparser 2.0.0b9, and pybtex 0.26.1. The citerra structured parse row
disables source capture and raw preservation so that its output is the closest
match to what the other parsers produce. Relative time is normalized to the
first row in each table.
| Python parser / mode | Version | Output retained | Median parse time | Throughput | Approx. entries/s | Relative time |
|---|---|---|---|---|---|---|
| citerra structured parse | 0.3.0 | Entries, fields, strings, comments, preambles | 0.008 s | 322.2 MiB/s | 455.7k | 1.0x |
| citerra source-preserving parse | 0.3.0 | Structured data, raw text, locations, diagnostics | 0.011 s | 232.8 MiB/s | 329.3k | 1.4x |
| bibtexparser parse | 2.0.0b9 | Entries/library model | 0.367 s | 7.0 MiB/s | 9.9k | 45.9x |
| pybtex parse | 0.26.1 | Bibliography data | 0.863 s | 3.0 MiB/s | 4.2k | 107.9x |
| bibtexparser parse | 1.4.4 | Entries/database model | 10.758 s | 0.2 MiB/s | 0.34k | 1345.2x |
| Python writer / mode | Version | Median write time | Throughput | Relative time |
|---|---|---|---|---|
| citerra raw-preserving write | 0.3.0 | 0.002 s | 1149.8 MiB/s | 1.0x |
| citerra normalized write | 0.3.0 | 0.010 s | 259.7 MiB/s | 4.4x |
| bibtexparser write | 1.4.4 | 0.106 s | 24.3 MiB/s | 47.4x |
| bibtexparser write | 2.0.0b9 | 0.497 s | 5.2 MiB/s | 222.0x |
| pybtex write | 0.26.1 | 3.790 s | 0.7 MiB/s | 1691.3x |
The workflow table below sums the median parse and write measurements for each parser/version. It is a round-trip estimate for parse-edit-write workloads, not a separate end-to-end benchmark.
| Workflow | Median time | Throughput | Relative time |
|---|---|---|---|
| citerra source-preserving parse + raw-preserving write | 0.013 s | 193.6 MiB/s | 1.0x |
| citerra structured parse + normalized write | 0.018 s | 143.8 MiB/s | 1.3x |
| bibtexparser 2.0.0b9 parse + write | 0.864 s | 3.0 MiB/s | 65.0x |
| pybtex 0.26.1 parse + write | 4.653 s | 0.6 MiB/s | 349.7x |
| bibtexparser 1.4.4 parse + write | 10.864 s | 0.2 MiB/s | 816.5x |
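The round-trip figures are plain sums of the median parse and write times. The sketch below repeats that arithmetic with the rounded medians from the parse and write tables; the names are illustrative. Ratios computed from rounded values can differ from the published relative times, which presumably come from unrounded measurements.

```python
# Median (parse, write) times in seconds, copied from the published tables.
MEDIANS = {
    "citerra source-preserving + raw-preserving": (0.011, 0.002),
    "citerra structured + normalized": (0.008, 0.010),
    "bibtexparser 2.0.0b9": (0.367, 0.497),
    "pybtex 0.26.1": (0.863, 3.790),
    "bibtexparser 1.4.4": (10.758, 0.106),
}


def round_trip_seconds(parse_s: float, write_s: float) -> float:
    """Round-trip estimate: median parse time plus median write time."""
    return parse_s + write_s


totals = {name: round_trip_seconds(*pair) for name, pair in MEDIANS.items()}
baseline = totals["citerra source-preserving + raw-preserving"]

for name, total in totals.items():
    # Relative time here uses rounded inputs, so it is only approximate.
    print(f"{name}: {total:.3f} s, about {total / baseline:.1f}x")
```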
Reproduction commands are listed in Reproducing Benchmarks.
Install
pip install citerra
The distribution name and import name are both citerra:
import citerra
Parse
import citerra
document = citerra.parse(
    '@article{paper, author = "Jane Doe", title = "Example Paper", year = 2026}',
    expand_values=True,
)
entry = document.entry("paper")
assert entry is not None
assert entry.entry_type == "article"
assert entry.get("title") == "Example Paper"
assert entry.date_parts().year == 2026
File helpers are available:
from pathlib import Path
import citerra
document = citerra.parse_path("references.bib", tolerant=True)
Path("normalized.bib").write_text(citerra.dumps(document), encoding="utf-8")
File-like helpers are also available:
with open("references.bib", encoding="utf-8") as handle:
    document = citerra.load(handle, tolerant=True)
text = citerra.dumps(document)
Document Model
- Document contains entries, comments, preambles, string definitions, source-order blocks, diagnostics, and validation helpers.
- Entry exposes the citation key, entry type, fields, source text, semantic helpers, and field mutation methods.
- Field exposes the original field name, parsed value, optional raw source text, and optional source location.
- Value represents string literals, numbers, variables, and concatenations.
- Diagnostic reports parse or validation problems with stable codes and source locations when available.
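To make the four kinds of value concrete, here is a minimal stand-alone sketch of how literals, numbers, variables, and concatenations can be modeled and expanded against @string definitions. The classes and the expand function are hypothetical stand-ins written for this illustration; they are not citerra's types and do not describe its internals.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Literal:
    text: str  # quoted or braced string, e.g. "Example Paper"


@dataclass
class Number:
    digits: str  # bare number, e.g. 2026


@dataclass
class Variable:
    name: str  # reference to an @string definition, e.g. jan


@dataclass
class Concat:
    parts: List["Part"]  # pieces joined with the BibTeX # operator


Part = Union[Literal, Number, Variable]
Value = Union[Literal, Number, Variable, Concat]


def expand(value: Value, strings: Dict[str, str]) -> str:
    """Resolve a value to plain text using @string definitions."""
    if isinstance(value, Literal):
        return value.text
    if isinstance(value, Number):
        return value.digits
    if isinstance(value, Variable):
        # Assumes the definitions dict is keyed by lowercase names.
        return strings[value.name.lower()]
    return "".join(expand(part, strings) for part in value.parts)


# month = jan # " 2026" with @string{jan = "January"}
month = Concat([Variable("jan"), Literal(" 2026")])
print(expand(month, {"jan": "January"}))
```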
Tolerant Parsing And Diagnostics
text = '''
@article{ok, title = "Good"}
@article{bad, title = "Missing close"
@book{recovered, title = "Recovered"}
'''
document = citerra.parse(
    text,
    tolerant=True,
    capture_source=True,
    preserve_raw=True,
    source="refs/main.bib",
)
if document.status != "ok":
    for diagnostic in document.diagnostics:
        span = diagnostic.source
        if span is None:
            print(diagnostic.code, diagnostic.message)
        else:
            print(diagnostic.code, span.line, span.column, diagnostic.message)
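Tools that surface diagnostics often want an editor-friendly path:line:column prefix. The sketch below shows one way to format it, using small stand-in types that mirror only the attributes used in the loop above (code, message, source, line, column). The stand-in classes, the formatter, and the example code string are assumptions for illustration, not citerra's API.

```python
from typing import NamedTuple, Optional


class Span(NamedTuple):
    """Stand-in for a source location (not citerra's class)."""
    line: int
    column: int


class Diag(NamedTuple):
    """Stand-in for a diagnostic with the attributes used in the loop."""
    code: str
    message: str
    source: Optional[Span]


def format_diagnostic(path: str, diag: Diag) -> str:
    """Render path:line:column when a location exists, else only the path."""
    if diag.source is None:
        return f"{path}: {diag.code}: {diag.message}"
    where = f"{path}:{diag.source.line}:{diag.source.column}"
    return f"{where}: {diag.code}: {diag.message}"


# The code string below is made up for the example.
example = Diag("E-EXAMPLE", "missing closing brace", Span(3, 1))
print(format_diagnostic("refs/main.bib", example))
```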
Raw Text And Source-Preserving Writes
text = '@article{paper, title = "Example Paper"}'
document = citerra.parse(
    text,
    tolerant=True,
    capture_source=True,
    preserve_raw=True,
)
entry = document.entry("paper")
if entry is not None:
    print(entry.raw)
    print(entry.field("title").raw_value)
Use WriterConfig(preserve_raw=True) for low-churn output that reuses retained
source text where possible. Use WriterConfig(preserve_raw=False) for
normalized structured output.
document.rename_key("paper", "paper-v2")
document.set_field("paper-v2", "note", "accepted")
document.remove_export_fields(["abstract", "keywords"])
config = citerra.WriterConfig(
    preserve_raw=True,
    trailing_comma=True,
)
output = document.write(config)
Plain Records
Some application code wants ordinary dictionaries for filtering, indexing, or
bulk transforms. citerra provides explicit helpers for that shape without
changing the document model:
document = citerra.parse_path("references.bib")
records = citerra.document_to_dicts(document)
selected = [record for record in records if record.get("year") == "2026"]
text = citerra.write_entries(
    selected,
    field_order=["author", "title", "journal", "year", "doi"],
    sort_by=["ID"],
    trailing_comma=True,
)
Plain records use ENTRYTYPE and ID keys for the entry type and citation key.
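For a sense of the record shape without installing anything, the sketch below builds a few dictionaries by hand with ENTRYTYPE and ID keys and applies the same kind of filter and sort in plain Python. The sample data is invented, and storing year as a string is an assumption taken from the comparison in the example above.

```python
from collections import defaultdict

# Hand-written records in the ENTRYTYPE / ID shape (sample data only).
records = [
    {"ENTRYTYPE": "article", "ID": "zeta2026", "title": "Z", "year": "2026"},
    {"ENTRYTYPE": "book", "ID": "alpha2025", "title": "A", "year": "2025"},
    {"ENTRYTYPE": "inproceedings", "ID": "beta2026", "title": "B", "year": "2026"},
]

# Keep 2026 records and order them by citation key.
selected = sorted(
    (record for record in records if record.get("year") == "2026"),
    key=lambda record: record["ID"],
)

# Index citation keys by entry type for quick lookups.
keys_by_type = defaultdict(list)
for record in selected:
    keys_by_type[record["ENTRYTYPE"]].append(record["ID"])

print([record["ID"] for record in selected])
print(dict(keys_by_type))
```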
Helpers
assert citerra.normalize_doi("https://doi.org/10.1000/XYZ.") == "10.1000/xyz"
assert citerra.latex_to_unicode("Jos\\'e") == "José"
names = citerra.parse_names("Jane Doe and {Research Group}")
assert names[1].literal == "Research Group"
date = citerra.parse_date("2026-05-13")
assert (date.year, date.month, date.day) == (2026, 5, 13)
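The DOI assertion above implies three steps: drop the resolver prefix, drop trailing punctuation, and lowercase the result. A stdlib-only sketch of that behavior follows. The function normalize_doi_sketch is hypothetical and is not citerra's implementation; the set of prefixes and punctuation it handles is an assumption.

```python
import re

# Resolver or scheme prefixes to strip (assumed set, for illustration).
_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi_sketch(value: str) -> str:
    """Strip a resolver prefix and trailing punctuation, then lowercase."""
    text = value.strip()
    text = _PREFIX.sub("", text)
    text = text.rstrip(".,;")
    # DOI names are case-insensitive, so lowercase gives a stable key.
    return text.lower()


print(normalize_doi_sketch("https://doi.org/10.1000/XYZ."))
```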
Reproducing Benchmarks
The comparison script uses whichever optional packages are installed in the
active environment. bibtexparser 1.x and 2.x use the same package name, so
their rows are measured in separate environments.
python python/benchmarks/compare_parsers.py tests/fixtures/tugboat.bib --iterations 15 --warmups 3
python python/benchmarks/compare_parsers.py tests/fixtures/tugboat.bib --write --iterations 15 --warmups 3
Implementation
citerra is implemented as a native extension. Wheels include the parser
engine, so ordinary Python installs do not require a Rust toolchain.
Rust Crate
The Rust crate is published as bibtex-parser on crates.io:
[dependencies]
bibtex-parser = "0.2"
See RUST.md for Rust usage.
Local Build
Use the project manifest for local development:
guix shell -m manifest.scm -- maturin build --release --out target/wheels
For local tests without installing into the user environment, unpack the built
wheel into a temporary import directory and run pytest with that directory on
PYTHONPATH:
rm -rf target/python-test
python3 - <<'PY'
from pathlib import Path
from zipfile import ZipFile
wheel = sorted(Path("target/wheels").glob("citerra-*.whl"))[-1]
target = Path("target/python-test")
target.mkdir(parents=True, exist_ok=True)
with ZipFile(wheel) as archive:
    archive.extractall(target)
PY
guix shell -m manifest.scm -- env PYTHONPATH=target/python-test python3 -m pytest tests/python
License
Licensed under either of Apache-2.0 or MIT, at your option.