Fast streaming parser for Crystal Reports XML, with Rust acceleration
crxml
Fast streaming parser for Crystal Reports XML exports.
```python
from crxml import CrystalXMLSource, to_dataframe

df = to_dataframe(CrystalXMLSource("report.xml", row_tag="Details"))
print(df.head())
```
Installation
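Prerequisites: Python ≥3.10. A Rust toolchain is needed only when pip has to build from the source distribution (for example, when no prebuilt wheel matches your platform).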
```shell
pip install crxml
```
About
crxml streams through Crystal Reports XML files row by row and never loads
the full document into memory. It extracts field values from the nested CR field
elements and yields one flat dictionary per row. A built-in pipeline lets you rename,
cast, filter, and drop fields by chaining stages with the | operator. In the streaming
benchmark, the Rust backend processes a 100 MB file in ~0.42 seconds with ~75 MB RSS.
This library is conceptually based on carlosplanchon/xmlstreamer.
API
CrystalXMLSource
CrystalXMLSource(source, row_tag="Details")
Parses a CR XML file and yields one dict[str, str] per row. Accepts a file path
(str or Path) or a file-like object with a .name attribute. The
row_tag parameter selects which XML element is treated as a record
(default: Details).
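The accepted source types could be normalised along these lines, assuming a file-like object is accepted only because its `.name` lets the file be reopened by path (for example by a native backend). `resolve_source` is a hypothetical helper for illustration, not part of crxml.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Any


def resolve_source(source: Any) -> Path:
    """Hypothetical helper: turn a path or a named file object into a Path.

    Assumption: file-like objects are accepted only so their .name can be
    reopened by path.
    """
    if isinstance(source, (str, os.PathLike)):
        return Path(source)
    name = getattr(source, "name", None)
    # Files opened from a raw descriptor report an int .name, so check the type.
    if isinstance(name, (str, os.PathLike)):
        return Path(name)
    raise TypeError(
        f"expected a path or a file object with a .name, got {type(source).__name__}"
    )


print(resolve_source("report.xml"))  # report.xml
with open(os.devnull, "rb") as fh:
    print(resolve_source(fh) == Path(os.devnull))  # True
```

Under this assumption an in-memory buffer such as `io.BytesIO`, which has no `.name`, would be rejected with a `TypeError`.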
Pipeline stages
Stages are chained with |:
```python
from crxml import CrystalXMLSource
from crxml.stages import RenameFields, CastTypes, DropFields, FilterRows

pipeline = (
    CrystalXMLSource("report.xml")
    | RenameFields({"f1": "invoice", "f2": "amount"})
    | CastTypes({"amount": float})
    | DropFields("tax_rate")
    | FilterRows(lambda r: r["amount"] > 100)
)
```
- RenameFields(mapping): renames dict keys
- CastTypes(types, errors="raise"): casts fields to the target types
- DropFields(*fields): removes the named fields from each row
- FilterRows(predicate): keeps only rows for which predicate(row) is true
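The `|` chaining can be modelled with operator overloading. The toy below reuses the stage names for readability but is an independent pure-Python sketch, not crxml's code: `Pipeline`, `Stage`, and the `apply` method are hypothetical, and `CastTypes` omits the `errors=` option because its semantics are not described here. Each stage wraps the previous iterator in a generator, so rows still flow one at a time.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable, Iterator
from typing import Any

Row = dict[str, Any]


class Stage:
    """A stage turns an iterator of rows into another iterator of rows."""

    def apply(self, rows: Iterator[Row]) -> Iterator[Row]:
        raise NotImplementedError


class Pipeline:
    """Lazy chain of stages; nothing runs until the pipeline is iterated."""

    def __init__(self, source: Iterable[Row], stages: tuple[Stage, ...] = ()) -> None:
        self.source = source
        self.stages = stages

    def __or__(self, stage: Stage) -> Pipeline:
        # Return a new pipeline rather than mutating, so partial pipelines stay reusable.
        return Pipeline(self.source, self.stages + (stage,))

    def __iter__(self) -> Iterator[Row]:
        rows: Iterator[Row] = iter(self.source)
        for stage in self.stages:
            rows = stage.apply(rows)  # wrap the iterator, do not consume it
        return rows


class RenameFields(Stage):
    def __init__(self, mapping: dict[str, str]) -> None:
        self.mapping = mapping

    def apply(self, rows: Iterator[Row]) -> Iterator[Row]:
        return ({self.mapping.get(k, k): v for k, v in r.items()} for r in rows)


class CastTypes(Stage):
    def __init__(self, types: dict[str, Callable[[Any], Any]]) -> None:
        self.types = types

    def apply(self, rows: Iterator[Row]) -> Iterator[Row]:
        return ({k: self.types[k](v) if k in self.types else v for k, v in r.items()} for r in rows)


class DropFields(Stage):
    def __init__(self, *fields: str) -> None:
        self.fields = set(fields)

    def apply(self, rows: Iterator[Row]) -> Iterator[Row]:
        return ({k: v for k, v in r.items() if k not in self.fields} for r in rows)


class FilterRows(Stage):
    def __init__(self, predicate: Callable[[Row], bool]) -> None:
        self.predicate = predicate

    def apply(self, rows: Iterator[Row]) -> Iterator[Row]:
        return (r for r in rows if self.predicate(r))


raw = [
    {"f1": "INV-1", "f2": "50", "tax_rate": "0.2"},
    {"f1": "INV-2", "f2": "150", "tax_rate": "0.2"},
]
pipeline = (
    Pipeline(raw)
    | RenameFields({"f1": "invoice", "f2": "amount"})
    | CastTypes({"amount": float})
    | DropFields("tax_rate")
    | FilterRows(lambda r: r["amount"] > 100)
)
print(list(pipeline))  # [{'invoice': 'INV-2', 'amount': 150.0}]
```

Stage order matters in this model: the filter compares `amount` numerically only because `CastTypes` runs before it.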
Sinks
```python
from crxml import to_dataframe, to_csv, collect

df = to_dataframe(pipeline)   # → pd.DataFrame
to_csv(pipeline, "out.csv")   # → CSV file
rows = collect(pipeline)      # → list[dict]
```
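The difference between a streaming sink and a materialising one can be sketched as follows. `collect_rows` and `write_csv` are hypothetical stand-ins for `collect` and `to_csv`, not crxml's implementation; taking the CSV header from the first row's keys is this sketch's own choice.

```python
from __future__ import annotations

import csv
import os
import tempfile
from collections.abc import Iterable
from typing import Any

Row = dict[str, Any]


def collect_rows(rows: Iterable[Row]) -> list[Row]:
    # Materialises everything: memory grows with the number of rows.
    return list(rows)


def write_csv(rows: Iterable[Row], path: str) -> int:
    """Stream rows to a CSV file one at a time; returns the number of rows written."""
    it = iter(rows)
    first = next(it, None)
    with open(path, "w", newline="", encoding="utf-8") as fh:
        if first is None:
            return 0  # empty input leaves an empty file
        # Header comes from the first row; DictWriter raises ValueError on unexpected keys later.
        writer = csv.DictWriter(fh, fieldnames=list(first))
        writer.writeheader()
        writer.writerow(first)
        count = 1
        for row in it:
            writer.writerow(row)
            count += 1
    return count


sample = [{"invoice": "INV-1", "amount": 50.0}, {"invoice": "INV-2", "amount": 150.0}]
collected = collect_rows(iter(sample))
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "out.csv")
    written = write_csv(iter(sample), out_path)
    with open(out_path, newline="", encoding="utf-8") as fh:
        csv_text = fh.read()
print(written, repr(csv_text))
```

Only `collect_rows` holds every row at once, which is the kind of difference the "Stream" and "To list" RSS columns in the benchmarks reflect.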
Parallel mode
```python
df = to_dataframe(pipeline.parallel(workers=4))
```
Distributes batches across worker processes. See the docs for requirements.
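One common way to distribute batches across worker processes is `concurrent.futures`. This sketch is not crxml's API: `batched_rows`, `run_parallel`, and `double_n` are hypothetical names. It groups rows into fixed-size batches, maps a per-batch function over a pool, and yields results in input order; the demo passes `ThreadPoolExecutor` only so it runs anywhere.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable, Iterator
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from itertools import islice
from typing import Any

Row = dict[str, Any]


def batched_rows(rows: Iterable[Row], size: int) -> Iterator[list[Row]]:
    """Group a row stream into lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def run_parallel(
    rows: Iterable[Row],
    transform: Callable[[list[Row]], list[Row]],
    workers: int = 4,
    batch_size: int = 1_000,
    executor_factory: Callable[[int], Executor] = ProcessPoolExecutor,
) -> Iterator[Row]:
    """Apply `transform` to each batch in a worker pool and yield rows in input order.

    With the default ProcessPoolExecutor, `transform` must be a picklable,
    module-level function, and scripts should call this under
    `if __name__ == "__main__":` on platforms that spawn workers.
    """
    with executor_factory(workers) as pool:
        # Executor.map submits every batch up front, so memory is not bounded here.
        for out in pool.map(transform, batched_rows(rows, batch_size)):
            yield from out


def double_n(batch: list[Row]) -> list[Row]:
    # Example per-batch work; module-level so a process pool can pickle it.
    return [{**r, "n": r["n"] * 2} for r in batch]


# Threads keep this demo portable; the process default suits CPU-bound work.
demo = list(
    run_parallel(
        ({"n": i} for i in range(10)),
        double_n,
        workers=2,
        batch_size=3,
        executor_factory=ThreadPoolExecutor,
    )
)
print([r["n"] for r in demo])  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Batching amortises the cost of sending rows between processes; the trade-off in this sketch is that `Executor.map` queues all batches immediately, unlike the serial streaming path.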
Benchmarks
| Test | Size | Rows | Time | Rows/s | MB/s | RSS |
|---|---|---|---|---|---|---|
| Stream | 10 MB | 9,010 | 0.043s | 211 K | 234 | 22 MB |
| Stream | 50 MB | 45,328 | 0.223s | 203 K | 224 | 45 MB |
| Stream | 100 MB | 90,384 | 0.418s | 216 K | 239 | 75 MB |
| To list | 10 MB | 9,010 | 0.052s | 174 K | 192 | 32 MB |
| To list | 50 MB | 45,328 | 0.249s | 182 K | 201 | 98 MB |
| To list | 100 MB | 90,384 | 0.478s | 189 K | 209 | 181 MB |
| Pipeline | 10 MB | 9,010 | 0.060s | 150 K | 166 | 32 MB |
| Pipeline | 50 MB | 45,328 | 0.295s | 154 K | 169 | 96 MB |
| Pipeline | 100 MB | 90,384 | 0.579s | 156 K | 173 | 176 MB |
| DataFrame | 10 MB | 9,010 | 0.320s | 28 K | 31 | 86 MB |
| DataFrame | 50 MB | 45,328 | 0.538s | 84 K | 93 | 152 MB |
| DataFrame | 100 MB | 90,384 | 0.829s | 109 K | 121 | 234 MB |
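The derived columns follow from the measured ones: Rows/s = Rows / Time and MB/s = Size / Time (for the 100 MB stream run, 90,384 / 0.418 s ≈ 216 K rows/s and 100 MB / 0.418 s ≈ 239 MB/s). Below is a hedged sketch of a harness that produces such figures. `measure` is a hypothetical helper, MB is assumed to mean 10^6 bytes, and peak RSS comes from the Unix-only `resource` module, so it includes interpreter overhead and may not match how the table was produced.

```python
from __future__ import annotations

import os
import resource  # Unix-only
import sys
import time
from collections.abc import Callable, Iterable


def measure(path: str, make_rows: Callable[[str], Iterable[object]]) -> dict[str, float]:
    """Time one full pass over the rows of `path` and derive the table's columns."""
    size_mb = os.path.getsize(path) / 1_000_000  # assumption: MB = 10**6 bytes
    start = time.perf_counter()
    n_rows = sum(1 for _ in make_rows(path))  # consume the stream without keeping rows
    seconds = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in KiB on Linux.
    peak_mib = peak / 2**20 if sys.platform == "darwin" else peak / 2**10
    return {
        "rows": n_rows,
        "seconds": seconds,
        "rows_per_s": n_rows / seconds,
        "mb_per_s": size_mb / seconds,
        "peak_rss_mib": peak_mib,  # whole-process peak, interpreter included
    }


# Worked check against the "Stream, 100 MB" row: 90,384 rows in 0.418 s.
print(round(90_384 / 0.418), round(100 / 0.418))  # 216230 239
```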
pandas is imported lazily, so its memory cost appears only once to_dataframe is called.
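Deferring the import into the function body is the usual way to get this behaviour. A minimal sketch, where `to_frame` and `pandas_imported` are hypothetical names rather than crxml's code:

```python
from __future__ import annotations

import sys
from collections.abc import Iterable
from typing import Any


def to_frame(rows: Iterable[dict[str, Any]]) -> Any:
    """Build a pandas DataFrame; pandas is imported on the first call only."""
    import pandas as pd  # deferred import: never paid for if this sink is unused

    return pd.DataFrame.from_records(list(rows))


def pandas_imported() -> bool:
    """True once pandas has been loaded anywhere in this process."""
    return "pandas" in sys.modules


print(pandas_imported())  # reports whether anything has loaded pandas yet
```

Merely importing a module that defines `to_frame` leaves `sys.modules` without pandas; the cost is paid at the first call.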
Publishing
```shell
./upload.sh
```
Builds a manylinux2014 wheel and an sdist, then uploads both to PyPI. Requires maturin and twine. The --manylinux 2014 --zig flags ensure PyPI-compatible platform tags; python -m build does not support manylinux flags through PEP 517.
Documentation
Full documentation is available at the project site, covering installation, usage, stages, custom stages, architecture, performance, FastAPI integration, and the Rust core.
License
MIT
Download files
Source Distributions
Built Distribution
File details
Details for the file crxml-0.3.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: crxml-0.3.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 236.6 kB
- Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6189d8ae681f85d77fadec56a29ef9e31a200f34648e54123d56cc55d6085ad5 |
| MD5 | 0e47eb9e28f1c64fd70a8c0b972f740d |
| BLAKE2b-256 | cbfba65463b422ccacb3e3d20be04cd3f446d43228189fee7760cd041debba7e |