Fast XML flattening library with Python bindings

fast-xml-flattener

Flatten nested XML into CSV, JSON, Parquet, or Python dicts — in milliseconds, not seconds.

fast-xml-flattener is a Rust-powered Python library that converts XML documents into flat, analysis-ready representations. It uses a zero-copy streaming parser and builds output structures in a single pass, with no intermediate serde_json::Value or DOM allocation. In the benchmarks below, this makes it roughly 2× to 7× faster than flattening with lxml or xmltodict.


Why fast-xml-flattener?

XML → flat dict (median of 7 runs, CPython 3.13)

| Library | 0.5 MB | 5.4 MB | 27 MB |
| --- | --- | --- | --- |
| fast-xml-flattener | 11 ms | 225 ms | 1 089 ms |
| lxml + manual flatten | 27 ms | 407 ms | 2 108 ms |
| xmltodict + manual flatten | 63 ms | 997 ms | 4 952 ms |

XML → flat JSON string (median of 7 runs)

| Library | 0.5 MB | 5.4 MB | 27 MB |
| --- | --- | --- | --- |
| fast-xml-flattener | 13 ms | 164 ms | 884 ms |
| xmltodict + json.dumps | 93 ms | 1 147 ms | 5 374 ms |

Dell Vostro i7-1260P, 64 GB RAM, Linux, CPython 3.13. Synthetic XML with nested records (id, user, address, order fields). See benches/benchmark.py.

In these runs fast-xml-flattener is roughly 4–7× faster than xmltodict and 1.8–2.5× faster than lxml. The speedup ratio stays roughly constant across the tested sizes, so the absolute time saved grows with document size. The parser streams the input without building a DOM. The GIL is held only by the functions that build Python objects (to_dict, to_flatten_dict); the string, CSV, and Parquet outputs release it while parsing, so they can run in parallel from a thread pool.
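
The ratios quoted above can be re-derived from the two tables. The sketch below only does that arithmetic; the timings are copied from the tables, and the helper name speedups is illustrative rather than part of benches/benchmark.py.

```python
# Median timings in milliseconds, copied from the tables above
# (columns: 0.5 MB, 5.4 MB, 27 MB).
FLAT_DICT_MS = {
    "fast-xml-flattener": (11, 225, 1089),
    "lxml + manual flatten": (27, 407, 2108),
    "xmltodict + manual flatten": (63, 997, 4952),
}
FLAT_JSON_MS = {
    "fast-xml-flattener": (13, 164, 884),
    "xmltodict + json.dumps": (93, 1147, 5374),
}


def speedups(table, baseline="fast-xml-flattener"):
    """Return {library: (ratio per size, ...)} relative to the baseline row."""
    base = table[baseline]
    return {
        name: tuple(round(t / b, 1) for t, b in zip(times, base))
        for name, times in table.items()
        if name != baseline
    }


dict_ratios = speedups(FLAT_DICT_MS)
json_ratios = speedups(FLAT_JSON_MS)
for name, ratios in {**dict_ratios, **json_ratios}.items():
    print(f"{name}: " + ", ".join(f"{r}x" for r in ratios))
```

For the flat-dict case this gives 2.5x, 1.8x, and 1.9x against lxml, and 5.7x, 4.4x, and 4.5x against xmltodict; for flat JSON it gives 7.2x, 7.0x, and 6.1x against xmltodict.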


Features

  • Flatten nested XML into JSON, flattened JSON, a native Python dict, a flattened dict, CSV, or Parquet
  • Dot-notation object access — navigate parsed XML like obj.user.address.city with XmlObject
  • File streaming — pass a Path or filename string; Rust reads the file in buffered chunks without loading it into Python memory
  • Single-pass streaming parser — no DOM, no intermediate Value allocation
  • GIL-free for string/CSV/Parquet outputs — safe to use from thread pools
  • xmltodict-compatible semantics: @attr, #text, auto-list for repeated tags
  • Namespace stripping, CDATA, entity references, comments — all handled correctly
  • Supports Python 3.10+
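
Because the string, CSV, and Parquet outputs are described as GIL-free, one natural pattern is to fan several documents out over a thread pool. The sketch below shows only the pool plumbing: convert_many and fake_convert are hypothetical names, and fake_convert is a stand-in so the example runs without the library installed.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_many(documents, convert, max_workers=4):
    """Apply `convert` to every document on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(convert, documents))


# Stand-in converter (an assumption for this sketch). In real use, pass a
# GIL-releasing function such as fxf.to_flatten_json or fxf.to_csv instead.
def fake_convert(xml: str) -> str:
    return xml.strip().upper()


results = convert_many(["<a>1</a>", "<b>2</b>"], fake_convert)
print(results)
```

With a dict-returning function such as to_dict the pool would still work, but the calls would largely serialize on the GIL, so the benefit should be expected mainly for the string, CSV, and Parquet outputs.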

Input

Every function accepts XML content or a file path — no manual open() required:

import fast_xml_flattener as fxf
from pathlib import Path

# XML string
fxf.to_dict("<root><a>1</a></root>")

# pathlib.Path: Rust reads the file in buffered chunks
fxf.to_dict(Path("data.xml"))

# plain str path (does not start with '<')
fxf.to_dict("data.xml")

| Input type | Behaviour |
| --- | --- |
| str starting with < | Parsed as XML content |
| str not starting with < | Treated as a file path |
| pathlib.Path / os.PathLike | Always treated as a file path |

File I/O happens entirely in Rust via a buffered reader — the file is never fully loaded into Python memory.
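
The dispatch rule in the table can be written down as a few lines of Python. This is a sketch of the documented behaviour, not the library's Rust implementation; classify_input is a hypothetical name, and skipping leading whitespace before the '<' check is an assumption (the Quick Start sample begins with a newline).

```python
import os
from pathlib import Path


def classify_input(value):
    """Return ("content", text) or ("path", filesystem_path) for an input value."""
    if isinstance(value, os.PathLike):
        # Path-like objects are always treated as file paths.
        return "path", os.fspath(value)
    if isinstance(value, str):
        # Assumption: leading whitespace is ignored before checking for '<'.
        if value.lstrip().startswith("<"):
            return "content", value
        return "path", value
    raise TypeError(f"unsupported input type: {type(value).__name__}")


print(classify_input("<root><a>1</a></root>")[0])
print(classify_input("data.xml"))
print(classify_input(Path("data.xml")))
```

One consequence of this rule is that a file whose name starts with '<' has to be passed as a Path rather than a plain string.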

Output Formats

| Function | Returns | Description |
| --- | --- | --- |
| to_json(xml) | str | 1:1 JSON preserving XML structure (@attr, #text) |
| to_flatten_json(xml, separator=".") | str | Flat JSON with dot-notation keys (user.address.city) |
| to_dict(xml) | dict | 1:1 nested Python dict, built directly in Rust with no JSON round-trip |
| to_flatten_dict(xml, separator=".") | dict | Flat Python dict with dot-notation keys |
| to_csv(xml, include_attrs=True) | str | Tabular CSV, one row per XML record |
| to_parquet(xml, path, include_attrs=True) | None | Columnar Parquet file for big-data workflows |
| to_object(xml) | XmlObject | Dot-notation Python object with attribute and text access |
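
To make the @attr, #text, and auto-list conventions concrete, here is a small standard-library sketch of xmltodict-style conversion. It uses xml.etree.ElementTree, which builds a tree in memory, so it illustrates the output shape only and not the library's streaming parser. The function names are hypothetical. Keeping the root tag as the top-level key and returning an empty string for empty elements are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET


def element_to_value(elem):
    """Convert one element to a str (pure-text leaf) or a dict."""
    # Attributes become "@name" keys.
    node = {f"@{name}": value for name, value in elem.attrib.items()}
    for child in elem:
        value = element_to_value(child)
        if child.tag in node:
            # Repeated tag: promote the existing entry to a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        return text  # pure-text leaf; "" for empty elements (sketch assumption)
    if text:
        node["#text"] = text  # text alongside attributes or children
    return node


def xml_to_dict(xml):
    root = ET.fromstring(xml)
    return {root.tag: element_to_value(root)}


sample = (
    '<catalog>'
    '<book id="1"><title>Clean Code</title><price currency="PLN">49</price></book>'
    '<book id="2"><title>Czysty Kod</title></book>'
    '</catalog>'
)
converted = xml_to_dict(sample)
print(converted["catalog"]["book"][0]["price"])
```

The repeated book tag becomes a list, the id attribute becomes "@id", and price, which has both an attribute and text, becomes a dict with "@currency" and "#text".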

Installation

pip install fast-xml-flattener

Quick Start

import fast_xml_flattener as fxf

xml = """
<root>
  <user>
    <id>1</id>
    <name>Alice</name>
    <address>
      <city>Warsaw</city>
      <zip>00-001</zip>
    </address>
  </user>
</root>
"""

# 1:1 JSON string — preserves nesting
result = fxf.to_json(xml)
# '{"user": {"id": "1", "name": "Alice", "address": {"city": "Warsaw", "zip": "00-001"}}}'

# Flattened JSON string with dot-notation keys
flat = fxf.to_flatten_json(xml)
# '{"user.id": "1", "user.name": "Alice", "user.address.city": "Warsaw", "user.address.zip": "00-001"}'

# Native Python dict (1:1 nested) — no JSON round-trip
d = fxf.to_dict(xml)
print(d["user"]["name"])             # Alice
print(d["user"]["address"]["city"])  # Warsaw

# Flattened native Python dict
fd = fxf.to_flatten_dict(xml, separator=".")
print(fd["user.address.city"])       # Warsaw

# CSV: one row per <user> element
# (named csv_text to avoid shadowing the standard-library csv module)
csv_text = fxf.to_csv(xml, include_attrs=True)

# Parquet — ready for pandas / Spark / DuckDB
fxf.to_parquet(xml, path="output.parquet", include_attrs=True)

# Dot-notation object access
obj = fxf.to_object(xml)
print(obj.root.user.name)              # Alice
print(obj.root.user.address.city)      # Warsaw

# All functions also accept a file path — Rust streams the file without
# loading it into Python memory
from pathlib import Path

d = fxf.to_dict(Path("data.xml"))
obj = fxf.to_object("data.xml")        # plain str path works too
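
The relationship between the nested and the flattened outputs can be sketched in pure Python. This is an illustration of dot-notation flattening with a configurable separator, not the library's Rust code; flatten is a hypothetical helper, and it leaves lists untouched because how repeated tags are keyed is not specified here.

```python
def flatten(value, separator=".", prefix=""):
    """Flatten nested dicts into a single dict keyed by joined paths."""
    flat = {}
    for key, child in value.items():
        path = f"{prefix}{separator}{key}" if prefix else key
        if isinstance(child, dict):
            flat.update(flatten(child, separator, path))
        else:
            # Leaves (and, in this sketch, lists) are stored as-is.
            flat[path] = child
    return flat


nested = {
    "user": {
        "id": "1",
        "name": "Alice",
        "address": {"city": "Warsaw", "zip": "00-001"},
    }
}
flat = flatten(nested)
print(flat["user.address.city"])                           # Warsaw
print(flatten(nested, separator="/")["user/address/zip"])  # 00-001
```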

XmlObject — dot-notation access

to_object() parses XML and returns an XmlObject that wraps the result of to_dict(). XML parsing is done in Rust; the object layer adds minimal Python overhead.

xml = '''
<catalog>
  <book id="1" lang="en">
    <title>Clean Code</title>
    <author>Robert C. Martin</author>
  </book>
  <book id="2" lang="pl">
    <title>Czysty Kod</title>
    <author>Robert C. Martin</author>
  </book>
</catalog>
'''

obj = fxf.to_object(xml)

# Navigate nested structure with dot notation
books = obj.catalog.book          # list of XmlObject (repeated tag)
print(books[0].title)             # Clean Code
print(books[1].title)             # Czysty Kod

# Access XML attributes via _attrs (no @ prefix)
print(books[0]._attrs)            # {"id": "1", "lang": "en"}
print(books[0]._attrs["lang"])    # en

# Access text content via _text (useful when element has both text and attrs)
print(books[0].title._text)       # Clean Code

# Get the underlying raw dict via .raw
print(books[0].raw)               # {"@id": "1", "@lang": "en", "title": "Clean Code", ...}

| Property / access | Returns | Description |
| --- | --- | --- |
| obj.child_tag | XmlObject, list[XmlObject], or str | Child element; list when tag repeats; str for pure-text leaves |
| obj._attrs | dict[str, str] | XML attributes of this element (keys without @ prefix) |
| obj._text | str or None | Text content (#text) of this element |
| obj.raw | dict or str | Underlying value from to_dict(); str for pure-text leaves |
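
For readers curious how a dot-notation layer over a dict can work, the following is a minimal sketch of the idea in pure Python. DictView and _wrap are hypothetical names; this is not the XmlObject implementation, only a wrapper that follows the access rules in the table above.

```python
class DictView:
    """Hypothetical dot-notation wrapper over an xmltodict-style dict."""

    def __init__(self, raw):
        self.raw = raw

    @property
    def _attrs(self):
        # Keys starting with "@" are attributes; strip the prefix.
        if not isinstance(self.raw, dict):
            return {}
        return {k[1:]: v for k, v in self.raw.items() if k.startswith("@")}

    @property
    def _text(self):
        # "#text" holds the text of elements that also have attributes/children.
        if isinstance(self.raw, dict):
            return self.raw.get("#text")
        return self.raw

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for child tag names.
        raw = self.__dict__.get("raw")
        if not isinstance(raw, dict) or name not in raw:
            raise AttributeError(name)
        return _wrap(raw[name])


def _wrap(value):
    if isinstance(value, list):
        return [_wrap(item) for item in value]  # repeated tag
    if isinstance(value, dict):
        return DictView(value)
    return value  # pure-text leaf stays a plain str


data = {
    "catalog": {
        "book": [
            {"@id": "1", "@lang": "en", "title": "Clean Code"},
            {"@id": "2", "@lang": "pl", "title": "Czysty Kod"},
        ]
    }
}
view = DictView(data)
books = view.catalog.book
print(books[0].title, books[1]._attrs["lang"])
```

Wrapping lazily in __getattr__ keeps the cost proportional to what is actually accessed, which fits the description of the object layer as a thin addition on top of the parsed dict.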

Loading Parquet with pandas

import pandas as pd

df = pd.read_parquet("output.parquet")
print(df.head())

Using with DuckDB

import duckdb

duckdb.sql("SELECT * FROM 'output.parquet'").show()

Development

Requirements

  • Python 3.10+ (3.13 recommended for development)
  • Rust (stable)
  • maturin

Setup with pyenv (recommended)

# Install pyenv: https://github.com/pyenv/pyenv
pyenv install 3.13
pyenv local 3.13

# Create and activate virtual environment (requires the pyenv-virtualenv plugin)
pyenv virtualenv 3.13 xml-flattener
pyenv activate xml-flattener

# Install uv and dev dependencies
pip install uv
uv pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Setup without pyenv

python -m venv venv
source venv/bin/activate
pip install uv
uv pip install -e ".[dev]"
pre-commit install

Build

uv run maturin develop   # development build
maturin build --release  # release wheel

Tests

uv run pytest            # Python integration tests (95 cases)
cargo test               # Rust unit tests (25 cases)
uv run ruff check .      # linting
cargo clippy --all-targets -- -D warnings  # Rust linting

Releasing

Releases are fully automated. Include one of these tags anywhere in your commit message (or in the PR title when squash-merging) to trigger a release:

| Tag | Bump | Example |
| --- | --- | --- |
| [fix] | patch (0.1.0 → 0.1.1) | fix null value in CSV output [fix] |
| [minor] | minor (0.1.0 → 0.2.0) | add streaming API [minor] |
| [major] | major (0.1.0 → 1.0.0) | redesign public API [major] |

The release pipeline then:

  1. Bumps version in Cargo.toml and pyproject.toml
  2. Prepends an entry to CHANGELOG.md
  3. Commits (chore: bump version to X.Y.Z) and creates a vX.Y.Z git tag
  4. Builds wheels for Linux x86_64/aarch64, macOS universal2, Windows x86_64
  5. Publishes to PyPI via OIDC trusted publishing (no secrets needed)
  6. Creates a GitHub Release with the changelog entry and wheel artifacts
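
The tag-to-bump mapping and step 1 of the pipeline can be sketched as follows. This is not the project's release workflow; detect_bump and next_version are hypothetical helpers, and picking the first tag when several appear is an assumption.

```python
import re

# Tag-to-bump mapping taken from the table above.
BUMPS = {"fix": "patch", "minor": "minor", "major": "major"}
TAG_RE = re.compile(r"\[(fix|minor|major)\]")


def detect_bump(message):
    """Return 'patch', 'minor', 'major', or None if no release tag is present."""
    match = TAG_RE.search(message)  # assumption: the first tag wins
    return BUMPS[match.group(1)] if match else None


def next_version(version, bump):
    """Compute the next X.Y.Z version string for the given bump kind."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump: {bump!r}")


message = "fix null value in CSV output [fix]"
print(next_version("0.1.0", detect_bump(message)))  # 0.1.1
```

A commit message without any of the three tags yields None, which corresponds to no release being triggered.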

One-time PyPI setup (trusted publishing)

  1. Go to PyPI → Your projects → fast-xml-flattener → Publishing → Add a publisher
  2. Set: GitHub owner andree0, repo fast-xml-flattener, workflow release.yml, environment pypi
  3. On GitHub: Settings → Environments → New environment named pypi

No API tokens or secrets are required — OIDC handles authentication.

License

MIT
