Oxidize

High-performance data processing tools for Python, built with Rust.

Philosophy

Python data tools often force a trade-off among performance, installation simplicity, and parallel processing. The Oxidize tools aim to avoid that trade-off by providing:

  • Performance: Rust implementation with 2-10x speed improvements
  • Easy installation: Pre-built wheels, no compilation required
  • True parallelism: GIL release for concurrent processing
  • Practical focus: Solutions for common data engineering tasks

Tools

oxidize-postal

Address parsing and normalization with international support.

import oxidize_postal

parsed = oxidize_postal.parse_address("781 Franklin Ave Brooklyn NY 11216")
# {'house_number': '781', 'road': 'franklin ave', 'city': 'brooklyn', 'state': 'ny', 'postcode': '11216'}

expansions = oxidize_postal.expand_address("123 Main St NYC NY")
# ['123 main street nyc new york', '123 main street nyc ny', ...]
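
A common use of expansions is fuzzy matching: two address strings probably refer to the same place if their expansion sets overlap. The sketch below assumes `expand_address` returns a list of strings, as the comment above suggests. `same_address` and `toy_expand` are hypothetical names introduced for illustration, and the expander is passed in as an argument so the logic can be tried without the library installed.

```python
from typing import Callable, Iterable


def same_address(a: str, b: str, expand: Callable[[str], Iterable[str]]) -> bool:
    """Return True if the two addresses share at least one expansion."""
    return not set(expand(a)).isdisjoint(expand(b))


# Stand-in expander for illustration only (hypothetical). With the library
# installed, pass oxidize_postal.expand_address instead.
def toy_expand(address: str) -> list[str]:
    lowered = address.lower()
    return sorted({lowered, lowered.replace(" st ", " street ")})


print(same_address("123 Main St NYC", "123 main street nyc", toy_expand))
```

Comparing sets of expansions instead of raw strings means abbreviations and casing differences do not have to be handled by the caller.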

Improvements over pypostal:

  • pip install with pre-built wheels (no C compilation)
  • GIL released for parallel processing
  • Single module API
  • Cross-platform support
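
Because the GIL is released during parsing, a plain thread pool should be enough to spread a batch of addresses across cores. The following is a minimal sketch, not part of the package: `parse_many` is a hypothetical helper that takes the parser as an argument, and `str.split` stands in for `oxidize_postal.parse_address` so the example runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def parse_many(
    addresses: Iterable[str],
    parse: Callable[[str], T],
    max_workers: int = 4,
) -> list[T]:
    """Apply `parse` to every address on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(parse, addresses))


# str.split is only a stand-in; with the library installed, call
# parse_many(addresses, oxidize_postal.parse_address) instead.
batch = ["781 Franklin Ave", "123 Main St"]
print(parse_many(batch, str.split))
```

Threads only help here if the parser really does release the GIL; a pure-Python parser would run no faster than a serial loop.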

oxidize-xml

Streaming XML to JSON conversion for large files.

import oxidize_xml

# Extract repeated elements to JSON Lines
count = oxidize_xml.parse_xml_file_to_json_file("data.xml", "book", "output.jsonl")

# Stream-parse a large file and return the JSON Lines as a string
json_lines = oxidize_xml.parse_xml_file_to_json_string("export.xml", "record")
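
The JSON Lines output can be read back one record at a time with the standard library, so downstream code does not need to hold the whole file in memory either. This is a sketch under the assumption that each non-empty line of the output file is one JSON document; `iter_jsonl` is a hypothetical helper, and the demo writes its own small file instead of relying on real output from `oxidize_xml`.

```python
import json
import os
import tempfile
from typing import Any, Iterator


def iter_jsonl(path: str) -> Iterator[Any]:
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


# Demo with a hand-written file standing in for "output.jsonl".
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.jsonl")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write('{"title": "A"}\n\n{"title": "B"}\n')
    records = list(iter_jsonl(path))

print(records)
```

If the string-returning function yields the same line-per-record format, `json.loads` over `json_lines.splitlines()` would play the same role.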

Improvements over lxml:

  • 2-3x faster streaming parser
  • Processes files larger than available RAM
  • Consistent schema output for data analysis
  • Built-in XML security protections

Technical Approach

Rust + PyO3: Combines Rust's performance and memory safety with Python's ecosystem integration.

GIL Release: All compute operations release Python's Global Interpreter Lock, enabling true parallel processing in threaded environments.
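
One way to check whether a given call benefits from GIL release is to time the same workload serially and on a thread pool. The harness below is a rough sketch, not a benchmark from this project: `compare_serial_vs_threads` and `stand_in` are hypothetical names, and `time.sleep` (which releases the GIL) is used in place of a real parsing call.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def compare_serial_vs_threads(
    func: Callable[[str], object],
    inputs: Sequence[str],
    workers: int = 4,
) -> dict[str, float]:
    """Time `func` over `inputs` in a plain loop and on a thread pool."""
    start = time.perf_counter()
    for item in inputs:
        func(item)
    serial = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(func, inputs))
    threaded = time.perf_counter() - start

    return {"serial_s": serial, "threaded_s": threaded}


def stand_in(_: str) -> None:
    # Hypothetical workload: time.sleep releases the GIL while waiting.
    time.sleep(0.02)


timings = compare_serial_vs_threads(stand_in, ["x"] * 8)
print(timings)
```

If the threaded time is close to the serial time for a CPU-bound function, that function is most likely holding the GIL.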

Streaming Architecture: Designed for processing large datasets without loading everything into memory.
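
To illustrate the streaming idea in general terms, the sketch below uses the standard library's `xml.etree.ElementTree.iterparse` to emit one JSON line per repeated element and then clear that element. It is a stand-in for the technique, not the implementation used by oxidize-xml, and `iter_records` is a hypothetical name. It also flattens only direct child elements, which is a simplification.

```python
import io
import json
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator


def iter_records(source: BinaryIO, tag: str) -> Iterator[str]:
    """Yield one JSON string per <tag> element found while parsing incrementally."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            record = {child.tag: (child.text or "").strip() for child in elem}
            yield json.dumps(record)
            # Drop the element's children and text once emitted; the emptied
            # element itself stays attached to its parent.
            elem.clear()


xml_bytes = (
    b"<library>"
    b"<book><title>Dune</title><year>1965</year></book>"
    b"<book><title>Emma</title><year>1815</year></book>"
    b"</library>"
)
lines = list(iter_records(io.BytesIO(xml_bytes), "book"))
print("\n".join(lines))
```

Because records are produced as the parser advances, the caller can write each line out immediately instead of building the full document tree first.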

Pre-built Wheels: Cross-platform distribution eliminates compilation requirements and system dependencies.

Use Cases

  • ETL pipelines with address normalization
  • Processing large XML exports and API responses
  • Data cleaning workflows requiring parallel processing
  • Web services handling structured data parsing

Future Tools

Planned additions following the same principles:

  • oxidize-csv: High-performance CSV processing
  • oxidize-json: Streaming JSON operations
  • oxidize-regex: Parallel text processing

Contributing

Each tool has its own repository with specific contribution guidelines. General focus areas:

  • Performance improvements with benchmarks
  • API usability for common workflows
  • Documentation and examples
  • Test coverage for edge cases

License

MIT License for all tools.
