High-performance DDEX XML parser for Python
DDEX Parser - Python Bindings
High-performance DDEX XML parser for Python with built-in security features and comprehensive metadata extraction. In the benchmarks below it parses DDEX files 10x to 32x faster than lxml and xml.etree, and it supports all DDEX versions and profiles.
Installation
pip install ddex-parser
# or for specific version
pip install ddex-parser==0.4.4
Version Notes
v0.4.4: Enhanced validation. The parser now fails on missing required fields instead of substituting placeholder values. Make sure your error handling catches ValidationError exceptions for incomplete DDEX files.
v0.4.1: Enhanced data structure implementation with complete data access. Built on v0.4.0's security fixes (RUSTSEC-2025-0020) with PyO3 0.24 compatibility.
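The v0.4.4 note means callers should expect parsing to raise on incomplete files. A minimal sketch of a wrapper follows. This page does not say where ValidationError is importable from, so the exception type is passed in as a parameter rather than imported; `try_parse` is a hypothetical helper, not part of ddex-parser.

```python
from typing import Any, Callable, Optional, Tuple, Type


def try_parse(
    parse: Callable[[str], Any],
    path: str,
    error_types: Tuple[Type[BaseException], ...] = (Exception,),
) -> Tuple[Optional[Any], Optional[str]]:
    """Call parse(path) and return (result, None) or (None, error message)."""
    try:
        return parse(path), None
    except error_types as exc:
        # Keep the file name in the message so batch jobs can report it.
        return None, f"{path}: {type(exc).__name__}: {exc}"
```

With a real parser this could be called as `try_parse(parser.parse_file, "release.xml", (ValidationError,))`, once ValidationError has been imported from wherever the package exposes it.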
Quick Start
from ddex_parser import DDEXParser
# Parse DDEX file
parser = DDEXParser()
result = parser.parse_file("release.xml")
# Access parsed data
print(f"Release: {result.release_title}")
print(f"Artist: {result.main_artist}")
print(f"Tracks: {len(result.tracks)}")
# Convert to DataFrame for analysis
tracks_df = result.to_dataframe()
print(tracks_df.head())
Features
🚀 High Performance
- 10x to 32x faster than lxml and xml.etree (see Performance Benchmarks)
- Streaming support for large files (>100MB)
- Memory-efficient processing
- Native Rust implementation with Python bindings
🔒 Security Built-in
- XXE (XML External Entity) attack protection
- Entity expansion limits
- Memory-bounded parsing
- Deep nesting protection
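To make the idea behind deep nesting protection concrete, here is a sketch of a depth check written with the standard library only. It is an illustration of the concept, not how ddex-parser implements it, and `within_depth_limit` is a hypothetical name. Note that xml.etree is not itself hardened against malicious input, so this sketch is meant for trusted test data.

```python
import io
import xml.etree.ElementTree as ET


def within_depth_limit(xml_bytes: bytes, limit: int) -> bool:
    """Return False as soon as element nesting exceeds `limit` levels."""
    depth = 0
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, _element in events:
        if event == "start":
            depth += 1
            if depth > limit:
                # Stop early instead of reading the rest of the document.
                return False
        else:
            depth -= 1
    return True
```

Stopping at the first violation is the point of such a limit: the cost of rejecting a hostile document stays proportional to the limit rather than to the document size.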
📊 Data Science Ready
- Direct pandas DataFrame export
- Structured metadata extraction
- JSON serialization support
- Type hints for better IDE experience
🎵 Music Industry Focused
- Support for all DDEX versions (3.2, 3.3, 4.0+)
- Release, track, and artist metadata
- Rights and usage information
- Territory and deal terms
- Image and audio resource handling
API Reference
DDEXParser
from ddex_parser import DDEXParser
parser = DDEXParser(
    max_entity_expansions=1000,  # Limit entity expansions for security
    max_depth=100,               # Maximum XML nesting depth
    streaming=True               # Enable streaming for large files
)
Parsing Methods
parse_file(path: str) -> DDEXResult
Parse a DDEX XML file from disk.
result = parser.parse_file("path/to/release.xml")
parse_string(xml: str) -> DDEXResult
Parse DDEX XML from a string.
with open("release.xml", "r") as f:
    xml_content = f.read()
result = parser.parse_string(xml_content)
parse_async(path: str) -> Awaitable[DDEXResult]
Asynchronous parsing for non-blocking operations.
import asyncio
async def parse_ddex():
    result = await parser.parse_async("release.xml")
    return result
# Usage
result = asyncio.run(parse_ddex())
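Asynchronous parsing is most useful when several files are processed at once. The sketch below runs a bounded number of parses concurrently with asyncio.gather. `parse_many` and its `limit` parameter are hypothetical, and the parsing coroutine is passed in as an argument so the helper makes no assumptions about the parser beyond the `parse_async(path)` signature shown above.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def parse_many(
    parse_async: Callable[[str], Awaitable[Any]],
    paths: List[str],
    limit: int = 4,
) -> List[Any]:
    """Parse all paths concurrently, with at most `limit` parses in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def parse_one(path: str) -> Any:
        async with semaphore:
            return await parse_async(path)

    # gather preserves input order, so results line up with `paths`.
    return await asyncio.gather(*(parse_one(p) for p in paths))
```

A call such as `asyncio.run(parse_many(parser.parse_async, ["a.xml", "b.xml"]))` would return the results in the same order as the input paths.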
DataFrame Integration
Perfect for data analysis workflows:
import pandas as pd
from ddex_parser import DDEXParser
parser = DDEXParser()
result = parser.parse_file("catalog.xml")
# Get tracks as DataFrame
tracks_df = result.to_dataframe("tracks")
print(tracks_df.columns)
# ['track_id', 'title', 'artist', 'duration', 'isrc', 'genre', ...]
# Analyze your catalog
genre_counts = tracks_df['genre'].value_counts()
avg_duration = tracks_df['duration'].mean()
# Export for further analysis
tracks_df.to_csv("catalog_analysis.csv")
tracks_df.to_parquet("catalog_analysis.parquet")
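Since pandas is an optional dependency, the same summary can be sketched with the standard library on plain track dictionaries. This assumes each track is a dict with 'genre' and 'duration' keys, mirroring the column names listed above; whether `to_dict()` yields exactly these keys is not confirmed here, and `summarize_tracks` is a hypothetical helper.

```python
from collections import Counter
from statistics import mean
from typing import Any, Dict, List


def summarize_tracks(tracks: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count tracks per genre and average the durations that are present."""
    genre_counts = Counter(track.get("genre") or "unknown" for track in tracks)
    durations = [t["duration"] for t in tracks if t.get("duration") is not None]
    return {
        "genre_counts": dict(genre_counts),
        "avg_duration": mean(durations) if durations else None,
    }
```

Tracks without a genre are grouped under "unknown", and tracks without a duration are left out of the average rather than counted as zero.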
Performance Benchmarks
Performance comparison on a MacBook Pro M2:
| File Size | ddex-parser | lxml | xml.etree | Speedup (vs. lxml, vs. xml.etree) |
|---|---|---|---|---|
| 10KB | 0.8ms | 8ms | 12ms | 10x-15x |
| 100KB | 3ms | 45ms | 78ms | 15x-26x |
| 1MB | 28ms | 380ms | 650ms | 13x-23x |
| 10MB | 180ms | 3.2s | 5.8s | 18x-32x |
Memory usage is consistently 60-80% lower than traditional parsers.
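Benchmark figures depend on hardware and file content, so it is worth measuring on your own files. The following is a small timing harness using only the standard library; `best_of` and `speedup` are hypothetical helpers, and the in-memory sample stands in for a real DDEX file. Each speedup in the table is the baseline time divided by the ddex-parser time, for example 8ms / 0.8ms = 10x.

```python
import time
import xml.etree.ElementTree as ET
from typing import Any, Callable


def best_of(fn: Callable[..., Any], *args: Any, repeats: int = 5) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Baseline: time the standard library parser on a small in-memory document.
sample = "<root>" + "<item>x</item>" * 1000 + "</root>"
baseline = best_of(ET.fromstring, sample)
```

To compare, time `parser.parse_string` on the same input with `best_of` and pass both results to `speedup`. Taking the minimum of several runs reduces noise from other processes.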
Integration with ddex-builder
Round-trip compatibility with ddex-builder for complete workflows:
from ddex_parser import DDEXParser
from ddex_builder import DDEXBuilder
# Parse existing DDEX file
parser = DDEXParser()
original = parser.parse_file("input.xml")
# Modify data
modified_data = original.to_dict()
modified_data['tracks'][0]['title'] = "New Title"
# Build new DDEX file
builder = DDEXBuilder()
new_xml = builder.build_from_dict(modified_data)
# Verify round-trip integrity
new_result = parser.parse_string(new_xml)
assert new_result.tracks[0].title == "New Title"
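The assertion above checks only the edited field. A stricter round-trip check is to confirm that nothing else changed, which can be sketched as a recursive comparison of the two dictionaries. This assumes `to_dict()` returns plain nested dicts and lists; `diff_paths` is a hypothetical helper, not part of either package.

```python
from typing import Any, List


def diff_paths(before: Any, after: Any, path: str = "") -> List[str]:
    """Return the paths at which two nested dict/list structures differ."""
    if isinstance(before, dict) and isinstance(after, dict):
        changed = []
        for key in sorted(set(before) | set(after), key=str):
            child = f"{path}.{key}" if path else str(key)
            if key not in before or key not in after:
                changed.append(child)  # key added or removed
            else:
                changed.extend(diff_paths(before[key], after[key], child))
        return changed
    if isinstance(before, list) and isinstance(after, list):
        if len(before) != len(after):
            return [path]
        changed = []
        for index, (old, new) in enumerate(zip(before, after)):
            changed.extend(diff_paths(old, new, f"{path}[{index}]"))
        return changed
    return [] if before == after else [path]
```

For the round trip above, comparing `original.to_dict()` with `new_result.to_dict()` would be expected to report only `tracks[0].title` if the build step preserved everything else.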
Requirements
- Python 3.8+
- pandas (optional, for DataFrame support)
- PyO3 0.24 compatible runtime
License
This project is licensed under the MIT License - see the LICENSE file for details.
Related Projects
- ddex-builder - Build deterministic DDEX XML files
- ddex-parser (npm) - JavaScript/TypeScript bindings
- DDEX Suite - Complete DDEX processing toolkit
Built for the music industry. Powered by Rust for maximum performance and safety.
File details
Details for the file ddex_parser-0.4.5-cp38-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: ddex_parser-0.4.5-cp38-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 487.9 kB
- Tags: CPython 3.8+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5031944c6b3145322aadec9b8cf0ac7bc8a779bd3e755ba4f2c31bcde28a6d12 |
| MD5 | 63cd9518f397c028777367cd619452fe |
| BLAKE2b-256 | 510fe73d23eafaa99b7064e3bedbbc40ef9f0bd43d357b7f6a53c31300251d27 |