
High-performance DDEX XML parser for Python

DDEX Parser - Python Bindings

High-performance DDEX XML parser for Python with built-in security features and comprehensive metadata extraction. In the benchmarks in this README it parses DDEX files at least 10x faster than lxml and xml.etree, with support for all DDEX versions (3.2, 3.3, 4.0+) and profiles.

Installation

pip install ddex-parser
# or pin a specific version
pip install ddex-parser==0.4.4

Version Notes

v0.4.4: Enhanced validation - Parser now properly fails on missing required fields instead of using placeholder values. Ensure your error handling can catch ValidationError exceptions for incomplete DDEX files.

v0.4.1: Enhanced data structure implementation with complete data access. Built on v0.4.0's security fixes (RUSTSEC-2025-0020) with PyO3 0.24 compatibility.
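
The v0.4.4 note asks callers to be ready for validation failures. The sketch below is one way to wrap a parse call so that an incomplete file produces an error message instead of an unhandled exception. The helper name `try_parse` is hypothetical, and because this README does not show where `ValidationError` is imported from, the sketch catches `Exception` broadly and reports the exception's class name; narrow the `except` clause once you know the real import path.

```python
from typing import Any, Optional, Tuple


def try_parse(parser: Any, path: str) -> Tuple[Optional[Any], Optional[str]]:
    """Return (result, None) on success or (None, message) on failure.

    `parser` is any object with a parse_file(path) method, such as the
    DDEXParser shown in this README.
    """
    try:
        return parser.parse_file(path), None
    except Exception as exc:
        # Assumption: ValidationError is a normal Exception subclass.
        # Its import path is not documented here, so we catch broadly
        # and surface the class name for logging.
        return None, f"{type(exc).__name__}: {exc}"
```

A caller would then write `result, error = try_parse(DDEXParser(), "release.xml")` and branch on whether `error` is `None`.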

Quick Start

from ddex_parser import DDEXParser
import pandas as pd  # optional dependency; needed only for to_dataframe()

# Parse DDEX file
parser = DDEXParser()
result = parser.parse_file("release.xml")

# Access parsed data
print(f"Release: {result.release_title}")
print(f"Artist: {result.main_artist}")
print(f"Tracks: {len(result.tracks)}")

# Convert to DataFrame for analysis
tracks_df = result.to_dataframe()
print(tracks_df.head())

Features

🚀 High Performance

  • 10x faster than standard XML parsers
  • Streaming support for large files (>100MB)
  • Memory-efficient processing
  • Native Rust implementation with Python bindings

🔒 Security Built-in

  • XXE (XML External Entity) attack protection
  • Entity expansion limits
  • Memory-bounded parsing
  • Deep nesting protection

📊 Data Science Ready

  • Direct pandas DataFrame export
  • Structured metadata extraction
  • JSON serialization support
  • Type hints for better IDE experience

🎵 Music Industry Focused

  • Support for all DDEX versions (3.2, 3.3, 4.0+)
  • Release, track, and artist metadata
  • Rights and usage information
  • Territory and deal terms
  • Image and audio resource handling

API Reference

DDEXParser

from ddex_parser import DDEXParser

parser = DDEXParser(
    max_entity_expansions=1000,  # Limit entity expansions for security
    max_depth=100,               # Maximum XML nesting depth
    streaming=True               # Enable streaming for large files
)
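
To make the `max_depth` setting more concrete, the following standard-library sketch measures how deeply a document nests its elements. It is an illustration of the quantity being limited, not a description of how ddex-parser enforces the limit internally; the function name `max_nesting_depth` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def max_nesting_depth(xml_text: str) -> int:
    """Return the deepest element nesting level in xml_text (root = 1)."""
    depth = 0
    deepest = 0
    stream = io.BytesIO(xml_text.encode("utf-8"))
    # "start" fires when an element opens, "end" when it closes.
    for event, _elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            depth += 1
            deepest = max(deepest, depth)
        else:
            depth -= 1
    return deepest


print(max_nesting_depth("<a><b><c/></b><d/></a>"))  # 3
```

A document whose measured depth exceeds the configured `max_depth` is the kind of input the deep nesting protection is meant to reject.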

Parsing Methods

parse_file(path: str) -> DDEXResult

Parse a DDEX XML file from disk.

result = parser.parse_file("path/to/release.xml")

parse_string(xml: str) -> DDEXResult

Parse DDEX XML from a string.

with open("release.xml", "r", encoding="utf-8") as f:
    xml_content = f.read()
result = parser.parse_string(xml_content)

parse_async(path: str) -> Awaitable[DDEXResult]

Asynchronous parsing for non-blocking operations.

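import asyncio
from ddex_parser import DDEXParser

parser = DDEXParser()

async def parse_ddex():
    # Await the parse so the event loop stays free for other tasks
    return await parser.parse_async("release.xml")

# Usage
result = asyncio.run(parse_ddex())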
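
Since `parse_async` returns an awaitable, several files can be parsed concurrently with `asyncio.gather`. The helper below is a minimal sketch; `parse_many` is a hypothetical name, and it relies only on the `parse_async(path)` signature documented above.

```python
import asyncio
from typing import Any, Iterable, List


async def parse_many(parser: Any, paths: Iterable[str]) -> List[Any]:
    """Parse every path concurrently and return results in input order."""
    # Create one awaitable per file; gather schedules them together
    # and preserves the order of its arguments in the returned list.
    pending = [parser.parse_async(path) for path in paths]
    return await asyncio.gather(*pending)
```

Usage would look like `results = asyncio.run(parse_many(parser, ["a.xml", "b.xml"]))`. If any single parse raises, `gather` propagates that exception by default, so combine this with your own error handling for mixed-quality batches.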

DataFrame Integration

Perfect for data analysis workflows:

import pandas as pd
from ddex_parser import DDEXParser

parser = DDEXParser()
result = parser.parse_file("catalog.xml")

# Get tracks as DataFrame
tracks_df = result.to_dataframe("tracks")
print(tracks_df.columns)
# ['track_id', 'title', 'artist', 'duration', 'isrc', 'genre', ...]

# Analyze your catalog
genre_counts = tracks_df['genre'].value_counts()
avg_duration = tracks_df['duration'].mean()

# Export for further analysis
tracks_df.to_csv("catalog_analysis.csv")
tracks_df.to_parquet("catalog_analysis.parquet")
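
The `mean()` call above needs a numeric `duration` column. DDEX messages express durations in ISO 8601 form (for example `PT3M45S`), and this README does not say whether the DataFrame exposes them as numbers or as those strings. If they arrive as strings, a small converter such as the hypothetical `iso_duration_to_seconds` below can be applied first. It handles only the hours, minutes, and seconds components.

```python
import re

# Matches time-only ISO 8601 durations such as PT45S, PT3M45S, PT1H2M3.5S.
_ISO_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$")


def iso_duration_to_seconds(value: str) -> float:
    """Convert a time-only ISO 8601 duration string to seconds."""
    match = _ISO_DURATION.match(value)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


print(iso_duration_to_seconds("PT3M45S"))  # 225.0
```

With pandas this could be applied as `tracks_df['duration'].map(iso_duration_to_seconds)` before computing the average.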

Performance Benchmarks

Performance comparison on a MacBook Pro M2:

File Size   ddex-parser   lxml     xml.etree   Speedup (vs lxml, vs xml.etree)
10KB        0.8ms         8ms      12ms        10x-15x
100KB       3ms           45ms     78ms        15x-26x
1MB         28ms          380ms    650ms       13x-23x
10MB        180ms         3.2s     5.8s        18x-32x

Memory usage is consistently 60-80% lower than traditional parsers.
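
Benchmark numbers depend on hardware and input, so it is worth measuring on your own files. The harness below is a rough sketch that times any parse callable and reports the best of several runs. It uses `xml.etree` as the baseline on a synthetic document; `best_time_ms` is a hypothetical helper, and the sample XML is not a valid DDEX message.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable


def best_time_ms(parse: Callable[[str], object], xml_text: str, repeats: int = 5) -> float:
    """Return the fastest wall-clock time in milliseconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        parse(xml_text)
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


# Synthetic input, used only to exercise the harness.
sample = "<Release>" + "<Track><Title>t</Title></Track>" * 1000 + "</Release>"
baseline_ms = best_time_ms(ET.fromstring, sample)
print(f"xml.etree baseline: {baseline_ms:.2f} ms")
```

To compare against ddex-parser, pass `parser.parse_string` as the callable together with a real DDEX file's contents.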

Integration with ddex-builder

Round-trip compatibility with ddex-builder for complete workflows:

from ddex_parser import DDEXParser
from ddex_builder import DDEXBuilder

# Parse existing DDEX file
parser = DDEXParser()
original = parser.parse_file("input.xml")

# Modify data
modified_data = original.to_dict()
modified_data['tracks'][0]['title'] = "New Title"

# Build new DDEX file
builder = DDEXBuilder()
new_xml = builder.build_from_dict(modified_data)

# Verify round-trip integrity
new_result = parser.parse_string(new_xml)
assert new_result.tracks[0].title == "New Title"
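
The assertion above checks one field. To confirm that nothing else changed during the round trip, the two `to_dict()` outputs can be compared structurally. The recursive helper below is a sketch using only the standard library; `diff_paths` is a hypothetical name, and it assumes the dictionaries contain plain dicts, lists, and scalar values with string keys.

```python
from typing import Any, List


def diff_paths(a: Any, b: Any, path: str = "") -> List[str]:
    """Return the paths at which two nested dict/list structures differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        found: List[str] = []
        for key in sorted(set(a) | set(b)):
            child = f"{path}.{key}" if path else str(key)
            if key not in a or key not in b:
                found.append(child)  # key present on one side only
            else:
                found.extend(diff_paths(a[key], b[key], child))
        return found
    if isinstance(a, list) and isinstance(b, list):
        found = []
        for i in range(max(len(a), len(b))):
            child = f"{path}[{i}]"
            if i >= len(a) or i >= len(b):
                found.append(child)  # lists differ in length
            else:
                found.extend(diff_paths(a[i], b[i], child))
        return found
    return [] if a == b else [path or "<root>"]
```

After the edit in the example, `diff_paths(original.to_dict(), new_result.to_dict())` would be expected to list only the modified title path if the round trip preserved everything else.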

Requirements

  • Python 3.8+
  • pandas (optional, for DataFrame support)
  • PyO3 0.24 compatible runtime

License

This project is licensed under the MIT License - see the LICENSE file for details.

Built for the music industry. Powered by Rust for maximum performance and safety.
