
Advanced data manipulation with XWNode integration, async operations, and universal format conversion


🚀 xwdata: Universal Data Engine with XWNode Integration

Company: eXonware.com
Author: Eng. Muhammad AlShehri
Email: connect@exonware.com
Version: 0.1.0.1


🎯 What is xwdata?

xwdata is an async-first data manipulation engine that combines format-agnostic operations, graph navigation (XWNode), and engine orchestration in one library. Load from any supported format, modify data safely using copy-on-write semantics, and save to any supported format, all through a single API.

The Problem We Solve

Traditional data libraries force you to:

  • ❌ Learn different APIs for each format (json, yaml, xml, etc.)
  • ❌ Write custom code for format conversions
  • ❌ Deal with mutable state causing bugs
  • ❌ Handle format-specific quirks manually
  • ❌ Build complex navigation logic for nested data

The xwdata Solution

  • One API for all formats - Load JSON, save as YAML, convert to XML
  • Ultra-fast multi-format - 0.15-0.21ms for all 30+ formats
  • V8 advanced features - Partial access, typed loading, canonical hashing (all formats!)
  • XWNode integration - Powerful path navigation and graph operations
  • Copy-on-write semantics - Safe concurrent access, immutable by default
  • Universal metadata - Perfect roundtrips preserve format-specific features
  • Async by design - High-performance async operations throughout
  • Engine orchestration - Reuses xwsystem serialization (30+ formats!)
  • Reference resolution - Automatic handling of $ref, @href, *anchors
  • Beats V7 performance - 24-67% faster on medium/large files!
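
To make the "canonical hashing" idea concrete, here is a minimal standard-library sketch. It is not xwdata's implementation; the function name `canonical_hash` and the choice of SHA-256 over sorted-key JSON are assumptions made for illustration. The point is that two documents with the same content hash identically regardless of key order or the format they were loaded from.

```python
import hashlib
import json


def canonical_hash(data) -> str:
    """Hash a JSON-compatible value independently of dict key order.

    Hypothetical helper: serializes to a canonical form (sorted keys,
    no insignificant whitespace) and hashes the UTF-8 bytes.
    """
    canonical = json.dumps(
        data, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different key order: the hashes match.
a = {"name": "Alice", "age": 30}
b = {"age": 30, "name": "Alice"}
same = canonical_hash(a) == canonical_hash(b)
```

List order is still significant in this sketch, since reordering a list changes the meaning of most documents.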


Quick Start

Installation

# Lite (Default) - Core Only
pip install exonware-xwdata

# Lazy (Recommended for Development) - Auto-install on demand
# (quotes keep shells such as zsh from expanding the brackets)
pip install "exonware-xwdata[lazy]"

# Full (Recommended for Production) - All dependencies pre-installed
pip install "exonware-xwdata[full]"

Basic Usage

import asyncio

from exonware.xwdata import XWData

# === Synchronous Creation ===
# From native Python data
data = XWData({'name': 'Alice', 'age': 30, 'city': 'NYC'})

# Reading values is async, so run the coroutine from synchronous code
name = asyncio.run(data.get('name'))  # 'Alice'

# === Async Operations ===
async def main():
    # Load from file (any format!)
    data = await XWData.load('config.json')
    
    # Navigate and modify (copy-on-write!)
    data = await data.set('api.timeout', 30)
    data = await data.set('api.retries', 3)
    
    # Save to different format
    await data.save('config.yaml')  # JSON → YAML conversion!
    await data.save('config.xml')   # → XML too!

asyncio.run(main())

🌟 Key Features

1. Format-Agnostic Operations

# Load from any format (snippets with await must run inside an async function)
data = await XWData.load('config.json')    # JSON
data = await XWData.load('config.yaml')    # YAML
data = await XWData.load('config.xml')     # XML
data = await XWData.load('config.toml')    # TOML

# Save to any format
await data.save('output.json')   # → JSON
await data.save('output.yaml')   # → YAML
await data.save('output.xml')    # → XML

Supported Formats:

  • Text: JSON, YAML, XML, TOML, CSV, INI
  • Extended: JSON5 (with comments), JSONL (streaming)
  • Binary: BSON, MessagePack, Pickle (via xwsystem)
  • Schema-based: Avro, Protobuf, Parquet (via xwsystem)

2. XWNode Integration - Powerful Navigation

# Create from nested data
data = XWData({
    'users': [
        {'name': 'Alice', 'age': 30},
        {'name': 'Bob', 'age': 25}
    ]
})

# Navigate with paths
alice_age = await data.get('users.0.age')  # 30

# Check existence
has_email = await data.exists('users.0.email')  # False

# Copy-on-write mutations
data = await data.set('users.0.city', 'NYC')
data = await data.delete('users.1')
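
Dotted paths such as 'users.0.age' can be resolved by walking the structure one segment at a time. The sketch below shows that idea on plain dicts and lists; `get_path` and `path_exists` are hypothetical helpers, not XWNode functions, and the real path syntax may support more than this.

```python
_MISSING = object()  # sentinel so that stored None values still count as existing


def get_path(data, path, default=None):
    """Follow a dotted path through nested dicts and lists."""
    current = data
    for part in path.split("."):
        if isinstance(current, list):
            try:
                current = current[int(part)]
            except (ValueError, IndexError):
                return default
        elif isinstance(current, dict):
            if part not in current:
                return default
            current = current[part]
        else:
            # Reached a scalar but the path continues
            return default
    return current


def path_exists(data, path):
    """Return True if every segment of the path can be resolved."""
    return get_path(data, path, _MISSING) is not _MISSING


doc = {"users": [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]}
alice_age = get_path(doc, "users.0.age")
has_email = path_exists(doc, "users.0.email")
```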

3. Copy-on-Write Semantics - Safe Concurrency

# Original data
data1 = XWData({'counter': 0})

# Modify creates new instance
data2 = await data1.set('counter', 1)
data3 = await data1.set('counter', 2)

# Original unchanged
assert await data1.get('counter') == 0
assert await data2.get('counter') == 1
assert await data3.get('counter') == 2
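
One common way to get this behavior is path copying: only the containers along the modified path are copied, and everything else is shared between the old and new versions. The following is a sketch of that technique on plain Python containers, assuming nothing about xwdata's internals; `set_path` is a hypothetical name.

```python
def set_path(data, path, value):
    """Return a new structure with `value` at `path`.

    Only the dicts/lists on the path are shallow-copied; untouched
    branches are shared with the original, which is never mutated.
    """
    head, _, rest = path.partition(".")
    if isinstance(data, list):
        index = int(head)
        clone = list(data)
        clone[index] = set_path(data[index], rest, value) if rest else value
        return clone
    # Dicts are copied; anything else on the path is replaced by a new dict
    clone = dict(data) if isinstance(data, dict) else {}
    child = clone.get(head, {})
    clone[head] = set_path(child, rest, value) if rest else value
    return clone


original = {"api": {"timeout": 10}, "users": [{"name": "Alice"}]}
updated = set_path(original, "api.timeout", 30)
```

Because unchanged branches are shared rather than deep-copied, a single update costs time proportional to the path depth, not to the document size.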

4. Multi-Source Merging

# Merge multiple sources intelligently
data = XWData([
    {'base': 'config'},           # Base dict
    'overrides.yaml',             # Load and merge file
    existing_xwdata_instance,     # Merge another XWData
    {'final': 'override'}         # Final overrides
], merge_strategy='deep')
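
A "deep" merge strategy usually means that nested dicts are merged key by key while any other value from a later source replaces the earlier one. The sketch below implements that reading with plain dicts; it is an assumption about what `merge_strategy='deep'` does, and `deep_merge` and `merge_all` are hypothetical names.

```python
from functools import reduce


def deep_merge(base, override):
    """Merge `override` into `base` recursively without mutating either."""
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = deep_merge(base[key], value) if key in base else value
        return merged
    # Scalars, lists, and mismatched types: the later source wins
    return override


def merge_all(sources):
    """Fold a sequence of dicts left to right; later sources take priority."""
    return reduce(deep_merge, sources, {})


merged = merge_all([
    {"api": {"timeout": 10, "retries": 1}},
    {"api": {"timeout": 30}},
    {"final": "override"},
])
```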

5. Async-First Design

import asyncio

# All I/O operations are async
async def process_configs():
    # Load multiple files concurrently
    config1, config2 = await asyncio.gather(
        XWData.load('config1.json'),
        XWData.load('config2.yaml'),
    )
    
    # Merge them
    merged = await config1.merge(config2)
    
    # Transform
    transformed = await merged.transform(lambda d: {
        k.upper(): v for k, v in d.items()
    })
    
    # Save results
    await transformed.save('result.json')

🏗️ Architecture

Engine Pattern (Inspired by xwquery)

XWData (facade) → XWDataEngine (orchestrator) → Services
                         ↓
                  XWSerializer (xwsystem - reuse!)
                         ↓
                  Format Strategies (metadata & references)
                         ↓
                  XWNode (xwnode - navigation)

Components:

  • XWData - User-facing facade with fluent API
  • XWDataEngine - Core orchestrator (the brain)
  • XWSerializer - Format I/O from xwsystem (reused, not duplicated)
  • FormatStrategies - Lightweight format-specific logic (50 lines each)
  • XWDataNode - Extends XWNode with COW and metadata
  • Services - Metadata, References, Caching, Monitoring

No Handler Duplication: xwdata doesn't reimplement serialization - it orchestrates xwsystem's battle-tested serializers and adds data manipulation features on top!


📚 Advanced Features

Universal Metadata - Perfect Roundtrips

# Preserves format-specific semantics
data = await XWData.load('schema.json')  # Has $ref, @id
await data.save('schema.xml')            # Converts to @href, preserves meaning
await data.save('schema.json')           # Perfect roundtrip!

Reference Resolution

from exonware.xwdata import XWData, XWDataConfig, ReferenceConfig

# Configure reference resolution
config = XWDataConfig.default()
config.reference = ReferenceConfig.eager()  # Resolve immediately

# Load file with $ref, @href, *anchor references
data = await XWData.load('schema.json', config=config)

# References automatically detected and resolved!
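
For readers unfamiliar with `$ref`, here is a minimal sketch of eager resolution for local references of the form "#/a/b". It is not the library's resolver: `resolve_refs` is a hypothetical function, it ignores JSON Pointer escape sequences, remote documents, and `@href` or `*anchor` styles, and it would recurse forever on cyclic references.

```python
def resolve_refs(node, root=None):
    """Return a copy of `node` with local '#/...' $ref objects inlined."""
    root = node if root is None else root
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            # Walk the pointer from the document root
            target = root
            for part in ref[2:].split("/"):
                target = target[int(part)] if isinstance(target, list) else target[part]
            return resolve_refs(target, root)
        return {key: resolve_refs(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, root) for item in node]
    return node


schema = {
    "definitions": {"name": {"type": "string"}},
    "properties": {"first": {"$ref": "#/definitions/name"}},
}
resolved = resolve_refs(schema)
```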

Performance Caching

config = XWDataConfig.fast()  # Enable all caching

# First load - cache miss
data1 = await XWData.load('large.json', config=config)

# Second load - cache hit (instant!)
data2 = await XWData.load('large.json', config=config)
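
A load cache of this kind can be sketched by keying parsed results on the file's path, modification time, and size, so that an edited file is parsed again. This is an illustration with the standard library only; `load_cached` and `_cache` are hypothetical, the cache never evicts entries, and xwdata's actual caching strategy may differ.

```python
import json
import os

_cache = {}  # (absolute path, mtime in ns, size) -> parsed data


def load_cached(path):
    """Parse a JSON file once per file version and reuse the result."""
    stat = os.stat(path)
    key = (os.path.abspath(path), stat.st_mtime_ns, stat.st_size)
    if key not in _cache:
        with open(path, encoding="utf-8") as handle:
            _cache[key] = json.load(handle)
    return _cache[key]
```

The cached object is shared between callers, which is only safe if nobody mutates it in place; that is one reason caching pairs well with copy-on-write updates.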

Streaming Large Files

# Stream large JSONL files
async for chunk in XWData.stream_load('huge_data.jsonl'):
    process(chunk)  # process() stands in for your own handler
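
JSONL suits streaming because every line is a complete JSON document, so records can be parsed one at a time without holding the whole file in memory. The generator below is a synchronous standard-library sketch of that idea; `iter_jsonl` is a hypothetical name and says nothing about how `stream_load` chunks its output.

```python
import io
import json


def iter_jsonl(lines):
    """Yield one parsed record per non-empty line of a JSONL source."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Any iterable of lines works, including an open file object.
sample = io.StringIO('{"id": 1}\n\n{"id": 2}\n')
records = list(iter_jsonl(sample))
```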

🎓 Configuration

Presets

from exonware.xwdata import XWDataConfig

# Default balanced configuration
config = XWDataConfig.default()

# High security for untrusted data
config = XWDataConfig.strict()

# High performance for speed
config = XWDataConfig.fast()

# Development mode with debugging
config = XWDataConfig.development()

Custom Configuration

from exonware.xwdata import (
    XWDataConfig, SecurityConfig, PerformanceConfig, 
    ReferenceConfig, MetadataConfig, COWConfig
)

config = XWDataConfig(
    security=SecurityConfig(max_file_size_mb=50),
    performance=PerformanceConfig.fast(),
    reference=ReferenceConfig.lazy(),
    metadata=MetadataConfig.full(),
    cow=COWConfig.immutable()
)
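
The preset-plus-override pattern used above can be sketched with a frozen dataclass: class methods return ready-made configurations, and `dataclasses.replace` derives a customized copy without mutating the preset. The class `LoaderSettings`, its fields, and all default values below are hypothetical and are not taken from XWDataConfig.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LoaderSettings:
    """Hypothetical immutable settings object with named presets."""
    max_file_size_mb: int = 100
    enable_cache: bool = False
    resolve_references: bool = True

    @classmethod
    def strict(cls):
        # Tighter limits for untrusted input
        return cls(max_file_size_mb=10, resolve_references=False)

    @classmethod
    def fast(cls):
        # Favor speed by turning caching on
        return cls(enable_cache=True)


# Start from a preset, then override a single field
custom = replace(LoaderSettings.strict(), max_file_size_mb=50)
```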

🔧 Development

# Install in development mode
pip install -e .

# Run tests
python tests/runner.py

# Run specific test layers
python tests/runner.py --core          # Fast core tests
python tests/runner.py --unit          # Unit tests
python tests/runner.py --integration   # Integration tests

# Run verification
python tests/verify_installation.py

🚀 Project Phases

Current Phase: 🧪 Version 0 - Experimental Stage

  • Focus: Engine architecture, async operations, xwsystem integration
  • Status: 🟢 ACTIVE - Foundation complete with engine pattern

Development Roadmap:

  • Version 1 (Q1 2026): Production Ready - Enterprise deployment
  • Version 2 (Q2 2026): Mars Standard Draft - Cross-platform interoperability
  • Version 3 (Q3 2026): RUST Core & Facades - High-performance multi-language
  • Version 4 (Q4 2026): Mars Standard Implementation - Full compliance

📖 View Complete Project Phases


🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Run the test suite
  6. Submit a pull request

📄 License

MIT License - see LICENSE file for details.


🔗 eXonware Ecosystem

xwdata integrates seamlessly with:

  • xwsystem - Core utilities, serialization (24+ formats), security
  • xwnode - Node structures (57 strategies), graph operations
  • xwquery - Query languages (35+ languages) - Coming soon!
  • xwschema - Schema validation - Coming soon!

Built with ❤️ by eXonware.com - Making data manipulation effortless

