
DataWeave interpreter with a Rust-native engine and Python bridge


DataWeave-Py

A DataWeave data transformation runtime with a Rust-native engine package and a Python bridge, providing powerful data transformation capabilities without requiring the JVM.

Install from PyPI:

uv add dataweave-py

or

pip install dataweave-py

DataWeave Playground

For the best DataWeave Playground experience, without payload size limits, visit: https://dataweavelang.org

Optional extras:

# pandas helpers (DataFrame/Series input normalization)
pip install "dataweave-py[pandas]"

# pydantic helpers
pip install "dataweave-py[pydantic]"

# everything
pip install "dataweave-py[full]"

Overview

DataWeave-Py (dwpy) is a Python-facing interpreter for the DataWeave language, originally developed by MuleSoft for data transformation in the Mule runtime. The runtime is migrating to a Rust core while preserving the existing Python API, enabling:

  • Data transformation: Convert between JSON, XML, CSV and other formats
  • Functional programming: Leverage map, filter, reduce, and other functional operators
  • Pattern matching: Use powerful match expressions with guards and bindings
  • Safe navigation: Handle null values gracefully with null-safe operators
  • Rich built-ins: Access 100+ built-in functions for strings, numbers, dates, arrays, and objects

Requirements

  • Python 3.10 or higher
  • Rust stable toolchain with cargo (needed only when building from source; the published wheels ship the compiled extension)
  • Dependencies managed via uv (recommended) or pip

Rust Engine And Python Bridge

The default runtime path is the Rust engine exposed through the Python package as dwpy._dwpy_rust. The legacy Python interpreter is still available as an explicit fallback backend.

Build and install the Rust-backed Python bridge into the local virtual environment:

uv venv --python 3.12
source .venv/bin/activate
UV_CACHE_DIR=.uv-cache uv run maturin develop --release

Run the Rust backend from Python:

from dwpy import DataWeaveRuntime

runtime = DataWeaveRuntime(backend="rust")
result = runtime.execute(
    "%dw 2.0\noutput application/json\n---\n{message: upper(payload.message)}",
    {"message": "hello from rust"},
)
print(result)

Backend selection:

  • DataWeaveRuntime() or backend="auto" uses the Rust bridge first and falls back to the legacy Python backend only for explicitly unsupported migration gaps.
  • DataWeaveRuntime(backend="rust") runs strict Rust mode and fails instead of falling back.
  • DataWeaveRuntime(backend="python") uses the legacy Python interpreter.
  • DWPY_BACKEND=rust forces strict Rust mode for process-wide test runs.
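The selection rules above can be modeled in a few lines of plain Python. This is a hypothetical sketch, not dwpy's code: `run_with_backend`, `UnsupportedFeature`, and the choice to let `DWPY_BACKEND=rust` upgrade only `"auto"` (rather than also overriding an explicit `backend="python"`) are assumptions made for illustration. Only the documented outcomes come from the list: auto falls back on migration gaps, rust is strict, and python uses the legacy interpreter.

```python
import os
from typing import Callable, Mapping


class UnsupportedFeature(Exception):
    """Hypothetical stand-in for a 'not yet migrated' error raised by the Rust engine."""


def run_with_backend(
    backend: str,
    run_rust: Callable[[], object],
    run_python: Callable[[], object],
    env: Mapping[str, str] = os.environ,
) -> object:
    # Assumption: DWPY_BACKEND=rust upgrades "auto" to strict Rust mode.
    if backend == "auto" and env.get("DWPY_BACKEND") == "rust":
        backend = "rust"
    if backend == "python":
        return run_python()  # legacy interpreter only
    if backend == "rust":
        return run_rust()  # strict: any error propagates, no fallback
    if backend == "auto":
        try:
            return run_rust()
        except UnsupportedFeature:
            return run_python()  # fall back only for known migration gaps
    raise ValueError(f"unknown backend: {backend!r}")
```

The point of the strict path is that a test run under `backend="rust"` surfaces every migration gap as a failure instead of silently passing through the legacy interpreter.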

Build a distributable wheel with the Rust extension:

UV_CACHE_DIR=.uv-cache uv run maturin build --release

Run the Rust workspace tests:

cargo test --workspace

Run the Python suite against the Rust backend:

DWPY_BACKEND=rust UV_CACHE_DIR=.uv-cache uv run --extra dev pytest

Run the Python suite with the default backend selection (no DWPY_BACKEND override), which exercises the Python bridge:

UV_CACHE_DIR=.uv-cache uv run --extra dev pytest

Quick Start

Basic Usage

from dwpy import DataWeaveRuntime

# Create a runtime instance
runtime = DataWeaveRuntime()

# Define a DataWeave script
script = """%dw 2.0
output application/json
---
{
  message: "Hello, " ++ upper(payload.name),
  timestamp: now()
}
"""

# Execute with a payload
payload = {"name": "world"}
result = runtime.execute(script, payload)

print(result)
# Output: {'message': 'Hello, WORLD', 'timestamp': '2025-11-03T...Z'}

Data Transformation Example

from dwpy import DataWeaveRuntime

runtime = DataWeaveRuntime()

# Transform and enrich order data
script = """%dw 2.0
output application/json
---
{
  orderId: payload.id,
  status: upper(payload.status default "pending"),
  total: payload.items reduce ((item, acc = 0) -> 
    acc + (item.price * (item.quantity default 1))
  ),
  itemCount: sizeOf(payload.items)
}
"""

payload = {
    "id": "ORD-123",
    "status": "confirmed",
    "items": [
        {"price": 29.99, "quantity": 2},
        {"price": 15.50, "quantity": 1}
    ]
}

result = runtime.execute(script, payload)
print(result)
# Output: {'orderId': 'ORD-123', 'status': 'CONFIRMED', 'total': 75.48, 'itemCount': 2}
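The `reduce` accumulation can be cross-checked in plain Python, where `functools.reduce` with an initial value plays the role of `acc = 0`. This sketch uses `decimal.Decimal` to avoid binary floating-point rounding; whether dwpy itself uses decimal arithmetic for this expression is not stated, so the exact formatting of `total` in dwpy's output is an assumption. `line_total` is a hypothetical helper.

```python
from decimal import Decimal
from functools import reduce

payload = {
    "id": "ORD-123",
    "status": "confirmed",
    "items": [
        {"price": Decimal("29.99"), "quantity": 2},
        {"price": Decimal("15.50"), "quantity": 1},
    ],
}


def line_total(item: dict) -> Decimal:
    quantity = item.get("quantity")
    # `item.quantity default 1`: only a null/missing quantity falls back to 1.
    return item["price"] * (1 if quantity is None else quantity)


# Equivalent of `payload.items reduce ((item, acc = 0) -> acc + ...)`.
total = reduce(lambda acc, item: acc + line_total(item), payload["items"], Decimal(0))
item_count = len(payload["items"])  # sizeOf(payload.items)
print(total, item_count)  # 75.48 2
```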

Using Variables

from dwpy import DataWeaveRuntime

runtime = DataWeaveRuntime()

script = """%dw 2.0
output application/json
var requestTime = vars.requestTime default now()
---
{
  user: payload.userId,
  processedAt: requestTime
}
"""

payload = {"userId": "U-456"}
vars = {"requestTime": "2024-05-05T12:00:00Z"}

result = runtime.execute(script, payload, vars=vars)
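The `default` operator returns its right-hand side when the left-hand side is null, which covers a field that is missing from `vars`. A minimal Python sketch of that rule follows, assuming a missing key maps to `None`. The names `dw_default`, `utc_now_iso`, and `processed_at` are hypothetical, and the fallback is wrapped in a callable here so the clock is only read when it is actually needed.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Mapping


def dw_default(value: Any, fallback: Callable[[], Any]) -> Any:
    """Return value unless it is None (null); only then evaluate the fallback."""
    return fallback() if value is None else value


def utc_now_iso() -> str:
    # Rough stand-in for now(), formatted as an ISO-8601 UTC timestamp.
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def processed_at(variables: Mapping[str, Any]) -> Any:
    # Mirrors `var requestTime = vars.requestTime default now()`.
    return dw_default(variables.get("requestTime"), utc_now_iso)
```

Note that falsy values such as `0` or `""` are not null, so they are kept rather than replaced by the fallback.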

Pattern Matching

from dwpy import DataWeaveRuntime

runtime = DataWeaveRuntime()

script = """%dw 2.0
output application/json
---
{
  category: payload.price match {
    case var p when p > 100 -> "premium",
    case var p when p > 50 -> "standard",
    else -> "budget"
  }
}
"""

result = runtime.execute(script, {"price": 75})
# Output: {'category': 'standard'}

String Interpolation

from dwpy import DataWeaveRuntime

runtime = DataWeaveRuntime()

# Simple interpolation
script = """%dw 2.0
output application/json
---
{
  greeting: "Hello $(payload.name)!",
  total: "Total: $(payload.price * payload.quantity)",
  status: "Order $(payload.orderId) is $(upper(payload.status))"
}
"""

payload = {
    "name": "Alice",
    "price": 10.5,
    "quantity": 3,
    "orderId": "ORD-123",
    "status": "confirmed"
}

result = runtime.execute(script, payload)
# Output: {
#   'greeting': 'Hello Alice!',
#   'total': 'Total: 31.5',
#   'status': 'Order ORD-123 is CONFIRMED'
# }

String interpolation allows you to embed expressions directly within strings using the $(expression) syntax. The expression can be:

  • Property access: $(payload.name)
  • Nested properties: $(payload.user.email)
  • Expressions: $(payload.price * 1.1)
  • Function calls: $(upper(payload.status))
  • Any valid DataWeave expression
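For readers coming from Python, DataWeave's `$(expression)` placeholders behave much like f-string `{expression}` fields. The sketch below reproduces the example's expected output with plain f-strings. Number-to-string formatting is an assumption: Python's float formatting may differ from dwpy's for values that are not exactly representable in binary (for example `0.1 * 3`).

```python
payload = {
    "name": "Alice",
    "price": 10.5,
    "quantity": 3,
    "orderId": "ORD-123",
    "status": "confirmed",
}

result = {
    "greeting": f"Hello {payload['name']}!",
    "total": f"Total: {payload['price'] * payload['quantity']}",
    "status": f"Order {payload['orderId']} is {payload['status'].upper()}",
}
print(result)
```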

Output Formats

The runtime supports these output directives:

  • application/python (native Python objects)
  • application/json
  • application/csv
  • application/xml
  • text/plain
  • text/markdown

Format-specific notes:

  • output text/plain only works when the final script result is a string.
  • output text/markdown expects a tabular value (list or dict) and renders a Markdown table.
  • output text/markdown header=false is rejected because Markdown table rendering requires headers.
  • payload_format="text/markdown" parses Markdown pipe tables into structured rows (Array<Object> by default, or Array<Array<String>> with payload_format_options={"header": False}).
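The Markdown behavior described above can be approximated with a small pipe-table parser and renderer. This is a hypothetical sketch, not dwpy's implementation: `parse_pipe_table` and `render_pipe_table` are illustrative names, it assumes that with `header=False` the header row is returned as an ordinary data row, it keeps every cell as a string, and it does not handle escaped `\|` pipes or column alignment.

```python
from typing import Any


def _split_row(line: str) -> list[str]:
    # Strip the outer pipes, then split on the inner ones.
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def _is_separator(cells: list[str]) -> bool:
    # A delimiter row looks like |---|:---:|---:|
    return all(cell and set(cell) <= set("-: ") and "-" in cell for cell in cells)


def parse_pipe_table(text: str, header: bool = True) -> list[Any]:
    rows = [_split_row(line) for line in text.splitlines() if line.strip()]
    rows = [row for row in rows if not _is_separator(row)]
    if not header:
        return rows  # Array<Array<String>>
    keys, body = rows[0], rows[1:]
    return [dict(zip(keys, row)) for row in body]  # Array<Object>


def render_pipe_table(rows: list[dict[str, Any]]) -> str:
    # A header row is always emitted, which is why header=false cannot work for output.
    keys = list(rows[0])
    lines = ["| " + " | ".join(keys) + " |", "|" + "|".join("---" for _ in keys) + "|"]
    for row in rows:
        lines.append("| " + " | ".join(str(row.get(k, "")) for k in keys) + " |")
    return "\n".join(lines)
```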

Supported Features

DataWeave-Py currently supports a wide range of DataWeave language features:

Core Language Features

  • ✅ Header directives (%dw 2.0, output, var, import)
  • ✅ Payload and variable access
  • ✅ Object and array literals
  • ✅ Field selectors (.field, ?.field, [index])
  • ✅ Comments (line // and block /* */)
  • ✅ Default values (payload.field default "fallback")
  • ✅ String interpolation ("Hello $(payload.name)")

Operators

  • ✅ Concatenation (++)
  • ✅ Difference (--)
  • ✅ Arithmetic (+, -, *, /)
  • ✅ Comparison (==, !=, >, <, >=, <=)
  • ✅ Logical (and, or, not)
  • ✅ Range (to)

Control Flow

  • ✅ Conditional expressions (if-else)
  • ✅ Pattern matching (match-case)
  • ✅ Match guards (case var x when condition)

Collection Operations

  • map - Transform elements
  • filter - Select elements
  • reduce - Aggregate values
  • flatMap - Map and flatten
  • distinctBy - Remove duplicates
  • groupBy - Group by criteria
  • orderBy - Sort elements
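Each of these operators has a close plain-Python counterpart, which can help when porting or verifying a transformation. The sketch is illustrative only: `distinct_by` and `group_by` are hypothetical helpers, and it ignores details such as the optional index argument DataWeave passes to lambdas and the fact that DataWeave's `groupBy` returns an object keyed by strings.

```python
from functools import reduce
from typing import Any, Callable, Hashable, Iterable


def distinct_by(items: Iterable[Any], key: Callable[[Any], Hashable]) -> list[Any]:
    # Keep the first element seen for each key, preserving input order.
    seen: set[Hashable] = set()
    out = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            out.append(item)
    return out


def group_by(items: Iterable[Any], key: Callable[[Any], Hashable]) -> dict[Hashable, list[Any]]:
    # Unlike itertools.groupby, this does not require pre-sorted input.
    groups: dict[Hashable, list[Any]] = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return groups


orders = [
    {"id": 1, "sku": "A", "qty": 2, "tags": ["x"]},
    {"id": 2, "sku": "B", "qty": 1, "tags": ["y", "z"]},
    {"id": 3, "sku": "A", "qty": 5, "tags": []},
]

qtys = [o["qty"] for o in orders]                              # map
big = [o for o in orders if o["qty"] > 1]                      # filter
total_qty = reduce(lambda acc, o: acc + o["qty"], orders, 0)   # reduce
tags = [t for o in orders for t in o["tags"]]                  # flatMap
unique_skus = distinct_by(orders, lambda o: o["sku"])          # distinctBy
by_sku = group_by(orders, lambda o: o["sku"])                  # groupBy
by_qty = sorted(orders, key=lambda o: o["qty"])                # orderBy (ascending, stable)
```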

Built-in Functions

String Functions

upper, lower, trim, contains, startsWith, endsWith, isBlank, splitBy, joinBy, find, match, matches

Numeric Functions

abs, ceil, floor, round, pow, mod, sum, avg, max, min, random, randomInt, isDecimal, isInteger, isEven, isOdd

Array/Object Functions

sizeOf, isEmpty, flatten, indexOf, lastIndexOf, distinctBy, filterObject, keysOf, valuesOf, entriesOf, pluck, maxBy, minBy

Date Functions

now, isLeapYear, daysBetween
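Plain-Python counterparts to `isLeapYear` and `daysBetween` can be handy when checking results. This is a sketch under the assumption that `daysBetween` counts whole calendar days from the first date to the second; the argument types dwpy accepts are not listed here, so the helper names and `datetime.date` inputs are illustrative.

```python
import calendar
from datetime import date


def is_leap_year(d: date) -> bool:
    # Gregorian rule: divisible by 4, except centuries not divisible by 400.
    return calendar.isleap(d.year)


def days_between(start: date, end: date) -> int:
    # Whole days from start to end (negative if end is earlier).
    return (end - start).days


print(is_leap_year(date(2024, 1, 1)), days_between(date(2024, 1, 1), date(2024, 3, 1)))
```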

Utility Functions

log, logInfo, logDebug, logWarn, logError

Running Tests

The project includes a pytest suite:

# Run all tests
pytest

# Run specific test file
pytest tests/test_runtime_basic.py

# Run with verbose output
pytest -v

# Run with coverage
pytest --cov=dwpy

Browser WASM (Pyodide)

The project includes a browser-worker runtime for WASM execution with Pyodide and wheel-based loading.

  • Worker bootstrap: web/pyodide-worker.mjs
  • Python entrypoint: dwpy.wasm_entry.run_dataweave(...)
  • Full instructions: docs/WASM_PYODIDE.md

Language Server (LSP)

The project includes a stdio Language Server for DataWeave:

  • Command: dwpy-lsp
  • Module: dwpy.lsp.server
  • Completion engine (shared with the Monaco + WASM completion bridge): dwpy.lsp.engine

Install

Install the LSP extra:

uv pip install "dataweave-py[lsp]"

Sidecar context files

For structure-aware payload/vars completion in .dwl files, place these JSON files next to the script:

  • <file>.payload.json
  • <file>.vars.json

Example for transform.dwl:

  • transform.dwl.payload.json
  • transform.dwl.vars.json

If sidecars are missing or invalid, the server falls back to script-only inference.
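The sidecar naming convention and fallback rule can be sketched as a small loader. This is a hypothetical illustration, not the server's code: `load_sidecar` and `load_context` are made-up names, and returning `None` stands in for "fall back to script-only inference".

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_sidecar(script: Path, kind: str) -> Optional[Any]:
    # transform.dwl -> transform.dwl.payload.json / transform.dwl.vars.json
    sidecar = script.with_name(f"{script.name}.{kind}.json")
    try:
        return json.loads(sidecar.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing file, unreadable file, or invalid JSON (JSONDecodeError is a ValueError).
        return None


def load_context(script: Path) -> dict[str, Optional[Any]]:
    return {"payload": load_sidecar(script, "payload"), "vars": load_sidecar(script, "vars")}
```

Treating each sidecar independently means a broken vars file does not discard a valid payload file.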

coc.nvim client (example, coc-settings.json)

{
  "languageserver": {
    "dataweave-py": {
      "command": "dwpy-lsp",
      "filetypes": ["dataweave", "dwl"]
    }
  }
}

Neovim client (example)

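-- Neovim 0.11+: dwpy-lsp is not bundled with nvim-lspconfig, so register it directly
vim.lsp.config("dwpy_lsp", {
  cmd = { "dwpy-lsp" },
  filetypes = { "dataweave", "dwl" },
})
vim.lsp.enable("dwpy_lsp")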

Project Structure

dataweave-py/
├── crates/                    # Rust workspace
│   ├── dwpy-core/             # Core Rust value model and engine foundation
│   ├── dwpy-python/           # PyO3 extension exposed as dwpy._dwpy_rust
│   └── dwpy-wasm/             # WASM wrapper foundation
├── dwpy/                      # Main Python package
│   ├── __init__.py           # Package exports
│   ├── parser.py             # DataWeave parser
│   ├── runtime.py            # Runtime backend facade
│   ├── _python_runtime.py    # Legacy Python interpreter backend
│   └── builtins.py           # Built-in functions
├── tests/                     # Test suite
│   ├── test_runtime_basic.py # Core functionality tests
│   ├── test_builtins.py      # Built-in function tests
│   └── fixtures/             # Test data and fixtures
├── runtime-2.11.0-20250825/  # Original JVM runtime reference
├── docs/                      # Documentation
├── pyproject.toml            # Project configuration
└── README.md                 # This file

Development

Setting Up Development Environment

# Create virtual environment
uv venv --python 3.12
source .venv/bin/activate

# Install Python development dependencies
UV_CACHE_DIR=.uv-cache uv sync --extra dev

# Build and install the Rust-backed Python bridge in editable mode
UV_CACHE_DIR=.uv-cache uv run maturin develop --release

Running the Test Suite

# Run all tests
UV_CACHE_DIR=.uv-cache uv run --extra dev pytest

# Force strict Rust backend
DWPY_BACKEND=rust UV_CACHE_DIR=.uv-cache uv run --extra dev pytest

# Run Rust workspace tests
cargo test --workspace

Code Style

The project follows standard Python conventions:

  • PEP 8 style guide
  • Type hints where appropriate
  • Comprehensive docstrings
  • Two-space indentation for consistency with Scala codebase

Comparison with JVM Runtime

DataWeave-Py aims to provide feature parity with the official JVM-based DataWeave runtime. Key differences:

| Feature | JVM Runtime | DataWeave-Py |
|---|---|---|
| Language | Scala | Rust core with Python bridge |
| Performance | High (compiled/JIT) | Native Rust engine through PyO3 |
| Startup Time | Slower (JVM warmup) | Fast native extension loading |
| Memory Usage | Higher (JVM overhead) | Lower native runtime footprint |
| Integration | Java/Mule apps | Python apps, Rust crate, future WASM wrapper |
| Module System | Full support | Rust-native support for the current suite |
| Type System | Static typing | Rust-backed inference plus Python API helpers |

Roadmap

Current Status (v0.1.0)

  • ✅ Core language parser
  • ✅ Expression evaluation
  • ✅ 60+ built-in functions
  • ✅ Pattern matching
  • ✅ Collection operators

Planned Features

  • 🔄 Full module system support
  • 🔄 Import statements
  • 🔄 Custom function definitions
  • 🔄 XML/CSV format support
  • 🔄 Streaming for large datasets
  • 🔄 Type validation
  • 🔄 Performance optimizations

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Write tests for your changes
  4. Ensure all tests pass (pytest)
  5. Commit your changes (git commit -m 'feat: add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

License

See the original DataWeave runtime license terms. This project is a reference implementation for educational and development purposes.

Resources

Support

For questions, issues, or contributions:

  • Open an issue on GitHub
  • Check existing documentation in the docs/ directory
  • Review test cases in tests/ for usage examples

Note: This is an independent Python implementation and is not officially supported by MuleSoft. For production use cases requiring full DataWeave compatibility, please use the official JVM-based runtime.



Download files


Source Distribution

dataweave_py-1.0.1.tar.gz (307.5 kB)

Uploaded Source

Built Distributions


dataweave_py-1.0.1-cp310-abi3-win_amd64.whl (1.7 MB)

Uploaded CPython 3.10+, Windows x86-64

dataweave_py-1.0.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)

Uploaded CPython 3.10+, manylinux: glibc 2.17+ x86-64

dataweave_py-1.0.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.9 MB)

Uploaded CPython 3.10+, manylinux: glibc 2.17+ ARM64

dataweave_py-1.0.1-cp310-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (3.4 MB)

Uploaded CPython 3.10+, macOS 10.12+ universal2 (ARM64, x86-64), macOS 10.12+ x86-64, macOS 11.0+ ARM64

File details

Details for the file dataweave_py-1.0.1.tar.gz.

File metadata

  • Download URL: dataweave_py-1.0.1.tar.gz
  • Upload date:
  • Size: 307.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dataweave_py-1.0.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3cd3322cc7aa22aa5aa1fa2c468757f572edd3e7e036e7d28789d654c6736317 |
| MD5 | e22b163a1c870c9a47cf76ead0a707f8 |
| BLAKE2b-256 | 4b457b0b37f8c0f361bd29c1f41359659502d31236c667b5f2741f5854a1a3d1 |


Provenance

The following attestation bundles were made for dataweave_py-1.0.1.tar.gz:

Publisher: publish.yml on estebanwasinger/dataweave-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dataweave_py-1.0.1-cp310-abi3-win_amd64.whl.

File metadata

File hashes

Hashes for dataweave_py-1.0.1-cp310-abi3-win_amd64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4df6325a790e13b1394c8598664e0f3fbbef93fa2ec97dd331141b5c84fa5b95 |
| MD5 | e61e80dddf394c3682a7c58299c8c8b1 |
| BLAKE2b-256 | 67e21519f7735590cb757ee966a8e70a2b51b187fe0a9676b29e2db593e2f774 |


Provenance

The following attestation bundles were made for dataweave_py-1.0.1-cp310-abi3-win_amd64.whl:

Publisher: publish.yml on estebanwasinger/dataweave-py


File details

Details for the file dataweave_py-1.0.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for dataweave_py-1.0.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 93636e42757e422fb304f7f4a458762ad8c58cf8a3b4cdbb22f8b0a6b4a5312e |
| MD5 | ff143d127fd18fb6ae343e53b6d30648 |
| BLAKE2b-256 | 6d048022e2951c4efc06a038734559e06b9b7715e96a22b236c8a1ee3a383690 |


Provenance

The following attestation bundles were made for dataweave_py-1.0.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on estebanwasinger/dataweave-py


File details

Details for the file dataweave_py-1.0.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for dataweave_py-1.0.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 69fd9299c705f40b4839c5730bc5b8ec21e8ce58dc8a83df0a343227bcb0600d |
| MD5 | 844a465c4173712638f06cb3c36820b7 |
| BLAKE2b-256 | 5092a972c35e16f0a1e62c3dd6328d2d2809275826c92df3ff2ca4cf93529972 |


Provenance

The following attestation bundles were made for dataweave_py-1.0.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: publish.yml on estebanwasinger/dataweave-py


File details

Details for the file dataweave_py-1.0.1-cp310-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.

File metadata

File hashes

Hashes for dataweave_py-1.0.1-cp310-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9982e4aaa41a2d3923470e9f0652cb08ace13c998e2f8a2d735ab0a1718bc476 |
| MD5 | 31379c873cac17761d079ded41b00318 |
| BLAKE2b-256 | d754627c1c3c173dc6c03d19f9b6f1b59a91d12190215fb06f18364b3ee70072 |


Provenance

The following attestation bundles were made for dataweave_py-1.0.1-cp310-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:

Publisher: publish.yml on estebanwasinger/dataweave-py

