DataRaum Context Engine

A rich metadata context engine for AI-driven data analytics.

Traditional semantic layers tell BI tools "what things are called." DataRaum tells AI "what the data means, how it behaves, how it relates, and what you can compute from it."

The core insight: AI agents don't need tools to discover metadata at runtime. They need rich, pre-computed context delivered in a format optimized for LLM consumption.

Quick Start — MCP Server

The most common way to use DataRaum is as an MCP server inside Claude Desktop (or any MCP-compatible client).

# Install
pip install dataraum

# Or with uv
uv pip install dataraum

Add to your Claude Desktop config (claude_desktop_config.json):

{
  "mcpServers": {
    "dataraum": {
      "command": "dataraum-mcp"
    }
  }
}
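If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over the file. The following is a minimal sketch of that merge using only the standard library. The helper names `add_mcp_server` and `update_config_file` are made up for illustration and are not part of DataRaum, and the location of `claude_desktop_config.json` differs by platform, so the path is left to the caller.

```python
import json
from pathlib import Path


def add_mcp_server(config: dict, name: str, command: str) -> dict:
    """Return a copy of config with one MCP server entry added under "mcpServers"."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = {"command": command}
    updated["mcpServers"] = servers
    return updated


def update_config_file(path: Path) -> None:
    """Merge the dataraum entry into the JSON config at path, creating the file if absent."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    merged = add_mcp_server(existing, "dataraum", "dataraum-mcp")
    path.write_text(json.dumps(merged, indent=2))


# In-memory example: "other" is a made-up server that must survive the merge.
merged = add_mcp_server(
    {"mcpServers": {"other": {"command": "other-mcp"}}},
    "dataraum",
    "dataraum-mcp",
)
print(json.dumps(merged, indent=2))
```

Calling `update_config_file(Path(...))` with the path to your own config file applies the same merge on disk. Restart Claude Desktop afterwards so it picks up the change.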

Then in Claude Desktop:

Add the CSV files in /path/to/my/data and measure data quality

The server runs a 17-phase analysis pipeline and makes these tools available:

Tool            Description
begin_session   Start an investigation session with a contract
add_source      Register a data source (CSV, Parquet, JSON, or directory)
look            Explore data structure, relationships, and semantic metadata
measure         Measure entropy scores, readiness, and data quality
query           Natural language query against the data
run_sql         Execute SQL directly with export support
end_session     Archive workspace and end the session

Typical Workflow

add_source(name="accounting", path="/path/to/data")
  → begin_session(intent="explore data quality", contract="exploratory_analysis")
  → look()                    # Understand the data
  → measure()                 # Check quality scores and readiness
  → query("total revenue?")   # Ask questions
  → run_sql(sql="...", export_format="csv", export_name="report")
  → end_session(outcome="delivered")
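
Under the hood, an MCP client turns each step of this workflow into a JSON-RPC `tools/call` request. The sketch below builds those request payloads with the standard library only, to show the shape of what is sent. The tool and argument names are taken from the workflow above. The `tool_call` helper is hypothetical, and the `query` and `run_sql` steps are left out because their full argument values are not spelled out here.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def tool_call(name: str, arguments: Optional[dict] = None) -> dict:
    """Build an MCP `tools/call` JSON-RPC 2.0 request for one tool invocation."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


workflow = [
    tool_call("add_source", {"name": "accounting", "path": "/path/to/data"}),
    tool_call(
        "begin_session",
        {"intent": "explore data quality", "contract": "exploratory_analysis"},
    ),
    tool_call("look"),
    tool_call("measure"),
    tool_call("end_session", {"outcome": "delivered"}),
]

for request in workflow:
    print(json.dumps(request))
```

In practice Claude Desktop (or another MCP client) issues these requests for you. The sketch is only meant to make the call order and argument structure concrete.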

Quick Start — CLI

# Run analysis pipeline (writes metadata.db + data.duckdb to ./pipeline_output)
dataraum run /path/to/data

# Inspect what was produced
dataraum dev context ./pipeline_output

See CLI Reference for all options.
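
When scripting around the CLI, it can help to confirm that a run left both output files behind before inspecting them. A minimal sketch, assuming the default `./pipeline_output` directory and the two file names mentioned above. The `missing_outputs` helper is illustrative and not part of DataRaum.

```python
from pathlib import Path

# File names as stated in the CLI comment above.
EXPECTED_OUTPUTS = ("metadata.db", "data.duckdb")


def missing_outputs(output_dir: str) -> list[str]:
    """Return the expected output files that are not present in output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]


missing = missing_outputs("./pipeline_output")
if missing:
    print("Missing pipeline outputs:", ", ".join(missing))
else:
    print("Both pipeline outputs are present.")
```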

What It Produces

DataRaum analyzes your data and generates:

  • Statistical metadata — distributions, cardinality, null rates, patterns
  • Semantic metadata — column roles, entity types, business terms (LLM-powered)
  • Topological metadata — relationships, join paths, hierarchies
  • Temporal metadata — granularity, gaps, seasonality, trends
  • Quality metadata — rules, scores, anomalies
  • Entropy scores — uncertainty quantification across all dimensions
  • Ontological context — domain-specific interpretation (financial, marketing, etc.)
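
To make the first item concrete, here is a small sketch of what statistical metadata such as null rate and cardinality means for a single column. This is a generic illustration with made-up sample data, not DataRaum's implementation. The `profile_column` helper is hypothetical, and treating empty strings as nulls is an assumption of the sketch.

```python
import csv
import io
from collections import Counter


def profile_column(values: list[str]) -> dict:
    """Compute simple statistics for one column; empty strings count as nulls."""
    non_null = [v for v in values if v != ""]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "null_rate": (len(values) - len(non_null)) / len(values) if values else 0.0,
        "cardinality": len(counts),  # number of distinct non-null values
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


# Made-up sample data, inlined so the sketch runs without any files.
sample = io.StringIO("region,amount\nEU,10\nUS,\nEU,30\n,40\n")
rows = list(csv.DictReader(sample))
for column in ("region", "amount"):
    print(column, profile_column([row[column] for row in rows]))
```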

LLM Configuration

Semantic analysis requires an Anthropic API key:

export ANTHROPIC_API_KEY="sk-..."

Configure the LLM provider in config/llm/config.yaml. See Configuration for details.
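
A wrapper script can check for the key up front instead of failing partway through a run. A minimal sketch using only the standard library. The `api_key_configured` helper is illustrative, and how DataRaum itself reports a missing key is not covered here.

```python
import os
from typing import Mapping


def api_key_configured(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if ANTHROPIC_API_KEY is present and not blank."""
    return bool(env.get("ANTHROPIC_API_KEY", "").strip())


if not api_key_configured():
    print("ANTHROPIC_API_KEY is not set; semantic analysis needs it.")
```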

Development

git clone https://github.com/dataraum/dataraum
cd dataraum

# Install with dev dependencies (using uv)
uv sync --group dev

# Run tests
uv run pytest --testmon tests/unit -q

# Type check
uv run mypy src/

# Lint
uv run ruff check src/
uv run ruff format --check src/

License

Apache 2.0 — see LICENSE.
