DataRaum Context Engine

A rich metadata context engine for AI-driven data analytics.

Traditional semantic layers tell BI tools "what things are called." DataRaum tells AI "what the data means, how it behaves, how it relates, and what you can compute from it."

The core insight: AI agents don't need tools to discover metadata at runtime. They need rich, pre-computed context delivered in a format optimized for LLM consumption.

Quick Start — MCP Server

The most common way to use DataRaum is as an MCP server inside Claude Desktop (or any MCP-compatible client).

# Install
pip install dataraum

# Or with uv
uv pip install dataraum

Add the server to your Claude Desktop config (claude_desktop_config.json):

{
  "mcpServers": {
    "dataraum": {
      "command": "dataraum-mcp"
    }
  }
}
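
If the config file already lists other MCP servers, the entry can be merged in rather than pasted over the file. The following is a minimal sketch using only the Python standard library; the helper name add_dataraum_server is hypothetical, and the location of claude_desktop_config.json depends on your platform, so the path is passed in by the caller.

```python
import json
from pathlib import Path


def add_dataraum_server(config_path: Path) -> dict:
    """Merge the dataraum MCP entry into a Claude Desktop config file.

    Hypothetical helper for illustration; not part of the dataraum package.
    """
    # Load the existing config, if any, so other servers are preserved.
    text = config_path.read_text(encoding="utf-8") if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}

    # Same entry as in the JSON snippet above.
    servers = config.setdefault("mcpServers", {})
    servers["dataraum"] = {"command": "dataraum-mcp"}

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it with the path to your own claude_desktop_config.json, then restart Claude Desktop so the new server is picked up.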

Then, in Claude Desktop, enter a prompt such as:

Add the CSV files in /path/to/my/data and measure data quality

The server runs a 17-phase analysis pipeline and makes these tools available:

Tool           Description
begin_session  Start an investigation session with a contract
add_source     Register a data source (CSV, Parquet, JSON, or directory)
look           Explore data structure, relationships, and semantic metadata
measure        Measure entropy scores, readiness, and data quality
query          Natural language query against the data
run_sql        Execute SQL directly with export support
end_session    Archive workspace and end the session

Typical Workflow

add_source(name="accounting", path="/path/to/data")
  → begin_session(intent="explore data quality", contract="exploratory_analysis")
  → look()                    # Understand the data
  → measure()                 # Check quality scores and readiness
  → query("total revenue?")   # Ask questions
  → run_sql(sql="...", export_format="csv", export_name="report")
  → end_session(outcome="delivered")
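
For readers wiring up their own MCP client, the workflow above can be written out as the JSON-RPC 2.0 tools/call requests that the MCP protocol uses to invoke tools. This is a sketch under assumptions: the helper names are hypothetical, the argument key "question" for query is a guess (the workflow shows that value positionally), and the elided SQL string is left as "...".

```python
import json


def tool_call(request_id: int, tool: str, arguments: dict) -> dict:
    """Build one MCP `tools/call` request (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


def typical_workflow(data_path: str) -> list[dict]:
    """Return the workflow steps, in order, as tools/call requests."""
    steps = [
        ("add_source", {"name": "accounting", "path": data_path}),
        ("begin_session", {"intent": "explore data quality",
                           "contract": "exploratory_analysis"}),
        ("look", {}),
        ("measure", {}),
        # "question" is an assumed argument name, not confirmed by the README.
        ("query", {"question": "total revenue?"}),
        # The SQL text is elided in the README and is kept elided here.
        ("run_sql", {"sql": "...", "export_format": "csv",
                     "export_name": "report"}),
        ("end_session", {"outcome": "delivered"}),
    ]
    return [tool_call(i, tool, args) for i, (tool, args) in enumerate(steps, start=1)]


if __name__ == "__main__":
    for request in typical_workflow("/path/to/data"):
        print(json.dumps(request))
```

Inside Claude Desktop none of this is needed, since the client issues these calls for you; the sketch only makes the ordering and arguments explicit.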

Quick Start — CLI

# Run analysis pipeline (writes metadata.db + data.duckdb to ./pipeline_output)
dataraum run /path/to/data

# Inspect what was produced
dataraum dev context ./pipeline_output

See CLI Reference for all options.
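
In a script, it can help to confirm that a run produced both files before inspecting them. A minimal sketch, assuming only the two file names mentioned in the comment above (metadata.db and data.duckdb); the helper name missing_outputs is hypothetical and not part of the dataraum CLI.

```python
from pathlib import Path

# File names taken from the CLI comment above.
EXPECTED_FILES = ("metadata.db", "data.duckdb")


def missing_outputs(output_dir: Path) -> list[str]:
    """Return the expected pipeline files that are absent from output_dir."""
    return [name for name in EXPECTED_FILES if not (output_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs(Path("./pipeline_output"))
    if missing:
        print("Missing pipeline outputs:", ", ".join(missing))
    else:
        print("Pipeline outputs found.")
```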

What It Produces

DataRaum analyzes your data and generates:

  • Statistical metadata — distributions, cardinality, null rates, patterns
  • Semantic metadata — column roles, entity types, business terms (LLM-powered)
  • Topological metadata — relationships, join paths, hierarchies
  • Temporal metadata — granularity, gaps, seasonality, trends
  • Quality metadata — rules, scores, anomalies
  • Entropy scores — uncertainty quantification across all dimensions
  • Ontological context — domain-specific interpretation (financial, marketing, etc.)

LLM Configuration

Semantic analysis requires an Anthropic API key:

export ANTHROPIC_API_KEY="sk-..."

Configure the LLM provider in config/llm/config.yaml. See Configuration for details.
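
When the pipeline is launched from a script, a quick pre-flight check can catch a missing key early. This is a sketch only: the function name require_anthropic_key is hypothetical, and how DataRaum itself behaves when the key is absent is not covered here.

```python
import os
from collections.abc import Mapping
from typing import Optional


def require_anthropic_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return ANTHROPIC_API_KEY from the environment, or raise if it is unset."""
    # Default to the real process environment; a mapping can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; semantic analysis requires it."
        )
    return key
```

Calling require_anthropic_key() before starting a run turns a missing key into an immediate, readable error.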

Development

git clone https://github.com/dataraum/dataraum
cd dataraum

# Install with dev dependencies (using uv)
uv sync --group dev

# Run tests
uv run pytest --testmon tests/unit -q

# Type check
uv run mypy src/

# Lint
uv run ruff check src/
uv run ruff format --check src/

License

Apache 2.0 — see LICENSE.
