DataRaum Context Engine
A rich metadata context engine for AI-driven data analytics.
Traditional semantic layers tell BI tools "what things are called." DataRaum tells AI "what the data means, how it behaves, how it relates, and what you can compute from it."
The core insight: AI agents don't need tools to discover metadata at runtime. They need rich, pre-computed context delivered in a format optimized for LLM consumption.
Quick Start — MCP Server
The most common way to use DataRaum is as an MCP server inside Claude Desktop (or any MCP-compatible client).
# Install
pip install dataraum
# Or with uv
uv pip install dataraum
Add to your Claude Desktop config (claude_desktop_config.json):
{
  "mcpServers": {
    "dataraum": {
      "command": "dataraum-mcp"
    }
  }
}
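If the config file already lists other MCP servers, the entry can be merged in rather than pasted by hand. The following is a minimal sketch using only the standard library. The helper name `register_dataraum` is hypothetical and not part of DataRaum, and the location of `claude_desktop_config.json` differs per platform, so the caller passes it in.

```python
import json
from pathlib import Path


def register_dataraum(config_path: Path) -> dict:
    """Add the dataraum server entry to an MCP client config file.

    Hypothetical helper (not shipped with DataRaum). Existing entries
    under "mcpServers" are preserved.
    """
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Create the "mcpServers" object if it is missing, then set our entry.
    servers = config.setdefault("mcpServers", {})
    servers["dataraum"] = {"command": "dataraum-mcp"}

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it with the path to your own config file, for example `register_dataraum(Path("/path/to/claude_desktop_config.json"))`, and restart the client afterwards.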
Then in Claude Desktop:
Add the CSV files in /path/to/my/data and measure data quality
The server runs a 17-phase analysis pipeline and makes these tools available:
| Tool | Description |
|---|---|
| `begin_session` | Start an investigation session with a contract |
| `add_source` | Register a data source (CSV, Parquet, JSON, or directory) |
| `look` | Explore data structure, relationships, and semantic metadata |
| `measure` | Measure entropy scores, readiness, and data quality |
| `query` | Natural language query against the data |
| `run_sql` | Execute SQL directly with export support |
| `end_session` | Archive workspace and end the session |
Typical Workflow
add_source(name="accounting", path="/path/to/data")
→ begin_session(intent="explore data quality", contract="exploratory_analysis")
→ look() # Understand the data
→ measure() # Check quality scores and readiness
→ query("total revenue?") # Ask questions
→ run_sql(sql="...", export_format="csv", export_name="report")
→ end_session(outcome="delivered")
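For readers curious what an MCP client sends on the wire, the workflow above can be sketched as a sequence of JSON-RPC `tools/call` requests, the request shape defined by the Model Context Protocol. This is an illustration only: the `tool_call` helper is hypothetical, the argument names are copied from the workflow as written, and `query` is left out because its argument name is not shown there. In practice the MCP client builds these messages for you.

```python
import json
from itertools import count

_request_ids = count(1)


def tool_call(tool: str, /, **arguments: object) -> dict:
    """Build one MCP `tools/call` JSON-RPC request (hypothetical helper).

    `tool` is positional-only so that a tool argument called `name`
    (as used by add_source) does not clash with it.
    """
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Same order as the workflow above; the SQL text is elided there and stays elided here.
workflow = [
    tool_call("add_source", name="accounting", path="/path/to/data"),
    tool_call("begin_session", intent="explore data quality",
              contract="exploratory_analysis"),
    tool_call("look"),
    tool_call("measure"),
    tool_call("run_sql", sql="...", export_format="csv", export_name="report"),
    tool_call("end_session", outcome="delivered"),
]

for request in workflow:
    print(json.dumps(request))
```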
Quick Start — CLI
# Run analysis pipeline (writes metadata.db + data.duckdb to ./pipeline_output)
dataraum run /path/to/data
# Inspect what was produced
dataraum dev context ./pipeline_output
See CLI Reference for all options.
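To drive the CLI from a script, something like the following sketch may help. It assumes only what is stated above: `dataraum run <path>` writes `metadata.db` and `data.duckdb` to `./pipeline_output`. The function names are hypothetical, and no CLI flags beyond the documented command are used.

```python
import subprocess
from pathlib import Path

# File names taken from the comment in the CLI quick start above.
EXPECTED_OUTPUTS = ("metadata.db", "data.duckdb")


def build_run_command(data_dir: str) -> list[str]:
    """Return the argv for the documented `dataraum run` command."""
    return ["dataraum", "run", data_dir]


def missing_outputs(output_dir: Path) -> list[str]:
    """List expected output files that are not present in output_dir."""
    return [name for name in EXPECTED_OUTPUTS if not (output_dir / name).is_file()]


def run_pipeline(data_dir: str, output_dir: Path = Path("pipeline_output")) -> None:
    """Run the pipeline and fail loudly if the expected files are absent."""
    subprocess.run(build_run_command(data_dir), check=True)
    missing = missing_outputs(output_dir)
    if missing:
        raise FileNotFoundError(f"missing pipeline outputs in {output_dir}: {missing}")
```

`run_pipeline("/path/to/data")` raises `subprocess.CalledProcessError` if the command exits non-zero, so a failed run is not mistaken for an empty result.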
What It Produces
DataRaum analyzes your data and generates:
- Statistical metadata — distributions, cardinality, null rates, patterns
- Semantic metadata — column roles, entity types, business terms (LLM-powered)
- Topological metadata — relationships, join paths, hierarchies
- Temporal metadata — granularity, gaps, seasonality, trends
- Quality metadata — rules, scores, anomalies
- Entropy scores — uncertainty quantification across all dimensions
- Ontological context — domain-specific interpretation (financial, marketing, etc.)
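To make "statistical metadata" concrete, here is a small sketch of the kind of per-column profile the first bullet describes: null rate, cardinality, and most frequent values. It is a generic illustration in plain Python, not DataRaum's implementation, and the field names in the returned dictionary are assumptions rather than the actual metadata schema.

```python
from collections import Counter


def profile_column(values: list) -> dict:
    """Compute a few basic statistics for one column (illustrative only)."""
    total = len(values)
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "row_count": total,
        # Share of rows with no value; 0.0 for an empty column.
        "null_rate": (total - len(present)) / total if total else 0.0,
        # Number of distinct non-null values.
        "cardinality": len(counts),
        # Up to three most frequent values with their counts.
        "top_values": counts.most_common(3),
    }


currency = ["EUR", "USD", None, "EUR", "EUR", None, "GBP", "USD"]
print(profile_column(currency))
```

For the sample column this reports a null rate of 0.25 and a cardinality of 3, with `EUR` as the most frequent value.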
LLM Configuration
Semantic analysis requires an Anthropic API key:
export ANTHROPIC_API_KEY="sk-..."
Configure the LLM provider in config/llm/config.yaml. See Configuration for details.
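A script that wraps DataRaum can check for the key up front instead of failing partway through a run. This is a minimal sketch: the function name is hypothetical, and it only tests that the `ANTHROPIC_API_KEY` variable is non-empty, not that the key is valid.

```python
import os
from typing import Mapping


def anthropic_key_configured(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if ANTHROPIC_API_KEY is set to a non-blank value."""
    return bool(env.get("ANTHROPIC_API_KEY", "").strip())


if not anthropic_key_configured():
    print("ANTHROPIC_API_KEY is not set; semantic analysis requires it.")
```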
Development
git clone https://github.com/dataraum/dataraum
cd dataraum
# Install with dev dependencies (using uv)
uv sync --group dev
# Run tests
uv run pytest --testmon tests/unit -q
# Type check
uv run mypy src/
# Lint
uv run ruff check src/
uv run ruff format --check src/
Documentation
- Architecture — system design and pipeline overview
- Pipeline — 17-phase pipeline reference
- Entropy — uncertainty quantification system
- Data Model — metadata schema
- CLI Reference — command-line interface
- MCP Setup — MCP server configuration
- Configuration — config directory reference
- Contributing — development setup and patterns
License
Apache 2.0 — see LICENSE.