Deterministic dataset shape and semantic inference for Invariant
Datasculpt
Deterministic dataset shape and semantic inference for tabular data.
The Problem
Before data can be governed, queried, or compared across systems, its structural intent must be understood. Most data systems (catalogs, semantic layers, governance engines) assume this understanding exists but don't produce it.
The Solution
Datasculpt infers and explains structural intent:
- Shape — Is this long or wide? Time in headers or rows?
- Grain — What uniquely identifies each row?
- Roles — Which columns are dimensions, measures, or keys?
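To make the notion of grain concrete, here is a minimal brute-force sketch that searches for the smallest set of columns whose combined values are unique in every row. It is an illustration only and is not taken from Datasculpt: the `find_grain` function, its `max_columns` parameter, and the sample rows are all hypothetical.

```python
from itertools import combinations


def find_grain(rows, max_columns=3):
    """Return the smallest column combination that is unique per row.

    Hypothetical helper for illustration; not Datasculpt's algorithm.
    """
    if not rows:
        return None
    columns = list(rows[0])
    # Try single columns first, then pairs, then triples, and so on.
    for size in range(1, max_columns + 1):
        for combo in combinations(columns, size):
            seen = {tuple(row[c] for c in combo) for row in rows}
            if len(seen) == len(rows):
                return list(combo)
    return None


# Made-up sample rows reusing the column names from the Quick Start.
rows = [
    {"geo_id": "A", "sex": "F", "age_group": "15-24", "population": 1200},
    {"geo_id": "A", "sex": "M", "age_group": "15-24", "population": 1150},
    {"geo_id": "A", "sex": "F", "age_group": "25-34", "population": 1150},
    {"geo_id": "B", "sex": "F", "age_group": "15-24", "population": 1200},
]

grain = find_grain(rows)
print(grain)
```

A search like this can mistake a measure that happens to be unique for a key, which is one reason grain and roles need to be considered together.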
What It Is Not
- Not a data catalog (produces metadata, doesn't store it)
- Not an ETL tool (analyzes structure, doesn't transform data)
- Not a semantic layer (understands layout, not meaning)
Quick Start
pip install datasculpt
from datasculpt import infer
result = infer("data.csv")
print(result.proposal.shape_hypothesis) # wide_observations
print(result.decision_record.grain.key_columns) # ['geo_id', 'sex', 'age_group']
for col in result.proposal.columns:
    print(f"{col.name}: {col.role.value}")
# geo_id: dimension
# sex: dimension
# age_group: dimension
# population: measure
# unemployed: measure
Try It
🔬 Live Demo — Analyze datasets in your browser. No installation, no data leaves your machine.
Documentation
- Quickstart — First inference in 5 minutes
- Examples — See inference on different dataset shapes
- Concepts — Understand shapes, roles, and grain
- API Reference — Function signatures and types
Key Features
Five Dataset Shapes
| Shape | Description |
|---|---|
| long_observations | Rows are atomic observations |
| long_indicators | Unpivoted indicator/value pairs |
| wide_observations | Measures as columns |
| wide_time_columns | Time periods in column headers |
| series_column | Time series as arrays in cells |
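As a rough illustration of one signal that separates `wide_time_columns` from the other shapes, the sketch below flags headers that look like four-digit years. The regular expression, the `min_share` threshold, and both helper functions are assumptions made for this example; they do not describe how Datasculpt scores shapes.

```python
import re

# Assumption: a "time-like" header is a four-digit year from 1900 to 2099.
YEAR_HEADER = re.compile(r"(19|20)\d{2}")


def time_like_headers(columns):
    """Return the headers that consist solely of a year."""
    return [c for c in columns if YEAR_HEADER.fullmatch(str(c).strip())]


def looks_like_wide_time(columns, min_share=0.5):
    """Guess whether time periods are spread across the column headers."""
    hits = time_like_headers(columns)
    # Require at least two year headers making up a large share of columns.
    return len(hits) >= 2 and len(hits) / len(columns) >= min_share


wide = ["country", "2019", "2020", "2021"]
long = ["country", "year", "gdp"]

print(looks_like_wide_time(wide))
print(looks_like_wide_time(long))
```

Real headers also carry quarters, months, and prefixed labels such as `FY2020`, so a single pattern like this is only a starting point.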
Eight Column Roles
| Role | Purpose |
|---|---|
| key | Contributes to uniqueness |
| dimension | Categorical grouping |
| measure | Numeric, aggregatable |
| time | Temporal dimension |
| indicator_name | Names in unpivoted data |
| value | Values in unpivoted data |
| series | Embedded time series |
| metadata | Descriptive, non-analytical |
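The simplest version of the dimension/measure distinction can be sketched with a type check: numeric columns are treated as measures, everything else as dimensions. This is a deliberately naive stand-in, not Datasculpt's role assignment; `guess_role` and the sample data are hypothetical, and only two of the eight roles are covered.

```python
from numbers import Real


def guess_role(values):
    """Naively label a column as 'measure' or 'dimension' by value type."""
    if not values:
        return "dimension"
    # bool is a subclass of int, so exclude it explicitly.
    is_numeric = all(
        isinstance(v, Real) and not isinstance(v, bool) for v in values
    )
    return "measure" if is_numeric else "dimension"


# Made-up columns reusing the names from the Quick Start output.
columns = {
    "geo_id": ["A", "A", "B"],
    "sex": ["F", "M", "F"],
    "age_group": ["15-24", "15-24", "25-34"],
    "population": [1200, 1150, 980],
    "unemployed": [60, 75, 41],
}

roles = {name: guess_role(values) for name, values in columns.items()}
for name, role in roles.items():
    print(f"{name}: {role}")
```

A check like this would mislabel numeric identifiers (for example, integer region codes) as measures, which shows why type alone is not enough evidence for a role.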
Deterministic Inference
Same input → same output. No LLMs, no randomness, no hidden state.
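One way to check a determinism guarantee like this in a test suite is to fingerprint a canonical serialization of the result and compare fingerprints across runs. The sketch below uses only the standard library; `toy_infer` is a hypothetical stand-in for an inference call, and the dictionary it returns is not Datasculpt's result format.

```python
import hashlib
import json


def fingerprint(result):
    """Hash a JSON-serializable result in a key-order-independent way."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def toy_infer(columns):
    """Hypothetical stand-in for a deterministic inference function."""
    shape = "wide_observations" if len(columns) > 3 else "long_observations"
    return {"shape": shape, "columns": list(columns)}


columns = ["geo_id", "sex", "age_group", "population"]
first = fingerprint(toy_infer(columns))
second = fingerprint(toy_infer(columns))

# Identical input should always yield an identical fingerprint.
print(first == second)
```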
Evidence-Based
Every decision is scored and justified:
>>> result.decision_record.hypotheses
[
HypothesisScore(hypothesis=WIDE_OBSERVATIONS, score=0.72, reasons=[...]),
HypothesisScore(hypothesis=LONG_OBSERVATIONS, score=0.65, reasons=[...]),
]
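Scored hypotheses like these can be ranked and checked for how close the top two are. The sketch below uses a stand-in dataclass that mirrors only the three fields visible in the output above; the `pick_winner` helper and its `min_margin` threshold are assumptions for illustration, not part of Datasculpt's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HypothesisScore:
    """Stand-in mirroring the fields shown in the repr above."""
    hypothesis: str
    score: float
    reasons: list = field(default_factory=list)


def pick_winner(scores, min_margin=0.1):
    """Return (best, margin, ambiguous) for a non-empty list of scores."""
    ranked = sorted(scores, key=lambda s: s.score, reverse=True)
    best = ranked[0]
    if len(ranked) == 1:
        return best, best.score, False
    margin = best.score - ranked[1].score
    # Hypothetical rule: a small lead over the runner-up counts as ambiguous.
    return best, margin, margin < min_margin


scores = [
    HypothesisScore("LONG_OBSERVATIONS", 0.65),
    HypothesisScore("WIDE_OBSERVATIONS", 0.72),
]

winner, margin, ambiguous = pick_winner(scores)
print(winner.hypothesis, round(margin, 2), ambiguous)
```

With the scores shown above, the lead is small, which is the kind of situation where asking the user a clarifying question is more defensible than silently picking the top hypothesis.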
Interactive Mode
Resolve ambiguity with questions:
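# Assumes apply_answers is exported from the top-level package
from datasculpt import apply_answers, infer

result = infer("data.csv", interactive=True)
if result.pending_questions:
    # Answer the first pending question, keyed by its id
    answers = {result.pending_questions[0].id: "long_indicators"}
    result = apply_answers(result, answers)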
Installation Options
# Core only
pip install datasculpt
# With optional adapters
# Quote the extras so shells such as zsh do not treat the brackets as a glob
pip install "datasculpt[frictionless]"  # Schema validation
pip install "datasculpt[dataprofiler]"  # Statistical profiling
pip install "datasculpt[all]"           # Everything
Requirements
- Python 3.11+
- pandas 2.0+
Development
# Install with dev dependencies
make install-dev
# Run tests
make test
# Lint and format
make lint
make format
# Type checking
make typecheck
# Serve docs locally
make docs-serve
License
MIT