Transform raw datasets into purpose-built data through descent-ascent methodology
Intuitiveness
Transform raw, complex datasets into purpose-built data that directly answers your questions.
Intuitiveness is a Python package implementing the descent-ascent methodology for dataset transformation, backed by peer-reviewed research. It helps data scientists and analysts simplify messy data, extract core insights, and rebuild enriched datasets with exactly the dimensions they need.
🎯 What is Intuitiveness?
Traditional data workflows force you to work with whatever structure you're given. Intuitiveness flips this: you define the question first, then the data follows.
The Descent-Ascent Cycle
- Descent (L4 → L0): Strip away complexity to find the core truth
  - Raw tabular data → Knowledge graph → Categories → Features → Single datum
- Ascent (L0 → L3): Rebuild with YOUR intent, adding only relevant dimensions
  - Single datum → Features → YOUR categories → YOUR relationships → Purpose-built dataset
The 5 Levels of Abstraction
| Level | Name | Description | Example |
|---|---|---|---|
| L4 | Raw Dataset | Original tabular data | students.csv with 50 columns |
| L3 | Entity Graph | Knowledge graph of relationships | Student → School → District |
| L2 | Domain Categories | Grouped by semantic domains | Urban/Rural schools |
| L1 | Feature Vector | Unified numeric representation | [85.5, 320, 0.42, ...] |
| L0 | Core Datum | Single atomic value | Average score: 85.5 |
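To make the levels concrete, here is a minimal sketch in plain Python (standard library only) that walks a toy table down from L4 to L0. It does not use the intuitiveness API. The sample rows, the column names, and the choice of the mean as the final aggregation are illustrative assumptions, and the L3 step is reduced to simple student-to-school pairs rather than a real knowledge graph.

```python
from statistics import mean

# L4: raw tabular data (hypothetical sample rows)
l4_rows = [
    {"student": "A", "school": "North", "area": "Urban", "score": 90.0},
    {"student": "B", "school": "North", "area": "Urban", "score": 80.0},
    {"student": "C", "school": "South", "area": "Rural", "score": 70.0},
    {"student": "D", "school": "South", "area": "Rural", "score": 100.0},
]

# L3: entity relationships, reduced here to Student -> School pairs
l3_edges = [(row["student"], row["school"]) for row in l4_rows]

# L2: rows grouped by a semantic category (Urban/Rural)
l2_categories = {}
for row in l4_rows:
    l2_categories.setdefault(row["area"], []).append(row)

# L1: one numeric feature vector (the scores)
l1_vector = [row["score"] for row in l4_rows]

# L0: a single atomic value summarising the vector
l0_datum = mean(l1_vector)

print("L3 edges:", l3_edges)
print("L2 categories:", sorted(l2_categories))
print("L1 vector:", l1_vector)
print("L0 datum:", l0_datum)
```

Each step keeps less structure than the one before it, which is the sense in which descent "strips away complexity".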
⚡ Quick Start
Installation
# Core package (descent/ascent operations)
pip install intuitiveness
# With quality assessment (TabPFN-based)
pip install intuitiveness[quality]
# With Neo4j knowledge graphs
pip install intuitiveness[neo4j]
# With data discovery (data.gouv.fr search)
pip install intuitiveness[discovery]
# Full features (includes Streamlit app)
pip install intuitiveness[all]
Basic Usage
Quality Assessment (TabPFN)
import pandas as pd
from intuitiveness import assess_quality
# Load your dataset
df = pd.read_csv("data.csv")
# Quick quality assessment
report = assess_quality(df, target_column="label")
print(f"Usability Score: {report.usability_score:.2f}")
print(f"Data Completeness: {report.completeness:.2f}")
print(f"Feature Diversity: {report.diversity:.2f}")
# Get improvement suggestions
for suggestion in report.suggestions:
    print(f"- {suggestion.type}: {suggestion.description}")
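The report fields above are computed by the package, and their exact definitions are not given here. As a rough intuition for what a completeness score can measure, the sketch below computes the fraction of non-missing cells in a small table using only the standard library. The function name `completeness_ratio`, the treatment of empty strings as missing, and the sample data are assumptions, not the package's implementation.

```python
import csv
import io


def completeness_ratio(rows, missing=("", None)):
    """Return the share of cells that are not missing (0.0 to 1.0)."""
    total = 0
    filled = 0
    for row in rows:
        for value in row.values():
            total += 1
            if value not in missing:
                filled += 1
    return filled / total if total else 0.0


# Hypothetical CSV with three empty cells out of twelve
sample_csv = "age,income,label\n34,52000,1\n,48000,0\n29,,1\n41,61000,\n"

rows = list(csv.DictReader(io.StringIO(sample_csv)))
ratio = completeness_ratio(rows)
print(f"Completeness: {ratio:.2f}")
```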
Descent-Ascent Cycle
from intuitiveness import (
    Level4Dataset,
    ComplexityLevel,
    descend,
    ascend,
)
import pandas as pd
# Start with raw data
df = pd.read_csv("schools_data.csv")
l4 = Level4Dataset({"raw": df})
# Descend to core truth
l0 = descend(l4, ComplexityLevel.LEVEL_0,
             operation="mean",
             target_column="score")
print(f"Core datum: {l0.data}") # {'average_score': 85.5}
# Ascend with YOUR dimensions
l3 = ascend(l0, ComplexityLevel.LEVEL_3,
            enrichments=["region", "school_type", "funding_level"])
# Export purpose-built dataset
l3.export("purpose_built_schools.csv")
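How `ascend` rebuilds dimensions internally is not described in this section. One way to picture the idea is to start from the single aggregate and break it back down along only the dimensions you care about. The sketch below does this with the standard library. The helper `regroup_mean`, the column names, and the sample rows are hypothetical and do not reflect the package's API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical source rows
rows = [
    {"region": "North", "school_type": "public", "score": 80.0},
    {"region": "North", "school_type": "private", "score": 90.0},
    {"region": "South", "school_type": "public", "score": 70.0},
    {"region": "South", "school_type": "public", "score": 100.0},
]


def regroup_mean(rows, value_column, dimensions):
    """Average value_column for each combination of the chosen dimensions."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        groups[key].append(row[value_column])
    return {key: mean(values) for key, values in groups.items()}


# The core datum: one number for the whole table
core = mean(row["score"] for row in rows)

# Re-expand along one dimension, then along two
by_region = regroup_mean(rows, "score", ["region"])
by_region_and_type = regroup_mean(rows, "score", ["region", "school_type"])

print("Core datum:", core)
print("By region:", by_region)
print("By region and school type:", by_region_and_type)
```

Adding a dimension only when the question needs it keeps the rebuilt dataset as small as the analysis allows.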
Feature Checking
from intuitiveness import (
    QUALITY_AVAILABLE,
    DISCOVERY_AVAILABLE,
    NEO4J_AVAILABLE,
)
print(f"Quality Assessment: {'✓' if QUALITY_AVAILABLE else '✗ (pip install intuitiveness[quality])'}")
print(f"Data Discovery: {'✓' if DISCOVERY_AVAILABLE else '✗ (pip install intuitiveness[discovery])'}")
print(f"Neo4j Graphs: {'✓' if NEO4J_AVAILABLE else '✗ (pip install intuitiveness[neo4j])'}")
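Availability flags like these are commonly derived by probing for the optional dependency at import time. The sketch below shows that general pattern with `importlib.util.find_spec`. Whether intuitiveness computes its flags this way is an assumption, as are the helper name `is_installed` and the module names probed (`tabpfn`, `requests`, `neo4j`, inferred from the extras listed on this page).

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return find_spec(module_name) is not None


# Map each extra to the module assumed to back it (hypothetical mapping)
flags = {
    "quality": is_installed("tabpfn"),
    "discovery": is_installed("requests"),
    "neo4j": is_installed("neo4j"),
}

for extra, available in flags.items():
    hint = "" if available else f' (pip install "intuitiveness[{extra}]")'
    print(f"{extra}: {'available' if available else 'missing'}{hint}")
```

Probing with `find_spec` avoids paying the import cost of a heavy dependency just to learn whether it is present.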
🚀 Features
Core Functionality
- ✅ 5-level complexity system (L0-L4) for dataset abstraction
- ✅ Descent operations to simplify datasets and extract core insights
- ✅ Ascent operations to rebuild datasets with custom dimensions
- ✅ Navigation system to explore and track transformation paths
Quality Assessment (with [quality] extra)
- 📊 TabPFN-based scoring for dataset usability (0-100 scale)
- 🔍 Feature profiling with importance rankings
- 💡 Automated suggestions for data improvements
- 🎯 Anomaly detection using density estimation
- 🧪 Synthetic data generation with validation benchmarks
Data Discovery (with [discovery] extra)
- 🇫🇷 Natural language search for French open data (data.gouv.fr)
- 🤖 SmolLM3-powered queries in plain French
- 📥 Direct CSV downloads with caching
Knowledge Graphs (with [neo4j] extra)
- 🕸️ Neo4j integration for entity relationship storage
- 🔗 Graph-based navigation through dataset transformations
- 🧠 Semantic matching using sentence embeddings
Streamlit App (with [app] extra)
- 🖥️ Interactive web interface for visual workflows
- 📈 Real-time quality visualizations with Plotly
- 🎨 Export tools for CSV, JSON, and Python snippets
📦 Installation Options
| Install Command | Includes | Use Case |
|---|---|---|
| `pip install intuitiveness` | Core package | Basic descent/ascent operations |
| `pip install intuitiveness[quality]` | + TabPFN, SHAP | Data quality assessment |
| `pip install intuitiveness[neo4j]` | + Neo4j driver | Knowledge graph storage |
| `pip install intuitiveness[embeddings]` | + sentence-transformers | Semantic matching |
| `pip install intuitiveness[discovery]` | + requests | Data.gouv.fr search |
| `pip install intuitiveness[app]` | + Streamlit, Plotly | Full web application |
| `pip install intuitiveness[all]` | Everything | Complete feature set |
| `pip install intuitiveness[dev]` | + pytest, ruff, mypy | Development tools |
📚 Documentation
- GitHub Repository: ArthurSrz/intuitiveness
- Research Paper: Intuitiveness as the Next Stage of Open Data
- Scientific Article: See the scientific_article/ directory for the peer-reviewed methodology
Prerequisites for Full Features
Neo4j Database (optional)
docker run -d --name neo4j -p 7474:7474 -p 7687:7687 \
-e NEO4J_AUTH=neo4j/password \
-e NEO4J_PLUGINS='["apoc"]' \
neo4j:latest
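Once the container is running, it can be useful to confirm that the Bolt port published by the command above (7687) accepts connections before pointing the package at it. The following is a minimal standard-library sketch. The helper name `port_open` is hypothetical, and a successful TCP connection only shows that something is listening, not that authentication will succeed.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors
        return False


# 7687 is the Bolt port mapped by the docker run command
print("Bolt port reachable:", port_open("localhost", 7687))
```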
HuggingFace Token for NL Queries (optional)
Set the HF_TOKEN environment variable:
export HF_TOKEN="your_token_here"
Or add it to .streamlit/secrets.toml:
HF_TOKEN = "your_token_here"
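The order in which the package consults these two locations is not stated here. A minimal sketch of one reasonable order (environment variable first, then the Streamlit secrets file) could look like this. The function name `load_hf_token` is hypothetical, and `tomllib` is only available in the standard library from Python 3.11.

```python
import os
from pathlib import Path

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    tomllib = None


def load_hf_token(secrets_path=".streamlit/secrets.toml"):
    """Return the token from the environment, else from the secrets file, else None."""
    token = os.environ.get("HF_TOKEN")
    if token:
        return token
    path = Path(secrets_path)
    if tomllib is not None and path.is_file():
        with path.open("rb") as fh:
            return tomllib.load(fh).get("HF_TOKEN")
    return None


# Report only whether a token was found, never the token itself
print("HF token configured:", load_hf_token() is not None)
```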
🧪 Example Use Cases
1. Data Scientist: Quick Quality Check
import pandas as pd
from intuitiveness import apply_all_suggestions, assess_quality
df = pd.read_csv("messy_data.csv")
report = assess_quality(df, target_column="target")
if report.usability_score < 60:
    print("⚠️ Low quality dataset - applying suggestions...")
    improved_df = apply_all_suggestions(df, report.suggestions)
    # Re-assess against the same target column so the scores are comparable
    new_report = assess_quality(improved_df, target_column="target")
    print(f"✅ Score after suggestions: {new_report.usability_score:.0f}")
2. Analyst: Finding Core Insights
import pandas as pd
from intuitiveness import Level4Dataset, descend, ComplexityLevel
# Strip away complexity
l4 = Level4Dataset({"raw": pd.read_csv("sales_2024.csv")})
l0 = descend(l4, ComplexityLevel.LEVEL_0,
             operation="sum",
             target_column="revenue")
print(f"Total Revenue: ${l0.data['total_revenue']:,.2f}")
3. Researcher: Building Custom Datasets
from intuitiveness import Level0Dataset, ascend, ComplexityLevel
# Start from core truth
l0 = Level0Dataset({"gdp_growth": 2.3})
# Add relevant dimensions for YOUR research question
l3 = ascend(l0, ComplexityLevel.LEVEL_3,
            enrichments=["country", "sector", "quarter"])
# Export for modeling
l3.export("research_dataset.csv")
🏆 Acknowledgments
Part of the Dataflow research project.
Funded by:
- Datactivist
- UNESCO Chair in AI and Data Science for Society
Designed by: Arthur Sarazin & Mathis Mourey
📄 License
MIT License - see LICENSE file for details.
Copyright (c) 2024-2025 Arthur Sarazin & Mathis Mourey
📖 Citation
If you use Intuitiveness in your research, please cite:
@software{intuitiveness2024,
author = {Sarazin, Arthur and Mourey, Mathis},
title = {Intuitiveness: Purpose-Built Dataset Transformation},
year = {2024},
publisher = {GitHub},
url = {https://github.com/ArthurSrz/intuitiveness},
doi = {10.5281/zenodo.685140191}
}
Built with ❤️ for better data science