Transform raw datasets into purpose-built data through descent-ascent methodology

Intuitiveness

Transform raw, complex datasets into purpose-built data that directly answers your questions.

Intuitiveness is a Python package implementing the descent-ascent methodology for dataset transformation, backed by peer-reviewed research. It helps data scientists and analysts simplify messy data, extract core insights, and rebuild enriched datasets with exactly the dimensions they need.


🎯 What is Intuitiveness?

Traditional data workflows force you to work with whatever structure you're given. Intuitiveness flips this: you define the question first, then the data follows.

The Descent-Ascent Cycle

  1. Descent (L4 → L0): Strip away complexity to find the core truth
    • Raw tabular data → Knowledge graph → Categories → Features → Single datum
  2. Ascent (L0 → L3): Rebuild with YOUR intent, adding only relevant dimensions
    • Single datum → Features → YOUR categories → YOUR relationships → Purpose-built dataset

The 5 Levels of Abstraction

Level | Name              | Description                      | Example
----- | ----------------- | -------------------------------- | ----------------------------
L4    | Raw Dataset       | Original tabular data            | students.csv with 50 columns
L3    | Entity Graph      | Knowledge graph of relationships | Student → School → District
L2    | Domain Categories | Grouped by semantic domains      | Urban/Rural schools
L1    | Feature Vector    | Unified numeric representation   | [85.5, 320, 0.42, ...]
L0    | Core Datum        | Single atomic value              | Average score: 85.5
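
To make the levels concrete, here is a minimal plain-pandas sketch of the same idea on a toy table. It does not use or reproduce the package's implementation: the column names, the toy values, and the choice of the mean as the core datum are all assumptions made for illustration. L3 (the entity graph) is left out because it needs a graph structure rather than a table.

```python
import pandas as pd

# L4: a toy raw table (hypothetical columns and values)
raw = pd.DataFrame(
    {
        "school": ["A", "B", "C", "D"],
        "area": ["Urban", "Urban", "Rural", "Rural"],
        "score": [80.0, 90.0, 70.0, 100.0],
        "enrollment": [300, 340, 120, 150],
    }
)

# L2: group rows by a semantic category
by_area = raw.groupby("area")

# L1: one numeric feature vector per category (mean of each numeric column)
features = by_area[["score", "enrollment"]].mean()

# L0: a single atomic value for the whole table
core_datum = raw["score"].mean()

# Ascent: re-expand the same measure along a dimension chosen for the question
rebuilt = raw.groupby("area", as_index=False)["score"].mean()

print(features)
print(core_datum)
print(rebuilt)
```

The descent discards columns until one number remains. The ascent then brings back only the dimension that matters for the question at hand, here the area type.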

⚡ Quick Start

Installation

# Core package (descent/ascent operations)
pip install intuitiveness

# Extras are quoted so that shells such as zsh do not treat the brackets as a glob pattern

# With quality assessment (TabPFN-based)
pip install "intuitiveness[quality]"

# With Neo4j knowledge graphs
pip install "intuitiveness[neo4j]"

# With data discovery (data.gouv.fr search)
pip install "intuitiveness[discovery]"

# Full features (includes Streamlit app)
pip install "intuitiveness[all]"

Basic Usage

Quality Assessment (TabPFN)

import pandas as pd
from intuitiveness import assess_quality

# Load your dataset
df = pd.read_csv("data.csv")

# Quick quality assessment
report = assess_quality(df, target_column="label")

print(f"Usability Score: {report.usability_score:.2f}")
print(f"Data Completeness: {report.completeness:.2f}")
print(f"Feature Diversity: {report.diversity:.2f}")

# Get improvement suggestions
for suggestion in report.suggestions:
    print(f"- {suggestion.type}: {suggestion.description}")
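
This README does not state the formulas behind these scores. As a rough point of comparison, one common definition of completeness is the fraction of non-missing cells, which plain pandas can compute. The sketch below implements that generic definition and may differ from what report.completeness returns; the function name is hypothetical.

```python
import pandas as pd


def fraction_non_missing(df: pd.DataFrame) -> float:
    """Return the share of cells in df that are not NaN/None (0.0 to 1.0)."""
    if df.size == 0:
        # An empty table has no cells to count; treat it as 0.0 by convention
        return 0.0
    return float(df.notna().to_numpy().mean())


# Toy table with 6 cells, 2 of them missing
toy = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", "y", None]})
print(f"{fraction_non_missing(toy):.2f}")
```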

Descent-Ascent Cycle

from intuitiveness import (
    Level4Dataset,
    ComplexityLevel,
    descend,
    ascend
)
import pandas as pd

# Start with raw data
df = pd.read_csv("schools_data.csv")
l4 = Level4Dataset({"raw": df})

# Descend to core truth
l0 = descend(l4, ComplexityLevel.LEVEL_0,
             operation="mean",
             target_column="score")
print(f"Core datum: {l0.data}")  # {'average_score': 85.5}

# Ascend with YOUR dimensions
l3 = ascend(l0, ComplexityLevel.LEVEL_3,
            enrichments=["region", "school_type", "funding_level"])

# Export purpose-built dataset
l3.export("purpose_built_schools.csv")

Feature Checking

from intuitiveness import (
    QUALITY_AVAILABLE,
    DISCOVERY_AVAILABLE,
    NEO4J_AVAILABLE
)

print(f"Quality Assessment: {'✓' if QUALITY_AVAILABLE else '✗ (pip install intuitiveness[quality])'}")
print(f"Data Discovery: {'✓' if DISCOVERY_AVAILABLE else '✗ (pip install intuitiveness[discovery])'}")
print(f"Neo4j Graphs: {'✓' if NEO4J_AVAILABLE else '✗ (pip install intuitiveness[neo4j])'}")
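
Flags like these are commonly derived by checking whether an optional dependency can be found on the import path. The sketch below shows that general pattern with the standard library. It is an assumption about how such flags could be computed, not the package's actual code, and the helper name is hypothetical.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for malformed names or broken parent packages
        return False


# Top-level module names assumed here to back the extras listed above
for name in ("tabpfn", "neo4j", "requests"):
    print(f"{name}: {'available' if is_installed(name) else 'missing'}")
```

Checking with find_spec avoids importing heavy libraries just to learn whether they are present.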

🚀 Features

Core Functionality

  • 5-level complexity system (L0-L4) for dataset abstraction
  • Descent operations to simplify datasets and extract core insights
  • Ascent operations to rebuild datasets with custom dimensions
  • Navigation system to explore and track transformation paths

Quality Assessment (with [quality] extra)

  • 📊 TabPFN-based scoring for dataset usability (0-100 scale)
  • 🔍 Feature profiling with importance rankings
  • 💡 Automated suggestions for data improvements
  • 🎯 Anomaly detection using density estimation
  • 🧪 Synthetic data generation with validation benchmarks

Data Discovery (with [discovery] extra)

  • 🇫🇷 Natural language search for French open data (data.gouv.fr)
  • 🤖 SmolLM3-powered queries in plain French
  • 📥 Direct CSV downloads with caching

Knowledge Graphs (with [neo4j] extra)

  • 🕸️ Neo4j integration for entity relationship storage
  • 🔗 Graph-based navigation through dataset transformations
  • 🧠 Semantic matching using sentence embeddings (sentence-transformers, installed via the [embeddings] extra)

Streamlit App (with [app] extra)

  • 🖥️ Interactive web interface for visual workflows
  • 📈 Real-time quality visualizations with Plotly
  • 🎨 Export tools for CSV, JSON, and Python snippets

📦 Installation Options

Install Command                         | Includes                | Use Case
--------------------------------------- | ----------------------- | -------------------------------
pip install intuitiveness               | Core package            | Basic descent/ascent operations
pip install "intuitiveness[quality]"    | + TabPFN, SHAP          | Data quality assessment
pip install "intuitiveness[neo4j]"      | + Neo4j driver          | Knowledge graph storage
pip install "intuitiveness[embeddings]" | + sentence-transformers | Semantic matching
pip install "intuitiveness[discovery]"  | + requests              | Data.gouv.fr search
pip install "intuitiveness[app]"        | + Streamlit, Plotly     | Full web application
pip install "intuitiveness[all]"        | Everything              | Complete feature set
pip install "intuitiveness[dev]"        | + pytest, ruff, mypy    | Development tools

📚 Documentation

Prerequisites for Full Features

Neo4j Database (optional)

docker run -d --name neo4j -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  -e NEO4J_PLUGINS='["apoc"]' \
  neo4j:latest
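
Before pointing the package at the database, it can help to confirm that the container is accepting connections on the Bolt port published above (7687). A minimal standard-library sketch follows. It only checks that the TCP port is open, not that the credentials are valid, and the function name is hypothetical.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts
        return False


if __name__ == "__main__":
    # 7687 is the Bolt port mapped by the docker command above
    print("Neo4j Bolt port reachable:", port_is_open("localhost", 7687))
```

The container can take a few seconds to start, so a False result right after docker run may just mean Neo4j is still booting.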

HuggingFace Token for NL Queries (optional)

Set the HF_TOKEN environment variable:

export HF_TOKEN="your_token_here"

Or add it to .streamlit/secrets.toml:

HF_TOKEN = "your_token_here"
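
How the package reads the token internally is not shown here. For scripts of your own, a minimal sketch of reading it from the environment and failing early with a clear message might look like this; the helper name is hypothetical.

```python
import os


def get_hf_token() -> str:
    """Return HF_TOKEN from the environment, or raise if it is missing or empty."""
    token = os.environ.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before running NL queries.")
    return token
```

Failing at startup is usually easier to debug than an authentication error raised later from inside a query.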

🧪 Example Use Cases

1. Data Scientist: Quick Quality Check

import pandas as pd
from intuitiveness import assess_quality, apply_all_suggestions

df = pd.read_csv("messy_data.csv")
report = assess_quality(df, target_column="target")

# Scores below 60 (on the 0-100 scale) are treated as low quality here
if report.usability_score < 60:
    print("⚠️ Low quality dataset - applying suggestions...")
    improved_df = apply_all_suggestions(df, report.suggestions)
    # Re-assess with the same target column so the two scores are comparable
    new_report = assess_quality(improved_df, target_column="target")
    print(f"✅ Score after suggestions: {new_report.usability_score:.0f}")

2. Analyst: Finding Core Insights

import pandas as pd
from intuitiveness import Level4Dataset, descend, ComplexityLevel

# Strip away complexity
l4 = Level4Dataset({"raw": pd.read_csv("sales_2024.csv")})
l0 = descend(l4, ComplexityLevel.LEVEL_0,
             operation="sum",
             target_column="revenue")
print(f"Total Revenue: ${l0.data['total_revenue']:,.2f}")

3. Researcher: Building Custom Datasets

from intuitiveness import Level0Dataset, ascend, ComplexityLevel

# Start from core truth
l0 = Level0Dataset({"gdp_growth": 2.3})

# Add relevant dimensions for YOUR research question
l3 = ascend(l0, ComplexityLevel.LEVEL_3,
            enrichments=["country", "sector", "quarter"])

# Export for modeling
l3.export("research_dataset.csv")

🏆 Acknowledgments

Part of the Dataflow research project.

Funded by:

  • Datactivist
  • UNESCO Chair in AI and Data Science for Society

Designed by: Arthur Sarazin & Mathis Mourey


📄 License

MIT License - see LICENSE file for details.

Copyright (c) 2024-2025 Arthur Sarazin & Mathis Mourey


📖 Citation

If you use Intuitiveness in your research, please cite:

@software{intuitiveness2024,
  author = {Sarazin, Arthur and Mourey, Mathis},
  title = {Intuitiveness: Purpose-Built Dataset Transformation},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/ArthurSrz/intuitiveness},
  doi = {10.5281/zenodo.685140191}
}

Built with ❤️ for better data science
