Structural profiling for any dataset. Point it at your data. It tells you what you have.

database-whisper

Auto-discovers structural patterns in datasets. Point it at a file and it reports which fields matter for disambiguation, recommends indexes, and characterizes the structure.

Zero configuration. The core package has zero dependencies and reads CSV, TSV, JSON, NDJSON, SQLite, and SQL dumps; Excel and Parquet each need one optional package.

Install

pip install database-whisper

Optional format support:

pip install openpyxl    # for Excel .xlsx
pip install pyarrow     # for Parquet

Quick start

import database_whisper as dw

report = dw.profile("your_data.csv")
print(report)

Output:

=== Structural Profile: your_data.csv ===
Records: 114,000 | Fields: 20

Structural Density: HIGH (112,871x speedup)
  This dataset has deep categorical structure.

Auto-detected Identity: track_id, track_name, duration_ms

Discriminator Ladder:
  1. track_genre          98.7% reduction  ####################  dominant

Recommended Indexes:
  CREATE INDEX idx_track_genre ON tracks (track_genre);
    -- Standalone index: 99% reduction alone.

Data Quality:
  Ambiguous neighborhoods: 16,641 / 89,741 (18.5%)
  Fully resolved by ladder: YES (100% accuracy)

Structural Fingerprint: SINGLE-AXIS
  One field dominates. Minimal disambiguation depth needed.
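
To try a recommended index, the statement can be run against a SQLite copy of the data. The sketch below uses Python's built-in `sqlite3` module with a tiny made-up `tracks` table; the rows and the query are illustrative assumptions, not part of the profiled dataset. It then asks SQLite for its query plan to see whether the index is picked up.

```python
import sqlite3

# In-memory database with a hypothetical three-row `tracks` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracks (track_id TEXT, track_name TEXT, track_genre TEXT)"
)
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?, ?)",
    [("t1", "Song A", "jazz"), ("t2", "Song B", "rock"), ("t3", "Song A", "rock")],
)

# The statement from the "Recommended Indexes" section of the report.
conn.execute("CREATE INDEX idx_track_genre ON tracks (track_genre);")

# EXPLAIN QUERY PLAN returns rows whose last column describes each step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT track_name FROM tracks WHERE track_genre = ?",
    ("rock",),
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)
```

If the plan text mentions `idx_track_genre`, lookups by genre are served from the index rather than a full table scan.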

What it does

Given any structured dataset, the algorithm:

Auto-detects which fields are identity (primary keys) and which are provenance (record IDs to exclude)
Discovers a discriminator ladder — the ordered sequence of fields that best resolves ambiguity among records sharing the same identity
Measures retrieval speedup vs flat scan and structural density
Recommends database indexes based on the discovered structure
Classifies the dataset by its structural fingerprint (SINGLE-AXIS, DEEP-PIPELINE, ALREADY-UNIQUE, LOW-STRUCTURE)
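
The ladder idea in the steps above can be illustrated with a small greedy search. This is a hedged sketch only, not the package's actual algorithm: the function names, the "surplus records" measure, and the sample rows are all assumptions made for illustration. It groups records by identity, then repeatedly adds whichever field leaves the fewest records still sharing a key.

```python
from collections import Counter

def surplus_records(records, key_fields):
    """Total records minus distinct keys: 0 means every record is unique."""
    groups = Counter(tuple(r[f] for f in key_fields) for r in records)
    return sum(size - 1 for size in groups.values())

def greedy_ladder(records, identity_fields, candidate_fields):
    """Return [(field, fraction of remaining surplus removed), ...]."""
    key = list(identity_fields)
    candidates = list(candidate_fields)
    remaining = surplus_records(records, key)
    ladder = []
    while remaining > 0 and candidates:
        # Pick the field that leaves the least ambiguity when added to the key.
        best = min(candidates, key=lambda f: surplus_records(records, key + [f]))
        after = surplus_records(records, key + [best])
        if after == remaining:
            break  # no remaining field separates anything further
        ladder.append((best, 1 - after / remaining))
        key.append(best)
        candidates.remove(best)
        remaining = after
    return ladder

# Hypothetical records: three rows share the identity "Song A".
records = [
    {"name": "Song A", "genre": "jazz", "year": 2001},
    {"name": "Song A", "genre": "rock", "year": 2001},
    {"name": "Song A", "genre": "rock", "year": 2005},
    {"name": "Song B", "genre": "pop", "year": 2010},
]
ladder = greedy_ladder(records, ["name"], ["genre", "year"])
for field, reduction in ladder:
    print(f"{field}: {reduction:.0%} reduction")
```

Each rung reports how much of the ambiguity left by the previous rungs it removes, which mirrors the "reduction" column in the report above in spirit only.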

Supported formats

Format	Extension	Dependencies
CSV / TSV	.csv, .tsv	none
JSON (array or nested)	.json	none
NDJSON	.ndjson, .jsonl	none
SQLite	.db, .sqlite	none
SQL dump	.sql	none
Excel	.xlsx	openpyxl
Parquet	.parquet	pyarrow
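
Before profiling an Excel or Parquet file, it can help to check that the optional package is installed. The helper below is a minimal standard-library sketch; `missing_dependency` and `OPTIONAL_DEPS` are hypothetical names, not part of database-whisper, and the mapping simply restates the table above.

```python
import importlib.util
from pathlib import Path

# Extension -> optional module, restating the table above.
OPTIONAL_DEPS = {".xlsx": "openpyxl", ".parquet": "pyarrow"}

def missing_dependency(path):
    """Return the module name needed for `path` if it is not installed, else None."""
    module = OPTIONAL_DEPS.get(Path(path).suffix.lower())
    if module is None:
        return None  # format needs no optional package
    # find_spec returns None when a top-level module cannot be found.
    return None if importlib.util.find_spec(module) else module

for name in ["data.csv", "data.xlsx", "data.parquet"]:
    needed = missing_dependency(name)
    print(name, "->", f"pip install {needed}" if needed else "ready")
```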

API

import database_whisper as dw

# Profile a file (auto-detects format)
report = dw.profile("data.csv")
report = dw.profile("data.db")
report = dw.profile("data.xlsx")

# Profile in-memory records
report = dw.profile_records(records, field_names=["col1", "col2", ...])

# Batch router
router = dw.Router()
router.ingest(records, identity_fields=["gene", "disease"])
result = router.query({"gene": "BRAF", "disease": "Melanoma"}, ask_field="therapy")

# Streaming / incremental
live = dw.LiveRouter(identity_fields=["gene", "disease"])
for record in stream:
    event = live.insert(record)

# Memory with sleep consolidation
mem = dw.Memory(identity_fields=["gene", "disease"])
for fact in facts:
    mem.insert(fact)
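
The shape of `records` is not spelled out above. One plausible reading, and it is only an assumption, is a list of dicts keyed by field name, which is what `csv.DictReader` yields. The sketch below builds such a list from inline CSV text using only the standard library; the sample values are placeholders, and the call to `dw.profile_records` is left as a comment because its accepted input types are not confirmed here.

```python
import csv
import io

# Placeholder data reusing the field names from the API examples above.
csv_text = """gene,disease,therapy
G1,D1,T1
G1,D1,T2
G2,D1,T3
"""

reader = csv.DictReader(io.StringIO(csv_text))
records = list(reader)           # one dict per row
field_names = reader.fieldnames  # header order is preserved

print(field_names)
print(records[0])

# Assumed usage, if list-of-dict input is accepted:
# report = dw.profile_records(records, field_names=field_names)
```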

Tested domains

The algorithm has been validated on 9 datasets across different domains. Same code, different data, different structures discovered.

Domain	Records	Speedup	Accuracy
Oncology (CIViC)	4,825	4,761x	100%
Pharma safety (FAERS)	50,000	7,462x	100%
Weather (NOAA Storm)	50,000	50,000x	100%
Astronomy (NASA Exoplanets)	6,158	6,109x	100%
Seismology (USGS Earthquakes)	20,000	20,000x	100%
Particle physics (CERN CMS)	100,000	100,000x	100%
Music (Spotify)	114,000	112,871x	100%
Astronomy (LSST PLAsTiCC)	7,848	7,848x	100%
Cosmology (LSST CosmoDC2)	50,000	3x	100%
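
One way to read the Speedup column is to invert it. Assuming (the source does not state this) that speedup means record count divided by the average number of records examined per lookup, the implied number examined can be recovered from the table. The function name `implied_examined` is hypothetical, and only a few rows are copied here.

```python
# (records, speedup) pairs copied from the table above.
rows = {
    "Oncology (CIViC)": (4_825, 4_761),
    "Pharma safety (FAERS)": (50_000, 7_462),
    "Music (Spotify)": (114_000, 112_871),
    "Cosmology (LSST CosmoDC2)": (50_000, 3),
}

def implied_examined(records, speedup):
    """Average records examined per lookup, under the assumed definition."""
    return records / speedup

for name, (n, s) in rows.items():
    print(f"{name}: about {implied_examined(n, s):.2f} records per lookup")
```

Under this reading, a speedup close to the record count means a lookup lands on roughly one record, while the 3x row would still touch a large share of the data.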

Research

Paper I: Discriminator Ladder Learning — the algorithm and 3-domain validation
Paper II: Five Consequences of One Algorithm — anomaly detection, reasoning traces, compression, federated bridging across 9 domains

Requirements

Python 3.9+. Core package has zero external dependencies.

License

MIT

This version

0.1.0

Apr 12, 2026

Download files

Source Distribution

database_whisper-0.1.0.tar.gz (19.5 kB)

Uploaded Apr 12, 2026 Source

Built Distribution

database_whisper-0.1.0-py3-none-any.whl (20.7 kB)

Uploaded Apr 12, 2026 Python 3

File details

Details for the file database_whisper-0.1.0.tar.gz.

File metadata

Download URL: database_whisper-0.1.0.tar.gz
Upload date: Apr 12, 2026
Size: 19.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for database_whisper-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`5ec6e660343cdc028ac456472f609241e5314d08eeef5a44d05d96a24629bc75`
MD5	`f82ef936de57ef6b3f8ae15a12599620`
BLAKE2b-256	`ed38e9d7a90136281a3d4fabd7e0faa653d6aa84964f72abe6b5392ef3c993dc`

File details

Details for the file database_whisper-0.1.0-py3-none-any.whl.

File metadata

Download URL: database_whisper-0.1.0-py3-none-any.whl
Upload date: Apr 12, 2026
Size: 20.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for database_whisper-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`03f99c690b5f26ca449c9610ca5020ff74a7b02177bef6d6405177d15af60b6b`
MD5	`637dde456c3be1d214e0527f0439be66`
BLAKE2b-256	`5991c10c6066b64a7e35d0ad1ed2b74b66d5146e23802fa7a6ce72c32a2b4841`
