AI toolkit for tabular data — auto EDA, data profiling, anomaly detection, and smart transformations on DataFrames.

tableai

Profile, clean, and query tabular data with one-liners — plus natural-language DataFrame analysis.

tableai is a toolkit for making sense of DataFrames fast. Profile any DataFrame and get column types, null counts, descriptive statistics, correlations, and a data-quality score. Clean it with a single call that imputes missing values, drops duplicates, and clips outliers. Detect anomalies with IQR or Isolation Forest. Get rule-based natural-language insights — or ask questions in plain English and have anyllm generate the pandas code for you.

Built by Viet-Anh Nguyen at NRL.ai.

Why tableai?

One-liner API — tableai.profile(df) gives you everything in one call
Plugin architecture — Register custom profilers, cleaners, and anomaly detectors
Local-first — All core features work without any cloud or LLM call
Minimal core deps — pandas and numpy; sklearn and anyllm are optional
Production-ready — Structured dataclass results, JSON export, reproducible

Installation

pip install tableai

For optional features:

pip install tableai[sklearn]   # Isolation Forest + KMeans clustering
pip install tableai[llm]       # NL querying via anyllm
pip install tableai[all]       # everything

Python 3.8+ supported (tested on 3.8, 3.9, 3.10, 3.11, 3.12, 3.13)

Quick Start

import tableai
import pandas as pd

df = pd.read_csv("sales.csv")

# 1. Profile the DataFrame (dtypes, nulls, stats, correlations, quality score)
report = tableai.profile(df)
print(report.quality_score)              # 0.0 - 1.0
print(report.nulls)                      # per-column null counts
print(report.correlations.head())        # top correlated pairs

# 2. Clean the DataFrame (impute, dedupe, clip outliers)
clean = tableai.clean(df, impute=True, dedupe=True, clip_outliers=True)

# 3. Detect anomalies (IQR by default, Isolation Forest if sklearn installed)
anomalies = tableai.anomalies(df, method="iqr")
print(f"{len(anomalies)} anomalous rows")

# 4. Rule-based insights
for insight in tableai.insights(df):
    print("-", insight)

# 5. Natural-language querying (requires tableai[llm] + anyllm)
result = tableai.ask(df, "what is the average revenue by region?")
print(result)

Models & Methods

Profiling

Dtype detection — numeric / categorical / datetime / text / boolean / ID
Null analysis — per-column null counts, percentages, and null patterns
Descriptive statistics — mean, std, min, 25/50/75 percentiles, max, skew, kurtosis
Cardinality — unique counts and top-K value frequencies
Correlation matrix — Pearson for numerics, Cramer's V for categoricals
Duplicate detection — exact and near-duplicate row counts
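
As a rough illustration of what such a profile covers, the sketch below computes a few of these quantities with plain pandas. It is not tableai's implementation; the helper names `basic_profile` and `top_correlations` are hypothetical and only cover nulls, cardinality, and Pearson correlations.

```python
import numpy as np
import pandas as pd


def basic_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, null count, null percentage, and unique count."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "nulls": df.isna().sum(),
            "null_pct": df.isna().mean() * 100,
            "unique": df.nunique(dropna=True),
        }
    )


def top_correlations(df: pd.DataFrame, k: int = 5) -> pd.Series:
    """Top-k numeric column pairs ranked by absolute Pearson correlation."""
    corr = df.select_dtypes(include="number").corr(method="pearson")
    # Keep only the strict upper triangle so each pair appears once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(k)


# Illustrative data, not taken from the project.
df = pd.DataFrame(
    {
        "price": [10.0, 20.0, 30.0, 40.0],
        "quantity": [1, 2, 3, 4],
        "region": ["north", "south", None, "north"],
    }
)
profile = basic_profile(df)
pairs = top_correlations(df)
print(profile)
print(pairs)
```

Cramer's V and near-duplicate detection are left out of this sketch to keep it short.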

Cleaning

Configurable pipeline applied in order:

Drop constant columns — zero variance
Impute — median for numerics, mode for categoricals (configurable)
Deduplicate — drop exact-duplicate rows
Clip outliers — IQR method ([Q1 - 1.5*IQR, Q3 + 1.5*IQR])
Type coercion — auto-convert date-like strings to datetime
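
The order of the first four steps can be sketched in plain pandas as follows. This is an assumption-laden illustration, not tableai's code: `clean_sketch` is a hypothetical name, the imputation defaults are hard-coded, and the type-coercion step is omitted.

```python
import pandas as pd


def clean_sketch(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()

    # 1. Drop constant columns (at most one distinct non-null value).
    out = out.loc[:, out.nunique(dropna=True) > 1]

    # 2. Impute: median for numeric columns, mode for everything else.
    for col in out.columns:
        if out[col].isna().any():
            if pd.api.types.is_numeric_dtype(out[col]):
                out[col] = out[col].fillna(out[col].median())
            else:
                out[col] = out[col].fillna(out[col].mode().iloc[0])

    # 3. Drop exact-duplicate rows.
    out = out.drop_duplicates().reset_index(drop=True)

    # 4. Clip numeric columns to [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    for col in out.select_dtypes(include="number").columns:
        q1, q3 = out[col].quantile(0.25), out[col].quantile(0.75)
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

    return out


# Illustrative data: one null per column, one duplicate row, one extreme value.
df = pd.DataFrame(
    {
        "amount": [10.0, 12.0, None, 11.0, 13.0, 1000.0, 10.0],
        "region": ["a", "b", "a", None, "b", "a", "a"],
        "const": [1] * 7,
    }
)
cleaned = clean_sketch(df)
print(cleaned)
```

Running the steps in this order means the clipping fences are computed on imputed, deduplicated data, so a different order can give different bounds.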

Anomaly detection

Method	Algorithm	Notes
`iqr` (default)	1.5 x IQR per numeric column	Zero deps
`zscore`	Absolute z-score `|z|` per numeric column	-
`isolation_forest`	sklearn `IsolationForest`	Needs `tableai[sklearn]`
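
To make the two dependency-free methods concrete, here is a minimal pandas sketch of row-level flagging. The function names are hypothetical, and the z-score cutoff of 3.0 is an assumption rather than a value taken from tableai.

```python
import pandas as pd


def iqr_outlier_mask(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """True for rows where any numeric column falls outside the IQR fences."""
    num = df.select_dtypes(include="number")
    q1, q3 = num.quantile(0.25), num.quantile(0.75)
    iqr = q3 - q1
    below = num.lt(q1 - k * iqr, axis="columns")
    above = num.gt(q3 + k * iqr, axis="columns")
    return (below | above).any(axis=1)


def zscore_outlier_mask(df: pd.DataFrame, threshold: float = 3.0) -> pd.Series:
    """True for rows where any numeric column has |z| above the threshold."""
    num = df.select_dtypes(include="number")
    z = (num - num.mean()) / num.std(ddof=0)
    return (z.abs() > threshold).any(axis=1)


# Illustrative data: 19 ordinary values and one extreme value in the last row.
df = pd.DataFrame({"value": [10, 11, 12] * 6 + [11, 100]})
iqr_mask = iqr_outlier_mask(df)
z_mask = zscore_outlier_mask(df)
print(df[iqr_mask])
print(df[z_mask])
```

On very small samples the z-score of a single extreme point is bounded, so a fixed cutoff such as 3.0 may never trigger; the IQR fences do not have that limitation.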

Data quality score

Weighted average (0.0 - 1.0) of four sub-scores:

Completeness — 1 - null_ratio
Uniqueness — ratio of distinct rows
Consistency — fraction of columns with a dominant dtype
Validity — fraction of values inside expected ranges / formats

Insights (rule-based NL)

Pattern-driven natural-language observations, for example:

"Column 'age' has 23.4% missing values"
"'price' and 'quantity' are strongly positively correlated (r=0.87)"
"Column 'id' appears to be a unique identifier"
"12 rows are exact duplicates"

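Rules of this kind can be sketched as simple threshold checks that emit formatted strings. The function below is a hypothetical illustration with assumed thresholds (5% missing, |r| of 0.8 or more); it is not tableai's rule set.

```python
import pandas as pd


def insights_sketch(df: pd.DataFrame,
                    null_threshold: float = 0.05,
                    corr_threshold: float = 0.8) -> list:
    messages = []

    # Rule 1: columns with a notable share of missing values.
    for col, ratio in df.isna().mean().items():
        if ratio > null_threshold:
            messages.append(f"Column '{col}' has {ratio:.1%} missing values")

    # Rule 2: strongly correlated numeric pairs (each pair reported once).
    corr = df.select_dtypes(include="number").corr()
    cols = list(corr.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if abs(r) >= corr_threshold:
                direction = "positively" if r > 0 else "negatively"
                messages.append(
                    f"'{a}' and '{b}' are strongly {direction} correlated (r={r:.2f})"
                )

    # Rule 3: exact duplicate rows.
    n_dupes = int(df.duplicated().sum())
    if n_dupes:
        messages.append(f"{n_dupes} rows are exact duplicates")

    return messages


# Illustrative data, not taken from the project.
df = pd.DataFrame(
    {
        "price": [1, 2, 3, 4, 4],
        "quantity": [2, 4, 6, 8, 8],
        "age": [30, None, 25, 40, 40],
    }
)
messages = insights_sketch(df)
for message in messages:
    print("-", message)
```
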
Natural-language querying (optional)

tableai.ask(df, "…") uses anyllm to generate pandas code for your question, executes it in a sandboxed namespace, and returns the result. Works with any local or cloud LLM that anyllm supports.
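
The generate-then-execute flow can be sketched as below. The LLM call is replaced by a hypothetical stub (`fake_generate_code`) because anyllm's API is not shown here, and `run_generated` is an assumed helper name. Stripping builtins from the namespace limits what the snippet can reach, but it is not a real security boundary.

```python
import pandas as pd


def fake_generate_code(question: str) -> str:
    """Stand-in for the LLM call; always returns the same pandas snippet."""
    return "result = df.groupby('region')['revenue'].mean()"


def run_generated(code: str, df: pd.DataFrame):
    """Execute generated code with only `df` and `pd` visible; return `result`."""
    namespace = {"__builtins__": {}, "df": df.copy(), "pd": pd}
    exec(code, namespace)
    return namespace.get("result")


# Illustrative data, not taken from the project.
df = pd.DataFrame(
    {
        "region": ["north", "south", "north"],
        "revenue": [100.0, 50.0, 300.0],
    }
)
code = fake_generate_code("what is the average revenue by region?")
answer = run_generated(code, df)
print(answer)
```

Passing a copy of the DataFrame keeps generated code from mutating the caller's data.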

Models & Methods

tableai uses pure pandas/numpy for core operations — no ML dependencies required.

Profiling (tableai.profile) — Computes per-column:

Dtype detection (numeric, categorical, datetime, string)
Null counts and percentages
Unique value counts
Numeric statistics: mean, median, std, min, max, quartiles, skewness, kurtosis
Top categorical values
Pearson correlation matrix between numeric columns

Cleaning (tableai.clean) — Configurable strategies:

Missing values: median (numeric), mode (categorical), drop, or zero
Duplicate removal
Outlier handling: IQR-based clipping or removal

Anomaly Detection (tableai.anomalies):

IQR method (default, no deps) — flags points outside Q1-1.5·IQR / Q3+1.5·IQR
Isolation Forest (optional via the [sklearn] extra, requires scikit-learn)

Quality Scoring (tableai.quality_score) — Weighted score 0.0 - 1.0:

Completeness 35% (1 - null_ratio)
Validity 25% (IQR-based outlier ratio)
Uniqueness 20% (duplicate detection)
Consistency 20% (mixed-type detection)
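
A weighted combination with these four weights might look like the sketch below, which returns a value on a 0.0 - 1.0 scale. Only the weights come from the list above; how each sub-score is measured here is an assumption, and `quality_score_sketch` is a hypothetical name.

```python
import pandas as pd

WEIGHTS = {
    "completeness": 0.35,
    "validity": 0.25,
    "uniqueness": 0.20,
    "consistency": 0.20,
}


def quality_score_sketch(df: pd.DataFrame) -> float:
    """Weighted quality score for a non-empty DataFrame."""
    # Completeness: share of non-null cells.
    completeness = 1.0 - df.isna().mean().mean()

    # Validity: share of numeric values inside the 1.5*IQR fences.
    num = df.select_dtypes(include="number")
    if num.empty:
        validity = 1.0
    else:
        q1, q3 = num.quantile(0.25), num.quantile(0.75)
        iqr = q3 - q1
        outside = num.lt(q1 - 1.5 * iqr, axis="columns") | num.gt(
            q3 + 1.5 * iqr, axis="columns"
        )
        validity = 1.0 - outside.sum().sum() / max(num.notna().sum().sum(), 1)

    # Uniqueness: share of rows that are not exact duplicates.
    uniqueness = 1.0 - df.duplicated().mean()

    # Consistency: share of columns whose non-null values share one Python type.
    consistent = [df[c].dropna().map(type).nunique() <= 1 for c in df.columns]
    consistency = sum(consistent) / len(consistent)

    parts = {
        "completeness": completeness,
        "validity": validity,
        "uniqueness": uniqueness,
        "consistency": consistency,
    }
    return float(sum(WEIGHTS[name] * parts[name] for name in WEIGHTS))


# Illustrative data: one clean frame and one with a single missing value.
clean_df = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["w", "x", "y", "z"]})
dirty_df = pd.DataFrame({"a": [1, 2, None, 4], "b": ["w", "x", "y", "z"]})
clean_score = quality_score_sketch(clean_df)
dirty_score = quality_score_sketch(dirty_df)
print(clean_score, dirty_score)
```

Because the weights sum to 1, a frame with no nulls, outliers, duplicates, or mixed types scores 1.0.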

Insights (tableai.insights) — Rule-based natural language insights about missing values, correlations, skewness, cardinality, duplicates, and class imbalance.

Natural Language Querying (tableai.ask, tableai.query) — Optional via the [llm] extra. Uses anyllm to generate pandas code from natural language. Falls back to keyword matching when no LLM is available.

API Reference

Function	Purpose
`tableai.profile(df)`	Returns `ProfileReport` dataclass
`tableai.clean(df, **opts)`	Returns a cleaned DataFrame
`tableai.anomalies(df, method="iqr")`	Returns rows flagged as anomalous
`tableai.quality_score(df)`	Returns float 0.0 - 1.0
`tableai.insights(df)`	Returns `list[str]` of NL insights
`tableai.ask(df, question, model=None)`	NL query via LLM
`tableai.compare(df1, df2)`	Diff two DataFrames (schema + data)
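
The schema half of a DataFrame diff can be sketched with plain pandas as below. The helper name `schema_diff` and the shape of its return value are assumptions for illustration and do not describe what `tableai.compare` returns.

```python
import pandas as pd


def schema_diff(left: pd.DataFrame, right: pd.DataFrame) -> dict:
    """Columns removed, added, or with a changed dtype going from left to right."""
    left_cols, right_cols = set(left.columns), set(right.columns)
    shared = sorted(left_cols & right_cols)
    return {
        "removed": sorted(left_cols - right_cols),
        "added": sorted(right_cols - left_cols),
        "dtype_changed": {
            col: (str(left[col].dtype), str(right[col].dtype))
            for col in shared
            if left[col].dtype != right[col].dtype
        },
    }


# Illustrative data, not taken from the project.
old = pd.DataFrame({"id": [1, 2], "price": [1.0, 2.0], "note": ["a", "b"]})
new = pd.DataFrame({"id": ["1", "2"], "price": [1.0, 2.5], "region": ["n", "s"]})
diff = schema_diff(old, new)
print(diff)
```

For the data half, pandas' own `DataFrame.compare` reports cell-level differences, but it requires both frames to have identical row and column labels.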

CLI Usage

tableai profile data.csv --out report.json
tableai clean data.csv --out clean.csv
tableai anomalies data.csv --method isolation_forest
tableai ask data.csv "average sales by region"
tableai quality data.csv

Examples

Full profiling report to JSON

import tableai
import pandas as pd

df = pd.read_csv("customers.csv")
report = tableai.profile(df)
report.to_json("customers_report.json")
print(f"Quality: {report.quality_score:.2f}")
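
For readers curious how a dataclass result can be exported to JSON, here is a minimal standard-library sketch. `ReportSketch` and its fields are hypothetical and much smaller than a real profiling report; it only illustrates the dataclass-to-JSON pattern.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class ReportSketch:
    n_rows: int
    quality_score: float
    nulls: Dict[str, int] = field(default_factory=dict)

    def to_json(self, path: Optional[str] = None) -> str:
        """Serialize to JSON text and optionally write it to `path`."""
        text = json.dumps(asdict(self), indent=2, sort_keys=True)
        if path is not None:
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(text)
        return text


# Illustrative values, not real profiling output.
report = ReportSketch(n_rows=3, quality_score=0.92, nulls={"age": 1})
payload = report.to_json()
print(payload)
```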

Custom cleaning pipeline

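import pandas as pd
import tableai

df = pd.read_csv("customers.csv")  # placeholder file, as in the previous example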

clean = tableai.clean(
    df,
    impute_numeric="median",
    impute_categorical="mode",
    dedupe=True,
    clip_outliers=True,
    drop_constant=True,
)

Ask questions in English (with Ollama)

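import pandas as pd
import tableai

df = pd.read_csv("customers.csv")  # placeholder file, as in the earlier examples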

# Uses anyllm; defaults to Ollama if running locally
answer = tableai.ask(df, "which customer spent the most last quarter?",
                     model="llama3.1:8b")
print(answer)

License

MIT (c) Viet-Anh Nguyen

Release history

0.2.4 (Apr 9, 2026)
0.2.3 (Apr 9, 2026), this version
0.2.2 (Apr 9, 2026)
0.2.1 (Apr 9, 2026)
0.2.0 (Apr 9, 2026)
0.0.1 (May 4, 2023)

Download files

Source Distribution

tableai-0.2.3.tar.gz (35.4 kB view details)

Uploaded Apr 9, 2026 Source

Built Distribution

tableai-0.2.3-py3-none-any.whl (29.7 kB view details)

Uploaded Apr 9, 2026 Python 3

File details

Details for the file tableai-0.2.3.tar.gz.

File metadata

Download URL: tableai-0.2.3.tar.gz
Upload date: Apr 9, 2026
Size: 35.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for tableai-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`6803a699a0837c83039b18a5e1cff5d9a0c7e7ea1adc3e8737b19c9284b01c2f`
MD5	`bfe8b327b127f28f557aea966eb3ba42`
BLAKE2b-256	`668659d1ca4cee8db4b7963d1bf9f4bf835f6347a5f319e049d9c25aa6c88b41`

File details

Details for the file tableai-0.2.3-py3-none-any.whl.

File metadata

Download URL: tableai-0.2.3-py3-none-any.whl
Upload date: Apr 9, 2026
Size: 29.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for tableai-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`64e11480d10fdb3e56505ba84a6a9e9d246c278cb384b4d437c9bc036cf30be5`
MD5	`396ccf8fc13b0ef9f577846b9771ee09`
BLAKE2b-256	`770602b723c4357c9c7a36b0f5b1eda7286a6ca0c11c1ea49ea3b4071952c926`
