dataprep-ai

One-line, opinionated data cleaning for pandas/Polars.
Fix missing values, inconsistent categories, outliers, and duplicates with transparent logs and a reproducible report.


Installation

pip install dataprep-ai

For the optional explorer app:
pip install "dataprep-ai[app]"

Requirements

Python: 3.9 to 3.12

OS: Linux, macOS, Windows

Required libs (auto-installed): pandas, numpy, pyarrow, scikit-learn, pydantic, rich

Optional:

polars (enabled automatically where supported): Polars round-trip I/O

streamlit, matplotlib: only needed for the explorer

Quickstart

import pandas as pd
from dataprep_ai import clean, CleaningConfig

# Small frame with missing values, one extreme value per numeric column,
# inconsistent city spellings, and a repeated id.
df = pd.DataFrame({
    "age": [23, None, 25, 1000],
    "income": [52000, 58000, None, 1200000],
    "city": ["NY", "New York", "nyc", None],
    "id": [1, 2, 2, 4],
})

result = clean(df, CleaningConfig(
    id_columns=["id"],
    outlier_strategy="iqr_cap",
    categorical_normalization=True,
    drop_duplicates=False,
))

print(result.summary_markdown)       # Markdown cleaning report
df_clean = result.df                 # cleaned DataFrame
result.to_json("clean_report.json")  # save the report as JSON
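
The quickstart sets outlier_strategy="iqr_cap" without spelling out the rule behind it. As a rough illustration of what IQR-based capping conventionally means, the sketch below clips a column to [Q1 - 1.5*IQR, Q3 + 1.5*IQR] using plain pandas. The helper name iqr_cap, the 1.5 multiplier, and the sample values are assumptions made for this example; the package's internal rule may differ.

```python
import pandas as pd


def iqr_cap(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR]; missing values stay missing.

    Hypothetical helper for illustration, not the dataprep-ai implementation.
    """
    q1 = series.quantile(0.25)  # quantile() skips NaN by default
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Illustrative column: one missing entry and one extreme value.
age = pd.Series([22, 23, None, 24, 25, 26, 1000], name="age")
capped = iqr_cap(age)
print(capped.tolist())
```

With these values the bounds work out to 19.5 and 29.5, so 1000 is pulled down to 29.5 and the missing entry is left for a separate imputation step.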

Streamlit Explorer

pip install "dataprep-ai[app]"
streamlit run -m dataprep_ai.explore -- --csv your.csv

Backends

Input = pandas.DataFrame → Output = pandas.DataFrame

Input = polars.DataFrame → Output = polars.DataFrame (internally converts via pandas in v0.1)
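
The "same type in, same type out" behavior described above can be pictured as a thin wrapper around a pandas-only step. The sketch below is one possible shape for such a wrapper, not the package's actual code: run_on_any_frame and drop_exact_duplicates are hypothetical names, and polars is imported optionally so the example still runs when it is not installed.

```python
import pandas as pd

try:
    import polars as pl  # optional dependency
except ImportError:
    pl = None


def run_on_any_frame(df, pandas_step):
    """Apply a pandas-only step and return the same frame type that came in.

    Hypothetical wrapper for illustration only.
    """
    if pl is not None and isinstance(df, pl.DataFrame):
        # Polars in: convert to pandas, run the step, convert back.
        return pl.from_pandas(pandas_step(df.to_pandas()))
    # pandas in: run the step directly.
    return pandas_step(df)


def drop_exact_duplicates(pdf: pd.DataFrame) -> pd.DataFrame:
    """Stand-in cleaning step: remove fully identical rows."""
    return pdf.drop_duplicates().reset_index(drop=True)


frame = pd.DataFrame({"id": [1, 2, 2], "city": ["NY", "LA", "LA"]})
result = run_on_any_frame(frame, drop_exact_duplicates)
print(type(result).__name__, len(result))
```

Passing a polars.DataFrame built from the same data would take the conversion branch and hand back a polars.DataFrame. The round trip copies the data twice, which is the usual cost of routing Polars input through pandas.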

License

Apache-2.0
