
One-line data cleaning for pandas/Polars with reports and a reversible patch.



dataprep-ai

One-line, opinionated data cleaning for pandas/Polars.

Fix missing values, inconsistent categories, outliers, and duplicates with transparent logs and a reproducible report.
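Each of the four fixes named above has a conventional pandas implementation. The sketch below is not dataprep-ai's code, and `clean` may behave differently. The helpers `basic_clean` and `iqr_cap`, the per-column alias tables, median imputation, and the 1.5 × IQR fence (Tukey's rule) are all assumptions made for illustration.

```python
import pandas as pd


def iqr_cap(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    s = s.astype(float)
    q1, q3 = s.quantile(0.25), s.quantile(0.75)  # NaNs are skipped
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


def basic_clean(
    df: pd.DataFrame,
    id_columns: list[str],
    aliases: dict[str, dict[str, str]],
) -> pd.DataFrame:
    """Hypothetical helper: dedupe, normalise categories, cap outliers, impute."""
    # Duplicates: keep the first row for each combination of id values.
    out = df.drop_duplicates(subset=id_columns, keep="first").copy()

    # Inconsistent categories: strip and lower-case, then map known aliases.
    # Values without an alias keep their original spelling; missing stays missing.
    for col, mapping in aliases.items():
        key = out[col].str.strip().str.lower()
        out[col] = key.map(mapping).fillna(out[col])

    # Outliers, then missing values: cap each numeric column first so the
    # median used for imputation is computed on the capped data.
    for col in out.select_dtypes(include="number").columns:
        if col in id_columns:
            continue
        capped = iqr_cap(out[col])
        out[col] = capped.fillna(capped.median())

    return out.reset_index(drop=True)
```

Called on the quickstart frame with `aliases={"city": {"ny": "New York", "nyc": "New York", "new york": "New York"}}`, this sketch drops the second `id == 2` row and maps every non-missing city spelling to `"New York"`. Tukey fences need a reasonable number of rows, though: on a frame that small they are too wide to cap either the age of 1000 or the income of 1,200,000.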

```bash
pip install dataprep-ai
```

Quickstart

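```python
import pandas as pd
from dataprep_ai import clean, CleaningConfig

# Toy data: a missing and an extreme age, a missing and an extreme income,
# three spellings of the same city plus a missing one, and a duplicated id (2).
df = pd.DataFrame({
    "age": [23, None, 25, 1000],
    "income": [52000, 58000, None, 1200000],
    "city": ["NY", "New York", "nyc", None],
    "id": [1, 2, 2, 4],
})

result = clean(df, CleaningConfig(
    id_columns=["id"],
    outlier_strategy="iqr_cap",
    categorical_normalization=True,
))

print(result.summary_markdown)       # Markdown summary of the run
df_clean = result.df                 # the cleaned DataFrame
result.to_json("clean_report.json")  # save the report as JSON
```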
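The tagline describes a reversible patch, but this README does not show how to obtain or apply it. Purely to illustrate the idea, here is a hypothetical sketch that logs every cell edit together with the value it replaced, so the edits can be undone in reverse order. `CellChange`, `Patch`, `record`, and `revert` are invented names, not part of dataprep-ai.

```python
from dataclasses import dataclass, field
from typing import Any

import pandas as pd


@dataclass
class CellChange:
    """One cell edit (hypothetical): its location plus the values before and after."""
    row: Any
    column: str
    old: Any
    new: Any


@dataclass
class Patch:
    """Hypothetical ordered log of cell edits that can be undone."""
    changes: list[CellChange] = field(default_factory=list)

    def record(self, df: pd.DataFrame, row: Any, column: str, new: Any) -> None:
        """Overwrite one cell of df in place, remembering the value it had."""
        self.changes.append(CellChange(row, column, df.at[row, column], new))
        df.at[row, column] = new

    def revert(self, df: pd.DataFrame) -> pd.DataFrame:
        """Return a copy of df with every recorded edit undone; df is untouched."""
        restored = df.copy()
        # Walk the log backwards so repeated edits to one cell unwind correctly.
        for change in reversed(self.changes):
            restored.at[change.row, change.column] = change.old
        return restored
```

Storing the replaced value with each edit is what makes the log reversible; the same list of changes could also be serialised into a report.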

Streamlit explorer

```bash
pip install "dataprep-ai[app]"
streamlit run -m dataprep_ai.explore -- --csv your.csv
```

License

Apache-2.0



Download files


Source Distribution

dataprep_ai-0.1.1.tar.gz (5.9 kB)

Built Distribution


dataprep_ai-0.1.1-py3-none-any.whl (6.5 kB)

File details

Details for the file dataprep_ai-0.1.1.tar.gz.

File metadata

  • Download URL: dataprep_ai-0.1.1.tar.gz
  • Upload date:
  • Size: 5.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dataprep_ai-0.1.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8828439720093e0923b4c44bcf1a26b43fe4204b7f61c356bef990b1788788d6 |
| MD5 | bc17779418e966793e13664f8363bbb6 |
| BLAKE2b-256 | 7a5a7247696337ce3b4b5e833a2f863057492be52b9f5ea9c1357633cda2dd26 |
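To check a downloaded sdist against the SHA256 digest listed above, hash the file and compare. A minimal sketch using the standard library's `hashlib`; the helper names `sha256_of` and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Union

# Published SHA256 digest of dataprep_ai-0.1.1.tar.gz, as listed above.
EXPECTED_SDIST_SHA256 = "8828439720093e0923b4c44bcf1a26b43fe4204b7f61c356bef990b1788788d6"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Union[str, Path], expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(Path(path)) == expected.strip().lower()
```

For example, `verify("dataprep_ai-0.1.1.tar.gz")` returns `True` only for an untampered download. pip can perform the same check itself in hash-checking mode (`pip install --require-hashes -r requirements.txt`), where each requirement line carries one or more `--hash=sha256:...` options; listing both the sdist and wheel digests lets either file install.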


File details

Details for the file dataprep_ai-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: dataprep_ai-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 6.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dataprep_ai-0.1.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7538feb2f1f188a63a82bf34c3c265fa661213ef417c397fc845d608beb52684 |
| MD5 | 4d90460113d53248b7c0dcd3c228c164 |
| BLAKE2b-256 | 6db70016e33437a95605d66c4f6ea4052345b15b9431072ff0f776ce13097de5 |

