Dataprep: Data Preparation in Python

Documentation | Forum | Mail List

Dataprep lets you prepare your data using a single library with a few lines of code.

Currently, you can use dataprep to:

  • Collect data from common data sources (through dataprep.connector)
  • Do your exploratory data analysis (through dataprep.eda)
  • ...more modules are coming

Releases

Dataprep is released on PyPI and conda-forge.

Installation

pip install -U dataprep

Examples & Usages

The following examples can give you an impression of what dataprep can do:

EDA

Some tasks come up again and again during exploratory data analysis, such as taking a quick look at the distribution of each column or understanding the correlations between columns.

The EDA module groups these tasks into functions, so that you can finish each one with a single function call.

  • Want to understand the distributions for each DataFrame column? Use plot.

  • Want to understand the correlation between columns? Use plot_correlation.

  • Or, if you want to understand the impact of the missing values for each column, use plot_missing.

You can drill down for more information by passing a column name to plot, plot_correlation, or plot_missing. For example, plot_missing accepts a column name, and plot accepts either a numerical or a categorical column.

Don't forget to check out the examples folder for detailed demonstrations!
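As a minimal sketch of the calls described in this section, the helper below runs the three functions on a whole DataFrame, or on a single column when a column name is given. The helper name explore and its column parameter are illustrative and not part of dataprep. The import sits inside the function only so that the sketch loads without dataprep installed. The returned objects are typically displayed in a Jupyter notebook.

```python
def explore(df, column=None):
    """Run plot, plot_correlation, and plot_missing on df.

    With column=None the functions describe the whole DataFrame;
    with a column name they drill down into that column.
    `explore` is a hypothetical helper, not a dataprep function.
    """
    # Lazy import: dataprep is only needed when the helper is called.
    from dataprep.eda import plot, plot_correlation, plot_missing

    args = (df,) if column is None else (df, column)
    return {
        "plot": plot(*args),                          # distributions
        "plot_correlation": plot_correlation(*args),  # correlations
        "plot_missing": plot_missing(*args),          # impact of missing values
    }
```

Assuming a pandas DataFrame df with a numerical column named price (a hypothetical name), explore(df) gives the dataset-level views and explore(df, "price") gives the drill-down views. plot_correlation may reject a non-numerical column, so a numerical column is the safer choice for the drill-down.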

Connector

Connector provides a simple way to collect data from different websites, offering several benefits:

  • A unified API: you can fetch data from many websites with one or two lines of code.
  • Auto pagination: it handles pagination for you, so you can specify the number of results you want without worrying about the API's per-request limit.
  • Smart API request strategy: it can issue API requests in parallel while respecting the rate limit policy.
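To make the last two points concrete, here is a minimal standard-library sketch of the general idea. It is not Connector's implementation. The names plan_pages, fetch_all, and fake_fetch_page are hypothetical, the fake endpoint stands in for a real paged API, and a simple cap on in-flight requests stands in for a real rate limit policy.

```python
import asyncio


def plan_pages(count, page_size):
    """Split a desired result count into (offset, limit) pairs, one per request."""
    return [
        (offset, min(page_size, count - offset))
        for offset in range(0, count, page_size)
    ]


async def fetch_all(fetch_page, count, page_size, max_concurrency=3):
    """Fetch `count` records from a paged API, a few pages at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(offset, limit):
        async with semaphore:  # cap the number of in-flight requests
            return await fetch_page(offset, limit)

    pages = await asyncio.gather(
        *(fetch_one(offset, limit) for offset, limit in plan_pages(count, page_size))
    )
    # gather() keeps the request order, so the flattened list is in API order.
    return [record for page in pages for record in page][:count]


async def fake_fetch_page(offset, limit):
    """Hypothetical endpoint holding 120 records, served in slices."""
    await asyncio.sleep(0)  # pretend to wait on the network
    return list(range(120))[offset:offset + limit]


# Ask for 95 records even though each request returns at most 20.
records = asyncio.run(fetch_all(fake_fetch_page, count=95, page_size=20))
```

The caller only states how many records it wants. The split into per-request pages and the limit on simultaneous requests stay inside fetch_all.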

For example, you can download Yelp business search results into a pandas DataFrame with only two lines of code, without digging deep into the Yelp documentation. More examples can be found here: Examples

Contribute

There are many ways to contribute to Dataprep.

  • Submit bugs and help us verify fixes as they are checked in.
  • Review the source code changes.
  • Engage with other Dataprep users and developers on StackOverflow.
  • Help each other in the Dataprep Community Discord and Mail list & Forum.
  • Twitter
  • Contribute bug fixes.
  • Provide use cases and write down your user experience.

Please take a look at our wiki for development documentation!

Acknowledgement

Some functionality in DataPrep is inspired by the following packages.

  • Pandas Profiling

    Inspired the report functionality and insights provided in DataPrep.eda.

  • missingno

    Inspired the missing value analysis in DataPrep.eda.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dataprep-0.2.13.post2.tar.gz (109.9 kB)

Uploaded Source

Built Distribution

dataprep-0.2.13.post2-py3-none-any.whl (141.5 kB)

Uploaded Python 3

File details

Details for the file dataprep-0.2.13.post2.tar.gz.

File metadata

  • Download URL: dataprep-0.2.13.post2.tar.gz
  • Size: 109.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.10 CPython/3.8.4 Linux/4.4.0-184-generic

File hashes

Hashes for dataprep-0.2.13.post2.tar.gz
Algorithm Hash digest
SHA256 813fe06e041c45e3873fecd984c686ea064d397fe07d8ea34c2bdde06c311d41
MD5 d4a42a5246826478d6551ef94c13c6b6
BLAKE2b-256 3ba0f89ce81650032fa3202a5a30ff732b68262c46fa4f78be17eebaa51cd02f

File details

Details for the file dataprep-0.2.13.post2-py3-none-any.whl.

File metadata

  • Download URL: dataprep-0.2.13.post2-py3-none-any.whl
  • Size: 141.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.10 CPython/3.8.4 Linux/4.4.0-184-generic

File hashes

Hashes for dataprep-0.2.13.post2-py3-none-any.whl
Algorithm Hash digest
SHA256 04caf7f22937751060bd2bc0ca335306f02f71bef5d744591b2588d015c421cf
MD5 fa35627a08e761fe30535a5f7a7cb433
BLAKE2b-256 bacdff603454d6f4e3cd7c86b816716156be10209dbdf2539563792d156426d5
