
Dataprep: Data Preparation in Python


Dataprep lets you prepare your data using a single library with a few lines of code.

Currently, you can use dataprep to:

  • Collect data from common data sources (through dataprep.data_connector)
  • Do your exploratory data analysis (through dataprep.eda)
  • ...more modules are coming

Documentation | Mail List & Forum

Installation

pip install dataprep

Examples & Usages

The following examples can give you an impression of what dataprep can do:

EDA

Exploratory data analysis involves recurring tasks, such as taking a quick look at the distribution of each column or understanding the correlations between columns.

The EDA module groups these tasks into functions, so you can complete each one with a single function call.

  • Want to understand the distributions for each DataFrame column? Use plot.
  • Want to understand the correlation between columns? Use plot_correlation.
  • Want to understand the impact of missing values in each column? Use plot_missing.
  • You can drill down for more detail by passing a column name to plot, plot_correlation, or plot_missing.
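
To make concrete what kind of per-column summaries sit behind these questions, here is a minimal standard-library sketch. It is not dataprep's implementation and does not call dataprep; the helper names missing_fraction and pearson and the sample table are hypothetical, chosen only for illustration. It computes the fraction of missing values per column and a Pearson correlation between two columns.

```python
import math
from typing import Dict, List, Optional, Sequence


def missing_fraction(columns: Dict[str, List[Optional[float]]]) -> Dict[str, float]:
    """Return the fraction of missing (None) entries in each column."""
    result = {}
    for name, values in columns.items():
        if not values:
            # An empty column has nothing missing by convention here.
            result[name] = 0.0
            continue
        missing = sum(1 for v in values if v is None)
        result[name] = missing / len(values)
    return result


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical sample data: column name -> list of values (None = missing).
table = {
    "price": [10.0, None, 12.5, None],
    "rating": [4.0, 3.5, None, 5.0],
    "reviews": [12, 40, 7, 3],
}

print(missing_fraction(table))
print(pearson([1, 2, 3], [2, 4, 6]))
```

The EDA functions present this kind of information as plots rather than raw numbers, which is usually easier to scan when a DataFrame has many columns.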

Don't forget to check out the examples folder for detailed demonstrations!

Data Connector

You can download Yelp business search results into a pandas DataFrame with two lines of code, without digging deep into the Yelp documentation. Data Connector also handles pagination automatically, so you can specify the desired number of results without worrying about the API's per-request limit.

For example, you can request 120 records even though Yelp returns at most 50 per request.
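
The arithmetic behind automatic pagination can be sketched as follows. This is only an illustration of the idea, not Data Connector's actual code: the function name plan_requests and the (offset, limit) pairing are assumptions about how a paginated API might be addressed. It splits a desired record count into requests that each respect the per-request limit.

```python
from typing import Iterator, Tuple


def plan_requests(total: int, per_request_limit: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, limit) pairs that together cover `total` records."""
    if total < 0 or per_request_limit <= 0:
        raise ValueError("total must be >= 0 and per_request_limit must be > 0")
    offset = 0
    while offset < total:
        # The last request may ask for fewer records than the limit allows.
        limit = min(per_request_limit, total - offset)
        yield offset, limit
        offset += limit


# 120 records with at most 50 per request, as in the example above.
pages = list(plan_requests(120, 50))
print(pages)  # [(0, 50), (50, 50), (100, 20)]
```

Three requests are enough here: two full pages of 50 and a final page of 20. A caller only states the total it wants, and the splitting stays out of sight.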

Contribute

There are many ways to contribute to Dataprep.

  • Submit bugs and help us verify fixes as they are checked in.
  • Review the source code changes.
  • Engage with other Dataprep users and developers on Stack Overflow.
  • Help each other in the Dataprep Community Discord and Mail List & Forum.
  • Twitter
  • Contribute bug fixes.
  • Provide use cases and write down your user experience.

Please take a look at our wiki for development documentation!

