
Dataprep: Data Preparation in Python

Documentation | Mail List & Forum

Dataprep is a collection of functions that help you accomplish the tasks that come before building a predictive model.

Implementation Status

Currently, you can use dataprep to:

  • Collect data from common data sources (through dataprep.data_connector)
  • Do your exploratory data analysis (through dataprep.eda)
  • ...

Installation

pip install dataprep

dataprep is in its alpha stage, so please specify the version number manually (for example, pip install dataprep==0.1.0).

Examples & Usages

More detailed examples can be found in the examples folder.

Data Connector

You can download Yelp business search results into a pandas DataFrame with two lines of code, without digging into the Yelp API documentation!

from dataprep.data_connector import Connector

dc = Connector("yelp", auth_params={"access_token":"<Your yelp access token>"})
df = dc.query("businesses", term="ramen", location="vancouver")
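To give a rough idea of the work a connector saves you, here is a minimal standard-library sketch that builds an authenticated search request and flattens a JSON response into table rows. It assumes the Yelp Fusion business search endpoint (https://api.yelp.com/v3/businesses/search) with a Bearer token; `build_search_request`, `flatten`, and `to_rows` are hypothetical helpers, not part of dataprep's API, and the request is only built, never sent.

```python
from __future__ import annotations

from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the Yelp Fusion business search endpoint.
YELP_SEARCH_URL = "https://api.yelp.com/v3/businesses/search"


def build_search_request(access_token: str, **params: str) -> Request:
    """Build (but do not send) an authenticated business-search request."""
    url = f"{YELP_SEARCH_URL}?{urlencode(params)}"
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def to_rows(response: dict, key: str = "businesses") -> list[dict]:
    """Turn the list stored under `key` into flat rows, one per item."""
    return [flatten(item) for item in response.get(key, [])]


if __name__ == "__main__":
    req = build_search_request("<token>", term="ramen", location="vancouver")
    print(req.full_url)
    # Hypothetical, heavily trimmed payload; real responses carry many more fields.
    sample = {
        "businesses": [
            {"name": "Ramen A", "rating": 4.5, "location": {"city": "Vancouver"}},
            {"name": "Ramen B", "rating": 4.0, "location": {"city": "Vancouver"}},
        ]
    }
    for row in to_rows(sample):
        print(row)
```

Flat dicts with dotted keys map directly onto DataFrame columns, which is why a connector can hand back a table instead of raw JSON.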

EDA

Common tasks during the exploratory data analysis stage include taking a quick look at the distribution of each column and understanding the correlations between columns.

The EDA module organizes these tasks into functions, so each one can be done with a single function call.

  • Want to understand the distributions for each DataFrame column? Use plot.
from dataprep.eda import plot

df = ...

plot(df)
  • Want to understand the correlation between columns? Use plot_correlation.
from dataprep.eda import plot_correlation

df = ...

plot_correlation(df)
  • Or, if you want to understand the impact of the missing values for each column, use plot_missing.
from dataprep.eda import plot_missing

df = ...

plot_missing(df)
  • You can even drill down for more information by giving plot, plot_correlation, or plot_missing a column name.
df = ...

plot_missing(df, x="some_column_name")
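For intuition about what a per-column distribution view like plot summarizes, here is a minimal standard-library sketch: numeric columns get equal-width histogram counts, other columns get value counts. It is not dataprep's implementation; a dict of column lists stands in for a DataFrame, None marks a missing value, and `summarize_column` and `summarize` are hypothetical names.

```python
from collections import Counter
from numbers import Number


def summarize_column(values, bins=4):
    """Summarize one column: a histogram for numbers, value counts otherwise."""
    present = [v for v in values if v is not None]
    is_numeric = present and all(
        isinstance(v, Number) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        lo, hi = min(present), max(present)
        width = (hi - lo) / bins or 1  # avoid zero width when all values are equal
        counts = [0] * bins
        for v in present:
            # Clamp so the maximum value lands in the last bin.
            counts[min(int((v - lo) / width), bins - 1)] += 1
        edges = [lo + i * width for i in range(bins + 1)]
        return {"kind": "numeric", "edges": edges, "counts": counts}
    return {"kind": "categorical", "counts": dict(Counter(present).most_common())}


def summarize(table):
    """Apply summarize_column to every column of a dict-of-lists table."""
    return {name: summarize_column(col) for name, col in table.items()}


if __name__ == "__main__":
    table = {"price": [1, 2, 3, 4, 5], "city": ["a", "b", "a", None, "a"]}
    for name, summary in summarize(table).items():
        print(name, summary)
```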
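plot_correlation reports how pairs of columns move together. As an illustration of one common measure, here is a standard-library sketch of a Pearson correlation matrix; dataprep may compute other coefficients as well, and this is not its code. A dict of column lists stands in for a DataFrame, rows with a missing (None) value in either column are skipped, and `pearson` and `correlation_matrix` are hypothetical names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's r over rows where both values are present; None if undefined."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    if vx == 0 or vy == 0:
        return None  # undefined when either column is constant
    return cov / sqrt(vx * vy)


def correlation_matrix(table):
    """Pairwise Pearson correlations between all columns of a dict-of-lists table."""
    names = list(table)
    return {a: {b: pearson(table[a], table[b]) for b in names} for a in names}


if __name__ == "__main__":
    table = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [8, 6, 4, 2]}
    print(correlation_matrix(table))
```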
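The two plot_missing calls can be pictured with a small standard-library sketch: first the fraction of missing values per column, then, for a chosen column x, how other columns' statistics change once rows where x is missing are dropped. Treating "impact" as a before/after comparison of means is an assumption made for illustration, not dataprep's documented behavior. A dict of column lists stands in for a DataFrame, None marks a missing value, and `missing_ratio` and `impact_of_missing` are hypothetical names.

```python
def missing_ratio(table):
    """Fraction of missing (None) entries in each column."""
    return {
        name: (sum(v is None for v in col) / len(col) if col else 0.0)
        for name, col in table.items()
    }


def _mean(values):
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def impact_of_missing(table, x):
    """Compare each other numeric column's mean on all rows vs. rows where `x` is present."""
    x_missing = [v is None for v in table[x]]
    result = {}
    for name, col in table.items():
        if name == x:
            continue
        present = [v for v in col if v is not None]
        if not all(isinstance(v, (int, float)) for v in present):
            continue  # this sketch only compares numeric columns
        kept = [v for v, m in zip(col, x_missing) if not m]
        result[name] = {"all_rows": _mean(col), "x_present": _mean(kept)}
    return result


if __name__ == "__main__":
    table = {"age": [20, None, 40, None], "income": [10.0, 30.0, 20.0, 40.0]}
    print(missing_ratio(table))
    print(impact_of_missing(table, "age"))
```

A large gap between the two means hints that the rows missing x are not a random sample, which is the kind of signal a drill-down view is meant to surface.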

Don't forget to check out the examples folder for detailed demonstrations!

Contribution

Contribution is always welcome. If you want to contribute to dataprep, be sure to read the contribution guidelines.

Download files

Source Distribution

dataprep-0.1.0.tar.gz (40.2 kB)

Uploaded Source

Built Distribution

dataprep-0.1.0-py3-none-any.whl (50.1 kB)

Uploaded Python 3

File details

Details for the file dataprep-0.1.0.tar.gz.

File metadata

  • Download URL: dataprep-0.1.0.tar.gz
  • Upload date:
  • Size: 40.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.0b9 CPython/3.7.5 Linux/4.4.0-169-generic

File hashes

Hashes for dataprep-0.1.0.tar.gz
Algorithm Hash digest
SHA256 fc0875bd549912aa6401a3df3ec4866cd2d969241a5ba08b1de85daa0016526e
MD5 d76737b44562bd25ea3f22e4411b08fa
BLAKE2b-256 86dbd8fa984ec981d83dffdb2cbd7b1bb46995a3b5b7e634c55b4ded289a1e41

File details

Details for the file dataprep-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: dataprep-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 50.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.0b9 CPython/3.7.5 Linux/4.4.0-169-generic

File hashes

Hashes for dataprep-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 609449f6cef1c8d43bd583e143439a41a070549c7b5a3884914c4ab7b3bd8177
MD5 3b8777d94689d1bd7cc3ee041350ecd7
BLAKE2b-256 1eebcc59b474b5bb5ba12cfeb7f202eec2b39a252ede2f9474ac763e94bd7c4c
