
Dataprep: Data Preparation in Python

Documentation | Mail List & Forum

Dataprep lets you prepare your data using a single library with a few lines of code.

Currently, you can use dataprep to:

  • Collect data from common data sources (through dataprep.data_connector)
  • Do your exploratory data analysis (through dataprep.eda)
  • ...more modules are coming

Installation

pip install dataprep
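
After installing, one quick way to confirm that the package is visible to your interpreter is to look it up without importing it. The sketch below uses only the standard library; the helper name `is_installed` is made up for illustration and is not part of dataprep.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Prints True once `pip install dataprep` has succeeded in this environment.
print(is_installed("dataprep"))
```

If this prints False, the install most likely went into a different Python environment than the one running the script.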

Examples & Usage

Detailed examples can be found in the examples folder.

EDA

Exploratory data analysis involves a few common tasks, such as taking a quick look at the distribution of each column or understanding the correlations between columns.

The EDA module groups these tasks into functions, so that each task takes a single function call.

  • Want to understand the distributions for each DataFrame column? Use plot.
from dataprep.eda import plot

df = ...

plot(df)
  • Want to understand the correlation between columns? Use plot_correlation.
from dataprep.eda import plot_correlation

df = ...

plot_correlation(df)
  • Or, if you want to understand the impact of the missing values for each column, use plot_missing.
from dataprep.eda import plot_missing

df = ...

plot_missing(df)
  • You can even drill down to get more information by giving plot, plot_correlation and plot_missing a column name.
from dataprep.eda import plot_missing

df = ...

plot_missing(df, x="some_column_name")
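
To make the quantities behind these plots concrete, here is a minimal pure-Python sketch that computes a per-column missing fraction and a Pearson correlation on a toy table. This is not how dataprep computes or renders anything; the table, the numbers and the helper names (`missing_fraction`, `pearson`) are all made up for illustration.

```python
import math

# Toy table stored as column name -> values; None marks a missing cell.
# The numbers are invented for this example.
table = {
    "height": [1.0, 2.0, 3.0, 4.0],
    "weight": [2.0, 4.0, 6.0, 8.0],
    "age": [30.0, None, 25.0, None],
}


def missing_fraction(values):
    """Share of cells in a column that are missing."""
    return sum(v is None for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


missing = {name: missing_fraction(col) for name, col in table.items()}
corr_height_weight = pearson(table["height"], table["weight"])

print(missing)
print(corr_height_weight)
```

A summary like `missing` is the kind of information a missing-value plot surfaces per column, and `corr_height_weight` is one cell of a correlation matrix.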

Don't forget to check out the examples folder for detailed demonstrations!

Data Connector

You can download Yelp business search results into a pandas DataFrame with two lines of code, without digging deep into the Yelp documentation!

from dataprep.data_connector import Connector

dc = Connector("yelp", auth_params={"access_token":"<Your yelp access token>"})
df = dc.query("businesses", term="ramen", location="vancouver")
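
For a sense of what the two lines above save you from writing, here is a rough standard-library sketch of the equivalent manual steps: building an authenticated search request and flattening the JSON response into rows. It assumes the Yelp Fusion v3 business search endpoint and bearer-token authentication; dataprep's internals may differ, and the helper names and the sample payload are hypothetical. The request is only constructed, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint (Yelp Fusion v3 business search).
YELP_SEARCH_URL = "https://api.yelp.com/v3/businesses/search"


def build_yelp_search_request(access_token, **params):
    """Build (but do not send) a GET request carrying the search parameters."""
    url = f"{YELP_SEARCH_URL}?{urlencode(params)}"
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})


def flatten_businesses(payload):
    """Turn a decoded JSON payload into a list of flat row dicts."""
    rows = []
    for business in payload.get("businesses", []):
        rows.append(
            {
                "name": business.get("name"),
                "rating": business.get("rating"),
                "city": business.get("location", {}).get("city"),
            }
        )
    return rows


request = build_yelp_search_request(
    "<Your yelp access token>", term="ramen", location="vancouver"
)

# Hand-written sample shaped like a search response; not real data.
sample_payload = {
    "businesses": [
        {"name": "Example Ramen", "rating": 4.5, "location": {"city": "Vancouver"}},
    ]
}
rows = flatten_businesses(sample_payload)

print(request.full_url)
print(rows)
```

The flat row dicts are the shape a tabular structure such as a pandas DataFrame is typically built from.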


Contribution

Dataprep is in its early stage. Any contribution including:

  • Filing an issue
  • Providing use cases
  • Writing down your user experience
  • Submitting a PR
  • ...

is greatly appreciated!

Please take a look at our wiki for development documentation!


Source Distribution

dataprep-0.2.2.tar.gz (43.6 kB)

Uploaded Source

Built Distribution

dataprep-0.2.2-py3-none-any.whl (54.2 kB)

Uploaded Python 3

File details

Details for the file dataprep-0.2.2.tar.gz.

File metadata

  • Download URL: dataprep-0.2.2.tar.gz
  • Upload date:
  • Size: 43.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/0.12.17 CPython/3.7.3 Darwin/19.3.0

File hashes

Hashes for dataprep-0.2.2.tar.gz
Algorithm Hash digest
SHA256 64fbb7adcf980fbb50b141cf7d782ab35fe0d1f15e4c0c19583212d5d98bbea9
MD5 4de4ce8e8e535c7784e6044a7283de8e
BLAKE2b-256 361dddbda5f832e8cdb4c43aaa88d4628876b8134fefbe25ef97f242f6db3596


File details

Details for the file dataprep-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: dataprep-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 54.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/0.12.17 CPython/3.7.3 Darwin/19.3.0

File hashes

Hashes for dataprep-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 63f59f7dd93d2dace7fe9b7939ebad649430bb52f6e5944ba9e724ccc5106428
MD5 fcade04eb8c5453797f038f3fe2744c2
BLAKE2b-256 48c4e712573ad9ad34275419026ac12e2c4e193a66e544895d98c354b8abe7df

