Dataprep: Data Preparation in Python
Documentation | Forum | Mail List
Dataprep lets you prepare your data using a single library with a few lines of code.
Currently, you can use dataprep to:
- Collect data from common data sources (through dataprep.connector)
- Do your exploratory data analysis (through dataprep.eda)
- ...more modules are coming
Installation
pip install -U dataprep
Examples & Usages
The following examples can give you an impression of what dataprep can do:
EDA
Exploratory data analysis involves a set of common tasks, such as taking a quick look at the distribution of each column or understanding the correlations between columns.
The EDA module groups these tasks into functions, so you can finish each one with a single function call.
- Want to understand the distributions for each DataFrame column? Use plot.
- Want to understand the correlation between columns? Use plot_correlation.
- Or, if you want to understand the impact of the missing values for each column, use plot_missing.
You can drill down for more information by passing a column name to plot, plot_correlation, or plot_missing. For example, you can do this with plot_missing, or with plot for either a numerical or a categorical column.
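As a minimal sketch of the calls described above, the helper below runs the three overview functions on a pandas DataFrame and, when a column name is given, the drill-down variants. The helper name explore and its return shape are hypothetical and only serve the illustration. The import is deferred into the function so that defining the helper does not itself require dataprep to be installed.

```python
def explore(df, column=None):
    """Run the dataprep.eda overviews on a pandas DataFrame.

    `explore` is a hypothetical convenience wrapper, not part of dataprep.
    It assumes dataprep is installed (pip install -U dataprep).
    """
    # Deferred import: only needed when the helper is actually called.
    from dataprep.eda import plot, plot_correlation, plot_missing

    if column is None:
        # Whole-DataFrame view: distributions, correlations, missing values.
        return {
            "distribution": plot(df),
            "correlation": plot_correlation(df),
            "missing": plot_missing(df),
        }

    # Passing a column name drills down into that column.
    # Assumption: the correlation drill-down is only meaningful for a
    # numerical column.
    return {
        "distribution": plot(df, column),
        "correlation": plot_correlation(df, column),
        "missing": plot_missing(df, column),
    }
```

In a notebook, each returned object is expected to render as an interactive figure when it is the last expression in a cell.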
Don't forget to check out the examples folder for a detailed demonstration!
Connector
Connector provides a simple way to collect data from different websites, offering several benefits:
- A unified API: you can fetch data from many websites using one or two lines of code.
- Auto pagination: it handles pagination for you, so you can specify the desired number of results without worrying about the API's per-request count limit.
- Smart API request strategy: it can issue API requests in parallel while respecting the rate limit policy.
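The pagination and request-strategy ideas in the list above can be sketched in plain Python. This is not Connector's actual implementation: plan_pages, fetch_all, and fake_api are hypothetical names, the endpoint is simulated, and a simple cap on in-flight requests stands in for a real rate limit policy.

```python
import asyncio


def plan_pages(total, per_request):
    """Split a desired result count into (offset, limit) pairs."""
    return [
        (offset, min(per_request, total - offset))
        for offset in range(0, total, per_request)
    ]


async def fetch_all(fetch_page, total, per_request, max_concurrent=2):
    """Fetch `total` records through `fetch_page(offset, limit)`.

    At most `max_concurrent` requests are in flight at any time.
    """
    gate = asyncio.Semaphore(max_concurrent)

    async def guarded(offset, limit):
        async with gate:
            return await fetch_page(offset, limit)

    pages = await asyncio.gather(
        *(guarded(offset, limit) for offset, limit in plan_pages(total, per_request))
    )
    # gather returns results in call order, so records stay in offset order.
    return [record for page in pages for record in page]


async def fake_api(offset, limit):
    """Simulated endpoint (hypothetical): returns consecutive record ids."""
    await asyncio.sleep(0)
    return list(range(offset, offset + limit))


# Ask for 120 results from an API that allows at most 50 per request.
records = asyncio.run(fetch_all(fake_api, total=120, per_request=50))
```

The caller only states how many results it wants; the split into three requests (50, 50, and 20 records) and the cap on simultaneous requests happen behind the function call.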
For example, you can download Yelp business search results into a pandas DataFrame using only two lines of code, without digging deep into the Yelp documentation! More examples can be found here: Examples
Contribute
There are many ways to contribute to Dataprep.
- Submit bugs and help us verify fixes as they are checked in.
- Review the source code changes.
- Engage with other Dataprep users and developers on StackOverflow.
- Help each other in the Dataprep Community Discord and Mail list & Forum.
- Contribute bug fixes.
- Provide use cases and write down your user experience.
Please take a look at our wiki for development documentation!
Acknowledgement
Some functionalities of DataPrep are inspired by the following packages.
-
Inspired the report functionality and insights provided in DataPrep.eda.
-
Inspired the missing value analysis in DataPrep.eda.