Dataprep: Data Preparation in Python
Dataprep lets you prepare your data using a single library with a few lines of code.
Currently, you can use dataprep to:
- Collect data from common data sources (through dataprep.connector)
- Perform exploratory data analysis (through dataprep.eda)
- ...more modules are coming
Installation
pip install -U dataprep
Examples & Usage
The following examples give you a sense of what dataprep can do:
EDA
Common tasks in the exploratory data analysis stage include taking a quick look at the distribution of each column and understanding the correlations between columns.
The EDA module organizes these tasks into functions, so you can complete each one with a single function call.
- Want to understand the distribution of each DataFrame column? Use plot.
- Want to understand the correlation between columns? Use plot_correlation.
- Or, if you want to understand the impact of missing values in each column, use plot_missing.
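The statistics that plot_missing and plot_correlation summarize can be illustrated without dataprep. The following is a minimal, stdlib-only sketch, not dataprep's implementation: it assumes a toy table stored as a dict of column lists with None marking a missing value, and the helper names missing_summary, pearson, and correlation_summary are hypothetical. It reports the fraction of missing entries per column and, for one common correlation measure (Pearson), the correlation of each column pair over the rows where both values are present.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# Hypothetical toy table: column name -> values, with None marking a missing entry.
Table = Dict[str, List[Optional[float]]]


def missing_summary(table: Table) -> Dict[str, float]:
    """Fraction of missing (None) entries in each column."""
    return {
        name: (sum(v is None for v in values) / len(values)) if values else 0.0
        for name, values in table.items()
    }


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return math.nan
    return cov / (sx * sy)


def correlation_summary(table: Table) -> Dict[Tuple[str, str], float]:
    """Pairwise Pearson correlation, using only rows where both values are present."""
    result = {}
    for a, b in combinations(table, 2):
        pairs = [(x, y) for x, y in zip(table[a], table[b])
                 if x is not None and y is not None]
        if len(pairs) < 2:
            continue  # too few complete rows to estimate a correlation
        xs, ys = zip(*pairs)
        result[(a, b)] = pearson(list(xs), list(ys))
    return result


table = {
    "height": [150.0, 160.0, 170.0, None, 190.0],
    "weight": [50.0, 60.0, 70.0, 80.0, None],
    "age": [30.0, None, 30.0, 40.0, 20.0],
}
print(missing_summary(table))  # {'height': 0.2, 'weight': 0.2, 'age': 0.2}
print(correlation_summary(table))
```

Dropping incomplete rows pair by pair (rather than dropping every row with any missing value) keeps as much data as possible for each correlation, which is one reason missing values and correlations are worth inspecting together.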
You can drill down for more information by passing a column name to plot, plot_correlation, or plot_missing. For example, you can call plot_missing on a single column, or call plot on either a numerical or a categorical column to get a detailed view of that column.
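Drilling down on a single column means the useful summary depends on the column's type. As a rough illustration (a hypothetical stdlib sketch, not how dataprep detects column types or renders its plots), summarize_column below treats a column as numerical when every non-missing value is a number and builds a fixed-width histogram; otherwise it treats the column as categorical and counts how often each value occurs.

```python
from collections import Counter
from numbers import Number
from typing import Any, Dict, List


def summarize_column(values: List[Any], bins: int = 4) -> Dict[str, Any]:
    """Summarize one column: a histogram if numerical, frequency counts otherwise."""
    present = [v for v in values if v is not None]
    summary: Dict[str, Any] = {"count": len(values), "missing": len(values) - len(present)}
    # Treat the column as numerical only if every non-missing value is a (non-bool) number.
    is_numeric = bool(present) and all(
        isinstance(v, Number) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        lo, hi = min(present), max(present)
        width = (hi - lo) / bins or 1.0  # avoid zero-width bins when all values are equal
        histogram = [0] * bins
        for v in present:
            # Clamp so the maximum value lands in the last bin instead of overflowing.
            histogram[min(int((v - lo) / width), bins - 1)] += 1
        summary.update(kind="numerical", min=lo, max=hi,
                       mean=sum(present) / len(present), histogram=histogram)
    else:
        summary.update(kind="categorical", frequencies=Counter(present).most_common())
    return summary


print(summarize_column([1, 2, 3, 4, None, 8]))
print(summarize_column(["a", "b", "a", None, "c", "a"]))
```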
Don't forget to check out the examples folder for detailed demonstrations!
Connector
Connector provides a simple way to collect data from different websites, offering several benefits:
- A unified API: you can fetch data from many websites with one or two lines of code.
- Auto pagination: it handles pagination automatically, so you can specify the number of results you want without worrying about the API's per-request limit.
- Smart API request strategy: it can issue API requests in parallel while respecting the rate limit policy.
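Auto pagination and rate-limited parallel requests can be sketched in a few dozen lines. This is a hypothetical illustration, not dataprep.connector's code: plan_pages splits a desired result count into offset/limit requests that respect an assumed per-request cap, MinIntervalLimiter enforces a simple rate limit by spacing request start times at least interval seconds apart, and fetch_all runs the requests in a thread pool and concatenates the pages in order. fake_api is a stand-in for a real web API.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def plan_pages(total: int, per_request: int) -> List[Dict[str, int]]:
    """Split a desired result count into offset/limit requests of at most per_request items."""
    return [
        {"offset": offset, "limit": min(per_request, total - offset)}
        for offset in range(0, total, per_request)
    ]


class MinIntervalLimiter:
    """Simple rate limit: request starts are spaced at least `interval` seconds apart."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._lock = threading.Lock()
        self._next_start = 0.0

    def wait(self) -> None:
        with self._lock:  # reserve the next free start slot atomically
            now = time.monotonic()
            start = max(now, self._next_start)
            self._next_start = start + self.interval
        time.sleep(start - now)  # sleep outside the lock so other threads can reserve slots


def fetch_all(fetch_page: Callable[[int, int], list], total: int, per_request: int,
              max_workers: int = 4, interval: float = 0.0) -> list:
    """Fetch `total` items as parallel page requests while respecting the limiter."""
    limiter = MinIntervalLimiter(interval)

    def fetch_one(page: Dict[str, int]) -> list:
        limiter.wait()
        return fetch_page(page["offset"], page["limit"])

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = list(pool.map(fetch_one, plan_pages(total, per_request)))  # keeps page order
    return [item for page in pages for item in page]


def fake_api(offset: int, limit: int) -> list:
    """Stand-in for a web API that returns at most 50 results per request."""
    return list(range(offset, offset + min(limit, 50)))


results = fetch_all(fake_api, total=120, per_request=50, interval=0.01)
print(len(results), results[:3], results[-1])  # 120 [0, 1, 2] 119
```

Spacing request starts is only one simple way to respect a rate limit; APIs that state limits as "N requests per time window" are often modeled more precisely with a token bucket.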
In the following example, you can download Yelp business search results into a pandas DataFrame with only two lines of code, without digging deep into the Yelp documentation! More examples can be found here: Examples
Contribute
There are many ways to contribute to Dataprep.
- Submit bugs and help us verify fixes as they are checked in.
- Review the source code changes.
- Engage with other Dataprep users and developers on StackOverflow.
- Help each other in the Dataprep Community Discord, mailing list, and forum.
- Contribute bug fixes.
- Provide use cases and write about your user experience.
Please take a look at our wiki for development documentation!
Acknowledgement
Some functionality in DataPrep is inspired by the following packages.
- Inspired the report functionality and insights provided in DataPrep.eda.
- Inspired the missing value analysis in DataPrep.eda.
Download files
Built Distribution
Hashes for dataprep-0.2.13-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 7cd7f3085c35a3c4a750c635ba6c92138a258519d39169cd8af29e2d8062a9d8
MD5 | 032fafae12a80dbbefbe0bda4639d521
BLAKE2b-256 | 57e5ff096ffbc24966cc56c15d7edcb240eff7f222001482f78c1ccc5207b784