This library takes a DataFrame and returns another one with a detailed profile of each column.
Project description
DataProfile Library
Description: the main function of this library is to create a quick profile view of a dataset. To do this, it iterates over a DataFrame and analyzes several relevant features of each column.
The main features are:
Count: counts the number of records. Returns a number.
Count distinct: counts the number of distinct records. Returns a number.
Unique: counts the unique records. Returns a number.
Id probability: calculates the probability that the column is an id. To do this, it evaluates the data type, the name of the column, the number of unique ids, and the number of empty and null records, and uses all this information to estimate a probability. Returns a percentage.
Email probability: finds the probability that the column contains emails. To do this, it counts the number of @ signs and valid domains, then estimates a probability. Returns a percentage.
Duplicate: counts the duplicate records per column. Returns a number.
Numeric: indicates whether the data type is numeric. Returns "True" only if all records in the column are numeric.
Letter: indicates whether the data type is string. Returns "True" only if all records in the column are strings.
Bool: indicates whether the data type is bool. Returns "True" only if all records in the column are bool.
Empty: counts the number of empty records per column. Returns a number.
Cero (zero): counts the number of zeros per column. Returns a number.
Null: counts the number of null records per column. Returns a number.
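To make the counting features above more concrete, here is a minimal sketch of how similar values could be computed with plain pandas. It is not the library's implementation. The helper names `basic_counts` and `email_share` are hypothetical, and the definitions are assumptions: "distinct" means different non-null values, "unique" means values that appear exactly once, "duplicate" means repeats of an earlier non-null value, and the email share is a simple pattern match rather than the library's probability estimate.

```python
import pandas as pd

# Assumed pattern: something@something.tld, with no spaces and no extra "@".
EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"


def basic_counts(col: pd.Series) -> dict:
    """Per-column counts under the assumed definitions stated in the lead-in."""
    freq = col.value_counts()  # occurrences of each non-null value
    is_empty = col.map(lambda v: isinstance(v, str) and v.strip() == "")
    # Numeric-looking strings such as "0" are counted as zeros as well.
    is_zero = pd.to_numeric(col, errors="coerce") == 0
    return {
        "count": len(col),                                  # all records, nulls included
        "count_distinct": int(col.nunique()),               # different non-null values
        "unique": int((freq == 1).sum()),                   # values seen exactly once
        "duplicate": int(col.dropna().duplicated().sum()),  # repeats of an earlier value
        "empty": int(is_empty.sum()),                       # empty or whitespace-only strings
        "zero": int(is_zero.sum()),
        "null": int(col.isna().sum()),
    }


def email_share(col: pd.Series) -> float:
    """Percentage of non-null values that look like an email address."""
    values = col.dropna().astype(str)
    if values.empty:
        return 0.0
    return float(100.0 * values.str.fullmatch(EMAIL_PATTERN).mean())


# Hypothetical sample data
demo = pd.Series(["a", "b", "a", "", None])
print(basic_counts(demo))
```

The library's own definitions may differ from these assumptions, so treat the output of `dataprofile` as the reference when the two disagree.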
Install Requires: pandas, NumPy, PrettyTable
Functions: dataprofile(DF): this is the main function. It takes a DataFrame as input and returns another one with all the features described above.
Example:
- The first step is to install the library using pip install dataprofile.
- The second step is to import the dataprofile library: import dataprofile as dp.
- The third step is to create or import a DataFrame. In this case, use read_csv from Pandas to import a CSV file and create a DataFrame.
- The fourth step is to call the dataprofile function on a DataFrame like this: dp.dataprofile(DataFrame).
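The steps above can be put together in a short script. This is a sketch under a few assumptions: the sample data and column names are made up, a small in-memory DataFrame stands in for the CSV file so the script runs on its own, and the import is guarded so the script still runs when the library is not installed.

```python
import pandas as pd

# Step 3: create a DataFrame. A real workflow might call pd.read_csv on a CSV file;
# a small in-memory frame (hypothetical data) keeps this sketch self-contained.
df = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4],
        "email": ["ana@example.com", "bob@example.org", "", None],
        "balance": [0, 10, 0, 25],
    }
)

# Step 2: import the library (step 1, `pip install dataprofile`, happens in the shell).
try:
    import dataprofile as dp
except ImportError:
    dp = None  # library not installed; the call below is skipped

# Step 4: profile the DataFrame. Per the description, the result is another DataFrame.
if dp is not None:
    profile = dp.dataprofile(df)
    print(profile)
```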
File details
Details for the file dataprofile-1.0.1.tar.gz.
File metadata
- Download URL: dataprofile-1.0.1.tar.gz
- Upload date:
- Size: 6.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 05dee6a7f1aed09bbfd7e5f3d01f1be8fda22b8475b435a53e300c7cb4b87173
MD5 | 2670d00c0c6e87854b6d6885f64d0063
BLAKE2b-256 | f29747b010d8ce0036ec32bc4a2c81ea64829bfc4a95ddeb3c7665777e1b8368
File details
Details for the file dataprofile-1.0.1-py3-none-any.whl.
File metadata
- Download URL: dataprofile-1.0.1-py3-none-any.whl
- Upload date:
- Size: 7.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 918b08112301f8f12a2214d75bf664236eade30bdda8fe6623c022779ab86940
MD5 | 9f57db45a077d5f7bc38dae9d8873dd2
BLAKE2b-256 | 8cae67847940b65d617cb36da75ea043fbe8b6feed5cc2b8cacd7a8f1f6663d3