This library takes one data frame and returns another with a detailed profile of each column

Project description

DataProfile Library

Description: the main function of this library is to create a quick profile view of a data set. To do this, it iterates over a DataFrame and analyzes several relevant features of each column.

The principal features are:

  • Count: counts the number of records. Returns a number.
  • Count distinct: counts the number of distinct records. Returns a number.
  • Unique: counts the unique records. Returns a number.
  • Id probability: estimates the probability that the column is an ID. It evaluates the data type, the column name, the number of unique IDs, and the number of empty and null records, then combines this information into a probability. Returns a percentage.
  • Email probability: estimates the probability that the column contains email addresses. It counts the number of @ signs and valid domains, then estimates a probability. Returns a percentage.
  • Duplicate: counts the duplicate records per column. Returns a number.
  • Numeric: indicates whether the data type is numeric. Returns "True" only if all records in the column are numeric.
  • Letter: indicates whether the data type is string. Returns "True" only if all records in the column are strings.
  • Bool: indicates whether the data type is bool. Returns "True" only if all records in the column are bools.
  • Empty: counts the number of empty records per column. Returns a number.
  • Cero: counts the number of zeros per column. Returns a number.
  • Null: counts the number of null records per column. Returns a number.
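The counting features listed above could be approximated with plain pandas roughly as follows. This is only a sketch, not the library's actual code: the helper name basic_counts and the dictionary keys are hypothetical, and it assumes that "Count distinct" means the number of different non-null values, "Unique" means values that occur exactly once, and "Empty" means blank strings.

```python
import pandas as pd


def basic_counts(s: pd.Series) -> dict:
    """Approximate the counting features for one column (illustrative only)."""
    non_null = s.dropna()
    freq = non_null.value_counts()  # occurrences of each distinct value
    as_number = pd.to_numeric(non_null, errors="coerce")  # non-numbers -> NaN
    return {
        "count": int(len(s)),                           # all records, nulls included
        "count_distinct": int(non_null.nunique()),      # different non-null values
        "unique": int((freq == 1).sum()),               # values seen exactly once (assumption)
        "duplicate": int(non_null.duplicated().sum()),  # repeats after the first occurrence
        "empty": int((non_null.astype(str).str.strip() == "").sum()),  # blank strings
        "cero": int(as_number.eq(0).sum()),             # values equal to zero
        "null": int(s.isna().sum()),                    # missing values
    }


numbers = basic_counts(pd.Series([1, 2, 2, 0, None]))
words = basic_counts(pd.Series(["a", "", "a", None]))
print(numbers)
print(words)
```

Note that pd.to_numeric also treats the string "0" as zero; whether the library does the same is not stated in the description.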

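As an illustration of the email probability idea, one simple approach is to report the share of non-null values that look like an email address. The sketch below is an assumption about how such a score might be computed, not the library's method: the name email_share and the regular expression are hypothetical, and the pattern only checks for a single @ followed by a domain with a dot and a suffix of letters.

```python
import re

import pandas as pd

# Deliberately simple pattern (assumption): text, one "@", a domain, a dot, a letter suffix.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")


def email_share(s: pd.Series) -> float:
    """Percentage of non-null values that match the simple email pattern."""
    values = s.dropna().astype(str).str.strip()
    if values.empty:
        return 0.0
    matches = values.map(lambda v: bool(EMAIL_RE.match(v)))
    return round(float(100.0 * matches.mean()), 2)


emails = pd.Series(["ana@example.com", "luis@example.org", "not-an-email", None])
share = email_share(emails)
print(share)
```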
Install requires: pandas, NumPy, PrettyTable

Functions: dataprofile(DF): this is the main function. It takes a DataFrame as input and returns another DataFrame with all the features described above.

Example:

  1. The first step is to install the library using pip install dataprofile.

  2. The second step is to import the dataprofile library: import dataprofile as dp.

  3. The third step is to create or import a DataFrame. In this case, read_csv from pandas is used to import a CSV file and create a DataFrame.

  4. The fourth step is to call the dataprofile function on the DataFrame: dp.dataprofile(DataFrame).
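Put together, the steps above might look like the following sketch. The sample CSV text and its column names are made up for illustration and stand in for a real file on disk; the import is guarded only so that the snippet still runs where the package has not been installed.

```python
import io

import pandas as pd

try:
    import dataprofile as dp  # step 1 and 2: pip install dataprofile, then import
except ImportError:
    dp = None  # package not installed in this environment

# Step 3: build a DataFrame with read_csv (hypothetical in-memory sample data).
csv_text = "id,email,amount\n1,ana@example.com,10\n2,luis@example.com,0\n3,,5\n"
df = pd.read_csv(io.StringIO(csv_text))

# Step 4: profile the DataFrame; per the description, the result is another DataFrame.
if dp is not None:
    profile = dp.dataprofile(df)
    print(profile)
```

With a real file, the read_csv call would take the file path instead of the in-memory buffer.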

Download files

Source Distribution

dataprofile-1.0.1.tar.gz (6.3 kB)

Uploaded Source

Built Distribution

dataprofile-1.0.1-py3-none-any.whl (7.0 kB)

Uploaded Python 3

File details

Details for the file dataprofile-1.0.1.tar.gz.

File metadata

  • Download URL: dataprofile-1.0.1.tar.gz
  • Upload date:
  • Size: 6.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.3

File hashes

Hashes for dataprofile-1.0.1.tar.gz
Algorithm Hash digest
SHA256 05dee6a7f1aed09bbfd7e5f3d01f1be8fda22b8475b435a53e300c7cb4b87173
MD5 2670d00c0c6e87854b6d6885f64d0063
BLAKE2b-256 f29747b010d8ce0036ec32bc4a2c81ea64829bfc4a95ddeb3c7665777e1b8368

File details

Details for the file dataprofile-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: dataprofile-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 7.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.3

File hashes

Hashes for dataprofile-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 918b08112301f8f12a2214d75bf664236eade30bdda8fe6623c022779ab86940
MD5 9f57db45a077d5f7bc38dae9d8873dd2
BLAKE2b-256 8cae67847940b65d617cb36da75ea043fbe8b6feed5cc2b8cacd7a8f1f6663d3
