Pandas categorical profiling. Generates an HTML profile report for a categorical dataset and provides several handy helper functions.

Project description

pandas-cat

pandas-cat is a categorical profiling library for Pandas.

The name pandas-cat is an abbreviation of PANDAS-CATegorical profiling. The package profiles categorical attributes and can optionally adjust the dataset, e.g. by estimating whether a variable is numeric and ordering its categories by the numbers they contain.

pandas-cat in more detail

The package creates an HTML profile of a categorical dataset. It supports both ordinal (ordered) and nominal categories. It also handles issues that are typical of categorical data, especially ordered data: categories that are de facto numbers, or numbers with some added text, and that should therefore be treated as ordered.

For example, in the Accidents dataset, the attribute Hit_Objects_in can be used as:

  • unordered: 0.0 10.0 7.0 11.0 4.0 2.0 8.0 1.0 9.0 6.0 5.0 12.0 nan
  • ordered (sorted as text): 0.0 1.0 10.0 11.0 12.0 2.0 4.0 5.0 6.0 7.0 8.0 9.0 nan
  • as the analyst wishes (what the package does): 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0 nan
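The difference between text ordering and numeric ordering can be reproduced with plain pandas. The following is a minimal sketch, assuming the labels are stored as strings; it does not show how pandas-cat works internally, only the effect the package aims for.

```python
import pandas as pd

# Labels as they might arrive when numeric codes are stored as text.
labels = ["0.0", "10.0", "7.0", "11.0", "4.0", "2.0", "8.0", "1.0",
          "9.0", "6.0", "5.0", "12.0"]

# Plain string sorting compares character by character,
# so "10.0" ends up before "2.0".
lexicographic = sorted(labels)

# Sorting by the numeric value of each label gives the expected order.
numeric = sorted(labels, key=float)

# An ordered categorical dtype makes pandas sort by that category order.
dtype = pd.CategoricalDtype(categories=numeric, ordered=True)
series = pd.Series(labels, dtype=dtype)

print(lexicographic)
print(numeric)
print(series.sort_values().tolist())
```

Once the column has an ordered categorical dtype, sorting and comparisons follow the category order rather than the text order.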

Typical issues (labels that look like numbers but are not plain numbers):

  • categories are intervals (like 75-100, 101-200)
  • categories carry additional information (e.g. Over 75, 60+, <18, Under 16)
  • an explicitly coded n/a category is sorted in with the data
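One way to handle such labels is to sort them by the first number they contain. The sketch below illustrates that idea only; `label_sort_key` is a hypothetical helper, not part of pandas-cat, and the rules it applies (labels without digits go last, "under"-style labels go before an interval starting at the same number) are assumptions made for this example.

```python
import re

# Prefixes that mean "below the number" (an assumption for this sketch).
BELOW_PREFIXES = ("<", "under", "below")


def label_sort_key(label):
    """Return a sort key based on the first number found in the label.

    Hypothetical helper for illustration; not the pandas-cat implementation.
    """
    text = str(label).strip().lower()
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        # No number at all (e.g. "n/a"): place the label at the end.
        return (1, 0.0, 0)
    value = float(match.group())
    # "Under 16" should come before "16-20", so break ties in its favour.
    tie_break = -1 if text.startswith(BELOW_PREFIXES) else 0
    return (0, value, tie_break)


intervals = ["101-200", "75-100"]
ages = ["Over 75", "n/a", "16-20", "Under 16", "21-60", "61-75"]

print(sorted(intervals, key=label_sort_key))
print(sorted(ages, key=label_sort_key))
```

A plain string sort would put 101-200 before 75-100; the numeric key keeps the intervals in their natural order and moves the n/a label to the end.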

Therefore this library provides profiling as well as semi-automatic data preparation.

Currently, there are two methods in place:

  • profile -- profiles a dataset, categories and their correlations
  • prepare -- prepares a dataset, tries to understand label names (if they are numbers) and sort them

Installation

You can install the package using

pip install pandas-cat

Usage

Using the package is simple. Sample code follows (it uses the Accidents dataset, which is based on a Kaggle dataset):

import pandas as pd
from pandas_cat import pandas_cat

# Read the dataset. You can also download it and point the path to a local file.
df = pd.read_csv('https://petrmasa.com/pandas-cat/data/accidents.zip', encoding='cp1250', sep='\t')

# Use only selected columns.
df = df[['Driver_Age_Band', 'Driver_IMD', 'Sex', 'Journey']]

# The longer demo report uses this set of columns instead of the first one.
# df = df[['Driver_Age_Band','Driver_IMD','Sex','Journey','Hit_Objects_in','Hit_Objects_off','Casualties','Severity','Area','Vehicle_Age','Road_Type','Speed_limit','Light','Vehicle_Location','Vehicle_Type']]

# For profiling, use the following code.
pandas_cat.profile(df=df, dataset_name="Accidents", opts={"auto_prepare": True})

# To only adjust the dataset, use the following code.
df = pandas_cat.prepare(df)

Data and sample reports

Sample reports are available here: basic and longer. Note that these reports were generated with the code above.

The dataset is downloaded from the web each time you run the code. If you prefer, you can download the sample dataset here and store it locally.
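To avoid the repeated download, the file can be fetched once and reused. This is a minimal sketch using only the standard library; the helper name `cached_dataset` and the local file name `accidents.zip` are assumptions for this example and are not provided by pandas-cat.

```python
from pathlib import Path
from urllib.request import urlretrieve

DATA_URL = "https://petrmasa.com/pandas-cat/data/accidents.zip"


def cached_dataset(url=DATA_URL, local_path="accidents.zip", fetch=urlretrieve):
    """Download url to local_path on first use and return the local path.

    Hypothetical helper; `fetch` can be swapped out, e.g. for testing.
    """
    path = Path(local_path)
    if not path.exists():
        # Only hit the network when no local copy is present.
        fetch(url, str(path))
    return path
```

The returned path can be passed to the same call as in the sample code, for example `pd.read_csv(cached_dataset(), encoding='cp1250', sep='\t')`.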

Download files

Source Distribution

pandas-cat-0.1.2.tar.gz (14.2 kB)

Uploaded Source

Built Distribution

pandas_cat-0.1.2-py3-none-any.whl (12.8 kB)

Uploaded Python 3

File details

Details for the file pandas-cat-0.1.2.tar.gz.

File metadata

  • Download URL: pandas-cat-0.1.2.tar.gz
  • Upload date:
  • Size: 14.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.5

File hashes

Hashes for pandas-cat-0.1.2.tar.gz

Algorithm    Hash digest
SHA256       dc1e48791c1eba6a0be0f05f9278b3656d55022988d0b330587dcc9142f21c39
MD5          28ca03a038a26c5d16a72263999ad9a1
BLAKE2b-256  eb26b58f0212c851f3959117870c4c0fead2acf94692a8e56f3e9cbd73d25752
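A downloaded file can be checked against the published SHA256 digest with the standard library. The sketch below assumes the source distribution has been saved locally; the helper names `sha256_of` and `matches_published_hash` are made up for this example.

```python
import hashlib

# SHA256 digest listed above for pandas-cat-0.1.2.tar.gz.
EXPECTED_SHA256 = "dc1e48791c1eba6a0be0f05f9278b3656d55022988d0b330587dcc9142f21c39"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals the expected digest."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_hash("pandas-cat-0.1.2.tar.gz")` should return True for an unmodified copy of the source distribution.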

File details

Details for the file pandas_cat-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: pandas_cat-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 12.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.5

File hashes

Hashes for pandas_cat-0.1.2-py3-none-any.whl

Algorithm    Hash digest
SHA256       525f8b5e2fd19ae604732a3a2a26f20be94398a176e9f7173bdbb6ea969b4fd4
MD5          4d6889357d6ce6326de7ddc9eaf9eb06
BLAKE2b-256  1db8812042e6866317df4533539658dc7ed08ea6bccfb6cf960ff2f746cf276c
