Pandas categorical profiling. Generates an HTML profile report for a categorical dataset. Also provides several handy functions.

These details have not been verified by PyPI

Project links

Homepage

Project description

pandas-cat

pandas-cat is a categorical profiling library for pandas.

pandas-cat is an abbreviation of PANDAS-CATegorical profiling. The package provides a profile of categorical attributes as well as (optional) adjustments to the dataset, e.g. estimating whether a variable is numeric and ordering its categories numerically.

pandas-cat in more detail

The package creates an HTML profile of a categorical dataset. It supports both ordinal (ordered) and nominal categories. Moreover, it overcomes typical issues with categorical data, mainly ordered data, as they are usually delivered: categories that are de facto numbers, or numbers with some enhancement, and should be treated as ordered.

For example, in the Accidents dataset, the attribute Hit Objects in can be used as:

unordered: 0.0 10.0 7.0 11.0 4.0 2.0 8.0 1.0 9.0 6.0 5.0 12.0 nan
ordered (alphabetically): 0.0 1.0 10.0 11.0 12.0 2.0 4.0 5.0 6.0 7.0 8.0 9.0 nan
as the analyst wishes (what the package does): 0.0 1.0 2.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 nan
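The difference between the two orderings can be reproduced in plain Python. The snippet below is only a minimal sketch of the idea, not the package's own code; the helper name numeric_key is made up for this illustration.

```python
import math

# Category labels as they appear in the Hit Objects in example.
labels = ["0.0", "10.0", "7.0", "11.0", "4.0", "2.0", "8.0",
          "1.0", "9.0", "6.0", "5.0", "12.0", "nan"]


def numeric_key(label):
    """Sort key (hypothetical helper): numeric value first, nan last."""
    value = float(label)
    missing = math.isnan(value)
    return (missing, 0.0 if missing else value)


as_strings = sorted(labels)                   # alphabetical order: "10.0" < "2.0"
as_numbers = sorted(labels, key=numeric_key)  # numeric order, nan at the end

print(as_strings)
print(as_numbers)
```

Sorting the labels as strings puts 10.0 before 2.0, while the numeric key gives the order an analyst would usually expect.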

Typical issues (labels that stand for numbers but are not plain numbers):

categories are intervals (like 75-100, 101-200)
categories carry some additional information (e.g. Over 75, 60+, <18, Under 16)
an n/a category is explicitly coded and sorted along with the other categories
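One possible way to order such labels is to sort them by the first number they contain and to push missing markers to the end. The sketch below illustrates that idea under stated assumptions; it is not how pandas-cat is implemented, and the names MISSING and label_sort_key as well as the set of missing markers are hypothetical.

```python
import re

# Labels treated as missing; this set is an assumption made for the sketch.
MISSING = {"n/a", "na", "nan", ""}


def label_sort_key(label):
    """Order a label by the first number it contains (hypothetical helper).

    Labels without any number sort after numeric ones; missing markers last.
    """
    text = str(label).strip()
    lowered = text.lower()
    if lowered in MISSING:
        return (2, 0.0, text)
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        return (1, 0.0, text)
    number = float(match.group())
    # Nudge open-ended labels so "<18" precedes "18-30" and "Over 100" follows "75-100".
    if lowered.startswith(("<", "under")):
        number -= 0.5
    elif lowered.startswith((">", "over")) or lowered.endswith("+"):
        number += 0.5
    return (0, number, text)


labels = ["75-100", "n/a", "Over 100", "<18", "18-30", "31-74"]
ordered = sorted(labels, key=label_sort_key)
print(ordered)
```

The half-step nudge is a deliberate simplification: it keeps open-ended categories next to the boundary they refer to without having to parse interval end points.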

Therefore, this library provides profiling as well as semi-automatic data preparation.

Currently, there are two methods in place:

profile -- profiles a dataset, categories and their correlations
prepare -- prepares a dataset, tries to understand label names (if they are numbers) and sorts them

Installation

You can install the package using:

pip install pandas-cat

Usage

Using this package is simple. Sample code follows (it uses the Accidents dataset, which is based on a Kaggle dataset):

import pandas as pd
from pandas_cat import pandas_cat

# Read the dataset. You can also download it and point the path to a local file.
df = pd.read_csv('https://petrmasa.com/pandas-cat/data/accidents.zip', encoding='cp1250', sep='\t')

# Use only selected columns.
df = df[['Driver_Age_Band', 'Driver_IMD', 'Sex', 'Journey']]

# The longer demo report uses this set of columns instead of the first one.
# df = df[['Driver_Age_Band', 'Driver_IMD', 'Sex', 'Journey', 'Hit_Objects_in', 'Hit_Objects_off', 'Casualties', 'Severity', 'Area', 'Vehicle_Age', 'Road_Type', 'Speed_limit', 'Light', 'Vehicle_Location', 'Vehicle_Type']]

# For profiling, use the following code.
pandas_cat.profile(df=df, dataset_name="Accidents", opts={"auto_prepare": True})

# To just adjust the dataset, use the following code.
df = pandas_cat.prepare(df)
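To see what an ordered categorical column looks like in plain pandas, the following sketch sorts numeric-looking labels by value and stores the column with that category order. It does not call pandas-cat and makes no claim about what prepare returns; the toy DataFrame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy column: numeric labels stored as strings.
df = pd.DataFrame({"Speed_limit": ["30", "100", "50", "30", "70", "100"]})

# Sort the distinct labels by numeric value rather than alphabetically.
ordered_labels = sorted(df["Speed_limit"].unique(), key=float)

# Store the column as an ordered categorical with that category order.
dtype = pd.CategoricalDtype(categories=ordered_labels, ordered=True)
df["Speed_limit"] = df["Speed_limit"].astype(dtype)

print(list(df["Speed_limit"].cat.categories))
print(df["Speed_limit"].min(), df["Speed_limit"].max())
```

With an ordered dtype, comparisons, min and max follow the category order instead of the alphabetical order of the labels.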

Data and sample reports

Two sample reports are available: a basic one and a longer one. Both were generated with the code above.

The dataset is downloaded from the web each time you run the code. If you prefer, you can download the sample dataset (https://petrmasa.com/pandas-cat/data/accidents.zip) and store it locally.

Project details

Release history

0.1.2 (Dec 27, 2023)
0.1.1 (Jun 25, 2023), this version
0.1.0 (Apr 19, 2023)

Download files

Download the file for your platform.

Source Distribution

pandas-cat-0.1.1.tar.gz (13.3 kB)

Uploaded Jun 25, 2023 Source

Built Distribution

pandas_cat-0.1.1-py3-none-any.whl (12.0 kB)

Uploaded Jun 25, 2023 Python 3

Hashes for pandas-cat-0.1.1.tar.gz

Algorithm	Hash digest
SHA256	`25b91561a8c76865a5a5a9eb52d71690d85074c56d7d0d1b9ba75d2c914a4602`
MD5	`0647371a413444f02dc3f2136d175eec`
BLAKE2b-256	`1069a1b30afc451440893eab279367124b507897463efe1a95301b0fcc5e0c7d`

Hashes for pandas_cat-0.1.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`1d2f2f271b3b501c5bed5b99d3cb18570a211aa48ccb4347a04a44f69654ea44`
MD5	`d2f7c950d4a5830c841234d2197de4e9`
BLAKE2b-256	`21f992e9dc98837b714b5bac94f842fca9017fd02114ccf4e2ab8ed5dc0a8e78`