Pandas categorical utils - lightweight automatic category ordering, missing category detection, profiling - attribute frequencies, correlations, category correlations.

pandas-cat

pandas-cat is a library of utilities for categorical data in Pandas.

The name pandas-cat stands for PANDAS-CATegorical utils. The package provides:

automatic ordering for ordinal variables - a lightweight module that converts string categories to ordered ones where possible (based on numbers inside the text, like "Over 25")
advanced missing value detection - detection of typical missing-data encodings (typical = encodings that we manually identified in more than 100 datasets)
categorical data profiling - a profile for categorical attributes

The pandas-cat in more detail

Ordinal data ordering

This package tries to convert strings to ordered categories. For example, for Vehicle_Age in the Accidents dataset:

ORIGINAL (unordered)                                           : 1 6 5 4 9 14 >20 10 8 15 12 11 16-20 3 13 2 7   
ALPHABETICALLY ORDERED (strings do not allow numeric ordering) : >20 1 10 11 12 13 14 15 16-20 2 3 4 5 6 7 8 9 
AS ANALYST WISHES (package does)                               : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16-20 >20

Typical issues (the values look like numbers but are not plain numbers):

categories are intervals (like 75-100, 101-200)
a category carries additional information (e.g. Over 75, 60+, <18, Under 16)
n/a or another string category is explicitly coded in the data among the values to be sorted
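
A minimal sketch of how labels like these could be ordered by the numbers inside them. The helper ordinal_key and its marker lists are hypothetical and written for illustration only; they are not the package's actual implementation, which may cover more cases.

```python
import re

# Hypothetical helper, not part of pandas-cat: builds a sort key from the
# first number found inside a category label.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")
_LOWER_MARKS = ("<", "under")      # open-ended "less than" labels
_UPPER_MARKS = (">", "over", "+")  # open-ended "more than" labels


def ordinal_key(label):
    text = str(label).strip().lower()
    match = _NUMBER.search(text)
    if match is None:
        # Labels without any number (e.g. "n/a") sort after all others.
        return (1, 0.0, 0, text)
    value = float(match.group())
    # Nudge open-ended labels: "<18" goes before "18", ">20" after "20".
    if any(mark in text for mark in _LOWER_MARKS):
        nudge = -1
    elif any(mark in text for mark in _UPPER_MARKS):
        nudge = 1
    else:
        nudge = 0
    return (0, value, nudge, text)


# The Vehicle_Age categories from the example above, in their original order.
raw = ["1", "6", "5", "4", "9", "14", ">20", "10", "8", "15",
       "12", "11", "16-20", "3", "13", "2", "7"]
ordered = sorted(raw, key=ordinal_key)
print(" ".join(ordered))
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16-20 >20
```

A list ordered this way could then be passed to pd.Categorical(values, categories=ordered, ordered=True) to obtain an ordered categorical column.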

Missing detection and replacement

Missing values are typically encoded in many ways (N/A, #N/A, NA, invalid, NO INFO, NOT RATED, NOT GIVEN, NOT DEFINED, undefined, ...). We manually went through more than 100 datasets with missing values, identified the typical encodings, and added them to this library as automatic missing-value detection and replacement.

The built-in list can easily be extended by passing additional strings as parameters. Conversely, when a string should be preserved, the package can be instructed to keep it.
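
The idea of an extendable list with exceptions can be sketched in plain Python. The names DEFAULT_MISSING, missing_tokens and replace_missing are hypothetical, the token set contains only the examples quoted above (the package's built-in list is larger and is not reproduced here), and case-insensitive matching is an assumption of this sketch.

```python
# Illustrative subset only: the encodings quoted in the text above.
DEFAULT_MISSING = {
    "n/a", "#n/a", "na", "invalid", "no info",
    "not rated", "not given", "not defined", "undefined",
}


def missing_tokens(extra=(), keep=()):
    """Return the tokens treated as missing: defaults plus extras, minus kept ones."""
    tokens = DEFAULT_MISSING | {s.strip().lower() for s in extra}
    return tokens - {s.strip().lower() for s in keep}


def replace_missing(values, extra=(), keep=()):
    """Replace every string that encodes a missing value with None."""
    tokens = missing_tokens(extra, keep)
    return [
        None if isinstance(v, str) and v.strip().lower() in tokens else v
        for v in values
    ]


column = ["N/A", "Male", "NA", "MyNA", "not given"]
cleaned = replace_missing(column, extra=["MyNA"], keep=["NA"])
print(cleaned)
# [None, 'Male', 'NA', None, None]
```

Here "MyNA" is added to the list and "NA" is kept as a regular category, mirroring the two adjustments described above.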

Profiling

The package creates an HTML profile of a categorical dataset. It supports both ordinal (ordered) and nominal categories. Currently, two templates are available:

standard - standard template with embedded charts
interactive - interactive template with dynamically generated charts

The report contains:

an overview - basic information about the dataset, its memory consumption and category names
profiles - attribute profiles: charts with the frequencies of all categories (not just the top n, as is typical for general-purpose profiling packages)
correlations - correlations between attributes and between their values

Installation and using the package

You can install the package using:

pip install pandas-cat

To load your dataset into a Pandas DataFrame, you can use the read_csv() method for CSV files or the read_excel() method for Excel files. Both methods support a parameter called keep_default_na, which you can set to False. This stops Pandas from detecting missing values itself and leaves the job to pandas-cat, which offers a much more comprehensive detection system that includes all the values Pandas detects. For faster report generation, you can select specific columns for analysis by filtering them directly in Pandas.
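
The effect of keep_default_na can be checked on a tiny in-memory CSV. The sample data below is made up for illustration; only pd.read_csv and its keep_default_na parameter come from Pandas.

```python
import io

import pandas as pd

# Made-up sample: "NA" and "N/A" are on Pandas' default list of missing values.
csv_text = "Sex,Journey\nMale,NA\nN/A,Commuting\n"

# Default behaviour: Pandas converts both strings to NaN while reading.
default = pd.read_csv(io.StringIO(csv_text))

# keep_default_na=False: the strings are kept as they are, so a later
# detection step still sees the original encodings.
raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(int(default.isna().sum().sum()))  # 2
print(int(raw.isna().sum().sum()))      # 0
print(raw["Journey"].tolist())          # ['NA', 'Commuting']
```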

Sample Code

import pandas as pd
from pandas_cat import pandas_cat

# Read dataset. You can download it and set up a path to the local file.
df = pd.read_csv('https://petrmasa.com/pandas-cat/data/accidents.zip',
                 encoding='cp1250', sep='\t')

# Use only selected columns
df = df[['Driver_Age_Band', 'Driver_IMD', 'Sex', 'Journey']]

# Generate a profile report with the default template
pandas_cat.profile(df=df, dataset_name="Accidents", opts={"auto_prepare": True})

For a longer demo report, use this set of columns instead of the one above:

df = df[['Driver_Age_Band','Driver_IMD','Sex','Journey','Hit_Objects_in','Hit_Objects_off','Casualties','Severity','Area','Vehicle_Age','Road_Type','Speed_limit','Light','Vehicle_Location','Vehicle_Type']]

To generate an interactive report, set the template to interactive:

pandas_cat.profile(df=df, dataset_name="Accidents", template="interactive", opts={"auto_prepare": True})

For advanced customization, use additional options

pandas_cat.profile(
    df=df,
    dataset_name="Accidents",
    template="interactive",
    opts={
        "auto_prepare": True,
        "cat_limit": 60,  # Maximum categories for profiling
        "na_values": ["MyNA", "MyNull"],  # Custom missing values
        "na_ignore": ["NA"],  # Exclude specific values from missing detection
        "keep_default_na": True  # Use the built-in list of default missing values
    }
)

To prepare the dataset only, without generating a report:

df = pandas_cat.prepare(df)

Data and sample reports

Sample reports are here

The sample code downloads the dataset from the web each time it runs. If you prefer, you can download the sample dataset from the URL used in the sample code and store it locally.

Credits

Petr Masa - Base package, basic data preparation

Jan Nejedly - Interactive report, handling missing values

Release history

0.1.5 - May 31, 2026
0.1.4 - Apr 9, 2026 (this version)
0.1.3 - Jan 1, 2025
0.1.2 - Dec 27, 2023
0.1.1 - Jun 25, 2023
0.1.0 - Apr 19, 2023

Download files

Source Distribution

pandas_cat-0.1.4.tar.gz (47.7 kB)

Uploaded Apr 9, 2026 Source

Built Distribution

pandas_cat-0.1.4-py3-none-any.whl (57.5 kB)

Uploaded Apr 9, 2026 Python 3

File details

Details for the file pandas_cat-0.1.4.tar.gz.

File metadata

Download URL: pandas_cat-0.1.4.tar.gz
Upload date: Apr 9, 2026
Size: 47.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for pandas_cat-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`5cdd19d30821e80455f4d00b241de129e9176431cb91feca2d00d268fb902db4`
MD5	`608f09a80b50af3708f389700da04c57`
BLAKE2b-256	`2cdce41391ac25d97c8fc967dfc1acf7c672f5c6d312ecc675b5d3373786ea99`

File details

Details for the file pandas_cat-0.1.4-py3-none-any.whl.

File metadata

Download URL: pandas_cat-0.1.4-py3-none-any.whl
Upload date: Apr 9, 2026
Size: 57.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for pandas_cat-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`037606b74e176bbb2fe2062a6d108c3675a476ef657e60bd3b33b959ee70a787`
MD5	`1289e8105a8b511f856947042583be5a`
BLAKE2b-256	`1fa51904a96d103f46e13497957b3b5a07b3d7b6b940ec824347b402f8e9cdc8`
