pandas-tabulate
Python implementation of Stata's tabulate command for pandas DataFrames.
pandas-tabulate brings Stata's tabulate command to Python. It builds one-way frequency tables and two-way cross-tabulations, with optional statistical tests, directly from pandas Series and DataFrames, using syntax familiar to Stata users.
Key Features
- Comprehensive tabulation: One-way and two-way frequency tables
- Statistical analysis: Chi-square, Fisher's exact, and likelihood-ratio tests, plus Cramér's V
- Flexible formatting: Multiple output formats and customization options
- Missing value handling: Configurable treatment of missing data
- Stata compatibility: Familiar syntax and output format for Stata users
- Performance optimized: Efficient implementation using pandas and NumPy
Installation
```bash
pip install pandas-tabulate
```
Quick Start
```python
import pandas as pd
import pandas_tabulate as ptab

# Create sample data
df = pd.DataFrame({
    'gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F'],
    'education': ['High', 'Low', 'High', 'High', 'Low', 'Low', 'High', 'Low'],
    'income': [50000, 30000, 60000, 45000, 35000, 25000, 55000, 28000]
})

# One-way tabulation
result = ptab.tabulate(df['gender'])
print(result)

# Two-way tabulation with statistics
result = ptab.tabulate(df['gender'], df['education'],
                       chi2=True, exact=True)
print(result)
```
Available Functions
Core Tabulation Functions
- `tabulate(var1, var2=None, **kwargs)` - Main tabulation function
- `oneway(variable, **kwargs)` - One-way frequency tables
- `twoway(var1, var2, **kwargs)` - Two-way cross-tabulation
Statistical Tests
- Chi-square test - Test of independence for categorical variables
- Fisher's exact test - Exact test of independence, suited to small samples
- Likelihood ratio test - Alternative test of independence
- Cramér's V - Measure of association strength
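For reference, each of these measures corresponds to a standard SciPy routine. The sketch below computes them directly with pandas and SciPy on the Quick Start data; it illustrates the statistics themselves and is not necessarily how pandas-tabulate computes them internally.

```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "gender": ["M", "F", "M", "F", "M", "F", "M", "F"],
    "education": ["High", "Low", "High", "High", "Low", "Low", "High", "Low"],
})

# Observed contingency table of counts (rows: gender, columns: education)
observed = pd.crosstab(df["gender"], df["education"])

# Pearson chi-square test of independence, without Yates' continuity correction
chi2, p_chi2, dof, expected = stats.chi2_contingency(observed, correction=False)

# Likelihood-ratio (G) test: same routine, log-likelihood statistic
g_stat, p_g, _, _ = stats.chi2_contingency(
    observed, correction=False, lambda_="log-likelihood"
)

# Fisher's exact test, applied here to a 2x2 table
odds_ratio, p_fisher = stats.fisher_exact(observed.to_numpy())

# Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
n = observed.to_numpy().sum()
n_rows, n_cols = observed.shape
cramers_v = np.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))

print(f"chi2={chi2:.3f} (p={p_chi2:.3f}, df={dof}), G={g_stat:.3f} (p={p_g:.3f})")
print(f"Fisher p={p_fisher:.3f}, Cramér's V={cramers_v:.3f}")
```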
Output Options
- Frequencies - Raw counts
- Percentages - Row, column, and total percentages
- Cumulative - Cumulative frequencies and percentages
- Missing handling - Include/exclude missing values
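The difference between row, column, and total percentages can be seen with plain `pd.crosstab` and its `normalize` argument. This is a minimal pandas sketch of the underlying arithmetic, not the pandas-tabulate output format:

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "F", "M", "F", "M", "F", "M", "F"],
    "education": ["High", "Low", "High", "High", "Low", "Low", "High", "Low"],
})

# Raw counts, with row and column totals in the "All" margins
counts = pd.crosstab(df["gender"], df["education"], margins=True)

# Row percentages: each row sums to 100
row_pct = pd.crosstab(df["gender"], df["education"], normalize="index") * 100

# Column percentages: each column sums to 100
col_pct = pd.crosstab(df["gender"], df["education"], normalize="columns") * 100

# Total percentages: all cells together sum to 100
total_pct = pd.crosstab(df["gender"], df["education"], normalize="all") * 100

print(counts, row_pct, col_pct, total_pct, sep="\n\n")
```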
Detailed Examples
One-way Tabulation
```python
import pandas as pd
import pandas_tabulate as ptab

# Basic frequency table
df = pd.DataFrame({'status': ['A', 'B', 'A', 'C', 'B', 'A', 'C']})
result = ptab.oneway(df['status'])
print(result)

# With percentages and cumulative statistics
result = ptab.oneway(df['status'],
                     percent=True,
                     cumulative=True)
print(result)
```
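The frequency, percentage, and cumulative columns requested above can be reproduced with `value_counts` and `cumsum`. This sketch mirrors the Freq./Percent/Cum. columns of Stata's one-way output; the exact layout pandas-tabulate prints may differ.

```python
import pandas as pd

status = pd.Series(["A", "B", "A", "C", "B", "A", "C"], name="status")

# Frequencies, sorted by category label as in Stata's output
freq = status.value_counts().sort_index()

table = pd.DataFrame({
    "Freq.": freq,
    "Percent": freq / freq.sum() * 100,
})
# Cumulative percentage down the rows; the last row reaches 100
table["Cum."] = table["Percent"].cumsum()

print(table.round(2))
```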
Two-way Cross-tabulation
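```python
import pandas as pd
import pandas_tabulate as ptab

# Sample data (the gender/education columns from Quick Start; the one-way
# example above replaced df with a single 'status' column)
df = pd.DataFrame({
    'gender': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F'],
    'education': ['High', 'Low', 'High', 'High', 'Low', 'Low', 'High', 'Low']
})

# Basic cross-tabulation
result = ptab.twoway(df['gender'], df['education'])
print(result)

# With row and column percentages
result = ptab.twoway(df['gender'], df['education'],
                     row_percent=True,
                     col_percent=True)
print(result)

# With statistical tests
result = ptab.twoway(df['gender'], df['education'],
                     chi2=True,
                     exact=True,
                     cramers_v=True)
print(result)
```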
Missing Value Handling
```python
import numpy as np
import pandas as pd
import pandas_tabulate as ptab

# Data with missing values
df_missing = pd.DataFrame({
    'var1': ['A', 'B', np.nan, 'A', 'C'],
    'var2': ['X', np.nan, 'Y', 'X', 'Y']
})

# Exclude missing values (default)
result = ptab.twoway(df_missing['var1'], df_missing['var2'])

# Include missing values
result = ptab.twoway(df_missing['var1'], df_missing['var2'],
                     missing=True)
```
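Stata's `missing` option treats missing values as a category of their own. In plain pandas, one way to get the same effect is to fill missing entries with a label before cross-tabulating. This is a sketch of the idea only; the `"(missing)"` label is an arbitrary placeholder chosen here, and pandas-tabulate may label missing values differently.

```python
import numpy as np
import pandas as pd

df_missing = pd.DataFrame({
    "var1": ["A", "B", np.nan, "A", "C"],
    "var2": ["X", np.nan, "Y", "X", "Y"],
})

# Default behaviour: rows where either variable is missing are dropped
excluded = pd.crosstab(df_missing["var1"], df_missing["var2"])

# Keep missing values as their own category by filling them with a label
# ("(missing)" is a hypothetical placeholder for this sketch)
filled = df_missing.fillna("(missing)")
included = pd.crosstab(filled["var1"], filled["var2"])

print(excluded, included, sep="\n\n")
```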
Stata to Python Translation Guide
| Stata Command | pandas-tabulate Equivalent |
|---|---|
| `tabulate var1` | `ptab.oneway(df['var1'])` |
| `tabulate var1, missing` | `ptab.oneway(df['var1'], missing=True)` |
| `tabulate var1 var2` | `ptab.twoway(df['var1'], df['var2'])` |
| `tabulate var1 var2, chi2` | `ptab.twoway(df['var1'], df['var2'], chi2=True)` |
| `tabulate var1 var2, exact` | `ptab.twoway(df['var1'], df['var2'], exact=True)` |
| `tabulate var1 var2, row col` | `ptab.twoway(df['var1'], df['var2'], row_percent=True, col_percent=True)` |
Function Reference
tabulate(var1, var2=None, **kwargs)
Main tabulation function. It performs a one-way tabulation when only `var1` is given and a two-way cross-tabulation when `var2` is also supplied.
Parameters:
- `var1`: pandas Series - First variable
- `var2`: pandas Series, optional - Second variable for cross-tabulation
- `percent`: bool, default False - Show percentages
- `cumulative`: bool, default False - Show cumulative statistics
- `chi2`: bool, default False - Perform chi-square test
- `exact`: bool, default False - Perform Fisher exact test
- `missing`: bool, default False - Include missing values
Returns:
- TabulationResult object with tables and statistics
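The one-way/two-way dispatch can be pictured as a simple branch on whether a second variable was passed. The sketch below assumes that behaviour; `tabulate_sketch` is a hypothetical name, not part of pandas-tabulate, and it returns plain DataFrames rather than a `TabulationResult`.

```python
from typing import Optional

import pandas as pd


def tabulate_sketch(var1: pd.Series, var2: Optional[pd.Series] = None) -> pd.DataFrame:
    """Hypothetical dispatcher: one-way counts if var2 is None, else a two-way table."""
    if var2 is None:
        # One-way tabulation: frequency of each category, sorted by label
        return var1.value_counts().sort_index().to_frame(name="Freq.")
    # Two-way tabulation: counts for each (var1, var2) combination
    return pd.crosstab(var1, var2)


df = pd.DataFrame({"gender": ["M", "F", "M"], "education": ["High", "Low", "Low"]})
one = tabulate_sketch(df["gender"])
two = tabulate_sketch(df["gender"], df["education"])
print(one, two, sep="\n\n")
```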
Statistical Tests
All statistical tests return results with:
- Test statistic
- p-value
- Degrees of freedom (where applicable)
- Critical value
- Interpretation
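As an illustration of how these fields relate, the sketch below runs a chi-square test with SciPy and derives the critical value as the (1 - alpha) quantile of the chi-square distribution. `StatTestResult` and `chi2_test_sketch` are hypothetical names, and the 0.05 significance level and interpretation wording are assumptions; pandas-tabulate's own result object may be structured differently.

```python
from dataclasses import dataclass

import pandas as pd
from scipy import stats


@dataclass
class StatTestResult:
    """Hypothetical container mirroring the fields listed above."""
    statistic: float
    p_value: float
    dof: int
    critical_value: float
    interpretation: str


def chi2_test_sketch(observed: pd.DataFrame, alpha: float = 0.05) -> StatTestResult:
    stat, p, dof, _ = stats.chi2_contingency(observed, correction=False)
    # Critical value: the (1 - alpha) quantile of the chi-square distribution
    critical = stats.chi2.ppf(1 - alpha, dof)
    if p < alpha:
        verdict = f"Reject independence at the {alpha:.0%} level"
    else:
        verdict = f"No evidence against independence at the {alpha:.0%} level"
    return StatTestResult(float(stat), float(p), int(dof), float(critical), verdict)


# Gender-by-education counts from the Quick Start data
observed = pd.DataFrame([[1, 3], [3, 1]], index=["F", "M"], columns=["High", "Low"])
result = chi2_test_sketch(observed)
print(result)
```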
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
```bash
git clone https://github.com/brycewang-stanford/pandas-tabulate.git
cd pandas-tabulate
pip install -e ".[dev]"
python -m pytest tests/
```
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Inspired by Stata's tabulate command
- Built on pandas, NumPy, and SciPy
- Thanks to the open-source community for feedback and contributions
Support
- Bug Reports: GitHub Issues
- Feature Requests: GitHub Discussions
- Email: brycew6m@stanford.edu
If this package helps your research, please consider starring the repository!
Download files
File details
Details for the file pandas_tabulate-0.1.0.tar.gz.
File metadata
- Download URL: pandas_tabulate-0.1.0.tar.gz
- Upload date:
- Size: 9.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7bd608a848c02f949f543ed7d404b5b760e8f78c730e900214232e4ea567b451 |
| MD5 | 48ca26244549a4fecf15200a171e60c7 |
| BLAKE2b-256 | 442df79636694d2cb8d261192e11d3f465e96c7c7c79b066c24dc7983f2749c9 |
File details
Details for the file pandas_tabulate-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pandas_tabulate-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fdd8000e6f11579b83ee75480c4934f517bb2c51ecdf625b0ec7e042251a59f1 |
| MD5 | ffb42d3821eb13c76aca17a57c505947 |
| BLAKE2b-256 | 0ff25ee6960abfacd83877b0737f88ed5ce1e401bf94fc9f2ec664e5d106ca16 |