A Python package for tidying and visualizing survey data
TidyViz
A Python package for survey data cleaning and visualization.
Features
Data Cleaning
- Expand/collapse multiple choice responses
- Validate response ranges
- Detect missing data patterns
- Flag straight-lining and speeders
- Check logical consistency
Visualization
- Single choice bar charts
- Multiple choice bar charts
- Custom color palettes
- Survey-appropriate styling
Installation
Install from PyPI:
pip install tidyviz
Development installation:
git clone https://github.com/pingfan-hu/tidyviz.git
cd tidyviz
pip install -e ".[dev]"
Quick Start
import pandas as pd
import tidyviz as tv
# Load survey data
df = pd.read_csv('survey.csv')
# Clean: Expand multiple choice column
df_expanded = tv.tidy.expand_multiple_choice(df, 'colors')
# Validate: Check response ranges
df_clean, invalid = tv.tidy.check_response_range(
    df, 'satisfaction', min_val=1, max_val=5
)
# Visualize: Plot single choice responses
tv.viz.set_survey_style(palette='categorical')
tv.viz.plot_single_choice(df, 'contact_method',
                          title='Preferred Contact',
                          show_percentages=True)
Documentation
Data Cleaning (tv.tidy)
Multiple Choice Handling
# Expand comma-separated values to binary columns
df_exp = tv.tidy.expand_multiple_choice(df, 'colors', sep=',')
# Creates: colors_Red, colors_Blue, colors_Green...
# Collapse binary columns back to comma-separated
df_col = tv.tidy.collapse_multiple_choice(df_exp, 'colors')
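To make the expand/collapse round trip concrete, here is a minimal sketch in plain pandas. It is not TidyViz's implementation; the toy data and variable names are assumptions for illustration, and the library's actual column order and dtypes may differ.

```python
import pandas as pd

# Toy data (assumed); the column name 'colors' mirrors the example above.
df = pd.DataFrame({"colors": ["Red,Blue", "Green", "Red,Green"]})

# Expand: one 0/1 indicator column per distinct option.
dummies = df["colors"].str.get_dummies(sep=",").add_prefix("colors_")
expanded = pd.concat([df, dummies], axis=1)

# Collapse: join the names of the options whose indicator is 1.
options = [c[len("colors_"):] for c in dummies.columns]
collapsed = dummies.apply(
    lambda row: ",".join(opt for opt, flag in zip(options, row) if flag == 1),
    axis=1,
)
```

In this sketch the indicator columns come out in alphabetical order, so the collapsed strings list options alphabetically rather than in the order the respondent selected them.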
Response Validation
# Flag invalid responses
df, invalid_mask = tv.tidy.check_response_range(
    df, 'rating', min_val=1, max_val=5,
    handle_invalid='flag'
)
# Remove invalid responses
df_clean, _ = tv.tidy.check_response_range(
    df, 'rating', min_val=1, max_val=5,
    handle_invalid='remove'
)
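The difference between flagging and removing can be sketched in plain pandas. This is an illustration of the idea only, under the assumption that the range is inclusive at both ends; the column name rating_invalid is hypothetical and is not taken from TidyViz.

```python
import pandas as pd

# Toy ratings on a 1-5 scale with two out-of-range entries (assumed data).
df = pd.DataFrame({"rating": [1, 5, 7, 0, 3]})

# True where the value falls outside the inclusive range [1, 5].
invalid_mask = ~df["rating"].between(1, 5)

# "Flag" style: keep every row and record the problem in a new column.
flagged = df.assign(rating_invalid=invalid_mask)

# "Remove" style: keep only the rows inside the range.
cleaned = df.loc[~invalid_mask].reset_index(drop=True)
```

Note that in this sketch a missing value also counts as invalid, because between() returns False for NaN. Whether TidyViz treats missing values the same way is not stated here.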
Data Quality Checks
# Detect missing data patterns
info = tv.tidy.detect_missing_patterns(df)
# Returns: complete_rows, rows_with_missing, missing_rates
# Flag straight-liners (same response across questions)
flags = tv.tidy.flag_straight_liners(df, ['Q1', 'Q2', 'Q3'])
# Detect speeders (unusually fast completion)
flags = tv.tidy.detect_speeders(df, 'completion_time',
                                method='iqr')
# Check logical consistency
rules = [{
    'name': 'age_check',
    'condition': lambda row: row['age'] >= 18
}]
df = tv.tidy.check_logical_consistency(df, rules)
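The checks above can be approximated in plain pandas to show what each one looks for. This is a sketch under stated assumptions, not TidyViz's code: the 1.5 × IQR lower fence is the conventional choice and may not match the library's threshold, and the toy data is made up.

```python
import pandas as pd

# Toy survey responses (assumed data).
df = pd.DataFrame({
    "Q1": [3, 1, 5, 2, 4, 4],
    "Q2": [3, 2, 5, 4, 1, 4],
    "Q3": [3, 5, 5, 1, 2, 3],
    "completion_time": [300, 320, 20, 310, 290, 305],
    "age": [25, 17, 40, 18, 33, 16],
})

# Straight-lining: every listed question received the same answer.
straight = df[["Q1", "Q2", "Q3"]].nunique(axis=1) == 1

# Speeders via the IQR rule: times below Q1 - 1.5 * IQR (assumed fence).
q1 = df["completion_time"].quantile(0.25)
q3 = df["completion_time"].quantile(0.75)
lower_fence = q1 - 1.5 * (q3 - q1)
speeders = df["completion_time"] < lower_fence

# Logical consistency: evaluate each rule's condition row by row.
rules = [{"name": "age_check", "condition": lambda row: row["age"] >= 18}]
consistent = {
    rule["name"]: df.apply(rule["condition"], axis=1) for rule in rules
}
```

Here rows 0 and 2 are straight-liners, row 2 is also the only speeder, and rows 1 and 5 fail the age rule.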
Visualization (tv.viz)
Single Choice Questions
# Basic bar chart
tv.viz.plot_single_choice(df, 'method')
# With customization
tv.viz.plot_single_choice(
    df, 'method',
    title='Preferred Method',
    show_percentages=True,
    sort_by='count',  # or 'name'
    color_palette='sequential'
)
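The numbers behind such a chart can be computed directly with pandas. The sketch below shows one plausible reading of sort_by='count', sort_by='name', and show_percentages; it is an assumption about what is displayed, not a description of TidyViz's internals.

```python
import pandas as pd

# Toy single-choice responses (assumed data).
df = pd.DataFrame(
    {"method": ["Email", "Phone", "Email", "SMS", "Email", "Phone"]}
)

# Analogue of sort_by='count': most frequent option first.
counts = df["method"].value_counts()

# Analogue of sort_by='name': options in alphabetical order.
by_name = counts.sort_index()

# Analogue of show_percentages=True: share of all responses per option.
percentages = (counts / counts.sum() * 100).round(1)
```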
Multiple Choice Questions
# First expand the data
df_exp = tv.tidy.expand_multiple_choice(df, 'colors')
color_cols = [c for c in df_exp.columns if c.startswith('colors_')]
# Plot multiple choice
tv.viz.plot_multiple_choice(
    df_exp, color_cols,
    title='Favorite Colors',
    show_percentages=True,
    sort_by='count'
)
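For multiple choice data, the bar heights come from the expanded binary columns. The sketch below assumes percentages are taken over respondents rather than over total selections; that is a common convention but is not confirmed for TidyViz.

```python
import pandas as pd

# Toy expanded data: one 0/1 column per option, one row per respondent
# (assumed data in the shape produced by the expand step above).
df_exp = pd.DataFrame({
    "colors_Red": [1, 0, 1, 1],
    "colors_Blue": [1, 0, 0, 0],
    "colors_Green": [0, 1, 1, 0],
})
color_cols = [c for c in df_exp.columns if c.startswith("colors_")]

# Number of respondents who selected each option, most popular first.
counts = df_exp[color_cols].sum().sort_values(ascending=False)

# Share of respondents who selected each option.
percentages = (counts / len(df_exp) * 100).round(1)
```

Because each respondent can select several options, these percentages can sum to more than 100.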
Styling
# Set global style
tv.viz.set_survey_style(
    style='default',       # or 'minimal', 'presentation'
    palette='categorical'  # or 'sequential', 'Set2', etc.
)
# Get color palette
colors = tv.viz.get_palette('categorical', n_colors=5)
Examples
See the examples/ directory for complete workflows:
- example_tidy.py - Data cleaning pipeline
- example_viz.py - Visualization examples
Documentation
- Quick Start - Get up and running in minutes
- User Manual - Complete API reference, tutorials, and workflows
Requirements
- Python ≥ 3.8
- pandas ≥ 1.3.0
- numpy ≥ 1.20.0
- matplotlib ≥ 3.4.0
- seaborn ≥ 0.11.0
Development
# Run tests
pytest
# Format code
black src/ tests/
# Lint code
flake8 src/ tests/
# Build package
python -m build
Author
Pingfan Hu
- Website: https://pingfanhu.com
- GitHub: @pingfan-hu
- Email: pingfan0727@gmail.com
License
MIT License - see LICENSE file for details.
Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
Citation
@software{tidyviz2025,
  title = {TidyViz: Survey Data Analysis for Python},
  author = {Hu, Pingfan},
  year = {2025},
  url = {https://github.com/pingfan-hu/tidyviz}
}