A Python package for tidying and visualizing survey data

TidyViz

Python 3.8+ | License: MIT

A Python package for survey data cleaning and visualization.

Features

Data Cleaning

  • Expand/collapse multiple choice responses
  • Validate response ranges
  • Detect missing data patterns
  • Flag straight-lining and speeders
  • Check logical consistency

Visualization

  • Single choice bar charts
  • Multiple choice bar charts
  • Custom color palettes
  • Survey-appropriate styling

Installation

pip install tidyviz

Development installation:

git clone https://github.com/pingfan-hu/tidyviz.git
cd tidyviz
pip install -e ".[dev]"

Quick Start

import pandas as pd
import tidyviz as tv

# Load survey data
df = pd.read_csv('survey.csv')

# Clean: Expand multiple choice column
df_expanded = tv.tidy.expand_multiple_choice(df, 'colors')

# Validate: Check response ranges
df_clean, invalid = tv.tidy.check_response_range(
    df, 'satisfaction', min_val=1, max_val=5
)

# Visualize: Plot single choice responses
tv.viz.set_survey_style(palette='categorical')
tv.viz.plot_single_choice(df, 'contact_method',
                          title='Preferred Contact',
                          show_percentages=True)
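
To try the Quick Start without a real export, a small stand-in for survey.csv can be built in memory. This is only a sketch: the column names match the ones used above, but the rows and the respondent_id column are made up for illustration, and the comma-separated format of the colors column is an assumption based on the expand_multiple_choice() description.

```python
import io

import pandas as pd

# Hypothetical contents standing in for survey.csv (not real survey data).
csv_text = """respondent_id,colors,satisfaction,contact_method
1,"Red,Blue",4,Email
2,Green,5,Phone
3,"Red,Green",9,Email
"""

# read_csv accepts any file-like object, so StringIO replaces the file path.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

The third row carries a satisfaction value of 9 on purpose, so a 1 to 5 range check has something to catch.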

Documentation

Data Cleaning (tv.tidy)

Multiple Choice Handling

# Expand comma-separated values to binary columns
df_exp = tv.tidy.expand_multiple_choice(df, 'colors', sep=',')
# Creates: colors_Red, colors_Blue, colors_Green...

# Collapse binary columns back to comma-separated
df_col = tv.tidy.collapse_multiple_choice(df_exp, 'colors')
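
For readers who want to see what the expand and collapse steps amount to, here is a minimal pandas-only sketch. It is not TidyViz's implementation; the toy data, the whitespace handling, and the alphabetical column order are assumptions of this example.

```python
import pandas as pd

# Toy data: one comma-separated multiple choice column (hypothetical values).
df = pd.DataFrame({"colors": ["Red,Blue", "Green", "Red, Green", None]})

# Expand: normalise spaces around the separator, then one-hot encode options.
cleaned = df["colors"].str.replace(r"\s*,\s*", ",", regex=True)
dummies = cleaned.str.get_dummies(sep=",").add_prefix("colors_")
expanded = pd.concat([df, dummies], axis=1)

# Collapse: join the names of the selected options back into one string.
prefix = "colors_"
option_cols = list(dummies.columns)
collapsed = dummies.apply(
    lambda row: ",".join(c[len(prefix):] for c in option_cols if row[c] == 1),
    axis=1,
)
```

In this sketch a missing response expands to all zeros and collapses to an empty string, and the collapsed options come back in column order rather than in the order the respondent gave them.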

Response Validation

# Flag invalid responses
df, invalid_mask = tv.tidy.check_response_range(
    df, 'rating', min_val=1, max_val=5,
    handle_invalid='flag'
)

# Remove invalid responses
df_clean, _ = tv.tidy.check_response_range(
    df, 'rating', min_val=1, max_val=5,
    handle_invalid='remove'
)
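
The difference between flagging and removing can be illustrated with plain pandas. The sketch below is an approximation, not the library's code: the rating_invalid column name and the choice to leave missing answers unflagged are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({"rating": [1, 5, 7, 0, 3, None]})

# True where a response is present but falls outside the inclusive 1-5 scale.
in_range = df["rating"].between(1, 5)
invalid_mask = df["rating"].notna() & ~in_range

# "flag" style: keep every row and record the problem in a new column.
flagged = df.assign(rating_invalid=invalid_mask)

# "remove" style: drop out-of-range rows, leaving missing answers in place.
removed = df.loc[~invalid_mask].reset_index(drop=True)
```

Flagging keeps the sample size intact for later inspection, while removing is simpler when out-of-range values are known to be entry errors.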

Data Quality Checks

# Detect missing data patterns
info = tv.tidy.detect_missing_patterns(df)
# Returns: complete_rows, rows_with_missing, missing_rates

# Flag straight-liners (same response across questions)
straight_flags = tv.tidy.flag_straight_liners(df, ['Q1', 'Q2', 'Q3'])

# Detect speeders (unusually fast completion)
speeder_flags = tv.tidy.detect_speeders(df, 'completion_time',
                                        method='iqr')

# Check logical consistency
rules = [{
    'name': 'age_check',
    'condition': lambda row: row['age'] >= 18
}]
df = tv.tidy.check_logical_consistency(df, rules)
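
The ideas behind these checks can be sketched with pandas alone. This is an illustration under stated assumptions rather than TidyViz's implementation: the toy data is invented, the speeder cutoff uses the common 1.5 × IQR lower fence (the library's exact rule for method='iqr' is not documented here), and each rule result is stored in a column named after the rule.

```python
import pandas as pd

df = pd.DataFrame({
    "Q1": [3, 1, 5, 2],
    "Q2": [3, 4, 5, 2],
    "Q3": [3, 2, 5, 1],
    "completion_time": [300, 280, 20, 310],  # seconds, hypothetical
    "age": [25, 17, 40, None],
})

# Missing data: share of missing values per column, and fully answered rows.
missing_rates = df.isna().mean()
complete_rows = int(df.notna().all(axis=1).sum())

# Straight-lining: all answered questions in the block share one value.
questions = ["Q1", "Q2", "Q3"]
straight = df[questions].nunique(axis=1) == 1

# Speeders (IQR rule): completion time below lower quartile - 1.5 * IQR.
low_q, high_q = df["completion_time"].quantile([0.25, 0.75])
speeder = df["completion_time"] < low_q - 1.5 * (high_q - low_q)

# Logical rules: evaluate each rule's condition row by row.
rules = [{"name": "age_check", "condition": lambda row: row["age"] >= 18}]
for rule in rules:
    df[rule["name"]] = df.apply(rule["condition"], axis=1)
```

Note that a missing age fails the comparison in this sketch, so rules that should ignore missing values need to say so explicitly in their condition.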

Visualization (tv.viz)

Single Choice Questions

# Basic bar chart
tv.viz.plot_single_choice(df, 'method')

# With customization
tv.viz.plot_single_choice(
    df, 'method',
    title='Preferred Method',
    show_percentages=True,
    sort_by='count',  # or 'name'
    color_palette='sequential'
)
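
The numbers behind a single choice bar chart are category counts and their share of all responses. A minimal sketch of that tabulation, assuming toy data and one-decimal rounding (neither taken from the library):

```python
import pandas as pd

df = pd.DataFrame(
    {"method": ["Email", "Phone", "Email", "SMS", "Email", "Phone"]}
)

# value_counts() orders categories by count, largest first.
counts = df["method"].value_counts()

# Alternative ordering: alphabetical by category name.
by_name = counts.sort_index()

# Percentages of all non-missing responses; these sum to roughly 100.
percentages = (counts / counts.sum() * 100).round(1)
```

The two orderings correspond to the idea behind sort_by='count' and sort_by='name', though how plot_single_choice() breaks ties or treats missing values is not covered here.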

Multiple Choice Questions

# First expand the data
df_exp = tv.tidy.expand_multiple_choice(df, 'colors')
color_cols = [c for c in df_exp.columns if c.startswith('colors_')]

# Plot multiple choice
tv.viz.plot_multiple_choice(
    df_exp, color_cols,
    title='Favorite Colors',
    show_percentages=True,
    sort_by='count'
)
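
For multiple choice questions the percentages are usually computed per respondent, not per selection, so they can add up to more than 100. The sketch below shows that tabulation on invented binary columns; whether plot_multiple_choice() uses the same denominator is an assumption.

```python
import pandas as pd

# Binary columns as produced by an expand step (hypothetical values).
df_exp = pd.DataFrame({
    "colors_Red": [1, 0, 1, 1],
    "colors_Blue": [1, 0, 0, 0],
    "colors_Green": [0, 1, 1, 0],
})
color_cols = [c for c in df_exp.columns if c.startswith("colors_")]

# Number of respondents who selected each option, largest first.
counts = df_exp[color_cols].sum().sort_values(ascending=False)

# Share of respondents selecting each option (denominator: all rows).
percentages = (counts / len(df_exp) * 100).round(1)
```

Here the shares are 75, 50 and 25 percent, which total 150 because respondents could pick several colors.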

Styling

# Set global style
tv.viz.set_survey_style(
    style='default',  # or 'minimal', 'presentation'
    palette='categorical'  # or 'sequential', 'Set2', etc.
)

# Get color palette
colors = tv.viz.get_palette('categorical', n_colors=5)

Examples

See the examples/ directory for complete workflows:

  • example_tidy.py - Data cleaning pipeline
  • example_viz.py - Visualization examples

API Reference

tidyviz.tidy

Function                       Description
expand_multiple_choice()       Convert comma-separated values to binary columns
collapse_multiple_choice()     Convert binary columns to comma-separated values
check_response_range()         Validate responses within expected range
detect_missing_patterns()      Analyze missing data patterns
flag_straight_liners()         Detect identical responses across questions
detect_speeders()              Identify unusually fast completion times
check_logical_consistency()    Validate custom logical rules

tidyviz.viz

Function                  Description
plot_single_choice()      Bar chart for single-choice questions
plot_multiple_choice()    Bar chart for multiple-choice questions
set_survey_style()        Apply survey-appropriate styling
get_palette()             Get color palette for visualizations

Requirements

  • Python ≥ 3.8
  • pandas ≥ 1.3.0
  • numpy ≥ 1.20.0
  • matplotlib ≥ 3.4.0
  • seaborn ≥ 0.11.0

Development

# Run tests
pytest

# Format code
black src/ tests/

# Lint code
flake8 src/ tests/

# Build package
python -m build

Author

Pingfan Hu

License

MIT License - see LICENSE file for details.

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

Citation

@software{tidyviz2025,
  title = {TidyViz: Survey Data Analysis for Python},
  author = {Hu, Pingfan},
  year = {2025},
  url = {https://github.com/pingfan-hu/tidyviz}
}
