A Python package for tidying and visualizing survey data

TidyViz

A Python package for survey data cleaning and visualization.

Features

Data Cleaning

  • Expand/collapse multiple choice responses
  • Validate response ranges
  • Detect missing data patterns
  • Flag straight-lining and speeders
  • Check logical consistency

Visualization

  • Single choice bar charts
  • Multiple choice bar charts
  • Custom color palettes
  • Survey-appropriate styling

Installation

Install from PyPI:

pip install tidyviz

Development installation:

git clone https://github.com/pingfan-hu/tidyviz.git
cd tidyviz
pip install -e ".[dev]"

Quick Start

import pandas as pd
import tidyviz as tv

# Load survey data
df = pd.read_csv('survey.csv')

# Clean: Expand multiple choice column
df_expanded = tv.tidy.expand_multiple_choice(df, 'colors')

# Validate: Check response ranges
df_clean, invalid = tv.tidy.check_response_range(
    df, 'satisfaction', min_val=1, max_val=5
)

# Visualize: Plot single choice responses
tv.viz.set_survey_style(palette='categorical')
tv.viz.plot_single_choice(df, 'contact_method',
                          title='Preferred Contact',
                          show_percentages=True)
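To try the Quick Start without a real export, a small stand-in for survey.csv can be built by hand. The sketch below uses made-up responses; only the column names (colors, satisfaction, contact_method) come from the snippet above, and the comma-separated format of the colors column is an assumption based on the sep=',' example later in this README.

```python
import pandas as pd

# Made-up responses for illustration only; column names follow the Quick Start.
df = pd.DataFrame({
    "colors": ["Red,Blue", "Green", "Blue,Green", "Red"],
    "satisfaction": [5, 3, 7, 1],  # 7 is deliberately outside the 1-5 range
    "contact_method": ["Email", "Phone", "Email", "SMS"],
})
```

This df can be used in place of the pd.read_csv result while experimenting.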

Documentation

Data Cleaning (tv.tidy)

Multiple Choice Handling

# Expand comma-separated values to binary columns
df_exp = tv.tidy.expand_multiple_choice(df, 'colors', sep=',')
# Creates: colors_Red, colors_Blue, colors_Green...

# Collapse binary columns back to comma-separated
df_col = tv.tidy.collapse_multiple_choice(df_exp, 'colors')
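For readers who want to see what this kind of transformation looks like, here is a minimal sketch in plain pandas. It is not TidyViz's implementation, and details such as how missing answers or option ordering are handled by the library are not confirmed here; the sample data is made up.

```python
import pandas as pd

df = pd.DataFrame({"colors": ["Red,Blue", "Green", "Blue,Green", None]})

# Expand: one 0/1 column per option; a missing answer becomes all zeros.
dummies = df["colors"].str.get_dummies(sep=",").add_prefix("colors_")
df_exp = pd.concat([df.drop(columns="colors"), dummies], axis=1)

# Collapse: join the names of the options whose indicator is 1.
# Options come back in column (alphabetical) order, not the original order.
option_cols = [c for c in df_exp.columns if c.startswith("colors_")]
collapsed = df_exp[option_cols].apply(
    lambda row: ",".join(c[len("colors_"):] for c in option_cols if row[c] == 1),
    axis=1,
)
```

Note that a round trip through this sketch does not preserve the order in which a respondent listed their choices, and a missing answer comes back as an empty string.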

Response Validation

# Flag invalid responses
df, invalid_mask = tv.tidy.check_response_range(
    df, 'rating', min_val=1, max_val=5,
    handle_invalid='flag'
)

# Remove invalid responses
df_clean, _ = tv.tidy.check_response_range(
    df, 'rating', min_val=1, max_val=5,
    handle_invalid='remove'
)
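The difference between flagging and removing can be illustrated with plain pandas. This is a sketch under stated assumptions, not the library's code: the rating_invalid column name is hypothetical, the range is treated as inclusive, and missing answers are not counted as invalid.

```python
import pandas as pd

df = pd.DataFrame({"rating": [1, 5, 0, 7, 3, None]})

# True where a response was given but falls outside the inclusive 1-5 range.
# Missing answers are not treated as invalid here (an assumption).
invalid_mask = df["rating"].notna() & ~df["rating"].between(1, 5)

# "flag": keep every row and record the problem in a new column.
df_flagged = df.assign(rating_invalid=invalid_mask)

# "remove": drop the offending rows.
df_clean = df.loc[~invalid_mask].reset_index(drop=True)
```

Flagging keeps the sample size intact for later inspection, while removing yields a frame that is ready for analysis.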

Data Quality Checks

# Detect missing data patterns
info = tv.tidy.detect_missing_patterns(df)
# Returns: complete_rows, rows_with_missing, missing_rates

# Flag straight-liners (same response across questions)
straight_liners = tv.tidy.flag_straight_liners(df, ['Q1', 'Q2', 'Q3'])

# Detect speeders (unusually fast completion)
speeders = tv.tidy.detect_speeders(df, 'completion_time',
                                   method='iqr')

# Check logical consistency
rules = [{
    'name': 'age_check',
    'condition': lambda row: row['age'] >= 18
}]
df = tv.tidy.check_logical_consistency(df, rules)
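The ideas behind these checks can be sketched in plain pandas. The code below is an illustration only: the 1.5 x IQR lower fence is one common reading of an "iqr" method and may differ from what TidyViz does, the "_ok" column suffix is hypothetical, and the data is made up.

```python
import pandas as pd

df = pd.DataFrame({
    "Q1": [3, 4, 2, 5],
    "Q2": [3, 2, 2, 5],
    "Q3": [3, 5, 1, 5],
    "completion_time": [310, 295, 40, 330],  # seconds, made-up values
    "age": [25, 17, 40, 33],
})

# Straight-lining: every listed question received the same answer.
straight = df[["Q1", "Q2", "Q3"]].nunique(axis=1) == 1

# Speeders via the IQR rule: faster than Q1 - 1.5 * IQR.
q1, q3 = df["completion_time"].quantile([0.25, 0.75])
speeder = df["completion_time"] < q1 - 1.5 * (q3 - q1)

# Logical consistency: evaluate each rule row by row and
# store the outcome in one boolean column per rule.
rules = [{"name": "age_check", "condition": lambda row: row["age"] >= 18}]
for rule in rules:
    df[rule["name"] + "_ok"] = df.apply(rule["condition"], axis=1)
```

Row-wise apply is slow on large frames; for simple rules a vectorized comparison such as df["age"] >= 18 gives the same result faster.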

Visualization (tv.viz)

Single Choice Questions

# Basic bar chart
tv.viz.plot_single_choice(df, 'method')

# With customization
tv.viz.plot_single_choice(
    df, 'method',
    title='Preferred Method',
    show_percentages=True,
    sort_by='count',  # or 'name'
    color_palette='sequential'
)
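The numbers behind a single choice chart are a frequency table. As a rough sketch in plain pandas (made-up data; not necessarily how TidyViz computes its labels), counts, percentages, and the two sort orders look like this:

```python
import pandas as pd

df = pd.DataFrame({"method": ["Email", "Phone", "Email", "SMS", "Email", "Phone"]})

# Counts per option; value_counts sorts by count, descending, by default.
counts = df["method"].value_counts()

# Percentages of all non-missing responses, rounded for labelling.
percentages = (counts / counts.sum() * 100).round(1)

# Alternative ordering: alphabetical by option name.
by_name = counts.sort_index()
```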

Multiple Choice Questions

# First expand the data
df_exp = tv.tidy.expand_multiple_choice(df, 'colors')
color_cols = [c for c in df_exp.columns if c.startswith('colors_')]

# Plot multiple choice
tv.viz.plot_multiple_choice(
    df_exp, color_cols,
    title='Favorite Colors',
    show_percentages=True,
    sort_by='count'
)
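With multiple choice data, percentages are usually taken over respondents rather than over selections, so they can add up to more than 100. The sketch below shows that calculation in plain pandas on made-up data; whether TidyViz uses the same denominator is an assumption.

```python
import pandas as pd

df_exp = pd.DataFrame({
    "colors_Red":   [1, 0, 1, 1],
    "colors_Blue":  [1, 0, 0, 0],
    "colors_Green": [0, 1, 1, 0],
})
color_cols = [c for c in df_exp.columns if c.startswith("colors_")]

# Selections per option, most popular first.
counts = df_exp[color_cols].sum().sort_values(ascending=False)

# Share of respondents (not of selections) choosing each option.
percentages = counts / len(df_exp) * 100
```

Here four respondents made six selections, so the three percentages sum to 150.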

Styling

# Set global style
tv.viz.set_survey_style(
    style='default',  # or 'minimal', 'presentation'
    palette='categorical'  # or 'sequential', 'Set2', etc.
)

# Get color palette
colors = tv.viz.get_palette('categorical', n_colors=5)
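Palette names such as 'Set2' are also understood by seaborn, which is listed under Requirements. For comparison, the equivalent calls in seaborn itself are sketched below; whether TidyViz delegates to these functions internally is an assumption, and the "whitegrid" style is chosen here only for illustration.

```python
import seaborn as sns

# Apply a global theme; "Set2" is a qualitative ColorBrewer palette.
sns.set_theme(style="whitegrid", palette="Set2")

# Fetch five colours from the same palette as RGB tuples.
colors = sns.color_palette("Set2", n_colors=5)
```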

Examples

See the examples/ directory for complete workflows.

Requirements

  • Python ≥ 3.8
  • pandas ≥ 1.3.0
  • numpy ≥ 1.20.0
  • matplotlib ≥ 3.4.0
  • seaborn ≥ 0.11.0

Development

# Run tests
pytest

# Format code
black src/ tests/

# Lint code
flake8 src/ tests/

# Build package
python -m build

Author

Pingfan Hu

License

MIT License - see LICENSE file for details.

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

Citation

@software{tidyviz2025,
  title = {TidyViz: Survey Data Analysis for Python},
  author = {Hu, Pingfan},
  year = {2025},
  url = {https://github.com/pingfan-hu/tidyviz}
}
