
A Python package for quick processing and transforming SPSS files

Project description

TidySPSS

A Python package for quick processing, transforming, and managing SPSS (.sav) files, with support for Excel and CSV inputs. This package is built on top of pyreadstat and pandas to give you a flexible, production-ready template for processing and transforming data files into SPSS format with full metadata control.

Philosophy

"Make simple things simple, and complex things possible"

📄 Processing Flow

LOAD → TRANSFORM → CONFIGURE → SAVE
  1. LOAD: Read file with metadata preservation
  2. TRANSFORM: Apply any pandas operations directly (see the sketch after this list)
  3. CONFIGURE: Set SPSS-specific options
  4. SAVE: Output with all configurations applied
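
A minimal sketch of this flow. The input file name, the "age" column, and the label text are illustrative assumptions; the function names and keyword arguments come from the API documented below:

from tidyspss import read_input_file, process_and_save

# LOAD: read the source file together with its metadata
df, meta = read_input_file("survey.sav")

# TRANSFORM: the returned object is a regular pandas DataFrame,
# so any pandas operation can be applied directly
df = df[df["age"] >= 18]                  # keep adult respondents only
df["age_group"] = (df["age"] // 10) * 10  # derive a decade bucket

# CONFIGURE + SAVE: pass SPSS-specific options when writing the .sav file
df, meta = process_and_save(
    df=df,
    meta=meta,
    output_path="survey_clean.sav",
    user_column_labels={"age_group": "Age group (decade)"},
)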

Features

  • 📁 Multi-format support: Read from SPSS (.sav/.zsav), Excel (.xlsx/.xls), and CSV files
  • 🔄 Comprehensive transformations: Reorder, rename, drop, and keep columns with ease
  • 🏷️ Metadata management: Full support for SPSS labels, formats, measures, and display widths
  • 🔧 Value replacement: Replace specific values across columns
  • 📊 Column positioning: Advanced column reordering with range specifications
  • 🔀 File merging: Combine multiple data files by stacking rows with source tracking
  • 🌐 Encoding support: Automatic handling of multiple character encodings
  • 🔧 Production-ready: Comprehensive logging and error handling

Installation

Install using pip:

pip install tidyspss

Or using uv:

uv add tidyspss

Quick Start

Basic Usage

from tidyspss import read_input_file, process_and_save

# Read a file (automatically detects format)
df, meta = read_input_file("data.sav")  # or .xlsx, .csv

# Process and save with transformations
df, meta = process_and_save(
    df=df,
    meta=meta,
    output_path="output.sav",
    user_variable_rename={"old_name": "new_name"},
    user_variable_drop=["unwanted_col1", "unwanted_col2"],
    user_column_labels={"Q1": "Question 1", "Q2": "Question 2"}
)

Merging Multiple Files

from tidyspss import add_cases, process_and_save

# Merge multiple files by stacking rows
files = ["wave1.sav", "wave2.xlsx", "wave3.csv"]
merged_df, merged_meta = add_cases(
    input_files=files,
    meta_priority=1,  # Use first file's metadata as base
    source_name="source_file"  # Column name for tracking source
)

# The merged dataframe will have a 'source_file' column 
# containing the filename each row came from

# Process and save the merged data
merged_df, merged_meta = process_and_save(
    df=merged_df,
    meta=merged_meta,
    output_path="merged_output.sav"
)

API Reference

Main Functions

read_input_file(file_path)

Reads a file into a pandas DataFrame together with its metadata (see the short example below).

  • Supports: .sav, .zsav, .xlsx, .xls, .csv
  • Returns: (DataFrame, metadata) tuple
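
A short illustration. The file name is hypothetical, and the metadata attributes shown assume a pyreadstat-style metadata container, since the package is built on pyreadstat:

from tidyspss import read_input_file

# Works the same way for .sav/.zsav, .xlsx/.xls, and .csv inputs
df, meta = read_input_file("wave1.sav")

print(df.shape)           # dimensions of the loaded DataFrame
print(meta.column_names)  # assumed pyreadstat-style metadata attribute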

add_cases(input_files, meta_priority=1, source_name="mrgsrc")

Merges multiple data files by stacking rows (concatenating).

Parameters:

  • input_files: List of file paths to merge (can be .sav, .zsav, .xlsx, .xls, or .csv)
  • meta_priority: Which file's metadata to use as base
    • If int: 1-based index of the file in input_files list
    • If str: exact filename that exists in input_files list
  • source_name: Column name for tracking source file of each record (default: "mrgsrc")

Returns:

  • (merged_df, merged_meta): Tuple of concatenated DataFrame with source tracking column and consolidated metadata

Example:

# Use first file's metadata
df, meta = add_cases(["data1.sav", "data2.xlsx"], meta_priority=1)

# Use specific file's metadata
df, meta = add_cases(["data1.sav", "data2.xlsx"], meta_priority="data2.xlsx")

# Custom source column name
df, meta = add_cases(files, source_name="wave_source")

process_and_save(df, meta, output_path, **kwargs)

Processes a DataFrame with the supplied configurations and saves it in SPSS format (see the example after the parameter list).

Parameters:

  • df: Input DataFrame
  • meta: Metadata from SPSS file (or None)
  • output_path: Path for output .sav file
  • user_column_position: Dict for column reordering
  • user_variable_drop: List of columns to drop
  • user_variable_keep: List of columns to keep (drops all others)
  • user_variable_rename: Dict for renaming columns
  • user_value_replacement: Dict for replacing values
  • user_column_labels: Dict of column labels
  • user_variable_value_labels: Dict of value labels
  • user_variable_format: Dict of variable formats
  • user_variable_measure: Dict of variable measures
  • user_variable_display_width: Dict of display widths
  • user_missing_ranges: Dict of missing value ranges
  • user_note: File note string
  • user_file_label: File label string
  • user_compress: Boolean for file compression
  • user_row_compress: Boolean for row compression
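
A fuller sketch combining several of these options. The column names, label text, and dict shapes are assumptions following common pyreadstat conventions (e.g. value labels as {column: {value: label}}, measures such as "ordinal"); only the parameter names themselves come from the list above:

from tidyspss import read_input_file, process_and_save

df, meta = read_input_file("raw.xlsx")

df, meta = process_and_save(
    df=df,
    meta=meta,
    output_path="final.sav",
    user_variable_rename={"q1": "satisfaction"},
    user_variable_drop=["internal_id"],
    user_column_labels={"satisfaction": "Overall satisfaction"},
    user_variable_value_labels={"satisfaction": {1: "Very dissatisfied", 5: "Very satisfied"}},
    user_variable_measure={"satisfaction": "ordinal"},
    user_variable_display_width={"satisfaction": 10},
    user_file_label="Customer survey, cleaned",
    user_compress=True,
)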

Requirements

  • Python ≥ 3.11
  • pandas ≥ 2.3.0
  • pyreadstat ≥ 1.3.0
  • openpyxl ≥ 3.0.0

License

MIT License - see LICENSE file for details.



Download files

Download the file for your platform.

Source Distribution

tidyspss-0.2.0.tar.gz (10.5 kB)

Built Distribution


tidyspss-0.2.0-py3-none-any.whl (11.9 kB)

File details

Details for the file tidyspss-0.2.0.tar.gz.

File metadata

  • Download URL: tidyspss-0.2.0.tar.gz
  • Upload date:
  • Size: 10.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.9

File hashes

Hashes for tidyspss-0.2.0.tar.gz
  • SHA256: 082fe8a2f0f2107343b8d5246e952a07919d914d2add0e6493078c96490cc537
  • MD5: a0d4d818eae0c02638717bee3dc87ff5
  • BLAKE2b-256: 4ea104337aa25f0f49d1c0afcb6ddaeef6ce3dc665f7c9fa0e80c3d1f61626a5


File details

Details for the file tidyspss-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: tidyspss-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 11.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.9

File hashes

Hashes for tidyspss-0.2.0-py3-none-any.whl
  • SHA256: a0ca2e03b0a239ff29cccf624d461d04d6810e6c224fed4d544921f6a889839d
  • MD5: 2313f3e69e89a3f51c28c2d29a245828
  • BLAKE2b-256: f33f7b68a2841dd4b471206e7b76dda1cae538c6117bd0054fd68618bd92114d

