TidySPSS
A Python package for quick processing, transforming, and managing SPSS (.sav) files, with support for Excel and CSV inputs. Built on top of pyreadstat and pandas, it gives you a flexible, production-ready template for processing and transforming data files into SPSS format with full metadata control.
Philosophy
"Make simple things simple, and complex things possible"
📄 Processing Flow
LOAD → TRANSFORM → CONFIGURE → SAVE
- LOAD: Read file with metadata preservation
- TRANSFORM: Apply any pandas operations directly
- CONFIGURE: Set SPSS-specific options
- SAVE: Output with all configurations applied
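The TRANSFORM step is plain pandas: anything you can do to a DataFrame between loading and saving is fair game. A minimal sketch of that step, using an invented toy frame in place of a loaded .sav file (the sentinel value 9 and column names are illustrative assumptions, not part of the package):

```python
import pandas as pd

# Toy survey data standing in for a frame returned by read_input_file
df = pd.DataFrame({"Q1": [1, 2, 9], "Q2": [3, 9, 1]})

# TRANSFORM: any pandas operation is valid here, e.g. recoding the
# (hypothetical) sentinel value 9 to a proper missing value
df = df.replace(9, pd.NA)
```

The CONFIGURE and SAVE steps then hand the transformed frame, together with SPSS-specific options, to `process_and_save`.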
Features
- 📁 Multi-format support: Read from SPSS (.sav/.zsav), Excel (.xlsx/.xls), and CSV files
- 🔄 Comprehensive transformations: Reorder, rename, drop, and keep columns with ease
- 🏷️ Metadata management: Full support for SPSS labels, formats, measures, and display widths
- 🔧 Value replacement: Replace specific values across columns
- 📊 Column positioning: Advanced column reordering with range specifications
- 🔀 File merging: Combine multiple data files by stacking rows with source tracking
- 🌐 Encoding support: Automatic handling of multiple character encodings
- 🔧 Production-ready: Comprehensive logging and error handling
Installation
Install using pip:
pip install tidyspss
Or using uv:
uv add tidyspss
Quick Start
Basic Usage
from tidyspss import read_input_file, process_and_save
# Read a file (automatically detects format)
df, meta = read_input_file("data.sav") # or .xlsx, .csv
# Process and save with transformations
df, meta = process_and_save(
    df=df,
    meta=meta,
    output_path="output.sav",
    user_variable_rename={"old_name": "new_name"},
    user_variable_drop=["unwanted_col1", "unwanted_col2"],
    user_column_labels={"Q1": "Question 1", "Q2": "Question 2"}
)
Merging Multiple Files
from tidyspss import add_cases, process_and_save
# Merge multiple files by stacking rows
files = ["wave1.sav", "wave2.xlsx", "wave3.csv"]
merged_df, merged_meta = add_cases(
    input_files=files,
    meta_priority=1,           # Use first file's metadata as base
    source_name="source_file"  # Column name for tracking source
)
# The merged dataframe will have a 'source_file' column
# containing the filename each row came from
# Process and save the merged data
merged_df, merged_meta = process_and_save(
    df=merged_df,
    meta=merged_meta,
    output_path="merged_output.sav"
)
API Reference
Main Functions
read_input_file(file_path)
Reads a file into a pandas DataFrame with metadata.
- Supports: .sav, .zsav, .xlsx, .xls, .csv
- Returns: a (DataFrame, metadata) tuple
add_cases(input_files, meta_priority=1, source_name="mrgsrc")
Merges multiple data files by stacking rows (concatenating).
Parameters:
- input_files: List of file paths to merge (can be .sav, .zsav, .xlsx, .xls, or .csv)
- meta_priority: Which file's metadata to use as base
  - If int: 1-based index of the file in input_files list
  - If str: exact filename that exists in input_files list
- source_name: Column name for tracking the source file of each record (default: "mrgsrc")
Returns:
(merged_df, merged_meta): Tuple of concatenated DataFrame with source tracking column and consolidated metadata
Example:
# Use first file's metadata
df, meta = add_cases(["data1.sav", "data2.xlsx"], meta_priority=1)
# Use specific file's metadata
df, meta = add_cases(["data1.sav", "data2.xlsx"], meta_priority="data2.xlsx")
# Custom source column name
df, meta = add_cases(files, source_name="wave_source")
process_and_save(df, meta, output_path, **kwargs)
Processes DataFrame with configurations and saves to SPSS format.
Parameters:
- df: Input DataFrame
- meta: Metadata from SPSS file (or None)
- output_path: Path for output .sav file
- user_column_position: Dict for column reordering
- user_variable_drop: List of columns to drop
- user_variable_keep: List of columns to keep (drops all others)
- user_variable_rename: Dict for renaming columns
- user_value_replacement: Dict for replacing values
- user_column_labels: Dict of column labels
- user_variable_value_labels: Dict of value labels
- user_variable_format: Dict of variable formats
- user_variable_measure: Dict of variable measures
- user_variable_display_width: Dict of display widths
- user_missing_ranges: Dict of missing value ranges
- user_note: File note string
- user_file_label: File label string
- user_compress: Boolean for file compression
- user_row_compress: Boolean for row compression
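Several of these options can be combined in one call. A sketch that assembles them as plain keyword arguments; the column names, label text, and the nested shape of the user_value_replacement dict are assumptions made for illustration, not confirmed by the package:

```python
import pandas as pd

# Toy frame standing in for loaded survey data
df = pd.DataFrame({"old_name": [1, 2], "Q1": [1, 9]})

# SPSS-side configuration, built as plain dicts keyed by column name
options = {
    "user_variable_rename": {"old_name": "resp_id"},   # rename a column
    "user_value_replacement": {"Q1": {9: None}},       # assumed shape: {column: {old: new}}
    "user_column_labels": {"Q1": "Question 1"},        # SPSS variable label
    "user_compress": True,                             # write compressed output
}

# The actual call would then be:
# df, meta = process_and_save(df=df, meta=None, output_path="out.sav", **options)
```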
Requirements
- Python ≥ 3.11
- pandas ≥ 2.3.0
- pyreadstat ≥ 1.3.0
- openpyxl ≥ 3.0.0
License
MIT License - see LICENSE file for details.