A Python package for working with SPSS/SAV files with two-track architecture separating data and metadata operations

These details have not been verified by PyPI

Project links

Project description

ultrasav

⚡ An 'Ultra-powerful' Python package for preparing production-ready SPSS/SAV files using a two-track architecture that separates data and metadata operations.

"Specium Ray for your data!" - Transform SPSS files with the power of Ultra

💡 Motivation

ultrasav is built as a thoughtful wrapper around the excellent pyreadstat package. We're not here to reinvent the wheel for reading and writing SAV files - pyreadstat already does that brilliantly!

Instead, ultrasav provides additional transformation tools for tasks that are commonly done by folks who work with SAV files regularly:

🏷️ Rename variables - Change variable names in batch with clean methodology
🔄 Recode values - Transform codes across multiple variables with clean syntax
🏷️ Update labels - Batch update variable labels and value labels without losing track
📊 Reorganize columns - Move variables to specific positions for standardized layouts
📀 Merge files intelligently - Stack survey data while preserving all metadata
🎯 Handle missing values - Consistent missing value definitions across datasets
🦸 Inspect & report metadata - Generate datamaps and validation reports with metaman

🎯 Core Philosophy

ultrasav follows a simple but powerful principle: Data and Metadata are two independent layers that only come together at read/write time.

┌─────────────┐         ┌─────────────┐
│   DATA      │         │  METADATA   │
│  DataFrame  │         │   Labels    │
│  Operations │         │   Formats   │
└─────────────┘         └─────────────┘
      │                         │
      └────────┬────────────────┘
               │
               ▼
         ┌─────────────┐
         │  WRITE SAV  │
         └─────────────┘

The Common Problems

If you work with SPSS files in Python, you've probably asked yourself:

How do I bulk update variable labels and value labels?
How do I quickly relocate variables to ideal positions?
How do I merge datasets — and more specifically, how are the labels being merged?
How can I see a comprehensive datamap of my data?
Most importantly: How do I prepare a tidy SPSS file with clean labels and metadata that is production-ready?

ultrasav answers all of these.

The ultrasav Way

import ultrasav as ul

# Read → splits into two independent tracks
df, meta = ul.read_sav("survey.sav")

# Track 1 - Data: Transform data freely
data = ul.Data(df) # Wrap df into our Data class
df = data.move(first=['id']).rename({'Q1': 'satisfaction'}).replace({'satisfaction': {6: 99}}).to_native()

# Track 2 - Metadata: Update metadata independently
meta = ul.Metadata(meta) # Wrap meta into our Metadata class
meta.column_labels = {'satisfaction': 'Overall satisfaction'}
meta.variable_value_labels={'recommend': {0: 'No', 1: 'Yes'}}
 

# Convergence: Reunite at write time
ul.write_sav(df, meta, "clean_survey.sav")

The goal is to provide you with a clean and easy-to-understand way to transform your SPSS data that you can use in real production workflows with minimal tweaking.

🚀 DataFrame-Agnostic Design

One of ultrasav's superpowers is being dataframe-agnostic — it works seamlessly with both polars and pandas thanks to narwhals under the hood:

🐻‍❄️ Polars by default - Blazing fast performance out of the box
🐼 Pandas fully supported - Use output_format="pandas" when needed
🔄 Switch freely - Convert between pandas and polars anytime
🔧 Future-proof - Ready for whatever dataframe library comes next

Default output format: Polars — All operations return polars DataFrames by default for blazing-fast performance. Pandas is fully supported via the output_format="pandas" parameter.

import ultrasav as ul

# Polars by default
df_pl, meta = ul.read_sav("survey.sav", output_format="polars")

# Or explicitly request pandas
df_pd, meta = ul.read_sav("survey.sav", output_format="pandas")

# The Data class works with either
data = ul.Data(df_pl)  # Works with both Polars and pandas!

# Transform using ultrasav's consistent API
data = data.rename({"Q1": "satisfaction"}).replace({'satisfaction': {6: 99}})
df_native = data.to_native()  # Get back your polars DataFrame

Who Is This For?

📊 Market Researchers - Merge waves, standardize labels, prepare deliverables
🔬 Data Scientists - Clean survey data, prepare features, maintain metadata
🏭 Data Engineers - Build robust pipelines that preserve SPSS metadata
🎓 Academic Researchers - Manage longitudinal studies, harmonize datasets
📈 Anyone working with SPSS - If you use SAV files regularly, this is for you!

🚀 Installation

# Using uv
uv add ultrasav

# Or using pip
pip install ultrasav

📚 Quick Start

Basic Usage

import ultrasav as ul

# Read SPSS file - automatically splits into data and metadata
df, meta = ul.read_sav("survey.sav")
# Note: You can also use pyreadstat directly - our classes work with pyreadstat meta objects too

# Track 1: Process data independently
data = ul.Data(df)  # Wrap in Data class for transformations
data = data.move(first=["ID", "Date"])  # Reorder columns
data = data.rename({"Q1": "Satisfaction"})  # Rename columns
data = data.replace({"Satisfaction": {99: None}})  # Replace values
df = data.to_native()  # Back to native DataFrame

# Track 2: Process metadata independently
meta.column_labels = {"Satisfaction": "Customer Satisfaction Score"}
meta.variable_value_labels = {
    "Satisfaction": {1: "Very Dissatisfied", 5: "Very Satisfied"}
}
meta.variable_measure = {
    'Satisfaction': 'ordinal',
    'Gender': 'nominal',
    'Age': 'scale',
}

# Convergence: Write both tracks to SPSS
ul.write_sav(df, meta, "cleaned_survey.sav")

Merging Files

import ultrasav as ul

# Merge multiple files vertically with automatic metadata handling
df, meta = ul.add_cases([
    "wave1.sav",
    "wave2.sav", 
    "wave3.sav"
])

# Metadata is automatically preserved from top to bottom.
# A source-tracking column is automatically added to show each row's origin.
# Example: mrgsrc: ["wave1.sav", "wave2.sav", "wave3.sav"]

ul.write_sav(df, meta, "merged_output.sav")

Advanced Merging

import ultrasav as ul

# Use specific metadata template for all files
standard_meta = ul.Metadata()  # Create an empty meta object
standard_meta.column_labels = {"Q1": "Satisfaction", "Q2": "Loyalty"}
standard_meta.variable_value_labels = {
    "Satisfaction": {1: "Very Dissatisfied", 5: "Very Satisfied"}
}

data, meta = ul.add_cases(
    inputs=["file1.sav", "file2.sav", "file3.csv"],
    meta=[standard_meta],  # Apply this metadata to merged data
    source_col="mrgsrc",  # Auto append column 'mrgsrc' to track source files
    output_format="polars"  # Explicit format (polars is default)
)

Writing Back

# Read SPSS file
df, meta = ul.read_sav("huge_survey.sav")

# All ultrasav operations work the same
df = ul.Data(df).rename({"Q1": "satisfaction"}).drop(["unused_var"]).to_native()

# Efficient write-back
# Simply provide the 'meta' object; labels and formats are applied automatically.
# Compatible with both ultrasav and pyreadstat meta objects.
ul.write_sav(df, meta, "processed_data.sav")

🦸 Metaman: The Metadata Submodule

ultrasav includes metaman, a powerful submodule for metadata inspection, extraction, and reporting. All metaman functions are accessible directly from the top-level ul namespace.

Generate Validation Datamaps

Create comprehensive datamaps showing variable types, value distributions, and data quality metrics:

import ultrasav as ul

df, meta = ul.read_sav("survey.sav")

# Create a validation datamap
datamap = ul.make_datamap(df, meta)

# Export to beautifully formatted Excel
# This function supports polars only at the moment
ul.map_to_excel(datamap, "validation_report.xlsx")

# Use custom color schemes
ul.map_to_excel(
    datamap, 
    "validation_report.xlsx",
    alternating_group_formats=ul.get_color_scheme("pastel_blue")
)

The datamap includes:

Variable names and labels
Variable types (single-select, multi-select, numeric, text, date)
Value codes and labels
Value counts and percentages
Missing data flags
Missing value label detection

Note: Variable types are inferred from both SPSS data and metadata on a best-effort basis and may not always perfectly reflect the true underlying types.

Extract Metadata to Python Files

Save existing metadata (if any) from a sav file as importable Python dictionaries for reuse across projects:

import ultrasav as ul

df, meta = ul.read_sav("survey.sav")

# Extract metadata (labels) to in-memory python object
meta_dict = ul.get_meta(meta)

# Extract and save ALL metadata to a Python file
meta_dict = ul.get_meta(meta, include_all=True, output_path="survey_labels.py")

Create Labels from Excel Templates

Build label dictionaries from scratch using Excel templates - perfect for translating surveys or standardizing labels:

import ultrasav as ul

# Excel file with 'col_label' and 'value_label' sheets
col_labels, val_labels = ul.make_labels(
    input_path="label_template.xlsx",
    output_path="translated_labels.py"  # optional
)

Excel Structure:

Your Excel file should have two sheets:

Column Labels Sheet (default sheet name: "col_label"):

variable label

age Age of respondent

gender Gender

income Annual household income
Value Labels Sheet (default sheet name: "value_label"):

variable value label

gender 1 Male

gender 2 Female

income 1 Under $25k

income 2 $25k-50k

variable	label
age	Age of respondent
gender	Gender
income	Annual household income

variable	value	label
gender	1	Male
gender	2	Female
income	1	Under $25k
income	2	$25k-50k

📖 API Reference

Core Functions

`read_sav(filepath, output_format="polars")`

Read SPSS file and return separated data and metadata. This is a wrapper around pyreadstat.read_sav with some additional encoding handling

df, meta = ul.read_sav("survey.sav")

`write_sav(data, meta, filepath)`

Write data and metadata to SPSS file.

ul.write_sav(df, meta, "processed_data.sav")

`add_cases(inputs, meta=None, source_col="mrgsrc")`

Merge multiple files/dataframes vertically with metadata handling, return merged data and metadata.

df_merged, meta_merged = ul.add_cases(["wave1.sav","wave2.sav", "wave3.sav"])

Classes

`Data`

Handles all dataframe operations while maintaining compatibility with both Polars and pandas.

import ultrasav as ul

df, meta = ul.read_sav("survey.sav")  # Returns a Polars DataFrame and meta object

# Convert polars or pandas df into our ul.Data() class
data = ul.Data(df)

# Data Class Methods
# move - to relocate columns
data = data.move(
    first=['respondent_id'],
    last=['timestamp'],
    before={'age': 'gender'},  # place 'age' column before 'gender'
    after={'wave': ['age', 'gender', 'income']}  # place demographic columns after 'wave'
)

# rename - to rename columns
data = data.rename({"old": "new"})

# replace - to replace/recode values
data = data.replace({"col": {1: 100}})

# select - to select columns
data = data.select(['age', 'gender'])

# drop - to drop columns
data = data.drop(['id', 'language'])

# to_native - to return ul.Data(df) back to its native dataframe
df = data.to_native()  # Get back Polars/pandas DataFrame

# Optionally, use chaining for cleaner code
df = (
    ul.Data(df)
    .move(first=['respondent_id'])
    .rename({"old": "new"})
    .replace({"col": {1: 100}})
    .select(['age', 'gender'])
    .drop(['id', 'language'])
    .to_native() 
)

`Metadata`

Manages all SPSS metadata independently from data.

import ultrasav as ul

df, meta = ul.read_sav("survey.sav")

meta = ul.Metadata(meta)

# All updatable metadata

meta.column_labels = {"Q1": "Question 1"}
meta.variable_value_labels = {"Q1": {1: "Yes", 0: "No"}}
meta.variable_measure = {"age": "scale"}
meta.variable_format = {"age": "F3.0", "city_name": "A50"}
meta.variable_display_width = {"city_name": 50,}
meta.missing_ranges = {"Q1": [99], "Q2": [{"lo":998,"hi":999}]}
meta.notes = "Created on 2025-02-15"
meta.file_label = "My Survey 2025"

# Optionally, use '.update()' to update everything at once
meta = meta.update(
    column_labels = {"Q1": "Question 1"},
    variable_value_labels = {"Q1": {1: "Yes", 0: "No"}},
    variable_measure = {"age": "scale"},
    variable_format = {"age": "F3.0", "city_name": "A50"},
    ...
)

# You can update any writable metadata fields supported by pyreadstat.

Metadata Updating Logic

Original metadata is preserved and never destroyed
User updates overlay on top of originals
When you set meta.column_labels = {"Q1": "New Label"}:
- This updates Q1's column label if there is an existing column label within the original meta.column_labels
- If Q1 is not in the original metadata, then Q1's new label will simply be appended at the bottom of the meta.column_labels dict
- All other column labels remain unchanged
- Original metadata still exists underneath
- This update logic applies to all updatable metadata

Note on variable_value_labels Update Behavior:

When updating meta.variable_value_labels, the entire value-label dictionary for a variable is replaced, not merged.

# Original metadata
meta.variable_value_labels = {"Q1": {1: "Yes", 2: "No", 99: "Unsure"}}

# User update
meta.variable_value_labels = {"Q1": {1: "Yes", 0: "No"}}

# Result for Q1 becomes:
{"Q1": {1: "Yes", 0: "No"}}  # Previous values 2 and 99 are NOT preserved

This means:

Only the value-label pairs explicitly provided in the update are kept
The entire dictionary for that variable is replaced at once
Variable-level entries are preserved (e.g., "Q1" still exists), but value-level merging does not occur

This follows ultrasav's design principle: metadata updates overlay at the variable level — never partially merged — ensuring clean and intentional metadata after each update.

Critical Design Choice:

When you rename an existing column "Q1" to "Q1a" in data, the associated metadata does not automatically carry over
You must explicitly provide new metadata for the newly renamed column "Q1a"
No automatic tracking or mapping between old and new names

🦸 Metaman Functions

`make_datamap(df, meta, output_format=None)`

Create a validation datamap from data and metadata.

datamap = ul.make_datamap(df, meta)

`map_to_excel(df, file_path, **kwargs)`

Export datamap to formatted Excel with merged cells and alternating colors.

ul.map_to_excel(datamap, "report.xlsx") # Saves datamap to Excel
ul.map_to_excel(datamap, "report.xlsx", alternating_group_formats=ul.get_color_scheme("pastel_blue"))

`get_meta(meta, output_path=None, include_all=False)`

Extract metadata to a Python file or dictionary.

meta_dict = ul.get_meta(meta)  # Returns meta_dict in memory
ul.get_meta(meta, output_path="labels.py")  # Saves to file

`make_labels(input_path, output_path=None)`

Create label dictionaries from an Excel template.

col_labels, val_labels = ul.make_labels("template.xlsx") # Returns label dicts in memory
col_labels, val_labels = ul.make_labels("template.xlsx", "labels.py") # Saves to file

`detect_variable_type(df, meta, column)`

Detect variable type (single-select, multi-select, numeric, text, date).

var_type = ul.detect_variable_type(df, meta, "Q1")

`get_color_scheme(name)`

Get a color scheme for Excel formatting.

scheme = ul.get_color_scheme("pastel_blue")
# Options: "classic_grey", "pastel_green", "pastel_blue", "pastel_purple", "pastel_indigo"

⚡ Why "ultrasav"?

The name combines "Ultra" (inspired by Ultraman) with "SAV" (SPSS file format), representing the ultra-powerful transformation capabilities of this package. Just like Ultraman's Specium Ray, ultrasav splits and recombines data with precision and power!

And metaman? He's the metadata superhero who swoops in to inspect, validate, and report on your SPSS data! 🦸

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

Built on top of pyreadstat for SPSS file handling
Uses narwhals for dataframe compatibility
Excel export powered by xlsxwriter

📬 Contact

Author: Albert Li

📄 Version History

0.1.0: Initial release with two-track architecture for data/metadata separation and metaman submodule for metadata inspection & reporting

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.2.14

Feb 2, 2026

0.2.13

Jan 25, 2026

0.2.12

Jan 25, 2026

0.2.11

Jan 23, 2026

0.2.10

Jan 23, 2026

0.2.9

Jan 22, 2026

0.2.8

Jan 22, 2026

0.2.7

Jan 22, 2026

0.2.6

Jan 20, 2026

0.2.5

Jan 20, 2026

0.2.4

Jan 20, 2026

0.2.3

Jan 20, 2026

0.2.2

Jan 19, 2026

0.2.1

Jan 14, 2026

0.1.10

Jan 7, 2026

0.1.9

Dec 19, 2025

0.1.8

Dec 19, 2025

0.1.7

Dec 3, 2025

0.1.6

Dec 3, 2025

0.1.5

Dec 3, 2025

0.1.4

Dec 3, 2025

0.1.3

Dec 3, 2025

This version

0.1.2

Dec 3, 2025

0.1.1

Dec 3, 2025

0.1.0

Dec 3, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ultrasav-0.1.2.tar.gz (60.3 kB view details)

Uploaded Dec 3, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

ultrasav-0.1.2-py3-none-any.whl (71.0 kB view details)

Uploaded Dec 3, 2025 Python 3

File details

Details for the file ultrasav-0.1.2.tar.gz.

File metadata

Download URL: ultrasav-0.1.2.tar.gz
Upload date: Dec 3, 2025
Size: 60.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.5

File hashes

Hashes for ultrasav-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`a4666e0caec72936a9dcbf4ca68daa8666859a7440b901193c05b7e6965b877e`
MD5	`37b0ef8f388d12d9a0bd58499221d191`
BLAKE2b-256	`e6e27871e00f41114f723c4a29f5d6b994e8da26cd69219d87703d9adf42a3ba`

See more details on using hashes here.

File details

Details for the file ultrasav-0.1.2-py3-none-any.whl.

File metadata

Download URL: ultrasav-0.1.2-py3-none-any.whl
Upload date: Dec 3, 2025
Size: 71.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.5

File hashes

Hashes for ultrasav-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b1398d95178539c0dddeb03b56ffbef5d94ab98559cac4a59234c85fa9face35`
MD5	`55d7810aa1c76ca631f6b2c248f75ecd`
BLAKE2b-256	`679fdd530354adbbb8dadb370a2f876deea9146fb09369c597b9586da056df8c`

See more details on using hashes here.

ultrasav 0.1.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Project description

ultrasav

💡 Motivation

🎯 Core Philosophy

The Common Problems

The ultrasav Way

🚀 DataFrame-Agnostic Design

Who Is This For?

🚀 Installation

📚 Quick Start

Basic Usage

Merging Files

Advanced Merging

Writing Back

🦸 Metaman: The Metadata Submodule

Generate Validation Datamaps

Extract Metadata to Python Files

Create Labels from Excel Templates

📖 API Reference

Core Functions

read_sav(filepath, output_format="polars")

write_sav(data, meta, filepath)

add_cases(inputs, meta=None, source_col="mrgsrc")

Classes

Data

Metadata

🦸 Metaman Functions

make_datamap(df, meta, output_format=None)

map_to_excel(df, file_path, **kwargs)

get_meta(meta, output_path=None, include_all=False)

make_labels(input_path, output_path=None)

detect_variable_type(df, meta, column)

get_color_scheme(name)

⚡ Why "ultrasav"?

🤝 Contributing

📄 License

🙏 Acknowledgments

📬 Contact

📄 Version History

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`read_sav(filepath, output_format="polars")`

`write_sav(data, meta, filepath)`

`add_cases(inputs, meta=None, source_col="mrgsrc")`

`Data`

`Metadata`

`make_datamap(df, meta, output_format=None)`

`map_to_excel(df, file_path, **kwargs)`

`get_meta(meta, output_path=None, include_all=False)`

`make_labels(input_path, output_path=None)`

`detect_variable_type(df, meta, column)`

`get_color_scheme(name)`