Read and write SPSS (.sav and .zsav) files to/from pandas dataframes
Overview
pyspssio is a Python package for reading and writing SPSS (.sav and .zsav) files to/from pandas dataframes.
This package uses the I/O Module for SPSS Statistics v27 available at https://www.ibm.com/.
WARNING: This is an early release with limited testing. Use with caution.
Motivation
The main reason for creating this package is to fill gaps left by other similar packages.
savReaderWriter
- doesn't support Python > 3.5
- not particularly user-friendly
pyreadstat
- doesn't read or write multi response set definitions
- datetime conversion quirks
- issues reading/writing long string variables (https://github.com/Roche/pyreadstat/issues/119)
pyspssio
supports recent versions of Python and can read/write most SPSS file metadata properties. The usecols argument when reading files also accepts a callable for more flexible variable selection.
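A callable usecols most likely behaves like the argument of the same name in pandas' read_csv: the callable is evaluated against each variable name, and the variable is kept when it returns True. The sketch below reproduces that selection logic in plain Python to show the idea. select_variables is a hypothetical helper, not part of pyspssio, and the pandas-like semantics are an assumption.

```python
from typing import Callable, Iterable, List, Optional, Union

UseCols = Optional[Union[Callable[[str], bool], Iterable[str]]]


def select_variables(names: Iterable[str], usecols: UseCols = None) -> List[str]:
    """Hypothetical helper mimicking pandas-style usecols selection."""
    names = list(names)
    if usecols is None:
        # No selection: keep every variable
        return names
    if callable(usecols):
        # Callable: keep each variable for which the predicate returns True
        return [name for name in names if usecols(name)]
    # List-like: keep the requested variables, in file order
    wanted = set(usecols)
    return [name for name in names if name in wanted]


variables = ["id", "q1_a", "q1_b", "q2", "weight"]
print(select_variables(variables, lambda name: name.startswith("q1_")))
# ['q1_a', 'q1_b']
```

With pyspssio itself, the same predicate would be passed directly, e.g. pyspssio.read_sav("spss_file.sav", usecols=lambda name: name.startswith("q1_")).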
Basic Usage
Installation
pip install pyspssio
Import
import pyspssio
Reading
Read data and metadata
df, meta = pyspssio.read_sav("spss_file.sav")
Read metadata only
meta = pyspssio.read_metadata("spss_file.sav")
Read data in chunks of chunksize (number of rows/records)
for df in pyspssio.read_sav("spss_file.sav", chunksize=1000):
    # do something with each chunk (a dataframe of up to 1000 rows)
    print(df.shape)
Note: metadata is not returned when reading in chunks.
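Because each chunk is an ordinary dataframe, whole-file results have to be built up incrementally. The sketch below counts the values of one variable across all chunks while holding only one chunk in memory at a time. To keep it self-contained, fake_chunks is a hypothetical stand-in for pyspssio.read_sav(..., chunksize=...), and the region column is illustrative.

```python
from collections import Counter
from typing import Iterator

import pandas as pd


def fake_chunks(chunksize: int) -> Iterator[pd.DataFrame]:
    """Hypothetical stand-in for pyspssio.read_sav(path, chunksize=chunksize)."""
    full = pd.DataFrame({"region": ["N", "S", "N", "E", "N", "S", "W"]})
    for start in range(0, len(full), chunksize):
        yield full.iloc[start:start + chunksize]


def count_values(chunks: Iterator[pd.DataFrame], column: str) -> Counter:
    totals: Counter = Counter()
    for df in chunks:
        # Merge this chunk's counts into the running total, then let the chunk go
        totals.update(df[column].value_counts().to_dict())
    return totals


counts = count_values(fake_chunks(chunksize=3), "region")
print(dict(counts))
```

The same loop structure works for collecting filtered rows from each chunk and combining them with pd.concat at the end.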
Writing
Write dataframe to file.
pyspssio.write_sav("spss_file.sav", df)
Appending
Append new records to an existing SPSS file.
pyspssio.append_sav("spss_file.sav", df)
Note: metadata cannot be modified when appending new records. Be careful with strings that may be longer than the width already defined for their variable; they may be truncated automatically without warning.
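Since truncation happens silently, it can help to check new rows before appending. The sketch below assumes the file uses UTF-8 and that string widths are counted in bytes (as in SPSS Unicode mode), so a character such as "ü" uses two units. find_overlong_strings and the widths mapping are hypothetical; in practice widths would be built from the existing file's metadata.

```python
from typing import Dict

import pandas as pd


def find_overlong_strings(
    df: pd.DataFrame, widths: Dict[str, int], encoding: str = "utf-8"
) -> Dict[str, int]:
    """Return {column: longest value in bytes} for columns exceeding their defined width.

    widths is a hypothetical {variable name: defined string width} mapping.
    """
    too_long = {}
    for column, width in widths.items():
        if column not in df.columns:
            continue
        values = df[column].dropna().astype(str)
        if values.empty:
            continue
        # Measure encoded byte length, not character count
        longest = int(values.map(lambda s: len(s.encode(encoding))).max())
        if longest > width:
            too_long[column] = longest
    return too_long


new_rows = pd.DataFrame({"name": ["Ann", "Bartholomew"], "city": ["Oslo", "Zürich"]})
print(find_overlong_strings(new_rows, {"name": 8, "city": 6}))
# {'name': 11, 'city': 7}
```

Note that "Zürich" has six characters but seven UTF-8 bytes, which is exactly the case a character-based check would miss.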
Other Notes
Date/Time Variables
Date and datetime variables - These are converted to/from full datetime objects, even for formats like DATE, QYR, and WKYR that don't display a time component. After reading the data, users can use the pandas .dt accessor to extract specific components or to round to a specific precision (e.g., minute, hour, day), for example with .dt.floor. The var_formats and/or var_formats_tuple metadata attributes show the original SPSS formats.
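A short illustration of the .dt accessor on a series shaped like a date variable after reading. The series below is a hypothetical stand-in (assumed to arrive as datetime64 values); the .dt methods themselves are standard pandas.

```python
import pandas as pd

# Stand-in for a DATE-formatted variable after reading: full datetimes,
# one of them carrying small sub-second noise
dates = pd.Series([
    pd.Timestamp("2021-03-14 00:00:00.000123"),
    pd.Timestamp("2021-11-02 13:45:10"),
])

print(dates.dt.floor("D"))    # truncate to the day (keeps datetime64 dtype)
print(dates.dt.floor("min"))  # truncate to the minute
print(dates.dt.quarter)       # quarter number, e.g. for QYR-formatted variables
print(dates.dt.date)          # plain datetime.date objects
```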
Time variables - These are converted to/from timestamp objects.
Python/pandas stores datetimes with nanosecond resolution, while SPSS stores them in seconds. Because of the conversions that must take place, there may be small (millisecond-level) discrepancies between an original dataframe used to write an SPSS file and a dataframe read back from the same SPSS file.
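SPSS represents date and datetime values as seconds since midnight on 14 October 1582. The sketch below shows both directions of a manual conversion with the standard library. It illustrates the arithmetic only and is not necessarily how pyspssio converts values internally. The seconds are floating-point numbers, and Python's datetime stops at microseconds, so sub-second components are where round-trip differences can creep in.

```python
from datetime import datetime, timedelta

# SPSS stores date/datetime values as seconds since 1582-10-14 00:00:00
SPSS_EPOCH = datetime(1582, 10, 14)


def spss_to_datetime(seconds: float) -> datetime:
    # timedelta rounds fractional seconds to the nearest microsecond
    return SPSS_EPOCH + timedelta(seconds=seconds)


def datetime_to_spss(value: datetime) -> float:
    # total_seconds() returns a float, matching SPSS's 8-byte numeric storage
    return (value - SPSS_EPOCH).total_seconds()


print(datetime_to_spss(datetime(1970, 1, 1)))  # 12219379200.0
print(spss_to_datetime(12219379200.0))         # 1970-01-01 00:00:00
```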
I/O Module Procedures
List of available I/O module procedures, grouped by the class they fall under. See the official documentation for details on each one.
Some of these procedures are implemented as hidden methods called from a more general function or property. For example, instead of calling spssSetVarLabel manually for each variable, users should assign all variable labels at once by setting self.var_labels = {var1: label1, var2: label2, ...}.
All I/O module procedures can be accessed directly with self.spssio.[procedure].
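The property-over-procedure pattern described above can be sketched as follows. Everything here is hypothetical: FakeIOModule stands in for the loaded I/O module (reached as self.spssio in pyspssio) and records calls instead of writing a file, and Header/_set_var_label only mirror the idea rather than pyspssio's actual class layout. The argument order of spssSetVarLabel(handle, var_name, label) follows the I/O module's C interface, and treating 0 as success is an assumption.

```python
from typing import Dict, List, Tuple


class FakeIOModule:
    """Hypothetical stand-in for the I/O module; records calls instead of writing."""

    def __init__(self) -> None:
        self.calls: List[Tuple[int, str, str]] = []

    def spssSetVarLabel(self, handle: int, var_name: str, label: str) -> int:
        self.calls.append((handle, var_name, label))
        return 0  # assume 0 means success


class Header:
    def __init__(self, spssio: FakeIOModule, handle: int) -> None:
        self.spssio = spssio
        self.handle = handle
        self._var_labels: Dict[str, str] = {}

    def _set_var_label(self, var_name: str, label: str) -> None:
        # "Hidden" wrapper around the single-variable procedure
        if self.spssio.spssSetVarLabel(self.handle, var_name, label) != 0:
            raise RuntimeError(f"could not set label for {var_name!r}")
        self._var_labels[var_name] = label

    @property
    def var_labels(self) -> Dict[str, str]:
        return dict(self._var_labels)

    @var_labels.setter
    def var_labels(self, labels: Dict[str, str]) -> None:
        # One assignment fans out into one procedure call per variable
        for var_name, label in labels.items():
            self._set_var_label(var_name, label)


header = Header(FakeIOModule(), handle=1)
header.var_labels = {"q1": "Overall satisfaction", "q2": "Likelihood to recommend"}
print(header.spssio.calls)
```

The design keeps the per-variable procedure available while letting users work with a single dictionary, which also makes the labels easy to read back as one mapping.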
SPSSFile
spssOpenRead
spssCloseRead
spssOpenWrite
spssCloseWrite
spssOpenAppend
spssCloseAppend
spssHostSysmisVal
spssLowHighVal
spssSetLocale
spssGetInterfaceEncoding
spssSetInterfaceEncoding
spssGetFileEncoding
spssIsCompatibleEncoding
spssGetCompression
spssSetCompression
spssGetReleaseInfo
spssGetNumberofCases
spssGetNumberofVariables
Header
spssGetFileAttributes
spssSetFileAttributes
spssGetVarNames
spssSetVarName
spssGetVarHandle
spssGetVarPrintFormat
spssSetVarPrintFormat
spssSetVarWriteFormat
spssGetVarMeasureLevel
spssSetVarMeasureLevel
spssGetVarAlignment
spssSetVarAlignment
spssGetVarColumnWidth
spssSetVarColumnWidth
spssGetVarLabelLong
spssSetVarLabel
spssGetVarRole
spssSetVarRole
spssGetVarCValueLabels
spssSetVarCValueLabel
spssGetVarNValueLabels
spssSetVarNValueLabel
spssGetVarCMissingValues
spssSetVarCMissingValues
spssGetVarNMissingValues
spssSetVarNMissingValues
spssGetMultRespCount
spssGetMultRespDefs
spssGetMultRespDefsEx - replaces spssGetMultRespDefs
spssSetMultRespDefs
spssAddMultRespDefExt
spssGetCaseSize
spssGetCaseWeightVar
spssSetCaseWeightVar
spssGetVarAttributes
spssSetVarAttributes
spssGetVarCompatName
spssGetVariableSets
spssSetVariableSets
spssCommitHeader
Reader
spssSeekNextCase
spssWholeCaseIn
Writer
spssWholeCaseOut
spssSetValueChar
spssSetValueNumeric
spssCommitCaseRecord
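spssWholeCaseIn and spssWholeCaseOut move an entire case as one raw byte buffer (of the size reported by spssGetCaseSize), which is why per-variable calls such as spssGetValueNumeric are not needed when reading. The sketch below shows how such a buffer could be packed and unpacked with struct, assuming the usual system-file layout: each numeric variable is an 8-byte double in host byte order, and each string occupies its width rounded up to a multiple of 8 bytes, padded with spaces. The var_types convention (0 for numeric, otherwise the string width) and the helpers are assumptions for illustration, and very long strings are not handled.

```python
import struct
from typing import Dict, List, Union

Value = Union[float, str]


def case_format(var_types: Dict[str, int]) -> str:
    """Build a struct format for one case; var_types: 0 = numeric, n > 0 = string width."""
    parts = []
    for width in var_types.values():
        if width == 0:
            parts.append("d")  # 8-byte double
        else:
            padded = -(-width // 8) * 8  # round the width up to a multiple of 8
            parts.append(f"{padded}s")
    return "=" + "".join(parts)  # native byte order, no alignment padding


def unpack_case(buffer: bytes, var_types: Dict[str, int], encoding: str = "utf-8") -> Dict[str, Value]:
    raw = struct.unpack(case_format(var_types), buffer)
    case: Dict[str, Value] = {}
    for (name, width), value in zip(var_types.items(), raw):
        # Strings come back padded to the full segment, so strip trailing blanks
        case[name] = value if width == 0 else value.decode(encoding).rstrip(" ")
    return case


def pack_case(case: Dict[str, Value], var_types: Dict[str, int], encoding: str = "utf-8") -> bytes:
    values: List[Union[float, bytes]] = []
    for name, width in var_types.items():
        if width == 0:
            values.append(float(case[name]))
        else:
            padded = -(-width // 8) * 8
            # struct.pack silently truncates values longer than the segment
            values.append(str(case[name]).encode(encoding).ljust(padded, b" "))
    return struct.pack(case_format(var_types), *values)


var_types = {"id": 0, "name": 10, "score": 0}
buffer = pack_case({"id": 1, "name": "Ann", "score": 9.5}, var_types)
print(len(buffer))                     # 32
print(unpack_case(buffer, var_types))  # {'id': 1.0, 'name': 'Ann', 'score': 9.5}
```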
Not Implemented (yet)
spssAddMultRespDefC
spssAddMultRespDefN
spssGetMultRespDefByIndex
spssConvertDate - manual conversion instead
spssConvertSPSSDate - manual conversion instead
spssConvertSPSSTime - manual conversion instead
spssConvertTime - manual conversion instead
spssCopyDocuments
spssGetDEWFirst
spssGetDEWGUID
spssGetDewInfo
spssGetDEWNext
spssSetDEWFirst
spssSetDEWGUID
spssSetDEWNext
spssGetDateVariables
spssGetEstimatedNofCases
spssGetFileAttribute - uses spssGetFileAttributes instead
spssGetFileCodePage
spssGetIdString
spssGetSystemString
spssGetTextInfo
spssGetTimeStamp
spssGetValueChar - uses spssWholeCaseIn instead
spssGetValueNumeric - uses spssWholeCaseIn instead
spssAddVarAttribute - uses spssSetVarAttributes instead
spssGetVarCValueLabel - uses spssGetVarCValueLabels instead
spssGetVarCValueLabelLong - uses spssGetVarCValueLabels instead
spssGetVarInfo
spssGetVarLabel - uses spssGetVarLabelLong instead
spssGetVarNValueLabel - uses spssGetVarNValueLabels instead
spssGetVarNValueLabelLong - uses spssGetVarNValueLabels instead
spssGetVarWriteFormat - uses spssGetVarPrintFormat instead (print/write formats tied together)
spssOpenAppendEx
spssOpenReadEx
spssOpenWriteCopy
spssOpenWriteCopyEx
spssOpenWriteCopyExDict
spssOpenWriteCopyExFile
spssOpenWriteEx
spssQueryType7
spssReadCaseRecord - uses spssWholeCaseIn instead
spssSetDateVariables
spssSetIdString
spssSetTempDir
spssSetTextInfo
spssSetVarCValueLabels - uses spssSetVarCValueLabel instead
spssSetVarNValueLabels - uses spssSetVarNValueLabel instead
spssSysmisVal - uses spssHostSysmisVal instead
spssValidateVarname
Download files
Source Distribution: pyspssio-0.4.3.tar.gz
Built Distribution: pyspssio-0.4.3-py3-none-any.whl
File details
Details for the file pyspssio-0.4.3.tar.gz.
File metadata
- Download URL: pyspssio-0.4.3.tar.gz
- Upload date:
- Size: 52.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | ec9ef8f529b217063222779a63246a8ab21a96c147c5c9ba4162ee4a4d8ea8ae
MD5 | 4f34ccaaee8d2dfe0e786372d1b5ca6c
BLAKE2b-256 | 9e3a0e0251d6fb2b83bff4bffc301b5f2b473bfd6a9f17dba9b627a00604328c
File details
Details for the file pyspssio-0.4.3-py3-none-any.whl.
File metadata
- Download URL: pyspssio-0.4.3-py3-none-any.whl
- Upload date:
- Size: 52.5 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9799e6d7ad3326c30c2c6dbf9937ce51615c3fcb84529b5d0812b347773e07c9
MD5 | bff55995bad26cb64314b8dea7d5cfe7
BLAKE2b-256 | 6eab19cd644e84bff27d33b9d9666e9073a6bbbe2c3894802335b8aa0ae74cca