xport·PyPI

SAS XPORT file reader

These details have not been verified by PyPI

Project description

Read and write SAS Transport files (*.xpt).

SAS uses a handful of archaic file formats: XPORT/XPT, CPORT, SAS7BDAT. If someone publishes their data in one of those formats, this Python package will help you convert the data into a more useful format. If someone, like the FDA, asks you for an XPT file, this package can write it for you.

What’s it for?

XPORT is the binary file format used by a bunch of United States government agencies for publishing data sets. It made a lot of sense if you were trying to read data files on your IBM mainframe back in 1988.

The official SAS specification for XPORT is relatively straightforward. The hardest part is converting IBM-format floating point to IEEE-format, which the specification explains in detail.
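
As a rough illustration of that conversion, here is a minimal sketch. It is not this package's implementation: the function name `ibm_to_ieee` is hypothetical, and treating a zero fraction with a nonzero first byte as a missing value is a simplified reading of the specification. An IBM double holds a sign bit, a 7-bit base-16 exponent (offset by 64), and a 56-bit fraction.

```python
import math
import struct


def ibm_to_ieee(data: bytes) -> float:
    """Decode one 8-byte big-endian IBM hexadecimal float (hypothetical helper)."""
    if len(data) != 8:
        raise ValueError("expected exactly 8 bytes")
    (bits,) = struct.unpack(">Q", data)
    sign = -1.0 if bits >> 63 else 1.0
    exponent = (bits >> 56) & 0x7F        # base-16 exponent, excess-64
    fraction = bits & 0x00FFFFFFFFFFFFFF  # 56-bit fraction, read as an integer
    if fraction == 0:
        # Assumption: all zero bytes is a true zero; a nonzero first byte
        # followed by zero bytes marks a missing value, shown here as NaN.
        return 0.0 if data[0] == 0 else math.nan
    # value = fraction / 2**56 * 16**(exponent - 64), written as a power of 2.
    # The 56-bit fraction may be rounded to fit IEEE's 53-bit significand.
    return sign * math.ldexp(fraction, 4 * (exponent - 64) - 56)


print(ibm_to_ieee(bytes.fromhex("4110000000000000")))  # 1.0
print(ibm_to_ieee(bytes.fromhex("C276A00000000000")))  # -118.625
```

Using `math.ldexp` keeps the scaling exact, because multiplying by a power of 16 is the same as shifting by a multiple of four bits.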

There was an update to the XPT specification for SAS v8 and above. Earlier releases of this module supported only the original format, and their error-checking raises a ValueError on a v8 file, even though the changes to the format appear to be trivial changes to the metadata. According to the change log, v3.3.0 added reading of Transport Version 8/9 files and v3.5.0 added writing of Version 8 files. If you run into a problem with a v8 file, please let me know by submitting an issue.

Installation

This project requires Python v3.7+. Grab the latest stable version from PyPI.

$ python -m pip install --upgrade xport

Reading XPT

This module follows the common pattern of providing load and loads functions for reading data from a SAS file format.

import xport.v56

with open('example.xpt', 'rb') as f:
    library = xport.v56.load(f)

The XPT decoders, xport.load and xport.loads, return an xport.Library, which is a mapping (dict-like) of xport.Dataset objects. The xport.Dataset is a subclass of pandas.DataFrame with SAS metadata attributes (name, label, etc.). The columns of an xport.Dataset are xport.Variable types, which are subclasses of pandas.Series with SAS metadata (name, label, format, etc.).

If you’re not familiar with Pandas’s dataframes, it’s easy to think of them as a dictionary of columns, mapping variable names to variable data.

The SAS Transport (XPORT) format only supports two kinds of data. Each value is either numeric or character, so xport.load decodes the values as either str or float.

Note that since XPT files are in an unusual binary format, you should open them using mode 'rb'.
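
To make the binary-mode point concrete, the sketch below checks whether a stream starts like an XPT file before handing it to a decoder. The helper name `looks_like_xpt` is hypothetical and not part of this package. It assumes, from the SAS specification, that the file opens with an 80-byte header record beginning with the ASCII text `HEADER RECORD*******`.

```python
import io

# Assumption: the first header record of an XPT file starts with this text.
XPT_MAGIC = b"HEADER RECORD*******"


def looks_like_xpt(stream) -> bool:
    """Return True if a binary stream begins with the XPT header marker."""
    start = stream.read(len(XPT_MAGIC))
    if isinstance(start, str):
        # A text-mode file yields str, which can never match the byte marker.
        raise TypeError("open the file in binary mode ('rb'), not text mode")
    return start == XPT_MAGIC


# Build a fake v5 library header in memory instead of reading a real file.
header = b"HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!" + b"0" * 30 + b"  "
print(looks_like_xpt(io.BytesIO(header)))          # True
print(looks_like_xpt(io.BytesIO(b"a,b\n1,2\n")))   # False
```

With a real file, the same check would be `with open('example.xpt', 'rb') as f: looks_like_xpt(f)`, followed by `f.seek(0)` before decoding.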

You can also use the xport module as a command-line tool to convert an XPT file to a CSV (comma-separated values) file. The xport executable is a friendly alias for python -m xport. Caution: if the command-line tool does not work with the latest version, it should work with version 2.0.2. To get that version, either download its release files or run the following command in your terminal: pip install xport==2.0.2.

$ xport example.xpt > example.csv

Writing XPT

The xport package follows the common pattern of providing dump and dumps functions for writing data to a SAS file format.

import xport
import xport.v56

ds = xport.Dataset()
with open('example.xpt', 'wb') as f:
    xport.v56.dump(ds, f)

Because the xport.Dataset is an extension of pandas.DataFrame, you can create datasets in a variety of ways, converting easily from a dataframe to a dataset.

import pandas as pd
import xport
import xport.v56

df = pd.DataFrame({'NUMBERS': [1, 2], 'TEXT': ['a', 'b']})
ds = xport.Dataset(df, name='MAX8CHRS', label='Up to 40!')
with open('example.xpt', 'wb') as f:
    xport.v56.dump(ds, f)

SAS Transport v5 restricts variable names to 8 characters (with a strange preference for uppercase) and labels to 40 characters. If you want the relative comfort of SAS Transport v8’s limit of 246 characters, please make an enhancement request.
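
These limits are easy to check before writing. The sketch below is one possible approach, not part of this package: `v5_names` and `check_label` are hypothetical helpers that use only the 8-character and 40-character limits stated above. Unlike plain truncation, `v5_names` also reports when two columns would shorten to the same name.

```python
MAX_NAME_LENGTH = 8    # v5 limit on names, as stated above
MAX_LABEL_LENGTH = 40  # v5 limit on labels, as stated above


def v5_names(columns):
    """Map each column name to an uppercase name of at most 8 characters."""
    mapping = {}
    used = set()
    for column in columns:
        name = str(column).upper()[:MAX_NAME_LENGTH]
        if name in used:
            # Truncation made two different columns indistinguishable.
            raise ValueError(f"{column!r} collides with another column as {name!r}")
        used.add(name)
        mapping[column] = name
    return mapping


def check_label(label):
    """Return the label unchanged, or raise if it exceeds 40 characters."""
    if len(label) > MAX_LABEL_LENGTH:
        raise ValueError(f"label is {len(label)} characters; the limit is {MAX_LABEL_LENGTH}")
    return label


print(v5_names(["alpha", "temperature_c"]))
# {'alpha': 'ALPHA', 'temperature_c': 'TEMPERAT'}
```

The resulting mapping can be passed to a rename call, for example `ds.rename(columns=v5_names(ds.columns))`.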

It’s likely that most people will be using Pandas dataframes for the bulk of their analysis work, and will want to convert to XPT at the very end of their process.

import pandas as pd
import xport
import xport.v56

df = pd.DataFrame({
    'alpha': [10, 20, 30],
    'beta': ['x', 'y', 'z'],
})

...  # Analysis work ...

ds = xport.Dataset(df, name='DATA', label='Wonderful data')

# SAS variable names are limited to 8 characters.  As with Pandas
# dataframes, you must change the name on the dataset rather than
# the column directly.
ds = ds.rename(columns={k: k.upper()[:8] for k in ds})

# Other SAS metadata can be set on the columns themselves.
for k, v in ds.items():
    v.label = k.title()
    if v.dtype == 'object':
        v.format = '$CHAR20.'
    else:
        v.format = '10.2'

# Libraries can have multiple datasets.
library = xport.Library({'DATA': ds})

with open('example.xpt', 'wb') as f:
    xport.v56.dump(library, f)

Feature requests

I’m happy to fix bugs, improve the interface, or make the module faster. Just submit an issue and I’ll take a look. If you work for a corporation or well-funded non-profit, please consider a sponsorship.

Thanks

Current and past sponsors include:

Contributing

This project is configured to be developed in a Conda environment.

$ git clone git@github.com:selik/xport.git
$ cd xport
$ make install          # Install into a Conda environment
$ conda activate xport  # Activate the Conda environment
$ make install-html     # Build the docs website

Authors

Original version by Jack Cushman, 2012.

Major revisions by Michael Selik, 2016 and 2020.

Minor revisions by Alfred Chan, 2020.

Minor revisions by Derek Croote, 2021.

Change Log

v0.1.0, 2012-05-02: Initial release.
v0.2.0, 2016-03-22: Major revision.
v0.2.0, 2016-03-23: Add numpy and pandas converters.
v1.0.0, 2016-10-21: Revise API to the pattern of from/to <format>
v2.0.0, 2016-10-21: Reader yields regular tuples, not namedtuples
v3.0.0, 2020-04-20: Revise API to the load/dump pattern. Enable specifying dataset name, variable names, labels, and formats.
v3.1.0, 2020-04-20: Allow dumps(dataframe) instead of requiring a Dataset.
v3.2.2, 2020-09-03: Fix a bug that incorrectly displayed a - (dash) for a null in a numeric field.
v3.3.0, 2021-12-25: Enable reading Transport Version 8/9 files. Merry Christmas!
v3.4.0, 2021-12-25: Add support for special missing values, like .A, that extend float.
v3.5.0, 2021-12-31: Enable writing Transport Version 8 files. Happy New Year!
v3.5.1, 2022-02-01: Fix issues with writing Dataset.label and Variable.label.
v3.6.0, 2022-02-02: Add beta support for changing the text encoding for data and metadata.
v3.6.1, 2022-02-15: Fix issue with v8 format when the dataset has no long labels.

Release history

Version	Release date
3.6.1 (this version)	Feb 16, 2022
3.6.0	Feb 3, 2022
3.5.1	Feb 1, 2022
3.5.0	Jan 1, 2022
3.4.0	Dec 25, 2021
3.3.2	Dec 25, 2021
3.3.1	Dec 25, 2021
3.3.0	Dec 25, 2021
3.2.1	May 26, 2020
3.2.0	May 21, 2020
3.1.6	May 9, 2020
3.1.5	May 4, 2020
3.1.4	May 4, 2020
3.1.3	Apr 28, 2020
3.1.2	Apr 24, 2020
3.1.1	Apr 21, 2020
3.1.0	Apr 21, 2020
3.0.0	Apr 21, 2020
2.0.2	Jan 6, 2017
2.0.1	Oct 26, 2016
2.0.0	Oct 26, 2016
1.1.3	Oct 22, 2016
1.1.2	Oct 22, 2016
1.1.1	Oct 22, 2016
1.1.0	Oct 22, 2016
1.0.0	Oct 22, 2016
0.6.4	Aug 25, 2016
0.6.3	Aug 25, 2016
0.6.2	Aug 25, 2016
0.6.1	Aug 25, 2016
0.3.4	Jul 5, 2016
0.3.3	Jul 5, 2016
0.3.2	Mar 23, 2016
0.3.1	Mar 23, 2016
0.3.0	Mar 23, 2016
0.2.0	Mar 23, 2016
0.1.0	May 2, 2012

Download files

Download the file for your platform.

Source Distribution

xport-3.6.1.tar.gz (38.4 kB)

Uploaded Feb 16, 2022 Source

Built Distribution

xport-3.6.1-py2.py3-none-any.whl (29.4 kB)

Uploaded Feb 16, 2022 Python 2, Python 3

File details

Details for the file xport-3.6.1.tar.gz.

File metadata

Download URL: xport-3.6.1.tar.gz
Upload date: Feb 16, 2022
Size: 38.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.7.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0

File hashes

Hashes for xport-3.6.1.tar.gz
Algorithm	Hash digest
SHA256	`da1e461bd35235498a56fcb61f01824c72bdf9760f049eac8adc4e0cbbc2e17e`
MD5	`e3f73f482a25d7f3364184ea209f15fa`
BLAKE2b-256	`8c027fb6ff8572b9c6e725598c72ea9a14833fc0a3073889d265a3c6b9f4a8f0`

File details

Details for the file xport-3.6.1-py2.py3-none-any.whl.

File metadata

Download URL: xport-3.6.1-py2.py3-none-any.whl
Upload date: Feb 16, 2022
Size: 29.4 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.7.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0

File hashes

Hashes for xport-3.6.1-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`f3e189468a34252d17cac26989d55c951c6dfd868bdc6f23c3b735e2cef8aafe`
MD5	`3ec706c9782420043d42c45b912203cd`
BLAKE2b-256	`947ac842e37b6221934aca6dd8810246e0f1a027c372238f7aa5fcdb3f7938f0`
