xport

SAS XPORT file reader

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python
Topic
- Text Processing
- Utilities

Project description

Python reader for SAS XPORT data transport files (*.xpt).

What’s it for?

XPORT is the binary file format used by a bunch of United States government agencies for publishing data sets. It made a lot of sense if you were trying to read data files on your IBM mainframe back in 1988.

The official SAS specification for XPORT is relatively straightforward. The hardest part is converting IBM-format floating point to IEEE-format, which the specification explains in detail.
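
The conversion can be sketched in a few lines. This is a minimal illustration and not necessarily how this module implements it: the helper name ibm_to_float is hypothetical, and it assumes the usual IBM double-precision layout (1 sign bit, a 7-bit base-16 exponent biased by 64, and a 56-bit fraction). Special cases such as SAS missing values are ignored.

```python
import struct


def ibm_to_float(data):
    """Convert 8 bytes of big-endian IBM hexadecimal floating point to a float.

    Hypothetical helper for illustration; not part of the xport API.
    """
    (bits,) = struct.unpack('>Q', data)
    sign = -1.0 if bits >> 63 else 1.0
    exponent = (bits >> 56) & 0x7F        # power of 16, biased by 64
    fraction = bits & 0x00FFFFFFFFFFFFFF  # 56-bit fraction, read as 0.ffff...
    return sign * (fraction / 2.0 ** 56) * 16.0 ** (exponent - 64)


print(ibm_to_float(b'\x41\x10\x00\x00\x00\x00\x00\x00'))  # 1.0
print(ibm_to_float(b'\xc2\x76\xa0\x00\x00\x00\x00\x00'))  # -118.625
```

Because the exponent is a power of 16 rather than 2, the fraction may carry up to three leading zero bits, which is why the format cannot simply be reinterpreted as IEEE 754.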

There was an update to the XPT specification for SAS v8 and above. This module has not yet been updated to work with the new version. However, if you’re using SAS v8+, you’re probably not using XPT format. The changes to the format appear to be trivial changes to the metadata, but this module’s current error-checking will raise a ParseError. If you’d like an update for v8, please let me know by submitting an issue.

Reading XPT

This module mimics the csv module of the standard library, providing Reader and DictReader classes. Note that xport.Reader is capitalized, unlike csv.reader.

import xport

with open('example.xpt', 'rb') as f:
    for row in xport.Reader(f):
        print(row)

Values in the row will be either a unicode string or a float, as specified by the XPT file metadata. Note that since XPT files are in an unusual binary format, you should open them using mode 'rb'. For convenience, you can also use the NamedTupleReader to get each row as a namedtuple, with an attribute for each field in the dataset.

The Reader object has a handful of metadata attributes:

Reader.fields – Names of the fields in each observation.
Reader.version – SAS version number used to create the XPT file.
Reader.os – Operating system used to create the XPT file.
Reader.created – Date and time that the XPT file was created.
Reader.modified – Date and time that the XPT file was last modified.

The module also provides a handful of utility functions for reading the whole XPT file and loading the rows into a Python data structure. The to_rows function returns a list of rows. The to_columns function returns the data as columns rather than rows: an OrderedDict mapping each column label (a string) to the column values (a list of either strings or floats). For convenient conversion to a NumPy array or pandas DataFrame, use to_numpy and to_dataframe.

with open('example.xpt', 'rb') as f:
    columns = xport.to_columns(f)

with open('example.xpt', 'rb') as f:
    a = xport.to_numpy(f)

with open('example.xpt', 'rb') as f:
    df = xport.to_dataframe(f)
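
To picture the structure that to_columns returns, here is a rough sketch of the row-to-column transposition in plain Python. The helper rows_to_columns and the sample data are hypothetical stand-ins; the module's own implementation may differ.

```python
from collections import OrderedDict


def rows_to_columns(fields, rows):
    """Transpose an iterable of rows into an OrderedDict of column lists.

    Hypothetical helper for illustration; not part of the xport API.
    """
    columns = OrderedDict((name, []) for name in fields)
    for row in rows:
        for name, value in zip(fields, row):
            columns[name].append(value)
    return columns


columns = rows_to_columns(['letters', 'numbers'], [('a', 1.0), ('b', 2.0)])
print(list(columns.items()))
# [('letters', ['a', 'b']), ('numbers', [1.0, 2.0])]
```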

You can also use the xport module as a command-line tool to convert an XPT file to a CSV (comma-separated values) file:

$ python -m xport example.xpt > example.csv

If you want to access specific records, gather the rows in a list or use one of the itertools recipes for quickly consuming and discarding unnecessary elements.

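from collections import deque
from itertools import islice

import xport

# Collect all the records in a list for random access
with open('example.xpt', 'rb') as f:
    rows = list(xport.Reader(f))

# Select only the record at index 42 (counting from zero)
with open('example.xpt', 'rb') as f:
    row = next(islice(xport.Reader(f), 42, None))

# Select only the last 42 records
with open('example.xpt', 'rb') as f:
    rows = deque(xport.Reader(f), maxlen=42)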
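
These patterns work on any iterator, so they are easy to try without an XPT file. In the sketch below, a plain generator named fake_reader (hypothetical) stands in for xport.Reader. Note that islice counts from zero and that each pattern consumes the iterator, so a fresh one is needed for every pass.

```python
from collections import deque
from itertools import islice


def fake_reader(n):
    """Stand-in for xport.Reader: yields n one-field rows, one at a time."""
    for i in range(n):
        yield (float(i),)


# The record at index 42 (counting from zero)
row = next(islice(fake_reader(100), 42, None))
print(row)  # (42.0,)

# The last 3 records; earlier ones are discarded as the iterator advances
tail = deque(fake_reader(100), maxlen=3)
print(list(tail))  # [(97.0,), (98.0,), (99.0,)]
```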

Writing XPT

The from_columns function writes an XPT file from either a mapping of labels (strings) to columns (iterables) or an iterable of (label, column) pairs.

# a mapping of labels to columns
mapping = {'numbers': [1, 3.14, 42],
           'text': ['life', 'universe', 'everything']}

with open('answers.xpt', 'wb') as f:
    xport.from_columns(mapping, f)

Column labels are restricted to 40 characters. Column names are restricted to 8 characters and are created automatically from the column label: the first 8 characters, with non-alphabetic characters replaced by underscores, padded to 8 characters if necessary. All text strings, including column labels, are converted to bytes using the ISO-8859-1 encoding.
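
The naming rule can be approximated as follows. This is a sketch under stated assumptions, not the module's code: make_name is a hypothetical helper, truncation is assumed to happen before replacement, and the padding character is assumed to be a space.

```python
import re


def make_name(label):
    """Derive an 8-character column name from a label (illustrative only)."""
    name = re.sub(r'[^A-Za-z]', '_', label[:8])  # keep letters, underscore the rest
    return name.ljust(8)                         # assumed padding: spaces


for label in ['numbers', 'temp (C)', 'observation date']:
    print(repr(make_name(label)))

# Text is stored as ISO-8859-1 bytes, so characters outside that
# encoding would raise UnicodeEncodeError here.
encoded = u'caf\xe9'.encode('ISO-8859-1')
```

Under these assumptions, distinct labels that share their first 8 characters would map to the same name, so labels are worth keeping short and distinct.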

Unfortunately, writing XPT files cannot cleanly mimic the csv module, because we must examine all rows before writing any rows to correctly write the XPT file headers.
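
One way to see why a full pass is needed: the XPT header describes each column's type and byte length, and for text columns the length depends on the longest value. The sketch below is an assumption about what such a pass might compute, not this module's implementation; scan_columns is a hypothetical helper.

```python
def scan_columns(rows):
    """Return a (kind, width) pair per column after reading every row.

    Hypothetical helper for illustration; not part of the xport API.
    """
    rows = list(rows)  # the full pass: nothing can be written before this
    specs = []
    for column in zip(*rows):
        if all(isinstance(v, (int, float)) for v in column):
            specs.append(('numeric', 8))  # numbers occupy 8 bytes each
        else:
            width = max(len(str(v).encode('ISO-8859-1')) for v in column)
            specs.append(('text', width))
    return specs


print(scan_columns([('a', 1), ('bcd', 2.5)]))
# [('text', 3), ('numeric', 8)]
```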

The from_rows function expects an iterable of iterables, like a list of tuples. In this case, the column labels have not been specified and will automatically be assigned as ‘x0’, ‘x1’, ‘x2’, …, ‘xM’.

rows = [('a', 1), ('b', 2)]

with open('example.xpt', 'wb') as f:
    xport.from_rows(rows, f)

To specify the column labels for from_rows, each row can be a mapping (such as a dict) of the column labels to that row’s values. Each row should have the same keys. Passing in rows as namedtuples, or any instance of a tuple that has a ._fields attribute, will set the column labels to the attribute names of the first row.

rows = [{'letters': 'a', 'numbers': 1},
        {'letters': 'b', 'numbers': 2}]

with open('example.xpt', 'wb') as f:
    xport.from_rows(rows, f)
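
The labeling rules for from_rows can be summarized in a small sketch. The helper infer_labels is hypothetical and only mirrors the behavior described above; the module's actual checks may differ.

```python
from collections import namedtuple


def infer_labels(first_row):
    """Guess column labels from the first row (hypothetical helper)."""
    if hasattr(first_row, 'keys'):     # mapping such as a dict
        return list(first_row.keys())
    if hasattr(first_row, '_fields'):  # namedtuple-like
        return list(first_row._fields)
    return ['x%d' % i for i in range(len(first_row))]


Point = namedtuple('Point', ['letters', 'numbers'])
print(infer_labels(Point('a', 1)))      # ['letters', 'numbers']
print(infer_labels(('a', 1)))           # ['x0', 'x1']
print(infer_labels({'letters': 'a'}))   # ['letters']
```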

Feature requests

I’m happy to fix bugs, improve the interface, or make the module faster. Just submit an issue and I’ll take a look.

Recent changes

Switched from load/dump with mode flags to to_rows, to_columns, from_rows and from_columns.
Reader yields regular tuples, not namedtuples.

Authors

Original version by Jack Cushman, 2012. Major revision by Michael Selik, 2016.

Release history

- 3.6.1: Feb 16, 2022
- 3.6.0: Feb 3, 2022
- 3.5.1: Feb 1, 2022
- 3.5.0: Jan 1, 2022
- 3.4.0: Dec 25, 2021
- 3.3.2: Dec 25, 2021
- 3.3.1: Dec 25, 2021
- 3.3.0: Dec 25, 2021
- 3.2.1: May 26, 2020
- 3.2.0: May 21, 2020
- 3.1.6: May 9, 2020
- 3.1.5: May 4, 2020
- 3.1.4: May 4, 2020
- 3.1.3: Apr 28, 2020
- 3.1.2: Apr 24, 2020
- 3.1.1: Apr 21, 2020
- 3.1.0: Apr 21, 2020
- 3.0.0: Apr 21, 2020
- 2.0.2: Jan 6, 2017 (this version)
- 2.0.1: Oct 26, 2016
- 2.0.0: Oct 26, 2016
- 1.1.3: Oct 22, 2016
- 1.1.2: Oct 22, 2016
- 1.1.1: Oct 22, 2016
- 1.1.0: Oct 22, 2016
- 1.0.0: Oct 22, 2016
- 0.6.4: Aug 25, 2016
- 0.6.3: Aug 25, 2016
- 0.6.2: Aug 25, 2016
- 0.6.1: Aug 25, 2016
- 0.3.4: Jul 5, 2016
- 0.3.3: Jul 5, 2016
- 0.3.2: Mar 23, 2016
- 0.3.1: Mar 23, 2016
- 0.3.0: Mar 23, 2016
- 0.2.0: Mar 23, 2016
- 0.1.0: May 2, 2012

Download files

Source Distribution

xport-2.0.2.tar.gz (12.5 kB)

Uploaded Jan 6, 2017 Source

Built Distribution

xport-2.0.2-py2.py3-none-any.whl (14.3 kB)

Uploaded Jan 6, 2017 Python 2, Python 3

File details

Details for the file xport-2.0.2.tar.gz.

File metadata

Download URL: xport-2.0.2.tar.gz
Upload date: Jan 6, 2017
Size: 12.5 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for xport-2.0.2.tar.gz
Algorithm	Hash digest
SHA256	`0afe035727d4464d748334dbc57c760a82d1493b8c117493738d8b4c2f4ddc65`
MD5	`0505fac4785fd71e70bd205f1864709c`
BLAKE2b-256	`8c70e5d865841041de846a48b9609630852aa1b4e59a7681f3d9c27e198622b0`

File details

Details for the file xport-2.0.2-py2.py3-none-any.whl.

File metadata

Download URL: xport-2.0.2-py2.py3-none-any.whl
Upload date: Jan 6, 2017
Size: 14.3 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No

File hashes

Hashes for xport-2.0.2-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`fccaba947214d6b0036ffbfcc444015963f731019ab664307bc98a440e04c7a0`
MD5	`a5ad580a92be7158bf2ed9d35b70f4d6`
BLAKE2b-256	`6aa0ade37253fe2c7a457a9a8703e93e4b1517dd53315e3941416ee4f7463f08`
