Lazy and eager reading of Stata and SAS files into Polars

Project description

polars_io

Lazily read Stata (.dta), SAS (.sas7bdat, .xpt), fixed-width (.txt, .dat, etc.), and newline-delimited (.txt) files in Polars.

Installation

pip install polars_io
# Or:
uv add polars_io

Usage

import polars as pl
import polars_io as pio

# Lazily load a SAS file.
lf = pio.scan_sas7bdat("huge_SAS_file.sas7bdat")

# Get its schema.
lf.collect_schema()

# Take a look at the first few rows.
lf.head().collect()

# Projection and predicate pushdown work!
(
    lf
    .filter(pl.col("birth_year").is_between(2000, 2010))
    .select(pl.col("usage").mean())
    .collect()
)

# Load fixed-width files.
col_locations = {"year": (10, 14), "population": (14, 20)}
pio.scan_fwf("populations.txt", col_locations)

# Eager versions of all functions are also available.
pio.read_dta("mortality_rates.dta")

See the documentation for more info.
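
The `col_locations` mapping in the usage example pairs each column name with a `(start, end)` tuple. As a rough illustration of what such a mapping describes, the sketch below slices a single line of text in plain Python. It assumes the tuples are zero-based, half-open character offsets; that interpretation is an assumption and not confirmed by this README, so check the documentation for the convention `scan_fwf` actually uses. The helper `slice_fixed_width` and the sample line are hypothetical and not part of polars_io.

```python
def slice_fixed_width(
    line: str, col_locations: dict[str, tuple[int, int]]
) -> dict[str, str]:
    """Cut one fixed-width line into named fields.

    Assumes (start, end) are zero-based, half-open character offsets,
    which may differ from the convention polars_io uses.
    """
    return {
        name: line[start:end].strip()  # strip padding around each field
        for name, (start, end) in col_locations.items()
    }


col_locations = {"year": (10, 14), "population": (14, 20)}
# Hypothetical record: 10 characters of name, 4 of year, 6 of population.
line = "Utopia    2020 30720"
fields = slice_fixed_width(line, col_locations)
```

Under that assumption, the first ten characters of the line are ignored because no column covers them, and each field comes back as a stripped string that still has to be cast to a numeric type.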

Details

The Stata and SAS implementations make use of the readstat C library via the Python bindings provided by pyreadstat. For numeric types, reading uses zero-copy conversions from numpy -> pyarrow -> polars and should be faster and have lower memory overhead than reading the data into pandas and then calling pl.from_pandas (benchmarks welcome).

Contributing

PRs adding support for reading other formats are very welcome! (E.g., .Rdata, Stata .dct, SPSS files, etc.)

Known Issues

This package fails to read some files with non-UTF-8 metadata (e.g., column labels, notes on .dta files). This is a known issue with upstream packages that is being worked on (see Roche/pyreadstat#298 and WizardMac/ReadStat#344).

Download files

Source Distribution

polars_io-0.5.0.tar.gz (64.6 kB view details)

Uploaded Source

Built Distribution

polars_io-0.5.0-py3-none-any.whl (9.0 kB view details)

Uploaded Python 3

File details

Details for the file polars_io-0.5.0.tar.gz.

File metadata

  • Download URL: polars_io-0.5.0.tar.gz
  • Size: 64.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.8.4

File hashes

Hashes for polars_io-0.5.0.tar.gz
Algorithm Hash digest
SHA256 120cf441a8caa48d2ea1b76101f712db9bbd6298fe8b36d0f028f4d801c9c0a5
MD5 3d2779047727d6c354e42bf8b8909f5b
BLAKE2b-256 63dc3f3cf72df75631acc9b0ffe068359817fe6db913addff326db19a2951506

File details

Details for the file polars_io-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: polars_io-0.5.0-py3-none-any.whl
  • Size: 9.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.8.4

File hashes

Hashes for polars_io-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 aee49129f60449593b200c76e9857943def878115b323359d1df2c69738a5fcc
MD5 d2d27562fab2052e171bfb988aec9284
BLAKE2b-256 5bcc13c95ad3fc5c55e79a87d71af50ecf1d0cd30047035d9534d3ddefe62bbe