Lazy and eager reading of Stata and SAS files into Polars

polars_io

Lazily read Stata (.dta), SAS (.sas7bdat, .xpt), fixed-width (.txt, .dat, etc.), and newline-delimited (.txt) files into Polars.

Installation

pip install polars_io
# Or:
uv add polars_io

Usage

import polars as pl
import polars_io as pio

# Lazily load a SAS file.
lf = pio.scan_sas7bdat("huge_SAS_file.sas7bdat")

# Get its schema.
lf.collect_schema()

# Take a look at the first few rows.
lf.head().collect()

# Projection and predicate pushdown work!
(
    lf
    .filter(pl.col("birth_year").is_between(2000, 2010))
    .select(pl.col("usage").mean())
    .collect()
)

# Load fixed-width files.
col_locations = {"year": (10, 14), "population": (14, 20)}
pio.scan_fwf("populations.txt", col_locations)

# Eager versions of all functions are also available.
pio.read_dta("mortality_rates.dta")

See the documentation for more info.
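
As a rough illustration of what a column-location mapping like `col_locations` describes, here is a standard-library sketch that slices each line at fixed character offsets. It assumes the `(start, end)` pairs are 0-based and end-exclusive, like Python slices; check the documentation for the convention `scan_fwf` actually uses. The helper `parse_fixed_width` and the sample records are hypothetical and are not part of polars_io.

```python
def parse_fixed_width(lines, col_locations):
    """Slice every line at the given (start, end) character offsets.

    Assumes 0-based, end-exclusive offsets (an assumption, not a
    statement about how polars_io interprets them).
    """
    rows = []
    for line in lines:
        row = {
            name: line[start:end].strip()
            for name, (start, end) in col_locations.items()
        }
        rows.append(row)
    return rows


# Build made-up sample lines: 10 characters of name, 4 of year, 6 of population.
records = [("Town A", 2010, 30720), ("Town B", 2011, 28104)]
lines = [f"{name:<10}{year:>4}{pop:>6}" for name, year, pop in records]

col_locations = {"year": (10, 14), "population": (14, 20)}
rows = parse_fixed_width(lines, col_locations)

# Fields come back as strings; convert them explicitly.
typed_rows = [{key: int(value) for key, value in row.items()} for row in rows]
print(typed_rows)
```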

Details

The Stata and SAS implementations make use of the readstat C library via the Python bindings provided by pyreadstat. For numeric types, reading uses zero-copy conversions from numpy -> pyarrow -> polars and should be faster and have lower memory overhead than reading the data into pandas and then calling pl.from_pandas (benchmarks welcome).

Contributing

PRs adding support for reading other formats are very welcome! (E.g., .Rdata, Stata .dct, SPSS files, etc.)

Known Issues

This package fails to read some files with non-UTF-8 metadata (e.g., column labels or notes in .dta files). This is a known issue in the upstream packages and is being worked on (see Roche/pyreadstat#298 and WizardMac/ReadStat#344).

Download files

Source Distribution

polars_io-0.4.2.tar.gz (63.6 kB)

Uploaded Source

Built Distribution

polars_io-0.4.2-py3-none-any.whl (8.1 kB)

Uploaded Python 3

File details

Details for the file polars_io-0.4.2.tar.gz.

File metadata

  • Download URL: polars_io-0.4.2.tar.gz
  • Size: 63.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.8.4

File hashes

Hashes for polars_io-0.4.2.tar.gz
Algorithm Hash digest
SHA256 acefc03130f6bf57c6e7f5c20682c4a3722ce81578951486b336c5d542f444a1
MD5 eaac043e74ca822776af2cbb56573e61
BLAKE2b-256 1f2b805149978a7ac9e3260f68ec88dd7c3577a77fe49fc7c17b3d90e14e00c9

File details

Details for the file polars_io-0.4.2-py3-none-any.whl.

File metadata

  • Download URL: polars_io-0.4.2-py3-none-any.whl
  • Size: 8.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.8.4

File hashes

Hashes for polars_io-0.4.2-py3-none-any.whl
Algorithm Hash digest
SHA256 30d5decaed5bc2d80d6e1b1e4c3e56bf8192f1b66de38f1dea46fde237010039
MD5 e901a2506e1aba157b31c525e43e7139
BLAKE2b-256 a41dabe6858ee01d7835d35d9d99eb83b850c567bf81181570ba31052fc1cc1f
