
Lazy and eager reading of Stata and SAS files into Polars


polars_io

Lazily read Stata (.dta), SAS (.sas7bdat, .xpt), and fixed-width (.txt, .dat, etc.) files in polars.

Installation

pip install polars_io # or uv add polars_io

Usage

import polars as pl
import polars_io as pio

# lazily load a sas file
lf = pio.scan_sas7bdat("huge_SAS_file.sas7bdat")

# get its schema
lf.collect_schema()

# take a look at the first few rows
lf.head().collect()

# projection and predicate pushdown both work!
(
    lf
    .filter(pl.col("birth_year").is_between(2000, 2010))
    .select(pl.col("usage").mean())
    .collect()
)

# load fixed-width files
col_locations = { "year" : (10, 14), "population" : (14, 20) }
pio.scan_fwf("populations.txt", col_locations)

# eager versions of all functions are also available
pio.read_dta("mortality_rates.dta")

See the documentation for more info.
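To make the col_locations argument above more concrete, here is a minimal pure-Python sketch of how such a mapping can be applied to one line of a fixed-width file. It is not the package's implementation: slice_fixed_width is a hypothetical helper, the sample line is made up, and the sketch assumes each tuple is a half-open (start, end) range of 0-based character offsets, which the README does not state explicitly.

```python
# Hypothetical helper, NOT part of polars_io: it only illustrates how a
# name -> (start, end) mapping can select character ranges from a line.
# Assumption: offsets are 0-based and the end offset is exclusive.
def slice_fixed_width(line, col_locations):
    return {
        name: line[start:end].strip()
        for name, (start, end) in col_locations.items()
    }


col_locations = {"year": (10, 14), "population": (14, 20)}

# Made-up sample line: 10 characters of other data, a 4-character year,
# then a right-aligned 6-character population field.
line = "Springfld".ljust(10) + "2010" + "51234".rjust(6)

row = slice_fixed_width(line, col_locations)
print(row)
```

Under that assumption, the two example columns are adjacent: "year" ends at offset 14, where "population" begins.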

Details

The Stata and SAS implementations use the ReadStat C library via the Python bindings provided by pyreadstat. For numeric types, reading uses zero-copy conversions from numpy -> pyarrow -> polars, so it should be faster and use less memory than reading the data into pandas and then calling pl.from_pandas (benchmarks welcome).

Contributing

PRs adding support for reading other formats (e.g. .Rdata or Stata .dct) are very welcome!

Issues

This package fails to read some files with non-UTF-8 metadata (e.g. column labels or notes in .dta files). This is a known issue in the upstream packages that is being worked on (see Roche/pyreadstat#298 and WizardMac/ReadStat#344).


