
A package for working with IPUMS microdata using Polars.

Project description

polars_ipums

A package for working with IPUMS microdata in Python. Used in-house at Opportunity Insights.

Example

# convert an IPUMS microdata export to a Hive-partitioned Parquet dataset

from pathlib import Path

import polars as pl
from polars_ipums import create_parquet_dataset

# use Path objects (with "~" expanded) so the "/" operator works further down
input_path = Path("~/Downloads/ipums_export").expanduser()
output_path = Path("~/Desktop/parquet_ipums").expanduser()

labels = {
    # use the default IPUMS labels for the sex column
    "sex": {},
    # map the IPUMS labels of the race/Hispanic origin column to shorter custom labels
    "rachsing": {
        "White": "White",
        "Black/African American": "Black",
        "American Indian/Alaska Native": "AIAN",
        "Asian/Pacific Islander": "Asian",
        "Hispanic/Latino": "Hispanic",
    },
}

# give a few columns more human-readable names
renames = {
    "rachsing": "race",
    "countyfip": "county",
    "ftotinc": "family_income",
    "hhincome": "household_income",
}

# custom education column built from the detailed education codes (educd);
# is_between includes both bounds by default, and codes outside every range
# come out null because the chain has no .otherwise()
educd = pl.col("educd")
my_education = (
    pl.when(educd.le(61))
    .then(0)
    .when(educd.is_between(62, 64))
    .then(1)
    .when(educd.is_between(65, 100))
    .then(2)
    .when(educd.is_between(101, 116))
    .then(3)
    .alias("my_education")
)

create_parquet_dataset(
    input_path,
    output_path,
    labels=labels,
    partition_by=["year"],
    renames=renames,
    additional_columns=[my_education],
    override_output=True,
    verbose=True,
)

# load a few rows back into memory; hive_partitioning=True rebuilds the
# "year" column from the partition directory names
ipums_microdata = (
    pl.scan_parquet(output_path / "**/*.parquet", hive_partitioning=True)
    .head()
    .collect()
)
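Because the dataset is partitioned by `year` in the Hive convention, each distinct year should end up in its own `year=<value>` directory, and `hive_partitioning=True` tells Polars to turn those directory names back into a `year` column. The stdlib-only sketch below illustrates that idea; it is not the implementation used by `polars_ipums` or Polars, and the file names (`0.parquet`) and the helpers `hive_partition_values` and `prune` are hypothetical, since the exact names written by `create_parquet_dataset` are not documented here.

```python
from pathlib import PurePosixPath


def hive_partition_values(path: str) -> dict[str, str]:
    """Return the key=value pairs encoded in the directory names of a Hive-style path."""
    values: dict[str, str] = {}
    # Only directory components carry partition values; the last part is the file name.
    for part in PurePosixPath(path).parts[:-1]:
        key, sep, value = part.partition("=")
        if sep and key:
            values[key] = value
    return values


def prune(paths: list[str], key: str, wanted: str) -> list[str]:
    """Keep only the files whose partition directory matches key=wanted."""
    return [p for p in paths if hive_partition_values(p).get(key) == wanted]


# Hypothetical layout: the real file names inside each partition may differ.
files = [
    "parquet_ipums/year=2019/0.parquet",
    "parquet_ipums/year=2020/0.parquet",
]

for f in files:
    print(f, hive_partition_values(f))

print(prune(files, "year", "2020"))
```

Polars goes further than this sketch: by default it tries to infer a data type for the recovered column (so `year` would typically come back as an integer rather than a string), and filtering the scan on `pl.col("year")` can let it skip non-matching partitions, much as `prune` does here.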


Download files


Source Distribution

polars_ipums-0.0.1.tar.gz (5.6 kB)

Uploaded Source

Built Distribution


polars_ipums-0.0.1-py3-none-any.whl (6.7 kB)

Uploaded Python 3

File details

Details for the file polars_ipums-0.0.1.tar.gz.

File metadata

  • Download URL: polars_ipums-0.0.1.tar.gz
  • Size: 5.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.5.23

File hashes

Hashes for polars_ipums-0.0.1.tar.gz
Algorithm Hash digest
SHA256 023364e25929a440fa88eb91855000b44bf5ae9e8a80c893c94f6dd3a0e892d2
MD5 cb622617f4b3de3502627e6c2634f29a
BLAKE2b-256 0dfbcff977372cc836f69cb20de96a0e2cf5a6e46a05ce7da3e96900bdc01704


File details

Details for the file polars_ipums-0.0.1-py3-none-any.whl.

File hashes

Hashes for polars_ipums-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 78c3dd52d36bfa91be37771fb29103133e2a386a52d31e397ce1eecfa8a0c838
MD5 010ec8b9a6e84321f5af5b3c0339531f
BLAKE2b-256 42f7da44da173749206e205e19fb1bb01717f5ad67afca8e1ca76aa09a220ab7

