Lightweight Python package for intuitively exploring and acquiring U.S. Census data with spatial integration

pycen

Lightweight Python package for exploring and acquiring U.S. Census data with intuitive spatial integration.

flowchart TD
    A[Need Census data?]

    A --> B & C

    subgraph PYCEN["<i>pycen</i>"]
        direction TB
        B[<b>`explore`</b><br/>Intuitive metadata<br/>keyword search]
        C[<b>`acquire`</b><br/>Data + boundaries<br/>in one call]

        C --> D
        C --> E

        D[<b>`quick_check`</b><br/>Quality validation]
        E[<b>`quick_viz`</b><br/>Instant maps]
    end

    B --> F
    D & E --> F[Domain analysis]

    style A fill:#94a3b8,stroke:#334155,stroke-width:2px,color:#000
    style B fill:#3b82f6,stroke:#1e40af,stroke-width:2px,color:#fff
    style C fill:#3b82f6,stroke:#1e40af,stroke-width:2px,color:#fff
    style D fill:#22c55e,stroke:#15803d,stroke-width:2px,color:#fff
    style E fill:#22c55e,stroke:#15803d,stroke-width:2px,color:#fff
    style F fill:#94a3b8,stroke:#334155,stroke-width:2px,color:#000
    style PYCEN fill:#1e293b,stroke:#64748b,stroke-width:2px,color:#fff

overview

pycen makes exploring and acquiring U.S. Census data accessible and intuitive for spatial workflows. The explore module presents Census API metadata as browsable, topic-organized, interactive nested tables, with customizable themes that highlight curated variable recipes. It also supports natural-language keyword search for efficient variable discovery. The acquire module streamlines data processing: one function call returns both data and boundaries as a GeoDataFrame, which can be passed to built-in helpers for quality checks and rapid visualization; tabular-only and boundaries-only downloads are available as separate calls. pycen pulls live data products and caches them locally to keep iterations fast, smooth, and reproducible. Multi-year fetches enable longitudinal comparisons that track change over time.
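The caching idea mentioned above can be illustrated with a minimal sketch. This is not pycen's implementation: `CACHE_DIR`, `cache_key`, `cached_fetch`, and `fake_fetch` are hypothetical names, and the fetch is a stand-in rather than a live request. The point is only that identical request parameters map to the same cache file, so repeated runs reuse stored results.

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Hypothetical cache location for this sketch; not where pycen stores anything.
CACHE_DIR = Path(tempfile.mkdtemp(prefix="census_cache_"))


def cache_key(**params):
    """Stable key: the same parameters always hash to the same file name."""
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def cached_fetch(fetch, **params):
    """Return cached rows if present; otherwise call `fetch` and store the result."""
    path = CACHE_DIR / f"{cache_key(**params)}.json"
    if path.exists():
        return json.loads(path.read_text())
    rows = fetch(**params)
    path.write_text(json.dumps(rows))
    return rows


calls = []


def fake_fetch(**params):
    # Stand-in for a live API request; records each call so reuse is visible.
    calls.append(params)
    return [{"NAME": "placeholder", "value": 1}]


# Same parameters in a different order hit the same cache entry.
first = cached_fetch(fake_fetch, dataset="acs5", year=2023, geography="county")
second = cached_fetch(fake_fetch, year=2023, geography="county", dataset="acs5")
```

Sorting the keys before hashing is what makes the second call a cache hit even though the arguments are passed in a different order.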

sample use

basic workflow

import pycen
from pycen import explore, acquire

# 1. Explore variables
# `browse` and `search` return interactive tables
# `lookup` returns details
explore.browse(year=2023, dataset="acs5").show()
explore.search("vehicle", year=2023, dataset="acs5").show()
explore.lookup("B08201_002E", year=2021, dataset="acs5")

# 2. Acquire data
## continental US income gini map
gdf = acquire.get_censhp(
    variables={"B19083_001E":"gini_index"},
    geography="place",               # if no state/county, gets nationwide
    dataset="acs5",
    year=2023,
)
acquire.quick_check(gdf)             # returns N/A summary
acquire.quick_viz(gdf, "gini_index") # returns map + distribution histogram
acquire.quick_viz(gdf, "gini_index", palette="viridis") # optional customizable palette
acquire.quick_viz(gdf, "gini_index", save_path="gini_index.png") # optional save

## finer scale
## Cook County income gini at tract level
gdf = acquire.get_censhp(
    variables={"B19083_001E":"gini_index"},
    geography="tract",
    county="Cook County",
    state="IL",
    dataset="acs5",
    year=2023,
)
acquire.quick_viz(gdf, "gini_index")

## neighborhood analyses
## Chicago super commuters
gdf = acquire.get_censhp(
    variables={"B08303_012E":"commute_over_60min", "B08303_001E":"total_commuters"},
    geography="block group",
    place="Chicago city",
    #county="Cook County",  # optional, add for clarity
    state="IL",
    dataset="acs5",
    year=2023
)
gdf["pct_super_commuters"] = gdf["commute_over_60min"] / gdf["total_commuters"] * 100
acquire.quick_viz(gdf, "pct_super_commuters")

## decennial data supports block-scale (finest)
## Cook County (Chicago area) housing vacancy rates at block level
select_var={
    "H001003": "vacant_hh",
    "H001001": "total_hh"
}
gdf = acquire.get_censhp(
    variables=select_var,
    geography="block",
    county="Cook County",
    state="IL",
    dataset="dec_pl",
    year=2010,
)
gdf['vacancy_rate'] = gdf['vacant_hh'] / gdf['total_hh'] * 100
acquire.quick_viz(gdf, "vacancy_rate")
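At block level many units have a zero denominator (for example, blocks with no housing units). With pandas columns, 0/0 yields NaN and n/0 yields inf, so derived rates may need a guard before mapping. The following is a plain-Python sketch of that guard; `safe_pct` is a hypothetical helper, not part of pycen, and the counts are placeholders rather than Census values.

```python
import math


def safe_pct(numerator, denominator):
    """Percentage that returns NaN when the denominator is zero or a value is missing."""
    if numerator is None or denominator is None or denominator == 0:
        return math.nan
    return numerator / denominator * 100


# Placeholder (vacant units, total units) pairs, not real Census values.
blocks = [(2, 10), (0, 0), (5, 20)]
rates = [safe_pct(vacant, total) for vacant, total in blocks]
```

Returning NaN keeps empty blocks out of summary statistics and color scales instead of letting infinite values distort them.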

tabular data workflow

# 3. Tabular data only
df = acquire.get_census(
    variables=["B25032_022E"],  # renter-occupied, mobile home
    geography="tract",
    state="CA",
    year=2021,
)

# 4. Single-year, multivariable tabular data for comparative analysis
import pandas as pd
import matplotlib.pyplot as plt
from pycen import acquire

vars_race = {
    'B03002_001E': 'total',
    'B03002_003E': 'nh_white',
    'B03002_004E': 'nh_black',
    'B03002_006E': 'nh_asian',
    'B03002_005E': 'nh_aian',
    'B03002_007E': 'nh_nhpi',
    'B03002_008E': 'nh_other',
    'B03002_009E': 'nh_two_or_more',
    'B03002_012E': 'hispanic',
}

df_race = acquire.get_census(
    variables=vars_race,
    geography='county',
    state='CA',
    county='Alameda',
    dataset='acs5',
    year=2023,
)

row = df_race.iloc[0]
other = row['nh_aian'] + row['nh_nhpi'] + row['nh_other'] + row['nh_two_or_more']
vals = {
    'White (NH)': row['nh_white'],
    'Black (NH)': row['nh_black'],
    'Asian (NH)': row['nh_asian'],
    'Other (NH)': other,
    'Hispanic (any race)': row['hispanic'],
}

pct = {k: v / row['total'] * 100 for k, v in vals.items()}

plt.figure(figsize=(7, 4))
plt.bar(pct.keys(), pct.values(), color=['#4c78a8', '#f58518', '#54a24b', '#b279a2', '#e45756'])
plt.ylabel('Population %')
plt.title('Alameda County, CA - Race/Ethnicity (ACS 2023)')
plt.xticks(rotation=25, ha='right')
plt.tight_layout()
plt.show()

# 5. Multi-year tabular data for trend analysis
# comparative tracking of remote work surge (2019–2023)
from pycen import acquire
import matplotlib.pyplot as plt

# explore.search("work from home", year=2023, dataset="acs5").show()
# B08101_049E = worked from home
df_long = acquire.get_census(
    variables={'B08101_049E': 'wfh_workers', 'B08101_001E': 'total_workers'},
    geography='county',
    state='CA',
    years=[2019, 2020, 2021, 2022, 2023],
    merge='long'
)

df_long['wfh_pct'] = (df_long['wfh_workers'] / df_long['total_workers']) * 100
bay_area = df_long[df_long['NAME'].str.contains('San Francisco|Alameda|Santa Clara|Contra Costa|San Mateo')]

for county in bay_area['NAME'].unique():
    county_data = bay_area[bay_area['NAME'] == county]
    plt.plot(county_data['year'], county_data['wfh_pct'], marker='o', label=county)

plt.title('Bay Area WFH 2019-2023')
plt.ylabel('Work From Home (%)')
plt.xlabel('Year')
plt.xticks(sorted(bay_area['year'].unique()))
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
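The trend example relies on a long layout: one row per county per year, with a `year` column. As a rough illustration of that shape (an assumption based on the columns used above, not a description of pycen internals), the sketch below stacks per-year tables in plain Python and computes a percentage-point change. County names and counts are placeholders.

```python
# Placeholder per-year tables keyed by year; numbers are illustrative only.
by_year = {
    2019: [{"NAME": "County A", "wfh_workers": 5, "total_workers": 100}],
    2023: [{"NAME": "County A", "wfh_workers": 15, "total_workers": 100}],
}

# Long layout: stack the yearly tables and tag each row with its year.
long_rows = [
    {**row, "year": year}
    for year, rows in sorted(by_year.items())
    for row in rows
]

# Share of workers per (county, year), then the change from first to last year.
pct = {
    (r["NAME"], r["year"]): r["wfh_workers"] / r["total_workers"] * 100
    for r in long_rows
}
change = pct[("County A", 2023)] - pct[("County A", 2019)]
```

Because every row carries its own year, filtering, grouping, and plotting by year need no reshaping.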

core functions

Explore

  • explore.search(query, year, dataset) - supports exact term match and fuzzy keyword search
  • explore.browse(year, dataset) - view all variables via interactive tree table with theme variable highlights
  • explore.lookup(code, year, dataset) - inspect variable details
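To give a feel for "exact term match and fuzzy keyword search", here is a small standard-library sketch. It is not how `explore.search` is implemented; `CATALOG` and `search` are hypothetical, and the labels are paraphrased from comments in this README rather than official Census labels.

```python
import difflib

# Toy catalog: codes appear in this README; labels are paraphrased placeholders.
CATALOG = {
    "B19083_001E": "gini index",
    "B08303_012E": "commute over 60 min",
    "B08101_049E": "worked from home",
    "B25032_022E": "renter-occupied, mobile home",
}


def search(query, catalog=CATALOG, cutoff=0.75):
    """Return codes whose label contains the query, else codes with a close word match."""
    q = query.lower().strip()
    exact = [code for code, label in catalog.items() if q in label.lower()]
    if exact:
        return exact
    fuzzy = []
    for code, label in catalog.items():
        # Split the label into words so a misspelled term can match one of them.
        words = label.lower().replace(",", " ").replace("-", " ").split()
        if difflib.get_close_matches(q, words, n=1, cutoff=cutoff):
            fuzzy.append(code)
    return fuzzy


exact_hit = search("worked from home")
typo_hit = search("comute")  # misspelling falls through to the fuzzy pass
```

Trying the exact pass first keeps precise queries precise; the fuzzy pass only runs when nothing matches literally.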

Acquire

  • acquire.get_censhp(...) - data + boundaries --> GeoDataFrame
  • acquire.get_census(...) - data only --> DataFrame
  • acquire.get_boundaries(...) - boundaries only --> shp/gpkg
  • acquire.quick_check(gdf) - N/A values summary
  • acquire.quick_viz(gdf, column, palette, save_path) - exploratory map + distribution histogram for select variable
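For intuition about what an N/A summary reports, the sketch below counts missing values per column in plain Python. It is a hypothetical stand-in, not `quick_check` itself. It also treats -666666666 as missing, since the Census API uses large negative codes of this kind for some unavailable ACS estimates; whether pycen handles those codes is an assumption here. The rows are placeholders.

```python
import math

# Large negative code the Census API uses for some unavailable ACS estimates.
SENTINELS = {-666666666}


def is_missing(value):
    """True for None, NaN, or a sentinel code."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return value in SENTINELS


def missing_summary(rows, columns):
    """Count missing values per column across a list of row dicts."""
    return {col: sum(is_missing(row.get(col)) for row in rows) for col in columns}


# Placeholder rows, not real Census values.
rows = [
    {"gini_index": 0.41, "total_hh": 120},
    {"gini_index": -666666666, "total_hh": None},
    {"gini_index": float("nan"), "total_hh": 80},
]
summary = missing_summary(rows, ["gini_index", "total_hh"])
```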

Info

  • pycen.get_product() - list datasets and years
  • pycen.get_geography() - list geography levels by dataset

Geo Helpers

from pycen import geography
geography.search('Oakland', state='CA') # most powerful, return all related info

# state and county lookup
geography.state('CA') # can also search by 'California' or fips code '06'
geography.county('Alameda', state='CA')

# list geographies
geography.list_places('CA', query='Oakland') # minimal search
geography.list_cbsa(query='new york', year=2023, limit=5) # specify year and return limit if multi-match
geography.list_csa(query='detroit', year=2023, limit=5) # look up CSA name
geography.list_counties('CA')
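The state lookup above accepts an abbreviation, a full name, or a FIPS code. A minimal sketch of that kind of normalization follows; `STATES` and `resolve_state` are hypothetical and cover only the two states used in this README, so this is an illustration of the idea rather than pycen's lookup.

```python
# Small lookup limited to the two states used in this README.
STATES = [
    {"abbr": "CA", "name": "California", "fips": "06"},
    {"abbr": "IL", "name": "Illinois", "fips": "17"},
]


def resolve_state(value):
    """Accept an abbreviation, full name, or FIPS code and return the matching record."""
    key = str(value).strip().lower()
    for state in STATES:
        candidates = {state["abbr"].lower(), state["name"].lower(), state["fips"]}
        # Also accept a FIPS code given without its leading zero, e.g. 6 for "06".
        if key in candidates or key.zfill(2) == state["fips"]:
            return state
    raise KeyError(f"Unknown state: {value!r}")


by_abbr = resolve_state("CA")
by_name = resolve_state("california")
by_fips = resolve_state("06")
```

FIPS codes are kept as zero-padded strings because the leading zero is significant when building geography identifiers.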

Themes

  • pycen.set_theme(name_or_dict) - set active theme name or register a custom theme (dict)
  • pycen.get_theme_settings() - get active theme name (defaults to a general curation of useful variables)
  • pycen.explore.get_theme(name=None) - get theme details (dict); defaults to active theme
  • pycen.list_themes() - list available theme names (includes session custom themes)

Notes

  • Datasets: acs5, acs1, dec_pl, dec_sf1
  • Spatial features require: geopandas, pygris
  • Geographies are resolved per dataset/year from Census geography metadata (live/cache/static)
  • Optional: rich enables prettier terminal tables for explore.search().show()
  • geography.search() uses a bundled 2020 snapshot by default; if a different vintage is requested, it attempts a live code-list fetch and falls back to 2020 if unavailable
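The live/cache/static resolution and the fallback to the 2020 snapshot described in these notes follow a common pattern: try sources in order and use the first that succeeds. The sketch below shows that pattern only; `resolve_with_fallback`, `live`, and `snapshot_2020` are hypothetical stand-ins, and the returned data is a placeholder.

```python
def resolve_with_fallback(year, sources):
    """Try each (name, loader) pair in order; return the first result and its source name."""
    errors = []
    for name, loader in sources:
        try:
            return loader(year), name
        except Exception as exc:  # any failure moves on to the next source
            errors.append(f"{name}: {exc}")
    raise LookupError("; ".join(errors))


def live(year):
    # Stand-in for a network request that is currently unavailable.
    raise ConnectionError("offline")


def snapshot_2020(year):
    # Stand-in for a bundled static table; ignores the requested year.
    return {"vintage": 2020, "places": ["Oakland city"]}


result, source = resolve_with_fallback(
    2023, [("live", live), ("static", snapshot_2020)]
)
```

Returning the source name alongside the result lets a caller see when a request for one vintage was answered from an older snapshot.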

API key for higher rate limits:

pycen.set_api_key("YOUR_KEY")  # get key at api.census.gov/data/key_signup.html
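For context, the Census Data API takes the key as a `key` query parameter on each request. The sketch below assembles such a URL with the standard library to show where the key ends up. `build_census_url` is a hypothetical helper, no request is sent, and how pycen builds its own requests is not shown in this README.

```python
from urllib.parse import urlencode


def build_census_url(year, dataset_path, variables, for_clause, in_clause=None, key=None):
    """Assemble a Census Data API query URL; `key` is appended only when provided."""
    params = {"get": ",".join(["NAME", *variables]), "for": for_clause}
    if in_clause:
        params["in"] = in_clause
    if key:
        params["key"] = key
    # Keep commas, colons, and asterisks readable instead of percent-encoding them.
    query = urlencode(params, safe=",:*")
    return f"https://api.census.gov/data/{year}/{dataset_path}?{query}"


url = build_census_url(
    2023, "acs/acs5", ["B19083_001E"], "county:*", in_clause="state:06", key="YOUR_KEY"
)
url_no_key = build_census_url(2023, "acs/acs5", ["B19083_001E"], "county:*")
```

Keeping the key out of source files (for example, reading it from an environment variable before calling `pycen.set_api_key`) avoids committing it by accident.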

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pycen-0.1.0a3.tar.gz (607.5 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pycen-0.1.0a3-py3-none-any.whl (610.4 kB)

Uploaded Python 3
