
Project description

pycen

Lightweight Python package for exploring and acquiring U.S. Census data with intuitive spatial integration.

flowchart TD
    A[Need Census data?]

    A --> B & C

    subgraph PYCEN["<i>pycen</i>"]
        direction TB
        B[<b>`explore`</b><br/>Intuitive metadata<br/>keyword search]
        C[<b>`acquire`</b><br/>Data + boundaries<br/>in one call]

        C --> D
        C --> E

        D[<b>`quick_check`</b><br/>Quality validation]
        E[<b>`quick_viz`</b><br/>Instant maps]
    end

    B --> F
    D & E --> F[Domain analysis]

    style A fill:#94a3b8,stroke:#334155,stroke-width:2px,color:#000
    style B fill:#3b82f6,stroke:#1e40af,stroke-width:2px,color:#fff
    style C fill:#3b82f6,stroke:#1e40af,stroke-width:2px,color:#fff
    style D fill:#22c55e,stroke:#15803d,stroke-width:2px,color:#fff
    style E fill:#22c55e,stroke:#15803d,stroke-width:2px,color:#fff
    style F fill:#94a3b8,stroke:#334155,stroke-width:2px,color:#000
    style PYCEN fill:#1e293b,stroke:#64748b,stroke-width:2px,color:#fff

overview

pycen makes exploring and acquiring U.S. Census data accessible and intuitive for spatial workflows. The explore module presents browsable Census API metadata as topic-organized, interactive nested tables, with customizable themes that highlight curated variable recipes; it also supports natural-language keyword search for efficient variable discovery. The acquire module streamlines data processing: one function call returns both data and boundaries as a GeoDataFrame, with built-in quality checks and rapid visualizations; simple tabular or boundaries-only downloads are separately callable. pycen pulls live data products and keeps efficient local caches so iterations stay fast, smooth, and reproducible. A multi-year fetch function enables longitudinal comparisons that track change over time.

sample use

basic workflow

import pycen
from pycen import explore, acquire

# 1. Explore variables
# `browse` and `search` return interactive tables
# `lookup` returns details
explore.browse(year=2023, dataset="acs5").show()
explore.search("vehicle", year=2023, dataset="acs5").show()
explore.lookup("B08201_002E", year=2021, dataset="acs5")

# 2. Acquire data
## continental US income gini map
gdf = acquire.get_censhp(
    variables={"B19083_001E":"gini_index"},
    geography="place",               # if no state/county, gets nationwide
    dataset="acs5",
    year=2023,
)
acquire.quick_check(gdf)             # returns N/A summary
acquire.quick_viz(gdf, "gini_index") # returns map + distribution histogram
acquire.quick_viz(gdf, "gini_index", palette="viridis") # optional customizable palette
acquire.quick_viz(gdf, "gini_index", save_path='gini_index.png') # optional save

## finer scale
## Cook County income gini at tract level
gdf = acquire.get_censhp(
    variables={"B19083_001E":"gini_index"},
    geography="tract",
    county="Cook County",
    state="IL",
    dataset="acs5",
    year=2023,
)
acquire.quick_viz(gdf, "gini_index")

## neighborhood analyses
## Chicago super commuters
gdf = acquire.get_censhp(
    variables={"B08303_012E":"commute_over_60min", "B08303_001E":"total_commuters"},
    geography="block group",
    place="Chicago city",
    #county="Cook County",  # optional, add for clarity
    state="IL",
    dataset="acs5",
    year=2023
)
gdf["pct_super_commuters"] = gdf["commute_over_60min"] / gdf["total_commuters"] * 100
acquire.quick_viz(gdf, "pct_super_commuters")

## decennial data supports block-scale (finest)
## Chicago housing vacancy rates at block level
select_var = {
    "H001003": "vacant_hh",
    "H001001": "total_hh"
}
gdf = acquire.get_censhp(
    variables=select_var,
    geography="block",
    county="Cook County",
    state="IL",
    dataset="dec_pl",
    year=2010,
)
gdf['vacancy_rate'] = gdf['vacant_hh'] / gdf['total_hh'] * 100
acquire.quick_viz(gdf, "vacancy_rate")
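
At block scale some areas have zero total households, so the vacancy division above can produce inf or NaN. One way to guard against that, sketched with plain pandas on toy data (independent of pycen):

```python
import numpy as np
import pandas as pd

# Toy block-level counts, including zero-household blocks.
blocks = pd.DataFrame({"vacant_hh": [3, 0, 5], "total_hh": [10, 0, 0]})

# Compute the rate only where the denominator is positive; zero-household
# blocks get NaN instead of inf, which keeps histograms and maps sane.
blocks["vacancy_rate"] = np.where(
    blocks["total_hh"] > 0,
    blocks["vacant_hh"] / blocks["total_hh"] * 100,
    np.nan,
)
```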

tabular data workflow

# 3. Tabular data only
df = acquire.get_census(
    variables=["B25032_022E"],  # renter-occupied, mobile home
    geography="tract",
    state="CA",
    year=2021,
)

# 4. Single-year, multivariable tabular data for comparative analysis
import pandas as pd
import matplotlib.pyplot as plt
from pycen import acquire

vars_race = {
    'B03002_001E': 'total',
    'B03002_003E': 'nh_white',
    'B03002_004E': 'nh_black',
    'B03002_006E': 'nh_asian',
    'B03002_005E': 'nh_aian',
    'B03002_007E': 'nh_nhpi',
    'B03002_008E': 'nh_other',
    'B03002_009E': 'nh_two_or_more',
    'B03002_012E': 'hispanic',
}

df_race = acquire.get_census(
    variables=vars_race,
    geography='county',
    state='CA',
    county='Alameda',
    dataset='acs5',
    year=2023,
)

row = df_race.iloc[0]
other = row['nh_aian'] + row['nh_nhpi'] + row['nh_other'] + row['nh_two_or_more']
vals = {
    'White (NH)': row['nh_white'],
    'Black (NH)': row['nh_black'],
    'Asian (NH)': row['nh_asian'],
    'Other (NH)': other,
    'Hispanic (any race)': row['hispanic'],
}

pct = {k: v / row['total'] * 100 for k, v in vals.items()}

plt.figure(figsize=(7, 4))
plt.bar(pct.keys(), pct.values(), color=['#4c78a8', '#f58518', '#54a24b', '#b279a2', '#e45756'])
plt.ylabel('Population %')
plt.title('Alameda County, CA – Race/Ethnicity (ACS 2023)')
plt.xticks(rotation=25, ha='right')
plt.tight_layout()
plt.show()

# 5. Multi-year tabular data for trend analysis
# comparative tracking of remote work surge (2019–2023)
from pycen import acquire
import matplotlib.pyplot as plt

# explore.search("work from home", year=2023, dataset="acs5").show()
# B08101_049E = worked from home
df_long = acquire.get_census(
    variables={'B08101_049E': 'wfh_workers', 'B08101_001E': 'total_workers'},
    geography='county',
    state='CA',
    years=[2019, 2020, 2021, 2022, 2023],
    merge='long'
)

df_long['wfh_pct'] = (df_long['wfh_workers'] / df_long['total_workers']) * 100
bay_area = df_long[df_long['NAME'].str.contains('San Francisco|Alameda|Santa Clara|Contra Costa|San Mateo')]

for county in bay_area['NAME'].unique():
    county_data = bay_area[bay_area['NAME'] == county]
    plt.plot(county_data['year'], county_data['wfh_pct'], marker='o', label=county)

plt.title('Bay Area WFH 2019-2023')
plt.ylabel('Work From Home (%)')
plt.xlabel('Year')
plt.xticks(sorted(df_long['year'].unique()))
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
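
The merge='long' option above is what makes the per-year line plot possible: it presumably stacks results as one row per geography-year. As a mental model of that shape, a toy reshape with plain pandas (not actual pycen output):

```python
import pandas as pd

# Toy wide table: one WFH column per year, as separate single-year pulls
# might return. merge='long' is assumed to produce the stacked shape below.
wide = pd.DataFrame({
    "NAME": ["Alameda County", "San Mateo County"],
    "wfh_pct_2019": [6.1, 7.4],
    "wfh_pct_2023": [18.9, 21.2],
})
long = wide.melt(id_vars="NAME", var_name="year", value_name="wfh_pct")
long["year"] = long["year"].str.removeprefix("wfh_pct_").astype(int)
# One row per (NAME, year), ready for groupby or line plots.
```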

core functions

Explore

  • explore.search(query, year, dataset) - supports exact term match and fuzzy keyword search
  • explore.browse(year, dataset) - view all variables via interactive tree table with theme variable highlights
  • explore.lookup(code, year, dataset) - inspect variable details

Acquire

  • acquire.get_censhp(...) - data + boundaries --> GeoDataFrame
  • acquire.get_census(...) - data only --> DataFrame
  • acquire.get_boundaries(...) - boundaries only --> shp/gpkg
  • acquire.quick_check(gdf) - N/A values summary
  • acquire.quick_viz(gdf, column, palette, save_path) - exploratory map + distribution histogram for select variable
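
As a rough mental model of what quick_viz produces, a toy matplotlib sketch pairing a placeholder map panel with a distribution histogram (fake data and layout assumptions, not pycen's implementation):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(0.45, 0.05, 200)       # fake gini-like values
xs, ys = rng.random(200), rng.random(200)  # fake centroids

# Left panel stands in for the choropleth; right panel is the histogram.
fig, (ax_map, ax_hist) = plt.subplots(1, 2, figsize=(9, 4))
ax_map.scatter(xs, ys, c=values, cmap="viridis", s=12)
ax_map.set_title("map panel (placeholder points)")
ax_hist.hist(values, bins=20)
ax_hist.set_title("distribution")
fig.tight_layout()
```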

Info

  • pycen.get_product() - list datasets and years
  • pycen.get_geography() - list geography levels by dataset

Geo Helpers

from pycen import geography
geography.search('Oakland', state='CA') # most powerful search; returns all related info

# state and county lookup
geography.state('CA') # can also search by 'California' or fips code '06'
geography.county('Alameda', state='CA')

# list geographies
geography.list_places('CA', query='Oakland') # minimal search
geography.list_cbsa(query='new york', year=2023, limit=5) # specify year and return limit on multi-match
geography.list_csa(query='detroit', year=2023, limit=5)   # look up CSA name
geography.list_counties('CA')
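
These helpers accept names, abbreviations, or FIPS codes interchangeably. As a toy illustration of that multi-key lookup idea (two hardcoded sample rows, not pycen's data or internals):

```python
# Minimal lookup table keyed three ways, mirroring how geography.state()
# accepts 'CA', 'California', or the FIPS code '06'. Sample rows only.
STATES = [
    {"fips": "06", "abbr": "CA", "name": "California"},
    {"fips": "17", "abbr": "IL", "name": "Illinois"},
]

def lookup_state(query):
    """Return the matching state record, or None if nothing matches."""
    q = query.strip().lower()
    for row in STATES:
        if q in (row["fips"], row["abbr"].lower(), row["name"].lower()):
            return row
    return None
```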

Themes

  • pycen.set_theme(name_or_dict) - set active theme name or register a custom theme (dict)
  • pycen.get_theme_settings() - get active theme name (defaults to a general curation of useful variables)
  • pycen.explore.get_theme(name=None) - get theme details (dict); defaults to active theme
  • pycen.list_themes() - list available theme names (includes session custom themes)
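
set_theme accepts either a name or a dict, but the dict schema isn't spelled out here, so the keys below are assumptions shown only to illustrate registering a session custom theme:

```python
# Hypothetical custom theme: the key names ("name", "variables") are
# assumptions, not documented pycen schema.
housing_theme = {
    "name": "housing",
    "variables": {
        "B25003_002E": "owner_occupied",
        "B25003_003E": "renter_occupied",
    },
}

# With pycen installed, registration and activation would look like:
# pycen.set_theme(housing_theme)   # register the custom theme
# pycen.set_theme("housing")       # later, activate it by name
# pycen.list_themes()              # now includes "housing"
```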

Notes

  • Datasets: acs5, acs1, dec_pl, dec_sf1
  • Spatial features require: geopandas, pygris
  • Geographies are resolved per dataset/year from Census geography metadata (live/cache/static)
  • Optional: rich enables prettier terminal tables for explore.search().show()
  • geography.search() uses a bundled 2020 snapshot by default; if a different vintage is requested, it attempts a live code-list fetch and falls back to 2020 if unavailable

API key for higher rate limits:

pycen.set_api_key("YOUR_KEY")  # get key at api.census.gov/data/key_signup.html


Download files

Download the file for your platform.

Source Distribution

pycen-0.1.0a4.tar.gz (608.0 kB)

Uploaded Source

Built Distribution


pycen-0.1.0a4-py3-none-any.whl (611.1 kB)

Uploaded Python 3

File details

Details for the file pycen-0.1.0a4.tar.gz.

File metadata

  • Download URL: pycen-0.1.0a4.tar.gz
  • Upload date:
  • Size: 608.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for pycen-0.1.0a4.tar.gz:

  • SHA256: a10480ec541925a24029bfdf109e9a7ee6015f27d7a762f9a64a5c9268bf3893
  • MD5: e934eff5cff4824c630c7e5d30d5a01d
  • BLAKE2b-256: 5f1d9ff03070395ed81158c2b947595e1bf3edef2197b6c0eb0823eccd795631


File details

Details for the file pycen-0.1.0a4-py3-none-any.whl.

File metadata

  • Download URL: pycen-0.1.0a4-py3-none-any.whl
  • Upload date:
  • Size: 611.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for pycen-0.1.0a4-py3-none-any.whl:

  • SHA256: ff0453e9f3f5d36db86afc413b957752409b829387207a28e2f100e4b44e90dd
  • MD5: 0b65175ffef58c9582a86773a1361c0d
  • BLAKE2b-256: 63dbe904a94ac733522b67a06f3e3e56205352141b660718511242b67d3cf6ec

