
Lightweight Python package for intuitively exploring and acquiring U.S. Census data with spatial integration


pycen

Lightweight Python package for exploring and acquiring U.S. Census data with intuitive spatial integration.

flowchart TD
    A[Need Census data?]

    A --> B & C

    subgraph PYCEN["<i>pycen</i>"]
        direction TB
        B[<b>`explore`</b><br/>Intuitive metadata<br/>keyword search]
        C[<b>`acquire`</b><br/>Data + boundaries<br/>in one call]

        C --> D
        C --> E

        D[<b>`quick_check`</b><br/>Quality validation]
        E[<b>`quick_viz`</b><br/>Instant maps]
    end

    B --> F
    D & E --> F[Domain analysis]

    style A fill:#94a3b8,stroke:#334155,stroke-width:2px,color:#000
    style B fill:#3b82f6,stroke:#1e40af,stroke-width:2px,color:#fff
    style C fill:#3b82f6,stroke:#1e40af,stroke-width:2px,color:#fff
    style D fill:#22c55e,stroke:#15803d,stroke-width:2px,color:#fff
    style E fill:#22c55e,stroke:#15803d,stroke-width:2px,color:#fff
    style F fill:#94a3b8,stroke:#334155,stroke-width:2px,color:#000
    style PYCEN fill:#1e293b,stroke:#64748b,stroke-width:2px,color:#fff

overview

pycen makes exploring and acquiring U.S. Census data accessible and intuitive for spatial workflows. The explore module presents Census API metadata as browsable, topic-organized, interactive nested tables, with customizable themes that highlight curated variable recipes. It also supports natural-language keyword search for efficient variable discovery. The acquire module streamlines data processing: one function call returns both data and boundaries as a GeoDataFrame, with built-in quality checks and rapid visualizations; simple tabular or boundaries-only downloads are separately callable. pycen pulls live data products and caches them locally to keep iterations fast, smooth, and reproducible. The multi-year fetch function enables longitudinal comparisons that track change over time.
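To make the caching idea concrete, here is a minimal sketch of a parameter-keyed disk cache. It is not pycen's implementation: `cache_path`, `cached_fetch`, and `fake_fetch` are hypothetical names, and the JSON-on-disk layout is an assumption chosen for brevity. It only shows how keying a cache on the request parameters lets a repeated call skip the network.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_path(cache_dir, **params):
    """Map request parameters to a stable file name (argument order does not matter)."""
    key = json.dumps(params, sort_keys=True)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return Path(cache_dir) / f"{digest}.json"


def cached_fetch(fetch, cache_dir, **params):
    """Return the cached result for `params`, calling `fetch` only on a miss."""
    path = cache_path(cache_dir, **params)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = fetch(**params)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result


calls = []  # records every time the (hypothetical) remote fetch actually runs


def fake_fetch(**params):
    """Stand-in for a network request; returns a JSON-serializable payload."""
    calls.append(params)
    return {"params": params, "rows": []}


with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch(fake_fetch, tmp, dataset="acs5", year=2023, geography="tract")
    # Same parameters in a different order: served from disk, no second fetch.
    second = cached_fetch(fake_fetch, tmp, year=2023, geography="tract", dataset="acs5")

print(len(calls))
```

Sorting the keys before hashing is what makes the two calls above resolve to the same cache file.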

sample use

# basic workflow
import pycen
from pycen import explore, acquire

# 1. Explore variables
# `browse` and `search` return interactive tables
# `lookup` returns details
explore.browse(year=2023, dataset="acs5").show()
explore.search("vehicle", year=2023, dataset="acs5").show()
explore.lookup("B08201_002E", year=2021, dataset="acs5")

# 2. Acquire data
## continental US income gini map
gdf = acquire.get_censhp(
    variables={"B19083_001E":"gini_index"},
    geography="place",               # if no state/county, gets nationwide
    dataset="acs5",
    year=2023,
)
acquire.quick_check(gdf)             # returns N/A summary
acquire.quick_viz(gdf, "gini_index") # returns map + distribution histogram

## finer scale
## Cook County income gini at tract level
gdf = acquire.get_censhp(
    variables={"B19083_001E":"gini_index"},
    geography="tract",
    county="Cook County",
    state="IL",
    dataset="acs5",
    year=2023,
)
acquire.quick_viz(gdf, "gini_index")

## neighborhood analyses
## Chicago super commuters
gdf = acquire.get_censhp(
    variables={"B08303_013E":"commute_90min_plus", "B08303_001E":"total_commuters"},  # _013E = 90 or more minutes
    geography="block group",
    county="Cook County",
    state="IL",
    dataset="acs5",
    year=2023,
    clip_to="place",                    # default off, clip to [place/cbsa/csa]
    place="Chicago city",
)
gdf["pct_super_commuters"] = gdf["commute_90min_plus"] / gdf["total_commuters"] * 100
acquire.quick_viz(gdf, "pct_super_commuters")

## decennial data supports block-scale (finest)
## Cook County housing vacancy rates at block level
select_var = {
    "H001003": "vacant_hh",   # vacant housing units
    "H001001": "total_hh",    # total housing units
}
gdf = acquire.get_censhp(
    variables=select_var,
    geography="block",
    county="Cook County",
    state="IL",
    dataset="dec_pl",
    year=2010,
)
gdf['vacancy_rate'] = gdf['vacant_hh'] / gdf['total_hh'] * 100
acquire.quick_viz(gdf, "vacancy_rate")

# 3. Tabular data only
df = acquire.get_census(
    variables=["B25032_022E"],  # renter-occupied, mobile home
    geography="tract",
    state="CA",
    year=2021,
)

# 4. Multi-year tabular data for trend analysis
# comparative tracking of remote work surge (2019–2023)
from pycen import acquire
import matplotlib.pyplot as plt

# explore.search("work from home", year=2023, dataset="acs5").show()
# B08101_049E = worked from home
df_long = acquire.get_census(
    variables={'B08101_049E': 'wfh_workers', 'B08101_001E': 'total_workers'},
    geography='county',
    state='CA',
    years=[2019, 2020, 2021, 2022, 2023],
    merge='long'
)

df_long['wfh_pct'] = (df_long['wfh_workers'] / df_long['total_workers']) * 100
bay_area = df_long[df_long['NAME'].str.contains('San Francisco|Alameda|Santa Clara|Contra Costa|San Mateo')]

for county in bay_area['NAME'].unique():
    county_data = bay_area[bay_area['NAME'] == county]
    plt.plot(county_data['year'], county_data['wfh_pct'], marker='o', label=county)

plt.title('Bay Area WFH 2019-2023')
plt.ylabel('Work From Home (%)')
plt.xlabel('Year')
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
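The percentage columns in the samples above divide one count by another, which is undefined wherever the denominator is zero (common at block and block-group scale). The sketch below shows one way to guard that division and to summarize a long-format table as a percentage-point change between two years. It uses plain Python records instead of a DataFrame so it runs without pycen; `pct` and `change_between` are hypothetical helpers, and the rows are placeholder counts, not real Census estimates.

```python
def pct(numerator, denominator):
    """Return numerator as a percentage of denominator, or None if undefined."""
    if numerator is None or not denominator:
        return None
    return numerator * 100 / denominator


def change_between(rows, value, total, start, end):
    """Percentage-point change in value/total from `start` to `end`, per NAME."""
    shares = {}
    for row in rows:
        shares[(row["NAME"], row["year"])] = pct(row[value], row[total])
    changes = {}
    for name in sorted({row["NAME"] for row in rows}):
        first = shares.get((name, start))
        last = shares.get((name, end))
        # A missing or undefined share at either end makes the change undefined.
        changes[name] = None if first is None or last is None else last - first
    return changes


# Placeholder long-format rows (one row per county per year); values are made up.
rows = [
    {"NAME": "County A", "year": 2019, "wfh_workers": 5, "total_workers": 100},
    {"NAME": "County A", "year": 2023, "wfh_workers": 15, "total_workers": 100},
    {"NAME": "County B", "year": 2019, "wfh_workers": 0, "total_workers": 0},
    {"NAME": "County B", "year": 2023, "wfh_workers": 2, "total_workers": 50},
]

changes = change_between(rows, "wfh_workers", "total_workers", 2019, 2023)
print(changes)
```

Returning `None` rather than `0` keeps "no workers recorded" distinct from "no change", which matters when the result is mapped.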

core functions

Explore

  • explore.search(query, year, dataset) - supports exact term match and fuzzy keyword search
  • explore.browse(year, dataset) - view all variables via interactive tree table with theme variable highlights
  • explore.lookup(code, year, dataset) - inspect variable details
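To illustrate what "exact term match and fuzzy keyword search" can mean in practice, here is a small sketch built on the standard library's `difflib`. It is an assumption about the general technique, not a description of how `explore.search` works internally. The `VARIABLES` labels are simplified, hand-written stand-ins (real Census metadata labels are longer and vary by year and dataset), and the `search` function and its `cutoff` parameter are hypothetical.

```python
from difflib import SequenceMatcher

# Simplified labels for illustration only; not the labels served by the Census API.
VARIABLES = {
    "B19083_001E": "gini index of income inequality",
    "B08201_002E": "households with no vehicle available",
    "B08303_013E": "travel time to work 90 or more minutes",
    "B25032_022E": "renter occupied mobile home",
}


def search(query, variables=VARIABLES, cutoff=0.8):
    """Return codes whose label contains the query; otherwise fall back to fuzzy word matches."""
    q = query.lower().strip()
    exact = [code for code, label in variables.items() if q in label]
    if exact:
        return exact
    fuzzy = []
    for code, label in variables.items():
        # Score the query against each word of the label and keep the best score.
        best = max(SequenceMatcher(None, q, word).ratio() for word in label.split())
        if best >= cutoff:
            fuzzy.append(code)
    return fuzzy


print(search("vehicle"))   # exact substring hit
print(search("vehicel"))   # misspelled query, resolved by the fuzzy fallback
```

Trying the exact match first keeps precise queries precise; the fuzzy pass only runs when nothing matches literally.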

Acquire

  • acquire.get_censhp(...) - data + boundaries --> GeoDataFrame
  • acquire.get_census(...) - data only --> DataFrame
  • acquire.get_boundaries(...) - boundaries only --> shp/gpkg
  • acquire.quick_check(gdf) - N/A values summary
  • acquire.quick_viz(gdf, column) - exploratory map + distribution histogram for select variable
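As a rough picture of what an N/A summary involves, the sketch below counts missing values per column in a list of records. It is not the code behind `acquire.quick_check`; `na_summary` is a hypothetical helper and the rows are placeholders. One assumption worth flagging: ACS estimates that cannot be computed are often returned as large negative sentinel values such as -666666666 instead of a blank, so the sketch treats that value as missing. The Census API documents several such values; only one is listed here.

```python
# Sentinel treated as "missing" in this sketch (assumption: extend as needed).
SENTINELS = {-666666666}


def na_summary(rows, columns):
    """Count missing values (None or a sentinel) per column, with their share of rows."""
    total = len(rows)
    summary = {}
    for col in columns:
        missing = sum(
            1 for row in rows
            if row.get(col) is None or row.get(col) in SENTINELS
        )
        summary[col] = {
            "missing": missing,
            "share": missing / total if total else 0.0,
        }
    return summary


# Placeholder records; GEOIDs and values are made up for illustration.
rows = [
    {"GEOID": "0001", "gini_index": 0.41},
    {"GEOID": "0002", "gini_index": -666666666},
    {"GEOID": "0003", "gini_index": None},
    {"GEOID": "0004", "gini_index": 0.47},
]

summary = na_summary(rows, ["gini_index"])
print(summary)
```

Left in place, a sentinel like this would dominate any map or histogram, which is why checking for missing values before plotting is worthwhile.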

Info

  • pycen.get_product() - list datasets and years
  • pycen.get_geography() - list geography levels by dataset

Themes

  • pycen.set_theme(name_or_dict) - set active theme name or register a custom theme (dict)
  • pycen.get_theme_settings() - get active theme name (defaults to a general curation of useful variables)
  • pycen.explore.get_theme(name=None) - get theme details (dict); defaults to active theme
  • pycen.list_themes() - list available theme names (includes session custom themes)

Notes

  • Datasets: acs5, acs1, dec_pl, dec_sf1
  • Spatial features require: geopandas, pygris
  • Geographies are resolved per dataset/year from Census geography metadata (live/cache/static)
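Because the spatial features depend on optional packages, it can help to check for them before calling a spatial function. The sketch below uses only the standard library; `missing_packages` is a hypothetical helper, not part of pycen.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(["geopandas", "pygris"])
if missing:
    print("Spatial features unavailable; missing:", ", ".join(missing))
else:
    print("Spatial dependencies found.")
```

`find_spec` locates a package without importing it, so the check stays fast even for heavy libraries.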

API key for higher rate limits:

pycen.set_api_key("YOUR_KEY")  # get key at api.census.gov/data/key_signup.html
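To keep the key out of source files, one option is to read it from an environment variable and pass it to `pycen.set_api_key`. This is a sketch under assumptions: `load_census_key` is a hypothetical helper, and the variable name `CENSUS_API_KEY` is a convention chosen here, not something pycen is known to read on its own.

```python
import os


def load_census_key(env_var="CENSUS_API_KEY"):
    """Return the key stored in `env_var`, or None if it is unset or blank."""
    key = os.environ.get(env_var, "").strip()
    return key or None


key = load_census_key()
if key is None:
    print("No key found; continuing without one.")
else:
    # With pycen installed, pass the key on:
    #   import pycen
    #   pycen.set_api_key(key)
    print("Key loaded from the environment.")
```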

