Offline address-to-Census data mapping for Python with PL 94-171 and ACS support

census-lookup

Requires Python 3.10+. License: MIT.

A Python library for mapping US addresses to Census data locally, without relying on rate-limited APIs. Supports Census 2020 (PL 94-171) and American Community Survey (ACS) 5-Year Estimates.

Features

  • Fully offline geocoding using TIGER Address Range files (~95% match rate)
  • Lazy per-state data downloading - only download data for states you need
  • Census data at ALL geographic levels - block, block group, tract, county, and state in a single lookup
  • Two Census data sources:
    • PL 94-171 (Redistricting Data): Population, race, housing counts at block level
    • ACS 5-Year Estimates: Income, education, employment, housing characteristics at tract level
  • Efficient batch processing for large address lists
  • CLI and Python API - use from command line or in your code

Installation

# Using uv (recommended)
uv add census-lookup

# Using pip
pip install census-lookup

Quick Start

CLI (no install required)

# Look up a single address (auto-downloads data as needed)
uvx census-lookup lookup "123 Main St, Los Angeles, CA 90012"

# Include specific census variables
uvx census-lookup lookup "123 Main St, Los Angeles, CA 90012" -v P1_001N -v H1_001N

# Process a batch file (use -l to set output level for CSV columns)
uvx census-lookup batch input.csv output.csv --address-column addr -l tract

# Pre-download data for states (optional - data downloads automatically)
uvx census-lookup download CA TX NY

# List available census variables
uvx census-lookup variables

# Show cache info
uvx census-lookup info
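
The batch command above reads a CSV and takes the name of the address column from --address-column. A minimal sketch for producing such a file with Python's standard csv module follows. The column name addr mirrors the command above; the write_addresses helper is hypothetical and not part of census-lookup.

```python
import csv
from pathlib import Path


def write_addresses(path, addresses, column="addr"):
    """Write one address per row under a single header column.

    Hypothetical helper for preparing batch input; not part of census-lookup.
    """
    with Path(path).open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([column])  # header row, matched by --address-column
        for address in addresses:
            # csv.writer quotes values that contain commas automatically
            writer.writerow([address])
```

Calling write_addresses("input.csv", [...]) with a list of address strings should yield a file that can be passed to the batch command with --address-column addr.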

Example Output

$ uvx census-lookup lookup "1600 Pennsylvania Avenue NW, Washington, DC 20500" -v P1_001N
{
  "input_address": "1600 Pennsylvania Avenue NW, Washington, DC 20500",
  "matched_address": "Pennsylvania Ave NW",
  "latitude": 38.898761,
  "longitude": -77.035117,
  "match_type": "interpolated",
  "match_score": 0.9,
  "state_fips": "11",
  "county_fips": "11001",
  "tract": "11001010100",
  "block_group": "110010101003",
  "block": "110010101003014",
  "P1_001N": {
    "block": 19.0,
    "block_group": 963.0,
    "tract": 2699.0,
    "county": 689545.0,
    "state": 689545.0
  }
}

Census data is returned at all geographic levels in a single lookup. Each variable contains values aggregated at block, block group, tract, county, and state levels.
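
If you need one column per variable and level (for example, to load the JSON output into a spreadsheet), the nested structure can be flattened. The sketch below assumes the JSON shape shown in the example output; flatten_census_values and the VARIABLE_level naming are illustrative choices, not something the library provides.

```python
import json

# Geographic levels as they appear in the example output, finest first.
LEVELS = ("block", "block_group", "tract", "county", "state")


def flatten_census_values(record):
    """Turn nested per-level values into flat 'VARIABLE_level' keys.

    Hypothetical helper; non-dict fields are copied through unchanged.
    """
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            for level in LEVELS:
                if level in value:
                    flat[f"{key}_{level}"] = value[level]
        else:
            flat[key] = value
    return flat


# A trimmed copy of the example output above.
sample = json.loads("""
{
  "block": "110010101003014",
  "P1_001N": {"block": 19.0, "block_group": 963.0, "tract": 2699.0,
              "county": 689545.0, "state": 689545.0}
}
""")
flat = flatten_census_values(sample)
print(flat["P1_001N_tract"])  # 2699.0
```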

With ACS variables (median income, home value):

$ uvx census-lookup lookup "1600 Pennsylvania Avenue NW, Washington, DC 20500" \
    -v B19013_001E -v B25077_001E
{
  "...": "...",
  "B19013_001E": {
    "tract": 72500.0
  },
  "B25077_001E": {
    "tract": 485000.0
  }
}

ACS variables are available at tract level and above.

Python API

import asyncio

import pandas as pd

from census_lookup import CensusLookup


async def main():
    # Initialize (first use will download data for the state)
    lookup = CensusLookup(
        variables=["P1_001N", "H1_001N"],  # Population, Housing units
    )

    # Single address lookup
    result = await lookup.geocode("123 Main St, Los Angeles, CA 90012")
    print(f"Block GEOID: {result.block}")
    print(f"Block Population: {result.census_data['P1_001N']['block']}")
    print(f"Tract Population: {result.census_data['P1_001N']['tract']}")

    # Batch processing
    df = pd.read_csv("addresses.csv")
    results = await lookup.geocode_batch(df["address"], progress=True)


# geocode and geocode_batch are coroutines, so run them inside an event loop
asyncio.run(main())

Geographic Levels

Level          GEOID Length    Example
State          2               06
County         5               06037
Tract          11              06037210100
Block Group    12              060372101001
Block          15              060372101001023
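
Each GEOID is a prefix of the next finer level, so the coarser identifiers can be recovered from a block GEOID by slicing to the lengths in the table. The helper below is a small illustration of that nesting; split_block_geoid is a hypothetical name, not a census-lookup function.

```python
def split_block_geoid(geoid):
    """Split a 15-digit block GEOID into its nested parent GEOIDs.

    Hypothetical helper. Slice lengths follow the table above:
    state 2, county 5, tract 11, block group 12, block 15.
    """
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError(f"expected a 15-digit block GEOID, got {geoid!r}")
    return {
        "state": geoid[:2],
        "county": geoid[:5],
        "tract": geoid[:11],
        "block_group": geoid[:12],
        "block": geoid,
    }


parts = split_block_geoid("060372101001023")
print(parts["tract"])  # 06037210100
```

Keep GEOIDs as strings throughout: converting them to integers drops leading zeros such as the 0 in California's state code 06.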

Census Variables

PL 94-171 (Redistricting Data)

Available at block level and above. Includes:

  • P1: Race (total population, by race categories)
  • P2: Hispanic/Latino by Race
  • P3: Race for Population 18+ (voting age)
  • P4: Hispanic/Latino 18+
  • H1: Housing Units (total, occupied, vacant)

# Use variable groups
lookup = CensusLookup(variable_groups=["population", "housing"])

# Or specify individual variables
lookup = CensusLookup(variables=["P1_001N", "P1_003N", "H1_001N"])

ACS 5-Year Estimates (American Community Survey)

Available at tract level and above. Includes richer demographic data:

Category      Key Variables               Description
Income        B19013_001E, B19301_001E    Median household income, per capita income
Poverty       B17001_001E, B17001_002E    Total population, below poverty level
Education     B15003_022E, B15003_023E    Bachelor's degree, Master's degree
Employment    B23025_004E, B23025_005E    Employed, Unemployed
Housing       B25077_001E, B25064_001E    Median home value, median rent
Tenure        B25003_002E, B25003_003E    Owner-occupied, Renter-occupied
Health        B27010_017E, B27010_050E    Employer insurance, Medicare
Commute       B08301_003E, B08301_010E    Drove alone, Public transit
Internet      B28002_004E, B28002_013E    Broadband access, No internet
Language      B16001_002E, B16001_003E    English only, Spanish

More than 100 ACS variables are available. Run uvx census-lookup variables --acs for the full list.

import asyncio

from census_lookup import CensusLookup, list_acs_variable_groups

# See available ACS variable groups
print(list_acs_variable_groups())


async def main():
    # Use ACS variables with your lookup
    lookup = CensusLookup(
        variables=["P1_001N"],  # PL 94-171 population
        acs_variables=["B19013_001E", "B25077_001E"],  # Median income, home value
        # Or use variable groups:
        # acs_variable_groups=["income", "housing"],
    )

    result = await lookup.geocode("123 Main St, Los Angeles, CA 90012")
    # PL 94-171 data available at all levels
    print(f"Block Population: {result.census_data['P1_001N']['block']}")
    # ACS data available at tract level
    print(f"Median Income: ${result.census_data['B19013_001E']['tract']:,}")


# geocode is a coroutine, so run it inside an event loop
asyncio.run(main())

Note: ACS data is available at tract level and above. When you request ACS variables, they will appear in the nested output with tract (and higher) levels populated.
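
Because PL 94-171 variables start at block level while ACS variables start at tract level, code that handles both may want the most detailed value available for each variable. The sketch below assumes the nested dictionary shape shown in the example output; finest_value is a hypothetical helper, and the sample values are copied from that output.

```python
# Levels ordered from most to least detailed.
LEVELS_FINE_TO_COARSE = ("block", "block_group", "tract", "county", "state")


def finest_value(levels):
    """Return (level, value) for the most detailed level present, else None.

    Hypothetical helper; `levels` is one variable's per-level dictionary.
    """
    for level in LEVELS_FINE_TO_COARSE:
        value = levels.get(level)
        if value is not None:
            return level, value
    return None


# Values taken from the example output earlier in this README.
census_data = {
    "P1_001N": {"block": 19.0, "block_group": 963.0, "tract": 2699.0,
                "county": 689545.0, "state": 689545.0},
    "B19013_001E": {"tract": 72500.0},
}
print(finest_value(census_data["P1_001N"]))      # ('block', 19.0)
print(finest_value(census_data["B19013_001E"]))  # ('tract', 72500.0)
```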

Data Storage

Data is cached in ~/.census-lookup/:

~/.census-lookup/
├── catalog.json           # Tracks downloaded data
├── tiger/
│   ├── addrfeat/         # Address range features
│   └── blocks/           # Block polygons
└── census/
    ├── pl94171/          # PL 94-171 data
    └── acs5/             # ACS 5-Year data
        └── tract/        # ACS at tract level

Typical storage per state: 100-300 MB (TIGER + PL 94-171), plus roughly 10-50 MB for ACS.
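
The info command shown earlier reports on the cache. To check disk usage independently, the cache directory can also be measured with the standard library. This is a generic sketch, assuming only the ~/.census-lookup/ location given above; directory_size_mb is a hypothetical helper.

```python
from pathlib import Path


def directory_size_mb(root):
    """Sum the sizes of all regular files under `root`, in megabytes (10^6 bytes).

    Hypothetical helper; returns 0.0 if the directory does not exist.
    """
    root = Path(root).expanduser()
    if not root.exists():
        return 0.0
    total_bytes = sum(p.stat().st_size for p in root.rglob("*") if p.is_file())
    return total_bytes / 1_000_000


print(f"Cache size: {directory_size_mb('~/.census-lookup'):.1f} MB")
```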

How It Works

  1. Parse address using the usaddress library
  2. Normalize street name for TIGER matching
  3. Match to TIGER Address Range segment
  4. Interpolate coordinates along the street segment
  5. Spatial lookup using rtree index to find containing census block
  6. Join census data using DuckDB for efficient queries
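
Step 4 can be illustrated with a short sketch. It is not the library's implementation: it assumes a segment is a list of (x, y) points with a known house-number range, treats the coordinates as planar, and ignores details such as which side of the street an address falls on. The function name and parameters are hypothetical.

```python
import math


def interpolate_along_segment(house_number, from_number, to_number, points):
    """Place a house number proportionally along a polyline of (x, y) points.

    Illustrative only: linear interpolation by distance along the line,
    with the fraction clamped to the ends of the segment.
    """
    if len(points) < 2:
        raise ValueError("a segment needs at least two points")

    if from_number == to_number:
        fraction = 0.5  # single-number range: use the midpoint
    else:
        fraction = (house_number - from_number) / (to_number - from_number)
    fraction = min(1.0, max(0.0, fraction))

    # Length of each piece of the polyline, and the distance to travel.
    lengths = [math.dist(a, b) for a, b in zip(points, points[1:])]
    target = fraction * sum(lengths)

    for (x1, y1), (x2, y2), length in zip(points, points[1:], lengths):
        if length > 0 and target <= length:
            t = target / length
            return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
        target -= length
    return points[-1]


# House 150 on a straight segment covering 100-200 lands at the midpoint.
print(interpolate_along_segment(150, 100, 200, [(0.0, 0.0), (10.0, 0.0)]))  # (5.0, 0.0)
```

A result produced this way is an estimate of where the address sits, which is consistent with the "interpolated" match_type in the example output.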

Data Sources

All data is downloaded from official US Census Bureau sources: TIGER address range and block files, PL 94-171 redistricting data, and ACS 5-Year Estimates.

Development

# Clone and install with uv
git clone https://github.com/yolodex-ai/census-lookup.git
cd census-lookup
uv sync --all-extras

# Run unit tests (fast, no network required)
uv run pytest tests/unit -v

# Run functional tests (downloads real data, slower)
uv run pytest tests/functional -v -s

# Run all tests
uv run pytest tests/ -v

# Run linting
uv run ruff check src/

License

MIT
