Offline address-to-Census data mapping for Python with PL 94-171 and ACS support

census-lookup

Python 3.10+ | License: MIT

A Python library for mapping US addresses to Census 2020 block-level data locally, without relying on rate-limited APIs.

Features

  • Fully offline geocoding using TIGER Address Range files (~95% match rate)
  • Lazy per-state data downloading - only download data for states you need
  • Configurable geographic levels - block, block group, tract, or county
  • Two Census data sources:
    • PL 94-171 (Redistricting Data): Population, race, housing counts at block level
    • ACS 5-Year Estimates: Income, education, employment, housing characteristics at tract level
  • Efficient batch processing using DuckDB for fast joins
  • CLI and Python API - use from command line or in your code

Installation

# Using uv (recommended)
uv add census-lookup

# Using pip
pip install census-lookup

Quick Start

CLI (no install required)

# Look up a single address (auto-downloads data as needed)
uvx census-lookup lookup "123 Main St, Los Angeles, CA 90012" --level block

# Include specific census variables
uvx census-lookup lookup "123 Main St, Los Angeles, CA 90012" -v P1_001N -v H1_001N

# Process a batch file
uvx census-lookup batch input.csv output.csv --address-column addr --level tract

# Pre-download data for states (optional - data downloads automatically)
uvx census-lookup download CA TX NY

# List available census variables
uvx census-lookup variables

# Show cache info
uvx census-lookup info
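
The batch command above reads addresses from a named CSV column. As a minimal sketch, assuming the input only needs a header row plus one address per line, a file like input.csv can be produced with the standard library. The helper name write_address_csv is hypothetical and not part of census-lookup; the column name addr simply mirrors the --address-column value used above.

```python
import csv
from pathlib import Path


def write_address_csv(path, addresses, column="addr"):
    """Write one address per row under a single header column.

    Hypothetical helper, not part of census-lookup. csv.writer quotes
    values that contain commas, so full street addresses stay in one field.
    """
    with Path(path).open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([column])
        for address in addresses:
            writer.writerow([address])


if __name__ == "__main__":
    write_address_csv(
        "input.csv",
        [
            "123 Main St, Los Angeles, CA 90012",
            "1600 Pennsylvania Avenue NW, Washington, DC 20500",
        ],
    )
```

The resulting file can then be passed to the batch command with --address-column addr.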

Example Output

$ uvx census-lookup lookup "1600 Pennsylvania Avenue NW, Washington, DC 20500"
{
  "input_address": "1600 Pennsylvania Avenue NW, Washington, DC 20500",
  "matched_address": "Pennsylvania Ave NW",
  "latitude": 38.898761,
  "longitude": -77.035117,
  "match_type": "interpolated",
  "match_score": 0.9,
  "geoid": "110010101003014",
  "state_fips": "11",
  "county_fips": "11001",
  "tract": "11001010100",
  "block_group": "110010101003",
  "block": "110010101003014",
  "P1_001N": 19.0
}

With ACS variables (median income, home value):

$ uvx census-lookup lookup "1600 Pennsylvania Avenue NW, Washington, DC 20500" \
    -v B19013_001E -v B25077_001E
{
  ...
  "B19013_001E": 72500.0,
  "B25077_001E": 485000.0
}

Python API

import pandas as pd

from census_lookup import CensusLookup, GeoLevel

# Initialize (first use will download data for the state)
lookup = CensusLookup(
    geo_level=GeoLevel.BLOCK,
    variables=["P1_001N", "H1_001N"],  # Population, Housing units
)

# Single address lookup
result = lookup.geocode("123 Main St, Los Angeles, CA 90012")
print(f"GEOID: {result.geoid}")
print(f"Population: {result.census_data['P1_001N']}")

# Batch processing: geocode every address in a CSV column
df = pd.read_csv("addresses.csv")
results = lookup.geocode_batch(df["address"], progress=True)

Geographic Levels

| Level | GEOID Length | Example |
|---|---|---|
| State | 2 | 06 |
| County | 5 | 06037 |
| Tract | 11 | 06037210100 |
| Block Group | 12 | 060372101001 |
| Block | 15 | 060372101001023 |
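
Because each level's GEOID is a prefix of the next finer level, the table above can be read as a set of string slices. The following is a small illustrative sketch in plain Python; split_geoid is a hypothetical helper, not a census-lookup function.

```python
def split_geoid(block_geoid: str) -> dict:
    """Derive the coarser GEOIDs from a 15-digit block GEOID by prefix slicing.

    Hypothetical helper for illustration; lengths follow the table above.
    """
    if len(block_geoid) != 15 or not block_geoid.isdigit():
        raise ValueError("expected a 15-digit block GEOID")
    return {
        "state": block_geoid[:2],
        "county": block_geoid[:5],
        "tract": block_geoid[:11],
        "block_group": block_geoid[:12],
        "block": block_geoid,
    }


parts = split_geoid("060372101001023")
print(parts["county"])       # 06037
print(parts["tract"])        # 06037210100
print(parts["block_group"])  # 060372101001
```

GEOIDs should be kept as strings throughout, since converting them to integers drops leading zeros such as the 06 for California.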

Census Variables

PL 94-171 (Redistricting Data)

Available at block level and above. Includes:

  • P1: Race (total population, by race categories)
  • P2: Hispanic/Latino by Race
  • P3: Race for Population 18+ (voting age)
  • P4: Hispanic/Latino 18+
  • H1: Housing Units (total, occupied, vacant)

# Use variable groups
lookup = CensusLookup(variable_groups=["population", "housing"])

# Or specify individual variables
lookup = CensusLookup(variables=["P1_001N", "P1_003N", "H1_001N"])

ACS 5-Year Estimates (American Community Survey)

Available at tract level and above. Includes richer demographic data:

| Category | Key Variables | Description |
|---|---|---|
| Income | B19013_001E, B19301_001E | Median household income, per capita income |
| Poverty | B17001_001E, B17001_002E | Total population, below poverty level |
| Education | B15003_022E, B15003_023E | Bachelor's degree, Master's degree |
| Employment | B23025_004E, B23025_005E | Employed, Unemployed |
| Housing | B25077_001E, B25064_001E | Median home value, median rent |
| Tenure | B25003_002E, B25003_003E | Owner-occupied, Renter-occupied |
| Health | B27010_017E, B27010_050E | Employer insurance, Medicare |
| Commute | B08301_003E, B08301_010E | Drove alone, Public transit |
| Internet | B28002_004E, B28002_013E | Broadband access, No internet |
| Language | B16001_002E, B16001_003E | English only, Spanish |

More than 100 ACS variables are available. Run uvx census-lookup variables --acs for the full list.
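
Several of the variables above are counts that are usually more useful as ratios. As a rough sketch, assuming the values have already been fetched into a plain dict keyed by variable name, a poverty rate and an unemployment rate could be derived as follows. The function names and the sample numbers are hypothetical and chosen only for illustration; they are not real Census figures or part of the census-lookup API.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero or missing."""
    if not denominator:
        return None
    return numerator / denominator


def derived_rates(acs: dict) -> dict:
    """Compute simple rates from ACS count variables (hypothetical helper)."""
    employed = acs["B23025_004E"]
    unemployed = acs["B23025_005E"]
    return {
        # Share of the poverty universe that is below the poverty level
        "poverty_rate": safe_ratio(acs["B17001_002E"], acs["B17001_001E"]),
        # Unemployed as a share of employed plus unemployed
        "unemployment_rate": safe_ratio(unemployed, employed + unemployed),
    }


# Made-up sample values, for illustration only
sample = {
    "B17001_001E": 4000,
    "B17001_002E": 600,
    "B23025_004E": 1900,
    "B23025_005E": 100,
}
print(derived_rates(sample))
```

ACS figures are survey estimates with margins of error, so ratios computed for small tracts should be treated with caution.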

from census_lookup import CensusLookup, GeoLevel, list_acs_variable_groups

# See available ACS variable groups
print(list_acs_variable_groups())

# Use ACS variables with your lookup
lookup = CensusLookup(
    geo_level=GeoLevel.TRACT,
    variables=["P1_001N"],  # PL 94-171 population
    acs_variables=["B19013_001E", "B25077_001E"],  # Median income, home value
    # Or use variable groups:
    # acs_variable_groups=["income", "housing"],
)

result = lookup.geocode("123 Main St, Los Angeles, CA 90012")
print(f"Median Income: ${result.census_data['B19013_001E']:,}")

Note: ACS data is available at tract level and above. When using block-level geocoding with ACS variables, the ACS data is joined at tract level.
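
To make the note above concrete: a block GEOID starts with its tract's 11-digit GEOID, so tract-level values can be attached to block-level rows by truncating the key. The sketch below uses plain dicts rather than the library's DuckDB join, and attach_tract_values is a hypothetical name. The sample values are taken from the example output earlier in this README.

```python
TRACT_GEOID_LENGTH = 11  # tract GEOIDs are the first 11 digits of a block GEOID


def attach_tract_values(block_rows, tract_table):
    """Copy tract-level values onto each block-level row.

    Hypothetical illustration of a tract-level join; not the library's code.
    Rows whose tract is missing from tract_table are returned unchanged.
    """
    merged_rows = []
    for row in block_rows:
        tract_geoid = row["geoid"][:TRACT_GEOID_LENGTH]
        merged = dict(row)
        merged.update(tract_table.get(tract_geoid, {}))
        merged_rows.append(merged)
    return merged_rows


block_rows = [{"geoid": "110010101003014", "P1_001N": 19.0}]
tract_table = {"11001010100": {"B19013_001E": 72500.0}}
print(attach_tract_values(block_rows, tract_table))
```

One consequence of this design is that every block in a tract carries the same ACS values, so they describe the surrounding tract rather than the individual block.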

Data Storage

Data is cached in ~/.census-lookup/:

~/.census-lookup/
├── catalog.json           # Tracks downloaded data
├── tiger/
│   ├── addrfeat/         # Address range features
│   └── blocks/           # Block polygons
└── census/
    ├── pl94171/          # PL 94-171 data
    └── acs5/             # ACS 5-Year data
        └── tract/        # ACS at tract level

Typical storage per state: 100-300 MB (TIGER + PL 94-171), plus roughly 10-50 MB for ACS.
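
The census-lookup info command reports cache details. For a quick independent check of disk usage, the following sketch sums file sizes under each top-level cache directory. It assumes only the ~/.census-lookup/ location described above; the function names are hypothetical and not part of the library.

```python
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Total size in bytes of all regular files under path, recursively."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def cache_usage(root: Path) -> dict:
    """Map each top-level subdirectory of root to its total size in bytes.

    Returns an empty dict when root does not exist yet.
    """
    if not root.is_dir():
        return {}
    return {
        child.name: dir_size_bytes(child)
        for child in sorted(root.iterdir())
        if child.is_dir()
    }


if __name__ == "__main__":
    cache_root = Path.home() / ".census-lookup"
    for name, size in cache_usage(cache_root).items():
        print(f"{name}: {size / 1e6:.1f} MB")
```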

How It Works

  1. Parse address using the usaddress library
  2. Normalize street name for TIGER matching
  3. Match to TIGER Address Range segment
  4. Interpolate coordinates along the street segment
  5. Spatial lookup using rtree index to find containing census block
  6. Join census data using DuckDB for efficient queries
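
Steps 4 and 5 above can be sketched in plain Python. This is a simplified illustration, not the library's implementation: it treats a street segment as a single straight line with one address range, and it replaces the rtree-backed spatial lookup with a brute-force ray-casting test over toy polygons. All function names and coordinates here are hypothetical.

```python
def interpolate_along_segment(house_number, from_number, to_number, start, end):
    """Place a house number proportionally between the two ends of a segment."""
    if from_number == to_number:
        fraction = 0.5  # single-address range: use the midpoint
    else:
        fraction = (house_number - from_number) / (to_number - from_number)
    fraction = min(max(fraction, 0.0), 1.0)  # clamp to the segment
    (x0, y0), (x1, y1) = start, end
    return (x0 + fraction * (x1 - x0), y0 + fraction * (y1 - y0))


def point_in_polygon(point, polygon):
    """Ray-casting test: count edge crossings of a ray going right from point."""
    x, y = point
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        if (y1 > y) != (y2 > y):  # edge spans the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_block(point, blocks):
    """Return the id of the first polygon containing point, or None."""
    for block_id, polygon in blocks.items():
        if point_in_polygon(point, polygon):
            return block_id
    return None


# Toy data: one segment covering house numbers 100-198, and two square "blocks"
point = interpolate_along_segment(149, 100, 198, (0.0, 0.0), (10.0, 0.0))
blocks = {
    "A": [(0, -1), (6, -1), (6, 1), (0, 1)],
    "B": [(6, -1), (10, -1), (10, 1), (6, 1)],
}
print(point)                      # (5.0, 0.0)
print(find_block(point, blocks))  # A
```

A spatial index such as an R-tree avoids testing every polygon by first narrowing the candidates to those whose bounding boxes contain the point.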

Data Sources

All data (TIGER address ranges and block polygons, PL 94-171 redistricting data, and ACS 5-Year estimates) is downloaded from official US Census Bureau sources.

Development

# Clone and install with uv
git clone https://github.com/yolodex-ai/census-lookup.git
cd census-lookup
uv sync --all-extras

# Run unit tests (fast, no network required)
uv run pytest tests/unit -v

# Run functional tests (downloads real data, slower)
uv run pytest tests/functional -v -s

# Run all tests
uv run pytest tests/ -v

# Run linting
uv run ruff check src/

License

MIT
