A Population Synthesizer for High Demographic Resolution Analysis.
Livelike: Vivid Synthetic Populations
This package provides a high-level wrapper for generating synthetic populations via Census APIs based on the American Community Survey (ACS) 5-Year Estimates. Synthetic populations are virtual representations of people and households produced for small census areas (block groups, tracts) and can be attributed by a variety of demographic, economic, social, worker, student, mobility, housing, health, and communication characteristics found in the ACS.
Specifying a P-MEDM Problem
Synthetic populations are generated by allocating records from the ACS Public Use Microdata Sample (PUMS) from their native spatial resolution of Public-Use Microdata Areas (100,000+ people) to small census areas (typically <8000 people) such that the aggregate characteristics of people and households align closely with population profiles of the small census areas available in the ACS Summary File (SF). This is accomplished using Penalized Maximum-Entropy Dasymetric Modeling (P-MEDM), which seeks to recreate the error variances on each small-area variable estimate in the ACS SF. LiveLike makes it simple to design and solve P-MEDM problems by fetching all of the necessary P-MEDM inputs for a given PUMA via Census APIs.
The bulk of P-MEDM setup is handled automatically by the acs module via the Census Microdata API.
In a basic use-case, inputs are simply:
- The 2010 or 2020 PUMA ID (<State FIPS> + <PUMA FIPS>, as shown here).
- A Census API key (optional).
Examples are provided in the notebooks directory.
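As a minimal sketch of how the PUMA ID is composed, the helper below concatenates a two-digit state FIPS code with a five-digit PUMA code. The function name make_puma_id is hypothetical and not part of LiveLike, and the codes used are purely illustrative.

```python
def make_puma_id(state_fips, puma_code) -> str:
    """Build a 7-character PUMA ID as <State FIPS> + <PUMA FIPS>.

    Hypothetical helper for illustration; not part of the LiveLike API.
    """
    state = str(state_fips).zfill(2)  # state FIPS codes are two digits
    puma = str(puma_code).zfill(5)    # PUMA codes are five digits
    if not (state.isdigit() and puma.isdigit()):
        raise ValueError("FIPS codes must be numeric")
    if len(state) != 2 or len(puma) != 5:
        raise ValueError("expected a 2-digit state code and a 5-digit PUMA code")
    return state + puma


puma_id = make_puma_id("47", "1603")
print(puma_id)  # prints 4701603
```

Zero-padding matters here: a PUMA code stored as an integer loses its leading zero, which would produce an ID of the wrong length.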
Supported Geographies
P-MEDM requires a target geography and an aggregate geography to account for error variances. The selected target geography determines the aggregate geography:
| Level | Code | Population (approx.) | Aggregate |
|---|---|---|---|
| Block group | bg | 600 - 3000 | Tract |
| Tract | trt | 1200 - 8000 | Supertract |
LiveLike handles tracts, which have no sub-county aggregation level, using a regionalization approach to generate custom "supertracts" (see notebooks/tract_supertract_2019.ipynb for an example).
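The target-to-aggregate relationship can be sketched as a simple lookup. The names AGGREGATE_LEVEL and aggregate_for are assumptions made for this illustration and are not LiveLike identifiers; only the bg and trt codes come from the table.

```python
# Target geography code -> aggregate geography, following the table of
# supported geographies. Names here are illustrative, not LiveLike's API.
AGGREGATE_LEVEL = {
    "bg": "tract",        # block groups aggregate to tracts
    "trt": "supertract",  # tracts aggregate to custom supertracts
}


def aggregate_for(target: str) -> str:
    """Return the aggregate geography implied by a target geography code."""
    try:
        return AGGREGATE_LEVEL[target]
    except KeyError:
        raise ValueError(f"unsupported target geography: {target!r}") from None


print(aggregate_for("bg"))
```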
Supported ACS Years
The ACS 5-Year Estimates are a rolling 5% sample of the United States population weighted to be representative of the release year (vintage), with additional adjustments for factors like income. LiveLike uses the ACS 2019 5-Year Estimates as its default vintage.
| Year | Vintage | Available |
|---|---|---|
| 2016 | ACS 2012 - 2016 5-Year Estimates | :white_check_mark: |
| 2017 | ACS 2013 - 2017 5-Year Estimates | :white_check_mark: |
| 2018 | ACS 2014 - 2018 5-Year Estimates | :white_check_mark: |
| 2019 | ACS 2015 - 2019 5-Year Estimates | :white_check_mark: |
| 2020 | ACS 2016 - 2020 5-Year Estimates | :x: |
| 2021 | ACS 2017 - 2021 5-Year Estimates | :x: |
| 2022 | ACS 2018 - 2022 5-Year Estimates | :x: |
| 2023 | ACS 2019 - 2023 5-Year Estimates | :white_check_mark: |
Currently, the years 2016 through 2019 and 2023 are supported. The gap from 2020 to 2022 is due to mixed-geography problems that P-MEDM cannot directly handle: the 2020 and 2021 vintages pair 2010 PUMAs with 2020 small areas, and the 2022 vintage pairs a mixture of 2010 and 2020 PUMAs with 2020 small areas.
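A caller may want to validate a requested year before building a model. The following is a minimal sketch under the availability listed in the table; SUPPORTED_YEARS, check_year, and vintage_label are hypothetical names, not LiveLike functions.

```python
# Years marked as available in the table of supported ACS vintages.
SUPPORTED_YEARS = frozenset({2016, 2017, 2018, 2019, 2023})


def check_year(year: int) -> int:
    """Return the year unchanged if supported, otherwise raise ValueError."""
    if year not in SUPPORTED_YEARS:
        supported = ", ".join(str(y) for y in sorted(SUPPORTED_YEARS))
        raise ValueError(f"ACS year {year} is not supported; choose from {supported}")
    return year


def vintage_label(year: int) -> str:
    """Describe the 5-year window that ends in the given release year."""
    return f"ACS {year - 4} - {year} 5-Year Estimates"


print(vintage_label(check_year(2019)))  # prints ACS 2015 - 2019 5-Year Estimates
```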
P-MEDM Constraints
P-MEDM constraints are sets of residential and population characteristics common between the ACS SF and PUMS that can be used to design a P-MEDM model and attribute the synthetic population. LiveLike provides several configurations of prebuilt constraints:
- Base (default): Baseline modeling constraints representing population totals, routine daily activities (workers, students), and mobility characteristics, available in config.up_base_constraints_selection.
- Expanded: Baseline modeling constraints with a selection of demographic, social, economic, and housing characteristics, available in config.up_expanded_constraints_selection.

The Base constraints can be overwritten by the Expanded ones using:
from config import up_expanded_constraints_selection
acs.puma(..., constraints_selection=up_expanded_constraints_selection)
Several additional constraint themes (health, communications) are available outside the prebuilt configurations and can be added onto a custom constraints selection.
| Theme | Description | Base | Expanded | Notes |
|---|---|---|---|---|
| universe | Sampling universe totals (population, civilian noninstitutionalized population, group quarters population, housing units, occupied housing units). | x | x | |
| worker | Worker characteristics (employment, class of worker, industry, occupation, hours worked per week). | x | x | |
| student | Student characteristics (grade level attending, public/private school). | x | x | |
| mobility | Mobility characteristics (commute time/mode, vehicles available). | x | x | |
| demographic | Basic demographics (sex, age) and living arrangement characteristics. | | x | Expanded: Sex by age and household type only |
| social | Social characteristics (race/ethnicity, language, place of birth, veteran status). | | x | Expanded: Race/ethnicity only |
| economic | Economic characteristics (household income, poverty, educational attainment). | | x | Expanded: Household income and income to poverty ratio only |
| housing | Housing characteristics (tenure, dwelling type, year built, number of rooms, house heating fuel). | | x | Expanded: Dwelling type and year built only |
| health | Health insurance coverage type. | | | |
| communications | Household internet access. | | | |
Custom Constraint Selection
Constraint selections are passed to acs.puma(constraint_selection=...) as a dict with keys representing ACS variable themes and values representing specific subjects (tables). If the value passed is a bool type, a True value will include variables for all subjects in the theme, while a False value will bypass that theme (the same as omitting the theme from the selection). If the value passed is a list type, only listed subjects will be included in the result.
Example:
custom_constraints_selection = {
    "universe": True,
    "worker": True,
    "student": True,
    "mobility": True,
    "demographic": [
        "sex_age",
        "hhtype",
    ],
    "economic": [
        "hhinc",
        "ipr",
    ],
    "health": True,
    "communications": True,
}
- Use all variables listed under the universe, worker, student, mobility, health, and communications themes.
- Use only sex by age (sex_age) and household type (hhtype) from the demographic theme.
- Use only household income (hhinc) and income to poverty ratio (ipr) from the economic theme.
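The bool/list rules described above can be sketched as a small resolver. This is an illustration of the selection semantics only, not LiveLike's implementation: resolve_selection and the catalog dict are hypothetical, and every subject name other than hhinc and ipr is invented for the example.

```python
def resolve_selection(selection, catalog):
    """Expand a constraint selection into {theme: [subjects]}.

    True keeps every subject in the theme, False skips the theme, and a
    list keeps only the listed subjects. Hypothetical helper.
    """
    resolved = {}
    for theme, value in selection.items():
        if theme not in catalog:
            raise KeyError(f"unknown theme: {theme!r}")
        if value is True:
            resolved[theme] = list(catalog[theme])
        elif value is False:
            continue  # same effect as omitting the theme entirely
        else:
            unknown = [s for s in value if s not in catalog[theme]]
            if unknown:
                raise ValueError(f"unknown subjects for {theme!r}: {unknown}")
            resolved[theme] = list(value)
    return resolved


# Illustrative catalog; subject names besides hhinc and ipr are invented.
catalog = {
    "universe": ["population", "housing_units"],
    "economic": ["hhinc", "ipr", "educational_attainment"],
    "health": ["insurance_coverage"],
}
selection = {"universe": True, "economic": ["hhinc", "ipr"], "health": False}
print(resolve_selection(selection, catalog))
```

Raising on unknown themes or subjects is a design choice of this sketch; it surfaces typos early rather than silently dropping a constraint.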
The Constraints File
The constraints file (livelike/data/constraints.csv) underlies the constraint selection process, describing relationships between available PUMS variables, P-MEDM constraints, and ACS Summary File (SF) variables, as well as year of availability for constraints. It is used to generate individual-level representations of ACS SF tables/variables based on PUMS data.
- level: PUMS file level (person or household).
- geo_base_level: Baseline geography for which the constraint is available (bg: block group; trt: tract).
- theme: Constraint topics/themes. Each theme points to a PUMS/SF crosswalking function in livelike.pums.
- subject: The subject of the ACS SF table to be represented at the individual level using PUMS data. This column references the function in the pums module used to produce a P-MEDM constraint.
- constraint: P-MEDM constraining variable name.
- pums[1...n]: Multiple columns listing the PUMS variables associated with each P-MEDM constraint table. These are parsed using a regex search for any columns in the file beginning with pums.
- code: ACS SF variable codes matching each P-MEDM constraint.
- desc: P-MEDM constraining variable longform description.
- begin_year: The initial year in which the constraint was available.
- end_year: The final year in which the constraint was available.
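To illustrate how such a file might be read, the sketch below finds the pums columns with a regex and keeps the constraints available in a given year. It uses only the standard library. The two sample rows are invented for the example and are not taken from livelike/data/constraints.csv, and load_constraints is a hypothetical name.

```python
import csv
import io
import re

# Invented sample rows that follow the column layout described above.
SAMPLE = """level,geo_base_level,theme,subject,constraint,pums1,pums2,code,desc,begin_year,end_year
person,bg,demographic,sex_age,male_under5,SEX,AGEP,B01001003,Male under 5 years,2016,2023
household,bg,economic,hhinc,hhinc_lt10k,HINCP,,B19001002,Household income under 10000,2016,2019
"""


def load_constraints(text, year):
    """Return constraints available in `year` with their PUMS variables."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        return []
    # Any column whose name begins with "pums" holds a PUMS variable name.
    pums_cols = [col for col in rows[0] if re.match(r"pums", col)]
    available = []
    for row in rows:
        if int(row["begin_year"]) <= year <= int(row["end_year"]):
            pums_vars = [row[col] for col in pums_cols if row[col]]
            available.append({"constraint": row["constraint"], "pums": pums_vars})
    return available


for item in load_constraints(SAMPLE, 2023):
    print(item["constraint"], item["pums"])
```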
Census API Key
Using a Census API Key is optional but is recommended to avoid hitting request limits.
- Register for a Census API Key.
- Activate your key via the confirmation email link you receive.
- In the top directory of livelike, run:
echo YOUR_CENSUS_API_KEY > censusapikey.txt
The file that is created, censusapikey.txt, is not tracked by git. This ensures that your personal API key is never exposed on a remote branch.
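For readers who want to reuse the same key file in their own scripts, a minimal sketch of reading it is shown below. The function read_census_api_key is hypothetical and does not describe how LiveLike itself loads the key; it returns None when the file is missing or empty, which matches the key being optional.

```python
from pathlib import Path


def read_census_api_key(path="censusapikey.txt"):
    """Return the API key stored in `path`, or None if it is unavailable."""
    key_file = Path(path)
    if not key_file.is_file():
        return None  # the key is optional, so a missing file is not an error
    key = key_file.read_text(encoding="utf-8").strip()
    return key or None  # treat an empty file like a missing one


key = read_census_api_key()
print("key found" if key else "no key; requests may be rate limited")
```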
Population Synthesis
Utilities for population synthesis can be found in the homesim module. Our current approach is to sample from the P-MEDM allocation matrix ($i...n$ PUMS records by $j...m$ areas) for a given area based on family status/household size, group quarters, and vacant housing, such that the area's total population is approximately preserved.
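The idea of sampling from one column of the allocation matrix can be sketched as follows. This is a deliberately simplified illustration and not the homesim implementation: it draws household records with probability proportional to their allocation weight until the area's expected population is reached, and it ignores the family status, group quarters, and vacant housing strata mentioned above. The function name and inputs are assumptions.

```python
import random


def synthesize_area(weights, hh_sizes, seed=0):
    """Draw PUMS household records for one area.

    weights  : allocation weights of each PUMS record for this area
    hh_sizes : persons in each PUMS household (all must be >= 1)
    Returns (drawn record indices, synthesized population, target population).
    """
    if len(weights) != len(hh_sizes):
        raise ValueError("weights and hh_sizes must have the same length")
    if any(size < 1 for size in hh_sizes):
        raise ValueError("this sketch only handles occupied households")
    rng = random.Random(seed)
    # Expected population of the area implied by the allocation weights.
    target = round(sum(w * s for w, s in zip(weights, hh_sizes)))
    records = list(range(len(weights)))
    drawn, population = [], 0
    while population < target:
        i = rng.choices(records, weights=weights, k=1)[0]
        drawn.append(i)
        population += hh_sizes[i]
    return drawn, population, target


drawn, population, target = synthesize_area([2.5, 0.0, 1.2, 3.3], [1, 4, 2, 3], seed=42)
print(len(drawn), population, target)
```

Because whole households are drawn, the synthesized total can overshoot the target by at most one household's size minus one, which is the sense in which the population is only approximately preserved here.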
Batch operations
The multi module provides utilities for population synthesis across multiple PUMAs, including:
- Making PUMA instances across multiple geographies or replicates (alternative PUMS weights)
- Population synthesis
- Querying and extracting PUMS descriptors from Census Microdata API
Testing
Rebuilding Test Data
The scripts to rebuild test data are stored in the utilities directory. Execute them from the main directory, for example:
python utilities/prep_test_build_puma.py
python utilities/prep_test_notebook_solutions.py
Running Testing Suite Locally
To run the testing suite locally, enter:
bash run_tests.sh
Rough edges
Constraint order matters
The default P-MEDM solver, pymedm, gives different solutions when constraint order varies. This seems to be tied to floating point underflow errors in jax, a core dependency of pymedm, which appear to be triggered by differing positions of the model input variables. LiveLike addresses this for both prebuilt and custom constraints by implementing a method in the puma constructor that consistently sorts constraints by theme and code.
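A deterministic ordering of this kind can be sketched with a two-part sort key. This shows the general technique only; sort_constraints is a hypothetical name, the dict layout is assumed, and the codes are placeholders rather than real ACS SF codes.

```python
def sort_constraints(constraints):
    """Order constraints by theme, then by code, regardless of input order."""
    return sorted(constraints, key=lambda c: (c["theme"], c["code"]))


# Placeholder records: the same constraints supplied in two different orders.
first = [
    {"theme": "worker", "code": "C002"},
    {"theme": "universe", "code": "C010"},
    {"theme": "worker", "code": "C001"},
]
second = list(reversed(first))

# Both inputs produce the same ordering, so the solver sees identical input.
print(sort_constraints(first) == sort_constraints(second))  # prints True
```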
Negative replicate weights
In rare cases, the values of PUMS replicate household weights can be negative. For compatibility with P-MEDM, we zero out these negative values. See this thread for further details.
The P-MEDM population constraint is approximated as a sum of the ratio of each household member's person weight (PWGTP) to the head of household's weight (which itself roughly matches the household weight). When the head of household's replicate person weight is less than one, we use a placeholder value of 1 so that each additional household member still contributes to the population constraint for the household. We welcome community contributions for more robust improvements to this approach.
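The two adjustments described in this section can be sketched in a few lines. Both function names are hypothetical, the head of household is assumed to be the first person in the list, and the weights are made up. This is an illustration of the stated rules, not the code LiveLike uses.

```python
def zero_negative(weights):
    """Replace negative replicate weights with zero."""
    return [max(w, 0) for w in weights]


def household_population(person_weights, head_index=0):
    """Approximate a household's population constraint.

    Sums the ratio of each member's person weight to the head of
    household's weight, using 1 as a placeholder when the head's weight
    is less than one.
    """
    head = person_weights[head_index]
    denominator = head if head >= 1 else 1
    return sum(w / denominator for w in person_weights)


# Ordinary case: three members with similar weights count as roughly three people.
print(household_population([20, 22, 18]))

# Replicate case: the head's weight is negative, so it is zeroed first and
# the placeholder denominator of 1 is used.
print(household_population(zero_negative([-3, 12, 9])))
```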