Helper functions for Census 2020 data

Every decade the US Census Bureau releases data from its decennial census. However, the files it provides are quite complicated, and while the Bureau offers support for SAS and R, it offers no help for Python.

This package provides some convenience functions for playing around with all of this Census data in Python.

Requirements

We require Python 3.7.1 or above. This package uses pyarrow to make manipulating these large data sets easier, and on some systems pyarrow can be troublesome to install. If you run into installation problems, feel free to file an issue!

To install the package, simply run

pip install census2020

Usage

Getting the data

To use this package, you should first download the Census data. We've included a simple CLI for you to grab all of the data and preprocess it:

census2020 pull-all --output data

Here data is the folder into which all the processed data will be written. WARNING: The processed data totals about 1.4 GB.

If for some reason the CLI doesn't work, you can pull the data by hand as follows:

from pathlib import Path

import pyarrow.parquet as pq
import us

from census2020 import downloader

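output_dir = Path("data")
# Create the output folder if it does not exist yet.
output_dir.mkdir(parents=True, exist_ok=True)

# All states plus DC, sorted by abbreviation (State objects may not be orderable).
for state in sorted(set(us.STATES) | {us.states.DC}, key=lambda s: s.abbr):
    print(f"Downloading {state.name}...")
    # Fetch the state's data and write it out as a parquet file.
    table = downloader.get_state(state.abbr)
    pq.write_table(table, str(output_dir / f"{state.abbr.lower()}.parquet"))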
    print(f"Done with {state.name}")
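A full manual download can be interrupted partway through, so it may help to check which state files are already on disk before re-running the loop. The following is a minimal sketch, assuming the file naming used above (lowercase abbreviation plus .parquet); the helper name missing_states is hypothetical and not part of census2020.

```python
from pathlib import Path
from typing import Iterable, List


def missing_states(output_dir: Path, abbrs: Iterable[str]) -> List[str]:
    """Return the abbreviations that have no non-empty parquet file yet.

    Hypothetical helper: assumes files are named "<abbr lowercased>.parquet",
    as in the manual download loop.
    """
    missing = []
    for abbr in abbrs:
        path = Path(output_dir) / f"{abbr.lower()}.parquet"
        # Treat an absent or zero-byte file as not yet downloaded.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(abbr)
    return sorted(missing)
```

In the loop above, you could then skip any state whose abbreviation is not in missing_states(output_dir, [s.abbr for s in ...]). Note that a non-empty file is not proof of a complete download, so this is only a rough check.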

Historical data

There's a good chance you're downloading this data to compare it to historical data. If so, you can download the PL94 data from the 2010 Census similarly to above. Just run

census2020 pull-all --output data2010 --year 2010

Or, if the CLI doesn't work for you, change the download line in the loop above to read:

table = downloader.get_state(state.abbr, year=2010)
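Once both years are on disk, a comparison usually comes down to joining the two tables on a geographic identifier. The sketch below uses plain pandas and makes two assumptions that you should check against the codebook and your own data: that both tables carry an identifier column (here called "GEOID", a hypothetical name) and that the field code you compare (here P0010001) means the same thing in both years. Geographies such as tracts can be redrawn between censuses, so the merge keeps unmatched rows and flags them rather than silently dropping them.

```python
import pandas as pd


def population_change(
    df_2010: pd.DataFrame,
    df_2020: pd.DataFrame,
    key: str = "GEOID",  # assumed identifier column; adjust to your data
    column: str = "P0010001",  # field to compare; verify it in the codebook
) -> pd.DataFrame:
    """Join two census tables on `key` and compute the change in `column`."""
    merged = df_2010[[key, column]].merge(
        df_2020[[key, column]],
        on=key,
        how="outer",  # keep geographies that exist in only one year
        suffixes=("_2010", "_2020"),
        indicator=True,  # adds a "_merge" column: left_only/right_only/both
    )
    # Rows present in only one year get a missing (NaN) change.
    merged["change"] = merged[f"{column}_2020"] - merged[f"{column}_2010"]
    return merged
```

Rows whose "_merge" value is not "both" are the ones to inspect by hand, since their geography did not match across the two years.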

Reading the data

Reading all of the data into memory at once can be difficult, so we provide some interfaces to pyarrow's filtering features to help you load only what you need.

For example, suppose you wanted the total population of people who identify as both White and Asian in all Census Tracts in Kentucky, Indiana, and Ohio. Assuming you have downloaded all the data, you can run the following code:

from census2020 import readers
from census2020.constants import SummaryLevel

df = readers.read_filtered_dataset(
    "data",
    states=["KY", "IN", "OH"],
    levels=SummaryLevel.STATE_COUNTY_TRACT,
    columns="P0010013",
).to_pandas()

Here "data" is the location to which you downloaded the Census data, which can be either the 2020 or 2010 data.

Each of states, columns, and levels can be either a single value or a list of values. If an argument is omitted, no filtering is applied for it, so all available states, columns, or levels are returned.
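To make these argument rules concrete, here is a toy model of the filtering semantics over plain dictionaries. It is only an illustration of the behavior described above, not how census2020 or pyarrow implement it; the function filter_rows and the row keys "state" and "level" are hypothetical, and levels are shown as plain strings rather than SummaryLevel members.

```python
from typing import Any, Dict, Iterable, List, Optional, Set, Union

Value = Union[str, Iterable[str], None]


def _as_set(value: Value) -> Optional[Set[str]]:
    """Normalize None, a single string, or an iterable of strings."""
    if value is None:
        return None  # None means "do not filter on this argument"
    if isinstance(value, str):
        return {value}
    return set(value)


def filter_rows(
    rows: Iterable[Dict[str, Any]],
    states: Value = None,
    levels: Value = None,
    columns: Value = None,
) -> List[Dict[str, Any]]:
    """Toy filter: keep matching rows, then keep only the requested columns."""
    state_set, level_set = _as_set(states), _as_set(levels)
    column_set = _as_set(columns)
    result = []
    for row in rows:
        if state_set is not None and row["state"] not in state_set:
            continue
        if level_set is not None and row["level"] not in level_set:
            continue
        if column_set is None:
            result.append(dict(row))  # no column filter: keep everything
        else:
            result.append({k: v for k, v in row.items() if k in column_set})
    return result
```

Calling filter_rows(rows) with no other arguments returns every row unchanged, while filter_rows(rows, states="KY") and filter_rows(rows, states=["KY"]) behave identically.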

Codebook

More detail on the information in these files is available from the Census Bureau. In particular, a summary of the field meanings is available in this Excel file, reproduced in this repo as field_names.xlsx.

License

MIT
