Some helper functions for working with Census 2020 data

Helper functions for Census 2020 data

Every decade the US Census Bureau releases data from its decennial census. However, the files it provides are quite complicated, and while the Bureau offers help for SAS and R users, it offers none for Python.

This package provides some convenience functions for playing around with all of this Census data in Python.

Requirements

We require Python 3.7.1 or above. This package uses pyarrow to make manipulating these large data sets easier, and pyarrow can be troublesome to install on some systems. If you run into installation problems, feel free to file an issue!

To install the package, run:

pip install census2020
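Before installing, it can help to confirm that the interpreter meets the stated minimum and to see whether pyarrow is already importable. The following is a minimal sketch using only the standard library; the helper names meets_minimum and is_installed are illustrative and not part of census2020.

```python
import importlib.util
import sys

MINIMUM_PYTHON = (3, 7, 1)  # the minimum version stated in the requirements


def meets_minimum(version=None, minimum=MINIMUM_PYTHON):
    """Return True if `version` (default: the running interpreter) is new enough."""
    if version is None:
        version = sys.version_info[:3]
    return tuple(version) >= tuple(minimum)


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Python OK:", meets_minimum())
    print("pyarrow found:", is_installed("pyarrow"))
```

If pyarrow is not found, pip will try to install it as a dependency, which is the step where platform-specific problems tend to show up.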

Usage

Getting the data

To use this package, you should first download the Census data. We've included a simple CLI for you to grab all of the data and preprocess it:

census2020 pull-all --output data

Here data is the folder into which all the processed data will be written. WARNING: The processed data totals about 1.4GB.
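If you would rather trigger the CLI from a Python script, one option is to call it through subprocess. This is a sketch, not something the package provides: build_pull_command and pull_all are hypothetical helper names, and the only flags used are --output and the --year flag described under Historical data.

```python
import subprocess


def build_pull_command(output_dir, year=None):
    """Build the argument list for `census2020 pull-all`."""
    command = ["census2020", "pull-all", "--output", str(output_dir)]
    # Only pass --year when asked, so the CLI's own default applies otherwise.
    if year is not None:
        command += ["--year", str(year)]
    return command


def pull_all(output_dir, year=None):
    """Run the CLI; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_pull_command(output_dir, year), check=True)
```

Calling pull_all("data") then mirrors the command above, and pull_all("data2010", year=2010) mirrors the historical variant.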

If for some reason the CLI doesn't work, you can pull the data by hand as follows:

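from pathlib import Path

import pyarrow.parquet as pq
import us

from census2020 import downloader

output_dir = Path("data")
# Create the output folder if it does not exist yet
output_dir.mkdir(parents=True, exist_ok=True)

# Every state plus DC, sorted by name so progress is easy to follow
for state in sorted(set(us.STATES) | {us.states.DC}, key=lambda s: s.name):
    print(f"Downloading {state.name}...")
    # Fetch the state's data and write it to <abbr>.parquet
    table = downloader.get_state(state.abbr)
    pq.write_table(table, output_dir / f"{state.abbr.lower()}.parquet")
    print(f"Done with {state.name}")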
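Once a download finishes, a quick check of the output folder can confirm that nothing was skipped and show how much disk space was used. The sketch below uses only the standard library. It assumes one file per state named after the lowercase abbreviation, as in the loop above; whether the CLI lays out its output the same way is an assumption. summarize_downloads is a hypothetical helper name.

```python
from pathlib import Path


def summarize_downloads(output_dir, expected_abbrs=()):
    """Summarize the .parquet files in `output_dir`.

    Returns (file_count, total_bytes, missing), where `missing` lists the
    expected abbreviations that have no matching `<abbr>.parquet` file.
    """
    files = sorted(Path(output_dir).glob("*.parquet"))
    total_bytes = sum(f.stat().st_size for f in files)
    present = {f.stem.lower() for f in files}
    missing = sorted(a for a in expected_abbrs if a.lower() not in present)
    return len(files), total_bytes, missing


if __name__ == "__main__":
    count, size, missing = summarize_downloads("data", ["KY", "IN", "OH"])
    print(f"{count} files, {size / 1e9:.2f} GB, missing: {missing}")
```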

Historical data

There's a good chance you're downloading this data to compare it with historical data. If so, you can download the PL 94-171 data from the 2010 Census in the same way as above. Just run:

census2020 pull-all --output data2010 --year 2010

Or, if the CLI doesn't work for you, change the download call in the loop above to read:

table = downloader.get_state(state.abbr, year=2010)
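With both years on disk, a typical next step is to compute how a count changed between them. The following is a minimal sketch in plain Python, independent of the package: percent_change is a hypothetical helper, the dictionary keys stand in for whatever geographic identifier you use, and the numbers are invented for illustration. Identifiers for small geographies such as tracts can change between censuses, so only keys present in both years are compared.

```python
def percent_change(old_counts, new_counts):
    """Percent change per key, for keys present in both mappings.

    Keys whose old count is zero are skipped because the change is undefined.
    """
    changes = {}
    for key in old_counts.keys() & new_counts.keys():
        old = old_counts[key]
        if old == 0:
            continue
        changes[key] = 100.0 * (new_counts[key] - old) / old
    return changes


# Made-up numbers for illustration only; these are not Census figures.
counts_2010 = {"area-a": 200, "area-b": 50, "area-c": 0}
counts_2020 = {"area-a": 250, "area-b": 40, "area-d": 10}

print(percent_change(counts_2010, counts_2020))
```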

Reading the data

Reading all of the data into memory at once can be difficult, so we provide some interfaces to pyarrow's filtering features to help.

For example, suppose you wanted the total population of people who identify as both White and Asian in all Census Tracts in Kentucky, Indiana, and Ohio. Assuming you have downloaded all the data, you can run the following code:

from census2020 import readers
from census2020.constants import SummaryLevel

df = readers.read_filtered_dataset(
    "data",
    states=["KY", "IN", "OH"],
    levels=SummaryLevel.STATE_COUNTY_TRACT,
    columns="P0010013",
).to_pandas()

Here "data" is the location to which you downloaded the Census data, which can be either the 2020 or 2010 data.

Each of states, columns, and levels can be either a single value or a list of values. If an argument is omitted, all available states, columns, or levels are returned.
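As an illustration of passing lists, the sketch below requests two columns at once and derives a share from them. It rests on a few assumptions: P0010001 is the total-population field in the same table, the returned frame exposes each requested column under its field name, and read_tract_columns and share are hypothetical helpers rather than package functions. The package imports are deferred so the helper module can be loaded without census2020 installed.

```python
def read_tract_columns(data_dir, states, columns):
    """Read tract-level rows for several states and columns at once."""
    # Imported lazily so this sketch loads even without the package installed.
    from census2020 import readers
    from census2020.constants import SummaryLevel

    return readers.read_filtered_dataset(
        data_dir,
        states=list(states),
        levels=SummaryLevel.STATE_COUNTY_TRACT,
        columns=list(columns),
    ).to_pandas()


def share(part, total):
    """Return part / total, or None when total is zero (an empty tract)."""
    return part / total if total else None


# Example usage, assuming the data has been downloaded to "data":
# df = read_tract_columns("data", ["KY", "IN", "OH"], ["P0010001", "P0010013"])
# df["share"] = [share(p, t) for p, t in zip(df["P0010013"], df["P0010001"])]
```

Guarding against a zero total matters because some tracts have no residents.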

Codebook

More detail on the information in these files is available from the Census Bureau. In particular, a summary of the field meanings is available in this Excel file, reproduced in this repo as field_names.xlsx.

License

MIT

This version: 0.1.0 (Aug 23, 2021)


File details

Details for the file census2020-0.1.0.tar.gz.

File metadata

Download URL: census2020-0.1.0.tar.gz
Upload date: Aug 23, 2021
Size: 8.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.4 CPython/3.8.6 Darwin/20.6.0

File hashes

Hashes for census2020-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`4123bc3ca9ae8626083875b1b0d6aa27e430ff397eb6fb26eca08a6185ed960e`
MD5	`fd7bba5f79916ead9885b44544c2c08a`
BLAKE2b-256	`b3048561568b81795c74de967a0aabbacfd4ff228c153bb0db3caae903ad85e6`


File details

Details for the file census2020-0.1.0-py3-none-any.whl.

File metadata

Download URL: census2020-0.1.0-py3-none-any.whl
Upload date: Aug 23, 2021
Size: 9.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.4 CPython/3.8.6 Darwin/20.6.0

File hashes

Hashes for census2020-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5e3cb9e207f6b72afc997a936529948fa07d8dad82dfe2380b3f4e0558f4c08e`
MD5	`9a42262521227b710bd63f1df494557f`
BLAKE2b-256	`f48fba2f2142061b725376ccba8c1c7150beaef27ea6a9773cd81ae475b54e59`

