
Backup utilities.


censusdis/backup

This project is a command-line utility for bulk downloads of U.S. Census data.

Installation

pip install census_backup

Usage Examples

General help and overview

census-backup --help

Download from a group across all available bulk geographies

This is the simplest way to use this tool. It will look for all available geographies for the given dataset and vintage, then download all variables in the specified group for every geography it can.

census-backup -d acs/acs5 -v 2020 -g B02001 -o ~/tmp/backup  --log INFO

The required arguments are:

  • -d: the data set
  • -v: the vintage
  • -g: the variable group

The optional arguments are:

  • -o: output directory. The default is the current working directory.
  • --log: logging level. INFO is useful to see what is happening.

Logging will also help you see which geographies the script was not able to download in bulk. The Census API syntax does not allow all the bulk downloads one might like to do.

Running this command takes about 20 minutes, though the time will vary with the speed of your internet connection and the load on the Census servers.

Data Files

When the command completes, you will find a tree of files under the output root you chose (~/tmp/backup above). It will look something like this:

├── american_indian_area_alaska_native_area_hawaiian_home_land.csv
├── american_indian_area_alaska_native_area_reservation_or_statistical_entity_only.csv
├── american_indian_area_off_reservation_trust_land_only_hawaiian_home_land.csv
├── combined_new_england_city_and_town_area
│   └── state_or_part.csv
├── combined_new_england_city_and_town_area.csv
├── combined_statistical_area.csv
├── division.csv
├── metadata.json
├── metropolitan_statistical_area_micropolitan_statistical_area.csv
├── new_england_city_and_town_area.csv
├── region.csv
├── state.csv
├── state=01
│   ├── american_indian_area_alaska_native_area_hawaiian_home_land_or_part.csv
│   ├── american_indian_area_alaska_native_area_reservation_or_statistical_entity_only_or_part.csv
│   ├── american_indian_area_off_reservation_trust_land_only_hawaiian_home_land_or_part.csv
│   ├── combined_statistical_area_or_part.csv
│   ├── congressional_district.csv
│   ├── county
│   │   ├── county_subdivision.csv
│   │   ├── tract
│   │   │   └── block_group.csv
│   │   └── tract.csv
│   ├── county.csv
│   ├── metropolitan_statistical_area_micropolitan_statistical_area_or_part.csv
│   ├── place.csv
│   ├── public_use_microdata_area.csv
│   ├── school_district_unified.csv
│   ├── state_legislative_district_lower_chamber.csv
│   └── state_legislative_district_upper_chamber.csv
├── state=02
│   ├── alaska_native_regional_corporation.csv
│   ├── american_indian_area_alaska_native_area_hawaiian_home_land_or_part.csv
│   ├── american_indian_area_alaska_native_area_reservation_or_statistical_entity_only_or_part.csv
│   ├── congressional_district.csv
│   ├── county
│   │   ├── county_subdivision.csv
│   │   └── ...
└── ...

At the top level, files like division.csv and region.csv aggregate data at those geographic levels. If we look inside one, we see that the first few columns look something like this:

[Screenshot: division-level data in spreadsheet form.]

This is the data we downloaded at the division level: one row per division and one column per variable. The variables include annotations and margins of error. These are not always populated, and not everyone looks at them, but we include them because this is a backup file.
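If you prefer to inspect the file programmatically, a minimal sketch with pandas (assuming the output root from the example command above) might look like this:

from pathlib import Path

import pandas as pd

# Output root assumes the -o value used in the example command.
backup_root = Path.home() / "tmp" / "backup"

# One row per division; geography identifier columns come first,
# followed by one column per variable in the group.
divisions = pd.read_csv(backup_root / "division.csv")
print(divisions.head())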

Notice also that there are nested files below each state that aggregate the data at different geographic levels. For example, the file ~/tmp/backup/state=01/county/tract.csv has data at the census tract level for the state of Alabama (FIPS code 01). If we look in that file, we see:

[Screenshot: Alabama tract data in spreadsheet form.]

It has the same columns as the other file we looked at, but the rows are aggregated at a much finer level of geography.
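Because the per-state files follow a consistent state=NN layout, it is easy to reassemble them into a single nationwide table. A sketch, again assuming the output root from above:

from pathlib import Path

import pandas as pd

backup_root = Path.home() / "tmp" / "backup"

# The state=NN directories are a hive-style partitioning, so a glob
# can gather the tract-level file from every state.
tract_files = sorted(backup_root.glob("state=*/county/tract.csv"))
tracts = pd.concat(
    (pd.read_csv(f) for f in tract_files), ignore_index=True
)
print(f"{len(tracts):,} tracts from {len(tract_files)} state files")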

Metadata

To learn more about what the columns represent, we can look in the file ~/tmp/backup/metadata.json. This file records metadata about the run: how it was invoked, what the variables are, and how long it took. It gives us a record of exactly when and how the data was backed up from the Census servers, which is good for future reference.

The relevant part of the metadata file for the variables looks like:

  "dataset": "acs/acs5",
  "group": "B02001",
  "vintage": 2020,
  "vatiables": [
    {
      "YEAR": 2020,
      "DATASET": "acs/acs5",
      "GROUP": "B02001",
      "VARIABLE": "B02001_001E",
      "LABEL": "Estimate!!Total:",
      "SUGGESTED_WEIGHT": NaN,
      "VALUES": null
    },
    {
      "YEAR": 2020,
      "DATASET": "acs/acs5",
      "GROUP": "B02001",
      "VARIABLE": "B02001_001EA",
      "LABEL": "Annotation of Estimate!!Total:",
      "SUGGESTED_WEIGHT": NaN,
      "VALUES": null
    },
    {
      "YEAR": 2020,
      "DATASET": "acs/acs5",
      "GROUP": "B02001",
      "VARIABLE": "B02001_001M",
      "LABEL": "Margin of Error!!Total:",
      "SUGGESTED_WEIGHT": NaN,
      "VALUES": null
    },
    {
      "YEAR": 2020,
      "DATASET": "acs/acs5",
      "GROUP": "B02001",
      "VARIABLE": "B02001_001MA",
      "LABEL": "Annotation of Margin of Error!!Total:",
      "SUGGESTED_WEIGHT": NaN,
      "VALUES": null
    },
    {
      "YEAR": 2020,
      "DATASET": "acs/acs5",
      "GROUP": "B02001",
      "VARIABLE": "B02001_002E",
      "LABEL": "Estimate!!Total:!!White alone",
      "SUGGESTED_WEIGHT": NaN,
      "VALUES": null
    },
    ...

For each variable, we have its name, label, and other attributes.

The metadata file has other relevant information like the version of census-backup that created it, the time it started and ended—that's how we knew it took about 20 minutes—and the command-line arguments that were used.
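To pull the variable labels back out of the metadata programmatically, a sketch like the following should work. Note that Python's json module tolerates the bare NaN values shown in the excerpt above:

import json
from pathlib import Path

backup_root = Path.home() / "tmp" / "backup"

with open(backup_root / "metadata.json") as f:
    metadata = json.load(f)  # json.load accepts bare NaN values

# Map each variable name to its human-readable label. The key names
# here follow the excerpt above.
labels = {v["VARIABLE"]: v["LABEL"] for v in metadata["variables"]}
print(labels["B02001_001E"])  # Estimate!!Total: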

Download multiple groups

Sometimes we want to download several groups of variables at once. To do so, all we have to do is give multiple values to the -g argument. For example,

census-backup -d dec/pl -v 2020 -g H1 P1 P2 P3 P4 P5 -G +block -o ~/tmp/decpl2020 --log INFO

This downloads six different groups from the decennial public law (P.L. 94-171) redistricting data set that is used to apportion congressional seats among the states. This ends up being a lot of columns. This command also used -G +block to download data at the block level, which is the smallest geographic level at which any Census data sets are aggregated. dec/pl is one of the few data sets available at this fine a geography.
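Block-level data is voluminous, so a quick way to gauge what came down is to walk the output tree and count rows. A sketch, assuming the -o directory from the command above:

from pathlib import Path

out_root = Path.home() / "tmp" / "decpl2020"

# Tally data rows per CSV (subtracting the header line) to see how
# much was downloaded at each geography.
for csv_path in sorted(out_root.rglob("*.csv")):
    with open(csv_path) as f:
        n_rows = sum(1 for _ in f) - 1
    print(f"{csv_path.relative_to(out_root)}: {n_rows:,} rows")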

Download geometries that have state as a component

For backup purposes, a command like the one we just saw is the most complete way of getting as much data as possible from a given dataset and group of variables.

But sometimes we really only care about a specific set of geography levels. This example will download at the [state], [state, county], [state, county, tract], etc. levels.

census-backup -d acs/acs5 -v 2020 -g B02001 -G state -o ~/tmp/backup-states-and-below  --log INFO

There will be fewer output data files than before, but the command will run more quickly.

Download state aggregated data only

This will not get geographies within the state. It will only get data aggregated at the state level. The + prefix says that state must be the last component of the geography, so it will not match [state, county] like it would without the +.

census-backup -d acs/acs5 -v 2020 -g B02001 -G +state -o ~/tmp/backup-states  --log INFO

Download county aggregated data within states only

This will only get data aggregated at the [state, county] level. The + prefix says that county must be the last component of the geography, so it will not match finer geographies like [state, county, tract].

census-backup -d acs/acs5 -v 2020 -g B02001 -G state +county -o ~/tmp/backup-state-counties --log INFO

Data sets and groups

Everything above assumed you were already familiar with the U.S. Census data model and knew what data set, group, and vintage you wanted. If you are looking for what data sets are available, you can find an up-to-date list of them in the censusdis repository in datasets.py. For example, the American Community Survey (ACS) 5-year data set we used in the examples above is listed in that file as

ACS5 = "acs/acs5"

Alternatively, you can consult the demo notebook Querying Available Data Sets.ipynb in the censusdis repo for more information about how to find data sets and groups of variables that may be of interest to you.
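If you would rather explore groups programmatically, the Census API itself publishes a groups listing for each dataset and vintage. A minimal sketch using that public endpoint (the URL pattern belongs to the Census Bureau's API, not to census-backup):

import json
from urllib.request import urlopen

# Public Census API endpoint listing every variable group for a
# given dataset and vintage; no API key is needed for this listing.
url = "https://api.census.gov/data/2020/acs/acs5/groups.json"

with urlopen(url) as response:
    groups = json.load(response)["groups"]

for group in groups[:5]:
    print(group["name"], "-", group["description"])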

More Help

There are additional arguments not discussed above. For a summary, please run

census-backup --help

If you have additional questions, feel free to open a discussion in this repository. If you find a bug or have a feature request, please open an issue.
