
Tools for generating Parquet files from Census 2020

census-parquet

Python tools for creating and maintaining Parquet files from US 2020 Census Data.

Install Dependencies

These tools rely on several external dependencies.

To run the data download shell scripts, install wget and lftp.

The Python scripts require the following packages:

  1. dask
  2. dask_geopandas
  3. geopandas
  4. numpy
  5. openpyxl
  6. pandas
  7. pyarrow
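
Before running anything, it can help to confirm that the dependencies listed above are actually available. The following is a minimal sketch, not part of the package: the helper names `missing_modules` and `missing_tools` are hypothetical, and it assumes each package's import name matches the name in the list.

```python
import importlib.util
import shutil

# Import names taken from the dependency list above (assumed to match).
PYTHON_MODULES = [
    "dask",
    "dask_geopandas",
    "geopandas",
    "numpy",
    "openpyxl",
    "pandas",
    "pyarrow",
]

# Command-line tools needed by the download shell scripts.
CLI_TOOLS = ["wget", "lftp"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_tools(names):
    """Return the command-line tools that are not on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    print("Missing Python modules:", missing_modules(PYTHON_MODULES) or "none")
    print("Missing command-line tools:", missing_tools(CLI_TOOLS) or "none")
```

The check only looks the modules up and does not import them, so it stays fast even for heavy packages such as geopandas.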

Usage

Run the scripts in the following order.

First, run the three shell scripts, which download all the data the Python scripts need:

  1. download_boundaries.sh - Downloads the Census Boundary data needed to run boundary_processing.py.
  2. download_population_stats.sh - Downloads the population statistics data needed to run process_blocks.py.
  3. download_blocks.sh - Downloads the Census Block data needed to run process_blocks.py.
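
The download order above could be driven from Python, for example as follows. This is a sketch under stated assumptions: the scripts are assumed to sit in a single directory and to be runnable with `bash`, and the helpers `build_commands` and `run_downloads` are hypothetical names, not part of census-parquet.

```python
import subprocess
from pathlib import Path

# Download scripts in the order given above.
DOWNLOAD_SCRIPTS = [
    "download_boundaries.sh",
    "download_population_stats.sh",
    "download_blocks.sh",
]


def build_commands(script_dir="."):
    """Build one command per download script, preserving the listed order."""
    # Assumption: the scripts can be invoked through bash.
    return [["bash", str(Path(script_dir) / name)] for name in DOWNLOAD_SCRIPTS]


def run_downloads(script_dir="."):
    """Run each download script in turn, stopping at the first failure."""
    for command in build_commands(script_dir):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, check=True)
```

Calling `run_downloads()` from the directory that contains the scripts runs them one after another. Stopping at the first failure avoids processing a partial download later on.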

After the shell scripts finish, run the Python scripts:

  1. boundary_processing.py - Processes the Census Boundary data and creates Parquet files, which are written to a boundary_outputs folder.
  2. process_blocks.py - Processes the Census Block data and creates Parquet files. The final combined Parquet file is named tl_2020_FULL_tabblock20.parquet.
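
Once both Python scripts have run, a quick check can confirm that the outputs named above exist. The sketch below assumes the boundary_outputs folder and tl_2020_FULL_tabblock20.parquet are created relative to the directory the scripts were run from, which the description does not state. The helper `summarize_outputs` is a hypothetical name.

```python
from pathlib import Path

# Output names taken from the script descriptions above.
BOUNDARY_DIR = "boundary_outputs"
FINAL_BLOCK_FILE = "tl_2020_FULL_tabblock20.parquet"


def summarize_outputs(base_dir="."):
    """Report which expected outputs are present under base_dir."""
    base = Path(base_dir)
    boundary_dir = base / BOUNDARY_DIR

    boundary_files = []
    if boundary_dir.is_dir():
        # Collect every path ending in .parquet, relative to the output folder.
        boundary_files = sorted(
            p.relative_to(boundary_dir).as_posix()
            for p in boundary_dir.rglob("*.parquet")
        )

    return {
        "boundary_files": boundary_files,
        # exists() covers both a single file and a directory-style dataset.
        "final_block_file_exists": (base / FINAL_BLOCK_FILE).exists(),
    }


if __name__ == "__main__":
    print(summarize_outputs())
```

An empty `boundary_files` list or a `False` flag suggests that the corresponding script has not run yet, or that it wrote its output somewhere else.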

Download files

Source Distribution

census-parquet-0.0.1.tar.gz (4.2 kB)

Uploaded Source

Built Distribution

census_parquet-0.0.1-py3-none-any.whl (5.7 kB)

Uploaded Python 3
