Tools for generating Parquet files from Census 2020
census-parquet
Python tools for creating and maintaining Parquet files from US 2020 Census Data.
Install Dependencies
These tools rely on a few external dependencies.
To use the data download shell scripts, install wget and lftp.
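As a quick sanity check before running the download scripts, the sketch below looks for wget and lftp on the PATH. It is not part of census-parquet; the helper name missing_tools is hypothetical and only the Python standard library is used.

```python
import shutil

# Command-line tools the download shell scripts depend on (per the README).
REQUIRED_TOOLS = ["wget", "lftp"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("wget and lftp are both available.")
```

If anything is reported missing, install it with your system package manager before continuing.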
For the Python scripts, the following dependencies should be installed:
Usage
Run the scripts in the following order.
First, run the three shell scripts, which download all the data the Python scripts need:
download_boundaries.sh
- This script downloads the Census Boundary data needed to run boundary_processing.py
download_population_stats.sh
- This script downloads population stat data needed for process_blocks.py
download_blocks.sh
- This script downloads the Census Block data needed to run process_blocks.py
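The three download steps above could be driven from Python roughly as follows. This is a minimal sketch, not part of the package: it assumes the scripts sit in a single directory and can be run with bash, and the helper names download_commands and run_downloads are hypothetical.

```python
import subprocess
from pathlib import Path

# Download scripts in the order the README lists them.
DOWNLOAD_SCRIPTS = [
    "download_boundaries.sh",
    "download_population_stats.sh",
    "download_blocks.sh",
]


def download_commands(script_dir="."):
    """Build one `bash <script>` command per download script, in order."""
    return [["bash", str(Path(script_dir) / name)] for name in DOWNLOAD_SCRIPTS]


def run_downloads(script_dir="."):
    """Run each download script in turn, stopping at the first failure."""
    for command in download_commands(script_dir):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError if a script exits non-zero.
        subprocess.run(command, check=True)
```

Calling run_downloads() from the directory that holds the scripts runs them one after another. Stopping on the first failure avoids processing a partial download later.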
After running the shell scripts, run the Python scripts:
boundary_processing.py
- This script processes the Census Boundary data and creates parquet files. The parquet files will be output into a boundary_outputs folder.
process_blocks.py
- This script processes Census Block data and creates parquet files. The final combined parquet file will have the name tl_2020_FULL_tabblock20.parquet.
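After the Python scripts finish, one way to confirm that output was written is to list the Parquet files on disk. The sketch below is an illustration only: the helper list_parquet_outputs is hypothetical, and it assumes the boundary_outputs folder is created relative to the current working directory. It searches recursively because the folder layout is not documented here.

```python
from pathlib import Path


def list_parquet_outputs(folder="boundary_outputs"):
    """Return sorted paths of all .parquet files under `folder`.

    Returns an empty list if the folder does not exist.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.parquet"))


if __name__ == "__main__":
    outputs = list_parquet_outputs()
    if not outputs:
        print("No parquet files found under boundary_outputs.")
    for path in outputs:
        # Print the size as a rough check that the file is not empty.
        print(f"{path} ({path.stat().st_size} bytes)")
```

The same helper can be pointed at whichever directory holds tl_2020_FULL_tabblock20.parquet by passing that directory as the folder argument.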