A python tool to process the Censored QCEW for PR
Project description
QCEW Data Processing Tool
This tool is part of a collaboration between the University of Puerto Rico, Mayaguez, and Puerto Rico's Planning Board. Its main objective is to convert raw QCEW data into a DuckDB database with a geographic (geom) column that stores the coordinates of businesses.
Overview
The script takes raw Quarterly Census Employment and Wages (QCEW) data from the data/raw directory, processes it, and stores it in a DuckDB database. The raw data should be organized in subfolders by year (e.g., data/raw/2002, data/raw/2003), with each year folder containing the data for each quarter. The resulting database will be available in the data directory with a .ddb extension.
This tool also incorporates geospatial data, using latitude and longitude values to create a point geometry (geom) for each business, allowing for geographic queries.
Requirements
To run this tool, you will need the following Python packages:
ibisduckdbpolarsgeopandastqdmrequestsjsonloggingpandas
You can install the necessary dependencies using:
pip install -r requirements.txt
Or utilize the uv to ensure compatibility
uv sync
File Structure
The data should be organized in the following structure:
data/
├── raw/
│ ├── 2002/
│ ├── 2003/
│ └── ...
├── processed/
├── external/
│ └── decode.json
└── data.ddb
data/raw/: This directory contains the raw QCEW data, organized by year and quarter.data/processed/: This directory is for storing processed data.data/external/: This directory contains external files, includingdecode.json, which is required for decoding the raw data files.data.ddb: The output DuckDB database containing the processed data.
How It Works
-
Initialization: The script checks for necessary directories (raw, processed, external) and creates them if they don't exist. It also downloads external files, such as
decode.json, if not already present. This file holds the Census codification of the data. -
Data Processing:
- The tool reads raw data files, cleans them, and extracts relevant fields based on predefined column widths defined in
decode.json. - The cleaned data includes geographic coordinates (latitude and longitude), which are then transformed into a
geomcolumn of typePoint. - This processed data is inserted into a DuckDB database.
- The tool reads raw data files, cleans them, and extracts relevant fields based on predefined column widths defined in
-
Group and Aggregate Data:
- The data is grouped by
NAICScode (4-digit), year, and quarter, aggregating information such as total wages and total employment. - Additional calculations are performed for contributions to the social security, Medicare, and other funds.
- The data is grouped by
-
Joining with External Data:
- The tool also allows for joining the QCEW data with external data (e.g.,
hactable) based onNAICScodes, facilitating further analysis.
- The tool also allows for joining the QCEW data with external data (e.g.,
Key Functions
make_qcew_dataset: Processes all the raw QCEW data and inserts it into the DuckDB database.clean_txt: Cleans and formats the raw text data, extracting relevant fields and generating geographic information.group_by_naics_code: Groups data byNAICScode and aggregates the total wages and employment.unique_naics_code: Joins the grouped QCEW data with external data based on theNAICScode.pull_file: Downloads external files from a given URL (e.g.,decode.json).
Usage
- Organize your raw QCEW data by year and quarter in the
data/raw/folder. - Ensure that
decode.jsonis in thedata/external/folder. - Run the script to process the data:
python main.py
Logging
The script logs key events and warnings to a file called data_process.log. This includes information about successfully processed files, warnings for empty files, and other runtime details.
License
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for more details.
Contributing
Contributions to this tool are welcome. Please fork the repository and submit a pull request with any improvements or bug fixes.
If you have any questions or need further assistance, feel free to reach out!
Cite
@software{ouslan2026jpqcew,
author = {Ouslan, Alejandro},
title = {JP-QCEW},
month = jan,
year = 2026,
publisher = {Zenodo},
version = {3.0.1},
doi = {10.5281/zenodo.18121581},
url = {https://doi.org/10.5281/zenodo.18121581}
}
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file jp_qcew-3.2.1.tar.gz.
File metadata
- Download URL: jp_qcew-3.2.1.tar.gz
- Upload date:
- Size: 153.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Rocky Linux","version":"9.7","id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8c81e1b91315e5fd1552cf059f1f10b44955d0d0c4c884591176fe39c02a6974
|
|
| MD5 |
216b3b4232492da8181ee2deb6af1231
|
|
| BLAKE2b-256 |
ae1c1ed41cc416d798fe702de7440dc3c69124955dcda82e34815c3d782e2015
|
File details
Details for the file jp_qcew-3.2.1-py3-none-any.whl.
File metadata
- Download URL: jp_qcew-3.2.1-py3-none-any.whl
- Upload date:
- Size: 22.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Rocky Linux","version":"9.7","id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
da2376944dbadf810230b016972caacd67a714f9e617a6279dc559a9cace32cc
|
|
| MD5 |
2e80aa095acee59821ed3e43c5541f64
|
|
| BLAKE2b-256 |
be9ec333079e1510b732401cc1fa1efe164898b7fd2871182a53071d1f58cf4e
|