Basic code to obtain probability distribution functions for EoL using TRI data
Project description
TRI4PLADS (FOCAPD SI)
Overview
This repository contains the code to generate discrete probability distributions of end-of-life (EoL) pathways based on TRI data, as part of the FOCAPD 2024 Special Issue invitation.
Project tree
.
├── ancillary
│   ├── cd_is_to_naics.csv
│   ├── tri_file_1a_columns.txt
│   ├── tri_file_1b_columns.txt
│   ├── tri_file_3a_columns.txt
│   └── tri_file_3c_columns.txt
├── conf
│   └── main.yaml
├── tests
├── data
│   ├── processed
│   │   └── tri_eol_additives.sqlite
│   └── raw
│       ├── US_1a_2022.txt
│       ├── US_1b_2022.txt
│       ├── US_3a_2022.txt
│       └── US_3c_2022.txt
└── src
    ├── __init__.py
    ├── data_processing
    │   ├── __init__.py
    │   ├── create_sqlite_db.py
    │   ├── data_models.py
    │   ├── frs_api_queries.py
    │   ├── base.py
    │   ├── main.py
    │   ├── naics_api_queries.py
    │   ├── cdr
    │   │   ├── __init__.py
    │   │   ├── cleaner.py
    │   │   ├── load.py
    │   │   └── orchestator.py
    │   └── tri
    │       ├── __init__.py
    │       ├── load
    │       │   ├── __init__.py
    │       │   └── load.py
    │       ├── orchestator.py
    │       ├── transform
    │       │   ├── __init__.py
    │       │   ├── base.py
    │       │   ├── file_1a.py
    │       │   ├── file_1b.py
    │       │   ├── file_3a.py
    │       │   └── file_3c.py
    │       └── utils.py
    └── generate_analysis
        ├── __init__.py
        ├── main.py
        ├── db_queries.py
        └── interactive_cli.py
Entity relationship diagram (ERD)
Requirements
- Python >=3.12, <3.13
- Poetry
Poetry
New Dependencies
When adding or updating dependencies, run poetry add or poetry update and commit the changes.
Pull
When pulling the latest changes, run the following command to ensure that your local environment matches the project's dependencies.
poetry install
Run Commands
To execute commands inside the project's environment, use poetry run as follows:
poetry run python src/main.py
Additionally, you can activate the virtual environment by running the following command:
poetry shell
Pre-commit
Changes
If there is any change to .pre-commit-config.yaml, run the following command:
poetry run pre-commit autoupdate
Pull
Each time you pull changes, run the following command to ensure your local environment is up-to-date:
poetry run pre-commit install
Manually Run Hooks
To manually run all pre-commit hooks on all files in the repository, use the following command:
poetry run pre-commit run --all-files
Note: this is not required when you commit changes.
If one or more hooks such as black or isort fail when you run the above command or commit your changes, the hooks may have modified files. Stage those modifications with git add, then commit again.
Installing pyright language server for IDE typecheck highlighting
Detailed instructions: pyright
VSCode: search for Pylance on marketplace
Add the path to the executable to the plugin:
which pyright-langserver
Insert that path into the plugin config in your IDE as the path to the executable.
Documentation Style
The project follows the Google style to document the code. The pre-commit hooks are configured to check this style.
Data Source and Processing
Census Bureau Data:
Get your API key at: link
Once you get your API key, include a .env file in the project root with the following:
CENSUS_DATA_API_KEY=<YOUR-CENSUS-DATA-API-KEY>
Replace <YOUR-CENSUS-DATA-API-KEY> with your actual API key.
For more information regarding the API data: link
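Once the .env file is in place, the key can be read at runtime. A minimal stdlib-only sketch is shown below; the load_env helper is illustrative (the python-dotenv package provides the same functionality):

```python
# Sketch: load KEY=VALUE pairs from a .env file into os.environ.
# The helper name and simplified parsing are illustrative only.
import os
from pathlib import Path


def load_env(path: str = ".env") -> None:
    """Load KEY=VALUE pairs from a .env file into os.environ (if the file exists)."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        # Skip blank lines and comments; keep only KEY=VALUE entries
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())


load_env()
api_key = os.environ.get("CENSUS_DATA_API_KEY")
```

This keeps the key out of the codebase and out of version control (remember to add .env to .gitignore).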
U.S. EPA's Envirofacts
API documentation: link
Running the Data Processing Pipeline
This repository includes a data processing pipeline for handling TRI (Toxics Release Inventory) data, specifically focusing on plastic additives. The pipeline can be executed by specifying the year of data you want to process.
Running the Script
To run the data processing pipeline, navigate to the repository's main directory and execute the following command, replacing <year> with the desired year (e.g., 2022) and <bool> with True/False:
python src/data_processing/main.py --year <year> --is_drop_nan_percentage <bool>
See the help menu:
python src/data_processing/main.py --help
Changes to the database
If you generate changes to the database schema, create migrations by running:
alembic revision --autogenerate -m "<description-string>"
Then apply the migrations by running:
alembic upgrade head
Data Use
Installation
If you only want to use the data and take advantage of the existing code, you can install tri4plads:
pip install tri4plads
Ensure you have Python >=3.12, <3.13.
Example
TODO
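Until this section is filled in, here is a hypothetical sketch of the kind of discrete EoL distribution the project produces. The pathway names and counts below are invented for illustration and do not come from TRI data:

```python
from collections import Counter

# Hypothetical EoL management records for one plastic additive (illustrative only)
records = [
    "recycling", "landfill", "incineration",
    "recycling", "recycling", "landfill",
]

# Count occurrences per EoL pathway, then normalize into probabilities
counts = Counter(records)
total = sum(counts.values())
eol_distribution = {pathway: n / total for pathway, n in counts.items()}

print(eol_distribution)  # ≈ {'recycling': 0.50, 'landfill': 0.33, 'incineration': 0.17}
```

The actual pipeline derives such counts from the processed TRI tables in the SQLite database rather than from an in-memory list.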
TRI data retrieval
The TRI data files are static snapshots, not dynamically retrieved. Given the file sizes and for better scalability, feel free to automate this process. Suggestions:
- Implement TRI data retrieval from EPA's Envirofacts API.
- Implement a web scraping strategy like the one in the EoL4Chem repository.
Feel free to further modularize the project tree for scalability and maintainability.
SQL database engine
If you change the database engine (e.g., to PostgreSQL) or its name, include this information in the config file instead of hard-coding it, since that is less error prone.
Feel free to use asynchronous queries to reduce the processing time.
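As a sketch, the engine and database name could live in conf/main.yaml. The keys below are illustrative, not the project's actual configuration schema:

```yaml
# Hypothetical database section for conf/main.yaml (key names are illustrative)
database:
  engine: sqlite            # e.g., sqlite or postgresql
  name: tri_eol_additives.sqlite
```

The code would then build the SQLAlchemy connection string from these values instead of a hard-coded literal.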
Testing
Feel free to use unit or integration tests for QA. As a suggestion, include them as a hook in the pre-commit file. Only smoke testing was used during the development of this project, and there is no test coverage yet.
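For example, a local hook in .pre-commit-config.yaml could run the test suite on each commit. This is a sketch; the hook id and entry are illustrative:

```yaml
# Hypothetical local hook running pytest via Poetry
- repo: local
  hooks:
    - id: pytest
      name: pytest
      entry: poetry run pytest
      language: system
      pass_filenames: false
```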
Data orchestrator
Feel free to use a data orchestrator like Airflow or Prefect. This becomes more important as the data volume grows.
Note
The project structure follows a modular approach to facilitate expansion and maintainability. In addition, it follows the single responsibility principle and separation of concerns. Keep these principles as part of good practices and clean code.
PyPI
The project was released as a Python package on PyPI.
Disclaimer
The views expressed in this article are those of the authors and do not necessarily represent the views or policies of the U.S. Environmental Protection Agency. Any mention of trade names, products, or services does not imply an endorsement by the U.S. Government or the U.S. Environmental Protection Agency. The U.S. Environmental Protection Agency does not endorse any commercial products, service, or enterprises.
File details
Details for the file tri4plads-0.1.1.tar.gz.
File metadata
- Download URL: tri4plads-0.1.1.tar.gz
- Upload date:
- Size: 13.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.12.7 Linux/6.5.0-1025-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ecb0b71c3c26ad906978b94572c5c44ae314850eed7e1d1dba693578b2bce8b1 |
| MD5 | e10e739ef04036781ca23e11017f9969 |
| BLAKE2b-256 | 5e9472341fc39855611787f7ff2f36c3ac8b11589f773a8b460073207fd94f2c |
File details
Details for the file tri4plads-0.1.1-py3-none-any.whl.
File metadata
- Download URL: tri4plads-0.1.1-py3-none-any.whl
- Upload date:
- Size: 11.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.12.7 Linux/6.5.0-1025-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4ccd60e6a8473d5c992bfd7b0952e769918761f706e04f3cb57aa799f7e3c383 |
| MD5 | addf7921aa79e4a6ef36505bad8ff6ba |
| BLAKE2b-256 | 79a4a410edc3b2825640d25a5143c12b2864c62817c042cba42bbcd03a5793bd |