Basic code to obtain probability distribution functions for EoL using TRI data

TRI4PLADS (FOCAPD SI)

License: MIT

Overview

This repository contains the code to generate discrete distributions based on TRI data, as part of the FOCAPD 2024 Special Issue invitation.
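
To illustrate the core idea, a discrete end-of-life (EoL) distribution can be obtained by normalizing reported quantities per pathway. The category names and values below are hypothetical, not actual TRI output:

```python
# Sketch: build a discrete probability distribution over EoL pathways
# by normalizing reported quantities. Category names and values are
# made up for illustration, not actual TRI output.
eol_quantities = {
    "recycling": 120.0,    # reported quantity per pathway (illustrative)
    "incineration": 45.0,
    "landfill": 35.0,
}

total = sum(eol_quantities.values())
eol_pmf = {pathway: qty / total for pathway, qty in eol_quantities.items()}

print(eol_pmf["recycling"])  # 0.6
```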

Project tree

.
├── ancillary
│   ├── cd_is_to_naics.csv
│   ├── tri_file_1a_columns.txt
│   ├── tri_file_1b_columns.txt
│   ├── tri_file_3a_columns.txt
│   └── tri_file_3c_columns.txt
├── conf
│   └── main.yaml
├── tests
├── data
│   ├── processed
│   │   └── tri_eol_additives.sqlite
│   └── raw
│       ├── US_1a_2022.txt
│       ├── US_1b_2022.txt
│       ├── US_3a_2022.txt
│       └── US_3c_2022.txt
└── src
    ├── __init__.py
    ├── data_processing
    │   ├── __init__.py
    │   ├── create_sqlite_db.py
    │   ├── data_models.py
    │   ├── frs_api_queries.py
    │   ├── base.py
    │   ├── main.py
    │   ├── naics_api_queries.py
    │   ├── cdr
    │   │   ├── __init__.py
    │   │   ├── cleaner.py
    │   │   ├── load.py
    │   │   └── orchestator.py
    │   └── tri
    │       ├── __init__.py
    │       ├── load
    │       │   ├── __init__.py
    │       │   └── load.py
    │       ├── orchestator.py
    │       ├── transform
    │       │   ├── __init__.py
    │       │   ├── base.py
    │       │   ├── file_1a.py
    │       │   ├── file_1b.py
    │       │   ├── file_3a.py
    │       │   └── file_3c.py
    │       └── utils.py
    └── generate_analysis
        ├── __init__.py
        ├── main.py
        ├── db_queries.py
        └── interactive_cli.py

Entity relational diagram (ERD)


Requirements

  1. Python >=3.12, <3.13
  2. Poetry

Poetry

New Dependencies

When adding or updating dependencies, run poetry add or poetry update and commit the changes.

pull

When pulling the latest changes, run the following command to ensure that your local environment matches the project's dependencies.

poetry install

Run Commands

To execute commands inside the project's environment, use poetry run as follows:

poetry run python src/main.py

Additionally, you can activate the virtual environment by running the following command:

poetry shell

Pre-commit

Changes

If there is any change in .pre-commit-config.yaml, run the following command:

poetry run pre-commit autoupdate

Pull

Each time you pull changes, run the following command to ensure your local environment is up-to-date:

poetry run pre-commit install

Manually Run Hooks

To manually run all pre-commit hooks on all files in the repository, use the following command:

poetry run pre-commit run --all-files

Note: this is not required when you commit changes.

If one or more hooks such as black or isort fail while you run the command above or commit your changes, stage their modifications with git add. After that, you can run the commit again.
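
For reference, a minimal .pre-commit-config.yaml with black and isort hooks might look like the fragment below. The pinned rev values are illustrative; check the project's actual file for the versions in use:

```yaml
# Illustrative pre-commit configuration; rev values are examples only.
repos:
  - repo: https://github.com/psf/black
    rev: 24.4.2
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/isort
    rev: 5.13.2
    hooks:
      - id: isort
```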

Installing pyright language server for IDE typecheck highlighting

Detailed instructions: pyright

PyCharm

VSCode: search for Pylance on marketplace

Find the path to the pyright executable:

which pyright-langserver

Insert that path into the plugin configuration in your IDE as the path to the executable.

Documentation Style

The project follows the Google docstring style for documenting the code. The pre-commit hooks are configured to check this style.

Data Source and Processing

Census Bureau Data:

Get your API key at: link

Once you get your API key, include a .env file in the project root with the following:

CENSUS_DATA_API_KEY=<YOUR-CENSUS-DATA-API-KEY>

Replace <YOUR-CENSUS-DATA-API-KEY> with your actual API key.
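
One way to read the key at runtime is sketched below; it assumes the variable has already been loaded into the environment (e.g., via python-dotenv's load_dotenv() or by exporting it in your shell):

```python
import os


def get_census_api_key() -> str:
    """Return the Census API key from the environment.

    Assumes the .env file has already been loaded (for example with
    python-dotenv's load_dotenv()) or the variable is otherwise exported.
    """
    api_key = os.environ.get("CENSUS_DATA_API_KEY")
    if not api_key:
        raise RuntimeError("CENSUS_DATA_API_KEY is not set; check your .env file")
    return api_key
```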

For more information regarding the API data: link

U.S. EPA's Envirofacts

API documentation: link

Running the Data Processing Pipeline

This repository includes a data processing pipeline for handling TRI (Toxics Release Inventory) data, specifically focusing on plastic additives. The pipeline can be executed by specifying the year of data you want to process.

Running the Script

To run the data processing pipeline, navigate to the repository's main directory and execute the following command, replacing <year> with the desired year (e.g., 2022) and <bool> with True/False:

python src/data_processing/main.py --year <year> --is_drop_nan_percentage <bool>

See the help menu:

python src/data_processing/main.py --help
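
A sketch of how such a command-line interface can be parsed with argparse is shown below. The actual main.py may differ; the str2bool helper is illustrative, since argparse has no built-in True/False string parsing:

```python
import argparse


def str2bool(value: str) -> bool:
    """Interpret 'True'/'False'-style strings as booleans (illustrative helper)."""
    if value.lower() in {"true", "1", "yes"}:
        return True
    if value.lower() in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


parser = argparse.ArgumentParser(description="TRI data processing pipeline")
parser.add_argument("--year", type=int, required=True,
                    help="TRI reporting year, e.g. 2022")
parser.add_argument("--is_drop_nan_percentage", type=str2bool, default=False,
                    help="Drop records with missing percentage values")

# Parse a sample invocation instead of sys.argv for demonstration.
args = parser.parse_args(["--year", "2022", "--is_drop_nan_percentage", "True"])
print(args.year, args.is_drop_nan_percentage)  # 2022 True
```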

Changes to the database

If you generate changes to the database schema, create migrations by running:

alembic revision --autogenerate -m "<description-string>"

Then apply the migrations by running:

alembic upgrade head

TODO

TRI data retrieval

The TRI data is retrieved statically, not dynamically. Given the file sizes and for scalability, feel free to automate this process. Suggestions:

  1. Implement TRI data retrieval from EPA's Envirofacts API.
  2. Implement a web-scraping strategy like the one in the EoL4Chem repository.
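
For the first suggestion, Envirofacts exposes a REST interface where the table, column, filter value, row range, and output format are encoded in the URL path. The sketch below only constructs such a URL; the table and column names (TRI_FACILITY, STATE_ABBR) are illustrative and should be verified against EPA's Envirofacts API documentation before use:

```python
# Sketch: build an Envirofacts REST query URL. Table/column names below
# are illustrative; verify them against the Envirofacts API documentation.
BASE_URL = "https://data.epa.gov/efservice"


def build_envirofacts_url(table: str, column: str, value: str,
                          fmt: str = "JSON", rows: str = "0:99") -> str:
    """Assemble a path-style Envirofacts query URL (no request is sent)."""
    return f"{BASE_URL}/{table}/{column}/{value}/rows/{rows}/{fmt}"


url = build_envirofacts_url("TRI_FACILITY", "STATE_ABBR", "TX")
print(url)  # https://data.epa.gov/efservice/TRI_FACILITY/STATE_ABBR/TX/rows/0:99/JSON
```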

Feel free to further modularize the project tree for scalability and maintainability.

SQL database engine

If you modify the database engine (e.g., to PostgreSQL) or its name, include this information in the config file instead of hard-coding it, since that is less error-prone.
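
For example, the database URL could be assembled from values in conf/main.yaml rather than hard-coded. The config keys below (engine, name, host, port) are hypothetical; in practice they would be read from the YAML file:

```python
# Sketch: build a database URL from config values instead of hard-coding it.
# The dict stands in for a parsed conf/main.yaml; the keys are hypothetical.
config = {
    "db": {
        "engine": "postgresql",
        "name": "tri_eol_additives",
        "host": "localhost",
        "port": 5432,
    }
}

db = config["db"]
if db["engine"] == "sqlite":
    db_url = f"sqlite:///data/processed/{db['name']}.sqlite"
else:
    db_url = f"{db['engine']}://{db['host']}:{db['port']}/{db['name']}"

print(db_url)  # postgresql://localhost:5432/tri_eol_additives
```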

Feel free to use asynchronous queries to reduce the processing time.
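
One way to do this with the standard library is to off-load blocking sqlite3 queries to a thread pool and run them concurrently with asyncio, as sketched below; a native async driver such as aiosqlite (or asyncpg for PostgreSQL) would avoid the executor indirection. The table and data are illustrative:

```python
import asyncio
import sqlite3


async def fetch(conn: sqlite3.Connection, query: str):
    """Run a blocking query in the default thread pool and await the rows."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, lambda: conn.execute(query).fetchall())


async def main() -> list:
    # In-memory demo database; the real project uses tri_eol_additives.sqlite.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE releases (pathway TEXT, qty REAL)")
    conn.executemany("INSERT INTO releases VALUES (?, ?)",
                     [("recycling", 120.0), ("landfill", 35.0)])
    # Both queries are awaited concurrently.
    results = await asyncio.gather(
        fetch(conn, "SELECT COUNT(*) FROM releases"),
        fetch(conn, "SELECT SUM(qty) FROM releases"),
    )
    conn.close()
    return results


results = asyncio.run(main())
print(results)  # [[(2,)], [(155.0,)]]
```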

Testing

Feel free to add unit or integration tests for QA; as a suggestion, include them as a hook in the pre-commit file. Only smoke testing was used during the development of this project, and there is no coverage yet.
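
For instance, a minimal unit test for a hypothetical distribution-normalization helper could look like this; under pytest, a function named test_normalize would be collected and run automatically:

```python
# Sketch: a minimal unit test for a hypothetical normalization helper.
def normalize(quantities: dict) -> dict:
    """Normalize quantities into a discrete probability distribution."""
    total = sum(quantities.values())
    return {k: v / total for k, v in quantities.items()}


def test_normalize():
    pmf = normalize({"recycling": 3.0, "landfill": 1.0})
    assert abs(sum(pmf.values()) - 1.0) < 1e-9
    assert pmf["recycling"] == 0.75


test_normalize()  # pytest would discover and run this automatically
```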

Data orchestrator

Feel free to use a data orchestrator such as Airflow or Prefect. This becomes more important if you increase the data volume.

Note

The project structure follows a modular approach to facilitate expansion and maintainability. It also follows the single-responsibility principle and separation of concerns. Keep these principles in mind as part of good practices and clean code.

PyPI

The project was released as a Python package on PyPI.

Disclaimer

The views expressed in this article are those of the authors and do not necessarily represent the views or policies of the U.S. Environmental Protection Agency. Any mention of trade names, products, or services does not imply an endorsement by the U.S. Government or the U.S. Environmental Protection Agency. The U.S. Environmental Protection Agency does not endorse any commercial products, service, or enterprises.
