Python library and CLI for loading and serving LCA data sources

LCA Data Provider - Database & Infrastructure

This module manages the storage, persistence, and lifecycle of the databases used by the centralized Life Cycle Assessment (LCA) Emission Factors library.

The architecture implements a dynamic multi-tenant pattern: the library connects on demand to multiple isolated databases, without modifying or rebuilding the core source code.

⚠️ CRITICAL WARNING: PRODUCTION ENVIRONMENT ONLY ⚠️

Currently, the ONLY deployed environment is Production.

Therefore, the database and Redis credentials available in IdeCloud belong strictly to the production environment. Any action executed locally using these credentials (such as running Alembic migrations or executing data ingestions) will interact directly with live data and will immediately affect all applications and microservices consuming this library.

If you need a safe testing, staging, or local development environment to run tests or new ingestions, do not use the IdeCloud credentials. Instead, please contact the DevOps or IT team to request a dedicated sandbox environment.

If you have any questions about the library, please feel free to contact sergio.perezmartin@idener.ai.

Supported Databases & Versions

The following table tracks the Life Cycle Assessment (LCA) databases currently supported by the ingestion pipeline and their actively managed versions.

Provider / Database	Supported Versions	Description
BAFU	`2025`	Swiss Federal Office for the Environment (FOEN) emission factors.

Querying Emission Factors (Service Layer)

External microservices consume this library by importing the EmissionFactorService singleton. The service implements a cache-first strategy: Redis is always tried first. On a cache miss (e.g. after a Redis container restart), all records for that provider and version are fetched from PostgreSQL in a single query, the full cache is re-warmed automatically, and the requested data is returned transparently to the caller.

Installation

Add the library (published as lca-data-provider) as a dependency of your microservice and import from it directly:

from lca_provider import emission_factor_service, EmissionFactorCategory, ProviderEmissionFactorDTO

Available Methods

`get_by_category` — Query a single category

dtos: list[ProviderEmissionFactorDTO] = await emission_factor_service.get_by_category(
    provider="bafu",
    version="2025",
    category=EmissionFactorCategory.material,
    name="concrete",       # optional: case-insensitive substring filter
    geography="CH",        # optional: exact match filter
)

`get_all` — Query all categories at once

dtos: list[ProviderEmissionFactorDTO] = await emission_factor_service.get_all(
    provider="bafu",
    version="2025",
    name="concrete",       # optional
    geography="CH",        # optional
)
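
The two optional filters can be pictured with a small in-memory sketch. This is not the library's code: `FactorRecord` and `filter_factors` are hypothetical names, and the only behaviour taken from the comments above is that `name` is a case-insensitive substring match and `geography` is an exact match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FactorRecord:
    """Hypothetical stand-in for ProviderEmissionFactorDTO (fields assumed)."""
    name: str
    geography: str


def filter_factors(
    records: list[FactorRecord],
    name: Optional[str] = None,
    geography: Optional[str] = None,
) -> list[FactorRecord]:
    """Apply the optional filters; None means the filter is not applied."""
    result = records
    if name is not None:
        needle = name.lower()
        # Case-insensitive substring match on the factor name.
        result = [r for r in result if needle in r.name.lower()]
    if geography is not None:
        # Exact match: "ch" does not match "CH".
        result = [r for r in result if r.geography == geography]
    return result


records = [
    FactorRecord("Concrete, normal", "CH"),
    FactorRecord("Reinforced CONCRETE", "DE"),
    FactorRecord("Steel, low-alloyed", "CH"),
]
matches = filter_factors(records, name="concrete", geography="CH")
```

Here `matches` holds only the Swiss concrete record, because the German record fails the geography filter and the steel record fails the name filter.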

`get_by_external_id` — Lookup a single record by UUID

Always queries PostgreSQL directly — does not use the Redis cache.

import uuid

dto: ProviderEmissionFactorDTO | None = await emission_factor_service.get_by_external_id(
    provider="bafu",
    version="2025",
    external_id=uuid.UUID("..."),
)

Standard Categories (`EmissionFactorCategory`)

Value	Description
`material`	Raw and processed materials
`auxiliar_material`	Auxiliary/secondary materials
`energy_consumption`	Energy sources and consumption
`water_consumption`	Water usage
`air_emissions`	Direct air emissions
`direct_emissions`	Other direct emissions
`transport`	Transportation and logistics
`consumable`	Consumable items
`waste_treatment`	Waste processing and disposal
`manufacturing`	Manufacturing processes
`other`	Uncategorised factors

Redis Cache Architecture (Pre-warming)

To keep read latency low for consuming applications, the ETL pipeline pre-warms the cache. After a successful ingestion into PostgreSQL, the data is automatically mapped to standard DTOs and pushed to Redis under two key types:

Key Nomenclature:

ef:<provider_name>:<version>:category:<standard_category> — per-category key

ef:<provider_name>:<version>:all — flat list of all factors across all categories

Examples:

ef:bafu:2025:category:material
ef:bafu:2025:category:energy_consumption
ef:bafu:2025:all
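
The key nomenclature can be expressed as two small helpers. The function names `category_key` and `all_key` are hypothetical and only illustrate the format described above.

```python
def category_key(provider: str, version: str, category: str) -> str:
    """Build the per-category key: ef:<provider>:<version>:category:<category>."""
    return f"ef:{provider}:{version}:category:{category}"


def all_key(provider: str, version: str) -> str:
    """Build the key holding the flat list of all factors."""
    return f"ef:{provider}:{version}:all"


material_key = category_key("bafu", "2025", "material")
flat_key = all_key("bafu", "2025")
```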

Payload Structure: The value stored under each key is a stringified JSON array of ProviderEmissionFactorDTO objects.

Fallback behaviour: If Redis is unavailable or empty (e.g. after a container restart), the service fetches all records for the requested provider and version from PostgreSQL in a single query, re-warms all category keys and the all key, and returns the data to the caller. This is transparent to the consuming microservice.
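
The fallback can be sketched as follows. This is a simplified model under stated assumptions, not the service's implementation: a plain `dict` stands in for Redis, the hypothetical `load_all` callable stands in for the single PostgreSQL query, and rows are plain dicts with an assumed `category` field.

```python
import json
from collections import defaultdict
from typing import Callable

Row = dict  # hypothetical row shape: at least {"name": ..., "category": ...}


def get_by_category(
    cache: dict[str, str],
    load_all: Callable[[str, str], list[Row]],
    provider: str,
    version: str,
    category: str,
) -> list[Row]:
    key = f"ef:{provider}:{version}:category:{category}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: stored value is a JSON array

    # Cache miss: one full read, then re-warm every key for this provider/version.
    rows = load_all(provider, version)
    by_category: dict[str, list[Row]] = defaultdict(list)
    for row in rows:
        by_category[row["category"]].append(row)
    for cat, cat_rows in by_category.items():
        cache[f"ef:{provider}:{version}:category:{cat}"] = json.dumps(cat_rows)
    cache[f"ef:{provider}:{version}:all"] = json.dumps(rows)
    return by_category.get(category, [])


calls: list[tuple[str, str]] = []


def load_all(provider: str, version: str) -> list[Row]:
    calls.append((provider, version))  # record how often the "database" is hit
    return [
        {"name": "Concrete", "category": "material"},
        {"name": "Diesel", "category": "transport"},
    ]


cache: dict[str, str] = {}
first = get_by_category(cache, load_all, "bafu", "2025", "material")    # miss, re-warms
second = get_by_category(cache, load_all, "bafu", "2025", "transport")  # hit
```

In this sketch the second call is served from the cache, so `load_all` runs only once. A category with no rows is never written to the cache here and would miss on every call; whether the real service handles that case differently is not stated above.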

Data Ingestion Command (Admin Only)

The ingestion CLI is an administrative tool intended exclusively for data engineers working directly in this repository. It is not exposed as an installable script in the distributed package, so there is no lca-cli command available in consuming microservices. However, the lca_provider.cli.main module is still part of the distributed package and can technically be invoked via python -m lca_provider.cli.main by anyone with access to the package and the required credentials. There is no code-level restriction — the protection is purely operational.

To run an ingestion, execute from the repository root with the virtual environment active:

python -m lca_provider.cli.main ingest <source> <filename> --version <supported_version> --mode <clean|upsert> [--log-level <DEBUG|INFO|WARNING|ERROR>]

Parameters

source: target source/database identifier (example: bafu).
filename: input file name located in lca_provider/ingestors/resources.
--version (required): dataset version to ingest (must match a supported version for the provider, example: 2025).
--mode (optional, default: clean):
- clean: deletes existing rows for that version, then inserts all rows from the file.
- upsert: updates existing rows by provider_code and inserts missing ones.
--log-level (optional, default: INFO): controls runtime verbosity.
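
The argument contract above can be mirrored with a short `argparse` sketch. The real CLI may be built with a different library; `build_parser` is a hypothetical name, and only the arguments, choices, and defaults are taken from the parameter list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="python -m lca_provider.cli.main")
    subcommands = parser.add_subparsers(dest="command", required=True)

    ingest = subcommands.add_parser("ingest")
    ingest.add_argument("source")    # e.g. bafu
    ingest.add_argument("filename")  # file inside lca_provider/ingestors/resources
    ingest.add_argument("--version", required=True)
    ingest.add_argument("--mode", choices=["clean", "upsert"], default="clean")
    ingest.add_argument(
        "--log-level",
        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
        default="INFO",
    )
    return parser


# Omitting --mode and --log-level falls back to the documented defaults.
args = build_parser().parse_args(
    ["ingest", "bafu", "BAFU_2025.xlsx", "--version", "2025"]
)
```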

File Location

Place ingestion files in:

lca_provider/ingestors/resources/

BAFU example:

python -m lca_provider.cli.main ingest bafu BAFU_2025.xlsx --version 2025 --mode clean
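
The difference between `--mode clean` (used in the example above) and `--mode upsert` can be sketched on an in-memory table keyed by `provider_code`. The function names and the row fields are hypothetical; the real ingestion works against PostgreSQL.

```python
import copy

Row = dict  # hypothetical: {"provider_code": str, "version": str, "gwp_value": float}


def ingest_clean(table: dict[str, Row], rows: list[Row], version: str) -> None:
    """Delete existing rows for the version, then insert every row from the file."""
    for code in [c for c, r in table.items() if r["version"] == version]:
        del table[code]
    for row in rows:
        table[row["provider_code"]] = row


def ingest_upsert(table: dict[str, Row], rows: list[Row]) -> None:
    """Update rows whose provider_code already exists and insert the rest."""
    for row in rows:
        table[row["provider_code"]] = row


existing = {
    "a": {"provider_code": "a", "version": "2025", "gwp_value": 1.0},
    "b": {"provider_code": "b", "version": "2025", "gwp_value": 2.0},
}
incoming = [
    {"provider_code": "a", "version": "2025", "gwp_value": 1.5},
    {"provider_code": "c", "version": "2025", "gwp_value": 3.0},
]

cleaned = copy.deepcopy(existing)
ingest_clean(cleaned, incoming, version="2025")  # row "b" disappears

upserted = copy.deepcopy(existing)
ingest_upsert(upserted, incoming)                # row "b" is kept
```

In both modes row `a` ends with the new value; the modes differ only in what happens to rows that are absent from the file.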

Failed Rows Report

If invalid rows are detected, a report is written to the same resources folder with a file name in this format:

bafu_ingestion_failures_YYYYMMDD_HHMMSS.txt

The report includes Excel row number, source product, and failure reason for each skipped row.
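
A minimal sketch of how such a report name and line could be produced. The helper names, the use of the source name as the file prefix, and the tab-separated line layout are assumptions; only the timestamp pattern and the three reported fields come from the description above.

```python
from datetime import datetime


def failure_report_name(source: str, now: datetime) -> str:
    """Format <source>_ingestion_failures_YYYYMMDD_HHMMSS.txt."""
    return f"{source}_ingestion_failures_{now:%Y%m%d_%H%M%S}.txt"


def format_failure(excel_row: int, product: str, reason: str) -> str:
    """One report line per skipped row (layout is hypothetical)."""
    return f"row {excel_row}\t{product}\t{reason}"


report_name = failure_report_name("bafu", datetime(2025, 1, 2, 3, 4, 5))
report_line = format_failure(17, "Concrete, normal", "missing unit")
```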

Database Migration Workflow (Alembic)

This project uses Alembic with dynamic database selection. The target database is selected at runtime with the flag -x db=<database_name>.

Prerequisites

Ensure the following environment variables are defined before running Alembic:

LCA_DATA_PROVIDER_DB_HOST=<host>
LCA_DATA_PROVIDER_DB_USER=<user>
LCA_DATA_PROVIDER_DB_PASSWORD=<password>
LCA_DATA_PROVIDER_DB_PORT=<port>
LCA_DATA_PROVIDER_REDIS_HOST=<host>
LCA_DATA_PROVIDER_REDIS_PORT=<port>
LCA_DATA_PROVIDER_REDIS_USER=<user>
LCA_DATA_PROVIDER_REDIS_PASSWORD=<password>
LCA_DATA_PROVIDER_REDIS_DB=<db>

Even though Redis is not directly used by Alembic migrations, Redis settings are still required because they are part of the validated application configuration schema.
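
A small pre-flight check can catch a missing variable before Alembic starts. This is an illustrative sketch, not part of the library: `missing_settings` is a hypothetical helper, and it assumes every variable listed above is mandatory with no default.

```python
from typing import Mapping

REQUIRED_SETTINGS = [
    "LCA_DATA_PROVIDER_DB_HOST",
    "LCA_DATA_PROVIDER_DB_USER",
    "LCA_DATA_PROVIDER_DB_PASSWORD",
    "LCA_DATA_PROVIDER_DB_PORT",
    "LCA_DATA_PROVIDER_REDIS_HOST",
    "LCA_DATA_PROVIDER_REDIS_PORT",
    "LCA_DATA_PROVIDER_REDIS_USER",
    "LCA_DATA_PROVIDER_REDIS_PASSWORD",
    "LCA_DATA_PROVIDER_REDIS_DB",
]


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """Return the required variables that are unset or empty in env."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


# Only the database settings are present, so the Redis ones are reported.
partial_env = {name: "x" for name in REQUIRED_SETTINGS if "_DB_" in name}
missing = missing_settings(partial_env)
```

In a shell session the same function can be called with `os.environ` as the argument.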

1. Configure allowed sources in `.env`

Before generating or applying migrations, the target database must be included in LCA_DATA_PROVIDER_ALLOWED_SOURCES.

Single source example:

LCA_DATA_PROVIDER_ALLOWED_SOURCES=bafu

Multiple sources example (comma-separated):

LCA_DATA_PROVIDER_ALLOWED_SOURCES=bafu,ecoinvent,my_other_source

Notes:

Values are split by commas and trimmed.
Source names are normalized to lowercase.
The db value passed through -x db=... is not normalized, so use lowercase names to avoid mismatches.
If all is present in the allowed sources list, any target database is accepted.
If the selected db is not in the allowed sources list (and all is not present), Alembic execution fails.
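
The rules in these notes can be sketched as two functions. The names are hypothetical and the project's actual `env.py` may differ; in Alembic, values passed with `-x` are typically read through `context.get_x_argument(as_dictionary=True)`, which is replaced here by a plain argument so the sketch runs on its own. Dropping empty entries is an assumption.

```python
from typing import Optional


def parse_allowed_sources(raw: str) -> set[str]:
    """Split on commas, trim whitespace, and lowercase each source name."""
    return {part.strip().lower() for part in raw.split(",") if part.strip()}


def check_target(db: Optional[str], allowed: set[str]) -> str:
    """Validate the -x db=<name> value against the allowed sources."""
    if not db:
        raise ValueError("no target database: pass -x db=<database_name>")
    # The db value is compared as given (not lowercased), as the notes state.
    if "all" in allowed or db in allowed:
        return db
    raise ValueError(f"database {db!r} is not in the allowed sources")


allowed = parse_allowed_sources(" BAFU, ecoinvent ,my_other_source")
target = check_target("bafu", allowed)

try:
    check_target("BAFU", allowed)  # uppercase value does not match
    uppercase_rejected = False
except ValueError:
    uppercase_rejected = True
```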

2. Create a new migration (autogenerate)

Run:

alembic -x db=bafu revision --autogenerate -m "your_migration_name"

Where:

-x db=bafu indicates which database/source you are targeting.
--autogenerate compares the SQLAlchemy models against the current database schema to create migration operations.
-m "your_migration_name" sets the migration message.

3. Apply the migration

Run:

alembic -x db=bafu upgrade head

This upgrades the selected database to the latest migration (head).

4. Recommended execution sequence

alembic -x db=<source_name> revision --autogenerate -m "<migration_name>"
alembic -x db=<source_name> upgrade head

Example:

alembic -x db=bafu revision --autogenerate -m "add_new_column_to_emission_factors"
alembic -x db=bafu upgrade head

5. Common validation checks

Confirm .env contains the target source in LCA_DATA_PROVIDER_ALLOWED_SOURCES.
Confirm the db value passed in -x db=<...> matches the intended database name.
Review generated migration scripts before running upgrade head.
Do not omit -x db=<source_name>; migrations fail immediately if no target database is provided.

Extending the Ingestion Pipeline (Adding a New Provider)

The ingestion pipeline is designed to be extended with new data sources without modifying any existing code. It uses two patterns: a Template Method (BaseIngester) that enforces a fixed ETL sequence, and a Factory (IngesterFactory) that maps provider names to their ingester classes via a decorator.
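
How the two patterns fit together can be sketched in a self-contained form. `IngesterRegistry`, `SketchIngester`, and the `execute` method are hypothetical stand-ins for `IngesterFactory` and `BaseIngester`; the real classes also load into PostgreSQL, which is replaced here by an in-memory list.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import AsyncGenerator, Callable


class IngesterRegistry:
    """Hypothetical stand-in for IngesterFactory."""
    _registry: dict[str, type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        def decorator(ingester_cls: type) -> type:
            cls._registry[name] = ingester_cls  # map provider name to class
            return ingester_cls
        return decorator

    @classmethod
    def create(cls, name: str, **kwargs):
        if name not in cls._registry:
            raise ValueError(f"unknown provider: {name}")
        return cls._registry[name](**kwargs)


class SketchIngester(ABC):
    """Hypothetical stand-in for BaseIngester."""

    def __init__(self, version: str) -> None:
        self.version = version
        self.loaded: list[dict] = []

    async def execute(self) -> int:
        """Template method: the sequence is fixed, the steps are overridable."""
        async for batch in self._extract_data():
            self.loaded.extend(self._transform_batch(batch))  # "load" step
        return len(self.loaded)

    @abstractmethod
    def _extract_data(self) -> AsyncGenerator[list[dict], None]: ...

    @abstractmethod
    def _transform_batch(self, raw_batch: list[dict]) -> list[dict]: ...


@IngesterRegistry.register("demo")
class DemoIngester(SketchIngester):
    async def _extract_data(self):
        yield [{"name": " Concrete "}, {"name": "Steel"}]

    def _transform_batch(self, raw_batch):
        return [{"name": r["name"].strip(), "version": self.version} for r in raw_batch]


ingester = IngesterRegistry.create("demo", version="2025")
imported = asyncio.run(ingester.execute())
```

Adding a provider in this sketch means adding one decorated subclass; neither the registry nor the base class changes.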

The following steps describe how to add a new provider from scratch.

Step 1 — Create the provider directory

lca_provider/ingestors/providers/<provider_name>/
    <provider_name>_ingester.py
    <provider_name>_mapping.py

Replace <provider_name> with a lowercase identifier (e.g. ecoinvent). This name must exactly match the value used in the CLI and in LCA_DATA_PROVIDER_ALLOWED_SOURCES.

Step 2 — Define the category mapping

Create <provider_name>_mapping.py with a CATEGORY_MAPPING dictionary that translates the provider's raw category strings to the 11 standard EmissionFactorCategory values. Keys must be lowercase.

from lca_provider.dtos.emission_factor_dto import EmissionFactorCategory

CATEGORY_MAPPING: dict[str, EmissionFactorCategory] = {
    "electricity": EmissionFactorCategory.energy_consumption,
    "transport, freight": EmissionFactorCategory.transport,
    "municipal waste": EmissionFactorCategory.waste_treatment,
    # ... map every raw category from the source
}

Any raw category not present in the mapping will automatically fall back to EmissionFactorCategory.other.
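
The lookup and its fallback can be sketched as below. `Category` is a hypothetical three-member subset of `EmissionFactorCategory`, and `map_category` is an illustrative helper; trimming and lowercasing the raw value before the lookup is an assumption suggested by the lowercase-keys rule, not confirmed library behaviour.

```python
from enum import Enum


class Category(str, Enum):
    """Hypothetical subset of EmissionFactorCategory."""
    energy_consumption = "energy_consumption"
    transport = "transport"
    other = "other"


CATEGORY_MAPPING: dict[str, Category] = {
    "electricity": Category.energy_consumption,
    "transport, freight": Category.transport,
}


def map_category(raw: str) -> Category:
    """Translate a raw provider category; unknown values fall back to other."""
    return CATEGORY_MAPPING.get(raw.strip().lower(), Category.other)


known = map_category("Electricity")
unknown = map_category("Packaging film")
```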

Step 3 — Implement the ingester

Create <provider_name>_ingester.py extending BaseIngester. Register it with the factory using the @IngesterFactory.register decorator. Only two methods are mandatory:

_extract_data() — async generator that yields raw data in batches (list of dicts). Batch size of 1000 rows is recommended.
_transform_batch() — receives one batch and returns a list of EmissionFactorEntity objects.

from typing import Any, AsyncGenerator, List
from lca_provider.db.models import EmissionFactorEntity
from lca_provider.ingestors.base_ingester import BaseIngester
from lca_provider.ingestors.ingester_factory import IngesterFactory

@IngesterFactory.register("ecoinvent")
class EcoinventIngester(BaseIngester):

    async def _extract_data(self) -> AsyncGenerator[List[dict], None]:
        # Read the source file in chunks and yield each chunk as a list of dicts.
        # The source can be CSV, Excel, JSON, XML, etc.
        batch: List[dict] = []  # placeholder: fill with the parsed rows of one chunk
        ...
        yield batch

    def _transform_batch(self, raw_batch: List[dict]) -> List[EmissionFactorEntity]:
        entities = []
        for row in raw_batch:
            entity = EmissionFactorEntity(
                name=...,
                version=self.version,
                raw_category=...,
                subcategories=[],
                unit=...,
                gwp_value=...,
                additional_impacts={},
                extra_data={},
            )
            entities.append(entity)
        return entities
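
One possible shape for the batching inside `_extract_data()` is sketched below as a standalone async generator. The CSV source, the helper name `extract_in_batches`, and the tiny batch size in the demo are assumptions for illustration; only the idea of yielding lists of dicts in batches (1000 rows recommended) comes from the text above.

```python
import asyncio
import csv
import io
from typing import AsyncGenerator, Iterable

BATCH_SIZE = 1000  # recommended batch size


async def extract_in_batches(
    rows: Iterable[dict], batch_size: int = BATCH_SIZE
) -> AsyncGenerator[list[dict], None]:
    """Group rows into lists of at most batch_size and yield each list."""
    batch: list[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


async def _demo() -> list[int]:
    source = io.StringIO("name,unit\nConcrete,kg\nSteel,kg\nDiesel,l\n")
    reader = csv.DictReader(source)
    return [len(b) async for b in extract_in_batches(reader, batch_size=2)]


batch_sizes = asyncio.run(_demo())
```

With three rows and a batch size of two, the generator yields one full batch and one remainder batch.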

Optional overrides:

_generate_provider_code(entity) — override if the default SHA-256 hash of name|raw_category|unit|version is not unique enough for the source (e.g. BAFU adds geography to the hash).
_after_execute(imported_rows, elapsed_seconds) — post-ingestion hook for cache lifecycle, failure reports, or any other side effects. See BafuIngester._after_execute() for the reference implementation including Redis invalidation and pre-warming.
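
The default provider code and a geography-aware override can be sketched with `hashlib`. The function names, the UTF-8 encoding, the hex output, and the position of `geography` in the hashed string are assumptions; the text above only states that the default is a SHA-256 hash of name|raw_category|unit|version and that BAFU adds geography.

```python
import hashlib


def default_provider_code(name: str, raw_category: str, unit: str, version: str) -> str:
    """SHA-256 over name|raw_category|unit|version (encoding assumed)."""
    key = "|".join([name, raw_category, unit, version])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def provider_code_with_geography(
    name: str, raw_category: str, unit: str, version: str, geography: str
) -> str:
    """Variant that also hashes geography (field position assumed)."""
    key = "|".join([name, raw_category, unit, version, geography])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# Two rows that differ only in geography collide under the default code...
same_a = default_provider_code("Concrete", "building materials", "kg", "2025")
same_b = default_provider_code("Concrete", "building materials", "kg", "2025")

# ...but get distinct codes once geography is part of the hash.
code_ch = provider_code_with_geography("Concrete", "building materials", "kg", "2025", "CH")
code_de = provider_code_with_geography("Concrete", "building materials", "kg", "2025", "DE")
```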

Step 4 — Add the provider to allowed sources

Add the new provider name to LCA_DATA_PROVIDER_ALLOWED_SOURCES in .env:

LCA_DATA_PROVIDER_ALLOWED_SOURCES=bafu,ecoinvent

Step 5 — Create and apply the database migration

Each provider has its own isolated PostgreSQL database. Run Alembic targeting the new provider name to create its schema:

alembic -x db=ecoinvent revision --autogenerate -m "initial_emission_factors_table"
alembic -x db=ecoinvent upgrade head

Step 6 — Place the source file and run the ingestion

Place the source data file in lca_provider/ingestors/resources/ and run:

python -m lca_provider.cli.main ingest ecoinvent <filename> --version <version> --mode clean

Summary checklist

Step	Action
1	Create `lca_provider/ingestors/providers/<name>/` directory
2	Define `CATEGORY_MAPPING` in `<name>_mapping.py`
3	Implement `<name>_ingester.py` extending `BaseIngester`, register with `@IngesterFactory.register("<name>")`
4	Add `<name>` to `LCA_DATA_PROVIDER_ALLOWED_SOURCES` in `.env`
5	Run Alembic migrations targeting the new database
6	Place source file in `resources/` and execute the ingestion CLI

Project details

Version 0.1.0, released Jul 3, 2026.
