
YAML-driven ETL mappers for ForSITE soil database imports.

Project description

SOIL - Structured Observation Ingestion Library

YAML-driven ETL mapper package for importing ForSITE soil datasets into the ForSITE PostGIS database model.

This package is being split out of forsite-soil-db-interface so the mapper layer can be versioned, tested, and published independently on PyPI.

Acknowledgements

Project Partners

This project was developed through joint funding provided by Agriculture and Agri-Food Canada (AAFC) under the CSBO program and Natural Resources Canada (NRCan) for the ForSITE-Soil degradation project.

The code developed as part of this project includes contributions from:

Target Public API

Basic Usage of the YAML Mapper

from pathlib import Path

from soil_etl.yaml_mapper import YamlDatasetMapper

mapper = YamlDatasetMapper(
    config_path=Path("examples/configs/example_config.yaml"),
    data_file_path=Path("data.xlsx"),
)
mapper.import_data()

For more details on the YAML format, refer to the YAML Mapper documentation.
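The mapper is driven by two file paths, so a quick pre-flight check can catch a wrong path before any database work starts. The sketch below is a hypothetical helper: `check_mapper_inputs` is not part of soil_etl. The accepted suffixes are assumptions based on the `.yaml` and `.xlsx` files used in the example above.

```python
from pathlib import Path

# Assumed suffixes, taken from the example above; adjust to the formats you actually use.
CONFIG_SUFFIXES = {".yaml", ".yml"}
DATA_SUFFIXES = {".xlsx"}


def check_mapper_inputs(config_path: Path, data_file_path: Path) -> list[str]:
    """Return a list of problems with the mapper inputs (empty if none were found).

    Hypothetical helper, not part of soil_etl.
    """
    problems: list[str] = []
    for path, suffixes, label in (
        (config_path, CONFIG_SUFFIXES, "config"),
        (data_file_path, DATA_SUFFIXES, "data file"),
    ):
        if not path.is_file():
            problems.append(f"{label} not found: {path}")
        elif path.suffix.lower() not in suffixes:
            problems.append(f"unexpected {label} suffix: {path.suffix}")
    return problems
```

Calling it with the same two paths passed to YamlDatasetMapper, and stopping if the returned list is non-empty, keeps path errors separate from mapping or database errors.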

Database Connection Configuration

Database Connection String

Database connection details can be provided when running the import. By default, the mapper uses its existing environment/config lookup; a call to import_data() can override it by passing the connection details explicitly:

mapper.import_data(
    database_connection_string="postgresql+psycopg2://user:password@host:5432/database",
    database_engine_kwargs={"pool_size": 10, "pool_pre_ping": True},
)
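If the password contains characters such as @, : or /, inserting it verbatim into the connection string breaks the URL. A minimal sketch of one way to assemble the string, assuming the postgresql+psycopg2 URL layout shown above: `build_connection_string` is a hypothetical helper, not part of soil_etl. It percent-encodes the credentials with urllib.parse.quote_plus, which is the approach SQLAlchemy's documentation suggests for special characters in passwords.

```python
from urllib.parse import quote_plus


def build_connection_string(
    user: str,
    password: str,
    host: str,
    database: str,
    port: int = 5432,
    driver: str = "postgresql+psycopg2",
) -> str:
    """Assemble a SQLAlchemy-style database URL with percent-encoded credentials.

    Hypothetical helper, not part of soil_etl. quote_plus escapes characters
    such as '@', ':' and '/' that would otherwise be read as URL delimiters.
    """
    return f"{driver}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"
```

The result can be passed unchanged as database_connection_string.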

Hydra Database Configuration Files

When using Hydra database configuration files instead of a direct connection string, pass the config directory and config name:

mapper.import_data(
    database_config_path="configs",
    database_config_name="configs.yaml",
)

The config directory can also be provided through the CONFIG_PATH environment variable, with the database credentials in DB_USER and DB_PASSWORD:

export CONFIG_PATH=configs
export DB_USER=user
export DB_PASSWORD=password

Then pass only the config name to the import:

mapper.import_data(database_config_name="configs.yaml")
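Because this call depends on the environment, checking the variables up front can give a clearer error than a failed connection. A minimal sketch: `check_hydra_environment` is a hypothetical helper, not part of soil_etl, and it assumes the config name refers to a file directly inside CONFIG_PATH, as in the example above.

```python
import os
from pathlib import Path

REQUIRED_VARS = ("CONFIG_PATH", "DB_USER", "DB_PASSWORD")


def check_hydra_environment(config_name: str) -> list[str]:
    """Return a list of problems with the environment (empty if it looks usable).

    Hypothetical helper, not part of soil_etl.
    """
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not os.environ.get(name)]
    config_dir = os.environ.get("CONFIG_PATH")
    if config_dir:
        # Assumption: the config name is a file directly inside CONFIG_PATH.
        config_file = Path(config_dir) / config_name
        if not config_file.is_file():
            problems.append(f"config file not found: {config_file}")
    return problems
```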

Alternatively, set the variables entirely in Python:

import os

os.environ['CONFIG_PATH'] = "./path_to_config_dir"
os.environ['DB_USER'] = "user"
os.environ['DB_PASSWORD'] = "password"
mapper.import_data(database_config_name="configs.yaml")
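Setting os.environ directly leaves the credentials in the process environment after the import finishes. A small context manager can set the variables only for the duration of the call and restore the previous values afterwards. This is a sketch using only the standard library; `temporary_env` is a hypothetical helper, not part of soil_etl.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_env(**values: str) -> Iterator[None]:
    """Set environment variables for the duration of a block, then restore them.

    Hypothetical helper, not part of soil_etl.
    """
    previous = {name: os.environ.get(name) for name in values}
    os.environ.update(values)
    try:
        yield
    finally:
        for name, old in previous.items():
            if old is None:
                os.environ.pop(name, None)  # the variable did not exist before
            else:
                os.environ[name] = old
```

For example, the import above can run inside `with temporary_env(CONFIG_PATH="./path_to_config_dir", DB_USER="user", DB_PASSWORD="password"):`, which also keeps tests that set these variables from leaking into each other.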

YAML Mapper Configuration Examples

The YAML mapper configuration is a YAML file that defines the mapping between the data file columns and the database table columns. Detailed documentation of the YAML format is available in README_yaml_format_en.md.

An example YAML mapper configuration is available in examples/configs/example_config.yaml. Packaged reference files are also kept in src/soil_etl/yaml_mapper/template.yaml and src/soil_etl/yaml_mapper/examples/on_master.yaml.

Package Status

The package is in alpha and the extractor implementation has been copied into src/soil_etl. The current tree includes the YAML mapper, binding helpers, dataclasses, enums, staging models, database interface, and ORM models needed by the migrated tests.

The remaining packaging work is to finish aligning the public import surface, tests, and documentation with the final package namespace. Some older project notes may still refer to the previous forsite_soil_extractors migration target.

Development

poetry install --with dev,test
poetry run pytest

The migrated data extractor tests are documented in docs/testing.md.

Canonical Imports

from soil_etl.yaml_mapper import YamlDatasetMapper
from soil_etl.bindings import Binding, FromColumn
from soil_etl.db import ImportationInterface

The canonical package namespace is soil_etl. Older migration notes may still refer to forsite_soil_extractors, but new code should use only soil_etl imports.
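When migrating existing scripts, the remaining imports of the old namespace can be located with Python's ast module rather than a text search, which avoids false matches in comments or strings. A minimal sketch: `find_legacy_imports` is a hypothetical migration helper, not part of soil_etl.

```python
import ast

LEGACY_NAMESPACE = "forsite_soil_extractors"


def find_legacy_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, module) pairs for imports of the old namespace.

    Hypothetical migration helper, not part of soil_etl.
    """
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module]  # relative imports without a module name are skipped
        else:
            continue
        for module in modules:
            if module == LEGACY_NAMESPACE or module.startswith(LEGACY_NAMESPACE + "."):
                hits.append((node.lineno, module))
    return sorted(hits)  # ast.walk is breadth-first, so sort into line order
```

Running it over each file from Path(...).rglob("*.py") lists the lines that still need to be switched to soil_etl.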

Build

python -m build
twine check dist/*



Download files


Source Distribution

soil_etl-0.1.0a2.tar.gz (100.8 kB)

Uploaded Source

Built Distribution


soil_etl-0.1.0a2-py3-none-any.whl (154.8 kB)

Uploaded Python 3

File details

Details for the file soil_etl-0.1.0a2.tar.gz.

File metadata

  • Download URL: soil_etl-0.1.0a2.tar.gz
  • Upload date:
  • Size: 100.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for soil_etl-0.1.0a2.tar.gz
Algorithm Hash digest
SHA256 62343b74fa5a2624ea88e583b41cacd4fc154cf91d784cc24f20e70e4317b786
MD5 f6ad4b62716ada0d5f69deed23e80867
BLAKE2b-256 a0206030a2db60c2fc4f3bace80dd6b4ecb869ed963ba108ffafad52d6631116
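A downloaded file can be checked against the published SHA256 digest with the standard library alone. This is a generic sketch using hashlib; `sha256_of` is a helper name chosen for illustration, and the expected value is the source-distribution digest listed above.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 published above for soil_etl-0.1.0a2.tar.gz.
EXPECTED_SDIST_SHA256 = "62343b74fa5a2624ea88e583b41cacd4fc154cf91d784cc24f20e70e4317b786"
```

Comparing sha256_of(Path("soil_etl-0.1.0a2.tar.gz")) with EXPECTED_SDIST_SHA256 confirms the download is intact; pip's hash-checking mode (--require-hashes with --hash=sha256:... in a requirements file) performs the same check at install time.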


File details

Details for the file soil_etl-0.1.0a2-py3-none-any.whl.

File metadata

  • Download URL: soil_etl-0.1.0a2-py3-none-any.whl
  • Upload date:
  • Size: 154.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for soil_etl-0.1.0a2-py3-none-any.whl
Algorithm Hash digest
SHA256 d1e3b59e0d49ec8b25bb7c483f98ff68d48ed878f28874f5fb247ff99cae5f7a
MD5 23f406c6ed13ac7f62a492c7a62276ee
BLAKE2b-256 f92ec095a38e0a7da520cc297bac367336dd27c057ef2efee4fc96f4550000f9

