
YAML-driven ETL mappers for ForSITE soil database imports.


SOIL - Structured Observation Ingestion Library

YAML-driven ETL mapper package for importing ForSITE soil datasets into the ForSITE PostGIS database model.

This package is being split out of forsite-soil-db-interface so the mapper layer can be versioned, tested, and published independently on PyPI.

Acknowledgements

This project was developed through joint funding provided by Agriculture and Agri-Food Canada (AAFC) under the CSBO program and Natural Resources Canada (NRCan) for the ForSITE-Soil degradation project.

The code developed as part of this project includes contributions from:

Target Public API

Basic Usage of the YAML Mapper

from pathlib import Path

from soil_etl.yaml_mapper import YamlDatasetMapper

mapper = YamlDatasetMapper(
    config_path=Path("examples/configs/example_config.yaml"),
    data_file_path=Path("data.xlsx"),
)
mapper.import_data()

Database Connection Configuration

Database Connection String

Database connection details can be provided when running the import. By default, the mapper still uses the existing environment/config lookup, but a call can now override it explicitly:

mapper.import_data(
    database_connection_string="postgresql+psycopg2://user:password@host:5432/database",
    database_engine_kwargs={"pool_size": 10, "pool_pre_ping": True},
)
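Passwords that contain characters such as `@`, `:`, or `/` need to be percent-encoded before they are placed in a connection URL. The following is a minimal sketch of one way to assemble such a string using only the standard library; `build_connection_string` is a hypothetical helper and not part of soil_etl, and the credentials are placeholders.

```python
from urllib.parse import quote_plus


def build_connection_string(user, password, host, port, database,
                            driver="postgresql+psycopg2"):
    """Hypothetical helper: build a database URL with escaped credentials."""
    # quote_plus percent-encodes reserved characters so that they are not
    # mistaken for URL delimiters.
    return (
        f"{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


connection_string = build_connection_string(
    user="user",
    password="p@ss:word/1",  # placeholder containing reserved characters
    host="host",
    port=5432,
    database="database",
)
print(connection_string)
```

The resulting string could then be passed as `database_connection_string` in the call shown above.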

Hydra Database Configuration Files

When using Hydra database configuration files instead of a direct connection string, pass the config directory and config name:

mapper.import_data(
    database_config_path="configs",
    database_config_name="configs.yaml",
)

The config directory can also be provided through the CONFIG_PATH environment variable:

export CONFIG_PATH=configs
export DB_USER=user
export DB_PASSWORD=password

The import call then only needs the config name:

mapper.import_data(database_config_name="configs.yaml")

or entirely in Python:

import os

# Set the config directory and credentials before running the import.
os.environ["CONFIG_PATH"] = "./path_to_config_dir"
os.environ["DB_USER"] = "user"
os.environ["DB_PASSWORD"] = "password"

mapper.import_data(database_config_name="configs.yaml")
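When the configuration relies on environment variables, it can help to confirm they are set before starting an import. The sketch below is one possible pre-flight check; `missing_env_vars` is a hypothetical helper, and the variable names are simply those used in the examples above (whether a given Hydra config needs all of them is an assumption).

```python
import os

# Variable names taken from the examples above; adjust to the actual config.
REQUIRED_VARS = ("CONFIG_PATH", "DB_USER", "DB_PASSWORD")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Hypothetical helper: return the names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Demonstration with an explicit mapping instead of the real environment.
missing = missing_env_vars(environ={"CONFIG_PATH": "configs", "DB_USER": "user"})
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

Calling `missing_env_vars()` with no arguments checks the real process environment.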

YAML Mapper Configuration Examples

The YAML mapper configuration is a YAML file that defines the mapping between the data file columns and the database table columns.

YAML mapper configuration examples are available in examples/configs/example_config.yaml. Packaged reference files are also kept in src/soil_etl/yaml_mapper/template.yaml and src/soil_etl/yaml_mapper/examples/on_master.yaml.
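As a purely conceptual illustration of what "mapping data file columns to database table columns" means, the idea can be sketched with a plain dictionary. This does not reflect the package's actual YAML schema, which is defined by the template and example files listed above; the column names, target names, and the `apply_mapping` helper are all hypothetical.

```python
# Hypothetical mapping: source column name -> "table.column" target.
COLUMN_MAP = {
    "Site ID": "site.site_code",
    "Depth (cm)": "sample.depth_cm",
}


def apply_mapping(row, column_map=COLUMN_MAP):
    """Rename the keys of one source row; unmapped columns are dropped."""
    return {
        target: row[source]
        for source, target in column_map.items()
        if source in row
    }


mapped = apply_mapping({"Site ID": "ON-001", "Depth (cm)": 15, "Notes": "n/a"})
print(mapped)
```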

Package Status

The package is in alpha and the extractor implementation has been copied into src/soil_etl. The current tree includes the YAML mapper, binding helpers, dataclasses, enums, staging models, database interface, and ORM models needed by the migrated tests.

The remaining packaging work is to finish aligning the public import surface, tests, and documentation with the final package namespace. Some older project notes may still refer to the previous forsite_soil_extractors migration target.

Development

poetry install --with dev,test
poetry run pytest

The migrated data extractor tests are documented in docs/testing.md.

Canonical Imports

from soil_etl.yaml_mapper import YamlDatasetMapper
from soil_etl.bindings import Binding, FromColumn
from soil_etl.db import ImportationInterface

The canonical package namespace is soil_etl. Older migration notes may still refer to forsite_soil_extractors, but new code should use only soil_etl imports.

Build

python -m build
twine check dist/*


