YAML-driven ETL mappers for ForSITE soil database imports.
SOIL - Structured Observation Ingestion Library
YAML-driven ETL mapper package for importing ForSITE soil datasets into the ForSITE PostGIS database model.
This package is being split out of forsite-soil-db-interface so the mapper layer can be versioned, tested, and published independently on PyPI.
Acknowledgements
This project was developed through joint funding provided by Agriculture and Agri-Food Canada (AAFC) under the CSBO program and Natural Resources Canada (NRCan) for the ForSITE-Soil degradation project.
The code developed as part of this project includes contributions from:
Target Public API
Basic Usage of the YAML Mapper
from pathlib import Path
from soil_etl.yaml_mapper import YamlDatasetMapper
mapper = YamlDatasetMapper(
    config_path=Path("examples/configs/example_config.yaml"),
    data_file_path=Path("data.xlsx"),
)
mapper.import_data()
For more details on the YAML format, refer to the YAML Mapper documentation.
Database Connection Configuration
Database Connection String
Database connection details can be provided when running the import. By default, the mapper uses the existing environment/config lookup; passing the arguments below to import_data overrides that lookup explicitly:
mapper.import_data(
    database_connection_string="postgresql+psycopg2://user:password@host:5432/database",
    database_engine_kwargs={"pool_size": 10, "pool_pre_ping": True},
)
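Passwords that contain characters such as `@` or `/` need to be percent-encoded before they are placed in a connection URL. The following is a minimal sketch of one way to assemble such a string with the standard library; `build_connection_string` is a hypothetical helper, not part of soil_etl.

```python
from urllib.parse import quote


def build_connection_string(user, password, host, port, database,
                            driver="postgresql+psycopg2"):
    """Assemble a SQLAlchemy-style URL, percent-encoding the credentials.

    Hypothetical helper for illustration; it is not provided by soil_etl.
    """
    # safe="" also encodes "/" and "@", which would otherwise break URL parsing.
    encoded_user = quote(user, safe="")
    encoded_password = quote(password, safe="")
    return f"{driver}://{encoded_user}:{encoded_password}@{host}:{port}/{database}"


connection_string = build_connection_string(
    user="user",
    password="p@ss/word",
    host="host",
    port=5432,
    database="database",
)
print(connection_string)
# prints: postgresql+psycopg2://user:p%40ss%2Fword@host:5432/database
```

The resulting string can then be passed as the `database_connection_string` argument shown above.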
Hydra Database Configuration Files
When using Hydra database configuration files instead of a direct connection string, pass the config directory and config name:
mapper.import_data(
    database_config_path="configs",
    database_config_name="configs.yaml",
)
The config directory can also be provided through the CONFIG_PATH environment
variable:
export CONFIG_PATH=configs
export DB_USER=user
export DB_PASSWORD=password
With these variables set, the import call needs only the config name:
mapper.import_data(database_config_name="configs.yaml")
or entirely in Python:
import os
os.environ['CONFIG_PATH'] = "./path_to_config_dir"
os.environ['DB_USER'] = "user"
os.environ['DB_PASSWORD'] = "password"
mapper.import_data(database_config_name="configs.yaml")
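Assigning to `os.environ` directly, as in the snippet above, leaves the credentials set for the rest of the process. If that is undesirable, one option is to scope them to a `with` block. The sketch below uses only the standard library; `temporary_env` is a hypothetical helper, not part of soil_etl, and the mapper call is left as a comment.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(**overrides):
    """Set environment variables inside a with-block, then restore the old values."""
    # Remember the previous value of each variable (None means it was unset).
    previous = {name: os.environ.get(name) for name in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for name, old_value in previous.items():
            if old_value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old_value


with temporary_env(
    CONFIG_PATH="./path_to_config_dir",
    DB_USER="user",
    DB_PASSWORD="password",
):
    # The import call would go here, for example:
    # mapper.import_data(database_config_name="configs.yaml")
    print(os.environ["CONFIG_PATH"])
```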
YAML Mapper Configuration examples
The YAML mapper configuration is a YAML file that defines the mapping between the data file columns and the database table columns. Detailed documentation of the YAML format is available in README_yaml_format_en.md.
YAML mapper configuration examples are available in
examples/configs/example_config.yaml. Packaged
reference files are also kept in
src/soil_etl/yaml_mapper/template.yaml and
src/soil_etl/yaml_mapper/examples/on_master.yaml.
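To illustrate the general idea of mapping data file columns to database table columns, here is a small conceptual sketch in plain Python. The column names, table names, and the `map_row` helper are all invented for illustration; this is not the package's YAML schema, which is described in README_yaml_format_en.md.

```python
# Hypothetical mapping: source column name -> (target table, target column).
# This dict only illustrates the concept; it is NOT the soil_etl YAML format.
COLUMN_MAP = {
    "Site ID": ("site", "site_code"),
    "Depth (cm)": ("sample", "depth_cm"),
    "pH": ("sample", "ph"),
}


def map_row(row, column_map):
    """Regroup one source row into {table: {column: value}}, skipping unmapped columns."""
    mapped = {}
    for source_column, value in row.items():
        target = column_map.get(source_column)
        if target is None:
            continue  # column has no mapping, so it is ignored
        table, column = target
        mapped.setdefault(table, {})[column] = value
    return mapped


row = {"Site ID": "ON-001", "Depth (cm)": 15, "pH": 6.4, "Notes": "unmapped"}
mapped_row = map_row(row, COLUMN_MAP)
print(mapped_row)
# prints: {'site': {'site_code': 'ON-001'}, 'sample': {'depth_cm': 15, 'ph': 6.4}}
```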
Package Status
The package is in alpha and the extractor implementation has been copied into
src/soil_etl. The current tree includes the YAML mapper, binding helpers,
dataclasses, enums, staging models, database interface, and ORM models needed by
the migrated tests.
The remaining packaging work is to finish aligning the public import surface,
tests, and documentation with the final package namespace. Some older project
notes may still refer to the previous forsite_soil_extractors migration target.
Development
poetry install --with dev,test
poetry run pytest
The migrated data extractor tests are documented in docs/testing.md.
Canonical Imports
from soil_etl.yaml_mapper import YamlDatasetMapper
from soil_etl.bindings import Binding, FromColumn
from soil_etl.db import ImportationInterface
The canonical package namespace is soil_etl. Older migration notes may still refer to
forsite_soil_extractors, but new code should use only soil_etl imports.
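When migrating older code, it can help to locate any remaining imports of the previous namespace. The following is a rough sketch using a regular expression over source text; `find_legacy_imports` is a hypothetical helper, and the legacy module path in the sample is assumed for illustration only.

```python
import re

# Matches "import forsite_soil_extractors..." or "from forsite_soil_extractors... import ...".
LEGACY_IMPORT = re.compile(r"^\s*(?:from|import)\s+forsite_soil_extractors\b")


def find_legacy_imports(source_text):
    """Return the 1-based line numbers that import the old namespace."""
    return [
        line_number
        for line_number, line in enumerate(source_text.splitlines(), start=1)
        if LEGACY_IMPORT.match(line)
    ]


# The first and third lines use an assumed legacy path, for illustration only.
sample = """\
from forsite_soil_extractors.yaml_mapper import YamlDatasetMapper
from soil_etl.bindings import Binding, FromColumn
import forsite_soil_extractors
"""
print(find_legacy_imports(sample))
# prints: [1, 3]
```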
Build
python -m build
twine check dist/*
File details
Details for the file soil_etl-0.1.0a2.tar.gz.
File metadata
- Download URL: soil_etl-0.1.0a2.tar.gz
- Upload date:
- Size: 100.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 62343b74fa5a2624ea88e583b41cacd4fc154cf91d784cc24f20e70e4317b786 |
| MD5 | f6ad4b62716ada0d5f69deed23e80867 |
| BLAKE2b-256 | a0206030a2db60c2fc4f3bace80dd6b4ecb869ed963ba108ffafad52d6631116 |
File details
Details for the file soil_etl-0.1.0a2-py3-none-any.whl.
File metadata
- Download URL: soil_etl-0.1.0a2-py3-none-any.whl
- Upload date:
- Size: 154.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d1e3b59e0d49ec8b25bb7c483f98ff68d48ed878f28874f5fb247ff99cae5f7a |
| MD5 | 23f406c6ed13ac7f62a492c7a62276ee |
| BLAKE2b-256 | f92ec095a38e0a7da520cc297bac367336dd27c057ef2efee4fc96f4550000f9 |
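A downloaded file can be checked against the SHA256 digests listed above. This is a minimal sketch using the standard library's `hashlib`; the helper names are hypothetical, and the usage comment assumes the source distribution has already been downloaded into the working directory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the sdist is present in the working directory:
# matches_published_hash(
#     "soil_etl-0.1.0a2.tar.gz",
#     "62343b74fa5a2624ea88e583b41cacd4fc154cf91d784cc24f20e70e4317b786",
# )
```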