Infer AI-friendly metadata about biosamples from multiple sources
Biosample Enricher
Infer AI-friendly environmental and geographic metadata about biosamples from multiple sources.
Overview
Biosample Enricher provides 8 specialized services for enriching biosample metadata with environmental and geographic information from authoritative data sources. Each service focuses on one domain (elevation, weather, soil, marine, land cover, forward geocoding, reverse geocoding, geographic features) and returns structured, type-safe data ready for analysis or AI applications.
Features
- 8 Specialized Services: Elevation, soil, weather, marine, land cover, forward/reverse geocoding, geographic features
- Service-Based Architecture: Independent services with focused responsibilities
- Type Safety: Full type hints with Pydantic validation and mypy checking
- Smart Caching: HTTP caching with coordinate canonicalization for efficiency
- Multiple Providers: Automatic fallback between data providers (USGS, Google, OSM, etc.)
- Click-Based CLIs: User-friendly command-line tools for each service
- Flexible Installation: Core services only, or add optional mongodb/metrics/schema extras
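The "Smart Caching" feature mentions coordinate canonicalization. As a rough illustration of the idea only (not the package's actual implementation), the sketch below rounds coordinates to a fixed precision so that near-identical inputs map to the same cache key. The function names `canonicalize_coords` and `cache_key`, and the default precision of 4 decimal places, are assumptions made for this example.

```python
def canonicalize_coords(lat: float, lon: float, precision: int = 4) -> tuple[float, float]:
    """Round coordinates so near-identical inputs share one canonical form."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    # Adding 0.0 turns -0.0 into 0.0, so both spellings of zero yield the same key.
    return round(lat, precision) + 0.0, round(lon, precision) + 0.0


def cache_key(service: str, lat: float, lon: float, precision: int = 4) -> str:
    """Build a stable string key from a (hypothetical) service name and coordinates."""
    clat, clon = canonicalize_coords(lat, lon, precision)
    return f"{service}:{clat:.{precision}f},{clon:.{precision}f}"


# Two inputs that differ only beyond the 4th decimal place share a key.
print(cache_key("elevation", 40.71280001, -74.00600002))
print(cache_key("elevation", 40.7128, -74.0060))
```

Four decimal places correspond to roughly 11 m of latitude, which is one plausible trade-off between cache hit rate and spatial accuracy; a different precision may suit other data sources.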
Installation
Prerequisites
- Python 3.11 or higher
- UV package manager (recommended)
Add to Your Project (Recommended)
# Basic installation - all 8 enrichment services
uv add biosample-enricher
# With optional dependencies
uv add biosample-enricher --extra metrics # Metrics and visualization
uv add biosample-enricher --extra mongodb # MongoDB support for NMDC/GOLD
uv add biosample-enricher --extra schema # Schema analysis tools
uv add biosample-enricher --extra all # All optional features
From Source (Development)
# Clone and install
git clone https://github.com/contextualizer-ai/biosample-enricher.git
cd biosample-enricher
uv sync
# With optional extras
uv sync --extra mongodb # MongoDB support
uv sync --extra metrics # Metrics and visualization
uv sync --extra schema # Schema analysis tools
uv sync --extra all # Everything
Quick Start
Python API
The package exports 8 services from the top level:
from biosample_enricher import (
    ElevationService,
    ElevationRequest,
    SoilService,
    WeatherService,
    MarineService,
    LandService,
    ReverseGeocodingService,
    ForwardGeocodingService,
    OSMFeaturesService,
)
from datetime import date
# Get elevation for a location
elevation_service = ElevationService()
request = ElevationRequest(latitude=40.7128, longitude=-74.0060)
observations = elevation_service.get_elevation(request)
for obs in observations:
    if obs.value_numeric is not None:
        print(f"{obs.provider.name}: {obs.value_numeric}m")
# Output:
# usgs_3dep: 13.15m
# google_elevation: 13.26m
# open_topo_data: 25.0m
# osm_elevation: 51.0m
# Get weather data for a location and date
weather_service = WeatherService()
weather_result = weather_service.get_daily_weather(
    lat=37.7749,
    lon=-122.4194,
    target_date=date(2024, 1, 15)
)
print(f"Temperature: {weather_result.temperature.value}°C")
print(f"Precipitation: {weather_result.precipitation.value}mm")
# Get soil properties
soil_service = SoilService()
soil_result = soil_service.enrich_location(
    latitude=40.7128,
    longitude=-74.0060,
    depth_cm="0-5cm"
)
print(f"Provider: {soil_result.provider}")
print(f"Quality score: {soil_result.quality_score}")
# Get marine data (SST, bathymetry, chlorophyll)
marine_service = MarineService()
marine_result = marine_service.get_comprehensive_marine_data(
    latitude=36.6,
    longitude=-121.9,
    target_date=date(2024, 1, 15)
)
if marine_result.sea_surface_temperature:
    print(f"Sea surface temp: {marine_result.sea_surface_temperature.value}°C")
if marine_result.bathymetry:
    print(f"Water depth: {marine_result.bathymetry.value}m")
# Reverse geocoding (coordinates -> place names)
geocoding_service = ReverseGeocodingService()
result = geocoding_service.reverse_geocode(lat=40.7128, lon=-74.0060)
if result:
    print(f"Location: {result.get_formatted_address()}")
# Get nearby geographic features
osm_service = OSMFeaturesService()
features = osm_service.get_features_for_location(
    latitude=37.7749,
    longitude=-122.4194,
    radius_m=1000
)
if features and features.named_features:
    for feature in features.named_features[:5]:
        print(f"{feature.name} ({feature.category}): {feature.distance_km:.2f}km")
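The elevation example above prints one value per provider, and the sample values differ noticeably between providers. One simple way to condense such readings is a median plus a spread. The sketch below is only an illustration: it works on a plain dictionary of values copied from the sample output rather than on the package's observation objects, and the names `readings` and `summarize` are hypothetical.

```python
from statistics import median
from typing import Optional

# Values copied from the sample output above; None would mark a provider with no data.
readings: dict[str, Optional[float]] = {
    "usgs_3dep": 13.15,
    "google_elevation": 13.26,
    "open_topo_data": 25.0,
    "osm_elevation": 51.0,
}


def summarize(readings: dict[str, Optional[float]]) -> dict[str, float]:
    """Return the count, median, and max-min spread of the available readings."""
    values = [v for v in readings.values() if v is not None]
    if not values:
        raise ValueError("no provider returned a value")
    return {
        "count": len(values),
        "median_m": median(values),
        "spread_m": max(values) - min(values),
    }


summary = summarize(readings)
print(f"{summary['count']} providers, median {summary['median_m']:.2f} m, "
      f"spread {summary['spread_m']:.2f} m")
```

A large spread is a hint that the providers disagree at that location, which may be worth recording alongside whichever value is kept.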
CLI Usage
Each service has its own CLI command:
# Elevation lookup
uv run elevation-lookup lookup --lat 40.7128 --lon -74.0060
# Soil data
uv run soil-enricher lookup --lat 40.7128 --lon -74.0060 --depth 10
# Weather data
uv run weather-enricher lookup --lat 37.7749 --lon -122.4194 --date 2024-01-15
# Marine data
uv run marine-enricher lookup --lat 36.6 --lon -121.9 --date 2024-01-15
# Land cover
uv run land-enricher lookup --lat 40.7128 --lon -74.0060
# Batch processing from CSV
uv run elevation-lookup batch --input samples.csv --lat-col latitude --lon-col longitude
# Version info
uv run biosample-version
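Before handing a CSV to the batch command, it can help to check that the coordinate columns are present and in range. The sketch below uses only the standard library and does not call the package; the function name `valid_coordinate_rows` and the sample data are made up for illustration, and the column names mirror the `--lat-col` and `--lon-col` values shown above.

```python
import csv
import io


def valid_coordinate_rows(handle, lat_col="latitude", lon_col="longitude"):
    """Yield (row_number, lat, lon) for rows with parseable, in-range coordinates."""
    reader = csv.DictReader(handle)
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        try:
            lat = float(row[lat_col])
            lon = float(row[lon_col])
        except (KeyError, TypeError, ValueError):
            continue  # missing column, empty cell, or non-numeric text
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            yield row_number, lat, lon


# Hypothetical sample file: S2 has an empty latitude, S3 is out of range.
sample = io.StringIO(
    "sample_id,latitude,longitude\n"
    "S1,40.7128,-74.0060\n"
    "S2,,-122.4194\n"
    "S3,95.0,10.0\n"
    "S4,36.6,-121.9\n"
)
for row_number, lat, lon in valid_coordinate_rows(sample):
    print(f"row {row_number}: lat={lat}, lon={lon}")
```

Only rows S1 and S4 pass the check here; how the batch command itself treats invalid rows is not covered by this sketch.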
Services
1. Elevation Service
Get elevation data from multiple providers (USGS, Google, Open Topo Data).
Providers: USGS (US only, free), Google (global, requires API key), Open Topo Data (global, free)
Python:
from biosample_enricher import ElevationService, ElevationRequest
service = ElevationService()
request = ElevationRequest(latitude=40.7128, longitude=-74.0060)
observations = service.get_elevation(request)
CLI:
uv run elevation-lookup lookup --lat 40.7128 --lon -74.0060
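Because the providers differ in coverage and requirements, trying them in a preferred order is a natural pattern. The following is a minimal sketch of ordered fallback in general, not the package's own logic: the `Provider` alias, `first_available`, and the three stand-in lookup functions are all hypothetical and make no network calls.

```python
from collections.abc import Callable, Sequence
from typing import Optional

# A provider is modeled as (name, lookup); a lookup returns meters or None.
Provider = tuple[str, Callable[[float, float], Optional[float]]]


def first_available(
    providers: Sequence[Provider], lat: float, lon: float
) -> Optional[tuple[str, float]]:
    """Try providers in order and return the first (name, value) that succeeds."""
    for name, lookup in providers:
        try:
            value = lookup(lat, lon)
        except Exception:
            continue  # treat any provider failure as "no data" and move on
        if value is not None:
            return name, value
    return None


def us_only(lat: float, lon: float) -> Optional[float]:
    """Stand-in for a US-only source: answers only inside a rough bounding box."""
    inside = 24.0 <= lat <= 50.0 and -125.0 <= lon <= -66.0
    return 13.15 if inside else None


def needs_key(lat: float, lon: float) -> Optional[float]:
    """Stand-in for a source that fails when no API key is configured."""
    raise RuntimeError("API key not configured")


def global_free(lat: float, lon: float) -> Optional[float]:
    """Stand-in for a global source that always answers."""
    return 25.0


providers = [("us_only", us_only), ("needs_key", needs_key), ("global_free", global_free)]
print(first_available(providers, 40.7128, -74.0060))  # New York: first provider answers
print(first_available(providers, 48.8566, 2.3522))    # Paris: falls through to the last
```

Catching every exception keeps the sketch short; real code would more likely catch specific network or authentication errors and log them.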
2. Soil Service
Get soil properties (texture, pH, organic carbon, etc.).
Providers: SoilGrids (global coverage), USDA NRCS (US only)
Python:
from biosample_enricher import SoilService
service = SoilService()
soil_result = service.enrich_location(
    latitude=40.7128,
    longitude=-74.0060,
    depth_cm="0-5cm"
)
CLI:
uv run soil-enricher lookup --lat 40.7128 --lon -74.0060 --depth 10
3. Weather Service
Get historical weather data (temperature, precipitation, humidity, etc.).
Providers: Open-Meteo (free, global), Meteostat (free, global)
Python:
from biosample_enricher import WeatherService
from datetime import date
service = WeatherService()
weather_result = service.get_daily_weather(
    lat=37.7749,
    lon=-122.4194,
    target_date=date(2024, 1, 15)
)
CLI:
uv run weather-enricher lookup --lat 37.7749 --lon -122.4194 --date 2024-01-15
4. Marine Service
Get marine data (sea surface temperature, bathymetry, chlorophyll).
Providers: NOAA OISST (SST), GEBCO (bathymetry), ESA CCI (chlorophyll)
Python:
from biosample_enricher import MarineService
from datetime import date
service = MarineService()
marine_result = service.get_comprehensive_marine_data(
    latitude=36.6,
    longitude=-121.9,
    target_date=date(2024, 1, 15)
)
CLI:
uv run marine-enricher lookup --lat 36.6 --lon -121.9 --date 2024-01-15
5. Land Service
Get land cover classification.
Providers: ESA WorldCover, MODIS, NLCD (US only)
Python:
from biosample_enricher import LandService
service = LandService()
land_result = service.enrich_location(
    latitude=40.7128,
    longitude=-74.0060
)
CLI:
uv run land-enricher lookup --lat 40.7128 --lon -74.0060
6. Reverse Geocoding Service
Convert coordinates to human-readable addresses.
Providers: OSM Nominatim (free), Google Geocoding (requires API key)
Python:
from biosample_enricher import ReverseGeocodingService
service = ReverseGeocodingService()
result = service.reverse_geocode(lat=40.7128, lon=-74.0060)
if result:
    print(result.get_formatted_address())
7. Forward Geocoding Service
Convert addresses/place names to coordinates.
Providers: OSM Nominatim (free), Google Geocoding (requires API key)
Python:
from biosample_enricher import ForwardGeocodingService
service = ForwardGeocodingService()
result = service.geocode("New York City")
if result and result.locations:
    for location in result.locations[:3]:
        print(f"{location.formatted_address}: {location.latitude}, {location.longitude}")
8. OSM Features Service
Get nearby geographic features (parks, water bodies, landmarks).
Providers: OpenStreetMap Overpass API (free), Google Places (requires API key)
Python:
from biosample_enricher import OSMFeaturesService
service = OSMFeaturesService()
features = service.get_features_for_location(
    latitude=37.7749,
    longitude=-122.4194,
    radius_m=1000
)
if features and features.named_features:
    for feature in features.named_features[:5]:
        print(f"{feature.name} ({feature.category})")
API Keys
An API key is required only for the Google services, which are optional (OSM alternatives are available for everything):
# Single API key for all Google services
export GOOGLE_MAIN_API_KEY="your-key-here"
All other services are free and require no authentication.
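A script can check up front whether the key is present before relying on a Google provider. The sketch below only reads the `GOOGLE_MAIN_API_KEY` environment variable named above; the helper `google_key_configured` is hypothetical, and how the package itself reacts to a missing key is not shown here.

```python
import os
from collections.abc import Mapping
from typing import Optional


def google_key_configured(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if GOOGLE_MAIN_API_KEY is set to a non-blank value."""
    env = os.environ if env is None else env
    return bool(env.get("GOOGLE_MAIN_API_KEY", "").strip())


if google_key_configured():
    print("GOOGLE_MAIN_API_KEY is set.")
else:
    print("GOOGLE_MAIN_API_KEY is not set; plan on the key-free providers.")
```

Passing a plain dictionary as `env` makes the check easy to exercise in tests without touching the real environment.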
Development
Setup
# Clone repository
git clone https://github.com/contextualizer-ai/biosample-enricher.git
cd biosample-enricher
# Complete development setup
make dev-setup
Testing
# Run fast tests (excludes network/slow tests)
make test-fast
# Run all tests with coverage
make test-cov
# Run specific test categories
make test-unit # Unit tests only
make test-integration # Integration tests
Code Quality
# Format, lint, type-check, test
make dev-check
# Full CI validation
make check-ci
# Individual checks
make format # Format with ruff
make lint # Lint with ruff
make type-check # Type check with mypy
make dep-check # Check dependencies with deptry
Project Structure
biosample-enricher/
├── biosample_enricher/
│   ├── __init__.py            # Public API exports
│   ├── elevation/             # Elevation service
│   ├── soil/                  # Soil service
│   ├── weather/               # Weather service
│   ├── marine/                # Marine service
│   ├── land/                  # Land cover service
│   ├── reverse_geocoding/     # Reverse geocoding
│   ├── forward_geocoding/     # Forward geocoding
│   ├── osm_features/          # Geographic features
│   ├── models.py              # Core data models
│   ├── http_cache.py          # HTTP caching
│   └── cli*.py                # CLI commands
├── tests/                     # Test suite
├── pyproject.toml             # Project configuration
└── Makefile                   # Development automation
Dependencies
Core Dependencies
- Always installed: pandas, rasterio, meteostat (required for weather aggregation and global soil coverage)
- CLI and data validation: click, pydantic, requests, rich, pyyaml
Optional Dependencies
- mongodb: pymongo for fetching from NMDC/GOLD databases (evaluation/demo only)
- metrics: matplotlib, seaborn for visualization
- schema: genson for schema analysis
Install with: uv sync --extra mongodb or uv sync --extra all
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Run checks (make dev-check)
- Commit (git commit -m 'Add amazing feature')
- Push (git push origin feature/amazing-feature)
- Open a Pull Request
See CLAUDE.md for detailed development guidelines.
License
MIT License - see LICENSE file for details.
Acknowledgments
- Built with UV for fast package management
- CLI powered by Click
- Data validation with Pydantic
- Console output with Rich
- Caching with requests-cache
Support
- Issues: GitHub Issues
- Email: info@contextualizer.ai