Python client for the BioMapper2 API — map biological entities to standardized knowledge graph identifiers
ddharmon
Python client for the BioMapper2 API — map biological entity names to standardized knowledge-graph identifiers (CHEBI, HMDB, PubChem, RefMet, and more).
from ddharmon import map_entity
result = map_entity("L-Histidine")
print(result.primary_curie) # RM:0129894
print(result.confidence_tier) # high
print(result.ids_for("CHEBI")) # ['15971']
Installation
# Core (async HTTP client + Pydantic models)
pip install ddharmon
Getting an API key
The BioMapper2 API requires an API key. To request access, email trent.leslie@phenomehealth.org.
Once you have a key, set it in your environment:
export BIOMAPPER_API_KEY=your-key-here
Or add it to a .env file in your project root:
BIOMAPPER_API_KEY=your-key-here
ddharmon will pick it up automatically from either location.
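A missing key is easiest to diagnose before the first request is made. The following is a minimal sketch using only the standard library; `require_api_key` is a hypothetical helper, not part of ddharmon, and it inspects only the process environment (it does not read a .env file).

```python
import os


def require_api_key(env=None, var="BIOMAPPER_API_KEY"):
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for illustration; not part of ddharmon.
    """
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it or add it to a .env file")
    return key
```

Calling `require_api_key()` at the top of a script fails fast with a readable message instead of surfacing later as an authentication or configuration error.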
Quick start
Single lookup (synchronous)
from ddharmon import map_entity
result = map_entity("L-Histidine")
print(result.resolved) # True
print(result.primary_curie) # RM:0129894
print(result.chosen_kg_id) # CHEBI:15971
print(result.confidence_score) # 2.489
print(result.confidence_tier) # high (≥2.0)
print(result.ids_for("CHEBI")) # ['15971']
print(result.ids_for("refmet_id")) # ['RM0129894']
Batch mapping (synchronous)
from ddharmon import map_entities, summarize
records = [
    {"name": "L-Histidine"},
    {"name": "Glucose", "identifiers": {"HMDB": "HMDB00122"}},
    {"name": "Sphinganine"},
]
results = map_entities(records, progress=True) # tqdm bar with [notebook]
summary = summarize(results)
print(f"{summary.resolved}/{summary.total_queried} resolved")
print(f"Resolution rate: {summary.resolution_rate:.1%}")
print(summary.vocabulary_coverage)
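To make the summary fields concrete, here is a rough sketch of how a resolution rate and a per-vocabulary tally could be computed from a list of results. It is not how `summarize` is implemented: the plain dicts stand in for `MappingResult` objects, the IDs are placeholders, and reading "vocabulary coverage" as "number of records with at least one ID in that vocabulary" is an assumption.

```python
from collections import Counter


def tally(results):
    """Count resolved records and how many records carry each vocabulary.

    `results` is an iterable of dicts with "resolved" (bool) and
    "identifiers" (vocabulary -> list of IDs), standing in for MappingResult.
    """
    total = 0
    resolved = 0
    coverage = Counter()
    for r in results:
        total += 1
        if r["resolved"]:
            resolved += 1
        for vocab, ids in r["identifiers"].items():
            if ids:  # only count vocabularies that actually returned IDs
                coverage[vocab] += 1
    rate = resolved / total if total else 0.0
    return {"total": total, "resolved": resolved, "rate": rate, "coverage": dict(coverage)}


# Placeholder data for illustration only; not real API output.
sample = [
    {"resolved": True, "identifiers": {"CHEBI": ["ID1"], "refmet_id": ["ID2"]}},
    {"resolved": True, "identifiers": {"CHEBI": ["ID3"]}},
    {"resolved": False, "identifiers": {}},
]
stats = tally(sample)
print(f"{stats['resolved']}/{stats['total']} resolved ({stats['rate']:.1%})")
```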
Async usage
import asyncio
from ddharmon import BioMapperClient
async def main() -> None:
    async with BioMapperClient() as client:
        # Verify connectivity
        health = await client.health_check()
        print(health)  # {'status': 'healthy', ...}
        # Single
        result = await client.map_entity(
            "L-Histidine",
            identifiers={"HMDB": "HMDB00177"},
        )
        # Batch with rate limiting
        results = await client.map_entities(
            [{"name": "L-Histidine"}, {"name": "Glucose"}],
            rate_limit_delay=0.3,
            progress=True,
        )
asyncio.run(main())
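The exact semantics of `rate_limit_delay` are not spelled out here. One plausible reading is a fixed pause between consecutive requests, which can be sketched as follows. `map_sequentially` and `fake_lookup` are hypothetical names used only for this illustration, and the lookup is a stand-in that performs no network call.

```python
import asyncio


async def map_sequentially(names, lookup, delay=0.3):
    """Await `lookup` for each name, pausing `delay` seconds between calls."""
    results = []
    for i, name in enumerate(names):
        if i:  # no pause before the first request
            await asyncio.sleep(delay)
        results.append(await lookup(name))
    return results


async def fake_lookup(name):
    # Stand-in for a network call; returns a placeholder record.
    return {"name": name, "resolved": True}


demo = asyncio.run(
    map_sequentially(["L-Histidine", "Glucose"], fake_lookup, delay=0.01)
)
```

Results come back in input order, so they can be zipped against the original records.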
Jupyter notebooks
Apply nest_asyncio before using sync helpers inside a running event loop:
import nest_asyncio
nest_asyncio.apply() # required in Jupyter
from ddharmon import map_entities
results = map_entities([{"name": "L-Histidine"}], progress=True)
Preprocessing functions
from ddharmon.extras.metabolon import clean_compound_name, extract_hmdb_id
# Strip quotes and collision-energy suffixes
clean_compound_name('"1,3-Diphenylguanidine_CE45"') # '1,3-Diphenylguanidine'
clean_compound_name('"4,6-DIOXOHEPTANOIC ACID"') # '4,6-DIOXOHEPTANOIC ACID'
clean_compound_name('L-Histidine') # 'L-Histidine' (unchanged)
# Extract HMDB accessions from ms1_compound_name format
extract_hmdb_id('HMDB:HMDB03349-2257 L-Dihydroorotic acid') # 'HMDB03349'
extract_hmdb_id('HMDB00177') # 'HMDB00177'
extract_hmdb_id(None) # None
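For readers who want to see what this kind of cleanup involves, the behavior in the examples above can be approximated with two regular expressions. This is a sketch that reproduces those six examples, not the library's implementation; the `_sketch` function names are hypothetical, and the real functions may handle cases this version does not.

```python
import re

_CE_SUFFIX = re.compile(r"_CE\d+$")  # trailing collision-energy tag, e.g. _CE45
_HMDB = re.compile(r"HMDB\d+")       # HMDB accession: prefix followed by digits


def clean_name_sketch(raw):
    """Strip surrounding whitespace and double quotes, then a _CE<digits> suffix."""
    name = raw.strip().strip('"').strip()
    return _CE_SUFFIX.sub("", name)


def extract_hmdb_sketch(raw):
    """Return the first HMDB accession found in `raw`, or None."""
    if raw is None:
        return None
    match = _HMDB.search(raw)
    return match.group(0) if match else None
```

Because the accession pattern requires at least one digit after `HMDB`, the bare `HMDB:` prefix in the ms1_compound_name format is skipped and the accession that follows it is returned.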
API reference
MappingResult
| Attribute | Type | Description |
|---|---|---|
| `query_name` | `str` | Name submitted to the API |
| `resolved` | `bool` | Whether any identifier was returned |
| `primary_curie` | `str \| None` | First CURIE in the response |
| `chosen_kg_id` | `str \| None` | Resolver-selected knowledge graph ID |
| `confidence_score` | `float \| None` | Highest score across annotators |
| `confidence_tier` | `str` | `"high"` (≥2.0) / `"medium"` (1–2) / `"low"` (<1) / `"unknown"` |
| `identifiers` | `dict[str, list[str]]` | Vocabulary → IDs, e.g. `{"CHEBI": ["15971"]}` |
| `hmdb_hint` | `str \| None` | HMDB hint passed in the request |
| `error` | `str \| None` | Error message if mapping failed |
result.ids_for("CHEBI") # ['15971']
result.ids_for("refmet_id") # ['RM0129894']
result.ids_for("PUBCHEM.COMPOUND") # []
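Note that `chosen_kg_id` is a full CURIE (`CHEBI:15971`) while `ids_for("CHEBI")` returns bare local IDs (`'15971'`). When the two need to be compared, a CURIE can be split on its first colon. `split_curie` below is a hypothetical helper written for this illustration, not a ddharmon function.

```python
def split_curie(curie):
    """Split 'PREFIX:local_id' into (prefix, local_id) at the first colon."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id


prefix, local_id = split_curie("CHEBI:15971")
print(prefix, local_id)  # CHEBI 15971
```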
Confidence tiers
| Score | Tier | Recommended action |
|---|---|---|
| ≥ 2.0 | `high` | Accept without review |
| 1.0 to < 2.0 | `medium` | Quick sanity check |
| < 1.0 | `low` | Manual review recommended |
| `None` | `unknown` | No score returned (e.g. HMDB-hint resolved) |
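The thresholds in the table translate directly into a small function, which is handy when re-tiering scores that were stored without their tier. This is a sketch of the documented cutoffs, not the library's code; `tier_for` is a hypothetical name, and treating a score of exactly 1.0 as medium follows from the table's "< 1.0" row for low.

```python
def tier_for(score):
    """Map a confidence score (or None) to the tier names used in the table."""
    if score is None:
        return "unknown"
    if score >= 2.0:
        return "high"
    if score >= 1.0:
        return "medium"
    return "low"


print(tier_for(2.489))  # high
```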
Error handling
from ddharmon import (
BioMapperError, # base class
BioMapperAuthError, # 401/403 — bad API key
BioMapperRateLimitError, # 429 — throttled
BioMapperServerError, # 5xx
BioMapperTimeoutError, # request timeout
BioMapperConfigError, # missing API key / bad config
)
from ddharmon import map_entity
try:
    result = map_entity("Glucose")
except BioMapperRateLimitError as e:
    print(f"Throttled. Retry after: {e.retry_after}s")
except BioMapperAuthError:
    print("Check your BIOMAPPER_API_KEY")
In batch mode (map_entities), per-record errors are caught and returned as
MappingResult(error=...) rather than aborting the batch.
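For single lookups, the `retry_after` value exposed by the rate-limit error can drive a simple retry loop. The sketch below is self-contained: `RateLimited` is a stand-in for `BioMapperRateLimitError`, `call_with_retry` is a hypothetical helper, and falling back to a default wait when `retry_after` is `None` is an assumption about how the attribute behaves.

```python
import time


class RateLimited(Exception):
    """Stand-in for BioMapperRateLimitError; carries a retry_after hint."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after


def call_with_retry(func, attempts=3, default_wait=1.0, sleep=time.sleep):
    """Call `func`, waiting and retrying when it raises RateLimited.

    Re-raises the last error once `attempts` calls have been made.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except RateLimited as exc:
            if attempt == attempts:
                raise
            wait = exc.retry_after if exc.retry_after is not None else default_wait
            sleep(wait)
```

With the real client, the call would look roughly like `call_with_retry(lambda: map_entity("Glucose"))` after swapping `RateLimited` for `BioMapperRateLimitError`.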
Development
git clone https://github.com/trentleslie/ddharmon
cd ddharmon
poetry install --with dev --extras all
make check # format → lint → type-check → test
make test # tests only
make coverage # HTML coverage report
License
MIT — see LICENSE.
Related
- BioMapper2 API: https://biomapper.expertintheloop.io