A tool to convert activities into biomedical concepts
USDM Biomedical Concept Mapper
A Python tool for automatically mapping activities in USDM (Unified Study Data Model) files to CDISC biomedical concepts using AI-powered semantic search and LLM-based matching.
Table of Contents
- What does this project do?
- Installation
- Quick Start
- Configuration
- Command Line Usage
- Advanced Usage
- Output Examples
- Development
- License
- Contributing
- Support
What does this project do?
The USDM Biomedical Concept Mapper helps identify biomedical concepts for activities in USDM files:
- Automated Mapping: Maps activities from USDM files to standardized biomedical concepts
- AI-Powered Search: Uses Large Language Models (LLMs) to find the best matching CDISC concepts for given activities
- CDISC Integration: Utilizes the latest CDISC biomedical concepts and SDTM dataset specializations
- Batch Processing: Processes entire USDM study files and generates mapped outputs
Key Features
- Multiple Search Methods: Supports both LLM-powered exact matching and local index searching
- Configurable AI Models: Supports different commercial or open-source LLMs
- Command Line Interface: Easy-to-use CLI for batch processing and individual concept searches
Installation
Prerequisites
- Python 3.13 or higher
- Access to an LLM (commercial or open-source)
Install from PyPI
pip install usdm-bc-mapper
Quick Start
- Install the package:
pip install usdm-bc-mapper
- Create a config file (config.yaml) in your working directory:
llm_api_key: "your-api-key-here"
llm_model: "gpt-5-mini"
- Run the mapper on your USDM file:
bcm usdm your_study.json
- Get help with any command:
bcm --help
bcm usdm --help
How to use the tools
Configuration
Before using the tool, you need to configure your settings. Create a config.yaml file in your working directory (the same directory where your USDM JSON file is located):
# config.yaml
llm_api_key: "your-api-key-here"
llm_model: "gpt-5-mini" # or your preferred model
# Optional Configurations
llm_base_url: "https://api.openai.com/v1" # or your custom endpoint
max_ai_lookup_attempts: 7 # max retries for AI lookup
data_path: "path/to/cdisc/data" # path to CDISC data files and system prompt for LLMs
data_search_cols: # columns to search in CDISC data
- "short_name"
- "bc_categories"
- "synonyms"
- "definition"
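To keep the API key out of version control, the file can be generated from an environment variable instead of being edited by hand. The following is a minimal sketch and not part of usdm-bc-mapper: `render_config` and the `LLM_API_KEY` variable name are hypothetical, and only the keys `llm_api_key`, `llm_model`, and `llm_base_url` come from the configuration above. It relies on JSON-quoted strings also being valid YAML scalars, so no YAML library is needed.

```python
import json
import os


def render_config(api_key, model="gpt-5-mini", base_url=None):
    """Return config.yaml text using only keys documented in this README.

    Hypothetical helper for illustration; not part of usdm-bc-mapper.
    """
    # json.dumps produces double-quoted strings, which YAML also accepts.
    lines = [
        f"llm_api_key: {json.dumps(api_key)}",
        f"llm_model: {json.dumps(model)}",
    ]
    if base_url is not None:
        lines.append(f"llm_base_url: {json.dumps(base_url)}")
    return "\n".join(lines) + "\n"


def config_from_env(var_name="LLM_API_KEY"):
    """Render the config with the key read from an assumed environment variable."""
    return render_config(os.environ.get(var_name, "your-api-key-here"))
```

Writing the returned text to `config.yaml`, for example with `pathlib.Path("config.yaml").write_text(config_from_env())`, should give a file equivalent to the minimal example above.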
Command Line Usage
The tool provides three main commands through the bcm CLI. Use bcm --help or bcm <command> --help to see detailed documentation for each command.
1. Map USDM File Biomedical Concepts
Map all activities in a USDM file to CDISC biomedical concepts:
bcm usdm path/to/your/usdm_file.json --config config.yaml
With custom output file:
bcm usdm path/to/your/usdm_file.json --output mapped_results.json --config config.yaml
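When several study files need mapping, the same command can be driven from a script. This is a sketch under assumptions: `build_bcm_command` and `map_all` are hypothetical helper names, the `_mapped.json` output naming is an arbitrary choice, and only `bcm usdm` arguments that appear in this README (`--output`, `--config`, `--show-logs`) are used.

```python
import subprocess
from pathlib import Path


def build_bcm_command(usdm_file, config="config.yaml", output=None, show_logs=False):
    """Build the argument list for `bcm usdm` using flags from this README."""
    cmd = ["bcm", "usdm", str(usdm_file)]
    if output is not None:
        cmd += ["--output", str(output)]
    cmd += ["--config", str(config)]
    if show_logs:
        cmd.append("--show-logs")
    return cmd


def map_all(study_dir, config="config.yaml"):
    """Run `bcm usdm` once per JSON file in study_dir (requires bcm on PATH)."""
    for usdm_file in sorted(Path(study_dir).glob("*.json")):
        # Skip files produced by an earlier run of this script.
        if usdm_file.stem.endswith("_mapped"):
            continue
        # Hypothetical naming scheme: study.json -> study_mapped.json
        output = usdm_file.with_name(usdm_file.stem + "_mapped.json")
        # check=True raises CalledProcessError if bcm exits with an error.
        subprocess.run(build_bcm_command(usdm_file, config, output), check=True)
```

Keeping command construction separate from execution makes it easy to print or log the exact command line before anything is run.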
2. Find Individual Biomedical Concept
Find the CDISC match for a specific biomedical concept using an LLM (provides exact matching):
bcm find-bc-cdisc "diabetes mellitus" --config config.yaml
3. Search CDISC Biomedical Concepts
Search the local CDISC index for matching concepts, without calling an LLM:
bcm search-bc-cdisc "blood pressure" --config config.yaml
Search with custom number of results:
bcm search-bc-cdisc "blood pressure" --k 20 --config config.yaml
Note: The main difference between find-bc-cdisc and search-bc-cdisc is that find-bc-cdisc uses an LLM to find exact matches, while search-bc-cdisc looks for matches in the local index.
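To make the distinction concrete, here is a toy illustration of what a local top-k search over the configured `data_search_cols` could look like. It is a sketch under loud assumptions, not the tool's actual index: the token-overlap scoring, the `search_local` function, and the sample records are all invented for illustration.

```python
def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return set(text.lower().split())


def search_local(query, records, cols, k=10):
    """Rank records by how many query tokens appear in the chosen columns.

    Toy stand-in for a local index; the real tool may rank differently.
    """
    q = tokenize(query)
    scored = []
    for rec in records:
        # Merge the searchable columns into one set of tokens.
        tokens = set()
        for col in cols:
            tokens |= tokenize(str(rec.get(col, "")))
        score = len(q & tokens)
        if score > 0:
            scored.append((score, rec["short_name"]))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


# Invented sample records using column names from config.yaml.
RECORDS = [
    {"short_name": "Systolic Blood Pressure", "synonyms": "SBP"},
    {"short_name": "Heart Rate", "synonyms": "pulse"},
    {"short_name": "Diastolic Blood Pressure", "synonyms": "DBP"},
]
```

In this sketch, `k` plays the same role as the `--k` option: it caps how many ranked candidates come back, and no LLM is involved in choosing among them.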
Advanced Usage
Enable Debug Logging
Add the --show-logs flag to any command to see detailed processing information:
bcm usdm path/to/file.json --config config.yaml --show-logs
Output Examples
USDM Mapping Output
When using bcm usdm, the tool writes the original USDM data together with the mapped CDISC biomedical concepts, including confidence scores and reasoning, as structured JSON.
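The exact output schema is not documented here, so a schema-agnostic way to inspect a result is to walk the JSON and collect every value stored under a given key. The sketch below assumes nothing about the file beyond it being valid JSON. `collect_key` and `load_and_collect` are hypothetical helpers, and any key name passed to them is something to check against your own output.

```python
import json


def collect_key(node, key):
    """Return every value stored under `key` at any depth of a JSON tree."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            # Keep descending: the same key may also occur deeper down.
            found.extend(collect_key(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_key(item, key))
    return found


def load_and_collect(path, key):
    """Load a JSON file and collect all values under `key`."""
    with open(path, encoding="utf-8") as fh:
        return collect_key(json.load(fh), key)
```

For example, `load_and_collect("mapped_results.json", "biomedicalConceptIds")` would gather the concept references attached to activities, assuming the output keeps that USDM attribute name.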
Individual Concept Search Output
When using bcm find-bc-cdisc or bcm search-bc-cdisc, the tool returns matched CDISC concept details with relevance scores.
Development
Development Setup
Clone the project:
git clone https://github.com/AI-LENS/usdm-bc-mapper.git
Go to the project directory:
cd usdm-bc-mapper
Install dependencies:
uv sync --group dev
Running Tests
pytest
Pre-commit Hooks
Install pre-commit hooks for code quality:
pre-commit install
pre-commit run --all-files
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Support
For questions or issues, please open an issue on the GitHub repository.