A tool to convert activities into biomedical concepts

USDM Biomedical Concept Mapper

License: MIT

A Python tool for automatically mapping activities in USDM (Unified Study Data Model) files to CDISC biomedical concepts using AI-powered semantic search and LLM-based matching.

What does this project do?

The USDM Biomedical Concept Mapper helps identify biomedical concepts for activities in USDM files:

  • Automated Mapping: Maps activities from USDM files to standardized biomedical concepts
  • AI-Powered Search: Uses Large Language Models (LLMs) to find the best matching CDISC concepts for given activities
  • CDISC Integration: Utilizes the latest CDISC biomedical concepts and SDTM dataset specializations
  • Batch Processing: Processes entire USDM study files and generates mapped outputs

Key Features

  • Multiple Search Methods: Supports both LLM-powered exact matching and local index searching
  • Configurable AI Models: Supports different commercial or open-source LLMs
  • Command Line Interface: Easy-to-use CLI for batch processing and individual concept searches

Installation

Prerequisites

  • Python 3.13 or higher
  • Access to an LLM (commercial or open-source)

Install from PyPI

pip install usdm-bc-mapper

Quick Start

  1. Install the package:

    pip install usdm-bc-mapper
    
  2. Create a config file (config.yaml) in your working directory:

    llm_api_key: "your-api-key-here"
    llm_model: "gpt-5-mini"
    
  3. Run the mapper on your USDM file:

    bcm usdm your_study.json
    
  4. Get help with any command:

    bcm --help
    bcm usdm --help
    

How to use the tool

Configuration

Before using the tool, you need to configure your settings. Create a config.yaml file in your working directory (the same directory where your USDM JSON file is located):

# config.yaml
llm_api_key: "your-api-key-here"
llm_model: "gpt-5-mini" # or your preferred model

# Optional Configurations
llm_base_url: "https://api.openai.com/v1" # or your custom endpoint
max_ai_lookup_attempts: 7 # max retries for AI lookup
data_path: "path/to/cdisc/data" # path to CDISC data files and system prompt for LLMs
data_search_cols: # columns to search in CDISC data
  - "short_name"
  - "bc_categories"
  - "synonyms"
  - "definition"
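
Before running the tool, it can help to sanity-check the settings. The following is a minimal sketch, not part of the package: `check_config` is a hypothetical helper, and it assumes that `llm_api_key` and `llm_model` are the required keys (because the Quick Start config contains only those two) and that the optional keys have the types shown in the example above. It works on a plain dict, which you could obtain from config.yaml with any YAML parser.

```python
from typing import Any

# Assumption: treated as required because the Quick Start config lists only these.
REQUIRED_KEYS = ("llm_api_key", "llm_model")

# Assumption: expected types inferred from the example values in this README.
OPTIONAL_KEY_TYPES = {
    "llm_base_url": str,
    "max_ai_lookup_attempts": int,
    "data_path": str,
    "data_search_cols": list,
}


def check_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty required key: {key}")
    for key, expected in OPTIONAL_KEY_TYPES.items():
        if key in config and not isinstance(config[key], expected):
            problems.append(f"{key} should be of type {expected.__name__}")
    # Flag keys that do not appear in the README example; they may be typos.
    known = set(REQUIRED_KEYS) | set(OPTIONAL_KEY_TYPES)
    for key in sorted(set(config) - known):
        problems.append(f"key not shown in the README example: {key}")
    return problems


example = {
    "llm_api_key": "your-api-key-here",
    "llm_model": "gpt-5-mini",
    "max_ai_lookup_attempts": 7,
    "data_search_cols": ["short_name", "bc_categories", "synonyms", "definition"],
}
print(check_config(example))
```

The check only covers key presence and basic types; whether the API key or model name is actually accepted is decided by the LLM provider at run time.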

Command Line Usage

The tool provides three main commands through the bcm CLI. Use bcm --help or bcm <command> --help to see detailed documentation for each command.

1. Map USDM File Biomedical Concepts

Map all activities in a USDM file to CDISC biomedical concepts:

bcm usdm path/to/your/usdm_file.json --config config.yaml

With custom output file:

bcm usdm path/to/your/usdm_file.json --output mapped_results.json --config config.yaml
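
If you have several study files, the command above can be scripted. The sketch below is one possible wrapper, not something the package provides: `build_usdm_command` and `map_directory` are hypothetical helper names, and the `_mapped.json` output naming is an arbitrary choice. It uses only the `--output` and `--config` flags shown above and assumes `bcm` is on your PATH.

```python
import subprocess
from pathlib import Path


def build_usdm_command(usdm_file, config="config.yaml", output=None):
    """Assemble the argument list for `bcm usdm` using flags shown in this README."""
    command = ["bcm", "usdm", str(usdm_file)]
    if output is not None:
        command += ["--output", str(output)]
    command += ["--config", str(config)]
    return command


def map_directory(study_dir, out_dir, config="config.yaml"):
    """Run `bcm usdm` once per *.json file in study_dir (requires bcm on PATH)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for usdm_file in sorted(Path(study_dir).glob("*.json")):
        # Hypothetical naming scheme for the mapped output files.
        output = out_dir / f"{usdm_file.stem}_mapped.json"
        # check=True raises CalledProcessError if bcm exits with a non-zero status.
        subprocess.run(build_usdm_command(usdm_file, config, output), check=True)


# Building the command does not run anything, so it is safe to inspect first.
print(build_usdm_command("study.json", output="mapped_results.json"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.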

2. Find Individual Biomedical Concept

Find the CDISC match for a specific biomedical concept using an LLM (provides exact matching):

bcm find-bc-cdisc "diabetes mellitus" --config config.yaml

3. Search CDISC Biomedical Concepts

Search the local CDISC index for matching concepts, without calling an LLM:

bcm search-bc-cdisc "blood pressure" --config config.yaml

Search with custom number of results:

bcm search-bc-cdisc "blood pressure" --k 20 --config config.yaml

Note: The main difference between find-bc-cdisc and search-bc-cdisc is that find-bc-cdisc uses an LLM to find exact matches, while search-bc-cdisc looks for matches in the local index.
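
To give a feel for what a local, LLM-free search with a result limit looks like, here is a toy sketch. It is not the package's actual index or scoring: `search_concepts` is a hypothetical function that ranks records by simple token overlap, the records are invented for illustration and are not real CDISC entries, and the column names are borrowed from the `data_search_cols` example in the configuration section.

```python
def search_concepts(query, concepts, columns, k=5):
    """Return up to k short names, ranked by query-token overlap with the columns.

    The default k is arbitrary and is not the tool's default.
    """
    tokens = set(query.lower().split())
    scored = []
    for concept in concepts:
        # Join the searchable columns into one lowercase bag of words.
        text = " ".join(str(concept.get(col, "")) for col in columns).lower()
        words = set(text.replace(",", " ").split())
        score = len(tokens & words)
        if score > 0:
            scored.append((score, concept["short_name"]))
    # Highest score first; ties are broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:k]]


# Invented records for illustration only.
concepts = [
    {"short_name": "Systolic Blood Pressure", "synonyms": "SBP",
     "definition": "Maximum arterial pressure during a heartbeat"},
    {"short_name": "Diastolic Blood Pressure", "synonyms": "DBP",
     "definition": "Minimum arterial pressure between heartbeats"},
    {"short_name": "Heart Rate", "synonyms": "pulse",
     "definition": "Number of heartbeats per minute"},
    {"short_name": "Blood Glucose", "synonyms": "glucose",
     "definition": "Concentration of glucose in the blood"},
]
columns = ["short_name", "synonyms", "definition"]

print(search_concepts("blood pressure", concepts, columns, k=2))
# → ['Diastolic Blood Pressure', 'Systolic Blood Pressure']
```

A ranked list like this returns several candidates for you to choose from, whereas find-bc-cdisc asks the LLM to settle on a single exact match.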

Advanced Usage

Enable Debug Logging

Add the --show-logs flag to any command to see detailed processing information:

bcm usdm path/to/file.json --config config.yaml --show-logs

Output Examples

USDM Mapping Output

When using bcm usdm, the tool outputs the original USDM data with mapped CDISC biomedical concepts, including confidence scores and reasoning in structured JSON format.

Individual Concept Search Output

When using bcm find-bc-cdisc or bcm search-bc-cdisc, the tool returns matched CDISC concept details with relevance scores.

Development

Development Setup

Clone the project:

git clone https://github.com/AI-LENS/usdm-bc-mapper.git

Go to the project directory:

cd usdm-bc-mapper

Install dependencies:

uv sync --group dev

Running Tests

pytest

Pre-commit Hooks

Install pre-commit hooks for code quality:

pre-commit install
pre-commit run --all-files

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For questions or issues, please open an issue on the GitHub repository.
