This project contains standardized tools to use LLMs in research studies for improving patient care.

Project description

project_ryland

Description

This project develops standardized tools to use LLMs in research studies for improving patient care. The two main features are:

  1. Provide a user-friendly access point to GPT4DFCI, especially for users without a computing background
  2. (In development) Offer a set of tools to process OncDRS data and prepare it for use in LLM research

History

This project was conceived in fall 2025 when Justin Vinh noticed that no modular, user-friendly package existed at the Dana-Farber Cancer Institute in Boston, MA, to let users take advantage of the newly offered GPT4DFCI. GPT4DFCI is the HIPAA-compliant large language model (LLM) interface offered to researchers, and its associated API is a powerful tool for those who make use of it. To fill this gap, he developed this project in collaboration with Thomas Sounack and with the support of the Lindvall Lab.

RYLAND stands for "Research sYstem for LLM-based Analytics of Novel Data." Ryland is the protagonist of Justin's favorite book Project Hail Mary by Andy Weir.

Project Organization

project_ryland/
├── .github/
│   └── workflows/
│       └── publish.yml
├── .gitignore
├── CHANGELOG.md
├── LICENSE
├── project_ryland/
│   ├── __init__.py
│   └── llm_utils/
│       ├── __init__.py
│       ├── llm_config.py
│       └── llm_generation_utils.py
├── pyproject.toml
└── README.md

Features of the LLM Utilities Package

  1. Enables user-friendly use of the GPT4DFCI API
  2. Enables the use of a prompt library to keep track of prompts and associated metadata

Instructions for Use

1. Installing the API

  1. Ensure that you are on the DFCI network or running the VPN client.
  2. Follow the instructions on the Azure website to install the Azure CLI tool. This will be necessary to enable the API for GPT4DFCI.
  3. Once installed, run this command in Terminal (macOS) or Command Prompt (Windows):
az login --allow-no-subscriptions
  4. The command opens a window prompting you to log in to your account. Log in.
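
The login steps above can be checked from Python before any API call is attempted. The following is a minimal sketch using only the standard library; the helper names `find_az` and `is_logged_in` are hypothetical and not part of Project Ryland. It assumes that `az account show` exits successfully when a login session exists, which may not hold for every account setup.

```python
import shutil
import subprocess

# The login command from step 3, split into arguments.
LOGIN_COMMAND = ["az", "login", "--allow-no-subscriptions"]


def find_az():
    """Return the full path to the Azure CLI executable, or None if it is not on PATH."""
    return shutil.which("az")


def is_logged_in(timeout_seconds=60):
    """Return True if `az account show` succeeds (assumed to indicate a login session)."""
    az_path = find_az()
    if az_path is None:
        return False
    try:
        result = subprocess.run(
            [az_path, "account", "show"],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # Treat a hung CLI call as "not logged in" rather than crashing.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if find_az() is None:
        print("Azure CLI not found on PATH; install it first (step 2).")
    elif is_logged_in():
        print("Azure CLI session found.")
    else:
        print("No active session; run:", " ".join(LOGIN_COMMAND))
```

Running this before a long job gives an early, readable message if the CLI is missing or the session has expired.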

2. Using Project Ryland

Note: A copy-paste version of the script is available at the end. Variable definitions can also be found at the end after the example script.

  1. If this is your first time using Project Ryland, you must install it into your environment. In Terminal or Command Prompt, run the following:
pip install project_ryland

  2. Import llm_generation_utils from Project Ryland

from project_ryland.llm_utils import llm_generation_utils as llm
  3. In your Jupyter notebook or Python script, define your endpoint and entra_scope. The endpoint is user-specific, while the entra_scope is the same for all users (the current default for DFCI is shown below). These values should have been provided when you were granted GPT4DFCI API access.
  4. Specify the LLM model that you will use to run your prompts.
ENDPOINT = "https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ENTRA_SCOPE = "https://cognitiveservices.azure.com/.default"
model_name = "gpt-5"
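
Because the endpoint is user-specific, it can be convenient to keep it out of shared notebooks. The sketch below reads the three settings from environment variables instead of hard-coding them. The variable names `GPT4DFCI_ENDPOINT`, `GPT4DFCI_ENTRA_SCOPE`, and `GPT4DFCI_MODEL` and the helper `load_settings` are assumptions made for illustration; Project Ryland does not define them.

```python
import os
from urllib.parse import urlparse

# Default scope copied from the example above.
DEFAULT_ENTRA_SCOPE = "https://cognitiveservices.azure.com/.default"


def load_settings(env=None):
    """Build the endpoint, scope, and model settings from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    endpoint = env.get("GPT4DFCI_ENDPOINT", "").strip()  # hypothetical variable name
    if not endpoint:
        raise ValueError("GPT4DFCI_ENDPOINT is not set")
    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"Endpoint does not look like an https URL: {endpoint!r}")
    return {
        "endpoint": endpoint,
        "entra_scope": env.get("GPT4DFCI_ENTRA_SCOPE", DEFAULT_ENTRA_SCOPE),
        "model_name": env.get("GPT4DFCI_MODEL", "gpt-5"),
    }


# Example with an explicit mapping so the sketch runs without any real credentials.
example = load_settings({"GPT4DFCI_ENDPOINT": "https://example.invalid/"})
print(example["model_name"])
```

The returned values can then be assigned to ENDPOINT, ENTRA_SCOPE, and model_name as in the example above.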
  5. Call llm.LLM_wrapper to initialize the API.
    • Note that this only has to be done once per run. You can then call the API multiple times in that run.
LLM_wrapper = llm.LLM_wrapper(
    model_name,
    endpoint=ENDPOINT,
    entra_scope=ENTRA_SCOPE,
)
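
One way to honor the "initialize once per run" note in a larger script is to cache the initialized object and reuse it. This is a generic sketch of that pattern, not part of Project Ryland: `make_cached_factory` and `build_client` are hypothetical names, and the placeholder dictionary stands in for the object returned by the `llm.LLM_wrapper(...)` call.

```python
from functools import lru_cache


def make_cached_factory(build):
    """Return a zero-argument function that calls `build` once and reuses its result."""

    @lru_cache(maxsize=1)
    def get():
        return build()

    return get


calls = []


def build_client():
    # Stand-in for the real initialization; records each time it actually runs.
    calls.append("init")
    return {"model_name": "gpt-5"}  # placeholder object, not a real client


get_client = make_cached_factory(build_client)
first = get_client()
second = get_client()
print(first is second, len(calls))  # → True 1
```

Any function in the script can call `get_client()` without triggering a second initialization.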
  6. Declare the path to your input CSV file.
  7. Declare the path to your LLM Prompt Library (the prompt gallery) if you will be using that feature. A template prompt gallery is available for download from the project's GitHub repository. Place the gallery in the same directory as your main script. Using the gallery is highly recommended for tracking prompt texts, prompt structures, and associated metadata.

input_file = 'pathology_llm_tests.csv'
gallery_path = "llm_prompt_gallery"
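
Before sending a file to the API, it may help to confirm that the CSV exists and contains the column that holds the text. The sketch below uses only the standard library; `check_input_csv` is a hypothetical helper, not a Project Ryland function, and it assumes a UTF-8 encoded file with a header row.

```python
import csv
from pathlib import Path


def check_input_csv(csv_path, text_column):
    """Return the number of rows with non-empty text in `text_column`.

    Raises FileNotFoundError if the file is missing and ValueError if the
    column is absent from the header row.
    """
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Input file not found: {path}")
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        columns = reader.fieldnames or []
        if text_column not in columns:
            raise ValueError(f"Column {text_column!r} not found; available: {columns}")
        # Short rows yield None for missing fields, so fall back to an empty string.
        return sum(1 for row in reader if (row[text_column] or "").strip())


# Example usage (commented out because it needs a real file on disk):
# n_rows = check_input_csv("pathology_llm_tests.csv", "SECTION_TEXT")
```

A count of zero, or a ValueError, signals a problem with the input before any API calls are made.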

llm_prompt_gallery

# `ps` (the module that defines the AssessNanoPathology format class) and
# `gwas_prompt_variables_v1` must be imported or defined before this call;
# their definitions are not shown in this section.
dfp_new = LLM_wrapper.process_text_data(
    # Essential to specify
    input_file_path=input_file,
    text_column="SECTION_TEXT",
    format_class=ps.AssessNanoPathology,
    use_prompt_gallery=False,

    # Specify if using the prompt gallery, else put None
    prompt_gallery_path=gallery_path,
    prompt_to_get="gwas_symptoms_prompt_v1",
    user_prompt_vars=gwas_prompt_variables_v1,

    # Specify if NOT using the prompt gallery, else put None
    prompt_text="Give me a hello",

    # Optional to specify
    output_dir="output_tests",
    flatten=True,
    sample_mode=False,
    resume=True,
    keep_checkpoints=False,
    save_every=10,
)
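
The optional arguments `resume` and `save_every` suggest a checkpoint-and-resume pattern. The sketch below illustrates that general pattern with the standard library only. It is not Project Ryland's implementation, whose internals are not described here; the function name `process_with_checkpoints`, the JSON checkpoint format, and the `worker` callable are all assumptions made for illustration.

```python
import json
from pathlib import Path


def process_with_checkpoints(texts, worker, checkpoint_path, save_every=10, resume=True):
    """Apply `worker` to each text, saving partial results every `save_every` items.

    With resume=True, results already stored in the checkpoint file are reused,
    so an interrupted run continues where it stopped. `worker` must return
    JSON-serializable values.
    """
    checkpoint = Path(checkpoint_path)
    results = {}
    if resume and checkpoint.is_file():
        results = json.loads(checkpoint.read_text(encoding="utf-8"))
    for index, text in enumerate(texts):
        key = str(index)  # JSON object keys are always strings
        if key in results:
            continue  # already processed in an earlier run
        results[key] = worker(text)
        if len(results) % save_every == 0:
            checkpoint.write_text(json.dumps(results), encoding="utf-8")
    # Final save so the checkpoint reflects the completed run.
    checkpoint.write_text(json.dumps(results), encoding="utf-8")
    return [results[str(i)] for i in range(len(texts))]
```

In this sketch, `worker` would be the function that sends one text to the LLM; a second call with the same checkpoint file skips every row that already has a stored result.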

Download files

Source Distribution

project_ryland-2.1.4.tar.gz (112.3 kB)

Built Distribution

project_ryland-2.1.4-py3-none-any.whl (115.5 kB, Python 3)

File details

Details for the file project_ryland-2.1.4.tar.gz.

  • Size: 112.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7
  • Publisher: publish.yml on justin-vinh/project_ryland
  • SHA256: 17b2b32e0bbea16a24935e4556f469aed8eceec46948edea51905e346fac0b2a
  • MD5: c37370cd6598c24e6f8b821ea61bb85c
  • BLAKE2b-256: 504044f48c1aefcbdbcf78d7877fa8851ba4e9c8468e913d232eae352e3158ed

Details for the file project_ryland-2.1.4-py3-none-any.whl.

  • Size: 115.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7
  • Publisher: publish.yml on justin-vinh/project_ryland
  • SHA256: f432fa606dd5bc0245c8b514492825f127b4a837c70eea80b5cc3291bb88d015
  • MD5: 3a87579204e4bff7dde8b76039be6a47
  • BLAKE2b-256: 62b8f2222412e1d05c210929b6ff4a1059d0110b5909f720f415cb9ff9baa03c