This project contains standardized tools to use LLMs in research studies for improving patient care.

project_ryland

Description

This project develops standardized tools to use LLMs in research studies for improving patient care. The two main features are:

  1. Provide a user-friendly access point for GPT4DFCI, especially for those without a computing background
  2. (In development) Offer a set of tools to process OncDRS data and prepare it for use in LLM research

History

This project was conceived in fall 2025, when Justin Vinh noticed that the Dana-Farber Cancer Institute (DFCI) in Boston, MA, had no modular, user-friendly package for taking advantage of the newly offered GPT4DFCI. GPT4DFCI is the HIPAA-compliant large language model (LLM) interface offered to researchers, and its associated API can be powerful when put to use. He developed this project in collaboration with Thomas Sounack, with the support of the Lindvall Lab, to fill this gap.

RYLAND stands for "Research sYstem for LLM-based Analytics of Novel Data." Ryland is the protagonist of Justin's favorite book, Project Hail Mary by Andy Weir.

Project Organization

project_ryland/
├── .github/
│   └── workflows/
│       └── publish.yml
├── .gitignore
├── CHANGELOG.md
├── LICENSE
├── project_ryland/
│   ├── __init__.py
│   └── llm_utils/
│       ├── __init__.py
│       ├── llm_config.py
│       └── llm_generation_utils.py
├── pyproject.toml
└── README.md

Features of the LLM Utilities Package

  1. Enables user-friendly use of the GPT4DFCI API
  2. Enables the use of a prompt library to keep track of prompts and associated metadata

Instructions for Use

1. Installing the API

  1. Ensure that you are on the DFCI network or running the VPN client.
  2. Follow the instructions on the Azure website to install the Azure CLI tool. This will be necessary to enable the API for GPT4DFCI.
  3. Once installed, run this command in Terminal (macOS) or Command Prompt (Windows):
az login --allow-no-subscriptions
  4. The command opens a window asking you to log in to your account. Log in.
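
Before moving on, it can help to confirm that the Azure CLI is installed and that a login is active. The following is a minimal sketch, not part of Project Ryland: the helper name azure_cli_status is hypothetical, and it assumes that `az account show` exits with a non-zero status when no account is signed in.

```python
import shutil
import subprocess


def azure_cli_status(az_executable: str = "az") -> str:
    """Return 'missing', 'logged_out', or 'logged_in' for the Azure CLI.

    Hypothetical helper for illustration; not part of Project Ryland.
    """
    # Locate the CLI on PATH; on Windows this also resolves az.cmd.
    az_path = shutil.which(az_executable)
    if az_path is None:
        return "missing"

    # Assumption: `az account show` fails (non-zero exit) when nobody
    # is logged in, and succeeds once `az login` has completed.
    result = subprocess.run(
        [az_path, "account", "show", "--output", "none"],
        capture_output=True,
        text=True,
    )
    return "logged_in" if result.returncode == 0 else "logged_out"
```

Calling azure_cli_status() in a notebook cell returns one of the three strings. A result of "logged_out" suggests rerunning the az login command above.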

2. Using Project Ryland

Note: A copy-paste version of the script is available at the end. Variable definitions can also be found at the end after the example script.

  1. If this is your first time using Project Ryland, you must install it into your environment. In Terminal or Command Prompt, run the following:

pip install project_ryland

  2. Import llm_generation_utils from Project Ryland:

from project_ryland.llm_utils import llm_generation_utils as llm
  3. In your Jupyter notebook or Python script, define your endpoint and entra_scope. The endpoint is user-specific, while the entra_scope is the same for all users (the current default for DFCI is shown below). These values should have been provided when you were granted GPT4DFCI API access.
  4. Specify the LLM model that you will use to run your prompts.
ENDPOINT = "https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ENTRA_SCOPE = "https://cognitiveservices.azure.com/.default"
model_name = "gpt-5"
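
Because the endpoint is user-specific, some users may prefer not to hardcode it in a shared notebook. One possible approach is to read it from an environment variable. This is a minimal sketch under stated assumptions: the variable name GPT4DFCI_ENDPOINT and the helper load_endpoint are hypothetical and are not defined by Project Ryland.

```python
import os

# Same for all DFCI users, as given above.
ENTRA_SCOPE = "https://cognitiveservices.azure.com/.default"


def load_endpoint(env_var: str = "GPT4DFCI_ENDPOINT") -> str:
    """Read the user-specific endpoint from an environment variable.

    The variable name is a hypothetical choice for this sketch.
    """
    endpoint = os.environ.get(env_var, "").strip()
    if not endpoint:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your GPT4DFCI endpoint."
        )
    if not endpoint.startswith("https://"):
        raise ValueError(f"Endpoint should start with https://, got: {endpoint!r}")
    return endpoint
```

With the variable set in your shell, ENDPOINT = load_endpoint() can replace the hardcoded assignment, and the rest of the script stays the same.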
  5. Run the LLM_wrapper function to initialize the API.
    • Note that this only has to be done once per run. You can call the API multiple times in one run.
LLM_wrapper = llm.LLM_wrapper(
    model_name,
    endpoint=ENDPOINT,
    entra_scope=ENTRA_SCOPE,
)
  6. Declare the path to your input CSV file.
  7. Declare the path to your LLM prompt gallery (the prompt library) if you will be using that feature. A template prompt gallery is available for download from the project's GitHub repository. Place the gallery in the same directory as your main script. Using the gallery is highly recommended for tracking prompt texts, prompt structures, and associated metadata.

input_file = 'pathology_llm_tests.csv'
gallery_path = "llm_prompt_gallery"
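
Since the processing call depends on a named text column in the input CSV, a quick check of the file before sending anything to the API can catch a wrong path or column name early. The sketch below uses only the Python standard library. The helper check_input_csv is hypothetical and is not part of Project Ryland.

```python
import csv
from pathlib import Path


def check_input_csv(csv_path: str, text_column: str) -> int:
    """Return the number of rows with non-empty text in `text_column`.

    Hypothetical pre-flight check; raises if the file or column is missing.
    """
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Input CSV not found: {path}")

    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if text_column not in (reader.fieldnames or []):
            raise KeyError(
                f"Column {text_column!r} not found; available: {reader.fieldnames}"
            )
        # Count rows whose text cell is present and not just whitespace.
        return sum(1 for row in reader if (row[text_column] or "").strip())
```

For example, check_input_csv(input_file, "SECTION_TEXT") reports how many rows contain text to process.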

# Note: `ps` (the module that provides the format class) and
# `gwas_prompt_variables_v1` must be defined before this call;
# their definitions are not shown in this section.
dfp_new = LLM_wrapper.process_text_data(
    # Essential to specify
    input_file_path=input_file,
    text_column="SECTION_TEXT",
    format_class=ps.AssessNanoPathology,
    use_prompt_gallery=False,

    # Specify if using the prompt gallery, else put None
    prompt_gallery_path=gallery_path,
    prompt_to_get="gwas_symptoms_prompt_v1",
    user_prompt_vars=gwas_prompt_variables_v1,

    # Specify if NOT using the prompt gallery, else put None
    prompt_text="Give me a hello",

    # Optional to specify
    output_dir="output_tests",
    flatten=True,
    sample_mode=False,
    resume=True,
    keep_checkpoints=False,
    save_every=10,
)
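
The optional arguments resume, save_every, and keep_checkpoints suggest that long runs are saved periodically so an interrupted run can pick up where it left off. The sketch below illustrates that general checkpoint-and-resume pattern. It is not Project Ryland's implementation, the package's actual behavior may differ, and every name in it (process_with_checkpoints, process_one, checkpoint_path) is hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def process_with_checkpoints(
    texts: list[str],
    process_one: Callable[[str], str],
    checkpoint_path: Path,
    save_every: int = 10,
    resume: bool = True,
    keep_checkpoints: bool = False,
) -> list[str]:
    """Process texts in order, saving partial results every `save_every` items."""
    results: list[str] = []

    # When resuming, reload whatever an earlier, interrupted run saved.
    if resume and checkpoint_path.exists():
        results = json.loads(checkpoint_path.read_text(encoding="utf-8"))

    # Continue from the first item that has no saved result yet.
    for index in range(len(results), len(texts)):
        results.append(process_one(texts[index]))
        if len(results) % save_every == 0:
            checkpoint_path.write_text(json.dumps(results), encoding="utf-8")

    # After a complete run, drop the checkpoint unless asked to keep it.
    if not keep_checkpoints and checkpoint_path.exists():
        checkpoint_path.unlink()
    return results
```

In this pattern a smaller save_every loses less work after an interruption but writes to disk more often. Items processed after the last save are repeated on resume.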
