
This project contains standardized tools to use LLMs in research studies for improving patient care.

Project Ryland

Description

This project enables users to more easily access and use the GPT4DFCI API.

Features

  • User-friendly interface for using the GPT4DFCI API
  • Local cost tracking for live estimates of running costs
  • Automatic logs to keep track of prompts, model used, and costs
  • A visual progress bar to estimate time until completion
  • Automatic checkpointing of operations to enable resuming if interrupted
  • A prompt gallery to help users keep track of prompts and add metadata
  • Input of user-created prompts for quick plug-and-play usage

The package is still in development and more features will be added with time.

History

This project was conceived in fall 2025, when Justin Vinh noticed that the Dana-Farber Cancer Institute in Boston, MA, had no modular, user-friendly package for working with the newly offered GPT4DFCI. GPT4DFCI is the HIPAA-compliant large language model (LLM) interface offered to researchers, and its associated API can be powerful when put to use. He developed this project in collaboration with Thomas Sounack and with the support of the Lindvall Lab to fill this gap.

RYLAND stands for "Research sYstem for LLM-based Analytics of Novel Data." Ryland is also the protagonist of Justin's favorite book, Project Hail Mary by Andy Weir.

Project Organization

project_ryland/
├── .github/
│   └── workflows/
│       └── publish.yml
├── .gitignore
├── CHANGELOG.md
├── LICENSE
├── project_ryland/
│   ├── __init__.py
│   ├── cli.py
│   ├── llm_utils/
│   │   ├── __init__.py
│   │   ├── llm_config.py
│   │   └── llm_generation_utils.py
│   └── templates/
│       ├── __init__.py
│       ├── quickstart.py
│       └── standard_quickstart/
│           ├── __init__.py
│           ├── llm_prompt_gallery/
│           │   ├── __init__.py
│           │   ├── config_llm_prompts.yaml
│           │   ├── example_prompt_1.txt
│           │   ├── example_prompt_2_with_variables.txt
│           │   ├── example_prompt_2.txt
│           │   ├── keyword_mappings.py
│           │   └── prompt_structs.py
│           ├── project_ryland_quickstart.ipynb
│           └── synthetic_clinical_notes.csv
├── pyproject.toml
└── README.md


Instructions for Use

Installing the GPT4DFCI API

  1. Ensure that you are on the DFCI network or running the VPN client.
  2. Follow the instructions on the Azure website to install the Azure CLI tool. This will be necessary to enable the API for GPT4DFCI.
  3. Once installed, run this command in Terminal (macOS) or Command Prompt (Windows):
az login --allow-no-subscriptions
  4. The command opens a window prompting you to sign in. Log in to your account.
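
Before running az login, it can help to confirm that the Azure CLI is actually on your PATH. The snippet below is a minimal sketch using only the Python standard library. The helper name azure_cli_available is hypothetical and is not part of Project Ryland or the Azure CLI.

```python
import shutil


def azure_cli_available(executable: str = "az") -> bool:
    """Return True if the named executable can be found on the PATH."""
    # shutil.which returns the full path to the executable, or None if absent.
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if azure_cli_available():
        print("Azure CLI found. Next step: az login --allow-no-subscriptions")
    else:
        print("Azure CLI not found. Install it before continuing.")
```

This only checks that the tool is installed. It does not verify that you are logged in or connected to the DFCI network.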

Installing Project Ryland

  1. You can install Project Ryland using pip:
pip install project-ryland

Using Project Ryland (Quickstart)

  1. Use the quickstart to get off the ground quickly! Run the following from a Python script:
from project_ryland.templates.quickstart import create_quickstart
create_quickstart(dest="~/quickstart")

or use the command line tool:

project-ryland-init quickstart

Using Project Ryland (Manual)

Note: A copy-paste version of the script is available at the end. Variable definitions can also be found at the end after the example script.

Note: You must be using the VPN client or be on the DFCI network to use GPT4DFCI.

  1. If this is your first time using Project Ryland, install it into your environment. In Terminal or Command Prompt, run the following:
pip install project-ryland

  2. Import llm_generation_utils from Project Ryland

from project_ryland.llm_utils import llm_generation_utils as llm
  3. In your Jupyter notebook or Python script, define your endpoint and entra_scope. The endpoint is user-specific, while the entra_scope is the same for all users (current default for DFCI shown below). These values should have been provided when you were granted GPT4DFCI API access.
  4. Specify the LLM model that you will use to run your prompts.
ENDPOINT = "https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ENTRA_SCOPE = "https://cognitiveservices.azure.com/.default"
model_name = "gpt-5"
  5. Call LLM_wrapper to initialize the API.
    • Note that this only has to be done once per run. You can then call the API multiple times in the same run.
LLM_wrapper = llm.LLM_wrapper(
    model_name,
    endpoint=ENDPOINT,
    entra_scope=ENTRA_SCOPE,
)
  6. Declare the path to your input CSV file.
  7. Declare the path to your LLM Prompt Gallery if you will be using that feature. A template prompt gallery is available for download from the GitHub repository. Add the prompt gallery to the same directory as your main script. Use of the gallery is highly recommended to track prompt texts, prompt structures, and associated metadata.
input_file = 'pathology_llm_tests.csv'
gallery_path = "llm_prompt_gallery"
  8. Call the generation function, process_text_data, to obtain your LLM output.
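# Define text_column, prompt_struct, and the other variables used below
# before making this call; each argument is described in the Dictionary section.
df = LLM_wrapper.process_text_data(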
    # Essential to specify
    input_file_path=input_file,
    text_column=text_column,
    format_class=prompt_struct,
    use_prompt_gallery=use_prompt_gallery,

    # Specify if using the prompt gallery, else put None
    prompt_gallery_path=gallery_path,
    prompt_to_get=gallery_prompt,
    user_prompt_vars=user_vars,

    # Specify if NOT using the prompt gallery, else put None
    prompt_text=prompt_text,

    # Optional to specify
    output_dir=output_directory,
    flatten=True,
    sample_mode=sample_mode,
    resume=True,
    keep_checkpoints=False,
    save_every=10,
)

Dictionary

Arguments for process_text_data function

Necessary Arguments at All Times

  • input_file_path specifies the path to your input CSV file (only CSV files are currently accepted).
  • text_column specifies the column within the CSV file that serves as the input to the LLM.
  • format_class specifies the class that defines and enforces the structure of the LLM output.
  • use_prompt_gallery is a boolean (True/False) input that directs the function to use the prompt gallery if set to True. Note that setting this argument to True will override anything specified by the prompt_text argument.
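
Because only CSV input is accepted and text_column must name a real column, a quick check before calling process_text_data can catch a mistyped path or column name early. The following is a minimal sketch using only the standard library. The helper check_input_csv is hypothetical and is not part of Project Ryland, and the UTF-8 encoding is an assumption about your file.

```python
import csv
from pathlib import Path


def check_input_csv(input_file_path: str, text_column: str) -> int:
    """Return the number of data rows, raising if the file or column is unusable."""
    path = Path(input_file_path)
    if path.suffix.lower() != ".csv":
        raise ValueError(f"Expected a .csv file, got: {path.name}")
    # Assumption: the file is UTF-8 encoded and has a header row.
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if text_column not in (reader.fieldnames or []):
            raise KeyError(f"Column {text_column!r} not found in {path.name}")
        # Count the remaining rows (the header is consumed by DictReader).
        return sum(1 for _ in reader)
```

The returned row count also gives a rough sense of how large the run will be before any API calls are made.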

Necessary Arguments if Using Prompt Gallery

  • prompt_gallery_path specifies the path to the prompt gallery.
  • prompt_to_get specifies the prompt name as listed in the prompt gallery.
  • user_prompt_vars specifies a dictionary that maps each placeholder variable in the prompt to the user-specified value to substitute for it. See the quickstart example for how this should be done.
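
To illustrate the idea behind user_prompt_vars, the sketch below fills placeholders in a prompt from a dictionary. It is an illustration only: the placeholder syntax that Project Ryland actually uses is not documented here, so the `$name` syntax of Python's string.Template, the prompt text, and the placeholder names are all assumptions. Consult the quickstart example for the real format.

```python
from string import Template

# Hypothetical prompt text with two placeholders ($cancer_type, $max_words).
prompt_template = Template(
    "Summarize the $cancer_type findings in this note in at most $max_words words."
)

# Same shape as the dictionary passed to user_prompt_vars: placeholder -> value.
user_vars = {"cancer_type": "breast cancer", "max_words": "50"}

# substitute() raises KeyError if a placeholder has no matching key,
# which surfaces a missing variable before any API call is made.
filled_prompt = prompt_template.substitute(user_vars)
print(filled_prompt)
```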

Necessary Arguments if Using a User Prompt

  • prompt_text specifies a string that serves as a user-inputted prompt. Use this argument only if the prompt gallery is not being used.
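
The two prompt modes are mutually exclusive: gallery arguments are set to None when a user prompt is supplied, and prompt_text is ignored when use_prompt_gallery is True. The sketch below builds the prompt-related keyword arguments for either mode. The helper build_prompt_kwargs is hypothetical and not part of Project Ryland, and allowing user_prompt_vars to be None for gallery prompts without placeholders is an assumption.

```python
from typing import Any, Optional


def build_prompt_kwargs(
    use_prompt_gallery: bool,
    prompt_gallery_path: Optional[str] = None,
    prompt_to_get: Optional[str] = None,
    user_prompt_vars: Optional[dict] = None,
    prompt_text: Optional[str] = None,
) -> dict[str, Any]:
    """Return prompt-related keyword arguments with the unused mode set to None."""
    if use_prompt_gallery:
        if not prompt_gallery_path or not prompt_to_get:
            raise ValueError("Gallery mode needs prompt_gallery_path and prompt_to_get.")
        return {
            "use_prompt_gallery": True,
            "prompt_gallery_path": prompt_gallery_path,
            "prompt_to_get": prompt_to_get,
            "user_prompt_vars": user_prompt_vars,
            "prompt_text": None,  # ignored in gallery mode, so cleared explicitly
        }
    if not prompt_text:
        raise ValueError("User-prompt mode needs prompt_text.")
    return {
        "use_prompt_gallery": False,
        "prompt_gallery_path": None,
        "prompt_to_get": None,
        "user_prompt_vars": None,
        "prompt_text": prompt_text,
    }
```

The resulting dictionary could then be unpacked into the call alongside the always-required arguments, for example `LLM_wrapper.process_text_data(input_file_path=input_file, text_column=text_column, format_class=prompt_struct, **kwargs)`.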

Optional Arguments

  • output_dir specifies the path to the output directory. If the specified directory does not exist, it will be created. If not specified, output is written to the same directory as the main script.
  • flatten is a boolean (True/False) that specifies whether to turn the output dictionary into individual columns. Default: True.
  • sample_mode is a boolean (True/False) that specifies whether to only process the first 10 rows of the input CSV (sampling the data). It is recommended to use sample_mode when first running new data, prompts, or prompt structures to verify that the intended output is correct. Default: False.
  • resume is a boolean (True/False) that specifies whether to resume from a checkpoint if generation is interrupted. Default: True.
  • keep_checkpoints is a boolean (True/False) that specifies whether checkpoints are kept after a run. Setting it to True keeps every checkpoint generated during the run; setting it to False auto-deletes them once the run finishes. Default: False.
  • save_every is an integer that specifies the interval between checkpoints. The default is 10 rows.
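
To make the interaction between resume, save_every, and keep_checkpoints concrete, the sketch below shows one way a checkpointed loop can behave. It is not Project Ryland's implementation: the function run_with_checkpoints, the JSON checkpoint file, and the process_row callback are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import json
from pathlib import Path
from typing import Callable


def run_with_checkpoints(
    rows: list[str],
    process_row: Callable[[str], str],
    checkpoint_path: Path,
    save_every: int = 10,
    resume: bool = True,
    keep_checkpoints: bool = False,
) -> list[str]:
    """Process rows in order, saving partial results every `save_every` rows."""
    results: list[str] = []
    if resume and checkpoint_path.exists():
        # Pick up where the previous, interrupted run stopped.
        results = json.loads(checkpoint_path.read_text(encoding="utf-8"))

    for index in range(len(results), len(rows)):
        results.append(process_row(rows[index]))
        if len(results) % save_every == 0:
            checkpoint_path.write_text(json.dumps(results), encoding="utf-8")

    if not keep_checkpoints and checkpoint_path.exists():
        # A completed run no longer needs its checkpoint.
        checkpoint_path.unlink()
    return results
```

In this sketch, an interruption loses at most save_every - 1 rows of work, which is the trade-off the interval controls: a smaller value means less repeated work after a failure but more frequent writes to disk.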

License

Project Ryland is released under the MIT License. See LICENSE file for more details.

Support

If you encounter any issues or have questions, please file an issue on the GitHub issue tracker. We appreciate suggestions for improvement as well!

Acknowledgements

Project Ryland was developed with the support of Thomas Sounack and the Lindvall Lab, led by Dr. Charlotta Lindvall, MD, PhD, at the Dana-Farber Cancer Institute. We thank all the contributors for their valuable input and support.

Citation

If you use project_ryland in your research or publications, please cite this repository:

Vinh J, Sounack T. project_ryland: Research sYstem for LLM-based Analytics of Novel Data. GitHub. https://github.com/justin-vinh/project_ryland

You can also use the GitHub “Cite this repository” button on the right sidebar for formatted citations (APA, BibTeX, etc.).

BibTeX

@software{vinh_project_ryland,
  author = {Vinh, Justin and Sounack, Thomas},
  title = {project_ryland: Research sYstem for LLM-based Analytics of Novel Data},
  year = {2026},
  url = {https://github.com/justin-vinh/project_ryland}
}
