This project contains standardized tools to use LLMs in research studies for improving patient care.

Project Ryland

Description

This project enables users to more easily access and use the DFCI AI API.

Features

  • User-friendly interface for using the DFCI AI API
  • Local cost tracking for live estimates of running costs
  • Automatic logs to keep track of prompts, model used, and costs
  • A visual progress bar to estimate time until completion
  • Automatic checkpointing of operations to enable resuming if interrupted
  • A prompt gallery to help users keep track of prompts and add metadata
  • Input of user-created prompts for quick plug-and-play usage

The package is still in development and more features will be added with time.

History

This project was conceived in fall 2025, when Justin Vinh noticed that no modular, user-friendly package existed at the Dana-Farber Cancer Institute in Boston, MA, to help users take advantage of the newly offered DFCI AI API. The DFCI AI API is the HIPAA-compliant large language model (LLM) interface offered to researchers, and it can be a powerful tool when put to use. To fill this gap, he developed this project in collaboration with Thomas Sounack and with the support of the Lindvall Lab.

RYLAND stands for "Research sYstem for LLM-based Analytics of Novel Data." Ryland is the protagonist of Justin's favorite book Project Hail Mary by Andy Weir.

Project Organization

project_ryland/
├── .github/
│   └── workflows/
│       └── publish.yml
├── .gitignore
├── CHANGELOG.md
├── LICENSE
├── project_ryland/
│   ├── __init__.py
│   ├── cli.py
│   ├── llm_utils/
│   │   ├── __init__.py
│   │   ├── llm_config.py
│   │   └── llm_generation_utils.py
│   └── templates/
│       ├── __init__.py
│       ├── quickstart.py
│       └── standard_quickstart/
│           ├── __init__.py
│           ├── llm_prompt_gallery/
│           │   ├── __init__.py
│           │   ├── config_llm_prompts.yaml
│           │   ├── example_prompt_1.txt
│           │   ├── example_prompt_2_with_variables.txt
│           │   ├── example_prompt_2.txt
│           │   ├── keyword_mappings.py
│           │   └── prompt_structs.py
│           ├── project_ryland_quickstart.ipynb
│           └── synthetic_clinical_notes.csv
├── pyproject.toml
└── README.md


Instructions for General Use

Installing the DFCI AI API

  1. Ensure that you are on the DFCI network or running the VPN client.
  2. Follow the instructions on the Azure website to install the Azure CLI tool. This is required to enable the DFCI AI API.
  3. Once the Azure CLI is installed, run this command in Terminal (macOS) or Command Prompt (Windows):
az login --allow-no-subscriptions
  4. The command above will open a browser window for you to log in to your account. Log in.

Installing Project Ryland

  1. You can install Project Ryland using pip:
pip install project-ryland

Using Project Ryland (Quickstart)

Note: You must be using the VPN Client or be on the DFCI network to use the DFCI AI API.

  1. Use the quickstart to get off the ground quickly! To create the quickstart in your working directory, run this command from a Python script:
from project_ryland.templates.quickstart import create_quickstart
create_quickstart(dest="~/quickstart")

or use the command line tool:

project-ryland-init quickstart

The quickstart contains a template prompt gallery (config_llm_prompts.yaml), two static prompts (example_prompt_1.txt and example_prompt_2.txt), one dynamic prompt (example_prompt_2_with_variables.txt), and their associated prompt structures (prompt_structs.py). The keyword_mappings.py file contains example user variables to be used with the dynamic prompt. Finally, synthetic_clinical_notes.csv contains generated clinical data for quickly demonstrating the prompts. See below for instructions on how to use the prompt gallery.

The project_ryland_quickstart.ipynb file contains the general code to run Project Ryland.

standard_quickstart/
├── __init__.py
├── llm_prompt_gallery/
│   ├── __init__.py
│   ├── config_llm_prompts.yaml
│   ├── example_prompt_1.txt
│   ├── example_prompt_2_with_variables.txt
│   ├── example_prompt_2.txt
│   ├── keyword_mappings.py
│   └── prompt_structs.py
├── project_ryland_quickstart.ipynb
└── synthetic_clinical_notes.csv

Using Project Ryland (Manual)

Note: A copy-paste version of the script is available at the end. Variable definitions can also be found at the end after the example script.

Note: You must be using the VPN Client or be on the DFCI network to use the DFCI AI API.

  1. If this is your first time using Project Ryland, you must install it into your environment. In Terminal or Command Prompt, run the following:
pip install project-ryland

  2. Import llm_generation_utils from Project Ryland

from project_ryland.llm_utils import llm_generation_utils as llm
  3. In your Jupyter notebook or Python script, define your endpoint and entra_scope. The endpoint is user-specific, while the entra_scope is the same for all users (the current DFCI default is shown below). These values should have been provided when you were granted DFCI AI API access.
  4. Specify the LLM model that you will use to run your prompts.
ENDPOINT = "https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ENTRA_SCOPE = "https://cognitiveservices.azure.com/.default"
model_name = "gpt-5"
  5. Run the LLM_wrapper function to initialize the API.
    • Note that this only has to be done once per run. You can call the API multiple times in one run.
LLM_wrapper = llm.LLM_wrapper(
    model_name,
    endpoint=ENDPOINT,
    entra_scope=ENTRA_SCOPE,
)
  6. Declare the path to your input CSV file.
  7. Declare the path to your LLM Prompt Gallery if you will be using that feature. A template prompt gallery is available for download from GitHub. Add the prompt gallery to the same directory as your main script. Use of the gallery is highly recommended for tracking prompt texts, prompt structures, and associated metadata.
input_file = 'pathology_llm_tests.csv'
gallery_path = "llm_prompt_gallery"
  8. Run the process_text_data generation function to obtain your LLM output.
df = LLM_wrapper.process_text_data(
    # Essential to specify
    input_file_path=input_file,
    text_column=text_column,
    format_class=prompt_struct,
    use_prompt_gallery=use_prompt_gallery,

    # Specify if using the prompt gallery, else put None
    prompt_gallery_path=gallery_path,
    prompt_to_get=gallery_prompt,
    user_prompt_vars=user_vars,

    # Specify if NOT using the prompt gallery, else put None
    prompt_text=prompt_text,

    # Optional to specify
    output_dir=output_directory,
    flatten=True,
    sample_mode=sample_mode,
    resume=True,
    keep_checkpoints=False,
    save_every=10,
)
  9. Optionally, use the summarize_llm_runs function to get a quick summary of the generation metrics of this LLM run (and of all known LLM runs).
df_log = llm.summarize_llm_runs(
    log_path="llm_tracking.log",
    csv_path="llm_run_summaries.csv",
)
df_log.tail()

Instructions for Using the Prompt Gallery

The prompt gallery was designed by Justin as a way to store prompt metadata and to facilitate iterative prompt design. This metadata is stored in the YAML file shown in the quickstart. Several prompts are already detailed in the template and can be a good place to start. Let's look at one of them:

example_1_prompt:
  filename: example_prompt_1.txt
  description: |
    Determine what type of cancer the patient has based on the
    note content.
  author: Sidney Farber
  date: 2025.10.06
  • The first key example_1_prompt is the name of the prompt and is used in the API call. The prompt name does not need to be the same as the prompt filename.
  • filename specifies the path to the prompt txt file, relative to the gallery directory. In this case, the txt file is in the same directory as the prompt gallery YAML file and so only the prompt filename is needed.
  • The other metadata keys like description, author, and date are optional and can be changed to any other metadata suiting the user's needs. A pipe character | starts a YAML literal block, allowing a multiline value (as in the case of description).
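
Conceptually, resolving a gallery entry comes down to looking up the prompt name and joining its filename onto the gallery directory. The sketch below illustrates this with a plain dict standing in for the parsed YAML; the helper function is hypothetical and not the package's actual internals:

```python
from pathlib import Path

# Stand-in for the parsed config_llm_prompts.yaml (illustrative only)
gallery = {
    "example_1_prompt": {
        "filename": "example_prompt_1.txt",
        "description": "Determine what type of cancer the patient has.",
        "author": "Sidney Farber",
        "date": "2025.10.06",
    },
}

def resolve_prompt_path(gallery_dir: str, prompt_name: str) -> Path:
    """Look up a prompt by name and resolve its file path relative to the gallery."""
    entry = gallery[prompt_name]
    return Path(gallery_dir) / entry["filename"]

print(resolve_prompt_path("llm_prompt_gallery", "example_1_prompt"))
# llm_prompt_gallery/example_prompt_1.txt
```

Because filename is resolved relative to the gallery directory, prompt files can sit next to the YAML file without any absolute paths.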

Dictionary

Arguments for process_text_data function

Necessary Arguments at All Times

  • input_file_path specifies the path to your input CSV file (only CSV files are currently accepted).
  • text_column specifies the column within the CSV file that serves as the input to the LLM.
  • format_class specifies the class structure that enforces the desired prompt output.
  • use_prompt_gallery is a boolean (True/False) input that directs the function to use the prompt gallery if set to True. Note that setting this argument to True will override anything specified by the prompt_text argument.

Necessary Arguments if Using Prompt Gallery

  • prompt_gallery_path specifies the path to the prompt gallery.
  • prompt_to_get specifies the prompt name as listed in the prompt gallery.
  • user_prompt_vars specifies the dictionary that contains the key-value pairs between the placeholder variables and the desired user-specified variables to be inputted. See the quickstart example for how this should be done.
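
For example, a user_prompt_vars dictionary maps placeholder names in a dynamic prompt to concrete values. The snippet below uses str.format purely as an illustration of the substitution idea; the package's own templating may work differently, and the placeholder names here are invented:

```python
# Hypothetical dynamic prompt with placeholder variables
prompt_template = (
    "You are reviewing a {note_type} note. "
    "Extract all mentions of {target_entity}."
)

# Key-value pairs between placeholders and user-specified values
user_prompt_vars = {
    "note_type": "pathology",
    "target_entity": "cancer diagnoses",
}

filled_prompt = prompt_template.format(**user_prompt_vars)
print(filled_prompt)
# You are reviewing a pathology note. Extract all mentions of cancer diagnoses.
```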

Necessary Arguments if Using a User Prompt

  • prompt_text specifies a string that serves as a user-inputted prompt. Use this argument only if the prompt gallery is not being used.

Optional Arguments

  • output_dir specifies the path to the output directory. If the inputted directory does not exist, it will be generated. If not specified, the default output location will be the same as the main script.
  • flatten is a boolean (True/False) that specifies whether to turn the output dictionary into individual columns. Default: True
  • sample_mode is a boolean (True/False) that specifies whether to only process the first 10 rows of the input CSV (sampling the data). It is recommended to use sample_mode when first running new data, prompts, or prompt structures to verify that the intended output is correct. Default: False.
  • resume is a boolean (True/False) that specifies whether to resume from a checkpoint if generation is interrupted. Default: True.
  • keep_checkpoints is a boolean (True/False) that specifies whether checkpoints are kept after a run. Setting it to True keeps every generated checkpoint; setting it to False (the default) auto-deletes them once the run completes. Default: False.
  • save_every is an integer that specifies the interval between checkpoints. The default is 10 rows.
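
To make the save_every interval concrete, the sketch below shows which row counts would trigger a checkpoint during a run. This is an illustration of the interval concept only, not the package's actual checkpointing code:

```python
def checkpoint_rows(total_rows: int, save_every: int = 10) -> list[int]:
    """Return the 1-based row counts at which a checkpoint would be written."""
    return [row for row in range(1, total_rows + 1) if row % save_every == 0]

# With 35 input rows and the default interval, checkpoints land every 10 rows:
print(checkpoint_rows(35))
# [10, 20, 30]
```

With resume=True, an interrupted run would pick up from the last such checkpoint rather than reprocessing from row 1.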

License

Project Ryland is released under the MIT License. See the LICENSE file for more details.

Support

If you encounter any issues or have questions, please file an issue on the GitHub issue tracker. We appreciate suggestions for improvement as well!

Acknowledgements

Project Ryland was developed with the support of Thomas Sounack and the Lindvall Lab, led by Dr. Charlotta Lindvall, MD, PhD, at the Dana-Farber Cancer Institute. We thank all the contributors for their valuable input and support.

Citation

If you use project_ryland in your research or publications, please cite this repository:

Vinh J, Sounack T. project_ryland: Research sYstem for LLM-based Analytics of Novel Data. GitHub. https://github.com/justin-vinh/project_ryland

You can also use the GitHub “Cite this repository” button on the right sidebar for formatted citations (APA, BibTeX, etc.).

BibTeX

@software{project_ryland,
  author = {Vinh, Justin and Sounack, Thomas},
  title = {project_ryland: Research sYstem for LLM-based Analytics of Novel Data},
  year = {2026},
  url = {https://github.com/justin-vinh/project_ryland}
}

Download files

Download the file for your platform.

Source Distribution

project_ryland-2.10.3.tar.gz (122.9 kB)

Uploaded Source

Built Distribution

project_ryland-2.10.3-py3-none-any.whl (125.0 kB)

Uploaded Python 3

File details

Details for the file project_ryland-2.10.3.tar.gz.

File metadata

  • Download URL: project_ryland-2.10.3.tar.gz
  • Upload date:
  • Size: 122.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for project_ryland-2.10.3.tar.gz
Algorithm Hash digest
SHA256 391d5b11f2eafd4845cd6634806972c7339d254edf7c702fa42bd85779014cb3
MD5 4eca8208e441d4e32812a7fec279f035
BLAKE2b-256 e7021f905a471241bd0af5828a4991784795a8b8768e81b05360358d46dfb737
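
To verify a downloaded file against the published SHA256 digest above, you can compute the hash locally and compare. A minimal sketch using the standard library (the filename shown is this release's source distribution, but any path works):

```python
import hashlib

def sha256_of_file(path: str) -> str:
    """Compute the SHA256 hex digest of a file, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare against the digest published on this page, e.g.:
# sha256_of_file("project_ryland-2.10.3.tar.gz") should match the SHA256 value above
```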

Provenance

The following attestation bundles were made for project_ryland-2.10.3.tar.gz:

Publisher: publish.yml on justin-vinh/project_ryland

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file project_ryland-2.10.3-py3-none-any.whl.

File metadata

  • Download URL: project_ryland-2.10.3-py3-none-any.whl
  • Upload date:
  • Size: 125.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for project_ryland-2.10.3-py3-none-any.whl
Algorithm Hash digest
SHA256 233297aaafb01d1fba66279f5c38b75480494f01d3e87b049b8084037dab8ae0
MD5 99d55b95068b0465e3e847413299acce
BLAKE2b-256 e4b5407b78135657be61a780656937ff2430a356465612b723dee55e186e9b69

Provenance

The following attestation bundles were made for project_ryland-2.10.3-py3-none-any.whl:

Publisher: publish.yml on justin-vinh/project_ryland

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
