This project contains standardized tools to use LLMs in research studies for improving patient care.

These details have been verified by PyPI

Maintainers

jvinh

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Project Ryland

Description

This project enables users to more easily access and use the GPT4DFCI API.

Features

User-friendly interface for using the GPT4DFCI API
Local cost tracking for live estimates of running costs
Automatic logs to keep track of prompts, model used, and costs
A visual progress bar to estimate time until completion
Automatic checkpointing of operations to enable resuming if interrupted
A prompt gallery to help users keep track of prompts and add metadata
Input of user-created prompts for quick plug-and-play usage

The package is still in development and more features will be added with time.

History

This project was conceived in fall 2025, when Justin Vinh noticed that the Dana-Farber Cancer Institute in Boston, MA, had no modular, user-friendly package for working with the newly offered GPT4DFCI. GPT4DFCI is the HIPAA-compliant large language model (LLM) interface offered to researchers, and its associated API is powerful when used well. To fill this gap, he developed this project in collaboration with Thomas Sounack and with the support of the Lindvall Lab.

RYLAND stands for "Research sYstem for LLM-based Analytics of Novel Data." Ryland is also the protagonist of Justin's favorite book, Project Hail Mary by Andy Weir.

Project Organization

project_ryland/
├── .github/
│   └── workflows/
│       └── publish.yml
├── .gitignore
├── CHANGELOG.md
├── LICENSE
├── project_ryland/
│   ├── __init__.py
│   ├── cli.py
│   ├── llm_utils/
│   │   ├── __init__.py
│   │   ├── llm_config.py
│   │   └── llm_generation_utils.py
│   └── templates/
│       ├── __init__.py
│       ├── quickstart.py
│       └── standard_quickstart/
│           ├── __init__.py
│           ├── llm_prompt_gallery/
│           │   ├── __init__.py
│           │   ├── config_llm_prompts.yaml
│           │   ├── example_prompt_1.txt
│           │   ├── example_prompt_2_with_variables.txt
│           │   ├── example_prompt_2.txt
│           │   ├── keyword_mappings.py
│           │   └── prompt_structs.py
│           ├── project_ryland_quickstart.ipynb
│           └── synthetic_clinical_notes.csv
├── pyproject.toml
└── README.md

Instructions for General Use

Installing the GPT4DFCI API

Ensure that you are on the DFCI network or running the VPN client.
Follow the instructions on the Azure website to install the Azure CLI tool. This will be necessary to enable the API for GPT4DFCI.
Once installed, run this command in Terminal (macOS) or Command Prompt (Windows):

az login --allow-no-subscriptions

Running this command opens a window where you can log in to your account. Log in to complete the setup.
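
Before running az login, it can help to confirm that the Azure CLI is actually on your PATH. The following is a minimal sketch that uses only the Python standard library. The helper name azure_cli_available is hypothetical and is not part of Project Ryland.

```python
import shutil


def azure_cli_available(executable: str = "az") -> bool:
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if azure_cli_available():
        print("Azure CLI found. Next step: az login --allow-no-subscriptions")
    else:
        print("Azure CLI not found. Install it before continuing.")
```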

Installing Project Ryland

You can install Project Ryland using pip:

pip install project-ryland
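
To confirm that the installation succeeded, you can ask Python which version of the distribution is installed. This is a small sketch built on the standard library's importlib.metadata; the helper name installed_version is an illustrative assumption, not something shipped with Project Ryland.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "project-ryland") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"project-ryland version: {found}" if found else "project-ryland is not installed")
```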

Using Project Ryland (Quickstart)

Note: You must be using the VPN Client or be on the DFCI network to use GPT4DFCI.

Use the quickstart to get off the ground quickly! To create the quickstart in a directory of your choice, run this code from a Python script:

from project_ryland.templates.quickstart import create_quickstart
create_quickstart(dest="~/quickstart")

or use the command line tool:

project-ryland-init quickstart

The quickstart contains a template prompt gallery (config_llm_prompts.yaml), two static prompts (example_prompt_1.txt and example_prompt_2.txt), one dynamic prompt (example_prompt_2_with_variables.txt), and their associated prompt structures (prompt_structs.py). The keyword_mappings.py file contains example user variables to be used with the dynamic prompt. Finally, synthetic_clinical_notes.csv contains generated clinical data for quickly demonstrating the prompts. See below for instructions on how to use the prompt gallery.

The project_ryland_quickstart.ipynb file contains the general code to run Project Ryland.

standard_quickstart/
├── __init__.py
├── llm_prompt_gallery/
│   ├── __init__.py
│   ├── config_llm_prompts.yaml
│   ├── example_prompt_1.txt
│   ├── example_prompt_2_with_variables.txt
│   ├── example_prompt_2.txt
│   ├── keyword_mappings.py
│   └── prompt_structs.py
├── project_ryland_quickstart.ipynb
└── synthetic_clinical_notes.csv
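
After creating the quickstart, you may want to check that every file in the tree above is present. The sketch below does this with pathlib. The helper name missing_quickstart_files is hypothetical, and the function assumes you pass it the directory that directly contains llm_prompt_gallery/ (where exactly create_quickstart places that directory under dest is not stated here).

```python
from pathlib import Path

# Relative paths taken from the quickstart tree shown above.
EXPECTED_FILES = [
    "llm_prompt_gallery/config_llm_prompts.yaml",
    "llm_prompt_gallery/example_prompt_1.txt",
    "llm_prompt_gallery/example_prompt_2.txt",
    "llm_prompt_gallery/example_prompt_2_with_variables.txt",
    "llm_prompt_gallery/keyword_mappings.py",
    "llm_prompt_gallery/prompt_structs.py",
    "project_ryland_quickstart.ipynb",
    "synthetic_clinical_notes.csv",
]


def missing_quickstart_files(quickstart_dir):
    """Return the expected quickstart files that are absent from quickstart_dir."""
    root = Path(quickstart_dir).expanduser()
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

An empty list means all expected files were found.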

Using Project Ryland (Manual)

Note: A copy-paste version of the script is available at the end. Variable definitions can also be found at the end after the example script.

Note: You must be using the VPN Client or be on the DFCI network to use GPT4DFCI.

If this is your first time using Project Ryland, you must first install it into your environment using the pip command shown under Installing Project Ryland.
Import llm_generation_utils from Project Ryland:

from project_ryland.llm_utils import llm_generation_utils as llm

In your Jupyter notebook or python script, define your endpoint and entra_scope. The endpoint is user-specific, while the entra_scope is the same for all users (current default for DFCI shown below). These values should have been provided when you were granted GPT4DFCI API access.
Specify the LLM model that you will be using to run your prompts.
- Model names can be found in the llm_config.py file.

ENDPOINT = "https://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ENTRA_SCOPE = "https://cognitiveservices.azure.com/.default"
model_name = "gpt-5"
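
Because the endpoint is user-specific, you may prefer to keep it out of your notebook or script. The sketch below reads it from environment variables instead. The variable names GPT4DFCI_ENDPOINT and GPT4DFCI_ENTRA_SCOPE and the helper load_gpt4dfci_settings are assumptions made for illustration; Project Ryland itself does not define them.

```python
import os

# Default scope shown in the example above.
DEFAULT_ENTRA_SCOPE = "https://cognitiveservices.azure.com/.default"


def load_gpt4dfci_settings(env=None):
    """Read endpoint and scope from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    endpoint = env.get("GPT4DFCI_ENDPOINT")  # hypothetical variable name
    if not endpoint:
        raise RuntimeError("Set GPT4DFCI_ENDPOINT to your user-specific endpoint.")
    return {
        "endpoint": endpoint,
        # Fall back to the documented default when no override is set.
        "entra_scope": env.get("GPT4DFCI_ENTRA_SCOPE", DEFAULT_ENTRA_SCOPE),
    }
```

The returned dictionary uses the same keyword names as the initialization call, so it could be unpacked with ** when creating the wrapper.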

Run the LLM_wrapper function to initialize the API.
- Note that this only has to be done once per run. You can call the API multiple times in one run.

LLM_wrapper = llm.LLM_wrapper(
    model_name,
    endpoint=ENDPOINT,
    entra_scope=ENTRA_SCOPE,
)

Declare the path to your input CSV file.
Declare the path to your LLM Prompt Gallery if you will be using that feature. A template prompt gallery is available for download from the GitHub repository. Add the prompt gallery to the same directory as your main script. Using the gallery is highly recommended for tracking prompt texts, prompt structures, and associated metadata.

input_file = 'pathology_llm_tests.csv'
gallery_path = "llm_prompt_gallery"
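
Since each run sends text to the API, it can be worth checking the input file before starting. The following sketch verifies that the CSV exists and contains the column you plan to pass as text_column. It uses only the standard library, and the helper name check_text_column is hypothetical.

```python
import csv
from pathlib import Path


def check_text_column(input_file_path, text_column):
    """Return the CSV header, raising a clear error if the column is missing."""
    path = Path(input_file_path)
    if not path.is_file():
        raise FileNotFoundError(f"Input CSV not found: {path}")
    # utf-8-sig also handles files saved with a byte-order mark.
    with path.open(newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    if text_column not in header:
        raise ValueError(
            f"Column {text_column!r} not found. Available columns: {header}"
        )
    return header
```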

Call process_text_data to generate your LLM output.

df = LLM_wrapper.process_text_data(
    # Essential to specify
    input_file_path=input_file,
    text_column=text_column,
    format_class=prompt_struct,
    use_prompt_gallery=use_prompt_gallery,

    # Specify if using the prompt gallery, else put None
    prompt_gallery_path=gallery_path,
    prompt_to_get=gallery_prompt,
    user_prompt_vars=user_vars,

    # Specify if NOT using the prompt gallery, else put None
    prompt_text=prompt_text,

    # Optional to specify
    output_dir=output_directory,
    flatten=True,
    sample_mode=sample_mode,
    resume=True,
    keep_checkpoints=False,
    save_every=10,
)
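
The call above has two mutually exclusive prompt modes: the gallery arguments are set and prompt_text is None, or the reverse. One way to keep the two apart is to build the prompt-related keyword arguments in a small helper. This is only a sketch; prompt_arguments is a hypothetical helper, while the keyword names it returns are the ones used in the call above.

```python
def prompt_arguments(use_prompt_gallery, gallery_path=None, gallery_prompt=None,
                     user_vars=None, prompt_text=None):
    """Build the prompt-related keyword arguments for one of the two modes."""
    if use_prompt_gallery:
        if not gallery_path or not gallery_prompt:
            raise ValueError("Gallery mode needs gallery_path and gallery_prompt.")
        return {
            "use_prompt_gallery": True,
            "prompt_gallery_path": gallery_path,
            "prompt_to_get": gallery_prompt,
            "user_prompt_vars": user_vars,
            "prompt_text": None,  # unused in gallery mode
        }
    if not prompt_text:
        raise ValueError("User-prompt mode needs prompt_text.")
    return {
        "use_prompt_gallery": False,
        "prompt_gallery_path": None,  # gallery arguments are unused here
        "prompt_to_get": None,
        "user_prompt_vars": None,
        "prompt_text": prompt_text,
    }
```

The result could then be unpacked into the call, for example process_text_data(input_file_path=input_file, text_column=text_column, format_class=prompt_struct, **prompt_arguments(...)).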

Instructions for Using the Prompt Gallery

The prompt gallery was designed by Justin as a method of storing prompt metadata and is made to facilitate iterative prompt design. This metadata is stored in the YAML file shown in the quickstart. Several prompts are already detailed in the template and can be a good place to start. Let's look at one of them:

example_1_prompt:
  filename: example_prompt_1.txt
  description: |
    Determine what type of cancer the patient has based on the
    note content.
  author: Sidney Farber
  date: 2025.10.06

The first key example_1_prompt is the name of the prompt and is used in the API call. The prompt name does not need to be the same as the prompt filename.
filename specifies the path to the prompt txt file, relative to the gallery directory. In this case, the txt file is in the same directory as the prompt gallery YAML file and so only the prompt filename is needed.
The other metadata keys like description, author, and date are optional and can be changed to any kind of other metadata suiting the user's needs. A vertical line | allows the user to add a multiline value (as in the case of description).
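
To make the relationship between the prompt name, the filename key, and the gallery directory concrete, here is a short sketch of the lookup. It is not Project Ryland's internal code. The dictionary mirrors what a YAML parser would typically return for the entry above (with some metadata omitted), and resolve_prompt_path is a hypothetical helper.

```python
from pathlib import Path

# The example entry above, written as the dictionary a YAML parser would
# typically produce for it (description and date omitted for brevity).
gallery = {
    "example_1_prompt": {
        "filename": "example_prompt_1.txt",
        "author": "Sidney Farber",
    }
}


def resolve_prompt_path(gallery, prompt_name, gallery_dir):
    """Return the prompt file path, resolved relative to the gallery directory."""
    if prompt_name not in gallery:
        raise KeyError(f"Prompt {prompt_name!r} is not listed in the gallery.")
    return Path(gallery_dir) / gallery[prompt_name]["filename"]
```

Here the prompt name example_1_prompt is what you would pass as prompt_to_get, and the filename is only used to locate the text file.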

Dictionary

Arguments for process_text_data function

Necessary Arguments at All Times

input_file_path specifies the path to your input CSV file (only CSV files are currently accepted).
text_column specifies the column within the CSV file that serves as the input to the LLM.
format_class specifies the class structure that enforces the desired format of the LLM output.
use_prompt_gallery is a boolean (True/False) input that directs the function to use the prompt gallery if set to True. Note that setting this argument to True will override anything specified by the prompt_text argument.

Necessary Arguments if Using Prompt Gallery

prompt_gallery_path specifies the path to the prompt gallery.
prompt_to_get specifies the prompt name as listed in the prompt gallery.
user_prompt_vars specifies the dictionary that contains the key-value pairs between the placeholder variables and the desired user-specified variables to be inputted. See the quickstart example for how this should be done.
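
The idea behind user_prompt_vars is that placeholders in a dynamic prompt are replaced by user-supplied values. The sketch below illustrates that idea with the standard library's string.Template. The $name placeholder syntax, the prompt text, and the variable names are all assumptions made for illustration; consult example_prompt_2_with_variables.txt and keyword_mappings.py in the quickstart for the format Project Ryland actually expects.

```python
from string import Template

# Hypothetical dynamic prompt; the real placeholder syntax may differ.
prompt_template = Template(
    "Summarize any mention of $condition documented by the $specialty team."
)

# Hypothetical mapping from placeholder names to user-specified values.
user_vars = {"condition": "breast cancer", "specialty": "oncology"}


def fill_prompt(template, variables):
    """Substitute every placeholder; raises KeyError if one is missing."""
    return template.substitute(variables)


if __name__ == "__main__":
    print(fill_prompt(prompt_template, user_vars))
```

Using substitute rather than safe_substitute makes a missing variable fail loudly instead of leaving a placeholder in the prompt.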

Necessary Arguments if Using a User Prompt

prompt_text specifies a string that serves as a user-inputted prompt. Use this argument only if the prompt gallery is not being used.

Optional Arguments

output_dir specifies the path to the output directory. If the given directory does not exist, it will be created. If not specified, output is written to the same directory as the main script.
flatten is a boolean (True/False) that specifies whether to turn the output dictionary into individual columns. Default: True.
sample_mode is a boolean (True/False) that specifies whether to only process the first 10 rows of the input CSV (sampling the data). It is recommended to use sample_mode when first running new data, prompts, or prompt structures to verify that the intended output is correct. Default: False.
resume is a boolean (True/False) that specifies whether to resume from a checkpoint if generation is interrupted. Default: True.
keep_checkpoints is a boolean (True/False) that specifies whether checkpoints are kept after a run. Setting it to True keeps every checkpoint generated during a run; otherwise checkpoints are deleted automatically. Default: False.
save_every is an integer that specifies the interval between checkpoints. The default is 10 rows.
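
To show how resume, save_every, and keep_checkpoints interact, here is a generic sketch of checkpointed row processing. It is an illustration of the technique under simple assumptions (a JSON checkpoint file, rows held in a list), not Project Ryland's actual implementation; process_with_checkpoints and process_row are hypothetical names.

```python
import json
from pathlib import Path


def process_with_checkpoints(rows, process_row, checkpoint_path,
                             save_every=10, resume=True, keep_checkpoints=False):
    """Process rows in order, saving partial results every save_every rows."""
    checkpoint = Path(checkpoint_path)
    results = []
    # Resume: reload results saved by an earlier, interrupted run.
    if resume and checkpoint.is_file():
        results = json.loads(checkpoint.read_text(encoding="utf-8"))

    # Continue from the first row that has no saved result yet.
    for index in range(len(results), len(rows)):
        results.append(process_row(rows[index]))
        if len(results) % save_every == 0:
            checkpoint.write_text(json.dumps(results), encoding="utf-8")

    # After a completed run, remove the checkpoint unless asked to keep it.
    if not keep_checkpoints and checkpoint.exists():
        checkpoint.unlink()
    return results
```

With save_every=10, an interruption loses at most the rows processed since the last multiple of 10, and the next run with resume=True picks up from there.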

License

Project Ryland is released under the MIT License. See the LICENSE file for more details.

Support

If you encounter any issues or have questions, please file an issue on the GitHub issue tracker. We appreciate suggestions for improvement as well!

Acknowledgements

Project Ryland was developed with the support of Thomas Sounack and the Lindvall Lab, led by Dr. Charlotta Lindvall, MD, PhD, at the Dana-Farber Cancer Institute. We thank all the contributors for their valuable input and support.

Citation

If you use project_ryland in your research or publications, please cite this repository:

Vinh J, Sounack T. project_ryland: Research sYstem for LLM-based Analytics of Novel Data. GitHub. https://github.com/justin-vinh/project_ryland

You can also use the GitHub “Cite this repository” button on the right sidebar for formatted citations (APA, BibTeX, etc.).

BibTeX

@software{project_ryland,
  author = {Vinh, Justin and Sounack, Thomas},
  title = {project_ryland: Research sYstem for LLM-based Analytics of Novel Data},
  year = {2026},
  url = {https://github.com/justin-vinh/project_ryland}
}


