Ray-centric library for finetuning and evaluation of (large) language models.

LM Buddy

Important: The lm-buddy repo is being archived and its functionality is being folded into Lumigator. For more on the context and decisions behind this, please read here.

LM Buddy is a collection of jobs for finetuning and evaluating open-source (large) language models. The library takes YAML configuration files as inputs to the CLI command for each job, and tracks input/output artifacts on Weights & Biases.

The package currently exposes two types of jobs:

  1. a finetuning job that uses HuggingFace model/training implementations and Ray Train for compute scaling, and
  2. an evaluation job that uses lm-evaluation-harness, with inference performed via an in-process HuggingFace model or an externally hosted vLLM server.

Installation

LM Buddy is available on PyPI and can be installed as follows:

pip install lm-buddy

Minimum Python version

LM Buddy is intended to be used in production on a Ray cluster (see the Ray job submission section below). We currently run Ray clusters on Python 3.11.9. To avoid dependency and syntax errors when executing LM Buddy on Ray, installing this package requires a Python version in the range [3.11, 3.12), that is, any 3.11.x release.
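
Before installing, it can help to confirm that the local interpreter falls inside the range stated above. The following is a minimal sketch of such a check; the helper name is_supported_python and the two constants are illustrative and are not part of LM Buddy.

```python
import sys

# Supported range taken from the text above: [3.11, 3.12).
MIN_VERSION = (3, 11)
MAX_VERSION_EXCLUSIVE = (3, 12)


def is_supported_python(version_info=None):
    """Return True if the (major, minor) pair falls in [3.11, 3.12)."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported_python():
        print(f"Python {current} is within the supported range.")
    else:
        print(f"Python {current} is outside [3.11, 3.12); installation is expected to fail.")
```

Comparing only the (major, minor) pair means every 3.11.x patch release passes, which matches the half-open interval in the text.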

CLI usage

LM Buddy exposes a CLI with a few commands, one for each type of job. You can explore the CLI options by running lm_buddy --help.

Once LM Buddy is installed in your local Python environment, usage is as follows:

# LLM finetuning
lm_buddy finetune --config finetuning_config.yaml

# LLM evaluation
lm_buddy evaluate lm-harness --config lm_harness_config.yaml
lm_buddy evaluate prometheus --config prometheus_config.yaml

See the examples/configs folder for examples of the job configuration structure. For a full end-to-end interactive workflow for using the package, see the example notebooks.
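
If you prefer to launch the commands above from a Python script rather than a shell, one possible approach is to wrap them with subprocess. This is only a sketch: the helper names build_command and run_job and the short job keys in KNOWN_COMMANDS are hypothetical, while the subcommands and the --config flag are copied from the usage shown above.

```python
import shlex
import subprocess

# Subcommands copied from the CLI usage shown above; the dict keys are
# hypothetical shorthand names chosen for this sketch.
KNOWN_COMMANDS = {
    "finetune": ["finetune"],
    "lm-harness": ["evaluate", "lm-harness"],
    "prometheus": ["evaluate", "prometheus"],
}


def build_command(job, config_path):
    """Return the argv list for one of the CLI invocations listed above."""
    if job not in KNOWN_COMMANDS:
        raise ValueError(f"Unknown job {job!r}; expected one of {sorted(KNOWN_COMMANDS)}")
    return ["lm_buddy", *KNOWN_COMMANDS[job], "--config", str(config_path)]


def run_job(job, config_path):
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    command = build_command(job, config_path)
    print("Running:", shlex.join(command))
    subprocess.run(command, check=True)
```

Calling run_job("finetune", "finetuning_config.yaml") assumes the lm_buddy executable is on the PATH of the active environment.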

Ray job submission

Although the LM Buddy CLI can be used as a standalone tool, its commands are intended to be used as the entrypoints for jobs on a Ray compute cluster. The suggested way to submit an LM Buddy job to Ray is to use the Ray Python SDK within a local Python driver script. This requires you to specify a Ray runtime environment containing:

  1. A working_dir for the local directory containing your job config YAML file, and
  2. A pip dependency for your desired version of lm-buddy.

Additionally, if your job requires GPU resources on the Ray entrypoint worker (e.g., for loading large/quantized models), you should specify the entrypoint_num_gpus parameter upon submission.
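
Since Ray uploads only the contents of working_dir, a missing config file tends to surface late, after the job has already started on the cluster. A small preflight helper along these lines could catch that locally. It is a sketch under assumptions: build_runtime_env and its parameters are hypothetical names, and only the "working_dir" and "pip" keys come from the requirements listed above.

```python
from pathlib import Path


def build_runtime_env(working_dir, config_name, lm_buddy_version):
    """Build a Ray runtime_env dict after checking the config file exists locally."""
    working_dir = Path(working_dir)
    config_path = working_dir / config_name
    if not config_path.is_file():
        raise FileNotFoundError(
            f"{config_path} not found; the job config must live inside working_dir"
        )
    return {
        # Local directory that Ray uploads to the cluster.
        "working_dir": str(working_dir),
        # Pin the package version installed on the cluster workers.
        "pip": [f"lm-buddy=={lm_buddy_version}"],
    }
```

The returned dict can be passed as the runtime_env argument when submitting the job.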

An example of the submission process is as follows:

from ray.job_submission import JobSubmissionClient

# If using a remote cluster, replace 127.0.0.1 with the head node's IP address.
client = JobSubmissionClient("http://127.0.0.1:8265")

runtime_env = {
    "working_dir": "/path/to/working/directory",
    "pip": ["lm-buddy==X.X.X"],
}

# Assuming 'config.yaml' is present in the working directory
client.submit_job(
    entrypoint="lm_buddy finetune --config config.yaml",
    runtime_env=runtime_env,
    entrypoint_num_gpus=1
)

See the examples/ folder for more examples of submitting Ray jobs.
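
After submission, the driver script usually needs to know when the job has finished. The sketch below polls the client until the job reaches a terminal state. The function wait_for_job and its parameters are hypothetical; it relies only on the client's get_job_status method and on the terminal state names of Ray's JobStatus enum (SUCCEEDED, FAILED, STOPPED).

```python
import time

# Terminal states of Ray's JobStatus enum.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_job(client, job_id, poll_seconds=5.0, timeout_seconds=3600.0):
    """Poll client.get_job_status(job_id) until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.get_job_status(job_id)
        # JobStatus is a string-valued enum; fall back to str() for plain strings.
        name = getattr(status, "value", str(status))
        if name in TERMINAL_STATES:
            return name
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {job_id} still {name} after {timeout_seconds} s")
        time.sleep(poll_seconds)
```

With the client from the example above, submit_job returns the job ID, so the pattern would be job_id = client.submit_job(...), followed by wait_for_job(client, job_id) and, if needed, client.get_job_logs(job_id) to retrieve the output.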

Development

See the contributing guide for more information on development workflows and/or building locally.
