Skip to main content

CLI tool for running LLM batch processing jobs on HPC systems

Project description

LLMFlux: LLM Batch Processing Pipeline for HPC Systems

A streamlined solution for running Large Language Models (LLMs) in batch mode on HPC systems powered by Slurm. LLMFlux uses the OpenAI-compatible API format with a JSONL-first architecture, enabling your prompts to flow efficiently through LLM engines at scale.

PyPI version License: MIT

Architecture

      JSONL Input                    Batch Processing                    Results
   (OpenAI Format)                 (Ollama/vLLM + Model)               (JSON Output)
         │                                 │                                 │
         │                                 │                                 │
         ▼                                 ▼                                 ▼
    ┌──────────┐                   ┌──────────────┐                   ┌──────────┐
    │  Batch   │                   │              │                   │  Output  │
    │ Requests │─────────────────▶ │   Model on   │─────────────────▶ │  Results │
    │  (JSONL) │                   │    GPU(s)    │                   │  (JSON)  │
    └──────────┘                   │              │                   └──────────┘
                                   └──────────────┘                    

LLMFlux processes JSONL files in a standardized OpenAI-compatible batch API format, enabling efficient processing of thousands of prompts on HPC systems with minimal overhead.

Documentation

Installation

pip install llmflux

Or for development:

  1. Create and Activate Conda Environment:

    conda create -n llmflux python=3.11 -y
    conda activate llmflux
    
  2. Install Package:

    pip install -e .
    
  3. Environment Setup:

    cp .env.example .env
    # Edit .env with your SLURM account and model details
    

Quick Start

Core Batch Processing on SLURM

The primary workflow for LLMFlux is submitting JSONL files for batch processing on SLURM:

from llmflux.slurm import SlurmRunner
from llmflux.core.config import Config

# Setup SLURM configuration
config = Config()
slurm_config = config.get_slurm_config()
slurm_config.account = "myaccount"

# Initialize runner
runner = SlurmRunner(config=slurm_config)

# Submit JSONL file directly for processing
job_id = runner.run(
    input_path="prompts.jsonl",
    output_path="results.json",
    model="llama3.2:3b",
    batch_size=4
)
print(f"Job submitted with ID: {job_id}")

JSONL Input Format

JSONL input format follows the OpenAI Batch API specification:

{"custom_id":"request1","method":"POST","url":"/v1/chat/completions","body":{"model":"llama3.2:3b","messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"Explain quantum computing"}],"temperature":0.7,"max_tokens":500}}
{"custom_id":"request2","method":"POST","url":"/v1/chat/completions","body":{"model":"llama3.2:3b","messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"What is machine learning?"}],"temperature":0.7,"max_tokens":500}}

For advanced options like custom batch sizes, processing settings, or SLURM configuration, see the Configuration Guide.

For advanced model configuration, see the Models Guide.

Command-Line Interface

LLMFlux includes a command-line interface for submitting batch processing jobs:

# Process JSONL file directly (core functionality)
llmflux run --model llama3.2:3b --input data/prompts.jsonl --output results/output.json

For detailed command options:

llmflux --help

Output Format

Results are saved in the user's workspace:

[
  {
    "input": {
      "custom_id": "request1",
      "method": "POST",
      "url": "/v1/chat/completions",
      "body": {
        "model": "llama3.2:3b",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant"},
          {"role": "user", "content": "Original prompt text"}
        ],
        "temperature": 0.7,
        "max_tokens": 1024
      },
      "metadata": {
        "source_file": "example.txt"
      }
    },
    "output": {
      "id": "chat-cmpl-123",
      "object": "chat.completion",
      "created": 1699123456,
      "model": "llama3.2:3b",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Generated response text"
          },
          "finish_reason": "stop"
        }
      ]
    },
    "metadata": {
      "model": "llama3.2:3b",
      "timestamp": "2023-11-04T12:34:56.789Z",
      "processing_time": 1.23
    }
  }
]

Utility Converters

LLMFlux provides utility converters to help prepare JSONL files from various input formats:

# Convert CSV to JSONL
llmflux convert csv --input data/papers.csv --output data/papers.jsonl --template "Summarize: {text}"

# Convert directory to JSONL
llmflux convert dir --input data/documents/ --output data/docs.jsonl --recursive

For code examples of converters, see the examples directory.

Benchmarking

LLMFlux ships with a benchmarking workflow that can source prompts, submit the SLURM job, and collect results/metrics for you.

llmflux benchmark --model llama3.2:3b --name nightly --num-prompts 60 \
  --account ACCOUNT_NAME --partition PARTITION_NAME --nodes 1
  • Prompt sources: omit --input to automatically download and cache LiveBench categories (benchmark_data/). Provide --input path/to/prompts.jsonl to reuse an existing JSONL file instead. Use --num-prompts, --temperature, and --max-tokens to control synthetic dataset generation.
  • Outputs: results default to results/benchmarks/<name>_results.json and a metrics summary (<name>_metrics.txt) containing elapsed SLURM runtime and number of prompts processed.
  • Batch tuning: adjust --batch-size for throughput. Pass model arguments such as --temperature and --max-tokens to forward them to the runner.
  • SLURM overrides: forward scheduler settings with --account, --partition, --nodes, --gpus-per-node, --time, --mem, and --cpus-per-task.
  • Job controls: add --rebuild to force an Apptainer image rebuild or --debug to keep the generated job script for inspection.

For the complete option reference:

llmflux benchmark --help

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

License

MIT License

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llmflux-0.1.3.tar.gz (48.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

llmflux-0.1.3-py3-none-any.whl (58.1 kB view details)

Uploaded Python 3

File details

Details for the file llmflux-0.1.3.tar.gz.

File metadata

  • Download URL: llmflux-0.1.3.tar.gz
  • Upload date:
  • Size: 48.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llmflux-0.1.3.tar.gz
Algorithm Hash digest
SHA256 3c3b06b8d5b7624778d932277a1d2ce5e1ae05370e62d867b6d51c9e75025e85
MD5 8ac2b085d55e4def4a201c68614eac50
BLAKE2b-256 adfef31ba2bf95354ac5d6ce43abebed41c47763664826c5515d942df06ca92c

See more details on using hashes here.

Provenance

The following attestation bundles were made for llmflux-0.1.3.tar.gz:

Publisher: publish.yml on Center-for-AI-Innovation/LLMFlux

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file llmflux-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: llmflux-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 58.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llmflux-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 d4dc8a0c4c74ca4d0e345f63ca5d7a1ffd06b8fcee9e5592b44f4ed58bf69793
MD5 52db538dcd48fce60fcf04285820a5c8
BLAKE2b-256 52925ad8ef0e62df85060c2193c477b5cd37e4f4b7414864f5195eb273b2452a

See more details on using hashes here.

Provenance

The following attestation bundles were made for llmflux-0.1.3-py3-none-any.whl:

Publisher: publish.yml on Center-for-AI-Innovation/LLMFlux

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page