CLI tool for running LLM batch processing jobs on HPC systems
LLMFlux: LLM Batch Processing Pipeline for HPC Systems
A streamlined solution for running Large Language Models (LLMs) in batch mode on HPC systems powered by Slurm. LLMFlux uses the OpenAI-compatible API format with a JSONL-first architecture, enabling your prompts to flow efficiently through LLM engines at scale.
Architecture
  JSONL Input                    Batch Processing                     Results
(OpenAI Format)               (Ollama/vLLM + Model)                (JSON Output)
       │                                │                                │
       │                                │                                │
       ▼                                ▼                                ▼
  ┌──────────┐                   ┌──────────────┐                   ┌──────────┐
  │  Batch   │                   │              │                   │  Output  │
  │ Requests │─────────────────▶ │   Model on   │─────────────────▶ │ Results  │
  │ (JSONL)  │                   │    GPU(s)    │                   │  (JSON)  │
  └──────────┘                   │              │                   └──────────┘
                                 └──────────────┘
LLMFlux processes JSONL files in a standardized OpenAI-compatible batch API format, enabling efficient processing of thousands of prompts on HPC systems with minimal overhead.
Documentation
- Configuration Guide - How to configure LLMFlux
- Models Guide - Supported models and requirements
- Repository Structure - Codebase organization
Installation
pip install llmflux
Or for development:
- Create and activate a Conda environment:
conda create -n llmflux python=3.11 -y
conda activate llmflux
- Install the package:
pip install -e .
- Set up the environment:
cp .env.example .env
# Edit .env with your SLURM account and model details
Quick Start
Core Batch Processing on SLURM
The primary workflow for LLMFlux is submitting JSONL files for batch processing on SLURM:
from llmflux.slurm import SlurmRunner
from llmflux.core.config import Config

# Set up the SLURM configuration
config = Config()
slurm_config = config.get_slurm_config()
slurm_config.account = "myaccount"

# Initialize the runner
runner = SlurmRunner(config=slurm_config)

# Submit the JSONL file directly for processing
job_id = runner.run(
    input_path="prompts.jsonl",
    output_path="results.json",
    model="llama3.2:3b",
    batch_size=4,
)
print(f"Job submitted with ID: {job_id}")
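Before submitting a job, it can help to check that every line of the input parses as JSON and carries the four top-level fields shown under JSONL Input Format. The following is a minimal sketch of such a check, not part of LLMFlux: the helper name `find_invalid_lines` and the duplicate-ID rule are assumptions made for illustration.

```python
import json

# Top-level fields used by the request lines in this README.
REQUIRED_FIELDS = ("custom_id", "method", "url", "body")


def find_invalid_lines(jsonl_text):
    """Return (line_number, problem) tuples for a JSONL string.

    Hypothetical helper for illustration; LLMFlux may validate differently.
    """
    problems = []
    seen_ids = set()
    for number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            problems.append((number, "missing fields: " + ", ".join(missing)))
            continue
        custom_id = record["custom_id"]
        if not isinstance(custom_id, str):
            problems.append((number, "custom_id is not a string"))
        elif custom_id in seen_ids:
            # Assumption: IDs should be unique so results can be matched back.
            problems.append((number, f"duplicate custom_id: {custom_id}"))
        else:
            seen_ids.add(custom_id)
    return problems


sample = "\n".join([
    '{"custom_id":"request1","method":"POST","url":"/v1/chat/completions","body":{}}',
    '{"custom_id":"request2"}',
    "not json",
])
for number, problem in find_invalid_lines(sample):
    print(f"line {number}: {problem}")
```

To check a real file, read it with `open("prompts.jsonl").read()` and pass the text to the helper.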
JSONL Input Format
JSONL input format follows the OpenAI Batch API specification:
{"custom_id":"request1","method":"POST","url":"/v1/chat/completions","body":{"model":"llama3.2:3b","messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"Explain quantum computing"}],"temperature":0.7,"max_tokens":500}}
{"custom_id":"request2","method":"POST","url":"/v1/chat/completions","body":{"model":"llama3.2:3b","messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"What is machine learning?"}],"temperature":0.7,"max_tokens":500}}
For advanced options like custom batch sizes, processing settings, or SLURM configuration, see the Configuration Guide.
For advanced model configuration, see the Models Guide.
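Request lines like the two above can be generated with the standard `json` module. This is a sketch under stated assumptions: `make_request` and `to_jsonl` are hypothetical helpers, not LLMFlux functions, and the `requestN` naming simply mirrors the example IDs.

```python
import json


def make_request(custom_id, prompt, model="llama3.2:3b",
                 system="You are a helpful assistant",
                 temperature=0.7, max_tokens=500):
    """Build one request object in the layout shown in this README."""
    return {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [
                {"role": "system", "content": system},
                {"role": "user", "content": prompt},
            ],
            "temperature": temperature,
            "max_tokens": max_tokens,
        },
    }


def to_jsonl(prompts, **kwargs):
    """Serialize prompts as JSONL: one compact JSON object per line."""
    lines = []
    for index, prompt in enumerate(prompts, start=1):
        request = make_request(f"request{index}", prompt, **kwargs)
        lines.append(json.dumps(request, separators=(",", ":")))
    return "".join(line + "\n" for line in lines)


jsonl_text = to_jsonl(["Explain quantum computing", "What is machine learning?"])
print(jsonl_text, end="")
```

Writing `jsonl_text` to `prompts.jsonl` gives a file that can be passed as the input path in the Quick Start example.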
Command-Line Interface
LLMFlux includes a command-line interface for submitting batch processing jobs:
# Process JSONL file directly (core functionality)
llmflux run --model llama3.2:3b --input data/prompts.jsonl --output results/output.json
For detailed command options:
llmflux --help
Output Format
Results are saved in the user's workspace:
[
  {
    "input": {
      "custom_id": "request1",
      "method": "POST",
      "url": "/v1/chat/completions",
      "body": {
        "model": "llama3.2:3b",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant"},
          {"role": "user", "content": "Original prompt text"}
        ],
        "temperature": 0.7,
        "max_tokens": 1024
      },
      "metadata": {
        "source_file": "example.txt"
      }
    },
    "output": {
      "id": "chat-cmpl-123",
      "object": "chat.completion",
      "created": 1699123456,
      "model": "llama3.2:3b",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Generated response text"
          },
          "finish_reason": "stop"
        }
      ]
    },
    "metadata": {
      "model": "llama3.2:3b",
      "timestamp": "2023-11-04T12:34:56.789Z",
      "processing_time": 1.23
    }
  }
]
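Because each entry keeps the original request next to the model output, responses can be matched back to prompts by `custom_id`. The sketch below assumes the layout shown above; `responses_by_id` is a hypothetical helper, and the handling of entries without choices is an assumption, since this README does not describe how failed requests are recorded.

```python
import json


def responses_by_id(results):
    """Map each custom_id to the first assistant message in its output."""
    answers = {}
    for entry in results:
        custom_id = entry["input"]["custom_id"]
        choices = entry.get("output", {}).get("choices", [])
        # Assumption: an entry without choices is reported as None.
        answers[custom_id] = choices[0]["message"]["content"] if choices else None
    return answers


# A trimmed-down entry in the same shape as the example output above.
example_text = """
[
  {
    "input": {"custom_id": "request1"},
    "output": {
      "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "Generated response text"},
         "finish_reason": "stop"}
      ]
    }
  }
]
"""
example = json.loads(example_text)
print(responses_by_id(example))
```

For a real run, load the file with `json.load(open("results.json"))` and pass the resulting list to the helper.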
Utility Converters
LLMFlux provides utility converters to help prepare JSONL files from various input formats:
# Convert CSV to JSONL
llmflux convert csv --input data/papers.csv --output data/papers.jsonl --template "Summarize: {text}"
# Convert directory to JSONL
llmflux convert dir --input data/documents/ --output data/docs.jsonl --recursive
For code examples of converters, see the examples directory.
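To make the CSV conversion concrete, here is a minimal sketch of the same idea using only the standard library. It is not the LLMFlux converter: the function name `csv_to_jsonl`, the way column names fill the template placeholders, and the `requestN` IDs are all assumptions for illustration.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text, template, model="llama3.2:3b"):
    """Turn each CSV row into one request line, filling the template from columns.

    Hypothetical sketch; the real converter may map columns differently.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for index, row in enumerate(reader, start=1):
        request = {
            "custom_id": f"request{index}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                # {text} in the template is replaced by the row's "text" column.
                "messages": [{"role": "user", "content": template.format_map(row)}],
            },
        }
        lines.append(json.dumps(request))
    return "".join(line + "\n" for line in lines)


csv_text = "title,text\nPaper A,Transformers scale well.\nPaper B,Batching improves throughput.\n"
print(csv_to_jsonl(csv_text, "Summarize: {text}"), end="")
```

A template that names a column missing from the CSV raises `KeyError`, which surfaces the mismatch before any job is submitted.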
Benchmarking
LLMFlux ships with a benchmarking workflow that can source prompts, submit the SLURM job, and collect results/metrics for you.
llmflux benchmark --model llama3.2:3b --name nightly --num-prompts 60 \
--account ACCOUNT_NAME --partition PARTITION_NAME --nodes 1
- Prompt sources: omit --input to automatically download and cache LiveBench categories (benchmark_data/). Provide --input path/to/prompts.jsonl to reuse an existing JSONL file instead. Use --num-prompts, --temperature, and --max-tokens to control synthetic dataset generation.
- Outputs: results default to results/benchmarks/<name>_results.json and a metrics summary (<name>_metrics.txt) containing elapsed SLURM runtime and number of prompts processed.
- Batch tuning: adjust --batch-size for throughput. Pass model arguments such as --temperature and --max-tokens to forward them to the runner.
- SLURM overrides: forward scheduler settings with --account, --partition, --nodes, --gpus-per-node, --time, --mem, and --cpus-per-task.
- Job controls: add --rebuild to force an Apptainer image rebuild or --debug to keep the generated job script for inspection.
For the complete option reference:
llmflux benchmark --help
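The two numbers in the metrics summary (elapsed runtime and prompts processed) are enough for a rough throughput figure when comparing batch sizes. The sketch below is simple arithmetic under stated assumptions: the metrics file layout is not described here, so the values are passed in directly, the 120-second runtime is made up, and treating a run as ceil(prompts / batch size) batches is an assumption about how batching works.

```python
import math


def batch_count(num_prompts, batch_size):
    """Number of batches if prompts are split into groups of batch_size (assumed)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_prompts / batch_size)


def prompts_per_second(num_prompts, elapsed_seconds):
    """Average throughput over the whole job, including any startup time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_prompts / elapsed_seconds


# 60 prompts as in the command above; batch size 4 as in the Quick Start.
print(batch_count(60, 4))               # 15
# Hypothetical elapsed runtime of 120 seconds.
print(prompts_per_second(60, 120.0))    # 0.5
```

Since the elapsed runtime covers the whole SLURM job, short runs will understate steady-state throughput.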
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
License