CLI tool for running LLM batch processing jobs on HPC systems

Project description

LLMFlux: LLM Batch Processing Pipeline for HPC Systems

A streamlined solution for running Large Language Models (LLMs) in batch mode on HPC systems powered by Slurm. LLMFlux uses the OpenAI-compatible API format with a JSONL-first architecture, enabling your prompts to flow efficiently through LLM engines at scale.

Architecture

      JSONL Input                    Batch Processing                    Results
   (OpenAI Format)                 (Ollama/vLLM + Model)               (JSON Output)
         │                                 │                                 │
         │                                 │                                 │
         ▼                                 ▼                                 ▼
    ┌──────────┐                   ┌──────────────┐                   ┌──────────┐
    │  Batch   │                   │              │                   │  Output  │
    │ Requests │─────────────────▶ │   Model on   │─────────────────▶ │  Results │
    │  (JSONL) │                   │    GPU(s)    │                   │  (JSON)  │
    └──────────┘                   │              │                   └──────────┘
                                   └──────────────┘

LLMFlux processes JSONL files in a standardized OpenAI-compatible batch API format, enabling efficient processing of thousands of prompts on HPC systems with minimal overhead.

Documentation

Configuration Guide - How to configure LLMFlux
Models Guide - Supported models and requirements
Repository Structure - Codebase organization

Installation

pip install llmflux

Or for development:

Create and Activate Conda Environment:

conda create -n llmflux python=3.11 -y
conda activate llmflux

Install Package:
```
pip install -e .
```

Environment Setup:

cp .env.example .env
# Edit .env with your SLURM account and model details

Confirm the installation by running the help command and checking that the output matches the following:

$ llmflux -h
usage: llmflux [-h] [--version] {run,benchmark,show-models,jobs,status,logs,cancel} ...

LLMFlux CLI

positional arguments:
  {run,benchmark,show-models,jobs,status,logs,cancel}
    run                 Submit a batch processing job
    benchmark           Run a benchmark job
    show-models         List all available model keys from models.yaml
    jobs                List LLMFlux tracked Slurm jobs
    status              Show detailed status for a job
    logs                Show last lines of stdout and stderr for a tracked job
    cancel              Cancel a tracked running/pending job

options:
  -h, --help            show this help message and exit
  --version, -V         Show llmflux version and exit

Quick Start

Core Batch Processing on SLURM

The primary workflow for LLMFlux is submitting JSONL files for batch processing on SLURM:

from llmflux.slurm import SlurmRunner
from llmflux.core.config import Config

# Setup SLURM configuration
config = Config()
slurm_config = config.get_slurm_config()
slurm_config.account = "myaccount"

# Initialize runner
runner = SlurmRunner(config=slurm_config)

# Submit JSONL file directly for processing
job_id = runner.run(
    input_path="prompts.jsonl",
    output_path="results.json",
    model="llama3.2:3b",
    batch_size=4
)
print(f"Job submitted with ID: {job_id}")

JSONL Input Format

JSONL input format follows the OpenAI Batch API specification:

{"custom_id":"request1","method":"POST","url":"/v1/chat/completions","body":{"model":"llama3.2:3b","messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"Explain quantum computing"}],"temperature":0.7,"max_tokens":500}}
{"custom_id":"request2","method":"POST","url":"/v1/chat/completions","body":{"model":"llama3.2:3b","messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"What is machine learning?"}],"temperature":0.7,"max_tokens":500}}
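A file in this format can be generated from a plain list of prompts. The following is a minimal sketch using only the Python standard library; the helper names `build_request` and `to_jsonl` and the `request{n}` ID scheme are illustrative assumptions, not part of LLMFlux.

```python
import json


def build_request(custom_id, prompt, model="llama3.2:3b",
                  system="You are a helpful assistant",
                  temperature=0.7, max_tokens=500):
    """Return one batch request dict in the format shown above (hypothetical helper)."""
    return {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [
                {"role": "system", "content": system},
                {"role": "user", "content": prompt},
            ],
            "temperature": temperature,
            "max_tokens": max_tokens,
        },
    }


def to_jsonl(prompts):
    """Serialize prompts to JSONL text: one compact JSON object per line."""
    lines = [
        json.dumps(build_request(f"request{i}", p), separators=(",", ":"))
        for i, p in enumerate(prompts, start=1)
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the JSONL; redirect to a file such as prompts.jsonl to use it as input.
    print(to_jsonl(["Explain quantum computing", "What is machine learning?"]), end="")
```

Each line must be a complete JSON object on its own, so the sketch serializes requests one at a time rather than dumping the whole list as a single array.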

For advanced options like custom batch sizes, processing settings, or SLURM configuration, see the Configuration Guide.

For advanced model configuration, see the Models Guide.

Command-Line Interface

LLMFlux includes a command-line interface for submitting batch processing jobs. It uses Ollama as its default engine, and model configurations follow the Ollama naming scheme. To process your prompts.jsonl file using the Ollama engine running the 3B-parameter Llama 3.2 model, run:

# Process JSONL file directly (core functionality)
llmflux run --model Llama-3.2-3B-Instruct --input data/prompts.jsonl --output results/output.json

In addition to the default Ollama engine, LLMFlux can run on vLLM to take advantage of HuggingFace models. To use a model that requires a HuggingFace key, first set your personal token in the .env file. You can then refer to models by the names defined in the templates directory:

# Process JSONL file using the vLLM backend
llmflux run --model Llama-3.2-3B-Instruct --input data/prompts.jsonl --output results/output.json --engine=vllm

This runs the same job as above, using vLLM as the backend. To run mistral-lite, for example, check the file mistral-lite/7b.yaml, which gives the name: "mistrallite:7b". Set the appropriate HuggingFace key and run:

# Process JSONL file using the vLLM backend
llmflux run --model MistralLite --input data/prompts.jsonl --output results/output.json --engine=vllm

This runs the model by searching HuggingFace for the hf_name given in the config: "amazon/MistralLite". To run a different model, find a model file in the folder src/llmflux/templates whose configuration matches what you want and pass its name as the --model argument.

Some HuggingFace models require an access token from HF. Once you have a token, update your local copy of the .env file by adding or changing this line:

HUGGINGFACE_TOKEN=hf_XXXXXXXXXXXXXXX

To use the token, replace the hf_XXXX placeholder with your token. For some gated repos, you will have to visit the HuggingFace repository directly and activate access (often by accepting a terms and conditions agreement). You may also need to adjust the settings on your HF token to ensure that LLMFlux has the rights to access the model. By default, models are stored under your home directory in ~/.cache/huggingface/hub. To change this, add the following parameter to your .env file:

HF_HOME=/path/to/dir
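Because both settings are plain KEY=value lines, they are easy to sanity-check before submitting a job. The following is a minimal sketch, assuming a simple .env syntax (no quoting, no `export` prefixes) and that the model cache sits in the `hub` subdirectory of `HF_HOME`; `parse_env` and `hub_cache_dir` are hypothetical helpers, not LLMFlux functions.

```python
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def hub_cache_dir(env):
    """Return the model cache directory implied by HF_HOME, or the default location."""
    home = env.get("HF_HOME") or "~/.cache/huggingface"
    return Path(home).expanduser() / "hub"


if __name__ == "__main__":
    # Sample text stands in for the contents of a real .env file.
    env = parse_env("HUGGINGFACE_TOKEN=hf_XXXXXXXXXXXXXXX\nHF_HOME=/path/to/dir\n")
    print("token set:", env.get("HUGGINGFACE_TOKEN", "").startswith("hf_"))
    print("cache dir:", hub_cache_dir(env))
```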

LLMFlux automatically downloads the appropriate models for both Ollama and vLLM.

For detailed command options:

llmflux --help

Output Format

Results are saved in the user's workspace as a JSON array with one entry per request:

[
  {
    "input": {
      "custom_id": "request1",
      "method": "POST",
      "url": "/v1/chat/completions",
      "body": {
        "model": "llama3.2:3b",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant"},
          {"role": "user", "content": "Original prompt text"}
        ],
        "temperature": 0.7,
        "max_tokens": 1024
      },
      "metadata": {
        "source_file": "example.txt"
      }
    },
    "output": {
      "id": "chat-cmpl-123",
      "object": "chat.completion",
      "created": 1699123456,
      "model": "llama3.2:3b",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Generated response text"
          },
          "finish_reason": "stop"
        }
      ]
    },
    "metadata": {
      "model": "llama3.2:3b",
      "timestamp": "2023-11-04T12:34:56.789Z",
      "processing_time": 1.23
    }
  }
]
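Once a job has finished, the results file can be post-processed with a few lines of Python. The sketch below pairs each `custom_id` with the generated text, based only on the structure shown above; the function name and the choice to return `None` for entries without choices are assumptions.

```python
import json


def extract_responses(results):
    """Map each request's custom_id to the content of its first choice."""
    responses = {}
    for entry in results:
        custom_id = entry["input"]["custom_id"]
        choices = entry.get("output", {}).get("choices", [])
        # Assumption: an entry without choices is recorded as None.
        responses[custom_id] = choices[0]["message"]["content"] if choices else None
    return responses


if __name__ == "__main__":
    # Inline sample with the same shape as the output above.
    sample = json.loads("""
    [{"input": {"custom_id": "request1"},
      "output": {"choices": [{"index": 0,
                              "message": {"role": "assistant",
                                          "content": "Generated response text"},
                              "finish_reason": "stop"}]}}]
    """)
    print(extract_responses(sample))
```

For a real run, replace the inline sample with `json.load(open("results.json"))`, using whatever output path was passed to the job.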

Utility Converters

LLMFlux provides utility converters to help prepare JSONL files from various input formats:

# Convert CSV to JSONL
llmflux convert csv --input data/papers.csv --output data/papers.jsonl --template "Summarize: {text}"

# Convert directory to JSONL
llmflux convert dir --input data/documents/ --output data/docs.jsonl --recursive

For code examples of converters, see the examples directory.
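To make the template mechanics concrete, here is a minimal sketch of what a CSV-to-JSONL conversion could look like. It is not LLMFlux's implementation: the function name, the `request{n}` IDs, and the default model are assumptions, and the template is filled with `str.format` using the CSV column names.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text, template, model="llama3.2:3b"):
    """Turn each CSV row into one chat-completion request line (illustrative only)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for i, row in enumerate(reader, start=1):
        request = {
            "custom_id": f"request{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                # Placeholders such as {text} are filled from the row's columns.
                "messages": [{"role": "user", "content": template.format(**row)}],
            },
        }
        lines.append(json.dumps(request))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample_csv = "id,text\n1,Paper A\n2,Paper B\n"
    print(csv_to_jsonl(sample_csv, "Summarize: {text}"), end="")
```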

Benchmarking

LLMFlux ships with a benchmarking workflow that can source prompts, submit the SLURM job, and collect results/metrics for you.

llmflux benchmark \
    --model Llama-3.2-3B-Instruct \
    --name nightly \
    --num-prompts 60 \
    --account ACCOUNT_NAME \
    --partition PARTITION_NAME \
    --nodes 1

Prompt sources: omit --input to automatically download and cache LiveBench categories (benchmark_data/). Provide --input path/to/prompts.jsonl to reuse an existing JSONL file instead. Use --num-prompts, --temperature, and --max-tokens to control synthetic dataset generation.
Outputs: results default to results/benchmarks/<name>_results.json and a metrics summary (<name>_metrics.txt) containing elapsed SLURM runtime and number of prompts processed.
Batch tuning: adjust --batch-size for throughput. Pass model arguments such as --temperature and --max-tokens to forward them to the runner.
SLURM overrides: forward scheduler settings with --account, --partition, --nodes, --gpus-per-node, --time, --mem, and --cpus-per-task.
Job controls: add --rebuild to force an Apptainer image rebuild or --debug to keep the generated job script for inspection.
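When benchmarks are launched from a script (for example, a nightly job), the command line can be assembled programmatically. The following is a minimal sketch that uses only flags mentioned above; `benchmark_argv` is a hypothetical helper and the account and partition values are placeholders.

```python
import shlex


def benchmark_argv(model, name, num_prompts, account, partition,
                   nodes=1, batch_size=None):
    """Build the argument list for `llmflux benchmark` (hypothetical helper)."""
    argv = [
        "llmflux", "benchmark",
        "--model", model,
        "--name", name,
        "--num-prompts", str(num_prompts),
        "--account", account,
        "--partition", partition,
        "--nodes", str(nodes),
    ]
    if batch_size is not None:
        # Only pass --batch-size when the caller wants to tune throughput.
        argv += ["--batch-size", str(batch_size)]
    return argv


if __name__ == "__main__":
    argv = benchmark_argv("Llama-3.2-3B-Instruct", "nightly", 60,
                          "ACCOUNT_NAME", "PARTITION_NAME")
    # Print the command; pass argv to subprocess.run(argv, check=True) to submit it.
    print(shlex.join(argv))
```

Building a list rather than a single string avoids shell quoting problems when names contain spaces or special characters.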

For the complete option reference:

llmflux benchmark --help

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

License

MIT License

Project details


Maintainers

rohan-uiuc


Release history

- 0.1.5 (Mar 18, 2026)
- 0.1.5rc1, pre-release (Mar 18, 2026), this version
- 0.1.4 (Mar 10, 2026)
- 0.1.3 (Mar 5, 2026)
- 0.1.3rc1, pre-release (Mar 4, 2026)
- 0.1.2 (Feb 3, 2026)
- 0.1.1 (Jan 13, 2026)
- 0.1.0 (Jan 13, 2026)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llmflux-0.1.5rc1.tar.gz (53.1 kB view details)

Uploaded Mar 18, 2026 Source

Built Distribution


llmflux-0.1.5rc1-py3-none-any.whl (59.8 kB view details)

Uploaded Mar 18, 2026 Python 3

File details

Details for the file llmflux-0.1.5rc1.tar.gz.

File metadata

Download URL: llmflux-0.1.5rc1.tar.gz
Upload date: Mar 18, 2026
Size: 53.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llmflux-0.1.5rc1.tar.gz
Algorithm	Hash digest
SHA256	`53789d3b29a934384d25b406a647b097a4e58f3b0d731d5192c65bdfabfbeca3`
MD5	`1e03ec83957c0c00b9befba9eb87b8bd`
BLAKE2b-256	`62364ea1594494e97b8ad567e3477ebd7977de380238cd0911afd9e1287b2274`
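
A downloaded archive can be checked against the SHA256 digest listed above before installing. The following is a minimal sketch using the standard library; the function names and the local file path are assumptions.

```python
import hashlib

# SHA256 digest published above for llmflux-0.1.5rc1.tar.gz
EXPECTED_SHA256 = "53789d3b29a934384d25b406a647b097a4e58f3b0d731d5192c65bdfabfbeca3"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True only if the file's SHA256 matches the expected digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("llmflux-0.1.5rc1.tar.gz")` returns `True` only if the local file matches the published digest.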


Provenance

The following attestation bundles were made for llmflux-0.1.5rc1.tar.gz:

Publisher: publish.yml on Center-for-AI-Innovation/LLMFlux

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llmflux-0.1.5rc1.tar.gz
- Subject digest: 53789d3b29a934384d25b406a647b097a4e58f3b0d731d5192c65bdfabfbeca3
- Sigstore transparency entry: 1125071880
- Sigstore integration time: Mar 18, 2026
Source repository:
- Permalink: Center-for-AI-Innovation/LLMFlux@0bab1303cbedac5b992ee968fb888b54cd1d2230
- Branch / Tag: refs/tags/v0.1.5rc1
- Owner: https://github.com/Center-for-AI-Innovation
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@0bab1303cbedac5b992ee968fb888b54cd1d2230
- Trigger Event: release

File details

Details for the file llmflux-0.1.5rc1-py3-none-any.whl.

File metadata

Download URL: llmflux-0.1.5rc1-py3-none-any.whl
Upload date: Mar 18, 2026
Size: 59.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for llmflux-0.1.5rc1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`efebaaf59bb4f765149498e8cbfc1e658b23537ee2a5d923f741841a2a060612`
MD5	`83e5fabd48c271db5eb97375f612b43c`
BLAKE2b-256	`4cd651855281035dad026144383e049d857355d4af3dda9a1ed8215535e16b1c`


Provenance

The following attestation bundles were made for llmflux-0.1.5rc1-py3-none-any.whl:

Publisher: publish.yml on Center-for-AI-Innovation/LLMFlux

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llmflux-0.1.5rc1-py3-none-any.whl
- Subject digest: efebaaf59bb4f765149498e8cbfc1e658b23537ee2a5d923f741841a2a060612
- Sigstore transparency entry: 1125071973
- Sigstore integration time: Mar 18, 2026
Source repository:
- Permalink: Center-for-AI-Innovation/LLMFlux@0bab1303cbedac5b992ee968fb888b54cd1d2230
- Branch / Tag: refs/tags/v0.1.5rc1
- Owner: https://github.com/Center-for-AI-Innovation
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@0bab1303cbedac5b992ee968fb888b54cd1d2230
- Trigger Event: release

llmflux 0.1.5rc1
