LLM Program
A Python package for creating and running LLM programs.
llmprogram is a Python package that provides a structured and powerful way to create and run programs that use Large Language Models (LLMs). It uses a YAML-based configuration to define the behavior of your LLM programs, making them easy to create, manage, and share.
How is llmprogram different?
There are many libraries and frameworks available for working with LLMs. Here’s what makes llmprogram different:
- Focus on Programmatic LLM Chains: llmprogram is designed to create self-contained, reusable "programs" that can be chained together to build more complex applications. The YAML-based configuration makes it easy to define and version these programs.
- Data Quality and Validation: The built-in input and output validation using JSON schemas ensures that your programs are robust and that the data flowing through them is correct. This is crucial for building reliable LLM-powered applications.
- Dataset Generation as a First-Class Citizen: llmprogram is designed with the entire lifecycle of an LLM application in mind, from development to production and fine-tuning. The automatic logging to a SQLite database makes it easy to create high-quality datasets for fine-tuning your own models.
- Simplicity and Intuitiveness: The YAML configuration is easy to read and write, and the Python API is simple and intuitive. This makes it easy to get started and to build complex applications without a steep learning curve.
Features
- YAML-based Configuration: Define your LLM programs using simple and intuitive YAML files.
- Input/Output Validation: Use JSON schemas to validate the inputs and outputs of your programs, ensuring data integrity.
- Jinja2 Templating: Use the power of Jinja2 templates to create dynamic prompts for your LLMs.
- Caching: Built-in support for Redis caching to save time and reduce costs.
- Execution Logging: Automatically log program executions to a SQLite database for analysis and debugging.
- Streaming: Support for streaming responses from the LLM.
- Extensible with Tools: Extend the functionality of your programs by adding custom tools (functions) that the LLM can call.
- Batch Processing: Process multiple inputs in parallel for improved performance.
- CLI for Dataset Generation: A command-line interface to generate instruction datasets for LLM fine-tuning from your logged data.
- Web Service: Expose your programs as REST API endpoints with automatic OpenAPI documentation.
- Analytics: Comprehensive analytics tracking with DuckDB for token usage, LLM calls, program usage, and timing metrics.
- AI-Assisted YAML Generation: Generate LLM program YAML files automatically based on natural language descriptions.
Getting Started
Installation
```bash
pip install llmprogram
```
Usage
1. Set your OpenAI API key:

```bash
export OPENAI_API_KEY='your-api-key'
```

2. Create a program YAML file:

Create a file named sentiment_analysis.yaml:

```yaml
name: sentiment_analysis
description: Analyzes the sentiment of a given text.
version: 1.0.0
model:
  provider: openai
  name: gpt-4.1-mini
  temperature: 0.5
  max_tokens: 100
  response_format: json_object
system_prompt: |
  You are a sentiment analysis expert. Analyze the sentiment of the given text
  and return a JSON response with the following format:
  - sentiment (string): "positive", "negative", or "neutral"
  - score (number): A score from -1 (most negative) to 1 (most positive)
input_schema:
  type: object
  required:
    - text
  properties:
    text:
      type: string
      description: The text to analyze.
output_schema:
  type: object
  required:
    - sentiment
    - score
  properties:
    sentiment:
      type: string
      enum: ["positive", "negative", "neutral"]
    score:
      type: number
      minimum: -1
      maximum: 1
template: |
  Analyze the following text:
  {{text}}
```

3. Run the program using the CLI:

```bash
# Using a JSON input file
llmprogram run sentiment_analysis.yaml --inputs sentiment_inputs.json

# Using inline JSON
llmprogram run sentiment_analysis.yaml --input-json '{"text": "I love this product!"}'
```

Or create a file named run_sentiment_analysis.py:

```python
import asyncio

from llmprogram import LLMProgram


async def main():
    program = LLMProgram('sentiment_analysis.yaml')
    result = await program(text='I love this new product! It is amazing.')
    print(result)


if __name__ == '__main__':
    asyncio.run(main())
```

Run the script:

```bash
python run_sentiment_analysis.py
```
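Since the response is validated against output_schema, the printed result will look along the lines of {"sentiment": "positive", "score": 0.9} (the exact score is illustrative and varies between runs and models).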
Configuration
The behavior of each LLM program is defined in a YAML file. Here are the key sections:
- name, description, version: Basic metadata for your program.
- model: Defines the LLM provider, model name, and other parameters like temperature and max_tokens.
- system_prompt: The instructions that are given to the LLM to guide its behavior.
- input_schema: A JSON schema that defines the expected input for the program. The program will validate the input against this schema before execution.
- output_schema: A JSON schema that defines the expected output from the LLM. The program will validate the LLM's output against this schema.
- template: A Jinja2 template that is used to generate the prompt that is sent to the LLM. The template is rendered with the input variables.
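To make the templating step concrete, here is a minimal sketch of the rendering stage using the jinja2 library directly; llmprogram performs an equivalent step internally, though its exact internals may differ:

```python
from jinja2 import Template

# The 'template' field from the YAML file, rendered with the
# (already validated) input variables to produce the user prompt.
template = Template('Analyze the following text:\n{{text}}')
prompt = template.render(text='I love this new product! It is amazing.')
print(prompt)
# Analyze the following text:
# I love this new product! It is amazing.
```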
Using with other OpenAI-compatible endpoints
You can use llmprogram with any OpenAI-compatible endpoint, such as Ollama. To do this, you can pass the api_key and base_url to the LLMProgram constructor:
```python
from llmprogram import LLMProgram

program = LLMProgram(
    'your_program.yaml',
    api_key='your-api-key',  # optional, defaults to OPENAI_API_KEY env var
    base_url='http://localhost:11434/v1'  # example for Ollama
)
```
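When targeting Ollama, the model name in your YAML must be a model you have pulled locally (for example llama3), and since Ollama's OpenAI-compatible endpoint generally ignores the API key, any placeholder value works.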
Caching
llmprogram supports caching of LLM responses to Redis to improve performance and reduce costs. To enable caching, you need to have a Redis server running.
By default, caching is enabled. You can disable it or configure the Redis connection and cache TTL (time-to-live) when you create an LLMProgram instance:
```python
from llmprogram import LLMProgram

program = LLMProgram(
    'your_program.yaml',
    enable_cache=True,
    redis_url="redis://localhost:6379",
    cache_ttl=3600  # in seconds
)
```
Logging and Dataset Generation
llmprogram automatically logs every execution of a program to a SQLite database. The database file is created in the same directory as the program YAML file, with a .db extension.
This logging feature is not just for debugging; it's also a powerful tool for creating high-quality datasets for fine-tuning your own LLMs. Each record in the log contains:
- function_input: The input given to the program.
- function_output: The output received from the LLM.
- llm_input: The prompt sent to the LLM.
- llm_output: The raw response from the LLM.
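To inspect the logged records directly, you can open the database with Python's built-in sqlite3 module. A minimal sketch follows; the log table name is not documented here, so the snippet discovers it rather than assuming one:

```python
import sqlite3

# The log database is created next to the program YAML file,
# e.g. sentiment_analysis.db for sentiment_analysis.yaml.
conn = sqlite3.connect('sentiment_analysis.db')
conn.row_factory = sqlite3.Row

# Discover the log table name, since it is not documented above.
tables = [r['name'] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print('tables:', tables)

# Peek at the first few records of the first table.
for row in conn.execute(f'SELECT * FROM "{tables[0]}" LIMIT 3'):
    print(dict(row))

conn.close()
```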
Generating a Dataset
You can use the built-in CLI to generate an instruction dataset from the logged data. The dataset is created in JSONL format, which is commonly used for fine-tuning.
```bash
llmprogram generate-dataset /path/to/your_program.db /path/to/your_dataset.jsonl
```
Each line in the output file will be a JSON object with the following keys:
- instruction: The system prompt and the user prompt, combined to form the instruction for the LLM.
- output: The output from the LLM.
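Since JSONL is one JSON object per line, loading the generated dataset takes only a few lines; a minimal sketch (the file path matches the command above):

```python
import json

# Each line of the JSONL file is one {'instruction': ..., 'output': ...} pair.
with open('/path/to/your_dataset.jsonl', encoding='utf-8') as f:
    examples = [json.loads(line) for line in f]

print(len(examples), 'examples')
print(examples[0]['instruction'][:200])
```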
Command-Line Interface (CLI)
llmprogram comes with a command-line interface for common tasks.
run
Run an LLM program with inputs from command line or files.
Usage:
```bash
# First, set your OpenAI API key
export OPENAI_API_KEY='your-api-key'

# Run with inputs from a JSON file
llmprogram run program.yaml --inputs inputs.json

# Run with inputs from command line
llmprogram run program.yaml --input-json '{"text": "I love this product!"}'

# Run with inputs from stdin
echo '{"text": "I love this product!"}' | llmprogram run program.yaml

# Run with streaming output
llmprogram run program.yaml --inputs inputs.json --stream

# Save output to a file
llmprogram run program.yaml --inputs inputs.json --output result.json
```
Arguments:
- program_path: The path to the program YAML file.
- --inputs, -i: Path to a JSON/YAML file containing inputs.
- --input-json: JSON string of inputs.
- --output, -o: Path to the output file (default: stdout).
- --stream, -s: Stream the response.
generate-yaml
Generate an LLM program YAML file based on a description using an AI assistant.
Usage:
```bash
# Generate a YAML program with a simple description
llmprogram generate-yaml "Create a program that analyzes the sentiment of text" --output sentiment_analyzer.yaml

# Generate a YAML program with examples
llmprogram generate-yaml "Create a program that extracts key information from customer reviews" \
    --example-input "The battery life on this phone is amazing! It lasts all day." \
    --example-output '{"product_quality": "positive", "battery": "positive", "durability": "neutral"}' \
    --output review_analyzer.yaml

# Generate a YAML program and output to stdout
llmprogram generate-yaml "Create a program that summarizes long texts"
```
Arguments:
- description: A detailed description of what the LLM program should do.
- --example-input: Example of the input the program will receive.
- --example-output: Example of the output the program should generate.
- --output, -o: Path to the output YAML file (default: stdout).
- --api-key: OpenAI API key (optional, defaults to the OPENAI_API_KEY env var).
analytics
Show analytics data collected from LLM program executions.
Usage:
```bash
# Show all analytics data
llmprogram analytics

# Show analytics for a specific program
llmprogram analytics --program sentiment_analysis

# Show analytics for a specific model
llmprogram analytics --model gpt-4

# Use a custom analytics database path
llmprogram analytics --db-path /path/to/custom/analytics.duckdb
```
Arguments:
- --db-path: Path to the analytics database (default: llmprogram_analytics.duckdb).
- --program: Filter by program name.
- --model: Filter by model name.
generate-dataset
Generate an instruction dataset for LLM fine-tuning from a SQLite log file.
Usage:
```bash
llmprogram generate-dataset <database_path> <output_path>
```
Arguments:
- database_path: The path to the SQLite database file.
- output_path: The path to write the generated dataset to.
Web Service
llmprogram includes a built-in web service that exposes your LLM programs as REST API endpoints with automatic OpenAPI documentation.
Running the Web Service
To run the web service, use the llmprogram-web command:
```bash
# Run the web service with default settings (examples directory, localhost:8000)
llmprogram-web

# Run the web service with a custom directory
llmprogram-web --directory /path/to/your/programs

# Run the web service on a different host/port
llmprogram-web --host 0.0.0.0 --port 8080

# Run with auto-reload for development
llmprogram-web --reload

# Use a custom analytics database path
llmprogram-web --analytics-db /path/to/custom/analytics.duckdb
```
API Endpoints
The web service automatically generates REST endpoints for each YAML file in your programs directory:
- GET / - Root endpoint with API information
- GET /programs - List all available programs
- GET /programs/{program_name} - Get detailed information about a specific program
- POST /programs/{program_name}/run - Run a specific program
- GET /analytics/llm-calls - Get LLM call statistics
- GET /analytics/program-usage - Get program usage statistics
- GET /analytics/token-usage - Get token usage statistics
For each program, the service generates:
- A POST endpoint at
/programs/{program_name}/run - Automatic request/response validation based on the program's input/output schemas
- Full OpenAPI documentation at
/docsand/redoc - OpenAPI specification at
/openapi.json
Analytics Endpoints
The web service includes comprehensive analytics endpoints:
- GET /analytics/llm-calls - LLM call statistics, including call count, token usage, execution time, cache hits, and unique users
- GET /analytics/program-usage - Program usage statistics, including usage count, successful/failed calls, execution time, and unique users
- GET /analytics/token-usage - Token usage statistics, including prompt/completion tokens, total tokens, estimated cost, and unique users
All analytics endpoints support filtering by program name, model name, and date range.
Example Usage
After starting the web service, you can interact with it using curl or any HTTP client:
```bash
# List available programs
curl http://localhost:8000/programs

# Get information about a specific program
curl http://localhost:8000/programs/sentiment_analysis

# Run a program
curl -X POST http://localhost:8000/programs/sentiment_analysis/run \
    -H "Content-Type: application/json" \
    -d '{"inputs": {"text": "I love this product!"}}'

# Get LLM call statistics
curl http://localhost:8000/analytics/llm-calls

# Get program usage statistics for a specific program
curl "http://localhost:8000/analytics/program-usage?program_name=sentiment_analysis"

# Get token usage statistics with filtering
curl "http://localhost:8000/analytics/token-usage?program_name=sentiment_analysis&model_name=gpt-4"
```
OpenAPI Documentation
The web service automatically generates comprehensive OpenAPI documentation:
- Interactive API documentation: http://localhost:8000/docs
- ReDoc documentation: http://localhost:8000/redoc
- Raw OpenAPI JSON specification: http://localhost:8000/openapi.json
The generated OpenAPI specification includes:
- Endpoint definitions for each program
- Request/response schemas based on your program's input/output schemas
- Example requests and responses
- Detailed descriptions from your program's metadata
- Analytics endpoints with filter parameters
Examples
You can find more examples in the examples directory:
- Sentiment Analysis: A simple program to analyze the sentiment of a piece of text. (examples/sentiment_analysis.yaml)
To run the examples:
1. Navigate to the examples directory.

2. Run the corresponding run_*.py script, or use the CLI:

```bash
# Using the CLI with a JSON input file
poetry run llmprogram run sentiment_analysis.yaml --inputs sentiment_inputs.json

# Using the CLI with inline JSON
poetry run llmprogram run sentiment_analysis.yaml --input-json '{"text": "I love this product!"}'

# Using the CLI with batch processing
poetry run llmprogram run sentiment_analysis.yaml --inputs sentiment_batch_inputs.json

# Using the CLI with streaming
poetry run llmprogram run sentiment_analysis.yaml --inputs sentiment_inputs.json --stream

# Using the CLI and saving output to a file
poetry run llmprogram run sentiment_analysis.yaml --inputs sentiment_inputs.json --output result.json

# View analytics data
poetry run llmprogram analytics

# View analytics for a specific program
poetry run llmprogram analytics --program sentiment_analysis

# Generate a new YAML program
poetry run llmprogram generate-yaml "Create a program that classifies email priority" \
    --example-input "Subject: Urgent meeting tomorrow. Body: Please prepare the Q3 report." \
    --example-output '{"priority": "high", "category": "work", "response_required": true}' \
    --output email_classifier.yaml
```

3. Or run the web service:

```bash
# Run the web service
poetry run llmprogram-web --directory examples

# Then interact with it using curl or any HTTP client
curl -X POST http://localhost:8000/programs/sentiment_analysis/run \
    -H "Content-Type: application/json" \
    -d '{"inputs": {"text": "I love this product!"}}'

# View analytics via the web API
curl http://localhost:8000/analytics/llm-calls
curl http://localhost:8000/analytics/program-usage
curl http://localhost:8000/analytics/token-usage
```
Other examples:
- Code Generator: A program that generates Python code from a natural language description. (examples/code_generator.yaml)
- Email Generator: A program that generates a professional email based on a few inputs. (examples/email_generator.yaml)
To run the examples, navigate to the examples directory and run the corresponding run_*.py script or use the CLI as shown above.
Development
To run the tests for this package, you will need to install pytest:

```bash
pip install pytest
```

Then, you can run the tests from the root directory of the project:

```bash
pytest
```