ParaLLM
ParaLLM is a command-line tool and Python package for efficiently querying language models. It supports batch processing with multiple prompts and models, and includes structured JSON output via schemas.
Features
- Multi-Model Querying: Query multiple LLMs simultaneously, comparing their outputs
- CSV Input/Output: Use CSV files for batch processing of prompts
- Structured JSON Output: Get responses formatted to JSON schemas or Pydantic models
- High Performance: Leverages Bodo for parallel execution of model queries. RAG functionality uses regular pandas for simplicity and reliability.
- Multiple Providers: Support for OpenAI, AWS Bedrock, and Google Gemini
Installation
pip install parallm
Note: ParaLLM requires Python 3.9+ due to Bodo’s minimum version.
Or install from source:
git clone https://github.com/strangeloopcanon/parallm.git
cd parallm
pip install -e .
You'll need to set up your API keys. For AWS Bedrock, ensure you have AWS credentials configured. For Gemini, set the GEMINI_API_KEY environment variable. The llm package is installed automatically.
Command-Line Usage
Batch Processing (CSV Files)
Process multiple prompts from a CSV file with one or more models:
# Default mode (OpenAI/llm)
parallm default data/prompts.csv --models gpt-4 claude-3-sonnet-20240229
# AWS Bedrock mode
parallm aws data/prompts.csv --models anthropic.claude-3-sonnet-20240229 amazon.titan-text-express-v1
# Gemini mode
parallm gemini data/prompts.csv --models gemini-2.0-flash
Single Prompt Processing
Process a single prompt with optional repeat functionality:
# Default mode (OpenAI/llm)
parallm default "What is the capital of France?" --models gpt-4 --repeat 5
# AWS Bedrock mode
parallm aws "What is the capital of France?" --models amazon.titan-text-express-v1 --repeat 5
# Gemini mode
parallm gemini "What is the capital of France?" --models gemini-2.0-flash --repeat 5
Structured Output
Get responses formatted according to a JSON schema or Pydantic model:
# Using a JSON schema
parallm default data/prompts.csv --models gpt-4o --schema '{
  "type": "object",
  "properties": {
    "answer": {"type": "string"},
    "confidence": {"type": "number", "minimum": 0, "maximum": 1}
  },
  "required": ["answer", "confidence"]
}'
# Using a schema from file
parallm default data/prompts.csv --models gpt-4o --schema schema.json
# Using a Pydantic model
parallm default data/prompts.csv --models gpt-4o --pydantic models.py:ResponseModel
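The schema.json file referenced above holds an ordinary JSON Schema; for instance, the inline schema from the first example saved to a file would be:

```json
{
  "type": "object",
  "properties": {
    "answer": {"type": "string"},
    "confidence": {"type": "number", "minimum": 0, "maximum": 1}
  },
  "required": ["answer", "confidence"]
}
```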
Python API Usage
Batch Processing
from parallm import query_model_all, bedrock_query_model_all, gemini_query_model_all
# Default mode (OpenAI/llm)
df = query_model_all("data/prompts.csv", ["gpt-4", "claude-3-sonnet-20240229"])
print(df)
# AWS Bedrock
df = bedrock_query_model_all("data/prompts.csv", ["anthropic.claude-3-sonnet-20240229"])
print(df)
# Gemini
df = gemini_query_model_all("data/prompts.csv", ["gemini-2.0-flash"])
print(df)
Single Prompt Processing
from parallm import query_model_repeat, bedrock_query_model_repeat, gemini_query_model_repeat
# Default mode (OpenAI/llm)
df = query_model_repeat("What is the capital of France?", "gpt-4o", repeat=5)
print(df)
# AWS Bedrock
df = bedrock_query_model_repeat("What is the capital of France?", "amazon.titan-text-express-v1", repeat=5)
print(df)
# Gemini
df = gemini_query_model_repeat("What is the capital of France?", "gemini-2.0-flash", repeat=5)
print(df)
Structured Output
from parallm import query_model_json
from pydantic import BaseModel
# Using a Pydantic model
class Response(BaseModel):
    answer: str
    confidence: float
result = query_model_json("What is the capital of France?", "gpt-4o", schema=Response)
print(result)
Retrieval-Augmented Generation (RAG)
ParaLLM includes a modular RAG pipeline for querying language models with context retrieved from your own documents.
Overview
The RAG system processes your documents through a configurable pipeline:
- Ingestion: Loads documents from a specified directory. Supports .txt, .pdf, .docx, and .html/.htm files.
- Chunking: Splits documents into smaller chunks using one of several strategies:
  - fixed_size: Overlapping chunks of a defined character size.
  - semantic: Groups sentences together (using NLTK).
- Embedding: Generates vector embeddings for each chunk using a specified Sentence Transformer model (e.g., all-MiniLM-L6-v2).
- Indexing: Stores the chunks, embeddings, and metadata in:
  - A vector store (currently ChromaDB) for semantic search.
  - A keyword index (using BM25) for lexical search.
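As an illustration of the fixed_size strategy, overlapping character windows can be produced as follows. This is a minimal sketch, not ParaLLM's actual implementation; the function name and defaults are assumptions for illustration:

```python
def chunk_fixed_size(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows.

    Each chunk starts (chunk_size - overlap) characters after the
    previous one, so consecutive chunks share `overlap` characters.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

The overlap preserves context that would otherwise be cut at a chunk boundary, at the cost of some index redundancy.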
When querying, the system retrieves relevant chunks using vector search, keyword search, or a hybrid combination, augments the prompt with this context, and then sends it to the specified language model.
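The hybrid combination described above can be pictured as a weighted blend of the two retrievers' scores. The sketch below is a simplified stand-in using plain Python dicts (the real pipeline uses ChromaDB and rank_bm25; the function name and the min-max normalisation scheme are assumptions for illustration):

```python
def hybrid_scores(vector_scores, keyword_scores, alpha=0.5):
    """Blend vector-search and keyword-search scores per document id.

    Each input maps doc_id -> raw score. Scores are min-max normalised
    to [0, 1] within each retriever, then combined as
    alpha * vector + (1 - alpha) * keyword.
    """
    def normalise(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in scores.items()}

    v, k = normalise(vector_scores), normalise(keyword_scores)
    docs = set(v) | set(k)
    return {d: alpha * v.get(d, 0.0) + (1 - alpha) * k.get(d, 0.0)
            for d in docs}
```

Normalising within each retriever first matters because cosine similarities and BM25 scores live on very different scales.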
Configuration (rag_config.yaml)
The entire RAG pipeline is configured using a YAML file (e.g., rag_config.yaml). This file defines the sequence of steps, parameters for each step (like source paths, chunking strategy, embedding model, index paths), and the retrieval strategy.
See examples/rag_config.yaml for a detailed example.
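The exact keys are defined by examples/rag_config.yaml; the sketch below only illustrates the kind of structure involved, and every key and path in it is an assumption rather than the actual schema:

```yaml
# Illustrative only -- see examples/rag_config.yaml for the real schema.
ingestion:
  source_dir: ./docs
chunking:
  strategy: fixed_size      # or: semantic
  chunk_size: 500
  overlap: 50
embedding:
  model: all-MiniLM-L6-v2
indexing:
  vector_store_path: ./index/chroma
  bm25_index_path: ./index/bm25.pkl
retrieval:
  mode: hybrid              # vector | keyword | hybrid
  top_k: 5
```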
RAG CLI Usage
Use the rag subcommand for building indexes and querying.
1. Build the RAG Index:
This command runs the ingestion, chunking, embedding, and indexing pipeline defined in your config file.
python -m parallm rag build --config path/to/your_rag_config.yaml
- Replace path/to/your_rag_config.yaml with the actual path to your configuration file.
- This needs to be run once initially, and again whenever your source documents or pipeline configuration change.
- Indexes (the ChromaDB store and the BM25 pickle file) will be created or updated at the paths specified in the config.
2. Query the RAG System:
This command uses a previously built index to retrieve context, augment a prompt, and query an LLM.
python -m parallm rag query --config path/to/your_rag_config.yaml --query "Your question here?" --llm-model gpt-4o-mini
- --config: Specifies the RAG configuration file (used to load the retriever and embedding models).
- --query / -q: The question you want to ask.
- --llm-model: (Optional) The language model used to generate the final answer (defaults to the model specified in the script, e.g., gpt-4o-mini).
RAG Dependencies
Using the RAG features requires additional dependencies:
PyYAML # For parsing rag_config.yaml
sentence-transformers # For embedding generation
chromadb # Vector store
rank_bm25 # Keyword indexing
pypdf # PDF ingestion
python-docx # DOCX ingestion
beautifulsoup4 # HTML ingestion
lxml # HTML parsing backend for beautifulsoup4
nltk # Semantic chunking (sentence tokenization)
reportlab # Required by test suite to generate test PDFs
Ensure NLTK's punkt tokenizer data is downloaded:
python -m nltk.downloader punkt
Testing
The project includes comprehensive test coverage for all RAG functionality. Run tests with:
# Run all tests
python -m pytest
# Run only RAG tests
python -m pytest tests/test_rag/
# Run with verbose output
python -m pytest tests/test_rag/ -v
All 35 RAG tests currently pass, covering:
- Document ingestion (TXT, PDF, DOCX, HTML)
- Text chunking (fixed-size and semantic strategies)
- Embedding generation
- BM25 indexing and retrieval
- Configuration loading and validation
CSV Format
Your prompts.csv file should have a header row with "prompt" as the column name:
prompt
What is machine learning?
Explain quantum computing
How does blockchain work?
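Such a file can be generated or inspected with Python's standard csv module; a minimal sketch (written to an in-memory buffer here, but `open("prompts.csv", "w", newline="")` works the same way):

```python
import csv
import io

# Write a prompts.csv-style file in memory.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["prompt"])  # header row expected by ParaLLM
writer.writerow(["What is machine learning?"])
writer.writerow(["Explain quantum computing"])

# Read it back: DictReader keys each row by the header.
buf.seek(0)
prompts = [row["prompt"] for row in csv.DictReader(buf)]
```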
Dependencies
- bodo: Provides parallel DataFrame processing for model query operations. RAG components use regular pandas for maximum compatibility.
- pandas: Data processing and CSV handling
- llm: Simon Willison's LLM interface library
- python-dotenv: Environment variable management
- pydantic: Data validation for structured output
- boto3: AWS SDK for Python (required for AWS Bedrock)
- google-genai: Google Gemini API client (or equivalent) for the Gemini provider
Author
Rohit Krishnan