
BatchFactory

Composable, cache‑aware pipelines for parallel LLM workflows, API calls, and dataset generation.

Status — v0.4 beta. More robust than earlier releases and battle‑tested on small projects. Still evolving quickly — APIs may shift.


📦 GitHub Repository →


Install

pip install batchfactory            # latest tag
pip install --upgrade batchfactory  # grab the newest patch

Quick‑start

import batchfactory as bf
from batchfactory.op import *

project = bf.ProjectFolder("quickstart", 1, 0, 5)
broker  = bf.brokers.LLMBroker(project["cache/llm_broker.jsonl"])

PROMPT = """
Write a poem about {keyword}.
"""

g = bf.Graph()
g |= ReadMarkdownLines("./demo_data/greek_mythology_stories.md")
g |= Shuffle(42) | TakeFirstN(5)
g |= GenerateLLMRequest(PROMPT, model="gpt-4o-mini@openai")
g |= CallLLM(project["cache/llm_call.jsonl"], broker)
g |= ExtractResponseText()
g |= MapField(lambda headings,keyword: headings+[keyword], ["headings", "keyword"], "headings")
g |= WriteMarkdownEntries(project["out/poems.md"])

g.execute(dispatch_brokers=True)

Run it twice – everything after the first run is served from the on‑disk ledger.
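The ledger also powers cost auditing. As a one‑line sketch, grounded only in the PrintTotalCost description in the Ops table below, you can append the cost printer anywhere before execute():

g |= PrintTotalCost()  # prints the total accumulated API cost for the output batch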


🚀 Why BatchFactory?

BatchFactory lets you build cache‑aware, composable pipelines for LLM calls, embeddings, and data transforms—so you can go from idea to production with zero boilerplate.

  • Composable Ops – chain 30‑plus ready‑made Ops (and your own) using simple pipe syntax (see the sketch after this list).
  • Transparent Caching & Cost Tracking – every expensive call is hashed, cached, resumable, and audited.
  • Pluggable Brokers – swap in LLM, embedding, search, or human‑in‑the‑loop brokers at will.
  • Self‑contained datasets – pack arrays, images, audio—any data—into each entry so your entire workflow travels as a single, copy‑anywhere .jsonl file.
  • Ready‑to‑Copy Demos – learn the idioms fast with five concise example pipelines.
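Sub‑chains are first‑class values, so a stage can be built once and spliced into any graph. A minimal sketch, reusing PROMPT, project, and broker from the quick‑start above:

llm_stage = GenerateLLMRequest(PROMPT, model="gpt-4o-mini@openai")
llm_stage |= CallLLM(project["cache/llm_call.jsonl"], broker)   # cached, resumable LLM call
llm_stage |= ExtractResponseText()

g = bf.Graph()
g |= ReadMarkdownLines("./demo_data/greek_mythology_stories.md")
g |= llm_stage  # splice the reusable stage into the graph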

🧩 Three killer moves

🏭 Mass data distillation & cleanup – Chain GenerateLLMRequest → CallLLM → ExtractResponseText after keyword or file sources to mass‑produce, filter, or polish datasets—millions of Q&A rows, code explanations, translation pairs—with built‑in caching & cost tracking.

🎭 Multi‑agent, multi‑round workflows – With Repeat, If, While, and the chat helpers, you can script complex role‑based collaborations—e.g. Junior Translator → Senior Editor → QA → Revision—and run full multi‑agent, multi‑turn simulations in just a few lines of code. Ideal for workflows inspired by TransAgents, MATT, or ChatDev.

🌲 Hierarchical spawning (ListParallel) – ListParallel breaks a complex item into fine‑grained subtasks, runs them concurrently, then reunites the outputs—perfect for long‑text summarisation, RAG chunking, or any tree‑structured pipeline.

Spawn snippet (Text Segmentation)

# 1. Split the source text into numbered segments.
g |= MapField(lambda x: split_text(label_line_numbers(x)), "text", "text_segments")

# 2. Sub-chain run on each segment: ask the LLM for labels, then parse them into integers.
spawn_chain = AskLLM(LABEL_SEG_PROMPT, "labels", 1)
spawn_chain |= MapField(text_to_integer_list, "labels")

# 3. Fan out one spawn entry per segment, process them in parallel, collect the labels back.
g |= ListParallel(spawn_chain, "text_segments", "text", "labels", "labels")

# 4. Merge the per-segment labels, re-split the text, and emit one entry per segment.
g |= MapField(flatten_list, "labels")
g |= MapField(split_text_by_line_labels, ["text", "labels"], "text_segments")
g |= ExplodeList(["filename", "text_segments"], ["filename", "text"])
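In this reading, ListParallel(spawn_chain, "text_segments", "text", "labels", "labels") spawns one sub‑entry per element of text_segments (each exposed to spawn_chain as text), runs the sub‑entries concurrently, and collects each sub‑entry's labels back into a single labels list on the parent; this is the same spawn/collect pair that SpawnFromList and CollectAllToList expose individually in the Ops table below.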

Loop snippet (Role‑Playing)

# Reusable chat agents; FORMAT_REQ is a shared formatting instruction defined elsewhere in the demo.
Teacher = Character("teacher_name", "You are a teacher named {teacher_name}. " + FORMAT_REQ)
Student = Character("student_name", "You are a student named {student_name}. " + FORMAT_REQ)

g = bf.Graph()
g |= ReadMarkdownLines("./demo_data/greek_mythology_stories.md") | TakeFirstN(1)
g |= SetField("teacher_name", "Teacher", "student_name", "Student")

g |= Teacher("Please introduce the text from {headings} titled {keyword}.", 0)

# One student/teacher exchange per round.
loop_body = Student("Please ask questions or respond.", 1)
loop_body |= Teacher("Please respond to the student or continue explaining.", 2)
g |= Repeat(loop_body, 3)  # run the exchange for three rounds

g |= Teacher("Please summarize.", 3)
g |= ChatHistoryToText(template="**{role}**: {content}\n\n")
g |= MapField(lambda headings, keyword: headings + [keyword], ["headings", "keyword"], "headings")
g |= WriteMarkdownEntries(project["out/roleplay.md"])
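Here Character builds a reusable chat‑agent sub‑chain: each call adds one scripted turn for that agent, Repeat unrolls the student/teacher exchange for three rounds, and ChatHistoryToText flattens the accumulated transcript before it is written out.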

Text Embedding snippet

# Extends an existing graph g; mirrors the CallLLM pattern with a dedicated embedding broker.
embedding_broker = bf.brokers.LLMEmbeddingBroker(project["cache/embedding_broker.jsonl"])
g |= GenerateLLMEmbeddingRequest("keyword", model="text-embedding-3-small@openai")
g |= CallLLMEmbedding(project["cache/embedding_call.jsonl"], embedding_broker)
g |= ExtractResponseEmbedding()   # base64-encoded numpy array in the response
g |= DecodeBase64Embedding()      # decode into a Python array
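The 5_embeddings demo pairs this with cosine similarity. A minimal sketch of that final step, assuming the decoded vectors have been placed in fields emb_a and emb_b (the field names and the helper function are illustrative, not part of the library):

import numpy as np

def cosine_similarity(a, b):
    # Plain cosine similarity between two decoded embedding vectors.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

g |= MapField(cosine_similarity, ["emb_a", "emb_b"], "similarity")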

Core concepts (one‑liner view)

Term – Story in one sentence

Entry – Tiny record with an immutable idx, mutable data, and an auto‑incrementing rev.
Op – Atomic node; compose with `|` or `wire()`.
Graph – A chain of Ops wired together; supports flexible pipelines and subgraphs.
Executor – Internal engine that tracks graph state and manages batching, resumption, and broker dispatch; created automatically when you call graph.execute().
Broker – Pluggable engine for expensive or async jobs (LLM APIs, search, human labelers).
Ledger – Append‑only JSONL file backing each broker and graph; enables instant resume and transparent caching.
execute() – High‑level command that runs the graph: creates an Executor, resumes from cache, and dispatches brokers as needed.
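To see these pieces fit together without any broker, here is a minimal sketch built only from Ops listed below; exact signatures may differ slightly from the released API:

import batchfactory as bf
from batchfactory.op import *

g = bf.Graph()
g |= FromList([{"keyword": "Athena"}, {"keyword": "Hermes"}])  # each dict becomes an Entry's data
g |= SetField("source", "demo")                                # set a field on every entry
g |= MapField(lambda kw: kw.upper(), "keyword")                # transform a single field in place
g |= WriteJsonl("out/demo.jsonl")                              # the whole batch lands in one JSONL file
g.execute()                                                    # no expensive Ops here, so no broker dispatch needed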

📚 Example Gallery

✨ Example – Shows

1_quickstart – Linear LLM transform with caching & auto‑resume
2_roleplay – Multi‑agent, multi‑turn roleplay with chat agents
3_text_segmentation – Divide‑and‑conquer pipeline for text segmentation
4_prompt_management – Prompt + data templating in one place
5_embeddings – Embeddings + cosine similarity workflow

Available Ops

Operation – Description

Apply – Apply a function to modify the entry data.
BeginIf – Switch to port 1 if the criterion is met. See the If function for usage.
CallLLM – Dispatch concurrent API calls to an LLM; may incur billing from external providers.
CallLLMEmbedding – Dispatch concurrent API calls to an embedding model; may incur billing from external providers.
ChatHistoryToText – Format the chat history into a single text.
CheckPoint – A no‑op checkpoint that saves inputs to the cache and resumes from it.
CleanupLLMData – Clean up internal fields used for LLM processing, such as llm_request, llm_response, status, and job_idx.
CleanupLLMEmbeddingData – Clean up internal fields used for embedding processing, such as embedding_request, embedding_response, status, and job_idx.
Collect – Collect data from port 1 and merge it into port 0.
CollectAllToList – Collect items from spawn entries on port 1 and merge them into a list (or lists, if multiple items are provided).
DecodeBase64Embedding – Decode the base64‑encoded embedding into a Python array.
EndIf – Join entries from either port 0 or port 1. See the If function for usage.
ExplodeList – Explode an entry into multiple entries based on a list (or lists).
ExtractResponseEmbedding – Extract the embedding object (a base64‑encoded numpy array) from the LLM response and store it in the entry data.
ExtractResponseText – Extract the text content from the LLM response and store it in the entry data.
Filter – Filter entries based on a custom criterion function.
FilterFailedEntries – Drop entries whose status is "failed".
FilterMissingFields – Drop entries that lack specific fields.
FromList – Create entries from a list of dictionaries or objects, each representing one entry.
GenerateLLMEmbeddingRequest – Generate LLM embedding requests from input_key.
GenerateLLMRequest – Generate LLM requests from a given prompt, formatting it with the entry data.
If – Switch to true_chain if the criterion is met; otherwise stay on false_chain.
ListParallel – Spawn entries from a list (or lists), process them in parallel, and collect them back into a list (or lists).
MapField – Map a function over specific field(s) in the entry data.
PrintEntry – Print information for the first n entries.
PrintField – Print specific field(s) from the first n entries.
PrintTotalCost – Print the total accumulated API cost for the output batch.
ReadJsonl – Read JSON Lines files (also supports JSON arrays).
ReadMarkdownEntries – Read Markdown files and extract the non‑empty text under each heading, with the markdown headings as a list.
ReadMarkdownLines – Read Markdown files and extract non‑empty lines as keyword, with the markdown headings as a list.
ReadTxtFolder – Collect all txt files in a folder.
RemoveField – Remove fields from the entry data.
RenameField – Rename fields in the entry data.
Repeat – Repeat the loop body for a fixed number of rounds.
RepeatNode – Repeat the loop body for a fixed number of rounds. See the Repeat function for usage.
Replicate – Replicate an entry to all output ports.
SetField – Set fields in the entry data to specific values.
Shuffle – Shuffle the entries in a batch randomly.
Sort – Sort the entries in a batch.
SortMarkdownEntries – Sort Markdown entries based on headings and an optional keyword.
SpawnFromList – Spawn multiple spawn entries to port 1 based on a list (or lists).
TakeFirstN – Take the first N entries from the batch and discard the rest.
ToList – Output a list of specific field(s) from entries.
TransformCharacterDialogueForLLM – Map custom character roles to valid LLM roles (user/assistant/system). Must be called after GenerateLLMRequest.
UpdateChatHistory – Append the LLM response to the chat history.
While – Execute the loop body while the criterion is met.
WhileNode – Execute the loop body while the criterion is met. See the While function for usage.
WriteJsonl – Write entries to a JSON Lines file.
WriteMarkdownEntries – Write entries to Markdown file(s), with the heading hierarchy defined by headings and text as content.
WriteMarkdownLines – Write keyword lists to Markdown file(s) as lines, with the heading hierarchy defined by headings:list.
remove_cot – Remove the chain of thought (CoT) from the LLM response. A plain function; wrap it with MapField (see the sketch below).
remove_speaker_tag – Remove speaker tags. A plain function; wrap it with MapField.
split_cot – Split the LLM response into text and chain of thought (CoT). A plain function; wrap it with MapField.
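The last three entries are plain functions rather than Ops. A minimal sketch of wrapping them, following the single‑field MapField pattern used in the snippets above (the field name "response" is illustrative):

g |= MapField(remove_cot, "response")          # strip the chain of thought in place
g |= MapField(remove_speaker_tag, "response")  # then strip speaker tags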

© 2025 · MIT License
