zerollm-kit

Zero setup, zero config — the easiest Python API for local LLMs on any hardware

These details have not been verified by PyPI

Project links

Project description

ZeroLLM

Zero setup. Zero config. Local LLMs on any hardware.

What is ZeroLLM?

One pip install. Auto-detects your hardware. Downloads the right model. You're chatting in 3 lines of Python.

from zerollm import Chat

bot = Chat("Qwen/Qwen3.5-4B")
print(bot.ask("What is the capital of France?"))

That's it. No config files, no model format headaches, no GPU drivers to manage.

Install

pip install zerollm-kit

Quick Start

Chat

from zerollm import Chat

bot = Chat("Qwen/Qwen3.5-4B")

# Ask
print(bot.ask("Explain quantum computing in one sentence"))

# Stream
for token in bot.stream("Write a haiku about code"):
    print(token, end="", flush=True)

# System prompt — give the bot a personality
bot = Chat("Qwen/Qwen3.5-4B", system_prompt="You are a pirate. Speak like one.")
print(bot.ask("What is the capital of France?"))

Multi-Turn Chat with Memory

from zerollm import Chat

bot = Chat("Qwen/Qwen3.5-4B", memory=True)

bot.ask("My name is Nilesh")
bot.ask("I work on AI projects")
print(bot.ask("What is my name and what do I do?"))
# Remembers: Nilesh, works on AI projects

# Memory auto-summarizes old turns when history gets long
# Persistent memory survives restarts (stored in SQLite)

Agent with Tools

from zerollm import Agent

# Pass instruction prompt to the agent
agent = Agent(
    "Qwen/Qwen3.5-4B",
    system_prompt="You are a helpful assistant. Always be concise.",
)

@agent.tool
def get_weather(city: str) -> str:
    """Get weather for a city."""
    return f"22°C and sunny in {city}"

print(agent.ask("What's the weather in Auckland?"))

Agent with ReAct Reasoning

# ReAct: Thought → Action → Observation → Answer
agent = Agent("Qwen/Qwen3.5-4B", react=True)

@agent.tool
def calculate(expression: str) -> str:
    return str(eval(expression))

agent.ask("What is 15% of 230?")  # thinks step-by-step before answering

Agent Guardrails

agent = Agent("Qwen/Qwen3.5-4B")

@agent.before_ask
def block_injection(prompt: str) -> str | None:
    if "ignore previous" in prompt.lower():
        return "Blocked: potential prompt injection."
    return None

@agent.after_ask
def clean_output(response: str) -> str:
    return response.replace("sensitive_data", "***")

Human-in-the-Loop

# Safe tools run automatically
@agent.tool
def search(query: str) -> str:
    return f"Results for: {query}"

# Dangerous tools ask for confirmation first
@agent.tool(confirm=True)
def delete_file(path: str) -> str:
    """Prompts: 'Confirm: Call delete_file({"path": "..."})? [y/N]'"""
    os.remove(path)
    return f"Deleted {path}"

Sub-Agents with Shared Context

from zerollm import Agent, SharedContext

ctx = SharedContext()

# Each sub-agent gets its own instruction prompt
researcher = Agent(
    "Qwen/Qwen3.5-4B",
    name="researcher",
    context=ctx,
    system_prompt="You are a research assistant. Find accurate information.",
)

writer = Agent(
    "Qwen/Qwen3.5-4B",
    name="writer",
    context=ctx,
    system_prompt="You are a skilled writer. Write clear, engaging content.",
)

@researcher.tool
def search(query: str) -> str:
    return f"Results for: {query}"

main = Agent(
    "Qwen/Qwen3.5-4B",
    context=ctx,
    system_prompt="You are a project manager. Delegate research and writing tasks.",
)
main.add_agent("researcher", researcher, "Research any topic")
main.add_agent("writer", writer, "Write content")

# Multi-turn — agent remembers previous conversation
main.ask("Research AI trends and write a summary")
main.ask("Now make it shorter")  # remembers the previous output

Serve as API

from zerollm import Server

Server("Qwen/Qwen3.5-4B", port=8080).serve()

OpenAI-compatible. Works with any client that speaks the OpenAI API.

Fine-Tune

from zerollm import FineTuner

tuner = FineTuner("Qwen/Qwen3.5-4B")
tuner.train("my_data.csv", epochs=3)
tuner.save("my-bot")

Then use your fine-tuned model:

from zerollm import Chat, Server

Chat("my-bot").ask("Hello!")        # chat with it
Server("my-bot", port=8080).serve() # or serve it

RAG

from zerollm import RAG

rag = RAG("Qwen/Qwen3.5-4B")
rag.add("docs.pdf")
print(rag.ask("What is the refund policy?"))

With cross-encoder reranking for better results:

rag = RAG("Qwen/Qwen3.5-4B", rerank=True)

Conversation-aware — follow-up questions just work:

rag.chat("What is the refund policy?")
rag.chat("How long do I have?")  # auto-rewrites using chat history

Connect RAG to an Agent:

agent = Agent("Qwen/Qwen3.5-4B")
agent.add_rag(rag, "Search company documents")
agent.ask("What does our policy say about returns?")

CLI

zerollm chat Qwen/Qwen3.5-4B    # interactive chat
zerollm serve Qwen/Qwen3.5-4B   # start API server
zerollm list                     # show downloaded models
zerollm doctor                   # diagnose setup
zerollm download Qwen/Qwen3.5-4B  # pre-download a model

Supported Hardware

Platform	Acceleration	Auto-detected
Any CPU	PyTorch	Yes
NVIDIA GPU	CUDA	Yes
Apple Silicon	MPS	Yes
AMD GPU	ROCm	Yes

Models

Works with any model from HuggingFace. Just pass the HF repo name:

Chat("Qwen/Qwen3.5-4B")                            # any HF model
Chat("microsoft/Phi-3-mini-4k-instruct")            # another model
Chat("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")  # reasoning model
Chat("/path/to/local-model/")                        # local model directory
Chat("my-finetuned-bot")                             # fine-tuned model

Run zerollm list to see downloaded models, or zerollm doctor to check your setup.

Architecture

ZeroLLM Architecture

Note

ZeroLLM is in early alpha. Things will break, APIs may change, and not every HuggingFace model will work perfectly. That's expected — we're iterating fast.

HuggingFace login: ZeroLLM downloads models from HuggingFace Hub. Public models work without login, but you may see rate limit warnings. For faster downloads, log in once:
pip install huggingface_hub
huggingface-cli login
Or set a token: export HF_TOKEN="hf_..." (get one here)

Feedback welcome. If you hit an issue or have ideas, open an issue. Your feedback shapes what this becomes.

Star History

License

MIT

Core Contributor

Nilesh Verma

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.8

Mar 22, 2026

0.1.7

Mar 22, 2026

0.1.6

Mar 22, 2026

0.1.5

Mar 22, 2026

0.1.4

Mar 22, 2026

0.1.3

Mar 22, 2026

0.1.2

Mar 22, 2026

0.1.1

Mar 22, 2026

0.1.0

Mar 22, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

zerollm_kit-0.1.8.tar.gz (1.7 MB view details)

Uploaded Mar 22, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

zerollm_kit-0.1.8-py3-none-any.whl (40.8 kB view details)

Uploaded Mar 22, 2026 Python 3

File details

Details for the file zerollm_kit-0.1.8.tar.gz.

File metadata

Download URL: zerollm_kit-0.1.8.tar.gz
Upload date: Mar 22, 2026
Size: 1.7 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.26 {"installer":{"name":"uv","version":"0.9.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for zerollm_kit-0.1.8.tar.gz
Algorithm	Hash digest
SHA256	`0df2a000efc83be122cee56bfc5189173c955ea0920f199ebf927a7241e40940`
MD5	`70a69ff61531d0ba4b8a1b0787fc60c7`
BLAKE2b-256	`948761d5c3bd8566736c5b7d6699bf273ff8feb2f9276ff5c1ad6b75b66b5913`

See more details on using hashes here.

File details

Details for the file zerollm_kit-0.1.8-py3-none-any.whl.

File metadata

Download URL: zerollm_kit-0.1.8-py3-none-any.whl
Upload date: Mar 22, 2026
Size: 40.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.26 {"installer":{"name":"uv","version":"0.9.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for zerollm_kit-0.1.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`aab4ebfa82b78c0e4237903753fd950c0899ff34b044e13961632970c9f5aebf`
MD5	`662ad56f68478a34e05864b907b48c3f`
BLAKE2b-256	`ff81537948279fc387a705c333867cd3ada51ef36b07d348d9f86326e18e4141`

See more details on using hashes here.

zerollm-kit 0.1.8

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

What is ZeroLLM?

Install

Quick Start

Chat

Multi-Turn Chat with Memory

Agent with Tools

Agent with ReAct Reasoning

Agent Guardrails

Human-in-the-Loop

Sub-Agents with Shared Context

Serve as API

Fine-Tune

RAG

CLI

Supported Hardware

Models

Architecture

Note

Star History

License

Core Contributor

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes