LangChain tools for ZeroGPU: A compute-efficient inference provider for apps and agents — purpose-built small and nano language models on an edge network that run the repeatable tasks frontier models shouldn’t, ~10x faster and 50%+ cheaper. Auto-scaling, with zero GPU infrastructure. Plug in and you’re live.
langchain-zerogpu
LangChain tools for ZeroGPU.
ZeroGPU is a compute-efficient inference provider for apps and agents. We run purpose-built small and nano language models across an edge network for the high-volume tasks you run constantly — classification, extraction, moderation, routing, summarization — at ~10x lower latency and 50%+ lower cost than frontier-model workflows. Auto-scaling, with zero GPU infrastructure to manage. Plug in and you're live.
This package exposes those models as first-class LangChain
BaseTool subclasses, so
any LangChain agent — including create_agent and LangGraph graphs — can offload
these repeatable NLP tasks (classification, summarization, entity / JSON
extraction, PII redaction, and short chat) to ZeroGPU instead of spending
frontier-model tokens.
All calls go through the official zerogpu-api
Python SDK.
Install
pip install langchain-zerogpu
Authenticate
Every request needs a ZeroGPU API key (starts with zgpu-api-) and a
project id. Provide them via environment variables:
export ZEROGPU_API_KEY="zgpu-api-..."
export ZEROGPU_PROJECT_ID="your-project-id"
…or pass them directly to any tool or the toolkit:
from langchain_zerogpu import ZeroGPUSummarizeTool
tool = ZeroGPUSummarizeTool(api_key="zgpu-api-...", project_id="your-project-id")
The API key is stored as a pydantic.SecretStr and is never logged.
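Because both values are required, it can help to check the environment before constructing any tool. The following is a minimal sketch using only the standard library; `check_zerogpu_env` is a hypothetical helper, not part of langchain-zerogpu, and the only facts it relies on are the two variable names and the `zgpu-api-` prefix given above.

```python
import os

API_KEY_PREFIX = "zgpu-api-"  # key prefix as stated in this README


def check_zerogpu_env(env=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration, not part of the package.
    """
    env = os.environ if env is None else env
    problems = []
    api_key = env.get("ZEROGPU_API_KEY", "")
    if not api_key:
        problems.append("ZEROGPU_API_KEY is not set")
    elif not api_key.startswith(API_KEY_PREFIX):
        problems.append(f"ZEROGPU_API_KEY does not start with {API_KEY_PREFIX!r}")
    if not env.get("ZEROGPU_PROJECT_ID", ""):
        problems.append("ZEROGPU_PROJECT_ID is not set")
    return problems
```

Calling `check_zerogpu_env()` with no argument inspects the real process environment; passing a dict makes the check easy to unit test.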
The tools
| Tool class | ZeroGPU model | Purpose |
|---|---|---|
| ZeroGPUChatTool | LFM2.5-1.2B-Instruct | Short single-turn chat reply |
| ZeroGPUChatThinkingTool | LFM2.5-1.2B-Thinking | Chat with a visible reasoning trace |
| ZeroGPUSummarizeTool | llama-3.1-8b-instruct-fast | Condense a passage |
| ZeroGPUClassifyIABTool | zlm-v1-iab-classify-edge | IAB taxonomy classification |
| ZeroGPUClassifyIABEnrichedTool | zlm-v1-iab-classify-edge-enriched | IAB + topics / keywords / intent |
| ZeroGPUClassifyZeroShotTool | deberta-v3-small | Zero-shot vs. custom labels |
| ZeroGPUClassifyStructuredTool | gliner2-base-v1 | Multi-axis schema classification |
| ZeroGPUExtractEntitiesTool | gliner2-base-v1 | Custom-label NER |
| ZeroGPUExtractPIITool | gliner-multi-pii-v1 | Extract PII entities (JSON) |
| ZeroGPURedactPIITool | gliner-multi-pii-v1 | Mask PII inline with [LABEL] |
| ZeroGPUExtractJSONTool | gliner2-base-v1 | Schema-driven JSON extraction |
Quick start
from langchain_zerogpu import ZeroGPUClassifyZeroShotTool
tool = ZeroGPUClassifyZeroShotTool() # reads creds from the environment
print(tool.invoke({
    "text": "The new GPU smashes every benchmark we threw at it.",
    "labels": ["tech", "politics", "sports"],
}))
Tools work asynchronously too:
result = await tool.ainvoke({"text": "...", "labels": ["a", "b"]})
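Since each tool exposes `ainvoke`, many inputs can be processed concurrently. The sketch below shows one way to fan out calls with `asyncio.gather` while capping the number in flight. `StubZeroShotTool` is a hypothetical stand-in so the example runs without credentials, and its return value is invented for illustration; the real tool's output format is not specified here. `classify_many` and `max_concurrency` are likewise illustrative names.

```python
import asyncio


class StubZeroShotTool:
    """Hypothetical stand-in exposing the same async `ainvoke` entry point."""

    async def ainvoke(self, payload):
        await asyncio.sleep(0)  # yield control, as a real network call would
        # Invented output shape: always picks the first label.
        return {"text": payload["text"], "label": payload["labels"][0]}


async def classify_many(tool, texts, labels, max_concurrency=4):
    """Classify texts concurrently, returning results in input order."""
    limit = asyncio.Semaphore(max_concurrency)

    async def classify_one(text):
        async with limit:  # at most max_concurrency calls in flight
            return await tool.ainvoke({"text": text, "labels": labels})

    return await asyncio.gather(*(classify_one(t) for t in texts))


results = asyncio.run(
    classify_many(StubZeroShotTool(), ["a", "b", "c"], ["tech", "sports"])
)
```

Swapping the stub for `ZeroGPUClassifyZeroShotTool()` should be the only change needed, assuming the real tool accepts the same `text` and `labels` payload shown in the quick start.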
Bind the tools to an agent
Use the toolkit to get all eleven tools — wired to a single shared client — and bind them to an agent:
from langchain.agents import create_agent
from langchain_zerogpu import ZeroGPUToolkit
toolkit = ZeroGPUToolkit() # reads ZEROGPU_API_KEY / ZEROGPU_PROJECT_ID
tools = toolkit.get_tools()
agent = create_agent("anthropic:claude-sonnet-4-6", tools=tools)
agent.invoke({
    "messages": [
        {"role": "user", "content": "Redact the PII in: 'Call Jane at 555-0100.'"}
    ]
})
Or bind a single tool to a chat model directly:
from langchain.chat_models import init_chat_model
from langchain_zerogpu import ZeroGPUExtractPIITool
llm = init_chat_model("anthropic:claude-sonnet-4-6")
llm_with_tools = llm.bind_tools([ZeroGPUExtractPIITool()])
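Binding a tool only advertises its schema to the model. Without an agent loop, the caller still has to execute whatever tool calls the model returns. The sketch below illustrates that dispatch step under stated assumptions: the call dicts follow the `name` / `args` / `id` shape of LangChain's `AIMessage.tool_calls`, while `StubPIITool`, its `name`, and its output are hypothetical stand-ins so the example runs offline.

```python
class StubPIITool:
    """Hypothetical stand-in; the real tool's name and output may differ."""

    name = "stub_extract_pii"

    def invoke(self, args):
        return f"received {len(args['text'])} characters"


def run_tool_calls(tool_calls, tools):
    """Execute each requested call and return (call id, output) pairs."""
    by_name = {tool.name: tool for tool in tools}
    outputs = []
    for call in tool_calls:
        tool = by_name.get(call["name"])
        if tool is None:
            # Report unknown names instead of raising, so other calls still run.
            outputs.append((call["id"], f"unknown tool: {call['name']}"))
            continue
        outputs.append((call["id"], tool.invoke(call["args"])))
    return outputs


# Shaped like the entries of AIMessage.tool_calls in LangChain.
calls = [{"name": "stub_extract_pii", "args": {"text": "Call Jane"}, "id": "call_1"}]
outputs = run_tool_calls(calls, [StubPIITool()])
```

In a real exchange, each output would typically be sent back to the model as a tool message carrying the matching call id.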
Errors
Failures surface as clear, typed exceptions instead of raw stack traces:
- ZeroGPUAuthError: missing / malformed credentials, 401, or 403.
- ZeroGPUError: rate limits (429), server errors (5xx), and network failures.
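The split between the two exception types suggests a simple policy: retry the transient failures, fail fast on credential problems. The following is a minimal sketch of that policy, not the package's own behavior. The two exception classes are local stand-ins so the example is self-contained (where the real classes are imported from, and whether one subclasses the other, is not stated here), and `call_with_retry` is a hypothetical helper.

```python
import time


class ZeroGPUError(Exception):
    """Local stand-in for the package's retryable error."""


class ZeroGPUAuthError(Exception):
    """Local stand-in for the package's credential error."""


def call_with_retry(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); retry ZeroGPUError with exponential backoff.

    Auth errors are re-raised immediately, since retrying cannot fix
    bad credentials. The last ZeroGPUError is re-raised once the
    attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except ZeroGPUAuthError:
            raise
        except ZeroGPUError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
```

A call site might look like `call_with_retry(lambda: tool.invoke({"text": "..."}))`. Catching the auth error first keeps the policy correct even if the real `ZeroGPUAuthError` turns out to be a subclass of `ZeroGPUError`.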
Development
make install # uv sync --all-groups
make lint # ruff check + format --check
make mypy # mypy (disallow_untyped_defs)
make test # unit tests, sockets disabled
make integration_test # integration tests (needs real ZeroGPU creds)
License
MIT — see LICENSE.