Multi-model orchestration for LangChain chat models: primary/secondary failover (resilience) and tier-split gather/compose (cost/latency), tool-calling preserved across both.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

vinayvobbili

These details have not been verified by PyPI

Project description

langchain-failover

Tiny, dependency-light multi-model orchestration for LangChain chat models — two strategies for running more than one model behind one interface:

Failover (FailoverChatModel) — for resilience. Serve from a primary, transparently fall back to a secondary on connection errors, switch back the moment the primary recovers. Tool-calling keeps working across the failover.
Tier-split (TieredChatAgent) — for cost/latency. Run the tool-gathering loop on a cheap/local model, then compose the final answer on a frontier model. The long generation moves off the contended box.

They compose: either tier of a TieredChatAgent can itself be a FailoverChatModel. Depends only on langchain-core.

Background: SOC-in-a-Box: One LLM, Eight Hats — the production AI SOC this was extracted from, where it fails a local LLM over to a backup mid-incident and offloads final-answer synthesis to a frontier model.

Failover — for resilience

from langchain_openai import ChatOpenAI
from langchain_failover import FailoverChatModel

primary = ChatOpenAI(base_url="http://gpu-box:8001/v1", api_key="x", model="local")
backup  = ChatOpenAI(base_url="http://cpu-box:8002/v1", api_key="x", model="local")

llm = FailoverChatModel(primary=primary, secondary=backup)

llm.invoke("Summarise this incident…")   # served by primary
# …primary host dies…
llm.invoke("And the next one?")           # transparently served by backup
# …primary comes back…
llm.invoke("One more")                     # back on primary, logged as recovered

Tier-split — for cost/latency

A tool-calling agent spends almost all of its tokens and wall-clock on the loop (decide a call, read the result, decide the next) — cheap reasoning. Writing the final answer is the part that wants a stronger model. They don't have to be the same model. TieredChatAgent runs the gathering loop on a cheap/local gatherer and composes the answer on a frontier composer:

from langchain_failover import TieredChatAgent

agent = TieredChatAgent(
    gatherer=local_llm,      # cheap/local — drives the tool loop (tools are bound for you)
    composer=frontier_llm,   # frontier — writes the final answer from gathered data
    tools=[search, lookup_host, get_timeline],
)

agent.invoke("What changed in the incident overnight?").content

The gatherer is told to gather, then stop — it doesn't write the prose answer. A structural guard (is_premature_marker) catches the model trying to answer before calling any tool and nudges it to gather first, so the composer never writes an answer from zero data. On a contended local GPU this routinely turns a multi-minute final turn into a couple of seconds, because the long generation moves off the busy box. Running your own loop? synthesize_answer(composer, query, messages) is the compose step on its own.

Install

pip install langchain-failover            # core
pip install "langchain-failover[openai]"  # + langchain-openai for create_failover_llm

Why not `RunnableWithFallbacks` / `.with_fallbacks()`?

LangChain ships per-invocation fallbacks, and they're great for what they do. This package exists for the cases they don't cover well:

Stateful recovery. FailoverChatModel remembers which leg it's on and logs the transition both ways (active property tells you). .with_fallbacks() is stateless — every call re-tries the (possibly still-dead) primary first.
Tool-calling survives failover. bind_tools is overridden to bind on both legs and return another FailoverChatModel. With strict langchain-core (>=1.4, where BaseChatModel.bind_tools raises by default) naïve wrappers break at bind time; agents using this one keep working.
Connection-aware, not blanket. It only fails over on connection/network errors (walking the exception's __cause__/__context__ chain, so a socket error wrapped three layers deep still counts). A ValueError from a bad prompt propagates instead of being silently retried on a second endpoint.
Mid-stream safety. During stream(), it only fails over if the primary dies before the first token — so you never get duplicated, half-streamed output.

Local-model convenience

If you run local OpenAI-compatible servers (vLLM, mlx-lm, Ollama, LM Studio) and don't want to hardcode model names, create_failover_llm auto-discovers the served model id from each endpoint's /models:

from langchain_failover import create_failover_llm

llm = create_failover_llm(
    primary_url="http://localhost:8001/v1",
    secondary_url="http://localhost:8002/v1",
)

Bonus helper

extract_token_metrics(response.response_metadata) normalises token counts and timings across OpenAI-compatible and Ollama metadata shapes into a single {input_tokens, output_tokens, prompt_time, generation_time} dict.

License

MIT

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

vinayvobbili

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.2.0

Jun 20, 2026

0.1.1

May 30, 2026

0.1.0

May 30, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

langchain_failover-0.2.0.tar.gz (18.6 kB view details)

Uploaded Jun 20, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

langchain_failover-0.2.0-py3-none-any.whl (14.6 kB view details)

Uploaded Jun 20, 2026 Python 3

File details

Details for the file langchain_failover-0.2.0.tar.gz.

File metadata

Download URL: langchain_failover-0.2.0.tar.gz
Upload date: Jun 20, 2026
Size: 18.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for langchain_failover-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`620ae949203673de3213f87ead94c46e228a4a0025e3d43abd4a8e455028da8f`
MD5	`8021db21c059ac21eaa566cd8a15f018`
BLAKE2b-256	`efb75d86abccb15831aa988a1b4ea1510dbc9bc5cf00c1a1138c35cdb9bfe87d`

See more details on using hashes here.

Provenance

The following attestation bundles were made for langchain_failover-0.2.0.tar.gz:

Publisher: release.yml on vinayvobbili/langchain-failover

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: langchain_failover-0.2.0.tar.gz
- Subject digest: 620ae949203673de3213f87ead94c46e228a4a0025e3d43abd4a8e455028da8f
- Sigstore transparency entry: 1877678208
- Sigstore integration time: Jun 20, 2026
Source repository:
- Permalink: vinayvobbili/langchain-failover@547cf040a5536c5ebe0e3312a457c707f554a403
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/vinayvobbili
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@547cf040a5536c5ebe0e3312a457c707f554a403
- Trigger Event: push

File details

Details for the file langchain_failover-0.2.0-py3-none-any.whl.

File metadata

Download URL: langchain_failover-0.2.0-py3-none-any.whl
Upload date: Jun 20, 2026
Size: 14.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for langchain_failover-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8c036da26fe81faff352fcaa2a9a6bffe7071a1a691cbadce672b76471e8ab88`
MD5	`5f9af476308a6ad9ff8cff8186eec34c`
BLAKE2b-256	`6df1f5119f62e29f0ab225144af5836ebdd226479e9e33c701a0c18f009eccf6`

See more details on using hashes here.

Provenance

The following attestation bundles were made for langchain_failover-0.2.0-py3-none-any.whl:

Publisher: release.yml on vinayvobbili/langchain-failover

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: langchain_failover-0.2.0-py3-none-any.whl
- Subject digest: 8c036da26fe81faff352fcaa2a9a6bffe7071a1a691cbadce672b76471e8ab88
- Sigstore transparency entry: 1877678502
- Sigstore integration time: Jun 20, 2026
Source repository:
- Permalink: vinayvobbili/langchain-failover@547cf040a5536c5ebe0e3312a457c707f554a403
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/vinayvobbili
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@547cf040a5536c5ebe0e3312a457c707f554a403
- Trigger Event: push

langchain-failover 0.2.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

langchain-failover

Failover — for resilience

Tier-split — for cost/latency

Install

Why not `RunnableWithFallbacks` / `.with_fallbacks()`?

Local-model convenience

Bonus helper

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

langchain-failover 0.2.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

langchain-failover

Failover — for resilience

Tier-split — for cost/latency

Install

Why not RunnableWithFallbacks / .with_fallbacks()?

Local-model convenience

Bonus helper

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

Why not `RunnableWithFallbacks` / `.with_fallbacks()`?