LangCore hybrid provider — combines deterministic rule/regex extraction with LLM fallback for cost savings

These details have not been verified by PyPI

Project description

LangCore Hybrid Provider

A provider plugin for LangCore that combines deterministic rule-based extraction (regex, callable functions) with LLM fallback. Saves 50–80% of LLM costs on well-structured documents.

Note: This is a third-party provider plugin for LangCore. For the main LangCore library, visit google/langcore.

Installation

Install from source:

git clone <repo-url>
cd langcore-hybrid
pip install -e .

For optional spaCy NER support:

pip install -e ".[spacy]"

Features

Rules first, LLM fallback — deterministic extraction for patterns that don't need an LLM
Regex rules — extract dates, amounts, reference numbers via named capture groups
Callable rules — plug in any Python function for custom extraction logic
Confidence thresholds — optionally fall back to LLM when rule confidence is low
Batch-aware async — only prompts that miss all rules are batched for LLM inference
Observability — built-in counters for rule hits vs LLM fallbacks
Thread-safe counters — rule-hit and LLM-fallback counters are protected by a lock for safe concurrent use
Text extractor hook — optional text_extractor callable in RuleConfig isolates document text from prompt instructions before rule evaluation
Output formatter hook — optional output_formatter callable in RuleConfig normalises rule outputs (e.g. wrap in JSON)
Zero overhead on hits — rule evaluation is pure Python, no network calls

Usage

Regex Rules for Contract Extraction

import langcore as lx
from langcore_hybrid import (
    HybridLanguageModel,
    RegexRule,
    RuleConfig,
)

# Create the fallback LLM provider
inner_config = lx.factory.ModelConfig(
    model_id="litellm/azure/gpt-4o",
    provider="LiteLLMLanguageModel",
)
inner_model = lx.factory.create_model(inner_config)

# Define rules for common patterns
rules = RuleConfig(rules=[
    RegexRule(
        r"Date:\s*(?P<date>\d{4}-\d{2}-\d{2})",
        description="ISO date extraction",
    ),
    RegexRule(
        r"Amount:\s*\$(?P<amount>[\d,.]+)",
        description="USD amount extraction",
    ),
    RegexRule(
        r"Ref(?:erence)?[:\s]+(?P<ref>[A-Z]+-\d+)",
        description="Reference number extraction",
    ),
])

# Create hybrid provider
hybrid_model = HybridLanguageModel(
    model_id="hybrid/gpt-4o",
    inner=inner_model,
    rule_config=rules,
)

# Use as normal
result = lx.extract(
    text_or_documents="Contract Ref: ABC-123, Date: 2026-01-15...",
    model=hybrid_model,
    prompt_description="Extract contract metadata.",
)

# Check cost savings
print(f"Rule hits: {hybrid_model.rule_hits}")
print(f"LLM fallbacks: {hybrid_model.llm_fallbacks}")

Callable Rules

import json
from langcore_hybrid import CallableRule, RuleConfig

def extract_email(prompt: str) -> str | None:
    """Extract email addresses deterministically."""
    import re
    emails = re.findall(r'\b[\w.+-]+@[\w-]+\.[\w.]+\b', prompt)
    if emails:
        return json.dumps({"emails": emails})
    return None  # Signal miss — fall back to LLM

rules = RuleConfig(rules=[
    CallableRule(
        func=extract_email,
        description="email extraction",
    ),
])

Custom Output Templates

import json
from langcore_hybrid import RegexRule

rule = RegexRule(
    r"Amount:\s*\$(?P<amount>[\d,.]+)",
    output_template=lambda groups: json.dumps({
        "amount": float(groups["amount"].replace(",", "")),
        "currency": "USD",
    }),
)

Confidence Thresholds

When fallback_on_low_confidence is enabled, rule results with confidence below min_confidence trigger LLM fallback instead:

from langcore_hybrid import RegexRule, RuleConfig

rules = RuleConfig(
    rules=[
        RegexRule(r"(?P<date>\d{1,2}/\d{1,2}/\d{2,4})", confidence=0.6),
        RegexRule(r"(?P<date>\d{4}-\d{2}-\d{2})", confidence=1.0),
    ],
    fallback_on_low_confidence=True,
    min_confidence=0.8,
)
# Ambiguous dates (MM/DD/YY) fall back to LLM; ISO dates are trusted

Rule Evaluation Order

Rules are evaluated in list order. The first rule that hits and meets the confidence threshold wins — later rules are not evaluated.

When fallback_on_low_confidence=True, a rule that hits below min_confidence is skipped and evaluation continues to the next rule. If no subsequent rule produces a confident hit, the prompt falls through to the LLM.

This means rule ordering matters:

Place high-confidence, specific rules first (e.g. ISO dates).
Follow with lower-confidence, broader rules (e.g. ambiguous date formats).
The LLM acts as the final catch-all.

rules = RuleConfig(
    rules=[
        RegexRule(r"(?P<date>\d{4}-\d{2}-\d{2})", confidence=1.0),   # ① specific
        RegexRule(r"(?P<date>\d{1,2}/\d{1,2}/\d{2,4})", confidence=0.6),  # ② broad
    ],
    fallback_on_low_confidence=True,
    min_confidence=0.8,
)
# "2026-01-15" → rule ① (confidence 1.0 ≥ 0.8) → instant result
# "1/15/26"   → rule ① miss, rule ② hit (0.6 < 0.8) → LLM fallback

Async Usage

The async path is batch-optimised — only prompts that miss all rules are sent to the LLM in a single batch:

results = await hybrid_model.async_infer([
    "Date: 2026-01-15",       # Rule hit — instant
    "Complex clause text...",  # Rule miss — LLM
    "Ref: ABC-123",           # Rule hit — instant
])
# Only 1 LLM call for the 1 miss, not 3

Observability

Counters are per-instance and cumulative — they track totals since the provider was created (or last reset). In long-running applications where a single HybridLanguageModel handles unrelated jobs, call reset_counters() between jobs or use get_counters() to take point-in-time snapshots for differential reporting.

print(f"Rule hits: {hybrid_model.rule_hits}")
print(f"LLM fallbacks: {hybrid_model.llm_fallbacks}")

# Atomic snapshot (thread-safe dict copy)
snapshot = hybrid_model.get_counters()
# {"rule_hits": 42, "llm_fallbacks": 7}

# Reset counters (also thread-safe)
hybrid_model.reset_counters()

Text Extractor

When prompts contain instructions followed by document text, rules may match instruction fragments by mistake. Use text_extractor to isolate the document:

def extract_after_marker(prompt: str) -> str:
    """Return text after '---DOCUMENT---' marker."""
    marker = "---DOCUMENT---"
    idx = prompt.find(marker)
    return prompt[idx + len(marker) :].strip() if idx >= 0 else prompt

rules = RuleConfig(
    rules=[RegexRule(r"Date:\s*(?P<date>\d{4}-\d{2}-\d{2})")],
    text_extractor=extract_after_marker,
)

Output Formatter

Normalise rule outputs for downstream consumers:

import json

rules = RuleConfig(
    rules=[RegexRule(r"Ref:\s*(?P<ref>[A-Z]+-\d+)")],
    output_formatter=lambda raw: json.dumps({"result": json.loads(raw)}),
)

Custom Rules

Implement the ExtractionRule interface:

from langcore_hybrid.rules import ExtractionRule, RuleResult

class SpacyNERRule(ExtractionRule):
    def __init__(self, nlp_model: str = "en_core_web_sm") -> None:
        import spacy
        self._nlp = spacy.load(nlp_model)

    def evaluate(self, prompt: str) -> RuleResult:
        doc = self._nlp(prompt)
        entities = [
            {"text": ent.text, "label": ent.label_}
            for ent in doc.ents
        ]
        if entities:
            import json
            return RuleResult(
                hit=True,
                output=json.dumps(entities),
                confidence=0.85,
            )
        return RuleResult(hit=False)

Development

pip install -e ".[dev]"
pytest

License

Apache 2.0

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

1.1.1

Feb 23, 2026

1.1.0

Feb 23, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

langcore_hybrid_llm_regex-1.1.1.tar.gz (19.1 kB view details)

Uploaded Feb 23, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

langcore_hybrid_llm_regex-1.1.1-py3-none-any.whl (14.6 kB view details)

Uploaded Feb 23, 2026 Python 3

File details

Details for the file langcore_hybrid_llm_regex-1.1.1.tar.gz.

File metadata

Download URL: langcore_hybrid_llm_regex-1.1.1.tar.gz
Upload date: Feb 23, 2026
Size: 19.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for langcore_hybrid_llm_regex-1.1.1.tar.gz
Algorithm	Hash digest
SHA256	`6302c6a040eabc7027afca046785b8fb1b8f15fad58c9a5ca7df5aa4573eff67`
MD5	`cc29a582b75e0d5c3ed0cbedf58cf765`
BLAKE2b-256	`b78b9e56eda4d05239d0970698e7df15eff4a2fa797dedc0d4e25ff8537fa26a`

See more details on using hashes here.

Provenance

The following attestation bundles were made for langcore_hybrid_llm_regex-1.1.1.tar.gz:

Publisher: release.yml on IgnatG/langcore-hybrid-llm-regex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: langcore_hybrid_llm_regex-1.1.1.tar.gz
- Subject digest: 6302c6a040eabc7027afca046785b8fb1b8f15fad58c9a5ca7df5aa4573eff67
- Sigstore transparency entry: 983335255
- Sigstore integration time: Feb 23, 2026
Source repository:
- Permalink: IgnatG/langcore-hybrid-llm-regex@59ba5cafd2d17492f18e57b9b758133215ab1163
- Branch / Tag: refs/heads/main
- Owner: https://github.com/IgnatG
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@59ba5cafd2d17492f18e57b9b758133215ab1163
- Trigger Event: push

File details

Details for the file langcore_hybrid_llm_regex-1.1.1-py3-none-any.whl.

File metadata

Download URL: langcore_hybrid_llm_regex-1.1.1-py3-none-any.whl
Upload date: Feb 23, 2026
Size: 14.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for langcore_hybrid_llm_regex-1.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2d176bf4436e1f51b5a23e436b00ba7c00f0381a8155f08dc2642af718fa28fa`
MD5	`72ad065827401a42ad422e28a5bcfcbf`
BLAKE2b-256	`2b359271f6cf05def20c0d85b1b75116636cf0edac89e01ff15e85024bbafe3f`

See more details on using hashes here.

Provenance

The following attestation bundles were made for langcore_hybrid_llm_regex-1.1.1-py3-none-any.whl:

Publisher: release.yml on IgnatG/langcore-hybrid-llm-regex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: langcore_hybrid_llm_regex-1.1.1-py3-none-any.whl
- Subject digest: 2d176bf4436e1f51b5a23e436b00ba7c00f0381a8155f08dc2642af718fa28fa
- Sigstore transparency entry: 983335269
- Sigstore integration time: Feb 23, 2026
Source repository:
- Permalink: IgnatG/langcore-hybrid-llm-regex@59ba5cafd2d17492f18e57b9b758133215ab1163
- Branch / Tag: refs/heads/main
- Owner: https://github.com/IgnatG
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@59ba5cafd2d17492f18e57b9b758133215ab1163
- Trigger Event: push

langcore-hybrid-llm-regex 1.1.1

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

LangCore Hybrid Provider

Installation

Features

Usage

Regex Rules for Contract Extraction

Callable Rules

Custom Output Templates

Confidence Thresholds

Rule Evaluation Order

Async Usage

Observability

Text Extractor

Output Formatter

Custom Rules

Development

License

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance