
Langchain LLM Config

Yet another redundant Langchain abstraction: a comprehensive Python package for managing and using multiple LLM providers (OpenAI, VLLM, Gemini, Infinity) with a unified interface for both chat assistants and embeddings.


Features

  • 🤖 Multiple Chat Providers: Support for OpenAI, VLLM, and Gemini
  • 🔗 Multiple Embedding Providers: Support for OpenAI, VLLM, and Infinity
  • ⚙️ Unified Configuration: Single YAML configuration file for all providers
  • 🚀 Easy Setup: CLI tool for quick configuration initialization
  • 🔄 Easy Context Concatenation: Straightforward injection of context into chat prompts
  • 🔒 Environment Variables: Secure API key management
  • 📦 Self-Contained: No need to import specific paths
  • ⚡ Async Support: Full async/await support for all operations
  • 🌊 Streaming Chat: Real-time streaming responses for interactive experiences
  • 🛠️ Enhanced CLI: Environment setup and validation commands
  • 🪶 Lightweight Core: Minimal dependencies with optional provider-specific packages
  • 🎯 Flexible Installation: Install only the providers you need

What's New in V2 Configuration

The V2 configuration format introduces a model-centric approach that provides:

Key Benefits

  • Simpler API: Reference models by name instead of provider/type hierarchy
  • More Flexible: Define multiple models per provider with different configurations
  • Better Defaults: Set default models by name, not provider
  • VLM Support: Ready for vision-language models with dedicated vlm type
  • Clearer Structure: Each model is a first-class entity with its own config
  • Easy Migration: One command to migrate from V1 to V2

Quick Comparison

V1 (Old):

# Provider-centric: specify provider name
assistant = create_assistant(provider="openai", ...)

V2 (New):

# Model-centric: specify model name
assistant = create_assistant(model="gpt-4-turbo", ...)

See the Configuration Reference section for detailed examples.

Installation

Basic Installation

The package has a lightweight core with optional dependencies for specific providers.

Core installation (minimal dependencies):

# Using uv (recommended)
uv add langchain-llm-config

# Using pip
pip install langchain-llm-config

Provider-Specific Installation

With OpenAI support:

uv add "langchain-llm-config[openai]"
pip install "langchain-llm-config[openai]"

With VLLM support:

uv add "langchain-llm-config[vllm]"
pip install "langchain-llm-config[vllm]"

With Gemini support:

uv add "langchain-llm-config[gemini]"
pip install "langchain-llm-config[gemini]"

With Infinity embeddings support:

uv add "langchain-llm-config[infinity]"
pip install "langchain-llm-config[infinity]"

With local models support (sentence-transformers):

uv add "langchain-llm-config[local-models]"
pip install "langchain-llm-config[local-models]"

Convenience Groups

All assistant providers (OpenAI, VLLM, Gemini):

uv add "langchain-llm-config[assistants]"
pip install "langchain-llm-config[assistants]"

All embedding providers (Infinity, local models):

uv add "langchain-llm-config[embeddings]"
pip install "langchain-llm-config[embeddings]"

Everything (all providers and features):

uv add "langchain-llm-config[all]"
pip install "langchain-llm-config[all]"

Development Installation

git clone https://github.com/liux2/Langchain-LLM-Config.git
cd langchain-llm-config
uv sync --dev
uv run pip install -e .

Dependency Optimization

This package is designed with a lightweight core approach:

Core Dependencies (Always Installed)

  • langchain-core - Core abstractions only (much lighter than full langchain)
  • langchain-openai - OpenAI and VLLM provider support
  • pydantic - Data validation and parsing
  • pyyaml - Configuration file parsing
  • python-dotenv - Environment variable management
  • openai - OpenAI client library

Optional Dependencies

  • Gemini: langchain-google-genai - Only installed with [gemini] extra
  • Infinity: langchain-community - Only installed with [infinity] extra
  • Local Models: sentence-transformers - Only installed with [local-models] extra

Benefits

  • Smaller installation size - No heavy ML dependencies unless needed
  • Faster installation - Skip unnecessary packages
  • Cleaner environments - Only install what you use
  • Better compatibility - Avoid conflicts from unused dependencies

Quick Start

1. Initialize Configuration

# Initialize config in current directory (v2 format by default)
llm-config init

# Or specify a custom location
llm-config init ~/.config/api.yaml

# Use legacy v1 format (deprecated)
llm-config init --format v1

This creates an api.yaml file with all supported providers configured using the new v2 model-centric format.

2. Set Up Environment Variables

# Set up environment variables and create .env file
llm-config setup-env

# Or with custom config path
llm-config setup-env --config-path ~/.config/.env

This creates a .env file with placeholders for your API keys.

3. Configure Your Providers

Edit the generated api.yaml file with your API keys and settings.

V2 Configuration Format (Recommended)

The new model-centric configuration format allows you to define models independently:

# Default models to use
default:
  chat_provider: gpt-3.5-turbo
  embedding_provider: text-embedding-ada-002

# Model definitions
models:
  gpt-3.5-turbo:
    model_type: chat
    provider_type: openai
    model_config:
      api_base: https://api.openai.com/v1
      api_key: ${OPENAI_API_KEY}
      model_name: gpt-3.5-turbo
      temperature: 0.7
      max_tokens: 8192

  text-embedding-ada-002:
    model_type: embedding
    provider_type: openai
    model_config:
      api_base: https://api.openai.com/v1
      api_key: ${OPENAI_API_KEY}
      model_name: text-embedding-ada-002

  llama-2-local:
    model_type: chat
    provider_type: vllm
    model_config:
      api_base: http://localhost:8000/v1
      api_key: ${OPENAI_API_KEY}
      model_name: meta-llama/Llama-2-7b-chat-hf
      temperature: 0.6
      extra_body:
        return_reasoning: false  # Set to true for reasoning output
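
With a configuration like this in place, any model defined under models: can be referenced by name, assuming the environment variables it uses are set:

from langchain_llm_config import create_assistant

# Reference the locally served model defined above by its config name
local_assistant = create_assistant(
    model="llama-2-local",
    auto_apply_parser=False,  # plain-text responses, no structured parsing
    system_prompt="You are a helpful assistant.",
)
print(local_assistant.ask("Hello!"))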

V1 Configuration Format (Legacy, Auto-Converted)

The old provider-centric format is still supported but deprecated:

llm:
  openai:
    chat:
      api_base: "https://api.openai.com/v1"
      api_key: "${OPENAI_API_KEY}"
      model_name: "gpt-3.5-turbo"
  default:
    chat_provider: "openai"

Note: V1 configs are automatically converted to V2 at runtime with a deprecation warning.

4. Set Environment Variables

Edit the .env file with your actual API keys:

OPENAI_API_KEY=your-openai-api-key
GEMINI_API_KEY=your-gemini-api-key

5. Use in Your Code

Basic Usage (Synchronous)

from langchain_llm_config import create_assistant, create_embedding_provider
from pydantic import BaseModel, Field
from typing import List


# Define your response model
class ArticleAnalysis(BaseModel):
    summary: str = Field(..., description="Article summary")
    keywords: List[str] = Field(..., description="Key topics")
    sentiment: str = Field(..., description="Overall sentiment")


# V2 API: Use model names directly (recommended)
assistant = create_assistant(
    model="gpt-3.5-turbo",  # Reference model by name from config
    response_model=ArticleAnalysis,
    system_prompt="You are a helpful article analyzer.",
)

# Use the assistant - returns dict with parsed data
# Note: Structured output returns a dict, not a Pydantic model instance
result = assistant.ask("Analyze this article: ...")
print(result["summary"])  # Access as dict
print(result["keywords"])
print(result["sentiment"])

# Raw text mode (no structured output)
assistant_raw = create_assistant(
    model="gpt-3.5-turbo",
    auto_apply_parser=False,  # Disable parsing
    system_prompt="You are a helpful assistant.",
)
result = assistant_raw.ask("Tell me a joke")
print(result)  # Returns string

# Create an embedding provider
embedding_provider = create_embedding_provider(
    model="text-embedding-ada-002"  # Reference model by name
)

# Get embeddings (synchronous)
texts = ["Hello world", "How are you?"]
embeddings = embedding_provider.embed_texts(texts)

# V1 API: Still supported (deprecated)
assistant_v1 = create_assistant(
    provider="openai",  # Old way - provider name
    response_model=ArticleAnalysis,
)

Advanced Usage (Asynchronous)

import asyncio


async def main() -> None:
    # Reuses `assistant`, `embedding_provider`, and `texts` from the example above
    result = await assistant.ask_async("Analyze this article: ...")
    print(result["summary"])

    # Get embeddings (asynchronous)
    embeddings = await embedding_provider.embed_texts_async(texts)
    print(f"Generated {len(embeddings)} embeddings")


asyncio.run(main())

Streaming Chat

import asyncio
from langchain_llm_config import create_assistant


async def main():
    """Main async function to run the streaming chat example"""
    # Create assistant with auto_apply_parser=False for streaming
    assistant = create_assistant(
        provider="openai",
        system_prompt="You are a helpful assistant.",
        auto_apply_parser=False,  # Required for streaming
    )

    print("🤖 Starting streaming chat...")
    print("Response: ", end="", flush=True)

    try:
        # Simple streaming - just get text chunks
        async for chunk in assistant.chat_async("Tell me a story"):
            print(chunk, end="", flush=True)

        print("\n")

        # Advanced streaming - get chunks with metadata
        async for chunk in assistant.chat_stream("Explain quantum computing"):
            if chunk["type"] == "stream":
                print(chunk["content"], end="", flush=True)
            elif chunk["type"] == "final":
                print(f"\n\nProcessing time: {chunk['processing_time']:.2f}s")
                print(f"Model used: {chunk['model_used']}")
    except Exception as e:
        print(f"\n❌ Error occurred: {e}")


if __name__ == "__main__":
    # Run the async function
    asyncio.run(main())

Kunlun API (Bearer Token Authentication)

Kunlun APIs are OpenAI-compatible but use bearer token authentication instead of API keys.

from langchain_llm_config import create_assistant, create_embedding_provider
from pydantic import BaseModel, Field

# Define your response model
class Analysis(BaseModel):
    summary: str = Field(description="Brief summary")
    key_points: list[str] = Field(description="Key points")

# Create a Kunlun assistant (thinking mode is enabled via the config below)
assistant = create_assistant(
    model="kunlun-qwen3-235b",
    response_model=Analysis,
    system_prompt="You are a helpful AI assistant."
)

# Use the assistant
result = assistant.ask("Analyze the impact of AI on society")
print(result["summary"])
print(result["key_points"])

# Create Kunlun embedding provider
embedding_provider = create_embedding_provider(model="kunlun-bge-m3")
embeddings = embedding_provider.embed_texts(["Hello world", "AI is amazing"])

Configuration:

models:
  kunlun-qwen3-235b:
    model_type: chat
    provider_type: kunlun
    model_config:
      api_base: ${KUNLUN_QWEN3_235B_API_BASE}  # Your Kunlun API endpoint
      bearer_token: ${KUNLUN_BEARER_TOKEN}
      model_name: Qwen3-235B-A22B
      temperature: 0.7
      max_tokens: 8000
      extra_body:
        chat_template_kwargs:
          enable_thinking: true  # Enable reasoning mode

  kunlun-bge-m3:
    model_type: embedding
    provider_type: kunlun
    model_config:
      api_base: ${KUNLUN_BGE_M3_API_BASE}  # Your Kunlun API endpoint
      bearer_token: ${KUNLUN_BEARER_TOKEN}
      model_name: embedding
      dimensions: 1024

Environment Variables:

export KUNLUN_BEARER_TOKEN="your_jwt_token_here"
export KUNLUN_QWEN3_235B_API_BASE="https://your-kunlun-endpoint/v1"
export KUNLUN_BGE_M3_API_BASE="https://your-kunlun-endpoint/v1"

Key Features:

  • 🔐 Bearer Token Authentication: Uses JWT tokens instead of API keys
  • 🧠 Thinking Mode: Enable reasoning with chat_template_kwargs.enable_thinking
  • 🔌 OpenAI-Compatible: Works with standard OpenAI API format
  • 🚀 Full Feature Support: Streaming, structured output, embeddings
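
For reference, bearer-token authentication on an OpenAI-compatible endpoint is just an Authorization: Bearer <token> header. If you ever need to call the same endpoint outside this package, a minimal sketch with the plain openai client works (the SDK already sends api_key as a bearer token; environment variable names follow the example above):

import os
from openai import OpenAI

# The openai SDK sends "Authorization: Bearer <api_key>", so a JWT can be
# passed where an API key is expected
client = OpenAI(
    base_url=os.environ["KUNLUN_QWEN3_235B_API_BASE"],
    api_key=os.environ["KUNLUN_BEARER_TOKEN"],
)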

Supported Providers

Chat Providers

| Provider | Models                 | Features                                       | Installation               |
|----------|------------------------|------------------------------------------------|----------------------------|
| OpenAI   | GPT-3.5, GPT-4, etc.   | Streaming, function calling, structured output | ✅ Core (always available) |
| VLLM     | Any HuggingFace model  | Local deployment, high performance             | ✅ Core (always available) |
| Gemini   | Gemini Pro, etc.       | Google's latest models                         | 📦 [gemini] extra required |
| Kunlun   | Qwen3, etc.            | Bearer token auth, thinking mode               | ✅ Core (always available) |

Embedding Providers

| Provider | Models                        | Features               | Installation                 |
|----------|-------------------------------|------------------------|------------------------------|
| OpenAI   | text-embedding-ada-002, etc.  | High quality, reliable | ✅ Core (always available)   |
| VLLM     | BGE, sentence-transformers    | Local deployment       | ✅ Core (always available)   |
| Infinity | Various embedding models      | Fast inference         | 📦 [infinity] extra required |
| Kunlun   | BGE, Qwen3-Embedding, etc.    | Bearer token auth      | ✅ Core (always available)   |

CLI Commands

# Initialize a new configuration file (v2 format by default)
llm-config init [path]
llm-config init --format v2  # Explicit v2 format
llm-config init --format v1  # Legacy v1 format

# Migrate v1 config to v2 format
llm-config migrate [--output path]

# Set up environment variables and create .env file
llm-config setup-env [path] [--force]

# Validate existing configuration
llm-config validate [path]

# Show package information
llm-config info

Advanced Usage

Custom Configuration Path

from langchain_llm_config import create_assistant

assistant = create_assistant(
    response_model=MyModel,
    config_path="/path/to/custom/api.yaml"
)

Context-Aware Conversations

# Add context to your queries (inside an async function, as above)
result = await assistant.ask_async(
    query="What are the main points?",
    context="This is a research paper about machine learning...",
    extra_system_prompt="Focus on technical details."
)

Direct Provider Usage

from langchain_llm_config import VLLMAssistant, OpenAIEmbeddingProvider

# Core providers (always available)
vllm_assistant = VLLMAssistant(
    config={"api_base": "http://localhost:8000/v1", "model_name": "llama-2"},
    response_model=MyModel
)

openai_embeddings = OpenAIEmbeddingProvider(
    config={"api_key": "your-key", "model_name": "text-embedding-ada-002"}
)

# Optional providers (require extras)
# from langchain_llm_config import GeminiAssistant  # requires [gemini]
# from langchain_llm_config import InfinityEmbeddingProvider  # requires [infinity]

Complete Example with Error Handling

import asyncio
from langchain_llm_config import create_assistant, create_embedding_provider
from pydantic import BaseModel, Field
from typing import List

class ChatResponse(BaseModel):
    message: str = Field(..., description="The assistant's response message")
    confidence: float = Field(..., description="Confidence score", ge=0.0, le=1.0)
    suggestions: List[str] = Field(default_factory=list, description="Follow-up questions")

async def main():
    try:
        # Create assistant
        assistant = create_assistant(
            response_model=ChatResponse,
            provider="openai",
            system_prompt="You are a helpful AI assistant."
        )
        
        # Chat conversation
        response = await assistant.ask_async("What is the capital of France?")
        print(f"Assistant: {response['message']}")
        print(f"Confidence: {response['confidence']:.2f}")
        
        # Create embedding provider
        embedding_provider = create_embedding_provider(provider="openai")
        
        # Get embeddings
        texts = ["Hello world", "How are you?"]
        embeddings = await embedding_provider.embed_texts_async(texts)
        print(f"Generated {len(embeddings)} embeddings")
        
    except Exception as e:
        print(f"Error: {e}")

# Run the example
asyncio.run(main())

Configuration Reference

Environment Variables

The package supports environment variable substitution in configuration:

api_key: "${OPENAI_API_KEY}"  # Will be replaced with actual value

V2 Configuration Structure (Recommended)

Model-centric configuration where each model is defined independently:

# Default models
default:
  chat_provider: model-name      # Model name for chat
  embedding_provider: model-name # Model name for embeddings

# Model definitions
models:
  model-name:
    model_type: chat | embedding | vlm
    provider_type: openai | vllm | gemini | infinity | kunlun | reasoning
    model_config:
      api_base: "https://api.example.com/v1"
      api_key: "${API_KEY}"
      model_name: "actual-model-name"
      temperature: 0.7
      max_tokens: 8192
      top_p: 1.0
      connect_timeout: 60
      read_timeout: 60
      extra_body:
        return_reasoning: false  # Enable reasoning output (vLLM)
      # ... other provider-specific parameters
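
Resolution is a two-step lookup: default maps a role to a model name, and models maps that name to its definition. A sketch with plain pyyaml (illustrative; the package does this for you):

import yaml

with open("api.yaml") as f:
    cfg = yaml.safe_load(f)

name = cfg["default"]["chat_provider"]   # e.g. "gpt-3.5-turbo"
entry = cfg["models"][name]              # that model's full definition
assert entry["model_type"] == "chat"
params = entry["model_config"]           # kwargs for the underlying client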

V1 Configuration Structure (Legacy)

Provider-centric configuration (automatically converted to v2 at runtime):

llm:
  provider_name:
    chat:
      api_base: "https://api.example.com/v1"
      api_key: "${API_KEY}"
      model_name: "model-name"
      # ... parameters
    embeddings:
      api_base: "https://api.example.com/v1"
      api_key: "${API_KEY}"
      model_name: "embedding-model"
  default:
    chat_provider: "provider_name"
    embedding_provider: "provider_name"

Migration from V1 to V2

Use the CLI migration tool:

# Migrate and create backup
llm-config migrate

# Specify output path
llm-config migrate --output api_v2.yaml

Or manually update your config following the v2 structure above.
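
The transformation is mechanical: each provider/type pair in v1 becomes a named model entry, and the defaults are remapped from provider names to model names. A simplified sketch of what migration does (illustrative; the generated model names are an assumption, not the CLI's actual scheme):

def v1_to_v2(v1: dict) -> dict:
    llm = v1["llm"]
    models = {}
    for provider, sections in llm.items():
        if provider == "default":
            continue
        for section, cfg in sections.items():
            # v1 "embeddings" section -> v2 "embedding" model type
            model_type = "embedding" if section == "embeddings" else section
            models[f"{provider}-{model_type}"] = {
                "model_type": model_type,
                "provider_type": provider,
                "model_config": cfg,
            }
    defaults = {
        key: f"{value}-{key.split('_')[0]}"  # "openai" -> "openai-chat"
        for key, value in llm.get("default", {}).items()
    }
    return {"default": defaults, "models": models}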

Development

Testing with Different Provider Combinations

# Test core functionality only
uv sync --extra test
uv run pytest

# Test with all providers
uv sync --extra test --extra all
uv run pytest

# Test specific provider combinations
uv sync --extra test --extra gemini
uv run pytest tests/test_providers.py -k gemini

Running Tests

uv run pytest

Code Formatting

uv run black .
uv run isort .

Type Checking

uv run mypy .

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

MIT License - see LICENSE file for details.
