Integration of Sarvam AI platform with LangChain

langchain-sarvam-integration

LangChain integration for Sarvam AI - Indian language LLM with native support for Hindi and other Indic languages.

langchain-sarvam-integration is an opinionated Python library to harness Sarvam AI through LangChain, LangGraph, and LangSmith, bringing Sarvam’s LLMs and APIs cleanly into chains, agents, and RAG workflows. It enables generative chat, task orchestration, and multilingual use cases—especially for Indian languages—while keeping prompt and response handling predictable. The package standardizes Sarvam as a first-class provider across the LangChain ecosystem and is fully LangSmith-compliant for tracing and evaluation. In practice, it removes integration glue code so your architecture stays intentional instead of “creative.” Think of it as serious plumbing with just enough wit to keep your stack from leaking.

⚠️ AI-Assisted Development Disclaimer

~95% of this codebase was written by AI coding agents (primarily Claude Code) with architectural guidance and review via GEMINI CLI.

This project demonstrates modern AI-assisted software development practices, with human oversight ensuring code quality, security, and functionality alignment. All code has been tested and reviewed before publication.

✨ Features

🤖 SarvamLLM - Simple prompt-response interface
💬 SarvamChat - Multi-turn conversation support
🎯 Multiple Models - Support for sarvam-m, sarvam-105b, and sarvam-30b models
⚡ Async Support - Non-blocking async operations
🌊 Streaming Support - Real-time response streaming
🧠 Reasoning Mode - Built-in thinking capability
📚 Wiki Grounding - Factual query enhancement
🇮🇳 Hindi & Indic Languages - Native language support
📝 Structured Output - JSON extraction with Pydantic support
🔧 Task Planning - Automatic TODO list generation
🔨 Tool/Function Calling - Agent support for sarvam-30b and sarvam-105b models

Installation

pip install langchain-sarvam-integration

Or install from source:

git clone https://github.com/AmritSDutta/sarvam_lanchchain_integration.git
cd sarvam_lanchchain_integration
pip install -e .

Configuration

Set your Sarvam API key as an environment variable:

export SARVAM_API_KEY="your-api-key-here"

Or pass it directly when initializing:

from sarvam import SarvamLLM, SarvamChat

llm = SarvamLLM(api_key="your-api-key")
chat = SarvamChat(api_key="your-api-key")
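The two configuration routes above can be pictured as a simple lookup. The sketch below is illustrative only: `resolve_api_key` is a hypothetical helper, and the precedence it applies (explicit argument first, then the environment variable) is an assumption rather than documented library behaviour.

```python
import os


def resolve_api_key(explicit_key=None, env_var="SARVAM_API_KEY"):
    """Return the explicit key if given, otherwise the value of env_var.

    Hypothetical helper; the precedence shown here is an assumption.
    """
    key = explicit_key or os.environ.get(env_var)
    if not key:
        raise ValueError(
            f"No API key found: pass one explicitly or set {env_var}"
        )
    return key
```

Failing early with a clear message tends to be easier to debug than an authentication error surfacing on the first request.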

Usage

LLM Style (Simple Prompt-Response)

from sarvam import SarvamLLM

# Using environment variable
llm = SarvamLLM()

response = llm.invoke("What is the capital of India?")
print(response)  # New Delhi

Chat Style (Conversation History)

from sarvam import SarvamChat
from langchain_core.messages import HumanMessage, SystemMessage

chat = SarvamChat()

response = chat.invoke([
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Tell me about Indian classical music.")
])
print(response.content)

Async Support

Both SarvamChat and SarvamLLM support async operations using async/await:

import asyncio
from sarvam import SarvamChat, SarvamLLM
from langchain_core.messages import HumanMessage

async def main():
    # Async chat invocation
    chat = SarvamChat()
    response = await chat.ainvoke([HumanMessage(content="Hello!")])
    print(response.content)

    # Async LLM invocation
    llm = SarvamLLM()
    response = await llm.ainvoke("What is the capital of India?")
    print(response)  # New Delhi

    # Multiple concurrent requests
    tasks = [
        chat.ainvoke([HumanMessage(content=f"Query {i}")])
        for i in range(5)
    ]
    responses = await asyncio.gather(*tasks)
    for r in responses:
        print(r.content)

asyncio.run(main())

Note: Since the Sarvam AI SDK doesn't support async operations natively, ainvoke() uses asyncio.to_thread() to run synchronous API calls in a thread pool, preventing event loop blocking while still being fully async-compatible.
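The thread-offloading pattern described in the note can be sketched with the standard library alone. Here `blocking_call` is a hypothetical stand-in for a synchronous SDK request and `ainvoke_sketch` is an illustrative name; this is not the package's actual implementation.

```python
import asyncio
import time


def blocking_call(prompt: str) -> str:
    """Hypothetical stand-in for a synchronous SDK request."""
    time.sleep(0.05)  # simulate network latency
    return f"echo: {prompt}"


async def ainvoke_sketch(prompt: str) -> str:
    # Run the blocking function in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_call, prompt)


async def run_concurrently(prompts):
    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(ainvoke_sketch(p) for p in prompts))


results = asyncio.run(run_concurrently(["a", "b", "c"]))
print(results)
```

Because each call occupies a thread while it waits, concurrency is bounded by the default thread pool size rather than by the event loop.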

Streaming Support

Both SarvamChat and SarvamLLM support streaming via stream() and astream() methods for real-time response processing:

from sarvam import SarvamChat, SarvamLLM

# SarvamChat streaming
chat = SarvamChat()
for chunk in chat.stream("Hello, how are you?"):
    print(chunk.content, end="")
print()

# SarvamLLM streaming
llm = SarvamLLM()
for chunk in llm.stream("Tell me a joke"):
    print(chunk.text, end="")
print()

# Async streaming
import asyncio

async def main():
    chat = SarvamChat()
    async for chunk in chat.astream("What's the weather like?"):
        print(chunk.content, end="")

asyncio.run(main())

Note: The Sarvam AI API does not currently support native streaming. The streaming interface uses a single-chunk fallback: the complete response is yielded as one chunk. This keeps the LangChain streaming interface usable until Sarvam AI adds native streaming support.

What is Native Streaming?

Native streaming (also called true streaming) is when the AI API sends the response piece-by-piece as it's being generated. Instead of waiting for the entire response to complete, you receive tokens in real-time:

# Example of native streaming (not currently supported by Sarvam AI)
for chunk in chat.stream("Tell me a story"):
    print(chunk.content, end="", flush=True)  # Prints word-by-word as generated
    # Output appears gradually: "Once" → "Once upon" → "Once upon a" → "Once upon a time"...

Benefits of native streaming:

Faster perceived response time: Users see text appearing immediately
Better UX for long responses: No waiting for complete generation
Real-time processing: Can start processing/analyzing partial responses
Lower memory usage: No need to buffer the entire response

Current implementation (single-chunk fallback):

The complete response is generated internally by Sarvam AI
Once generation is complete, the entire response is yielded as one chunk
Provides LangChain interface compatibility
Will automatically upgrade to native streaming when Sarvam AI adds support
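The single-chunk fallback described above amounts to a generator that yields exactly once. The sketch below uses hypothetical names (`single_chunk_stream`, `fake_generate`) and plain strings in place of LangChain chunk objects; it illustrates the idea, not the package's code.

```python
from typing import Callable, Iterator


def single_chunk_stream(generate: Callable[[str], str], prompt: str) -> Iterator[str]:
    """Yield the complete response as exactly one chunk."""
    # The full response is produced first, then handed over in one piece.
    yield generate(prompt)


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for a non-streaming API call."""
    return f"Full answer to: {prompt}"


chunks = list(single_chunk_stream(fake_generate, "Tell me a story"))
print(len(chunks))
```

Consumer code written as a `for chunk in ...` loop works unchanged whether the source yields one chunk or many, which is why the fallback preserves interface compatibility.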

Advanced Features

With Reasoning Effort (Thinking Mode)

Enable deeper reasoning for complex tasks:

from sarvam import SarvamLLM

llm = SarvamLLM(reasoning_effort="high")
response = llm.invoke("Solve: If 3x + 7 = 22, what is x?")

Options: "low", "medium", "high"

With Wiki Grounding

Get factual answers with wiki grounding enabled:

from sarvam import SarvamChat
from langchain_core.messages import HumanMessage

chat = SarvamChat(wiki_grounding=True)
response = chat.invoke([HumanMessage(content="What is the history of the Taj Mahal?")])

With Temperature Control

Control response randomness (0-2):

from sarvam import SarvamLLM

# Lower temperature for more focused responses
llm = SarvamLLM(temperature=0.3)
response = llm.invoke("Explain quantum computing")

# Higher temperature for more creative responses
llm = SarvamLLM(temperature=1.2)
response = llm.invoke("Write a story about a robot")
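To see why lower temperatures give more focused output, it helps to look at the standard temperature-scaled softmax used in sampling. The sketch below shows that general formulation with made-up logits; how Sarvam AI applies temperature server-side is not stated here, so treat this as background rather than a description of the API.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # illustrative values only
focused = softmax_with_temperature(logits, 0.3)
creative = softmax_with_temperature(logits, 1.2)
print(focused[0] > creative[0])
```

At 0.3 most of the probability mass sits on the top-scoring token; at 1.2 it is spread more evenly, so less likely tokens are sampled more often.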

With Top-P Sampling

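from sarvam import SarvamChat
from langchain_core.messages import HumanMessage

# Restrict sampling to the most probable tokens covering 90% of the probability mass
chat = SarvamChat(top_p=0.9)
response = chat.invoke([HumanMessage(content="Hello!")])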
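Top-p (nucleus) sampling keeps only the smallest set of most likely tokens whose combined probability reaches `top_p`, then samples from that set. The sketch below shows the general technique on a made-up distribution; `nucleus_filter` is a hypothetical name and the code does not claim to mirror Sarvam AI's server-side behaviour.

```python
def nucleus_filter(probs, top_p):
    """Return {token_index: renormalized_probability} for the nucleus."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    # Rank token indices from most to least probable.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, mass = [], 0.0
    for index, p in ranked:
        kept.append((index, p))
        mass += p
        if mass >= top_p:
            break  # the nucleus now covers at least top_p of the mass
    return {index: p / mass for index, p in kept}


filtered = nucleus_filter([0.5, 0.3, 0.15, 0.05], top_p=0.9)
print(sorted(filtered))
```

With `top_p=0.9` the least likely token (index 3) is dropped, because the first three tokens already cover 95% of the mass.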

Multi-turn Conversations

from sarvam import SarvamChat
from langchain_core.messages import HumanMessage, AIMessage

chat = SarvamChat()

messages = [
    HumanMessage(content="What are the two main styles of Indian classical music?"),
    AIMessage(content="The two main styles are Hindustani and Carnatic music."),
    HumanMessage(content="What's the difference between them?")
]

response = chat.invoke(messages)
print(response.content)

Hindi Language Support

from sarvam import SarvamChat
from langchain_core.messages import HumanMessage, SystemMessage

chat = SarvamChat(temperature=0.3)

response = chat.invoke([
    SystemMessage(content="आप एक सहायक हैं जो हिंदी में जवाब देता है।"),
    HumanMessage(content="भारत की राजधानी क्या है?")
])
print(response.content)

Structured JSON Output

Sarvam AI can return structured JSON output. The library includes utility functions to extract and parse JSON from responses, even when the model includes reasoning text before the JSON.

Simple JSON Output

Extract structured data using JSON format:

from sarvam import SarvamChat
from langchain_core.messages import HumanMessage
from pydantic import BaseModel

class Person(BaseModel):
    """Simple person model."""
    name: str
    age: int
    city: str

chat = SarvamChat(temperature=0.3)

response = chat.invoke([HumanMessage(content="""
Extract the person information and return as JSON:
Text: Priya Sharma is a 32 year old software engineer living in Delhi.

Return JSON with fields: name, age, city
""")])

# Parse response into Pydantic model
from sarvam import parse_structured_output
person = parse_structured_output(response.content, Person)

print(person.name)      # Priya Sharma
print(person.age)       # 32
print(person.city)      # Delhi

Nested JSON Output

Work with complex nested structures:

from sarvam import SarvamChat
from langchain_core.messages import HumanMessage
from pydantic import BaseModel

class Address(BaseModel):
    """Address model."""
    street: str
    city: str
    country: str
    zipcode: str

class Company(BaseModel):
    """Company model with nested address."""
    name: str
    industry: str
    employee_count: int
    headquarters: Address

chat = SarvamChat(temperature=0.3)

response = chat.invoke([HumanMessage(content="""
Extract company information and return as JSON:
Text: TechCorp India is a software development company with 250 employees.
Their headquarters is at 123 MG Road, Bengaluru, India, 560001.

Return JSON with: name, industry, employee_count, headquarters (object with street, city, country, zipcode)
""")])

# Parse into nested Pydantic models
from sarvam import parse_structured_output
company = parse_structured_output(response.content, Company)

print(company.name)                    # TechCorp India
print(company.employee_count)          # 250
print(company.headquarters.city)       # Bengaluru
print(company.headquarters.country)    # India

Manual JSON Extraction

For more control, use the utility functions directly:

from sarvam import SarvamChat, parse_json_response
from langchain_core.messages import HumanMessage

chat = SarvamChat()

response = chat.invoke([HumanMessage(content="""
Return the following data as JSON:
Name: Raj Kumar
Age: 28
City: Mumbai
""")])

# Extract JSON (handles markdown blocks and reasoning text)
data = parse_json_response(response.content)
print(data["name"])  # Raj Kumar

Note: Sarvam AI may include reasoning text before the JSON output, especially with reasoning_effort="high" (default). The utility functions automatically handle this and extract the JSON portion.
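One way to pull JSON out of a response that begins with reasoning text is to scan for the first position where a valid JSON object starts. The sketch below uses only the standard library; `extract_first_json_object` is a hypothetical name and is not the implementation behind `parse_json_response`.

```python
import json


def extract_first_json_object(text: str) -> dict:
    """Return the first valid JSON object embedded in text."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # this brace did not open valid JSON; keep scanning
        return obj
    raise ValueError("no JSON object found in text")


raw = (
    "Let me work out the fields {name, age, city} first.\n"
    'Result: {"name": "Raj Kumar", "age": 28, "city": "Mumbai"} Done.'
)
data = extract_first_json_object(raw)
print(data["name"])
```

Scanning with `raw_decode` tolerates stray braces in the reasoning text and trailing prose after the JSON, which a plain `json.loads` on the whole string would reject.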

Task Planning

Generate structured TODO lists from user requests - perfect for breaking down complex tasks into actionable steps:

Quick Example:

from sarvam import SarvamChat, parse_structured_output
from langchain_core.messages import HumanMessage
from pydantic import BaseModel

class TodoItem(BaseModel):
    title: str
    description: str

class TodoList(BaseModel):
    todos: list[TodoItem]

chat = SarvamChat(reasoning_effort="low", temperature=0.3)

# Simple task planning
response = chat.invoke([HumanMessage(content="""
Convert this task into a JSON TODO list with 2-4 items:
'Organize a bookshelf by genre and author'

Return format: {"todos": [{"title": "...", "description": "..."}]}
""")])

todo_list = parse_structured_output(response.content, TodoList)

for i, todo in enumerate(todo_list.todos, 1):
    print(f"{i}. {todo.title}")

Advanced Example:

A more detailed prompt that scales the number of TODOs with task complexity:

from sarvam import SarvamChat, parse_structured_output
from langchain_core.messages import HumanMessage
from pydantic import BaseModel

class TodoItem(BaseModel):
    """A single TODO item."""
    title: str
    description: str

class TodoList(BaseModel):
    """A list of TODO items."""
    todos: list[TodoItem]

chat = SarvamChat(reasoning_effort="low", temperature=0.3)

response = chat.invoke([HumanMessage(content="""
You are a task-planning assistant. Analyze the user's task.

Your goal is to convert a user request into a clear, actionable TODO list.
Plan at the minimum sufficient granularity.

Rules:
- If the task is simple or routine → generate 2–4 TODOs
- If the task is moderately complex → generate 4–6 TODOs
- If the task is complex or multi-stage → rarely generate 7–10 TODOs
- Titles must be concise (≤ 10 words)
- Descriptions must be concrete and outcome-oriented

Your output must be JSON with this structure:
{
  "todos": [
    {
      "title": "Short task title",
      "description": "Detailed description of what needs to be done"
    }
  ]
}

Task: Analyze recent gold price surge globally
""")])

# Parse into structured TODO list
todo_list = parse_structured_output(response.content, TodoList)

# Display the generated TODOs
for i, todo in enumerate(todo_list.todos, 1):
    print(f"{i}. {todo.title}")
    print(f"   {todo.description}\n")

Example output:

1. Identify key economic indicators
   Analyze inflation rates, interest rates, and currency fluctuations affecting gold prices

2. Research geopolitical factors
   Examine international conflicts and trade tensions impacting gold demand

3. Study market sentiment data
   Assess investor behavior and trading volume patterns in gold markets

4. Review central bank policies
   Analyze Federal Reserve and global central bank actions on gold reserves

Tip: Use reasoning_effort="low" for more direct responses without extensive reasoning text.

Tool/Function Calling

Tool/function calling is supported only for the sarvam-30b and sarvam-105b models, including their sarvam-30b-16k and sarvam-105b-32k variants. This enables building LangChain agents that can use external tools and APIs.

from sarvam import SarvamChat
from langchain_core.tools import tool

# Define a tool
@tool
def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    # In a real application, this would call a weather API
    return f"Sunny and 25°C in {location}"

# Use sarvam-30b or sarvam-105b for tool support
chat = SarvamChat(model="sarvam-30b")
bound_chat = chat.bind_tools([get_weather])

response = bound_chat.invoke("What's the weather in Mumbai?")

# If the model chooses to use the tool, response.additional_kwargs["tool_calls"] will contain the tool calls
if "tool_calls" in response.additional_kwargs:
    tool_calls = response.additional_kwargs["tool_calls"]
    for tool_call in tool_calls:
        print(f"Tool: {tool_call['function']['name']}")
        print(f"Arguments: {tool_call['function']['arguments']}")
else:
    print(response.content)

Controlling Tool Behavior with tool_choice:

from sarvam import SarvamChat
from langchain_core.tools import tool

@tool
def search(query: str) -> str:
    """Search the web."""
    return f"Results for: {query}"

# Force the model to use a tool
chat = SarvamChat(model="sarvam-105b", tool_choice="required")
bound_chat = chat.bind_tools([search])

# The model must call at least one tool
response = bound_chat.invoke("Find information about quantum computing")

Note: The sarvam-m model does not support tools. If you bind tools with sarvam-m, they will be ignored and an info message will be logged.

Tool Calling Protocol

When using tool calling with supported models (sarvam-30b, sarvam-105b, sarvam-30b-16k, sarvam-105b-32k), the interaction follows a multi-turn protocol:

1. First Response: Model may return tool_calls with blank/empty content

This is expected behavior - the model is requesting to use tools rather than providing a text response directly:

from sarvam import SarvamChat
from langchain_core.tools import tool

@tool
def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    return f"Sunny and 25°C in {location}"

chat = SarvamChat(model="sarvam-30b")
bound_chat = chat.bind_tools([get_weather])

# First API call
response = bound_chat.invoke("What's the weather in Mumbai?")

# Check if model wants to use tools
if "tool_calls" in response.additional_kwargs:
    # response.content may be empty or blank - this is EXPECTED
    print(f"Content: '{response.content}'")  # Might be "" or very brief
    print(f"Tool calls: {response.additional_kwargs['tool_calls']}")

2. Tool Result Submission: Send tool results back via ToolMessage

After extracting tool calls, execute the tools and submit results:

import json

from langchain_core.messages import HumanMessage, ToolMessage

messages = [
    HumanMessage(content="What's the weather in Mumbai?"),
    response,  # First response with tool_calls
]

# Execute tools and add results
for tool_call in response.additional_kwargs["tool_calls"]:
    # Arguments arrive as a JSON string; parse them into a dict
    args = json.loads(tool_call["function"]["arguments"])
    # get_weather is decorated with @tool, so run it through .invoke()
    result = get_weather.invoke(args)

    # Add tool result to conversation, linked to the originating call
    messages.append(ToolMessage(
        content=result,
        tool_call_id=tool_call["id"]
    ))

# Second API call - get final response
final_response = bound_chat.invoke(messages)
print(final_response.content)  # Now contains the actual answer

3. Second Response: Model returns the actual text content

After processing tool results, the model provides a comprehensive text response:

# final_response.content now contains the actual answer
# Example: "The weather in Mumbai is sunny and 25°C."

Key Points:

Blank content on tool call requests is correct behavior per OpenAI's tool calling protocol
Always check response.additional_kwargs["tool_calls"] for tool requests
Use ToolMessage to submit tool results back to the model
The second call will have non-blank content with the actual answer
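The dispatch step in the protocol above can be exercised without any API call. The sketch below assumes tool calls arrive in the OpenAI-style dictionary shape used in the earlier examples (`id`, `function.name`, `function.arguments` as a JSON string); `TOOL_REGISTRY`, `run_tool_calls`, and the sample payload are hypothetical, and plain dictionaries stand in for `ToolMessage` objects.

```python
import json


def get_weather(location: str) -> str:
    """Plain-function version of the example tool."""
    return f"Sunny and 25°C in {location}"


# Hypothetical registry mapping tool names to callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls):
    """Execute each requested tool and build one result message per call."""
    results = []
    for call in tool_calls:
        name = call["function"]["name"]
        # Arguments are a JSON-encoded string, not a dict.
        args = json.loads(call["function"]["arguments"])
        output = TOOL_REGISTRY[name](**args)
        # tool_call_id ties the result back to the request that produced it.
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": output}
        )
    return results


example_calls = [
    {
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"location": "Mumbai"}'},
    }
]
tool_messages = run_tool_calls(example_calls)
print(tool_messages[0]["content"])
```

A registry keyed by tool name keeps the loop unchanged as more tools are bound; an unknown name raises `KeyError`, which a production loop would likely want to catch and report back to the model.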

Parameters

Parameter	Type	Default	Description
`api_key`	`str`	`None`	Sarvam API key (or use `SARVAM_API_KEY` env var)
`model`	`str`	`"sarvam-m"`	Model to use: `"sarvam-m"`, `"sarvam-105b"`, or `"sarvam-30b"`
`temperature`	`float`	`0.5`	Sampling temperature (0-2)
`top_p`	`float`	`1.0`	Nucleus sampling (0-1)
`reasoning_effort`	`str`	`"high"`	Reasoning level: `"low"`, `"medium"`, `"high"`
`wiki_grounding`	`bool`	`False`	Enable wiki grounding for factual queries
`max_tokens`	`int`	Model-specific	Maximum tokens to generate: `8192` for sarvam-m and sarvam-30b-16k, `16384` for others (auto-configured)
`tool_choice`	`str`	`None`	Tool choice mode: `"none"`, `"auto"`, `"required"`, or specific tool (only for sarvam-30b and sarvam-105b)
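The ranges and defaults in the table can be expressed as a small validation sketch. `SketchParams` and `default_max_tokens` are hypothetical names; the values come from the table above, but whether the library validates inputs this way is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

REASONING_LEVELS = ("low", "medium", "high")
# Per the table: 8192 for these models, 16384 for the others.
MODELS_WITH_8K_OUTPUT = ("sarvam-m", "sarvam-30b-16k")


def default_max_tokens(model: str) -> int:
    """Return the documented default output-token limit for a model."""
    return 8192 if model in MODELS_WITH_8K_OUTPUT else 16384


@dataclass
class SketchParams:
    """Hypothetical container mirroring the documented defaults."""

    model: str = "sarvam-m"
    temperature: float = 0.5
    top_p: float = 1.0
    reasoning_effort: str = "high"
    wiki_grounding: bool = False
    max_tokens: Optional[int] = None

    def __post_init__(self) -> None:
        if not 0 <= self.temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        if not 0 <= self.top_p <= 1:
            raise ValueError("top_p must be between 0 and 1")
        if self.reasoning_effort not in REASONING_LEVELS:
            raise ValueError(f"reasoning_effort must be one of {REASONING_LEVELS}")
        if self.max_tokens is None:
            # Fill in the model-specific default when none is given.
            self.max_tokens = default_max_tokens(self.model)


params = SketchParams(model="sarvam-105b", temperature=0.3)
print(params.max_tokens)  # 16384
```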

Available Models

sarvam-m: Default model, good for general-purpose tasks (does NOT support tool calling)
sarvam-105b: Larger model (105B parameters) for more complex reasoning and better quality responses (supports tool calling)
sarvam-105b-32k: Extended context variant (32k tokens) with same capabilities as sarvam-105b (supports tool calling)
sarvam-30b: Medium-sized model (30B parameters) balancing performance and speed (supports tool calling)
sarvam-30b-16k: Extended context variant (16k tokens) with same capabilities as sarvam-30b (supports tool calling)
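The tool-calling column of this list can be captured in a lookup that decides whether bound tools should be forwarded. The sketch below is illustrative: `TOOL_CAPABLE_MODELS` and `tools_to_send` are hypothetical names, and the logging behaviour is modelled on the info message described in this document rather than taken from the package source.

```python
import logging

logger = logging.getLogger("sarvam_sketch")

# Models listed above as supporting tool calling.
TOOL_CAPABLE_MODELS = frozenset(
    {"sarvam-30b", "sarvam-30b-16k", "sarvam-105b", "sarvam-105b-32k"}
)


def tools_to_send(model: str, tools: list) -> list:
    """Return the tools to forward to the API for the given model."""
    if model in TOOL_CAPABLE_MODELS:
        return list(tools)
    # Unsupported model: keep going, but tell the developer why nothing is sent.
    logger.info("Tool calling not supported for %s. Tools will be ignored.", model)
    return []


print(tools_to_send("sarvam-30b", ["get_weather"]))
print(tools_to_send("sarvam-m", ["get_weather"]))
```

Logging rather than raising keeps code that switches models at runtime from failing outright, at the cost of a silent downgrade if info-level logs are not visible.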

Using Different Models

from langchain_core.messages import HumanMessage
from sarvam import SarvamChat, SarvamLLM

# Use the default sarvam-m model
chat_default = SarvamChat()

# Use sarvam-105b for more complex tasks
chat_105b = SarvamChat(model="sarvam-105b")
response = chat_105b.invoke([HumanMessage(content="Explain quantum entanglement in detail")])

# Use sarvam-30b for balanced performance
llm_30b = SarvamLLM(model="sarvam-30b")
response = llm_30b.invoke("Summarize the key events of the Indian independence movement")

# You can also use models with other parameters
chat = SarvamChat(
    model="sarvam-105b",
    temperature=0.3,
    reasoning_effort="high"
)

LangSmith Integration

Both SarvamChat and SarvamLLM support LangSmith tracing out of the box for LLM observability, debugging, and cost tracking.

Setup

Enable LangSmith by setting environment variables:

export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="your-langsmith-api-key"
export LANGCHAIN_PROJECT="your-project-name"
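When shell exports are inconvenient (for example in a notebook), the same variables can be set from Python before any model is created. In this sketch `enable_langsmith` is a hypothetical helper; it deliberately leaves `LANGCHAIN_API_KEY` to be supplied outside the code and only reports whether it is present.

```python
import os


def enable_langsmith(project: str) -> bool:
    """Turn on tracing for this process; return True if an API key is set."""
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_PROJECT"] = project
    # The key itself should come from the environment or a secret store,
    # not from source code.
    return bool(os.environ.get("LANGCHAIN_API_KEY"))


key_present = enable_langsmith("your-project-name")
print(os.environ["LANGCHAIN_TRACING_V2"])
```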

Automatic Token Tracking

Token usage is automatically tracked and sent to LangSmith for all operations:

from sarvam import SarvamChat
from langchain_core.messages import HumanMessage

chat = SarvamChat()

# Token counts automatically appear in LangSmith
response = chat.invoke([HumanMessage(content="Hello!")])

# Also works with streaming
for chunk in chat.stream("Tell me a story"):
    print(chunk.content, end="")

# And async operations
import asyncio
async def main():
    response = await chat.ainvoke([HumanMessage(content="Hello!")])
asyncio.run(main())

What's Tracked

Token counts: input_tokens, output_tokens, total_tokens
Model metadata: Provider (sarvam), model name, parameters
Latency: Request duration and timing
Errors: API failures with error details

View Traces

After running your code, visit smith.langchain.com to view:

Request/response pairs
Token usage and costs
Latency metrics
Error traces

Limitations

Streaming: The stream() and astream() methods are implemented with single-chunk fallback since Sarvam AI API does not currently support native streaming. The complete response is yielded as one chunk for LangChain compatibility. Native streaming will be supported when the Sarvam AI API adds this feature.

Tool/Function Calling: Tool/function calling is supported for sarvam-30b and sarvam-105b models only. The sarvam-m model does not support tools. When you bind tools with sarvam-m, they will be stored but an info message will be logged indicating that the API won't use them.

from sarvam import SarvamChat
from langchain_core.tools import tool

@tool
def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    return f"Sunny in {location}"

# Works with sarvam-30b or sarvam-105b
chat_30b = SarvamChat(model="sarvam-30b")
bound_chat = chat_30b.bind_tools([get_weather])
response = bound_chat.invoke("What's the weather in Mumbai?")
# Tools will be passed to the API

# sarvam-m does not support tools
chat_m = SarvamChat(model="sarvam-m")  # or just SarvamChat()
bound_chat_m = chat_m.bind_tools([get_weather])
response_m = bound_chat_m.invoke("What's the weather in Mumbai?")
# Info logged: Tool calling not supported for sarvam-m. Tools will be ignored.

Development

Install with dev dependencies:

pip install -e ".[dev]"

Run tests:

# Run all tests (unit + integration - requires API key)
pytest

# Run only unit tests (no API key required)
pytest -m "not integration"

# Run model-specific integration tests
pytest test/sarvam/test_integration_105b.py -v
pytest test/sarvam/test_integration_30b.py -v

Format code:

ruff format .

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Community

🐛 Bug Reports: Open an issue
💡 Feature Requests: Open an issue
❓ Questions: Open an issue
📖 Code of Conduct: CODE_OF_CONDUCT.md
🔒 Security Policy: SECURITY.md

License

Changelog

See CHANGELOG.md for a list of changes in each version.

Made with ❤️ for the LangChain community

Release history

0.1.9: Mar 23, 2026 (this version)
0.1.8: Mar 12, 2026
0.1.7: Feb 10, 2026
0.1.6: Feb 9, 2026
0.1.5: Feb 6, 2026
0.1.4: Feb 3, 2026
0.1.3: Feb 3, 2026
0.1.2: Feb 2, 2026
0.1.1: Feb 2, 2026
0.1.0: Feb 2, 2026

Download files

Source Distribution

langchain_sarvam_integration-0.1.9.tar.gz (26.5 kB)

Uploaded Mar 23, 2026 Source

File details

Details for the file langchain_sarvam_integration-0.1.9.tar.gz.

File metadata

Download URL: langchain_sarvam_integration-0.1.9.tar.gz
Upload date: Mar 23, 2026
Size: 26.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for langchain_sarvam_integration-0.1.9.tar.gz
Algorithm	Hash digest
SHA256	`951b213cdec7a8a9c5f992d11f6eb2706fb9ef1c9cd8d0527e6e31e349ff767d`
MD5	`fb487ff92d177f73fc47c9c23a6276b8`
BLAKE2b-256	`a7b9a7328d1bf955da3c564bf035e28eeae222f7295fbf212e4fc288b6583454`
