LangChain integration for Apache Iceberg with native PyIceberg API support
LangChain Iceberg Toolkit
A native LangChain integration for Apache Iceberg that enables AI-powered natural language queries over your data lakes. Built with PyIceberg for direct API access (not SQL strings) and featuring Iceberg-specific capabilities like time-travel, snapshots, and partition-aware queries.
Features
- 🚀 Native PyIceberg Integration - Direct API access, not SQL strings
- 🔍 Iceberg-Specific Tools - Snapshots, time-travel, partition-aware queries
- 📊 Optional Semantic Layer - YAML-driven metrics and dimensions
- 💬 Zero SQL Required - Natural language to Iceberg queries
- 🏢 Enterprise-Ready - Query limits and timeout protection
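The feature list mentions timeout protection but does not say how it is enforced. As a hedged illustration, and not the toolkit's actual code, a standard-library wrapper can bound how long a caller waits on any blocking call, such as a table scan. `run_with_timeout` and `QueryTimeoutError` are hypothetical names.

```python
import concurrent.futures
import time
from typing import Any, Callable


class QueryTimeoutError(RuntimeError):
    """Hypothetical error raised when a query exceeds its time budget."""


def run_with_timeout(fn: Callable[..., Any], timeout_s: float, *args: Any, **kwargs: Any) -> Any:
    """Run fn(*args, **kwargs) in a worker thread; stop waiting after timeout_s seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError as exc:
        raise QueryTimeoutError(f"query exceeded {timeout_s}s") from exc
    finally:
        # Return immediately instead of blocking on a worker that is still running.
        executor.shutdown(wait=False)


if __name__ == "__main__":
    print(run_with_timeout(lambda: "fast result", 1.0))
    try:
        run_with_timeout(time.sleep, 0.05, 0.5)  # sleeps 0.5s, budget is 0.05s
    except QueryTimeoutError as err:
        print("timed out:", err)
```

Python threads cannot be cancelled, so a timed-out call keeps running in the background; this only stops the caller from waiting on it. An engine-side limit would still be needed to free resources.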
Installation
Using pip (standard)
```bash
pip install langchain-iceberg
```
For semantic layer support (quote the extra so shells such as zsh do not expand the brackets):
```bash
pip install "langchain-iceberg[semantic]"
```
Using uv (recommended by LangChain)
uv is a fast Python package installer recommended by LangChain:
```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install package
uv pip install langchain-iceberg
# With semantic layer
uv pip install "langchain-iceberg[semantic]"
```
See INSTALL_WITH_UV.md for more details.
Quick Start
Basic Usage
```python
from langchain_iceberg import IcebergToolkit
from langchain_openai import ChatOpenAI
# langgraph's prebuilt ReAct agent accepts (model, tools) directly
# (requires `pip install langgraph`)
from langgraph.prebuilt import create_react_agent

# Initialize toolkit
toolkit = IcebergToolkit(
    catalog_name="prod",
    catalog_config={
        "type": "rest",
        "uri": "http://localhost:8181",
        "warehouse": "s3://my-warehouse",
    },
)

# Get tools
tools = toolkit.get_tools()

# Create agent
llm = ChatOpenAI(model="gpt-4")
agent = create_react_agent(llm, tools)

# Query with natural language
result = agent.invoke({
    "messages": [("user", "Show me the top 10 orders by amount from the sales.orders table")]
})
# The agent's final answer is the last message in the returned conversation
print(result["messages"][-1].content)
```
Direct Tool Usage
```python
from langchain_iceberg import IcebergToolkit

toolkit = IcebergToolkit(
    catalog_name="rest",
    catalog_config={
        "type": "rest",
        "uri": "http://localhost:8181",
        "warehouse": "s3://warehouse/wh/",
    },
)
tools = toolkit.get_tools()

# Use tools directly
list_ns = next(t for t in tools if t.name == "iceberg_list_namespaces")
namespaces = list_ns.run({})
print(namespaces)

query = next(t for t in tools if t.name == "iceberg_query")
results = query.run({
    "table_id": "test.orders",
    "filters": "status = 'completed'",
    "limit": 10,
})
print(results)
```
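Selecting tools with `next(...)` raises a bare `StopIteration` when a name is missing, for example when a semantic-layer tool was not generated. A small helper that indexes tools by their `name` attribute gives a clearer error. `index_tools` and `get_tool` are hypothetical helpers, not part of `langchain-iceberg`; the demo uses stand-in objects instead of real tools.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable


def index_tools(tools: Iterable[Any]) -> Dict[str, Any]:
    """Map each tool's .name to the tool object (LangChain tools expose .name)."""
    return {tool.name: tool for tool in tools}


def get_tool(tools: Iterable[Any], name: str) -> Any:
    """Return the tool called `name`, or raise a KeyError listing the available names."""
    by_name = index_tools(tools)
    try:
        return by_name[name]
    except KeyError:
        available = ", ".join(sorted(by_name))
        raise KeyError(f"no tool named {name!r}; available: {available}") from None


if __name__ == "__main__":
    # Stand-ins; with the real toolkit you would pass toolkit.get_tools().
    fake_tools = [SimpleNamespace(name="iceberg_list_namespaces"), SimpleNamespace(name="iceberg_query")]
    print(get_tool(fake_tools, "iceberg_query").name)
```

With the toolkit above, `query = get_tool(tools, "iceberg_query")` replaces the generator expression.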
With Semantic Layer
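```python
from langchain_iceberg import IcebergToolkit
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

# Load semantic YAML for business-friendly metrics
toolkit = IcebergToolkit(
    catalog_name="prod",
    catalog_config={...},  # your catalog settings, as in Basic Usage
    semantic_yaml="s3://bucket/semantic.yaml",
)
tools = toolkit.get_tools()
# Now includes auto-generated metric tools like get_total_revenue, get_order_count, etc.
llm = ChatOpenAI(model="gpt-4")
agent = create_react_agent(llm, tools)

# Business question (no SQL needed!)
result = agent.invoke({
    "messages": [("user", "What was Q4 2024 revenue by customer segment?")]
})
print(result["messages"][-1].content)
```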
Time-Travel Queries
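```python
# Query historical data (reuses the agent created above)
result = agent.invoke({
    "messages": [("user", "Compare this month's revenue to the same period last year using time-travel")]
})
print(result["messages"][-1].content)
```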
Available Tools
The toolkit provides the following tools:
Catalog Exploration
- `iceberg_list_namespaces` - List all namespaces in the catalog
- `iceberg_list_tables` - List tables in a namespace
- `iceberg_get_schema` - Get table schema with sample data
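The examples address tables with dotted identifiers such as `sales.orders` and `test.orders`: one or more namespace parts followed by the table name. Since Iceberg namespaces can be nested, a minimal sketch splits on the last dot. `split_table_id` is a hypothetical helper, and it assumes no individual name contains a dot.

```python
from typing import Tuple


def split_table_id(table_id: str) -> Tuple[Tuple[str, ...], str]:
    """Split a dotted table identifier into (namespace parts, table name).

    Assumes no individual name contains a dot; the last part is the table.
    """
    parts = tuple(table_id.split("."))
    if len(parts) < 2 or any(not part for part in parts):
        raise ValueError(f"expected '<namespace>.<table>', got {table_id!r}")
    return parts[:-1], parts[-1]


if __name__ == "__main__":
    print(split_table_id("sales.orders"))
    print(split_table_id("analytics.eu.orders"))  # nested namespace
```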
Query Execution
- `iceberg_query` - Execute queries with filters and column selection
- `iceberg_plan_query` - LLM-assisted query planning
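`iceberg_query` takes a table, a filter string, and a row limit rather than SQL. Assuming the tool forwards these to PyIceberg's `Table.scan`, which accepts `row_filter` (for example the string `"status = 'completed'"`), `selected_fields`, and `limit`, a sketch of that translation with a hard row cap could look like this. The toolkit's internals are not shown in this README; `build_scan_kwargs` and the `MAX_ROWS` value are hypothetical.

```python
from typing import Any, Dict, Optional, Sequence

# Hypothetical ceiling; the toolkit's real default limit is not documented here.
MAX_ROWS = 1000


def build_scan_kwargs(
    columns: Optional[Sequence[str]] = None,
    filters: Optional[str] = None,
    limit: Optional[int] = None,
    max_rows: int = MAX_ROWS,
) -> Dict[str, Any]:
    """Translate query-tool arguments into keyword arguments for Table.scan()."""
    kwargs: Dict[str, Any] = {}
    if filters:
        # PyIceberg parses string row filters such as "status = 'completed'".
        kwargs["row_filter"] = filters
    if columns:
        kwargs["selected_fields"] = tuple(columns)
    # Always cap the row count so an agent cannot request an unbounded scan.
    requested = max_rows if limit is None else limit
    if requested < 1:
        raise ValueError("limit must be a positive integer")
    kwargs["limit"] = min(requested, max_rows)
    return kwargs


if __name__ == "__main__":
    print(build_scan_kwargs(filters="status = 'completed'", limit=10))
```

Against a live catalog this would be used as `table.scan(**build_scan_kwargs(filters="status = 'completed'", limit=10)).to_arrow()`.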
Time-Travel (Iceberg-Specific)
- `iceberg_snapshots` - List table snapshots
- `iceberg_time_travel` - Query data at a specific point in time
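Time travel in Iceberg works by reading an older snapshot: each snapshot has an id and a commit timestamp in milliseconds, and "as of time T" means the most recent snapshot that was current at or before T. The sketch below resolves that from a history list using binary search; the `SnapshotEntry` record and `snapshot_as_of` function are hypothetical stand-ins for the table's snapshot log. The resolved id could then be passed as `snapshot_id` to PyIceberg's `Table.scan`.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SnapshotEntry:
    snapshot_id: int
    timestamp_ms: int  # commit time, milliseconds since the Unix epoch


def snapshot_as_of(history: List[SnapshotEntry], as_of_ms: int) -> Optional[int]:
    """Return the id of the snapshot that was current at as_of_ms, or None.

    `history` must be sorted by timestamp_ms, oldest first.
    """
    timestamps = [entry.timestamp_ms for entry in history]
    # Index of the last entry whose timestamp is <= as_of_ms (-1 if none).
    idx = bisect.bisect_right(timestamps, as_of_ms) - 1
    return history[idx].snapshot_id if idx >= 0 else None


if __name__ == "__main__":
    log = [SnapshotEntry(101, 1_000), SnapshotEntry(102, 2_000), SnapshotEntry(103, 3_000)]
    print(snapshot_as_of(log, 2_500))
```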
Semantic Layer (Auto-Generated)
- `get_{metric_name}` - Auto-generated tools from YAML metrics
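The semantic YAML schema is not documented in this README, so the following sketch assumes a hypothetical layout: a `metrics` list whose entries carry a `name` and an optional `description`, already loaded into a dict (for example with PyYAML's `safe_load`). It derives one tool name per metric following the `get_{metric_name}` pattern; each name/description pair could then back a LangChain tool.

```python
import re
from typing import Any, Dict, List

# Hypothetical semantic definition, as it might look after yaml.safe_load().
SEMANTIC_SPEC: Dict[str, Any] = {
    "metrics": [
        {"name": "total_revenue", "description": "Sum of order amounts"},
        {"name": "order_count", "description": "Number of orders"},
    ]
}

# Tool names must be plain identifiers, so reject names with spaces or capitals.
_VALID_NAME = re.compile(r"^[a-z][a-z0-9_]*$")


def metric_tool_specs(spec: Dict[str, Any]) -> List[Dict[str, str]]:
    """Derive one tool name/description pair per metric using get_{metric_name}."""
    specs = []
    for metric in spec.get("metrics", []):
        name = metric["name"]
        if not _VALID_NAME.match(name):
            raise ValueError(f"metric name {name!r} is not a valid tool-name suffix")
        specs.append({
            "name": f"get_{name}",
            "description": metric.get("description", f"Compute the {name} metric"),
        })
    return specs


if __name__ == "__main__":
    for tool_spec in metric_tool_specs(SEMANTIC_SPEC):
        print(tool_spec["name"], "-", tool_spec["description"])
```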
Requirements
- Python 3.10+
- Apache Iceberg catalog (REST, Hive, Glue, or Nessie)
- Cloud storage (S3, ADLS, or GCS)
Contributing
Contributions are welcome! Please see our Contributing Guide for details.
License
Apache 2.0 License - see LICENSE file for details.
Support
- GitHub Issues: Report a bug or request a feature
- Documentation: Full documentation
Roadmap
- Core toolkit with catalog exploration
- Query execution tools
- Time-travel and snapshot tools
- Semantic layer with YAML support
- Governance features (access control, PII protection) - Planned for future release
- Query planner tool
Download files
Source Distribution
Built Distribution
File details
Details for the file langchain_iceberg-0.1.2.tar.gz.
File metadata
- Download URL: langchain_iceberg-0.1.2.tar.gz
- Upload date:
- Size: 42.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 77961548cdd583c964a8ea10a7d1ec7f07354550ee2c14e410d0b17e0ccc7516 |
| MD5 | 9bbe44b9aca995cea8258de1931de082 |
| BLAKE2b-256 | 952f980434c5f4442aedbb0a21ff309b99dc112d5063927ae7de8d15b58410c3 |
File details
Details for the file langchain_iceberg-0.1.2-py3-none-any.whl.
File metadata
- Download URL: langchain_iceberg-0.1.2-py3-none-any.whl
- Upload date:
- Size: 45.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 482291d6da051ee37a11240fd8540605086949db7a17c31ff9166ecf739015c2 |
| MD5 | 5597240b3ee8398042660772d66a4dac |
| BLAKE2b-256 | 952466dc224dcc552cd58b5c885d3b26109bd42ff1d37ffd2f40bdff541c8b21 |
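To check a downloaded distribution against the SHA256 digests listed for version 0.1.2, a short standard-library sketch reads the file in chunks and compares hex digests. The file paths are whatever you downloaded; `sha256_of` and `verify` are illustrative names.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published for langchain-iceberg 0.1.2.
EXPECTED_SHA256 = {
    "langchain_iceberg-0.1.2.tar.gz": "77961548cdd583c964a8ea10a7d1ec7f07354550ee2c14e410d0b17e0ccc7516",
    "langchain_iceberg-0.1.2-py3-none-any.whl": "482291d6da051ee37a11240fd8540605086949db7a17c31ff9166ecf739015c2",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()


if __name__ == "__main__":
    wheel = Path("langchain_iceberg-0.1.2-py3-none-any.whl")
    if wheel.exists():
        print("hash OK" if verify(wheel, EXPECTED_SHA256[wheel.name]) else "hash MISMATCH")
```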