DataHub Agent Context - MCP Tools for AI Agents
DataHub Agent Context
DataHub Agent Context provides a collection of tools and utilities for building AI agents that interact with DataHub metadata. This package contains MCP (Model Context Protocol) tools that enable AI agents to search, retrieve, and manipulate metadata in DataHub. The tools can be used directly to build an agent or included in an MCP server such as DataHub's open source MCP server.
Features
Installation
Base Installation
python3 -m pip install --upgrade pip wheel setuptools
python3 -m pip install --upgrade datahub-agent-context
With LangChain Support
For building LangChain agents with pre-built tools:
python3 -m pip install --upgrade "datahub-agent-context[langchain]"
Prerequisites
This package requires:
- Python 3.9 or higher
- acryl-datahub package
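A quick preflight check can confirm both prerequisites before wiring up an agent. The sketch below is illustrative only: `missing_prerequisites` is a hypothetical helper, not part of the package, and it assumes the acryl-datahub distribution is importable as `datahub`, consistent with the imports used in the examples in this README.

```python
import importlib.util
import sys
from typing import List, Sequence, Tuple

MIN_PYTHON: Tuple[int, int] = (3, 9)


def missing_prerequisites(
    version: Sequence[int] = tuple(sys.version_info[:2]),
    modules: Sequence[str] = ("datahub",),
) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(version[:2]) < MIN_PYTHON:
        problems.append(
            "Python %d.%d or higher is required, found %d.%d"
            % (MIN_PYTHON + tuple(version[:2]))
        )
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append("module %r is not installed" % name)
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("prerequisite check:", problem)
```

Returning a list of problems instead of raising on the first one lets a setup script report everything that is missing in a single pass.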
Quick Start
Basic Example
These tools are designed to be used with an AI agent, with their responses passed directly to an LLM, so each tool returns a simple dict. They can also be called independently if desired.
from datahub.sdk.main_client import DataHubClient
from datahub_agent_context.mcp_tools.search import search
from datahub_agent_context.mcp_tools.entities import get_entities
# Initialize DataHub client from environment configuration
client = DataHubClient.from_env()
# Search for datasets
with client as client:
    results = search(
        query="user_data",
        filters={"entity_type": ["dataset"]},
        num_results=10,
    )
# Get detailed entity information for every search hit
with client as client:
    entities = get_entities(
        urns=[result["entity"]["urn"] for result in results["searchResults"]]
    )
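Because tool responses are plain dicts, it can help to pull out fields defensively before passing URNs to `get_entities()`. This is a minimal sketch that assumes only the response shape used in the example above (`results["searchResults"][i]["entity"]["urn"]`). `extract_urns` and the sample payload are hypothetical and not part of the package.

```python
from typing import Any, Dict, List


def extract_urns(results: Dict[str, Any]) -> List[str]:
    """Collect entity URNs from a search response, skipping malformed hits."""
    urns: List[str] = []
    for hit in results.get("searchResults") or []:
        entity = hit.get("entity") if isinstance(hit, dict) else None
        urn = entity.get("urn") if isinstance(entity, dict) else None
        if isinstance(urn, str) and urn not in urns:
            urns.append(urn)  # keep first-seen order, drop duplicates
    return urns


# Hypothetical payload, shaped like the response in the example above.
sample = {
    "searchResults": [
        {"entity": {"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,user_data,PROD)"}},
        {"entity": {}},  # a hit without a URN is ignored
        {"entity": {"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,user_data,PROD)"}},
    ]
}
print(extract_urns(sample))
```

Skipping malformed hits rather than raising keeps an agent loop running when a response is partial or empty.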
LangChain Integration
Build AI agents with pre-built LangChain tools:
from datahub.sdk.main_client import DataHubClient
from datahub_agent_context.langchain_tools import build_langchain_tools
from langchain.agents import create_agent
# Initialize DataHub client
client = DataHubClient.from_env()
# Build all tools (read-only by default)
tools = build_langchain_tools(client, include_mutations=False)
# Or include mutation tools for tagging, descriptions, etc.
tools = build_langchain_tools(client, include_mutations=True)
# Create agent (`model` is a chat model you have configured separately)
agent = create_agent(model, tools=tools, system_prompt="...")
See examples/langchain/ for complete LangChain agent examples including:
- simple_search.py - Minimal example with AWS Bedrock
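When an agent should see only part of the tool set, the list returned by `build_langchain_tools` can be narrowed before it is handed to the agent. The sketch below assumes each tool exposes a `name` attribute, as LangChain tools do. Whether tool names match the function names listed under Available Tools is an assumption to verify against your installed version. `FakeTool` and `select_tools` are hypothetical stand-ins so the example runs without a DataHub instance.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class FakeTool:
    """Stand-in for a LangChain tool; only the `name` attribute matters here."""
    name: str


def select_tools(tools: Iterable, allowed: Set[str]) -> List:
    """Keep tools whose name is in the allow-list, preserving input order."""
    return [tool for tool in tools if getattr(tool, "name", None) in allowed]


all_tools = [FakeTool("search"), FakeTool("get_entities"), FakeTool("add_tags")]
read_only = select_tools(all_tools, {"search", "get_entities", "get_lineage"})
print([tool.name for tool in read_only])
```

An allow-list is used instead of a deny-list so that tools added in a later release stay hidden until they are reviewed.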
Available Tools
Search Tools
- search() - Search across all entity types with filters and sorting
- search_documents() - Search specifically for Document entities
- grep_documents() - Grep for patterns in document content
Entity Tools
- get_entities() - Get detailed information about entities by URN
- list_schema_fields() - List and filter schema fields for datasets
Lineage Tools
- get_lineage() - Get upstream or downstream lineage
- get_lineage_paths_between() - Get detailed paths between two entities
Query Tools
- get_dataset_queries() - Get SQL queries for datasets or columns
Mutation Tools
- add_tags(), remove_tags() - Manage tags
- update_description() - Update entity descriptions
- set_domains(), remove_domains() - Manage domains
- add_owners(), remove_owners() - Manage owners
- add_glossary_terms(), remove_glossary_terms() - Manage glossary terms
- add_structured_properties(), remove_structured_properties() - Manage structured properties
- save_document() - Save or update a Document
User Tools
- get_me() - Get information about the authenticated user
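Without LangChain, the tools can be exposed to a model through a simple name-to-function table: the model emits a tool name plus JSON arguments, and the agent loop looks up and calls the matching function. The sketch below shows only that dispatch step. `fake_search` and `fake_get_me` are hypothetical stubs standing in for the real tools, whose signatures should be checked in the package. The error-dict convention is likewise an assumption.

```python
import json
from typing import Any, Callable, Dict


def fake_search(query: str, num_results: int = 10) -> Dict[str, Any]:
    """Stub standing in for a search tool; returns a dict like the real tools."""
    return {"query": query, "count": 0, "searchResults": []}


def fake_get_me() -> Dict[str, Any]:
    """Stub standing in for a user-info tool."""
    return {"username": "example-user"}


TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "search": fake_search,
    "get_me": fake_get_me,
}


def dispatch(tool_name: str, arguments_json: str = "{}") -> Dict[str, Any]:
    """Run one tool call and return a dict that can be serialized for the LLM."""
    tool = TOOLS.get(tool_name)
    if tool is None:
        return {"error": "unknown tool: %s" % tool_name}
    try:
        kwargs = json.loads(arguments_json)
        return tool(**kwargs)
    except (TypeError, ValueError) as exc:
        # Bad JSON or wrong argument names are reported back to the model.
        return {"error": "%s: %s" % (type(exc).__name__, exc)}


print(dispatch("search", '{"query": "user_data"}'))
print(dispatch("drop_table"))
```

Returning errors as dicts instead of raising gives the model a chance to correct its own tool call on the next turn.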
Architecture
The package is organized into the following modules:
- mcp_tools/ - Core MCP tool implementations
  - base.py - Base GraphQL execution and response cleaning
  - search.py - Search functionality
  - documents.py - Document search and grep
  - entities.py - Entity retrieval
  - lineage.py - Lineage querying
  - queries.py - Query retrieval
  - tags.py, descriptions.py, domains.py, etc. - Mutation tools
  - helpers.py - Shared utility functions
  - gql/ - GraphQL query definitions
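To make "response cleaning" concrete: GraphQL responses often carry `null` fields, empty containers, and `__typename` markers that add tokens without helping an LLM. The sketch below is one plausible cleaning pass, not the implementation in base.py. `clean_response` and the sample payload are hypothetical.

```python
from typing import Any


def clean_response(value: Any) -> Any:
    """Recursively drop None values, empty containers, and __typename keys."""
    if isinstance(value, dict):
        cleaned = {}
        for key, item in value.items():
            if key == "__typename":
                continue
            item = clean_response(item)
            if item is None or item == {} or item == []:
                continue
            cleaned[key] = item
        return cleaned
    if isinstance(value, list):
        items = [clean_response(item) for item in value]
        return [item for item in items if item is not None and item != {} and item != []]
    return value


raw = {
    "__typename": "Dataset",
    "urn": "urn:li:dataset:example",
    "description": None,
    "tags": {"tags": []},
    "owners": [{"__typename": "Owner", "name": "alice"}, None],
}
print(clean_response(raw))
```

Falsy scalars such as `0`, `False`, and the empty string are kept on purpose, since they can carry meaning that `None` does not.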
Development
Setup
# Clone the repository
git clone https://github.com/datahub-project/datahub.git
cd datahub/datahub-agent-context
# Set up development environment
./gradlew :datahub-agent-context:installDev
# Run tests
./gradlew :datahub-agent-context:testQuick
# Run linting
./gradlew :datahub-agent-context:lintFix
Testing
The package includes comprehensive unit tests for all tools:
# Run quick tests
./gradlew :datahub-agent-context:testQuick
# Run full test suite
./gradlew :datahub-agent-context:testFull
Support
File details
Details for the file datahub_agent_context-1.4.0.1rc1.tar.gz.
File metadata
- Download URL: datahub_agent_context-1.4.0.1rc1.tar.gz
- Upload date:
- Size: 89.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6c2d7bd05a0eab30f29ae05547335de8760bf8e4b79fc204b62a5948e1a0302b |
| MD5 | 6db3717603dec3c1f8d09bd4918cd36d |
| BLAKE2b-256 | bf53824b39d3d92f04c493c388ba28659bf68ad8fde97917a98e7b753ccd4b86 |
File details
Details for the file datahub_agent_context-1.4.0.1rc1-py3-none-any.whl.
File metadata
- Download URL: datahub_agent_context-1.4.0.1rc1-py3-none-any.whl
- Upload date:
- Size: 120.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bd29bd4c3c25deb7dbff680ab396d9d13c9b8584b78244f664bfe3ad8fc90b93 |
| MD5 | 1fef1e9ac24a303c492fd741a8b31208 |
| BLAKE2b-256 | e2d5be60d954afa82a5eae531e1027ec7f0ba0e7f6957297bbb430690ddbfa73 |