
DataHub Agent Context - MCP Tools for AI Agents


DataHub Agent Context

DataHub Agent Context provides tools and utilities for building AI agents that interact with DataHub metadata. The package contains MCP (Model Context Protocol) tools that let AI agents search, retrieve, and manipulate metadata in DataHub. The tools can be used directly to create an agent or included in an MCP server such as DataHub's open source MCP server.

Features

Installation

Base Installation

python3 -m pip install --upgrade pip wheel setuptools
python3 -m pip install --upgrade datahub-agent-context

With LangChain Support

For building LangChain agents with pre-built tools:

python3 -m pip install --upgrade "datahub-agent-context[langchain]"

Prerequisites

This package requires:

  • Python 3.9 or higher
  • acryl-datahub package
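
A quick preflight check can confirm both prerequisites before wiring up an agent. The sketch below is illustrative only: `check_prerequisites` is a hypothetical helper, not part of this package, and it assumes that acryl-datahub is importable as the `datahub` module, as the imports in the Quick Start suggest.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in the prerequisites above


def check_prerequisites(version_info=None, module_name="datahub"):
    """Return a list of human-readable problems; an empty list means all good.

    Hypothetical helper for illustration, not part of datahub-agent-context.
    """
    version = tuple((version_info or sys.version_info)[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )
    # find_spec returns None when the top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        problems.append(f"module '{module_name}' not found; install acryl-datahub")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Returning a list of problems rather than raising on the first one lets a setup script report everything that is missing in a single pass.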

Quick Start

Basic Example

These tools are designed for use by an AI agent, with responses passed directly to an LLM, so each tool returns a simple dict. They can also be called independently.

from datahub.sdk.main_client import DataHubClient
from datahub_agent_context.mcp_tools.search import search
from datahub_agent_context.mcp_tools.entities import get_entities

# Initialize DataHub client
client = DataHubClient.from_env()

# Search for datasets
with client as client:
    results = search(
        query="user_data",
        filters={"entity_type": ["dataset"]},
        num_results=10
    )

# Get detailed entity information
with client as client:
    entities = get_entities(
        urns=[result["entity"]["urn"] for result in results["searchResults"]]
    )
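
Because the tools return plain dicts, the glue between a search call and a follow-up get_entities call is ordinary dict handling. The sketch below isolates that step so it can be run without a DataHub connection. The `sample_results` payload is made up for illustration and mirrors only the keys used in the example above (`searchResults`, `entity`, `urn`); real responses will carry more fields. `extract_urns` is a hypothetical helper, not part of the package.

```python
import json


def extract_urns(results):
    """Collect entity URNs from a search response, skipping malformed hits.

    Hypothetical helper; assumes the response shape used in the example above:
    {"searchResults": [{"entity": {"urn": "..."}}, ...]}.
    """
    urns = []
    for hit in results.get("searchResults", []):
        urn = (hit.get("entity") or {}).get("urn")
        if urn and urn not in urns:  # drop missing and duplicate URNs, keep order
            urns.append(urn)
    return urns


# Made-up payload for illustration only.
sample_results = {
    "searchResults": [
        {"entity": {"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,user_data,PROD)"}},
        {"entity": {}},
        {"entity": {"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,user_data,PROD)"}},
    ]
}

urns = extract_urns(sample_results)
# A plain dict serializes directly when the result is handed to an LLM as text.
llm_payload = json.dumps({"urns": urns}, indent=2)
print(llm_payload)
```

Guarding against missing keys matters here because an empty or partial search response would otherwise raise a KeyError before get_entities is ever called.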

LangChain Integration

Build AI agents with pre-built LangChain tools:

from datahub.sdk.main_client import DataHubClient
from datahub_agent_context.langchain_tools import build_langchain_tools
from langchain.agents import create_agent

# Initialize DataHub client
client = DataHubClient.from_env()

# Build all tools (read-only by default)
tools = build_langchain_tools(client, include_mutations=False)

# Or include mutation tools for tagging, descriptions, etc.
tools = build_langchain_tools(client, include_mutations=True)

# Create agent (`model` is a chat model you have configured separately)
agent = create_agent(model, tools=tools, system_prompt="...")
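
When an agent should only see a subset of the tools, the list can be filtered before it is passed to the agent. The sketch below assumes each tool object exposes a `name` attribute, as LangChain tools do. `FakeTool` and `select_tools` are hypothetical stand-ins so the snippet runs without LangChain or a DataHub connection, and the tool names are borrowed from the Available Tools list in this document; the names the built tools actually carry may differ.

```python
from dataclasses import dataclass


@dataclass
class FakeTool:
    """Stand-in for a LangChain tool; only the `name` attribute matters here."""

    name: str


def select_tools(tools, allowed_names):
    """Keep only tools whose name is in allowed_names, preserving order."""
    allowed = set(allowed_names)
    return [tool for tool in tools if tool.name in allowed]


all_tools = [FakeTool("search"), FakeTool("get_entities"), FakeTool("add_tags")]
read_only = select_tools(all_tools, {"search", "get_entities"})
print([tool.name for tool in read_only])
```

An explicit allowlist is a narrower control than the include_mutations flag: it lets you expose, say, tagging but not ownership changes.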

DataHub Cloud Tools

If you're connected to a DataHub Cloud instance, you can add Cloud-only tools like Ask DataHub (AI-powered data assistant):

from datahub.sdk.main_client import DataHubClient
from datahub_agent_context.langchain_tools import build_langchain_tools, build_langchain_cloud_tools

client = DataHubClient.from_env()

# Base tools (works on any DataHub instance)
tools = build_langchain_tools(client, include_mutations=True)

# Add Cloud-only tools (requires DataHub Cloud)
tools += build_langchain_cloud_tools(client, ask_datahub=True)

The same pattern works for Google ADK:

from datahub_agent_context.google_adk_tools import build_google_adk_tools, build_google_adk_cloud_tools

tools = build_google_adk_tools(client, include_mutations=True)
tools += build_google_adk_cloud_tools(client, ask_datahub=True)

See examples/langchain/ for complete LangChain agent examples.

Available Tools

Search Tools

  • search() - Search across all entity types with filters and sorting
  • search_documents() - Search specifically for Document entities
  • grep_documents() - Grep for patterns in document content

Entity Tools

  • get_entities() - Get detailed information about entities by URN
  • list_schema_fields() - List and filter schema fields for datasets

Lineage Tools

  • get_lineage() - Get upstream or downstream lineage
  • get_lineage_paths_between() - Get detailed paths between two entities

Query Tools

  • get_dataset_queries() - Get SQL queries for datasets or columns

Mutation Tools

  • add_tags(), remove_tags() - Manage tags
  • update_description() - Update entity descriptions
  • set_domains(), remove_domains() - Manage domains
  • add_owners(), remove_owners() - Manage owners
  • add_glossary_terms(), remove_glossary_terms() - Manage glossary terms
  • add_structured_properties(), remove_structured_properties() - Manage structured properties
  • save_document() - Save or update a Document

User Tools

  • get_me() - Get information about the authenticated user

Cloud-Only Tools (DataHub Cloud)

  • ask_datahub_chat() - Ask the DataHub AI assistant a question about your data catalog
  • get_datahub_chat() - Retrieve messages and status from an Ask DataHub conversation

Architecture

The package is organized into the following modules:

  • mcp_tools/ - Core MCP tool implementations
    • base.py - Base GraphQL execution and response cleaning
    • search.py - Search functionality
    • documents.py - Document search and grep
    • entities.py - Entity retrieval
    • lineage.py - Lineage querying
    • queries.py - Query retrieval
    • tags.py, descriptions.py, domains.py, etc. - Mutation tools
    • helpers.py - Shared utility functions
    • gql/ - GraphQL query definitions
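
To make "response cleaning" concrete, here is one plausible shape for such a step: recursively dropping GraphQL bookkeeping keys and empty values so that less noise reaches the LLM. This is a sketch under assumptions, not the code in base.py. The function name `clean_response`, the choice of `__typename` as the dropped key, and the rule for what counts as empty are all illustrative.

```python
from typing import Any

DROPPED_KEYS = {"__typename"}  # assumption: GraphQL type markers are noise for an LLM


def clean_response(value: Any) -> Any:
    """Recursively strip dropped keys and empty values (None, [], {}) from dicts.

    List items are cleaned in place but never removed from their list.
    """
    if isinstance(value, dict):
        cleaned = {}
        for key, item in value.items():
            if key in DROPPED_KEYS:
                continue
            item = clean_response(item)
            if item is None or item == [] or item == {}:
                continue
            cleaned[key] = item
        return cleaned
    if isinstance(value, list):
        return [clean_response(item) for item in value]
    return value


# Made-up payload for illustration only.
raw = {
    "__typename": "Dataset",
    "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,user_data,PROD)",
    "description": None,
    "tags": {"__typename": "GlobalTags", "tags": []},
    "owners": [{"__typename": "Owner", "name": "alice"}],
}
print(clean_response(raw))
```

Falsy but meaningful values such as 0 and False are kept on purpose, since only None, empty lists, and empty dicts are treated as empty.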

Development

Setup

# Clone the repository
git clone https://github.com/datahub-project/datahub.git
cd datahub

# Set up development environment (run Gradle from the repository root)
./gradlew :datahub-agent-context:installDev

# Run tests
./gradlew :datahub-agent-context:testFull

# Run linting
./gradlew :datahub-agent-context:lintFix

Testing

The package includes comprehensive unit tests for all tools:

# Run full test suite
./gradlew :datahub-agent-context:testFull

Support


