DataAgent - A powerful multi-modal Data Agent workflow template framework

🚀 DataAgent

Data + AI Agent: Enterprise Data Task Solution

🚀 DataAgent is an enterprise data intelligence platform for Data + AI scenarios that rebuilds the data engineering pipeline around the Agent paradigm. It combines NL2SQL, a unified semantic layer, and multi-agent collaboration to deliver end-to-end data analysis and feature mining in financial risk control, AI for Science, and other core domains.

🌟 Why DataAgent

🏆 Scenario Advantages

Scenario	Traditional Approach	The DataAgent Edge	Typical Applications
📊 Financial Q&A	Business request → data team queue → manual SQL → manual verification; T+1 is the norm for a single metric query	NL2SQL four-stage pipeline (Perception→Generation→Validation→Reflection), natural language to instant answers. Semantic metric mapping, 74%+ execution accuracy on BIRD DEV benchmark, sub-second response	✅ Enterprise financial analytics assistant
🔬 AI for Science	Multi-source scientific data scattered everywhere; cross-database correlation requires manual exports; literature and data cannot be jointly queried	Multi-source federated queries + structured/unstructured joint retrieval, natural-language-driven scientific data exploration	✅ Scientific data exploration platform

⚡ Core Capabilities

Capability	Description
🧠 NL2SQL Intelligent Engine	Four-stage pipeline: Perceptor→Generator→Validator→Reflector; multi-strategy fusion: Prompt / ICL / Skeleton / DC; supports SQLite / MySQL / PostgreSQL / Hive; 74%+ execution accuracy on BIRD benchmark
🔬 Automated Feature Engineering	Agents autonomously explore relationships across hundreds of tables, auto-discover latent feature combinations with importance ranking and visualization — 10x+ efficiency boost
🏭 Full-Pipeline Data Factory	Data ingestion→Schema perception→Feature mining→Model training→Report generation — one YAML config runs the complete data engineering pipeline
🧩 Unified Semantic Layer	Prioritizes GaussVector as an enhanced vector retrieval foundation in the semantic layer, turning tables, columns, metric definitions, and business descriptions into retrievable schema signals for NL2SQL and multi-source semantic alignment
🔌 Plugin Tool Ecosystem	Local functions / MCP (stdio+sse) / A2A — three tool types with unified registration and invocation. Auto-discovery and on-demand loading. Built-in data analysis SKILLs
📡 Native Multi-Agent Collaboration	Full A2A 1.0 protocol support: automatic agent discovery, capability mapping, standardized communication. Naturally supports distributed collaboration for complex business tasks
🧩 YAML as Agent	Model, tools, memory, workflow, scenario prompts — all declaratively orchestrated. From idea to running Agent in minutes
🛡️ Enterprise Security Sandbox	Workspace isolation + path whitelisting + full audit trail, meeting financial-grade compliance requirements
⚡ Out of the Box	20+ industry scenario example configs — zero code to start, up and running in minutes
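
The four-stage NL2SQL loop named in the table (perceive the schema, generate SQL, validate it, reflect on errors) can be illustrated with a small standalone sketch. This is not DataAgent's implementation: the function names, the retry limit, and the `stub_generate` function standing in for a model call are all hypothetical, and only the standard-library `sqlite3` module is used.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical generator signature: (question, schema, feedback) -> SQL text
Generator = Callable[[str, str, Optional[str]], str]


def perceive_schema(conn: sqlite3.Connection) -> str:
    """Perception: collect CREATE TABLE statements as schema context."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def validate_sql(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Validation: let SQLite plan the query; return the error text, or None if it compiles."""
    try:
        conn.execute(f"EXPLAIN QUERY PLAN {sql}")
        return None
    except sqlite3.Error as exc:
        return str(exc)


def nl2sql(conn: sqlite3.Connection, question: str,
           generate: Generator, max_rounds: int = 3) -> str:
    """Run generate -> validate, feeding errors back (reflection) until the SQL compiles."""
    schema = perceive_schema(conn)
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        sql = generate(question, schema, feedback)  # Generation
        feedback = validate_sql(conn, sql)          # Validation
        if feedback is None:
            return sql
        # Reflection: the next round sees the validator's error message
    raise ValueError(f"no valid SQL after {max_rounds} rounds: {feedback}")


def stub_generate(question: str, schema: str, feedback: Optional[str]) -> str:
    """Stand-in for a model call: the first draft has a typo, the retry fixes it."""
    if feedback is None:
        return "SELECT SUM(amout) FROM sales"
    return "SELECT SUM(amount) FROM sales"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("mon", 10.0), ("tue", 5.5)])

sql = nl2sql(conn, "What are total sales?", stub_generate)
total = conn.execute(sql).fetchone()[0]
print(sql, total)
```

In this sketch validation only checks that the statement compiles against the schema; a fuller validator would presumably also check the result against the question.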

📋 Environment Requirements

Dependency	Version
🐍 Python	>= 3.11
📦 Package Manager	uv (recommended) or pip

📚 Documentation

Full documentation lives under docs/ and is available in Chinese and English. Build and preview it locally:

uv sync --extra mkdoc
uv run mkdocs serve -f docs/mkdocs.yml

Document	Description
📖 Installation	Install with `uv` / pip, environment variables, and verification
📖 Quick Start	Run an end-to-end DataAgent workflow in minutes
🗄️ Database Installation	Deploy Elasticsearch, PostgreSQL, and MySQL; prioritize GaussVector integration, import scenario data, and connect the Semantic Service
⚙️ Features	Core capabilities, modules, tools, and model support
🧩 Semantic Service	MetaVisor-enriched metadata for NL2SQL, prioritizing GaussVector-oriented semantic-layer indexing, candidate schema recall, and schema perception enhancement
🔗 openJiuwen	openJiuwen integration and usage guide
🏗️ Architecture	System architecture; context, planning engine, and action modules
📡 API Design	A2A northbound interface and Python SDK
📋 Application Cases	Build a dedicated NL2SQL Agent; build a data analysis Agent
📝 Notes	Development, testing, and documentation maintenance
🗓️ Milestone	Release planning and roadmap

🚴 Installation

1️⃣ Clone the project

git clone https://gitcode.com/datagallery/DataAgent.git
cd DataAgent

2️⃣ Install dependencies (uv recommended)

# Install dependencies
uv sync

# Activate virtual environment
source .venv/bin/activate  # Linux / macOS
.venv\Scripts\activate     # Windows

3️⃣ Or use pip

pip install -e .

4️⃣ Configure environment variables

# Copy environment template
cp .env.example .env

# Edit .env file with your actual configuration values

⚡ Quick Start

🎮 Interactive quick start

uv run -m dataagent quickstart

Follow the prompts to enter the model configuration, then start chatting with the Agent.

📁 Start with a config file

# Terminal interactive mode
uv run -m dataagent --config dataagent/core/flex/examples/quickstart.yaml

🔍 Config check

# Check environment variable references in config
uv run -m dataagent config check dataagent/core/flex/examples/quickstart.yaml
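
To show what a check of environment variable references involves, here is a minimal sketch that scans config text for `$env{NAME}` placeholders (the syntax used in the YAML example in this README) and reports the ones that are unset. It is not the code behind `config check`; the regular expression and both function names are assumptions made for illustration.

```python
import os
import re
from typing import Mapping, Optional

# Assumed placeholder syntax, taken from the README's YAML example: $env{NAME}
ENV_REF = re.compile(r"\$env\{([A-Za-z_][A-Za-z0-9_]*)\}")


def find_env_refs(config_text: str) -> list[str]:
    """Return referenced variable names in order of first appearance, without duplicates."""
    seen: list[str] = []
    for name in ENV_REF.findall(config_text):
        if name not in seen:
            seen.append(name)
    return seen


def missing_env_refs(config_text: str,
                     environ: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return referenced names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in find_env_refs(config_text) if not env.get(name)]


sample = '''
base_url: "$env{DEEPSEEK_BASE_URL}"
api_key: "$env{DEEPSEEK_API_KEY}"
'''

# Pass an explicit mapping here so the result does not depend on the real environment
missing = missing_env_refs(sample, {"DEEPSEEK_BASE_URL": "https://api.deepseek.com"})
print(missing)
```

Calling `missing_env_refs(text)` without the second argument checks against the current process environment instead.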

📖 Usage

🐍 Python SDK

import asyncio

from dataagent import DataAgent


async def main() -> None:
    agent = DataAgent.from_config("path/to/config.yaml")

    # Single-turn conversation
    response = await agent.chat("Analyze sales data trends for the past week")
    print(response)

    # Streaming conversation
    async for chunk in agent.astream(input={"user_query": "Generate user report"}):
        print(chunk, end="", flush=True)


# `await` and `async for` only work inside a coroutine, so run everything via asyncio
asyncio.run(main())

📝 YAML Config Example

AGENT_CONFIG:
  name: "My Data Agent"
  version: "1.0"
  description: "Data Analysis Agent"
  backend: "langgraph"
  type: "react"

MODEL:
  chat_model:
    provider: "deepseek"
    model_type: "chat"
    params:
      model: "deepseek-chat"
      temperature: 0.7
      base_url: "$env{DEEPSEEK_BASE_URL}"
      api_key: "$env{DEEPSEEK_API_KEY}"

WORKSPACE:
  path: "/tmp/dataagent_workspace"
  allow_path:
    - "/tmp/dataagent_workspace"
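
The `allow_path` list above suggests a path allowlist. As a rough sketch of how such a check can work, the snippet below resolves a candidate path and accepts it only if it falls under one of the allowed roots. The function name is hypothetical and DataAgent's actual sandbox logic may differ.

```python
from pathlib import Path


def is_path_allowed(candidate: str, allow_paths: list[str]) -> bool:
    """Return True if the resolved candidate lies inside any allowed root."""
    # resolve() normalizes ".." segments and follows symlinks before comparing
    resolved = Path(candidate).resolve()
    return any(resolved.is_relative_to(Path(root).resolve()) for root in allow_paths)


allowed = ["/tmp/dataagent_workspace"]

inside = is_path_allowed("/tmp/dataagent_workspace/reports/out.csv", allowed)
escaped = is_path_allowed("/tmp/dataagent_workspace/../secrets.txt", allowed)
print(inside, escaped)
```

Resolving before comparing matters here: a plain string-prefix test would accept the second path even though it points outside the workspace.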

🌐 A2A 1.0 Server Mode

# Start A2A server
uv run -m dataagent serve-a2a \
  --config path/to/config.yaml \
  --host 0.0.0.0 \
  --port 9999 \
  --auth-token your_token

# Service endpoints
# ├── 🌟 AgentCard: http://localhost:9999/.well-known/agent.json
# ├── 📡 JSON-RPC:  http://localhost:9999/a2a/jsonrpc
# └── 🔌 REST:      http://localhost:9999/a2a/rest
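
Once the server is running, a client can start by fetching the AgentCard. The sketch below uses only the standard library and the AgentCard URL listed above. Sending the token as an `Authorization: Bearer` header is an assumption, since this README does not say how `--auth-token` is checked, and both function names are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def build_agent_card_request(base_url: str,
                             token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for the AgentCard endpoint listed in the README."""
    url = base_url.rstrip("/") + "/.well-known/agent.json"
    headers = {"Accept": "application/json"}
    if token:
        # Assumption: the server expects the token as a bearer credential
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_agent_card(base_url: str, token: Optional[str] = None,
                     timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body (requires a running server)."""
    request = build_agent_card_request(base_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request needs no network access
request = build_agent_card_request("http://localhost:9999/", "your_token")
print(request.get_method(), request.full_url)
```

With the server from the command above running, `fetch_agent_card("http://localhost:9999", "your_token")` would return the card as a dictionary.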

⚙️ Configuration

🔐 Environment Variables

Variable	Description	Example
`DEEPSEEK_API_KEY`	DeepSeek API Key	`sk-xxx`
`DEEPSEEK_BASE_URL`	DeepSeek API Base URL	`https://api.deepseek.com`
`BAILIAN_API_KEY`	Alibaba Cloud Bailian API Key	`sk-xxx`
`OPENAI_API_KEY`	OpenAI API Key	`sk-xxx`

📌 For more configuration options, see .env.example

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

dg-dataagent 0.1.0, released Jun 20, 2026


