Composable inference-time data framework for LLM agents — KB / Graph / DB / Skills / Memory / Hooks behind one tool registry

Maintainers

lh20010120

Project description

DataMind

An agentic retrieval assistant that pulls from six distinct knowledge surfaces and picks the right tool itself. Talk to it through a CLI or a browser UI; drag a file in and it'll route it into the right backend automatically.

v0.2 is the current focus. v0.1 (LlamaIndex FunctionAgent in main.py / server.py / modules/) still works for comparison. New code lives under datamind/. For an end-to-end walkthrough see GETTING_STARTED.md or the docs site.

Capabilities

| Capability | Backend | Tools the agent gets |
| --- | --- | --- |
| KB (RAG) | Chroma + BM25 with Reciprocal Rank Fusion | `kb_search`, `kb_list_documents`, `kb_count`, `kb_reindex` |
| Graph | NetworkX, JSON-persisted | `graph_search_entities`, `graph_traverse`, `graph_neighbors`, `graph_upsert_triples` |
| Database | SQLAlchemy (SQLite / MySQL / Postgres) | `db_list_tables`, `db_describe_table`, `db_query_sql`, `db_query_nl` |
| Skills | `.claude/skills/<name>/SKILL.md` + safe Python tools | `skill_search`, `skill_get`, `skill_list`, `calculator`, `unit_convert`, `get_current_time`, `analyze_text` |
| Memory | SQLite with cosine recall + LLM fact extraction; scope-typed (`global` / `profile` / `session`) for multi-tenant isolation | `memory_save`, `memory_recall`, `memory_forget`, `memory_list_profiles` |
| Ingest ✨ | Conversational data import: drop a file in via chat or the browser drag-drop zone | `kb_add_file`, `kb_add_path`, `db_import_csv`, `graph_add_triples_from_text` |
| Hooks ✨ v0.3 | Sandboxed tool dispatch: every call is intercepted; `Allow` / `Deny` / `AskUser` / `Rewrite`; tamper-evident audit log per profile | `PathAllowlistHook`, `DestructiveSqlHook`, `AuditLogHook` (built-in; user hooks pluggable) |

27 tools total. All routed through one ToolRegistry; the agent decides what to call and in what order.
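
The KB row above names Reciprocal Rank Fusion as the way vector hits and BM25 hits are merged. The sketch below shows the general technique only; it is not DataMind's code. The function name, the document ids, and the constant `k=60` (a commonly used default for RRF) are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a DataMind setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a vector index and a BM25 index.
vector_hits = ["doc_b", "doc_a", "doc_c"]
bm25_hits = ["doc_a", "doc_d", "doc_b"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, and the scores never need to be on a comparable scale, which is why RRF is a common choice for mixing dense and lexical retrieval.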

60-second demo

git clone https://github.com/OpenDCAI/DataMind.git && cd DataMind
python -m venv .venv && source .venv/bin/activate
pip install -e .

cp .env.datamind.example .env.datamind
$EDITOR .env.datamind     # set DATAMIND__LLM__API_KEY at minimum

# 1. Smoke-test the gateway (~2 s)
python -m datamind.scripts.hello_sdk

# 2. Seed a realistic enterprise dataset (17 docs / 64 graph nodes / 6 tables / 101 rows)
python -m datamind.scripts.seed_enterprise_demo

# 3. Watch the agent answer 8 cross-backend questions on its own
DATAMIND__DATA__PROFILE=enterprise_demo \
  python -m datamind.scripts.hello_enterprise

# 4. Or just open the browser UI
DATAMIND__DATA__PROFILE=enterprise_demo \
  python -m uvicorn datamind.server:app --port 8000
# → http://127.0.0.1:8000  — drag any .md / .csv / .txt into the dropzone, ask questions, watch tools fire

More detail in GETTING_STARTED.md.

What "agentic" actually means here

Ask: "What do the salaries of the Engineering department's Shanghai employees add up to?"

The agent figures out it needs SQL, tries db_query_nl, gets an empty result, recovers by inspecting the schema (db_list_tables → db_describe_table), discovers the column is Eng not Engineering, rewrites the SQL itself, and answers ¥26,000 — without any of that being hard-coded. Same agent picks graph_search_entities + graph_neighbors for relationship questions, kb_search + skill_get for SOP questions, memory_save for "remember this for me" requests.
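
The recovery path described above can be imitated by hand to make the steps concrete. This is a sketch under loud assumptions: the table name, column names, and rows are invented (chosen so the total matches the figure quoted above), the mismatch is assumed to be in the stored department value, and a hard-coded prefix check stands in for the reasoning the LLM does in the real agent.

```python
import sqlite3

# Invented toy data; not the enterprise_demo dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, city TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('A', 'Eng',   'Shanghai', 14000),
        ('B', 'Eng',   'Shanghai', 12000),
        ('C', 'Eng',   'Beijing',  15000),
        ('D', 'Sales', 'Shanghai',  9000);
""")


def total_salary(dept, city):
    """Return SUM(salary) for a dept/city pair, or None when nothing matches."""
    row = conn.execute(
        "SELECT SUM(salary) FROM employees WHERE dept = ? AND city = ?",
        (dept, city),
    ).fetchone()
    return row[0]


asked_dept = "Engineering"
answer = total_salary(asked_dept, "Shanghai")  # first attempt finds no rows

if answer is None:
    # Step 1: list tables (the role db_list_tables plays in the text above).
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    # Step 2: inspect columns (the role db_describe_table plays).
    columns = [r[1] for r in conn.execute("PRAGMA table_info(employees)")]
    # Step 3: look at the values actually stored, then retry with the match.
    depts = [r[0] for r in conn.execute(
        "SELECT DISTINCT dept FROM employees ORDER BY dept")]
    stored = next(d for d in depts if asked_dept.startswith(d))
    answer = total_salary(stored, "Shanghai")

print(tables, columns, stored, answer)
```

The point of the sketch is the shape of the loop: an empty result is treated as a signal to inspect the schema rather than as a final answer.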

Frontend stays the same regardless. The 27 tools, the streaming SSE protocol, and the chat UI work identically across two interchangeable agent backends:

DATAMIND__AGENT__BACKEND=native   # default — pure-Python anthropic SDK + self-written loop
DATAMIND__AGENT__BACKEND=sdk      # claude-agent-sdk + claude-code-router (CCR)
                                  # unlocks Hooks / Subagents / Compaction / Plan mode

Both backends were verified end-to-end against the same 8 enterprise-demo questions.
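
The Hooks row in the Capabilities table describes tool calls being intercepted and answered with `Allow`, `Deny`, `AskUser`, or `Rewrite`. A minimal sketch of that idea follows. Every name in it (`Decision`, `dispatch`, the two hook functions, the `LIMIT 100` policy) is hypothetical and does not reflect the signatures of DataMind's built-in `DestructiveSqlHook` or `AuditLogHook`; `AskUser` is left out, and the audit log is a plain list rather than a tamper-evident one.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    action: str                      # "allow", "deny", or "rewrite"
    reason: str = ""
    new_args: Optional[dict] = None  # only used when action == "rewrite"


def destructive_sql_hook(tool, args):
    """Deny SQL that contains a data-changing keyword (hypothetical policy)."""
    sql = args.get("sql", "")
    pattern = r"\b(drop|delete|truncate|alter|update|insert)\b"
    if tool == "db_query_sql" and re.search(pattern, sql, re.IGNORECASE):
        return Decision("deny", "destructive SQL")
    return Decision("allow")


def row_limit_hook(tool, args):
    """Rewrite unbounded queries to carry a LIMIT (hypothetical policy)."""
    sql = args.get("sql", "")
    if tool == "db_query_sql" and "limit" not in sql.lower():
        limited = sql.rstrip("; ") + " LIMIT 100"
        return Decision("rewrite", "added LIMIT", {**args, "sql": limited})
    return Decision("allow")


def dispatch(tool, args, tools, hooks, audit):
    """Run every hook in order, then call the tool unless a hook denied it."""
    for hook in hooks:
        decision = hook(tool, args)
        if decision.action == "deny":
            audit.append((tool, "deny", decision.reason))
            return {"error": decision.reason}
        if decision.action == "rewrite":
            args = decision.new_args
    audit.append((tool, "allow", ""))
    return tools[tool](**args)


# A stand-in tool that just echoes the SQL it would run.
tools = {"db_query_sql": lambda sql: "ran: " + sql}
hooks = [destructive_sql_hook, row_limit_hook]
audit = []

print(dispatch("db_query_sql", {"sql": "SELECT * FROM t"}, tools, hooks, audit))
print(dispatch("db_query_sql", {"sql": "DROP TABLE t"}, tools, hooks, audit))
```

Keeping hooks as plain callables that return a decision object means user-supplied hooks can be appended to the list without touching the dispatcher.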

Add data by talking

The 4 ingest tools turn the agent into a read-and-write surface:

you  → "Import /Users/foo/sales-q2.csv as a database table named q2_sales"
agent → calls db_import_csv(path=..., table='q2_sales')   ✓ 18 rows inserted
you  → "What is the total value of the in-pipeline deals in the Q2 sales pipeline? Which sales rep has the most deals?"
agent → calls db_query_sql(...)                            ✓ answers from the freshly-imported table

Or drop the file into the browser dropzone and click 导入 (Import). Or say "Add this to the graph: Chen Cheng was promoted to Tech Lead and reports to Ann" → the agent calls graph_add_triples_from_text, the LLM extracts triples, and the graph upserts them. No restart, no reindex.
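
To give a feel for what a CSV-import tool has to do, here is a small sketch using only the standard library. It is not the implementation of `db_import_csv`: the function name `import_csv`, the all-`TEXT` column policy, and the sample rows are assumptions, and the real tool goes through SQLAlchemy rather than `sqlite3` directly.

```python
import csv
import io
import re
import sqlite3

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def import_csv(conn, csv_file, table):
    """Create `table` from a CSV header and insert every data row as TEXT."""
    if not IDENT.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    reader = csv.reader(csv_file)
    header = next(reader)
    # Normalise header cells into safe column identifiers.
    columns = [re.sub(r"\W+", "_", h.strip()).lower() for h in header]
    if not all(IDENT.fullmatch(c) for c in columns):
        raise ValueError(f"unsafe column names: {columns!r}")
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    rows = [row for row in reader if row]  # skip blank lines
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Invented sample data standing in for an uploaded file.
SAMPLE = (
    "rep,stage,amount\n"
    "Ann,in-pipeline,1200\n"
    "Bo,closed,800\n"
    "Ann,in-pipeline,2500\n"
)

conn = sqlite3.connect(":memory:")
inserted = import_csv(conn, io.StringIO(SAMPLE), "q2_sales")
total = conn.execute(
    "SELECT SUM(CAST(amount AS INTEGER)) FROM q2_sales "
    "WHERE stage = 'in-pipeline'"
).fetchone()[0]
print(inserted, total)
```

Because the table exists as soon as the import returns, a follow-up query can run in the same session, which matches the "no restart, no reindex" behaviour described above.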

Why v0.2

v0.1 was functional but coupled: a global AppState, hard-wired modules, vendor-locked to the claude CLI. v0.2 reshapes it around:

Protocols + registries — every capability is a Protocol; concrete classes register under a short name. New DB dialect / embedding provider / retriever strategy = one file.
Pluggable agent loop — native (anthropic SDK) or sdk (claude-agent-sdk + CCR), one ENV switch.
Real SSE streaming through FastAPI — not v0.1's fake character-sliced streaming.
Zero global state — every request owns its own RequestContext with a trace id.
Side-by-side with v0.1 — old code paths untouched, easy comparison.

See Architecture for full detail.
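
For the SSE bullet in the list above, the wire format itself is small enough to sketch. The event names `token` and `done` and the JSON payload shape are assumptions, not DataMind's protocol. In a FastAPI app, a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.

```python
import json


def sse_event(event, data):
    """Format one Server-Sent Events frame: event line, data line, blank line."""
    payload = json.dumps(data, ensure_ascii=False)
    return f"event: {event}\ndata: {payload}\n\n"


def stream_answer(tokens):
    """Yield one frame per token as it arrives, then a final 'done' frame."""
    for token in tokens:
        yield sse_event("token", {"text": token})
    yield sse_event("done", {})


frames = list(stream_answer(["Hel", "lo"]))
print("".join(frames), end="")
```

Each frame is sent as soon as the upstream model produces a token, which is the difference from slicing an already-complete answer into characters.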

Repo layout

DataMind/
├── datamind/                     # ── v0.2 (new code) ──────────────────
│   ├── agent/                    # base.py + loop_native.py + loop_sdk.py
│   ├── capabilities/             # kb / graph / db / skills / memory /
│   │                             #   ingest / embedding
│   ├── core/                     # Protocol, Registry, Config, Logging, Tools
│   ├── scripts/                  # hello_*.py + seed_enterprise_demo.py
│   ├── cli.py                    # `python -m datamind ...`
│   ├── server.py                 # FastAPI + real SSE + /api/upload
│   └── tests/                    # 95 passing tests (no network required)
│
├── .claude/skills/               # SDK-style knowledge skills (SKILL.md)
├── static/app.html               # browser UI (drag-drop + tool cards + sidebar)
├── scripts/start_ccr.sh          # one-line CCR launcher (for sdk backend)
├── demo-uploads/                 # 6 sample files to drag-drop into the UI
│
├── modules/ core/ main.py server.py benchmark/   # ── v0.1 legacy ─
│
├── data/profiles/<profile>/      # per-profile raw inputs
├── storage/<profile>/            # per-profile indexes & DBs
├── pyproject.toml                # v0.2 install + CLI entry
└── .env.datamind.example         # nested env template

Profiles

One environment variable switches data + storage directories in lockstep:

DATAMIND__DATA__PROFILE=customer_a python -m datamind chat

Maps to data/profiles/customer_a/ and storage/customer_a/.
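
The double-underscore variable names suggest a nested configuration, and the profile maps onto two directories. The sketch below illustrates both ideas; the function names and the `"default"` fallback profile are assumptions, and DataMind's own config loader may behave differently.

```python
from pathlib import Path


def parse_nested_env(environ, prefix="DATAMIND__"):
    """Turn DATAMIND__A__B=value pairs into {'a': {'b': value}}."""
    config = {}
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue
        parts = key[len(prefix):].lower().split("__")
        node = config
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return config


def profile_dirs(config, root=Path(".")):
    """Return (data_dir, storage_dir) for the configured profile."""
    # "default" is an assumed fallback, not a documented DataMind value.
    profile = config.get("data", {}).get("profile", "default")
    return root / "data" / "profiles" / profile, root / "storage" / profile


# A hand-built environment; pass os.environ to read the real one.
env = {
    "DATAMIND__DATA__PROFILE": "customer_a",
    "DATAMIND__LLM__API_KEY": "sk-placeholder",
    "HOME": "/home/user",
}
config = parse_nested_env(env)
data_dir, storage_dir = profile_dirs(config)
print(data_dir, storage_dir)
```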

Tests

pytest datamind/tests/
# 95 passed in ~0.6s — no network required

Plus live smoke + benchmark scripts: hello_sdk, hello_kb, hello_db, hello_graph, hello_skills, hello_memory, hello_agent, seed_enterprise_demo, hello_enterprise (8 cross-backend questions).

Full documentation

See DataMind-Doc for architecture, configuration reference, per-capability deep dives, and tutorials in English and Chinese.

Project details

This version

0.3.0

May 26, 2026

Download files

Source Distribution

datamind-0.3.0.tar.gz (150.1 kB view details)

Uploaded May 26, 2026 Source

Built Distribution

datamind-0.3.0-py3-none-any.whl (181.1 kB view details)

Uploaded May 26, 2026 Python 3

File details

Details for the file datamind-0.3.0.tar.gz.

File metadata

Download URL: datamind-0.3.0.tar.gz
Upload date: May 26, 2026
Size: 150.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datamind-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`010b9a5456bf12ac26edddbd3dffe5118733acddf11de1bf20e70833cbc36c4d`
MD5	`f17523ae78439fa8e885f591ee18ede3`
BLAKE2b-256	`570808643e0f8ca44ad52be7ee1840a5ec4e012046a0ced8ff26894fa5c7bdab`

See more details on using hashes here.
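
A downloaded archive can be checked against the SHA256 digest listed above with a few lines of standard-library Python. This is a generic sketch; the helper names are illustrative, and the file path in the usage note assumes the archive sits in the current directory.

```python
import hashlib

# SHA256 digest listed above for datamind-0.3.0.tar.gz.
EXPECTED_SHA256 = (
    "010b9a5456bf12ac26edddbd3dffe5118733acddf11de1bf20e70833cbc36c4d"
)


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.strip().lower()
```

Usage would look like `verify("datamind-0.3.0.tar.gz", EXPECTED_SHA256)`, which should return `True` for an intact download of the source distribution.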

Provenance

The following attestation bundles were made for datamind-0.3.0.tar.gz:

Publisher: python-publish.yml on OpenDCAI/DataMind

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datamind-0.3.0.tar.gz
- Subject digest: 010b9a5456bf12ac26edddbd3dffe5118733acddf11de1bf20e70833cbc36c4d
- Sigstore transparency entry: 1633419985
- Sigstore integration time: May 26, 2026
Source repository:
- Permalink: OpenDCAI/DataMind@82b59f498d0fd77ae62ead729bfac4bf9a4c32f0
- Branch / Tag: refs/tags/0.3.0
- Owner: https://github.com/OpenDCAI
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@82b59f498d0fd77ae62ead729bfac4bf9a4c32f0
- Trigger Event: release

File details

Details for the file datamind-0.3.0-py3-none-any.whl.

File metadata

Download URL: datamind-0.3.0-py3-none-any.whl
Upload date: May 26, 2026
Size: 181.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datamind-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c4059dffc670639eaabef7eafd8a4960202f261775bfbfaad0baf8e622b77cfc`
MD5	`28e0c27beba46a7e73d8a45fc87eab07`
BLAKE2b-256	`7523a2fddd2d6d43f76bd0543b6f30fa5a43c52c584f46395cec3cc6d523d41c`

See more details on using hashes here.

Provenance

The following attestation bundles were made for datamind-0.3.0-py3-none-any.whl:

Publisher: python-publish.yml on OpenDCAI/DataMind

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datamind-0.3.0-py3-none-any.whl
- Subject digest: c4059dffc670639eaabef7eafd8a4960202f261775bfbfaad0baf8e622b77cfc
- Sigstore transparency entry: 1633419993
- Sigstore integration time: May 26, 2026
Source repository:
- Permalink: OpenDCAI/DataMind@82b59f498d0fd77ae62ead729bfac4bf9a4c32f0
- Branch / Tag: refs/tags/0.3.0
- Owner: https://github.com/OpenDCAI
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@82b59f498d0fd77ae62ead729bfac4bf9a4c32f0
- Trigger Event: release
