# arangodb-schema-analyzer

Agentic schema analyzer for ArangoDB: a conceptual model plus a conceptual-to-physical mapping for transpilers.

Standalone Python library that analyzes an ArangoDB database's physical schema and produces:

- a conceptual schema (entities, relationships, properties)
- a conceptual→physical mapping suitable for transpilers (Cypher, SPARQL, future targets)
- metadata (confidence, timestamp, analyzed collection counts, detected patterns, per-entity tenant scope, deployment-style sharding profile)

Current release: see CHANGELOG.md. The version in `pyproject.toml` is the single source of truth.
## Install

From source (this repo):

```bash
python -m pip install -e .
```

Optional LLM provider extras:

```bash
python -m pip install -e ".[openai]"
python -m pip install -e ".[anthropic]"
python -m pip install -e ".[openrouter]"
```

The OpenRouter provider requires no extra SDK (it uses the stdlib `urllib`); the `[openrouter]` extra exists only as a documentation marker and installs nothing.

MCP (Model Context Protocol): optional stdio server wrapping the v1 JSON tool contract:

```bash
python -m pip install -e ".[mcp]"
arangodb-schema-analyzer-mcp
```

Development extras (pytest, ruff, mypy, etc.):

```bash
python -m pip install -e ".[dev]"
```

If you don't install a provider SDK (or don't provide an API key), analysis degrades gracefully to deterministic baseline inference.
## Usage

```python
from arango import ArangoClient
from schema_analyzer import AgenticSchemaAnalyzer

client = ArangoClient(hosts="http://localhost:8529")
db = client.db("mydb", username="root", password="openSesame")

analyzer = AgenticSchemaAnalyzer(
    llm_provider="openai",  # or "anthropic" or "openrouter"
    api_key=None,  # e.g. os.environ["OPENAI_API_KEY"]
    model="gpt-4o-mini",
    cache={"type": "filesystem", "directory": ".schema-analyzer-cache"},
)

analysis = analyzer.analyze_physical_schema(
    db,
    timeout_ms=60_000,
    sample_limit_per_collection=5,
)
print(analysis.metadata.confidence)
```
## Tool usage (CLI)

This project can be called as a non-interactive tool (stdin JSON → stdout JSON) using the v1 contract under `docs/tool-contract/v1/`.

Install (editable):

```bash
python -m pip install -e .
```

Example (analyze), using the provided request example:

```bash
cat docs/tool-contract/v1/examples/request.analyze.json | arangodb-schema-analyzer --pretty
```

### CLI options

```
arangodb-schema-analyzer [--request FILE] [--out FILE] [--pretty] [-v|--verbose]
```

- `--request FILE` — path to the request JSON (default: read from stdin)
- `--out FILE` — write the response JSON to a file (default: stdout)
- `--pretty` — pretty-print JSON output
- `-v` / `--verbose` — enable verbose logging
## Evaluation CLI

Run analysis-quality benchmarks against the bundled domain packs:

```bash
arangodb-schema-analyzer eval \
  --provider openai \
  --model gpt-4o-mini \
  --report eval_report.json
```

Pass `--baseline <prior-report.json>` to diff a new run against an earlier report (the baseline file is whatever a previous `--report` run produced; no baseline ships in the repo).

Options: `--url`, `--user`, `--password`, `--database`, `--domains`, `--sample-limit`, `--timeout-ms`, `--scale`, `--no-cleanup`.

Domains included: healthcare, financial_fraud_detection, insurance, intelligence, network_asset_management.
## Public API

Exports (see `schema_analyzer/__init__.py`):

- `AgenticSchemaAnalyzer` — main analyzer class
- `ConceptualSchema` — conceptual schema dataclass
- `PhysicalMapping` — physical mapping dataclass with AQL helpers
- `generate_schema_docs(analysis)` — Markdown documentation generator
- `export_mapping(analysis, target)` — transpiler export (currently only `cypher`)
- `export_conceptual_model_as_owl_turtle(analysis)` — OWL Turtle export
- `register_provider(name, ...)` — register custom LLM providers
- `list_providers()` — list registered LLM provider names
- `run_tool(request_dict)` — programmatic entrypoint to the v1 tool contract
- `fingerprint_physical_schema(snapshot)` — full-snapshot SHA-256 (cache key)
- `fingerprint_physical_shape(db, *, exclude_collections=None)` — cheap probe that hashes only the collection set plus per-collection type and index digests
- `fingerprint_physical_counts(db, *, exclude_collections=None)` — shape fingerprint combined with per-collection `count()`
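The shape fingerprint supports cheap change detection: hash only structural facts and re-run the full analysis when the hash changes. A minimal illustrative sketch of that idea (not the library's actual implementation — the real probes take a live `db` handle):

```python
import hashlib
import json

def shape_fingerprint(collections: dict) -> str:
    """Hash the collection set plus each collection's type and index digests.

    `collections` maps name -> {"type": ..., "indexes": [...]}; this mirrors
    the idea behind fingerprint_physical_shape, not its code.
    """
    canonical = {
        name: {"type": info["type"], "indexes": sorted(info["indexes"])}
        for name, info in sorted(collections.items())
    }
    blob = json.dumps(canonical, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

before = shape_fingerprint({"movies": {"type": "document", "indexes": ["primary"]}})
after = shape_fingerprint({"movies": {"type": "document", "indexes": ["primary", "hash:title"]}})
print(before != after)  # True: adding an index changes the fingerprint
```

Because document contents never enter the hash, inserts and updates leave the shape fingerprint unchanged; `fingerprint_physical_counts` exists for when data churn should also trigger re-analysis.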
## Recent additions

See CHANGELOG.md for the full history. Highlights since 0.3.0:

- **0.6.0** — Shard-family detection (`physicalMapping.shardFamilies`) groups conceptual entities that share an identical property set and a common name suffix (the per-source / per-repo collection-duplication pattern), so downstream consumers can emit UNION-aware guidance instead of silently picking one member. Also adds multitenancy classification (`metadata.multitenancy`) layered on the sharding profile.
- **0.5.0** — Sharding-profile classification. Every analysis stamps `metadata.shardingProfile` with one of `OneShard`, `DisjointSmartGraph`, `SmartGraph`, `SatelliteGraph`, or `Sharded`, plus per-graph and per-collection evidence. Snapshot-only, no extra DB round trip.
- **0.4.0** — Tenant-scope annotations. Every entity in `physicalMapping.entities[*]` now carries a `tenantScope` block (`tenant_root` / `tenant_scoped` / `global`), with a per-run `metadata.tenantScopeReport` summary. Configurable via `SCHEMA_ANALYZER_TENANT_*` env vars.
- **0.3.0** — Cheap change-detection probes (`fingerprint_physical_shape`, `fingerprint_physical_counts`), a statistics block on `metadata.statistics`, and a reconciliation step that backfills any collections the LLM omitted.
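The shard-family grouping can be pictured as bucketing on two keys: the shared name suffix and the property set. A sketch of that idea only (here the suffix is assumed to be the part after the last underscore; the library's actual detector may differ):

```python
from collections import defaultdict

def shard_families(entities: dict, min_size: int = 2) -> list:
    """Group entity names sharing an identical property set and a common suffix.

    `entities` maps entity name -> set of property names. Illustrative only:
    mirrors the grouping idea behind physicalMapping.shardFamilies.
    """
    buckets = defaultdict(list)
    for name, props in entities.items():
        suffix = name.rsplit("_", 1)[-1]  # e.g. repoA_commits -> "commits"
        buckets[(suffix, frozenset(props))].append(name)
    # Only groups meeting the minimum size (cf. MIN_SHARD_FAMILY_SIZE) qualify.
    return [sorted(members) for members in buckets.values() if len(members) >= min_size]

families = shard_families({
    "repoA_commits": {"sha", "author"},
    "repoB_commits": {"sha", "author"},
    "users": {"name"},
})
print(families)  # [['repoA_commits', 'repoB_commits']]
```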
## Configuration

Tunable defaults live in `schema_analyzer/defaults.py` (full list there). Selected parameters:

| Parameter | Default | Description |
|---|---|---|
| `MAX_REPAIR_ATTEMPTS` | `2` | LLM repair-loop iterations |
| `LLM_TEMPERATURE` | `0.0` | Sampling temperature |
| `DEFAULT_TIMEOUT_MS` | `60000` | Analysis timeout (ms) |
| `DEFAULT_REVIEW_THRESHOLD` | `0.6` | Confidence threshold for `review_required` |
| `DEFAULT_CACHE_TTL_SECONDS` | `86400` | Cache TTL (seconds) |
| `TENANT_SCOPE_ROOT_NAMES` | `("Tenant",)` | Entity names treated as tenant roots |
| `TENANT_SCOPE_FIELD_REGEX` | `^tenant[_-]?(id\|key)$` | Denormalised tenant-reference field detector (regex pipe escaped as `\|` to avoid breaking the markdown table) |
| `MIN_TENANT_FIELD_COVERAGE_FRACTION` | `0.5` | Threshold for `discriminator_field` multitenancy |
| `MIN_SHARD_FAMILY_SIZE` | `2` | Min members for a shard-family group |
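As an illustration of the tenant-field detector, the default regex matches common denormalised tenant-reference names. The regex is compiled as-is here; whether the library adds flags such as case-insensitivity is not documented above:

```python
import re

# Default TENANT_SCOPE_FIELD_REGEX from the table above.
TENANT_FIELD = re.compile(r"^tenant[_-]?(id|key)$")

candidates = ["tenant_id", "tenant-key", "tenantid", "customer_id", "tenant_name"]
matches = [f for f in candidates if TENANT_FIELD.match(f)]
print(matches)  # ['tenant_id', 'tenant-key', 'tenantid']
```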
## Notes

- Secrets: API keys are read from config/env and are never persisted by this library.
- AQL fragments: helper methods return AQL text plus bind variables; collection names are passed via bind parameters.
- Graceful degradation: without an LLM provider, the analyzer returns deterministic baseline inference with `review_required=True`.
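Passing collection names via bind parameters is standard AQL practice: a collection bind parameter is written `@@name` in the query and keyed as `"@name"` in the bind-variable map. A hedged sketch of the shape such a fragment might take (the helper names and exact return structure are not shown above):

```python
# Illustrative only: AQL text plus bind variables, as the Notes describe.
# "@@col" is a collection bind parameter; its bind-variable key is "@col".
fragment = {
    "query": "FOR doc IN @@col FILTER doc.tenant_id == @tenant RETURN doc",
    "bind_vars": {"@col": "orders", "tenant": "acme"},
}
print(sorted(fragment["bind_vars"]))  # ['@col', 'tenant']
```

Keeping collection names out of the query string avoids string interpolation and lets the same fragment be reused across collections.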
## Integration evaluation (Docker ArangoDB)

Bring up a local ArangoDB:

```bash
docker compose up -d
```

Run the integration tests (opt-in):

```bash
export RUN_INTEGRATION=1
export ARANGO_URL=http://localhost:18529
export ARANGO_DB=schema_analyzer_it
export ARANGO_USER=root
export ARANGO_PASS=openSesame
pytest -q -m integration
```