
Agentic schema analyzer for ArangoDB: conceptual model + conceptual-to-physical mapping for transpilers.

Project description

arangodb-schema-analyzer (v0.1)

Standalone Python library that analyzes an ArangoDB database's physical schema and produces:

  • a conceptual schema (entities, relationships, properties)
  • a conceptual→physical mapping suitable for transpilers (Cypher, SPARQL, and future targets)
  • metadata (confidence, timestamp, analyzed collection counts, detected patterns)
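To make the idea of a conceptual→physical mapping concrete, here is a purely illustrative sketch. The class and field names (`EntitySketch`, `RelationshipSketch`, `MappingSketch`) are hypothetical and are not the library's `ConceptualSchema` or `PhysicalMapping` types; they only show the kind of lookup a transpiler might need, assuming each entity maps to one document collection and each relationship to one edge collection.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class EntitySketch:
    name: str        # conceptual label, e.g. "Person"
    collection: str  # physical document collection holding it


@dataclass(frozen=True)
class RelationshipSketch:
    name: str             # conceptual relationship type, e.g. "KNOWS"
    edge_collection: str  # physical edge collection holding it
    source: str           # conceptual label of the start entity
    target: str           # conceptual label of the end entity


@dataclass
class MappingSketch:
    entities: Dict[str, EntitySketch] = field(default_factory=dict)
    relationships: Dict[str, RelationshipSketch] = field(default_factory=dict)

    def collection_for(self, label: str) -> str:
        """Resolve a conceptual entity label to its physical collection."""
        return self.entities[label].collection

    def edge_collection_for(self, rel_type: str) -> str:
        """Resolve a conceptual relationship type to its edge collection."""
        return self.relationships[rel_type].edge_collection


mapping = MappingSketch(
    entities={"Person": EntitySketch("Person", "persons")},
    relationships={
        "KNOWS": RelationshipSketch("KNOWS", "knows", "Person", "Person")
    },
)
```

A transpiler handling a pattern such as `(:Person)-[:KNOWS]->(:Person)` would use lookups like these to decide which collections to query.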

Install

From source (this repo):

python -m pip install -e .

Optional LLM provider extras:

python -m pip install -e ".[openai]"
python -m pip install -e ".[anthropic]"

OpenRouter is also supported and requires no extra SDK (uses stdlib urllib).

MCP (Model Context Protocol) — optional stdio server wrapping the v1 JSON tool contract:

python -m pip install -e ".[mcp]"
arangodb-schema-analyzer-mcp

If you don't install a provider SDK (or you don't provide an API key), analysis degrades gracefully to deterministic baseline inference.
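Calling code can make this fallback explicit by checking for a key before configuring a provider. The following is a minimal sketch and not part of the library: `pick_provider_kwargs` is a hypothetical helper, and the environment variable names other than `OPENAI_API_KEY` are assumptions based on common provider conventions.

```python
import os
from typing import Mapping, Optional

# Assumed variable names; only OPENAI_API_KEY appears in this README.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def pick_provider_kwargs(env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Return llm_provider/api_key kwargs for the first provider with a key set.

    Returns None when no key is found, in which case the caller can expect
    deterministic baseline inference rather than LLM-assisted analysis.
    """
    env = os.environ if env is None else env
    for provider, var in PROVIDER_ENV_VARS.items():
        key = env.get(var, "").strip()
        if key:
            return {"llm_provider": provider, "api_key": key}
    return None
```

The returned dict uses the `llm_provider` and `api_key` parameter names from the Usage example, so it could be unpacked into the `AgenticSchemaAnalyzer` constructor alongside `model` and `cache`.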

Usage

import os

from arango import ArangoClient

from schema_analyzer import AgenticSchemaAnalyzer

client = ArangoClient(hosts="http://localhost:8529")
db = client.db("mydb", username="root", password="openSesame")

analyzer = AgenticSchemaAnalyzer(
    llm_provider="openai",  # or "anthropic" or "openrouter"
    api_key=os.environ.get("OPENAI_API_KEY"),  # None falls back to baseline inference
    model="gpt-4o-mini",
    cache={"type": "filesystem", "directory": ".schema-analyzer-cache"},
)

analysis = analyzer.analyze_physical_schema(
    db,
    timeout_ms=60_000,
    sample_limit_per_collection=5,
)

print(analysis.metadata.confidence)

Tool usage (CLI)

This project can be called as a non-interactive tool (stdin JSON → stdout JSON) using the v1 contract under docs/tool-contract/v1/.

Install (editable):

python -m pip install -e .

Example (analyze) using the provided request example:

cat docs/tool-contract/v1/examples/request.analyze.json | arangodb-schema-analyzer --pretty

CLI options

arangodb-schema-analyzer [--request FILE] [--out FILE] [--pretty] [-v]
  • --request FILE — path to request JSON (default: read from stdin)
  • --out FILE — write response JSON to file (default: stdout)
  • --pretty — pretty-print JSON output
  • -v — enable verbose logging
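Because the tool reads a JSON request on stdin and writes a JSON response on stdout, it can also be driven from another program. The sketch below assumes the `arangodb-schema-analyzer` executable is on `PATH` and that it exits with a non-zero status on failure (not confirmed by this README). `run_tool` is a hypothetical helper, and the request is passed through unchanged because its shape is defined by the v1 contract.

```python
import json
import subprocess
from typing import Any, Sequence


def run_tool(
    request: dict,
    command: Sequence[str] = ("arangodb-schema-analyzer",),
    timeout_s: float = 120.0,
) -> Any:
    """Send `request` as JSON on stdin and parse the JSON written to stdout."""
    proc = subprocess.run(
        list(command),
        input=json.dumps(request),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)
```

A request could be loaded with `json.load` from docs/tool-contract/v1/examples/request.analyze.json and passed to `run_tool`. The `command` parameter exists so the helper can be tested against a stand-in executable.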

Evaluation CLI

Run analysis quality benchmarks against domain packs:

arangodb-schema-analyzer eval \
  --provider openai \
  --model gpt-4o-mini \
  --report eval_report.json \
  --baseline eval_baseline.json

Options: --url, --user, --password, --database, --domains, --sample-limit, --timeout-ms, --scale, --no-cleanup.

Domains included: healthcare, financial_fraud_detection, insurance, intelligence, network_asset_management.

Public API

Exports:

  • AgenticSchemaAnalyzer — main analyzer class
  • ConceptualSchema — conceptual schema dataclass
  • PhysicalMapping — physical mapping dataclass with AQL helpers
  • generate_schema_docs(analysis) — Markdown documentation generator
  • export_mapping(analysis, target) — transpiler export (v0.1: cypher)
  • export_conceptual_model_as_owl_turtle(analysis) — OWL Turtle export
  • register_provider(name, ...) — register custom LLM providers
  • list_providers() — list registered LLM provider names

Configuration

Tunable defaults live in schema_analyzer/defaults.py. Key parameters:

Parameter                   Default   Description
MAX_REPAIR_ATTEMPTS         2         LLM repair loop iterations
LLM_TEMPERATURE             0.0       Sampling temperature
DEFAULT_TIMEOUT_MS          60000     Analysis timeout (ms)
DEFAULT_REVIEW_THRESHOLD    0.6       Confidence threshold for review_required
DEFAULT_CACHE_TTL_SECONDS   86400     Cache TTL (seconds)
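As a rough illustration of how two of these defaults might be applied, the sketch below assumes that `review_required` is set when confidence falls strictly below the threshold and that a cache entry expires once its age reaches the TTL. Both comparison rules and both function names are assumptions; the actual logic lives in the library and may differ.

```python
DEFAULT_REVIEW_THRESHOLD = 0.6       # from the table above
DEFAULT_CACHE_TTL_SECONDS = 86400    # from the table above


def needs_review(confidence: float,
                 threshold: float = DEFAULT_REVIEW_THRESHOLD) -> bool:
    """Assumed rule: flag the analysis when confidence is below the threshold."""
    return confidence < threshold


def is_cache_fresh(stored_at: float, now: float,
                   ttl_seconds: int = DEFAULT_CACHE_TTL_SECONDS) -> bool:
    """Assumed rule: an entry is usable while its age is under the TTL."""
    return (now - stored_at) < ttl_seconds
```

Under these assumptions, an analysis with confidence 0.59 would be flagged for review, while one at exactly 0.6 would not.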

Notes

  • Secrets: API keys are read from configuration or environment variables and are never persisted by this library.
  • AQL fragments: helper methods return AQL text + bind variables; collection names are passed via bind parameters.
  • Graceful degradation: without an LLM provider, the analyzer returns deterministic baseline inference with review_required=True.
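The AQL note above refers to a standard ArangoDB pattern: a collection name is bound with a double-`@` placeholder in the query text and a key prefixed with a single `@` in the bind variables. The sketch below shows that pattern in isolation. `sample_documents_fragment` is a hypothetical function, not one of the library's helper methods, whose names and signatures are not listed in this README.

```python
from typing import Any, Dict, Tuple


def sample_documents_fragment(collection: str,
                              limit: int = 5) -> Tuple[str, Dict[str, Any]]:
    """Return AQL text plus bind variables for sampling a collection.

    The collection name never appears in the query string itself; it is
    supplied through the @@collection bind parameter.
    """
    query = "FOR doc IN @@collection LIMIT @limit RETURN doc"
    bind_vars = {"@collection": collection, "limit": limit}
    return query, bind_vars


query, bind_vars = sample_documents_fragment("users", limit=3)
```

With python-arango, such a pair can be executed as `db.aql.execute(query, bind_vars=bind_vars)`. Keeping names out of the query text avoids string interpolation and the injection risks that come with it.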

Integration evaluation (Docker ArangoDB)

Bring up a local ArangoDB:

docker compose up -d

Run integration tests (opt-in):

export RUN_INTEGRATION=1
export ARANGO_URL=http://localhost:18529
export ARANGO_DB=schema_analyzer_it
export ARANGO_USER=root
export ARANGO_PASS=openSesame
pytest -q -m integration
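The opt-in behaviour can be pictured as a small gate over these environment variables. This is a sketch only: `integration_settings` is a hypothetical helper, the fallback values simply mirror the exports above, and the project's real test fixtures may read the variables differently.

```python
import os
from typing import Mapping, Optional


def integration_settings(env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Return connection settings, or None when integration tests are not enabled."""
    env = os.environ if env is None else env
    if env.get("RUN_INTEGRATION") != "1":
        return None  # opt-in flag absent: integration tests should be skipped
    return {
        "url": env.get("ARANGO_URL", "http://localhost:18529"),
        "database": env.get("ARANGO_DB", "schema_analyzer_it"),
        "username": env.get("ARANGO_USER", "root"),
        "password": env.get("ARANGO_PASS", ""),
    }
```

A pytest fixture could call this helper and invoke `pytest.skip` when it returns None.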
