
A database migration agent that moves structured data (e.g. SQL) into a graph.

Project description

SQL Database to Graph Migration Agent

Intelligent database migration agent that transforms SQL databases (MySQL, PostgreSQL) into graph databases, powered by LLM analysis and LangGraph workflows.

Overview

This package provides a sophisticated migration agent that:

  • Analyzes SQL database schemas - Automatically discovers tables, relationships, and constraints
  • Generates optimal graph models - Uses AI to create node and relationship structures
  • Creates indexes and constraints - Ensures performance and data integrity
  • Handles complex relationships - Converts foreign keys to graph relationships
  • Incremental refinement - Review each table, adjust the model immediately, then enter the interactive refinement loop once all tables are processed
  • Comprehensive validation - Verifies migration results and data integrity
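The foreign-key conversion mentioned above can be sketched as a small helper that emits a Cypher statement linking child nodes to parent nodes. The table, column, and relationship names here are illustrative only; the package's real migration logic is more involved:

```python
# Illustrative sketch (not the package's actual API): how a foreign key
# can be translated into a Cypher MATCH/MERGE statement.

def fk_to_cypher(child_table: str, fk_column: str,
                 parent_table: str, parent_key: str,
                 rel_type: str) -> str:
    """Build a Cypher statement that links child nodes to parent nodes
    the same way a foreign key links rows."""
    child_label = child_table.capitalize()
    parent_label = parent_table.capitalize()
    return (
        f"MATCH (c:{child_label}), (p:{parent_label}) "
        f"WHERE c.{fk_column} = p.{parent_key} "
        f"MERGE (c)-[:{rel_type}]->(p)"
    )

# Example: the Sakila/Pagila film.language_id foreign key
print(fk_to_cypher("film", "language_id", "language", "language_id", "SPOKEN_IN"))
# MATCH (c:Film), (p:Language) WHERE c.language_id = p.language_id MERGE (c)-[:SPOKEN_IN]->(p)
```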

Installation

# Install the package
uv pip install .

# Or install in development mode
uv pip install -e .

Quick Start

Run the migration agent:

uv run main

The agent will guide you through:

  1. Environment setup and database connections
  2. Graph modeling strategy selection
  3. Automatic or incremental migration mode
  4. Complete migration workflow with progress tracking

Incremental review: The LLM now drafts the entire graph model in a single shot and then walks you through table-level changes detected since the last migration. You only need to approve (or tweak) the differences that matter.

You can also preconfigure the workflow using CLI flags or environment variables:

uv run main --mode incremental --strategy llm --meta-graph reset --log-level DEBUG
  • --mode {automatic,incremental} (env: SQL2MG_MODE) - Selects the automatic or incremental modeling flow.
  • --strategy {deterministic,llm} (env: SQL2MG_STRATEGY) - Chooses the deterministic or LLM-powered HyGM strategy.
  • --provider {openai,anthropic,gemini} (env: LLM_PROVIDER) - Selects the LLM provider (auto-detects if not specified).
  • --model MODEL_NAME (env: LLM_MODEL) - Specifies the LLM model name (uses the provider default if not set).
  • --meta-graph {auto,skip,reset} (env: SQL2MG_META_POLICY) - Controls how stored meta graph data is used (default: auto).
  • --log-level LEVEL (env: SQL2MG_LOG_LEVEL) - Sets logging verbosity (DEBUG, INFO, etc.).
  • --mapping PATH - Generates/edits a mapping JSON file instead of running a migration.
  • --editor CMD (env: EDITOR) - Editor for opening mapping files (e.g. vim, code --wait).
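As a rough sketch of how this kind of flag/environment layering is typically resolved (assuming CLI flags take precedence over environment variables, which in turn override the built-in default; the real CLI may resolve options differently):

```python
# Illustrative sketch of flag-over-environment precedence, not the tool's
# actual option-resolution code.
import argparse

def resolve_mode(argv: list[str], environ: dict) -> str:
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", choices=["automatic", "incremental"])
    args = parser.parse_args(argv)
    # A CLI flag wins; otherwise fall back to the environment variable,
    # then to a built-in default.
    return args.mode or environ.get("SQL2MG_MODE") or "automatic"

print(resolve_mode([], {"SQL2MG_MODE": "incremental"}))                       # incremental
print(resolve_mode(["--mode", "automatic"], {"SQL2MG_MODE": "incremental"}))  # automatic
print(resolve_mode([], {}))                                                   # automatic
```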

Mapping Mode

Use --mapping to generate or edit a mapping file that describes how SQL tables and columns map to graph nodes and edges — without running an actual migration.

# Generate a new mapping from the source database
uv run main --mapping output/mapping.json

# Re-open an existing mapping for editing
uv run main --mapping output/mapping.json

When the mapping file does not exist, the agent connects to the source database, analyzes the schema, builds a graph model, writes the mapping JSON, and enters the interactive editor. When the file already exists, it is loaded directly into the editor.
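A minimal round-trip sketch of working with such a mapping file. Note that the JSON keys shown here ("nodes", "relationships", "source_table", etc.) are hypothetical stand-ins; the actual mapping schema is defined by the agent:

```python
import json

# Purely illustrative: the real mapping schema is produced by the agent.
# The structure below is an assumption made for this demo only.
mapping = {
    "nodes": [
        {"label": "Person", "source_table": "people",
         "properties": {"name": "full_name"}},
    ],
    "relationships": [
        {"type": "KNOWS", "source_table": "friendships"},
    ],
}

# Round-trip through JSON text, as the agent does with output/mapping.json.
text = json.dumps(mapping, indent=2)
loaded = json.loads(text)
print([n["label"] for n in loaded["nodes"]])  # ['Person']
```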

Interactive Mapping Editor

Inside the editor you can use slash commands or natural language:

Commands:
  /edit    - open the mapping JSON in $EDITOR (vi by default)
  /save    - save changes and exit
  /cancel  - discard changes and exit

Or describe changes in natural language (sent to LLM), e.g.:
  Add a Person label node mapped from the people table
  Rename label Person to User
  Remove the KNOWS relationship

The LLM sees both the current and original model state, so requests like "go back to the original names" work correctly. An LLM provider is auto-detected from available API keys for natural language editing; /edit always works regardless.

Docker Usage

Build and run with Dockerfile.local for local development:

docker build -f Dockerfile.local -t memgraph/structured2graph .
docker run -d --rm --net memgql-net --name structured2graph-dev \
  --env-file .env -v $(pwd)/output:/output \
  --entrypoint sleep memgraph/structured2graph infinity
docker exec -it structured2graph-dev uv run main.py --mapping /output/mapping.json

Note: If your .env file quotes values (e.g. ANTHROPIC_API_KEY="sk-..."), the agent strips the surrounding quotes automatically so docker run --env-file works correctly.
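A minimal sketch of that quote-stripping behavior, assuming the simple case of one matching pair of surrounding quotes (the agent's actual implementation may differ):

```python
# Sketch of stripping one layer of matching quotes from an env value,
# e.g. ANTHROPIC_API_KEY="sk-..." read via --env-file.
def strip_env_quotes(value: str) -> str:
    """Remove one layer of matching single or double quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return value

print(strip_env_quotes('"sk-abc123"'))   # sk-abc123
print(strip_env_quotes("'secret'"))      # secret
print(strip_env_quotes("plain-value"))   # plain-value (unchanged)
```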

Configuration

Set up your environment variables in .env:

# Select source database (mysql or postgresql)
SOURCE_DB_TYPE=postgresql

# PostgreSQL Database (used when SOURCE_DB_TYPE=postgresql)
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DATABASE=pagila
POSTGRES_USER=username
POSTGRES_PASSWORD=password
POSTGRES_SCHEMA=public

# MySQL Database (used when SOURCE_DB_TYPE=mysql)
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_DATABASE=sakila
MYSQL_USER=username
MYSQL_PASSWORD=password

# Memgraph Database
MEMGRAPH_URL=bolt://localhost:7687
MEMGRAPH_USERNAME=
MEMGRAPH_PASSWORD=
MEMGRAPH_DATABASE=memgraph

# LLM API Keys (for LLM-powered features - choose one or more)
OPENAI_API_KEY=your_openai_key         # For GPT models
# ANTHROPIC_API_KEY=your_anthropic_key # For Claude models
# GOOGLE_API_KEY=your_google_key       # For Gemini models

# LLM Provider Configuration (optional - auto-detects if not set)
# LLM_PROVIDER=openai                  # Options: openai, anthropic, gemini
# LLM_MODEL=gpt-4o-mini                # Specific model name

# Optional migration defaults (override CLI prompts)
SQL2MG_MODE=automatic
SQL2MG_STRATEGY=deterministic
SQL2MG_META_POLICY=auto
SQL2MG_LOG_LEVEL=INFO

When switching SOURCE_DB_TYPE, remember to update the matching credential block and rerun uv sync so that dependencies such as psycopg2-binary (needed for PostgreSQL support) are installed.

Make sure that Memgraph is started with --schema-info-enabled=true, since the agent relies on the schema information returned by Memgraph's SHOW SCHEMA INFO query.

Multi-LLM Provider Support

The agent supports multiple LLM providers for AI-powered graph modeling:

Supported Providers

  • OpenAI (GPT models) - Default: gpt-4o-mini
  • Anthropic (Claude models) - Default: claude-sonnet-4-20250514
  • Google (Gemini models) - Default: gemini-1.5-pro

Usage Examples

# Auto-detect provider based on API keys
uv run main --strategy llm

# Use specific provider
uv run main --strategy llm --provider anthropic

# Use specific model
uv run main --strategy llm --provider openai --model gpt-4o

# All options together
uv run main --mode incremental --strategy llm --provider gemini --model gemini-1.5-flash

All providers support structured outputs for consistent graph model generation. The system automatically validates schemas using Pydantic models.
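As an illustration of Pydantic-based validation in this spirit, the sketch below validates an LLM-style structured output against a schema. The model and field names are hypothetical, not the package's real classes:

```python
# Hedged sketch: validating a structured LLM output with Pydantic.
# GraphModel/NodeModel and their fields are illustrative assumptions.
from pydantic import BaseModel, ValidationError

class NodeModel(BaseModel):
    label: str
    source_table: str
    properties: dict[str, str] = {}

class GraphModel(BaseModel):
    nodes: list[NodeModel]

# A well-formed payload validates into typed objects.
raw = {"nodes": [{"label": "Film", "source_table": "film"}]}
model = GraphModel.model_validate(raw)
print(model.nodes[0].label)  # Film

# A malformed payload (missing source_table) is rejected.
try:
    GraphModel.model_validate({"nodes": [{"label": "Film"}]})
except ValidationError:
    print("invalid payload rejected")
```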

📖 Full Multi-Provider Documentation

Architecture

core/hygm/
├── hygm.py                # Main orchestrator class
├── models/                # Data models and structures
│   ├── graph_models.py    # Core graph representation
│   ├── llm_models.py      # LLM-specific models
│   ├── operations.py      # Interactive operations
│   └── sources.py         # Source tracking
└── strategies/            # Modeling strategies
    ├── base.py            # Abstract interface
    ├── deterministic.py   # Rule-based modeling
    └── llm.py             # AI-powered modeling

Project details


Download files


Source Distribution

structured2graph-0.2.0.tar.gz (350.1 kB)

Uploaded Source

Built Distribution


structured2graph-0.2.0-py3-none-any.whl (113.5 kB)

Uploaded Python 3

File details

Details for the file structured2graph-0.2.0.tar.gz.

File metadata

  • Download URL: structured2graph-0.2.0.tar.gz
  • Upload date:
  • Size: 350.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.6

File hashes

Hashes for structured2graph-0.2.0.tar.gz

  • SHA256: 3da87094360e6f76cc1e53433e47a018ded32deab88a531d622db935f51d420f
  • MD5: 44124395280bf7b457b22b99e0e09da8
  • BLAKE2b-256: 17abed30589851d81d54df1521819c482acf0970692719026ef0b865045cb0cb


File details

Details for the file structured2graph-0.2.0-py3-none-any.whl.

File hashes

Hashes for structured2graph-0.2.0-py3-none-any.whl

  • SHA256: 2b70699f3017e9ecd3658e0464783a324bcab8c74dbc4dfcd65eec0b407aa1c7
  • MD5: 5000f4e9971b986a0b98ecfd37161c4c
  • BLAKE2b-256: 65593b1df87a845fcf1c7dbc6d1894552e5bd6e26293ca3f0b77f88721494129

