
soul-schema 🧠

Auto-generate semantic layers for any database using LLMs.


Metadata only. No row data ever leaves your infrastructure.

soul-schema connects to your database, reads the schema, samples a few rows for context, and uses an LLM to generate human-readable descriptions for every table and column. Corrections are remembered permanently — the semantic layer gets smarter over time.


Why soul-schema?

|                          | Alation / Collibra | Unity Catalog   | soul-schema |
|--------------------------|--------------------|-----------------|-------------|
| Cost                     | $100K+/yr          | Databricks only | Free / OSS  |
| Setup                    | Weeks              | Days            | Minutes     |
| Metadata generation      | Manual             | Semi-auto       | Automatic   |
| Learns from corrections  | ✗                  | ✗               | ✓           |
| Works with any database  | ✗                  | ✗               | ✓           |
| Air-gapped (local LLM)   | ✗                  | ✗               | ✓           |

Install

pip install soul-schema

# With Anthropic support (quotes keep the extras syntax safe in zsh)
pip install "soul-schema[anthropic]"

# With OpenAI support
pip install "soul-schema[openai]"

# Everything
pip install "soul-schema[all]"

Quickstart

# Connect and generate descriptions
soul-schema connect \
  --db "postgresql://user:pass@localhost/mydb" \
  --llm anthropic \
  --key sk-ant-...

# Review and correct
soul-schema review

# Export to dbt
soul-schema export --format dbt

# Export to Vanna training data
soul-schema export --format vanna

# Check status
soul-schema status

Works with any LLM

# Anthropic Claude (recommended)
soul-schema connect --db ... --llm anthropic --key sk-ant-...

# OpenAI
soul-schema connect --db ... --llm openai --key sk-...

# Ollama (fully local, air-gapped)
soul-schema connect --db ... --llm openai-compatible \
  --base-url http://localhost:11434/v1 \
  --model llama3.2

# Google Gemini
soul-schema connect --db ... --llm openai-compatible \
  --key AIza... \
  --base-url https://generativelanguage.googleapis.com/v1beta/openai \
  --model gemini-2.0-flash

How it works

  1. Connect — reads INFORMATION_SCHEMA for table/column metadata
  2. Sample — fetches up to 10 rows per table to help the LLM understand context
  3. Generate — LLM writes descriptions for every table and column
  4. Store — saves to a human-readable markdown file (schema_memory.md)
  5. Learn — corrections are locked and never overwritten by re-generation
  6. Export — outputs dbt YAML, Vanna training data, or portable JSON

Your data stays in your infrastructure. soul-schema never stores row-level data: the sampled rows are sent only to the LLM provider you configure (or stay fully local with an Ollama model) and are never written to disk. The memory file contains only metadata and LLM-generated descriptions.
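The lock semantics in step 5 can be shown with a toy sketch. This is an illustration of the idea, not soul-schema's actual implementation; the `ToyMemory` class and its methods are invented for this example:

```python
# Toy illustration of "locked" descriptions: a human correction wins
# over any later LLM re-generation. Not soul-schema's real code.

class ToyMemory:
    def __init__(self):
        self.descriptions = {}   # (table, column) -> description text
        self.locked = set()      # keys a human has corrected

    def set_generated(self, table, column, text):
        """LLM output: applied only if the field is not locked."""
        key = (table, column)
        if key not in self.locked:
            self.descriptions[key] = text

    def correct(self, table, column, text):
        """Human correction: applied and locked permanently."""
        key = (table, column)
        self.descriptions[key] = text
        self.locked.add(key)


memory = ToyMemory()
memory.set_generated("customers", "cust_ltv", "Some numeric column")
memory.correct("customers", "cust_ltv", "Customer lifetime value in USD")

# Re-generation cannot overwrite the locked correction:
memory.set_generated("customers", "cust_ltv", "A float field")
print(memory.descriptions[("customers", "cust_ltv")])
# → Customer lifetime value in USD
```

The point of the lock set is that re-running generation is always safe: the LLM can refresh unreviewed descriptions without ever clobbering human work.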


The memory file

soul-schema stores everything in a plain markdown file — schema_memory.md.

  • Human-readable and editable by hand
  • Version-controllable with git
  • Locked columns (human corrections) are never overwritten
  • Portable — bring it to any tool
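For a feel of what the file contains, a hypothetical entry might look like the excerpt below. The exact layout is a sketch (table name, row count, and the lock marker are all illustrative), not the canonical format:

```markdown
## customers (1,204 rows)
Customer accounts for the storefront, one row per customer.

| column   | description                              |
|----------|------------------------------------------|
| id       | Primary key for the customer             |
| cust_ltv | Customer lifetime value in USD (locked)  |
```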

Export formats

soul-schema export --format json    # → soul_schema_export.json
soul-schema export --format dbt     # → schema.yml (dbt schema file)
soul-schema export --format vanna   # → Vanna AI training data
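The dbt export targets dbt's standard model properties format, so the generated schema.yml can drop into an existing dbt project. For a hypothetical customers table (names and descriptions are illustrative), the output follows the usual shape:

```yaml
version: 2

models:
  - name: customers
    description: "Customer accounts for the storefront, one row per customer."
    columns:
      - name: id
        description: "Primary key for the customer"
      - name: cust_ltv
        description: "Customer lifetime value in USD"
```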

Python API

from soul_schema import SchemaConnector, SemanticGenerator, SchemaMemory

connector = SchemaConnector("postgresql://...")
generator = SemanticGenerator(provider="anthropic", api_key="sk-ant-...")
memory = SchemaMemory("./schema_memory.md")

schema = connector.get_full_schema()
for table, data in schema.items():
    result = generator.generate_table(
        table=table,
        columns=data["columns"],
        sample=data["sample"],
        row_count=data["row_count"],
    )
    memory.set_table(table, result["table_description"], result["columns"])

# Correct a description — locked forever
memory.correct("customers", "cust_ltv", "Customer lifetime value in USD")

# Export
print(memory.export_dbt())

Supported databases

Phase 1: PostgreSQL, MySQL, SQLite
Coming soon: BigQuery, Snowflake, Redshift, DuckDB, ClickHouse


License

MIT — free for commercial use.


Related projects

  • soul.py — persistent memory for LLM agents
  • soul-agent — PyPI package with SoulMateClient
  • SoulMate — hosted memory API for enterprise agents
