# soul-schema 🧠

Auto-generate semantic layers for any database using LLMs.

Metadata only. No row data ever leaves your infrastructure.
soul-schema connects to your database, reads the schema, samples a few rows for context, and uses an LLM to generate human-readable descriptions for every table and column. Corrections are remembered permanently — the semantic layer gets smarter over time.
## Why soul-schema?

|  | Alation / Collibra | Unity Catalog | soul-schema |
|---|---|---|---|
| Cost | $100K+/yr | Databricks only | Free / OSS |
| Setup | Weeks | Days | Minutes |
| Metadata generation | Manual | Semi-auto | Automatic |
| Learns from corrections | ❌ | ❌ | ✅ |
| Works with any database | ✅ | ❌ | ✅ |
| Air-gapped (local LLM) | ❌ | ❌ | ✅ |
## Install

```bash
pip install soul-schema

# With Anthropic support
pip install soul-schema[anthropic]

# With OpenAI support
pip install soul-schema[openai]

# Everything
pip install soul-schema[all]
```
## Quickstart

```bash
# Connect and generate descriptions
soul-schema connect \
  --db "postgresql://user:pass@localhost/mydb" \
  --llm anthropic \
  --key sk-ant-...

# Review and correct
soul-schema review

# Export to dbt
soul-schema export --format dbt

# Export to Vanna training data
soul-schema export --format vanna

# Check status
soul-schema status
```
## Works with any LLM

```bash
# Anthropic Claude (recommended)
soul-schema connect --db ... --llm anthropic --key sk-ant-...

# OpenAI
soul-schema connect --db ... --llm openai --key sk-...

# Ollama (fully local, air-gapped)
soul-schema connect --db ... --llm openai-compatible \
  --base-url http://localhost:11434/v1 \
  --model llama3.2

# Google Gemini
soul-schema connect --db ... --llm openai-compatible \
  --key AIza... \
  --base-url https://generativelanguage.googleapis.com/v1beta/openai \
  --model gemini-2.0-flash
```
## How it works

- **Connect** — reads `INFORMATION_SCHEMA` for table/column metadata
- **Sample** — fetches up to 10 rows per table to help the LLM understand context
- **Generate** — LLM writes descriptions for every table and column
- **Store** — saves to a human-readable markdown file (`schema_memory.md`)
- **Learn** — corrections are locked and never overwritten by re-generation
- **Export** — outputs dbt YAML, Vanna training data, or portable JSON
Your data stays in your infrastructure. soul-schema never stores or transmits row-level data. The memory file contains only metadata and LLM-generated descriptions.
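The Connect and Sample steps can be sketched with nothing but the standard library. The snippet below is a minimal illustration against SQLite, not the package's actual connector; the `get_full_schema` name and its return shape are borrowed from the Python API section as an assumption about how the pieces fit together:

```python
import sqlite3

def get_full_schema(conn, sample_limit=10):
    """Collect table/column metadata plus a few sample rows per table.

    Mirrors the Connect and Sample steps: only metadata and a handful
    of rows are read -- nothing is written back or transmitted.
    """
    cur = conn.cursor()
    cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    schema = {}
    for (table,) in cur.fetchall():
        # PRAGMA table_info plays the role of INFORMATION_SCHEMA in SQLite
        cur.execute(f"PRAGMA table_info({table})")
        columns = [{"name": row[1], "type": row[2]} for row in cur.fetchall()]
        cur.execute(f"SELECT * FROM {table} LIMIT ?", (sample_limit,))
        sample = cur.fetchall()
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        row_count = cur.fetchone()[0]
        schema[table] = {"columns": columns, "sample": sample, "row_count": row_count}
    return schema

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, cust_ltv REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, 120.5), (2, 88.0)])

schema = get_full_schema(conn)
print(schema["customers"]["row_count"])                     # → 2
print([c["name"] for c in schema["customers"]["columns"]])  # → ['id', 'cust_ltv']
```

The per-table dict (columns, sample, row count) is exactly the context a provider-agnostic LLM call needs to write a grounded description.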
## The memory file

soul-schema stores everything in a plain markdown file — `schema_memory.md`.
- Human-readable and editable by hand
- Version-controllable with git
- Locked columns (human corrections) are never overwritten
- Portable — bring it to any tool
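To make the bullet points concrete, here is a purely illustrative sketch of what one entry might look like. The actual layout of `schema_memory.md` is not documented here, so the table names, descriptions, and lock marker below are all hypothetical:

```markdown
## customers

Customers who have placed at least one order.

| Column   | Description                    | Locked |
|----------|--------------------------------|--------|
| id       | Primary key                    |        |
| cust_ltv | Customer lifetime value in USD | 🔒     |
```

Because the file is plain markdown, a hand edit and a `git diff` are all you need to review or revert a description.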
## Export formats

```bash
soul-schema export --format json   # → soul_schema_export.json
soul-schema export --format dbt    # → schema.yml (dbt schema file)
soul-schema export --format vanna  # → Vanna AI training data
```
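For reference, a dbt schema file has the shape below. This sample is hand-written against dbt's documented `schema.yml` format, not actual soul-schema output, and the `customers` model and its descriptions are made up:

```yaml
version: 2

models:
  - name: customers
    description: "Customers who have placed at least one order."
    columns:
      - name: cust_ltv
        description: "Customer lifetime value in USD"
```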
## Python API

```python
from soul_schema import SchemaConnector, SemanticGenerator, SchemaMemory

connector = SchemaConnector("postgresql://...")
generator = SemanticGenerator(provider="anthropic", api_key="sk-ant-...")
memory = SchemaMemory("./schema_memory.md")

schema = connector.get_full_schema()
for table, data in schema.items():
    result = generator.generate_table(
        table=table,
        columns=data["columns"],
        sample=data["sample"],
        row_count=data["row_count"],
    )
    memory.set_table(table, result["table_description"], result["columns"])

# Correct a description — locked forever
memory.correct("customers", "cust_ltv", "Customer lifetime value in USD")

# Export
print(memory.export_dbt())
```
## Supported databases
Phase 1: PostgreSQL, MySQL, SQLite
Coming soon: BigQuery, Snowflake, Redshift, DuckDB, ClickHouse
## License
MIT — free for commercial use.
## Related projects
- soul.py — persistent memory for LLM agents
- soul-agent — PyPI package with SoulMateClient
- SoulMate — hosted memory API for enterprise agents