AI-powered data documentation generator for databases and dbt projects.

These details have not been verified by PyPI

Project description

✍️ Schema Scribe: AI-Powered Data Documentation

Tired of writing data documentation? Let AI do it for you.

Schema Scribe is a CLI tool that scans your databases and dbt projects, uses AI to generate descriptions, and automatically updates your documentation.

✨ See it in Action

Stop manually updating YAML files or writing Markdown tables. Let schema-scribe do the work in seconds.

- Magically update dbt `schema.yml`: Run `schema-scribe dbt --update` and watch AI fill in your missing descriptions, tags, and tests.
- Instantly generate DB catalogs (with ERD): Point `schema-scribe db` at a database and get a full Markdown catalog, complete with a Mermaid ERD.

🚀 Quick Start (60 Seconds)

Get your first AI-generated catalog in less than a minute.

1. Install

Clone the repo and install dependencies.

```bash
git clone https://github.com/dongwonmoon/SchemaScribe.git
cd SchemaScribe
pip install -r requirements.txt
```

(Note: For specific databases, install the optional extras from the repo root, e.g. `pip install -e ".[postgres,snowflake]"`.)

(Note: To use the web server, also install the server dependencies: `pip install "schema-scribe[server]"`.)

2. Initialize

Run the interactive wizard. It walks you through setting up your database, LLM, and output profiles, then creates config.yaml plus a .env file so that API keys and passwords stay out of config.yaml.

```bash
schema-scribe init
```

3. Run!

You're all set.

For a dbt project (run `dbt compile` first so that manifest.json exists):

```bash
# See what's missing (CI check)
schema-scribe dbt --project-dir /path/to/your/dbt/project --check

# Let AI fix it
schema-scribe dbt --project-dir /path/to/your/dbt/project --update

# Check for documentation drift against the live database
schema-scribe dbt --project-dir /path/to/your/dbt/project --db your_db_profile --drift

# Generate a global, end-to-end lineage graph
schema-scribe lineage --project-dir /path/to/your/dbt/project --db your_db_profile --output your_mermaid_profile
```

For a database (assuming you created an output profile named `my_markdown` during `init`):

```bash
schema-scribe db --output my_markdown
```
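In CI, the dbt `--check` run shown above has to follow `dbt compile`, because Schema Scribe reads manifest.json. A minimal Python sketch of such a job step follows. It assumes that `schema-scribe dbt --check` exits with a non-zero code when documentation is missing or outdated (the usual way a CLI fails a build); the helper names `ci_steps` and `run_ci` are hypothetical.

```python
import subprocess


def ci_steps(project_dir: str) -> list[list[str]]:
    """Commands for a docs CI job, in the order they must run.

    `dbt compile` comes first so that manifest.json exists; the second
    command is the documented `schema-scribe dbt --check` invocation.
    """
    return [
        ["dbt", "compile", "--project-dir", project_dir],
        ["schema-scribe", "dbt", "--project-dir", project_dir, "--check"],
    ]


def run_ci(project_dir: str) -> int:
    """Run each step, stop at the first failure, and return its exit code.

    Assumes a non-zero exit from the check means documentation is outdated.
    """
    for cmd in ci_steps(project_dir):
        code = subprocess.run(cmd, check=False).returncode
        if code != 0:
            return code
    return 0
```

A CI script could end with `raise SystemExit(run_ci("/path/to/your/dbt/project"))` so the job fails whenever either step does.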

✅ Key Features

🤖 Automated Catalog Generation: Scans live databases or dbt projects to generate documentation. Includes AI-generated table summaries for databases.
✍️ LLM-Powered Descriptions: Uses AI (OpenAI, Google, Ollama) to create meaningful business descriptions for tables, views, models, and columns.
🧬 Deep dbt Integration:
- Direct YAML Updates: Seamlessly updates your dbt schema.yml files with AI-generated content.
- CI/CD Validation: Use the --check flag in your CI pipeline to fail builds if documentation is outdated.
- Interactive Updates: Use the --interactive flag to review and approve AI-generated changes one by one.
- Documentation Drift Detection: Use the --drift flag to compare your existing documentation against the live database, catching descriptions that have become inconsistent with reality.
🔒 Security-Aware: The init wizard helps you store sensitive keys (passwords, API tokens) in a .env file, not in config.yaml.
🔌 Extensible by Design: Database connectors, LLM providers, and output writers are pluggable components, so you can add a new backend without changing the core logic.
🌐 Global End-to-End Lineage: Generate a single, project-wide lineage graph that combines physical database foreign keys with logical dbt ref and source dependencies.
🚀 Web API Server: Launch a FastAPI server to trigger documentation workflows programmatically. Includes built-in API documentation via Swagger/ReDoc.
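The Security-Aware point above relies on config.yaml never holding literal secrets. If you later edit config.yaml by hand, a small check such as the hypothetical helper below can catch a password or token that slipped in. It is not part of Schema Scribe: it scans lines with regular expressions rather than parsing YAML, and both the key names it treats as sensitive and the `${VAR}` reference style it treats as safe are assumptions, not the tool's documented config format.

```python
import re

# Key names that usually hold secrets. This list is an assumption; adjust it.
SENSITIVE_KEY = re.compile(r"password|api_key|token|secret", re.IGNORECASE)
# Matches simple "key: value" lines; "- key: value" list items are skipped.
KEY_VALUE = re.compile(r"^\s*([\w-]+)\s*:\s*(.*?)\s*$")


def find_plaintext_secrets(config_text: str) -> list[str]:
    """Return sensitive keys whose values look like literal secrets.

    Empty values and values written as ${VAR} references are treated as
    safe; the ${VAR} convention is assumed for this sketch.
    """
    findings = []
    for line in config_text.splitlines():
        match = KEY_VALUE.match(line)
        if match is None:
            continue
        key, value = match.groups()
        value = value.strip("'\"")  # treat '' and "" as empty
        if SENSITIVE_KEY.search(key) and value and not value.startswith("${"):
            findings.append(key)
    return findings
```

For example, run it over the contents of config.yaml in a pre-commit hook and fail when the returned list is non-empty.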

🛠️ Supported Backends

| Type | Supported Providers |
|---|---|
| Databases | `sqlite`, `postgres`, `mariadb`, `mysql`, `duckdb` (files, directories, S3), `snowflake` |
| LLMs | `openai`, `ollama`, `google` |
| Outputs | `markdown`, `dbt-markdown`, `json`, `confluence`, `notion`, `postgres-comment`, `mermaid` |

Command Reference

`schema-scribe init`

Runs the interactive wizard to create config.yaml and .env files. This is the recommended first step.

`schema-scribe db`

Scans a live database and generates a catalog.

- `--db TEXT`: (Optional) The database profile from config.yaml to use. Overrides the default.
- `--llm TEXT`: (Optional) The LLM profile from config.yaml to use. Overrides the default.
- `--output TEXT`: (Required) The output profile from config.yaml to use.
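The catalog that `schema-scribe db` produces includes a Mermaid ERD. As a rough illustration of where such a diagram comes from (not Schema Scribe's connector code), the sketch below reads foreign keys from SQLite with `PRAGMA foreign_key_list` and renders them as Mermaid `erDiagram` relationships. The function name `sqlite_erd` and the one-to-many (`||--o{`) cardinality it assumes for every foreign key are choices made for this example.

```python
import sqlite3


def sqlite_erd(conn: sqlite3.Connection) -> str:
    """Render the foreign keys of a SQLite database as a Mermaid erDiagram."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    lines = ["erDiagram"]
    for table in tables:
        # Rows: (id, seq, referenced_table, from_col, to_col, on_update, on_delete, match)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            parent, from_col = fk[2], fk[3]
            # One parent row relates to zero or more rows in the child table.
            lines.append(f'    {parent} ||--o{{ {table} : "{from_col}"')
    return "\n".join(lines)
```

Other databases expose the same information through their own catalogs (for example `information_schema` in Postgres), which is why each backend needs its own connector.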

`schema-scribe dbt`

Scans a dbt project's manifest.json file.

- `--project-dir TEXT`: (Required) Path to the dbt project directory.
- `--update`: (Flag) Directly update dbt schema.yml files.
- `--check`: (Flag) Run in CI mode. Fails if documentation is outdated.
- `--interactive`: (Flag) Run in interactive mode. Prompts you to review each AI-generated change.
- `--drift`: (Flag) Run in drift detection mode. Fails if existing documentation conflicts with the live database schema. Requires `--db`.
- `--db TEXT`: (Optional) The database profile from config.yaml to compare against. Required with `--drift`.
- `--llm TEXT`: (Optional) The LLM profile to use.
- `--output TEXT`: (Optional) The output profile to use (if not using `--update`, `--check`, or `--interactive`).

Note: The `--update`, `--check`, `--interactive`, and `--drift` flags are mutually exclusive; use at most one per run.
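Drift can take several forms. The sketch below covers only a simple, structural one: columns documented in schema.yml that no longer exist in the live table, and live columns that nobody has documented. It is an illustration under that narrower definition, not how `--drift` is implemented (the flag compares descriptions against the live database, which this sketch does not attempt). It uses SQLite's `PRAGMA table_info`, and the `documented` dict shape and the helper name `column_drift` are assumptions for this example.

```python
import sqlite3


def column_drift(
    conn: sqlite3.Connection, table: str, documented: dict[str, str]
) -> dict[str, list[str]]:
    """Compare documented column names against the live table's columns.

    `documented` maps column name -> description, e.g. as parsed from a
    dbt schema.yml (that parsing step is assumed here, not shown).
    """
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    live = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    docs = set(documented)
    return {
        "stale": sorted(docs - live),         # documented, but gone from the DB
        "undocumented": sorted(live - docs),  # in the DB, but missing from the docs
    }
```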

`schema-scribe lineage`

Generates a global, end-to-end lineage graph for a dbt project.

- `--project-dir TEXT`: (Required) Path to the dbt project directory.
- `--db TEXT`: (Required) The database profile to scan for physical foreign keys.
- `--output TEXT`: (Required) The output profile that receives the generated .md file. It must be of type `mermaid`.
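Conceptually, the global lineage graph merges two edge sets: logical `ref`/`source` dependencies from the dbt manifest and physical foreign keys from the `--db` profile. The sketch below shows that merge under stated assumptions: it reads only the manifest's `parent_map`, expects foreign keys as `(referenced_table, referencing_table)` pairs, and draws them as dotted Mermaid arrows. The function name and node-naming scheme are hypothetical, and Schema Scribe's real output may look different.

```python
def lineage_mermaid(manifest: dict, fk_edges: list[tuple[str, str]]) -> str:
    """Merge dbt dependencies and foreign keys into one Mermaid flowchart.

    manifest: a parsed dbt manifest.json; only its `parent_map` is used.
    fk_edges: (referenced_table, referencing_table) pairs read from the database.
    """

    def node_name(node_id: str) -> str:
        # Drop resource type and project: "source.shop.raw.orders" -> "raw_orders"
        return "_".join(node_id.split(".")[2:])

    edges = set()
    for child, parents in manifest.get("parent_map", {}).items():
        if child.startswith("test."):
            continue  # data tests hang off models but are not lineage
        for parent in parents:
            edges.add((node_name(parent), node_name(child), "-->"))
    for parent_table, child_table in fk_edges:
        edges.add((parent_table, child_table, "-.->"))  # dotted arrow: physical FK
    lines = ["graph LR"]
    lines += [f"    {src} {arrow} {dst}" for src, dst, arrow in sorted(edges)]
    return "\n".join(lines)
```

Sorting the edges keeps the generated Markdown stable between runs, which makes diffs of the lineage file easy to review.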

`schema-scribe serve`

Launches the FastAPI web server.

- `--host TEXT`: (Optional) The host to bind the server to. Defaults to 127.0.0.1.
- `--port INTEGER`: (Optional) The port to run the server on. Defaults to 8000.

🚀 Web API Server

Schema Scribe includes a built-in FastAPI web server that exposes the core workflows via a REST API. This is perfect for programmatic integration or for building a custom web UI.

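1. Launch the server (this requires the server dependencies: `pip install "schema-scribe[server]"`):

```bash
schema-scribe serve --host 0.0.0.0 --port 8000
```

Binding to `0.0.0.0` makes the server reachable from other machines on your network; omit `--host` to keep the default of `127.0.0.1` (local access only).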

2. Explore the API: Once the server is running, you can access the interactive API documentation (powered by Swagger UI) at: http://localhost:8000/docs

3. Example: get the available profiles. You can call the API with any HTTP client, such as curl:

```bash
curl -X GET "http://localhost:8000/api/profiles" -H "accept: application/json"
```

This will return a JSON object listing all the database, LLM, and output profiles defined in your config.yaml.
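The same request can be made from Python with only the standard library. This is a minimal sketch, assuming the server is reachable at `http://localhost:8000` as in the curl example; the helper names are hypothetical, and because the response is described only as a JSON object of profiles, `get_profiles` returns the decoded JSON without assuming its shape.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # the address used in the curl examples


def profiles_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same GET /api/profiles request as the curl example."""
    return urllib.request.Request(
        f"{base_url}/api/profiles",
        headers={"accept": "application/json"},
        method="GET",
    )


def get_profiles(base_url: str = BASE_URL) -> dict:
    """Fetch and decode the profile listing from a running server."""
    with urllib.request.urlopen(profiles_request(base_url), timeout=10) as resp:
        return json.load(resp)
```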

4. Example: trigger a dbt workflow. The API can also run the core workflows. For example, to run a dbt `--check` on a project:

```bash
curl -X POST "http://localhost:8000/api/run/dbt" \
-H "Content-Type: application/json" \
-d '{
  "dbt_project_dir": "/path/to/your/dbt/project",
  "check": true
}'
```

If the documentation is outdated, the API will return a 409 Conflict status code, making it easy to integrate with CI/CD pipelines.
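To use that 409 in a CI job, a client only needs to turn the HTTP status into an exit code. A minimal standard-library sketch follows; the helper names and the exit-code mapping (0 for success, 1 for outdated docs, 2 for any other error) are choices made for this example, not part of the API. Note that `urllib.request.urlopen` raises `HTTPError` for 4xx and 5xx responses, so the 409 is read from the exception.

```python
import json
import urllib.error
import urllib.request


def dbt_check_request(project_dir: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST /api/run/dbt request with the same body as the curl example."""
    body = json.dumps({"dbt_project_dir": project_dir, "check": True}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/run/dbt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def exit_code_for_status(status: int) -> int:
    """Map an HTTP status to a CI exit code (this mapping is our own choice)."""
    if 200 <= status < 300:
        return 0  # documentation is up to date
    if status == 409:
        return 1  # documentation is outdated
    return 2  # anything else means the check itself did not run cleanly


def run_dbt_check(project_dir: str, base_url: str = "http://localhost:8000") -> int:
    """Call the API and return an exit code suitable for `raise SystemExit(...)`."""
    request = dbt_check_request(project_dir, base_url)
    try:
        with urllib.request.urlopen(request, timeout=600) as response:  # arbitrary generous timeout
            return exit_code_for_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses, including the 409.
        return exit_code_for_status(err.code)
```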

💡 Extensibility

Adding a new database, LLM, or writer is easy:

1. Create a new class in the appropriate directory (e.g., `schema_scribe/components/db_connectors`).
2. Implement the base interface (e.g., `BaseConnector`).
3. Register your new class in `schema_scribe/core/factory.py`.

The init command and core logic will automatically pick up your new component.
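The three steps above amount to a plugin registry. The sketch below shows that pattern in miniature. It is not Schema Scribe's code: `BaseConnector` here is a stand-in whose `get_tables` method is hypothetical, the `csv` connector is a toy rather than a supported backend, and it registers classes with a decorator, whereas the real project has you add the class to `schema_scribe/core/factory.py`.

```python
from abc import ABC, abstractmethod


class BaseConnector(ABC):
    """Stand-in for the real base interface; its method here is hypothetical."""

    @abstractmethod
    def get_tables(self) -> list[str]:
        """Return the table names to document."""


# Hypothetical registry mapping a profile `type` string to a connector class.
CONNECTORS: dict[str, type[BaseConnector]] = {}


def register(type_name: str):
    """Class decorator that records a connector under its profile type."""

    def decorator(cls: type[BaseConnector]) -> type[BaseConnector]:
        CONNECTORS[type_name] = cls
        return cls

    return decorator


def create_connector(type_name: str, **params) -> BaseConnector:
    """Look up and instantiate a connector, as a factory would."""
    try:
        cls = CONNECTORS[type_name]
    except KeyError:
        raise ValueError(f"Unknown connector type: {type_name!r}") from None
    return cls(**params)


@register("csv")
class CsvDirectoryConnector(BaseConnector):
    """Toy connector that treats each file name as a table."""

    def __init__(self, files: list[str]):
        self.files = files

    def get_tables(self) -> list[str]:
        return sorted(name.rsplit(".", 1)[0] for name in self.files)
```

Keying the registry by a type string, like the `type` of an output profile, is one way a factory can let new components appear without changes elsewhere.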

🤝 Contributing

Contributions are welcome! Please feel free to open an issue or submit a pull request.

Project details

This version: 0.2.0 (released Nov 11, 2025)

Download files

Source Distribution

schema_scribe-0.2.0.tar.gz (78.3 kB)

Uploaded Nov 11, 2025 Source

Built Distribution

schema_scribe-0.2.0-py3-none-any.whl (108.8 kB)

Uploaded Nov 11, 2025 Python 3

File details

Details for the file schema_scribe-0.2.0.tar.gz.

File metadata

Download URL: schema_scribe-0.2.0.tar.gz
Upload date: Nov 11, 2025
Size: 78.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for schema_scribe-0.2.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `9bd64560ac5b95e9e5de522b3a1126129d5956976db546510db4e48f93462d1a` |
| MD5 | `40fb3e393f06e909e3fdb62ba1447be8` |
| BLAKE2b-256 | `5f96501beb198e8c828dc0d0f9657ff2291164a77170a4e854b71469348362c7` |

File details

Details for the file schema_scribe-0.2.0-py3-none-any.whl.

File metadata

Download URL: schema_scribe-0.2.0-py3-none-any.whl
Upload date: Nov 11, 2025
Size: 108.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for schema_scribe-0.2.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `2a5b97b43d5caf2f091f2692ed9885fd3accb88135165d6bb946d156607b80bb` |
| MD5 | `594bbaeb9cf6dc71bc9a280ace7498ca` |
| BLAKE2b-256 | `76dcf7f3219ed17d35b74a2b4b2fdb30282d2fbe3701c5572469406c796ef64f` |
