AI-powered data documentation generator for databases and dbt projects.
✍️ Schema Scribe: AI-Powered Data Documentation
Tired of writing data documentation? Let AI do it for you.
Schema Scribe is a CLI tool that scans your databases and dbt projects, uses AI to generate descriptions, and automatically updates your documentation.
✨ See it in Action
Stop manually updating YAML files or writing Markdown tables. Let schema-scribe do the work in seconds.
| Magically update dbt schema.yml | Instantly generate DB catalogs (w/ ERD) |
|---|---|
| Run schema-scribe dbt --update and watch AI fill in your missing descriptions, tags, and tests. | Point schema-scribe db at a database and get a full Markdown catalog, complete with a Mermaid ERD. |
🚀 Quick Start (60 Seconds)
Get your first AI-generated catalog in less than a minute.
1. Install
Clone the repo and install dependencies.
git clone https://github.com/dongwonmoon/SchemaScribe.git
cd SchemaScribe
pip install -r requirements.txt
(Note: For specific databases, install the optional dependencies from the cloned repo: pip install -e ".[postgres,snowflake]")
(Note: To use the web server, also install server dependencies: pip install "schema-scribe[server]")
2. Initialize
Run the interactive wizard. It will guide you through setting up your database and LLM, automatically creating config.yaml and a secure .env file for your API keys.
schema-scribe init
3. Run!
You're all set.
For a dbt project:
(Make sure dbt compile has been run to create manifest.json)
# See what's missing (CI check)
schema-scribe dbt --project-dir /path/to/your/dbt/project --check
# Let AI fix it
schema-scribe dbt --project-dir /path/to/your/dbt/project --update
# Check for documentation drift against the live database
schema-scribe dbt --project-dir /path/to/your/dbt/project --db your_db_profile --drift
# Generate a global, end-to-end lineage graph
schema-scribe lineage --project-dir /path/to/your/dbt/project --db your_db_profile --output your_mermaid_profile
For a database:
(Assuming you created an output profile named my_markdown during init)
schema-scribe db --output my_markdown
✅ Key Features
- 🤖 Automated Catalog Generation: Scans live databases or dbt projects to generate documentation. Includes AI-generated table summaries for databases.
- ✍️ LLM-Powered Descriptions: Uses AI (OpenAI, Google, Ollama) to create meaningful business descriptions for tables, views, models, and columns.
- 🧬 Deep dbt Integration:
  - Direct YAML Updates: Seamlessly updates your dbt schema.yml files with AI-generated content.
  - CI/CD Validation: Use the --check flag in your CI pipeline to fail builds if documentation is outdated.
  - Interactive Updates: Use the --interactive flag to review and approve AI-generated changes one by one.
  - Documentation Drift Detection: Use the --drift flag to compare your existing documentation against the live database, catching descriptions that have become inconsistent with reality.
- 🔒 Security-Aware: The init wizard helps you store sensitive keys (passwords, API tokens) in a .env file, not in config.yaml.
- 🔌 Extensible by Design: A pluggable architecture supports multiple backends.
- 🌐 Global End-to-End Lineage: Generate a single, project-wide lineage graph that combines physical database foreign keys with logical dbt ref and source dependencies.
- 🚀 Web API Server: Launch a FastAPI server to trigger documentation workflows programmatically. Includes built-in API documentation via Swagger/ReDoc.
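The drift detection idea in the list above can be pictured as a comparison between what the documentation claims and what the database actually contains. The sketch below is a deliberately simplified, structural version of that idea: it only compares column names, and it is not Schema Scribe's implementation. The function name and the dictionary shapes are assumptions made for illustration.

```python
from typing import Dict, List, Set


def find_column_drift(documented: Dict[str, Set[str]],
                      live: Dict[str, Set[str]]) -> List[str]:
    """Report columns that are documented but gone, or live but undocumented.

    Both arguments map a table name to its set of column names
    (a hypothetical shape chosen for this sketch).
    """
    problems: List[str] = []
    for table in sorted(set(documented) | set(live)):
        doc_cols = documented.get(table, set())
        live_cols = live.get(table, set())
        for col in sorted(doc_cols - live_cols):
            problems.append(f"{table}.{col}: documented but missing from database")
        for col in sorted(live_cols - doc_cols):
            problems.append(f"{table}.{col}: in database but undocumented")
    return problems
```

A CI wrapper could fail the build whenever the returned list is non-empty, which mirrors the way --drift is described as failing on conflicts.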
🛠️ Supported Backends
| Type | Supported Providers |
|---|---|
| Databases | sqlite, postgres, mariadb, mysql, duckdb (files, directories, S3), snowflake |
| LLMs | openai, ollama, google |
| Outputs | markdown, dbt-markdown, json, confluence, notion, postgres-comment |
Command Reference
schema-scribe init
Runs the interactive wizard to create config.yaml and .env files. This is the recommended first step.
schema-scribe db
Scans a live database and generates a catalog.
- --db TEXT: (Optional) The database profile from config.yaml to use. Overrides default.
- --llm TEXT: (Optional) The LLM profile from config.yaml to use. Overrides default.
- --output TEXT: (Required) The output profile from config.yaml to use.
schema-scribe dbt
Scans a dbt project's manifest.json file.
- --project-dir TEXT: (Required) Path to the dbt project directory.
- --update: (Flag) Directly update dbt schema.yml files.
- --check: (Flag) Run in CI mode. Fails if documentation is outdated.
- --interactive: (Flag) Run in interactive mode. Prompts user for each AI-generated change.
- --drift: (Flag) Run in drift detection mode. Fails if existing documentation conflicts with the live database schema. Requires a --db profile.
- --llm TEXT: (Optional) The LLM profile to use.
- --output TEXT: (Optional) The output profile to use (if not using --update, --check, or --interactive).
Note: The --update, --check, --interactive, and --drift flags are mutually exclusive. Choose only one per run.
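Because the four mode flags are mutually exclusive, a wrapper script can enforce that before shelling out. The following is a minimal sketch and not part of Schema Scribe: the helper name build_dbt_command and its parameters are hypothetical, and only the flags themselves come from the reference above.

```python
from typing import List, Optional

# Mode flags documented above as mutually exclusive.
MODE_FLAGS = ("update", "check", "interactive", "drift")


def build_dbt_command(project_dir: str,
                      mode: Optional[str] = None,
                      db: Optional[str] = None) -> List[str]:
    """Build an argument list for `schema-scribe dbt` (hypothetical helper)."""
    if mode is not None and mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODE_FLAGS}")
    if mode == "drift" and db is None:
        # The reference states that --drift requires a --db profile.
        raise ValueError("--drift requires a database profile (--db)")

    cmd = ["schema-scribe", "dbt", "--project-dir", project_dir]
    if db is not None:
        cmd += ["--db", db]
    if mode is not None:
        cmd.append(f"--{mode}")
    return cmd
```

Taking a single mode argument instead of four booleans makes it impossible to request two modes at once. The resulting list can be passed to subprocess.run(cmd, check=True).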
schema-scribe lineage
Generates a global, end-to-end lineage graph for a dbt project.
- --project-dir TEXT: (Required) Path to the dbt project directory.
- --db TEXT: (Required) The database profile to scan for physical Foreign Keys.
- --output TEXT: (Required) The output profile (must be type 'mermaid') to write the .md file to.
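To make the idea of combining physical and logical dependencies concrete, here is a minimal sketch that merges two edge lists into one Mermaid flowchart. It is illustrative only and not Schema Scribe's implementation: the function name, the edge tuples, and the choice of dashed arrows for foreign keys are all assumptions.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]  # (upstream node, downstream node)


def to_mermaid(fk_edges: Iterable[Edge], dbt_edges: Iterable[Edge]) -> str:
    """Render both edge kinds as one Mermaid flowchart (illustrative only).

    Node names are assumed to already be valid Mermaid identifiers.
    """
    lines: List[str] = ["graph TD"]
    seen = set()
    # Solid arrows for dbt ref/source dependencies, dashed for foreign keys.
    for arrow, edges in (("-->", dbt_edges), ("-.->", fk_edges)):
        for src, dst in edges:
            key = (src, dst, arrow)
            if key in seen:
                continue  # skip duplicate edges
            seen.add(key)
            lines.append(f"    {src} {arrow} {dst}")
    return "\n".join(lines)
```

Calling to_mermaid(fk_edges=[("customers", "orders")], dbt_edges=[("stg_orders", "fct_orders")]) yields a "graph TD" header followed by one solid and one dashed edge, which can be pasted into a Mermaid code block in a Markdown file.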
schema-scribe serve
Launches the FastAPI web server.
- --host TEXT: (Optional) The host to bind the server to. Defaults to 127.0.0.1.
- --port INTEGER: (Optional) The port to run the server on. Defaults to 8000.
🚀 Web API Server
Schema Scribe includes a built-in FastAPI web server that exposes the core workflows via a REST API. This is perfect for programmatic integration or for building a custom web UI.
1. Launch the server:
(Make sure you have installed the server dependencies: pip install "schema-scribe[server]")
schema-scribe serve --host 0.0.0.0 --port 8000
2. Explore the API: Once the server is running, you can access the interactive API documentation (powered by Swagger UI) at: http://localhost:8000/docs
3. Example: Get available profiles
You can interact with the API using any HTTP client, like curl.
curl -X GET "http://localhost:8000/api/profiles" -H "accept: application/json"
This will return a JSON object listing all the database, LLM, and output profiles defined in your config.yaml.
4. Example: Trigger a dbt workflow
You can also trigger core workflows. For example, to run a dbt --check on a project:
curl -X POST "http://localhost:8000/api/run/dbt" \
-H "Content-Type: application/json" \
-d '{
"dbt_project_dir": "/path/to/your/dbt/project",
"check": true
}'
If the documentation is outdated, the API will return a 409 Conflict status code, making it easy to integrate with CI/CD pipelines.
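In a CI job, the 409 response can be turned into a failing exit code. Below is a minimal sketch using only the Python standard library. The endpoint and payload are taken from the curl example above; the function names and the exit code convention (0 for any 2xx, 1 for 409, 2 for everything else) are assumptions, not something the API prescribes.

```python
import json
import urllib.error
import urllib.request

API_URL = "http://localhost:8000/api/run/dbt"  # endpoint from the curl example


def build_check_request(project_dir: str, url: str = API_URL) -> urllib.request.Request:
    """Build the POST request that mirrors the curl example."""
    body = json.dumps({"dbt_project_dir": project_dir, "check": True}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def exit_code_for_status(status: int) -> int:
    """Map an HTTP status to a process exit code (assumed convention)."""
    if 200 <= status < 300:
        return 0  # documentation is up to date
    if status == 409:
        return 1  # documentation is outdated
    return 2  # anything else is treated as an infrastructure error


def run_check(project_dir: str) -> int:
    """Send the check request and translate the outcome into an exit code."""
    try:
        with urllib.request.urlopen(build_check_request(project_dir)) as resp:
            return exit_code_for_status(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses, including 409.
        return exit_code_for_status(err.code)
    except urllib.error.URLError:
        return 2  # server unreachable
```

A CI step could then end with sys.exit(run_check("/path/to/your/dbt/project")), so that outdated documentation fails the job.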
💡 Extensibility
Adding a new database, LLM, or writer is easy:
- Create a new class in the appropriate directory (e.g., schema_scribe/components/db_connectors).
- Implement the base interface (e.g., BaseConnector).
- Register your new class in schema_scribe/core/factory.py.
The init command and core logic will automatically pick up your new component.
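The three steps above follow a common registry pattern. The sketch below shows the general shape in plain Python; it is not Schema Scribe's actual code. The get_tables method on BaseConnector, the InMemoryConnector class, and the CONNECTOR_REGISTRY dict are all hypothetical stand-ins, so consult schema_scribe/core/factory.py for the real interface and registration mechanism.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class BaseConnector(ABC):
    """Hypothetical stand-in for the project's base interface."""

    @abstractmethod
    def get_tables(self) -> List[str]:
        """Return the table names visible to this connector."""


class InMemoryConnector(BaseConnector):
    """Toy connector used only for illustration."""

    def __init__(self, tables: List[str]) -> None:
        self._tables = list(tables)

    def get_tables(self) -> List[str]:
        return sorted(self._tables)


# Hypothetical registry: maps a profile's type string to a connector class.
CONNECTOR_REGISTRY: Dict[str, Type[BaseConnector]] = {
    "in-memory": InMemoryConnector,
}


def get_connector(type_name: str, **params) -> BaseConnector:
    """Look up a connector class by name and instantiate it."""
    try:
        cls = CONNECTOR_REGISTRY[type_name]
    except KeyError:
        raise ValueError(f"unsupported connector type: {type_name!r}") from None
    return cls(**params)
```

With a registry like this, adding a backend means writing one class and adding one dictionary entry; the calling code never has to change.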
🤝 Contributing
Contributions are welcome! Please feel free to open an issue or submit a pull request.