Reconstruct and document the business meaning of a legacy Oracle database schema — read-only, local-first, privacy-safe.
Blossa
Blossa connects read-only to a (typically legacy, on-prem) Oracle database and reconstructs the business meaning of its schema: what each table and column actually represents, the likely relationships between them, and a human-readable map of the whole database. The goal is that a new engineer can understand a database nobody documented, without needing the person who built it.
The engine is domain-agnostic. The first wedge is legacy Oracle on-prem.
Status: early MVP. CLI only. Produces a Markdown "database map" + a machine-readable JSON artifact.
Why
When the people who understood a database leave, the knowledge of what the data means in business terms walks out the door with them. Blossa accelerates understanding, getting a new person to ~70% in a day instead of weeks. It is honest about its limits: it reconstructs what is inferable from schema + data + existing docs. It does not recover purely tacit knowledge that was never written down.
How it works
Oracle (read-only)
│ introspect data dictionary (ALL_TABLES, ALL_TAB_COLUMNS, ALL_CONSTRAINTS, …)
▼
[ deterministic core ] ── PKs/FKs, candidate FKs, orphans, type/naming issues, missing comments
│ build compact, PII-SAFE per-table summaries (aggregates + masked samples, never raw rows)
▼
[ LLM semantic pass ] ── runs ONLY over the structured summaries (local model by default)
│ → table purpose + column meaning, each with confidence + evidence
▼
[ renderer ] ── database_map.md + database_map.json
The deterministic core does the heavy lifting. The LLM is used sparingly, only over compact structured summaries — never over raw schema dumps or raw data.
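The deterministic core flags candidate FKs, i.e. relationships the schema never declared (like the one planted in the demo schema). Blossa's actual rules are not described here. As a minimal sketch, assuming the distinct values of each key column are already in memory, one common heuristic pairs a column with a same-named primary key in another table and measures how many of its values exist there. The Column type, the candidate_fks function and the 0.95 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    table: str
    name: str
    is_pk: bool = False


def candidate_fks(columns, values, declared=frozenset(), min_containment=0.95):
    """Suggest undeclared foreign keys (hypothetical heuristic, not Blossa's code).

    columns: list of Column objects
    values: dict (table, column) -> set of distinct non-null values
    declared: set of (table, column) pairs that already have a declared FK
    Returns (child_table, child_col, parent_table, parent_col, containment, orphans).
    """
    pks = [c for c in columns if c.is_pk]
    found = []
    for col in columns:
        if col.is_pk or (col.table, col.name) in declared:
            continue
        child = values.get((col.table, col.name), set())
        if not child:
            continue
        for pk in pks:
            # Naming heuristic: same column name, different table.
            if pk.table == col.table or pk.name != col.name:
                continue
            parent = values.get((pk.table, pk.name), set())
            containment = len(child & parent) / len(child)
            if containment >= min_containment:
                # Child values with no matching parent value are orphans.
                found.append((col.table, col.name, pk.table, pk.name,
                              containment, len(child - parent)))
    return found
```

For example, with a hypothetical CUSTOMERS.CUSTOMER_ID primary key and an ORDERS.CUSTOMER_ID column whose values all appear in it, the function returns one candidate with containment 1.0 and zero orphans. A real implementation would compute containment in SQL rather than pulling values into Python.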
Privacy / safety (hard constraints)
- The database connection is read-only.
- Never sends raw row values to any LLM — only aggregates, value patterns, and masked samples.
- Runs locally by default (Ollama / vLLM). With a local model, Blossa makes no external network calls.
- Do not develop or test against production / employer data — use the bundled synthetic schema.
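The "no raw row values" rule means the LLM sees a per-column summary rather than rows. This minimal sketch shows the aggregates + value patterns + masked samples idea; the function names, the '9'/'A' pattern alphabet and the keep-one-character masking are assumptions, not Blossa's actual summary format.

```python
import re
from collections import Counter


def value_pattern(value):
    """Shape of a value: digits -> '9', ASCII letters -> 'A', other chars kept."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"[0-9]", "9", value))


def mask(value, keep=1):
    """Keep the first `keep` characters and star out the rest.

    Values no longer than `keep` are starred out entirely.
    """
    if len(value) <= keep:
        return "*" * len(value)
    return value[:keep] + "*" * (len(value) - keep)


def summarize_column(values, max_samples=3, max_patterns=3):
    """Summarize one column as counts, lengths, patterns and masked samples only."""
    non_null = [str(v) for v in values if v is not None]
    distinct = sorted(set(non_null))
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(distinct),
        "min_length": min((len(v) for v in non_null), default=None),
        "max_length": max((len(v) for v in non_null), default=None),
        "top_patterns": Counter(map(value_pattern, non_null)).most_common(max_patterns),
        "masked_samples": [mask(v) for v in distinct[:max_samples]],
    }
```

For a column holding "AB-123", "CD-456", NULL and "AB-123", the summary reports 4 rows, 1 null, 2 distinct values, the single pattern "AA-999", and the masked samples "A*****" and "C*****".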
Install
Once released to PyPI:
python -m pip install blossa
From source (for development):
python -m pip install -e ".[dev]"
Requires Python 3.12+. Uses the oracledb driver in thin mode — no Oracle Instant Client needed.
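For orientation, this is roughly what read-only data-dictionary introspection looks like with python-oracledb, whose connections use thin mode unless the Oracle Client libraries are explicitly initialized. ALL_TAB_COLUMNS and SET TRANSACTION READ ONLY are standard Oracle; the fetch_columns function, the returned dict shape, and the user/DSN in the usage comment are assumptions, not Blossa's internals.

```python
# Column metadata for one schema, read from the Oracle data dictionary.
COLUMNS_SQL = """
    SELECT table_name, column_name, data_type, nullable, column_id
    FROM all_tab_columns
    WHERE owner = :owner
    ORDER BY table_name, column_id
"""


def fetch_columns(conn, owner):
    """Return one dict per column; `conn` is a python-oracledb connection."""
    with conn.cursor() as cur:
        # A read-only transaction rejects INSERT/UPDATE/DELETE; this must be
        # the first statement of the transaction.
        cur.execute("SET TRANSACTION READ ONLY")
        cur.execute(COLUMNS_SQL, owner=owner.upper())  # named bind variable
        names = [d[0].lower() for d in cur.description]
        rows = [dict(zip(names, row)) for row in cur.fetchall()]
    conn.rollback()  # end the read-only transaction
    return rows


# Example usage (needs a reachable Oracle; user and DSN are placeholders):
#   import os, oracledb
#   conn = oracledb.connect(user="blossa_demo",
#                           password=os.environ["BLOSSA_ORACLE__PASSWORD"],
#                           dsn="localhost:1521/XEPDB1")
#   for col in fetch_columns(conn, "BLOSSA_DEMO"):
#       print(col)
```

The read-only transaction is defense in depth. The stronger guarantee is a database user that has been granted only SELECT privileges.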
First run (the fast path)
blossa init # interactive: writes blossa.local.yml (DSN, user, schema, LLM choice)
blossa doctor # checks Python, driver, config, Oracle, the LLM, and output dir — tells you what's missing
blossa scan # run it
blossa doctor tells a brand-new user exactly what to fix before scanning (e.g. "Ollama not reachable: run ollama pull qwen2.5:14b", or "start the demo DB").
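The exact checks blossa doctor runs are not spelled out beyond the list above. As a rough sketch of that kind of preflight, the Python version, driver presence and a local Ollama server can all be probed with the standard library. Ollama's default address http://localhost:11434 and its GET /api/tags endpoint (which lists pulled models) are real; the function names and messages here are hypothetical.

```python
import importlib.util
import json
import sys
import urllib.request


def check_python(version=sys.version_info[:2], minimum=(3, 12)):
    """Pass if the interpreter is at least `minimum`."""
    if tuple(version) >= minimum:
        return True, "ok"
    return False, f"Python {minimum[0]}.{minimum[1]}+ required, found {version[0]}.{version[1]}"


def check_module(name):
    """Pass if a top-level module (e.g. the oracledb driver) is importable."""
    if importlib.util.find_spec(name) is not None:
        return True, "ok"
    return False, f"module {name!r} not installed: python -m pip install {name}"


def check_ollama(model, base_url="http://localhost:11434", timeout=2.0):
    """Pass if a local Ollama server answers and already has `model` pulled."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            names = [m.get("name") for m in json.load(resp).get("models", [])]
    except (OSError, ValueError):  # refused, timeout, HTTP error, bad JSON
        return False, f"Ollama not reachable at {base_url}: is 'ollama serve' running?"
    if model in names:
        return True, "ok"
    return False, f"model {model} not pulled: run ollama pull {model}"


def report(results):
    """Print one line per check; return True only if every check passed."""
    for label, (ok, message) in results:
        print(f"[{'OK' if ok else 'FAIL'}] {label}: {message}")
    return all(ok for _label, (ok, _msg) in results)
```

A doctor-style run would call report([("python", check_python()), ("driver", check_module("oracledb")), ("llm", check_ollama("qwen2.5:14b"))]) and stop before scanning if it returns False.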
Quick start
1. Bring up a synthetic Oracle database (for development)
cd docker
docker compose up -d # Oracle XE + the synthetic BLOSSA_DEMO schema
The seed script creates a few related tables with deliberately missing comments and one undeclared foreign key, so the whole pipeline can be exercised without real data.
2. Configure
Copy blossa.yml → blossa.local.yml and set your DSN / credentials (or use BLOSSA_* env vars).
Prefer the env var for the password:
export BLOSSA_ORACLE__PASSWORD=blossa_demo
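The double underscore in BLOSSA_ORACLE__PASSWORD suggests that environment variables map onto nested config keys. Blossa's actual loader is not shown here; the sketch below assumes that scheme (prefix BLOSSA_, "__" as the nesting separator, lowercase keys), and the config keys in the example are hypothetical.

```python
import copy
import os


def apply_env_overrides(config, environ=None, prefix="BLOSSA_", sep="__"):
    """Return a copy of `config` with BLOSSA_* variables layered on top.

    A double underscore marks nesting, so BLOSSA_ORACLE__PASSWORD=secret
    becomes config["oracle"]["password"] = "secret". Values stay strings.
    """
    environ = os.environ if environ is None else environ
    merged = copy.deepcopy(config)
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue
        path = key[len(prefix):].lower().split(sep)
        if not all(path):
            continue  # skip malformed names such as BLOSSA_ or BLOSSA_ORACLE__
        node = merged
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return merged
```

Keeping the password in the environment means blossa.local.yml can hold only the DSN, user and schema, so the file is safer to share or commit by accident.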
3. Scan
blossa scan --config blossa.local.yml
Outputs out/database_map.md and out/database_map.json.
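The two outputs carry the same findings in different forms. As an illustration only (the JSON schema Blossa actually writes is not documented here), the sketch below assumes a hypothetical list of tables, each with a purpose, confidence and evidence, and columns with a meaning, confidence and evidence, and renders it to both files.

```python
import json
from pathlib import Path


def cell(text):
    """Escape pipes so free text cannot break a Markdown table row."""
    return str(text).replace("|", "\\|")


def render_markdown(tables):
    """Render hypothetical per-table results as a Markdown database map."""
    lines = ["# Database map", ""]
    for t in tables:
        lines += [
            f"## {t['name']}",
            "",
            f"{t['purpose']} (confidence {t['confidence']:.0%}; evidence: {'; '.join(t['evidence'])})",
            "",
            "| Column | Meaning | Confidence | Evidence |",
            "|---|---|---|---|",
        ]
        for c in t["columns"]:
            lines.append(
                f"| {cell(c['name'])} | {cell(c['meaning'])} | {c['confidence']:.0%} "
                f"| {cell('; '.join(c['evidence']))} |"
            )
        lines.append("")
    return "\n".join(lines)


def write_outputs(tables, out_dir="out"):
    """Write database_map.json and database_map.md into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "database_map.json").write_text(json.dumps({"tables": tables}, indent=2), encoding="utf-8")
    (out / "database_map.md").write_text(render_markdown(tables), encoding="utf-8")
    return out
```

Rendering the Markdown from the same structure that is serialized to JSON keeps the human-readable map and the machine-readable artifact from drifting apart.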
Try it without Oracle or a GPU
# Runs the full pipeline over a bundled offline fixture, with the heuristic (no-LLM) provider.
blossa scan --demo --llm-provider heuristic
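The heuristic provider produces column meanings without any model. Its real rules are not documented here; purely as an illustration, a naming-convention heuristic could look like the sketch below. The suffix rules, confidence values and function name are invented, though suffixes like _ID, _DT and _CD are common in legacy Oracle schemas.

```python
import re

# Hypothetical suffix conventions: (regex, meaning, base confidence).
SUFFIX_RULES = [
    (r"_ID$", "identifier or key", 0.6),
    (r"_(DT|DATE)$", "date", 0.5),
    (r"_(AMT|AMOUNT)$", "monetary amount", 0.5),
    (r"_(CD|CODE)$", "coded value (lookup)", 0.4),
    (r"_(FLG|FLAG|IND)$", "boolean flag", 0.4),
]


def guess_column_meaning(column_name, data_type=None):
    """Return (meaning, confidence, evidence) from naming conventions alone."""
    name = column_name.upper()
    for pattern, meaning, confidence in SUFFIX_RULES:
        if re.search(pattern, name):
            evidence = [f"name matches {pattern}"]
            # Agreement between the name and the declared type raises confidence.
            if meaning == "date" and data_type == "DATE":
                confidence += 0.2
                evidence.append("data type is DATE")
            return meaning, round(confidence, 2), evidence
    return "unknown", 0.0, ["no naming rule matched"]
```

Returning evidence alongside each guess mirrors the "confidence + evidence" shape of the LLM pass, so the renderer can treat both providers the same way.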
Commands
| Command | What it does |
|---|---|
| blossa init | Interactive first-run setup; writes blossa.local.yml. |
| blossa doctor | Check every prerequisite (Python, driver, config, Oracle, LLM, output) and report fixes. |
| blossa scan | Full pipeline against the configured Oracle schema → Markdown + JSON. |
| blossa scan --demo | Run against the bundled offline fixture (no Oracle needed). |
| blossa introspect | Just dump the raw introspected schema as JSON (no checks, no LLM). |
| blossa check-llm | Verify the configured LLM provider is reachable. |
Run blossa --help for all flags.
Scope (MVP)
In: read-only Oracle introspection, deterministic schema analysis, PII-safe summaries, a local-LLM semantic pass, Markdown + JSON output.
Out (for now): web UI, chat interface, any write access, non-Oracle engines, query-log/lineage ingestion, managed cloud, model fine-tuning.
Develop & release
python -m pip install -e ".[dev]"
ruff check src tests # lint
pytest # tests
python -m build # build sdist + wheel into dist/
CI (lint + tests on Python 3.12/3.13, Linux + Windows) runs via .github/workflows/ci.yml.
To publish to PyPI: configure a Trusted Publisher for the project on PyPI, then push a version tag:
git tag v0.1.0 && git push --tags
.github/workflows/release.yml builds and publishes automatically (no API token needed).
Note: confirm the blossa name is available on PyPI before the first release.
License
Blossa is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0-only).
In short: you can use, run, study, and modify Blossa freely — but if you distribute it or run a modified version as a network service, you must make your source changes available under the same license. See NOTICE for copyright.
Copyright (c) 2026 Bogdan Voinea. "Blossa" is the project name; the license covers the code, not the name. A separate commercial license may be offered in the future — contributions are accepted under the CLA (see CONTRIBUTING.md) to keep that option open.
Download files
File details
Details for the file blossa-0.1.0.tar.gz.
File metadata
- Download URL: blossa-0.1.0.tar.gz
- Upload date:
- Size: 53.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 415f1797dffffdc95f0767a7b0d6e3710524b7a599af8795d95133461c3037d9 |
| MD5 | 9e99526608ce17fe46da017cb4a8ccd0 |
| BLAKE2b-256 | 2b9dcf4e5ff5462e4bb2ae7c8a1f7341043ed6d65891e0cf34526eb4ac2b357d |
Provenance
The following attestation bundles were made for blossa-0.1.0.tar.gz:
Publisher: release.yml on bogdanv98/blossa
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: blossa-0.1.0.tar.gz
- Subject digest: 415f1797dffffdc95f0767a7b0d6e3710524b7a599af8795d95133461c3037d9
- Sigstore transparency entry: 1887137243
- Sigstore integration time:
- Permalink: bogdanv98/blossa@2a7fb93be5cb8dd48188db3d679172db27a5e418
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/bogdanv98
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@2a7fb93be5cb8dd48188db3d679172db27a5e418
- Trigger Event: push
File details
Details for the file blossa-0.1.0-py3-none-any.whl.
File metadata
- Download URL: blossa-0.1.0-py3-none-any.whl
- Upload date:
- Size: 56.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9b2c1c0bb5db7f3067df5628cb3cb328014d2f65f8ed4f041f4ea6de556e3eec |
| MD5 | cbc6aba586f4f281bb56109219ae5bee |
| BLAKE2b-256 | 112fead3aee408757133b82316affd979491ecd92946bf63aa29b7a17aa1c12b |
Provenance
The following attestation bundles were made for blossa-0.1.0-py3-none-any.whl:
Publisher: release.yml on bogdanv98/blossa
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: blossa-0.1.0-py3-none-any.whl
- Subject digest: 9b2c1c0bb5db7f3067df5628cb3cb328014d2f65f8ed4f041f4ea6de556e3eec
- Sigstore transparency entry: 1887137386
- Sigstore integration time:
- Permalink: bogdanv98/blossa@2a7fb93be5cb8dd48188db3d679172db27a5e418
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/bogdanv98
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@2a7fb93be5cb8dd48188db3d679172db27a5e418
- Trigger Event: push