Extract relevant metadata from databases and transform it into context for Retrieval-Augmented Generation (RAG) in generative AI applications.
database2prompt
An open-source project designed to extract relevant metadata from databases and transform it into context for Retrieval-Augmented Generation (RAG) in generative AI applications.
How is it useful?
database2prompt makes it easy to generate prompts for LLMs by reading your database and generating a Markdown document that contains its schema. This gives the AI the context it needs to make your prompts more effective.
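To illustrate how a generated schema summary can serve as context, here is a minimal sketch that wraps a Markdown schema in a prompt. The `build_prompt` helper, the prompt wording, and the sample schema text are hypothetical and not part of database2prompt.

```python
def build_prompt(schema_markdown: str, question: str) -> str:
    """Combine a schema summary and a user question into one prompt string."""
    return (
        "You are a SQL assistant. Use only the tables described below.\n\n"
        "## Database schema\n"
        f"{schema_markdown.strip()}\n\n"
        "## Question\n"
        f"{question.strip()}\n"
    )


# Hypothetical schema text; in practice this would come from database2prompt.
sample_schema = "### public.users\n- id: integer\n- email: varchar"
prompt = build_prompt(sample_schema, "How many users signed up last week?")
print(prompt)
```

Keeping the schema and the question in separate, labeled sections makes it easier to swap in a different schema summary without touching the rest of the prompt.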
Databases Support (WIP)
| Databases | Support |
|---|---|
| PostgreSQL | ✅ |
We plan to add support for most databases, including analytical databases.
Output Formats
| Output Format | Support |
|---|---|
| JSON | ✅ |
| Markdown | ✅ |
Example Outputs
You can find example outputs generated by database2prompt in the following files:
- summary-database.md - Example of markdown output
- summary-database.json - Example of JSON output
Usage
Installation
pip install database2prompt
Quick Start
Here's a simple example of how to use database2prompt:
from database2prompt.database.core.database_config import DatabaseConfig
from database2prompt.database.core.database_params import DatabaseParams
from database2prompt.database.core.database_factory import DatabaseFactory
from database2prompt.database.processing.database_processor import DatabaseProcessor
from database2prompt.markdown.markdown_generator import MarkdownGenerator
# 1. Configure database connection
config = DatabaseConfig(
    host="localhost",
    port=5432,
    user="your_user",
    password="your_password",
    database="your_database",
    schema="your_schema",
)
# 2. Connect to database
strategy = DatabaseFactory.run("pgsql", config)
next(strategy.connection())
# 3. Configure which tables to document
params = DatabaseParams()
# Option A: Document specific tables
params.tables(["schema.table1", "schema.table2"])
# Option B: Ignore specific tables
params.ignore_tables(["schema.table_to_ignore"])
# 4. Process database information
database_processor = DatabaseProcessor(strategy, params)
# 5. Generate content to prompt (markdown or json)
content = database_processor.database_to_prompt(output_format="json")
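The Quick Start ends with the generated content held in memory. A minimal sketch of saving it for later use follows. It assumes `content` is a string, which the example above does not state explicitly, and the `save_context` helper and file name are hypothetical.

```python
import tempfile
from pathlib import Path


def save_context(content: str, path: str) -> Path:
    """Write generated prompt context to a UTF-8 text file and return its path."""
    target = Path(path)
    # Create missing parent folders so nested output paths work.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target


# Stand-in value; replace it with the result of database_to_prompt(...).
content = "placeholder output"

# A temporary directory keeps this demo from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_context(content, f"{tmp}/summary-database.json")
    print(saved.read_text(encoding="utf-8"))
```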
Configuration
Configure the database connection:
# .env file
DB_HOST=localhost
DB_PORT=5432
DB_USER=postgres
DB_PASSWORD=postgres
DB_NAME=postgres
DB_SCHEMA=public
Then load the configuration in Python:
config = DatabaseConfig.from_env()
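For readers curious about what loading these variables involves, the sketch below reads the same six variables with the standard library. It is an illustration only, not the implementation of `DatabaseConfig.from_env()`. The `read_db_settings` helper, its defaults, and its choice of required keys are hypothetical.

```python
import os
from typing import Mapping, Optional


def read_db_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect connection settings from DB_* environment variables."""
    env = os.environ if env is None else env
    return {
        "host": env.get("DB_HOST", "localhost"),
        "port": int(env.get("DB_PORT", "5432")),  # values arrive as strings
        "user": env["DB_USER"],  # required: raises KeyError if missing
        "password": env["DB_PASSWORD"],  # required
        "database": env["DB_NAME"],  # required
        "schema": env.get("DB_SCHEMA", "public"),
    }


# Passing a plain dict instead of os.environ keeps the demo self-contained.
settings = read_db_settings({
    "DB_HOST": "localhost",
    "DB_PORT": "5432",
    "DB_USER": "postgres",
    "DB_PASSWORD": "postgres",
    "DB_NAME": "postgres",
    "DB_SCHEMA": "public",
})
print(settings)
```

Converting `DB_PORT` to an integer up front surfaces a malformed value at load time instead of at connection time.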
Contributing
Development Setup
- Clone the repository:
  git clone https://github.com/orladigital/database2prompt.git
  cd database2prompt
- Create a virtual environment:
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
- Install development dependencies:
  pip install poetry
  poetry install
- Start the development database (optional):
  docker compose up -d
- Run the project:
  poetry run python database2prompt/main.py
How to Contribute
You can contribute to database2prompt in many different ways:
- Suggest a feature
- Code an approved feature idea (check our issues)
- Report a bug
- Fix something and open a pull request
- Help with documentation
- Spread the word!
License
Licensed under the MIT License. See LICENSE for more information.
Download files
File details
Details for the file database2prompt-0.2.0.tar.gz.
File metadata
- Download URL: database2prompt-0.2.0.tar.gz
- Upload date:
- Size: 9.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5058999058d6c00cdd7f879bf2b0c210cdbb9e4a7e5aaec4d7941625e33d854f |
| MD5 | 138f0c6b461ed2472b7a9b56080d2846 |
| BLAKE2b-256 | 38fb8e5cff66ce54530989fd21459b90a24c10c6ee1114423f6eeb0d9fee7b13 |
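A downloaded file can be checked against the published digests. The sketch below computes all three with Python's standard `hashlib`. The `file_digests` and `matches` helpers are hypothetical, and the sketch assumes that BLAKE2b-256 means BLAKE2b with a 32-byte digest.

```python
import hashlib


def file_digests(path: str, chunk_size: int = 65536) -> dict:
    """Return hex digests of a file, reading it in chunks to limit memory use."""
    hashers = {
        "sha256": hashlib.sha256(),
        "md5": hashlib.md5(),
        "blake2b_256": hashlib.blake2b(digest_size=32),  # 32 bytes = 256 bits
    }
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return file_digests(path)["sha256"] == expected_sha256.strip().lower()
```

For example, `matches("database2prompt-0.2.0.tar.gz", "5058999058d6c00cdd7f879bf2b0c210cdbb9e4a7e5aaec4d7941625e33d854f")` should return `True` for an intact copy of the source distribution saved under that name.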
Provenance
The following attestation bundles were made for database2prompt-0.2.0.tar.gz:
Publisher: pypi-publish.yaml on orladigital/database2prompt
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: database2prompt-0.2.0.tar.gz
- Subject digest: 5058999058d6c00cdd7f879bf2b0c210cdbb9e4a7e5aaec4d7941625e33d854f
- Sigstore transparency entry: 229776237
- Sigstore integration time:
- Permalink: orladigital/database2prompt@1e686a64cb55f7d0d55fb3bd47124bc0a6d451ae
- Branch / Tag: refs/heads/release/v0.2.0
- Owner: https://github.com/orladigital
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yaml@1e686a64cb55f7d0d55fb3bd47124bc0a6d451ae
- Trigger Event: push
File details
Details for the file database2prompt-0.2.0-py3-none-any.whl.
File metadata
- Download URL: database2prompt-0.2.0-py3-none-any.whl
- Upload date:
- Size: 12.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c7920c1807c5020e1036c6e54282112c72414e6e889733dc9b546e29e42b8997 |
| MD5 | 6499b24a8b69a969b7560ac25ae68ca6 |
| BLAKE2b-256 | f6a60f72b68d44ca4f567495792c00d5c2700da1355f0b01a6050da4b3261abc |
Provenance
The following attestation bundles were made for database2prompt-0.2.0-py3-none-any.whl:
Publisher: pypi-publish.yaml on orladigital/database2prompt
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: database2prompt-0.2.0-py3-none-any.whl
- Subject digest: c7920c1807c5020e1036c6e54282112c72414e6e889733dc9b546e29e42b8997
- Sigstore transparency entry: 229776238
- Sigstore integration time:
- Permalink: orladigital/database2prompt@1e686a64cb55f7d0d55fb3bd47124bc0a6d451ae
- Branch / Tag: refs/heads/release/v0.2.0
- Owner: https://github.com/orladigital
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yaml@1e686a64cb55f7d0d55fb3bd47124bc0a6d451ae
- Trigger Event: push