A simple RAG implementation for educational purposes, developed by Murat Karakaya Akademi

rag-kmk

A compact helper library for small Retrieval-Augmented Generation (RAG) workflows.

Free software: MIT License
Docs: see docs/ for examples and developer notes

Quick install

pip:

pip install rag-kmk

From source:

git clone https://github.com/kmkarakaya/rag-kmk.git
cd rag-kmk
pip install -e .

Quick start — unified rag_client interface

from rag_kmk import rag_client

rag = rag_client()  # Optionally: rag_client(config_path="path/to/config.yaml")

# List collections
print(rag.list_collections())

# Create a collection
print(rag.create_collection("my_collection"))

# Add documents to a collection
print(rag.add_doc("my_collection", doc_path="tests/sample_documents"))

# Summarize a collection
print(rag.summarize_collection("my_collection"))

# Chat with the collection
print(rag.chat("my_collection", prompt="What is this document about?"))

# Delete a collection
print(rag.delete_collection("my_collection"))

# Clean up
rag.close()
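
The quick start calls rag.close() as its last step, which is skipped if an earlier call raises. One way to guarantee cleanup is contextlib.closing, which works with any object that exposes close(). The sketch below uses a hypothetical StandInClient in place of rag_client() so it runs without the package installed; it assumes only that the real client has the close() method shown above.

```python
from contextlib import closing


class StandInClient:
    """Hypothetical stand-in for rag_client(); only close() matters here."""

    def __init__(self):
        self.closed = False

    def list_collections(self):
        return []

    def close(self):
        self.closed = True


def use_client(client):
    # closing() calls client.close() on exit, even if the body raises.
    with closing(client) as rag:
        return rag.list_collections()


client = StandInClient()
print(use_client(client))
print(client.closed)
```

With the real library, the same pattern would read `with closing(rag_client()) as rag:`.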

Vector DB API (ChromaDB) — Consistent Client-based Usage

All vector DB operations now require an explicit ChromaDB client parameter for clarity and efficiency.
You must first create a client, then pass it to all DB functions.

from rag_kmk.vector_db.database import (
    create_chromadb_client,
    create_collection,
    load_collection,
    list_collection_names,
    summarize_collection,
    delete_collection,
    ChromaDBStatus,
)

# 1. Create/load persistent ChromaDB client
client_result = create_chromadb_client()
if client_result['client'] is None:
    raise RuntimeError(client_result['error'])
client = client_result['client']

# 2. List all collections
collections_result = list_collection_names(client)
print(collections_result)

# 3. Create a new collection
create_result, created_collection = create_collection(client, "my_collection")
print(create_result)

# 4. Load a collection
load_result, loaded_collection = load_collection(client, "my_collection")
print(load_result)

# 5. Summarize a collection
if loaded_collection:
    summary = summarize_collection(loaded_collection)
    print(summary)

# 6. Delete a collection
delete_result = delete_collection(client, "my_collection")
print(delete_result)
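
The functions above report failures through an 'error' field in their result dicts rather than by raising. If exceptions are preferred, a small wrapper can convert one style into the other. The helper below is a sketch, not part of rag-kmk: the name unwrap and the status string "CLIENT_ERROR" are hypothetical, and the dict shape is taken from the API reference in this README.

```python
def unwrap(result, key):
    """Return result[key], or raise RuntimeError if the call reported a failure."""
    if result.get("error") or result.get(key) is None:
        raise RuntimeError(f"{result.get('status')}: {result.get('error')}")
    return result[key]


# Stand-in result dicts with the documented shape (no ChromaDB needed).
ok_result = {"status": "CLIENT_READY", "client": object(), "error": None}
bad_result = {"status": "CLIENT_ERROR", "client": None, "error": "path not writable"}

client = unwrap(ok_result, "client")
try:
    unwrap(bad_result, "client")
except RuntimeError as exc:
    print(exc)
```

For create_collection() and load_collection(), which return a (result_dict, collection) pair, the same check would apply to the first element of the tuple.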

Example: Minimal `run.py`

from rag_kmk import CONFIG
from rag_kmk.vector_db.database import (
    create_chromadb_client,
    create_collection,
    load_collection,
    list_collection_names,
    summarize_collection,
    delete_collection,
    ChromaDBStatus,
)
import json

# Update config if needed
CONFIG['llm']['model'] = 'gemini-2.5-flash'

# Create/load client
client_result = create_chromadb_client()
if client_result['client'] is None:
    print(client_result['error'])
    raise SystemExit(1)  # exit() comes from the site module and is not guaranteed to exist
client = client_result['client']

# List collections
collections_result = list_collection_names(client)
print(json.dumps(collections_result, indent=2))

# Create collection
collection_name = "my_new_collection"
create_result, created_collection = create_collection(client, collection_name)
print(json.dumps(create_result, indent=2))

# Load collection
load_result, loaded_collection = load_collection(client, collection_name)
print(json.dumps(load_result, indent=2))

# Summarize collection
if loaded_collection:
    summary_result = summarize_collection(loaded_collection)
    print(json.dumps(summary_result, indent=2))

# Delete collection
delete_result = delete_collection(client, collection_name)
print(json.dumps(delete_result, indent=2))

Configuration

Important config keys (see rag_kmk/config/config.yaml):

llm:
- api_key — direct API key (not recommended in source)
- api_key_env_var — name of environment variable that holds the API key
- model — model identifier used by the configured LLM backend
- system_prompt — optional system instruction
vector_db:
- chromaDB_path — filesystem path for persistent ChromaDB; set to a directory path for persistent storage
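
To illustrate how api_key and api_key_env_var relate, here is a minimal sketch of a key lookup. The README does not state which of the two wins when both are set, so the order below (environment variable first, inline key as fallback) is an assumption, and resolve_api_key and EXAMPLE_LLM_API_KEY are hypothetical names rather than part of the library.

```python
import os


def resolve_api_key(llm_cfg, env=None):
    """Look up an API key from an 'llm' config section.

    Assumed order: the variable named by api_key_env_var, then api_key.
    """
    env = os.environ if env is None else env
    var_name = llm_cfg.get("api_key_env_var")
    if var_name and env.get(var_name):
        return env[var_name]
    # Fall back to the inline key; an empty string counts as missing.
    return llm_cfg.get("api_key") or None


llm_cfg = {"api_key_env_var": "EXAMPLE_LLM_API_KEY", "api_key": ""}
fake_env = {"EXAMPLE_LLM_API_KEY": "secret-from-env"}
print(resolve_api_key(llm_cfg, env=fake_env))
```

Passing env explicitly keeps the function easy to test without touching the real process environment.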

Notes:

Legacy key chroma_db is accepted and normalized to chromaDB_path by load_config().
Use rag_kmk.config.config.mask_config(cfg) when printing or logging config to avoid leaking secrets.
Prefer calling initialize_rag() or load_config() explicitly in long-running programs instead of relying on the import-time CONFIG population.
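
The first two notes can be pictured with a short sketch. It is an illustration of the idea, not the code behind load_config() or mask_config(): the function names, the "***" mask, and the assumption that the legacy chroma_db key sits under vector_db are all mine.

```python
import copy


def normalize_vector_db_keys(cfg):
    """Return a copy of cfg with legacy vector_db.chroma_db renamed to chromaDB_path."""
    out = copy.deepcopy(cfg)
    vdb = out.get("vector_db")
    if isinstance(vdb, dict) and "chroma_db" in vdb:
        legacy = vdb.pop("chroma_db")
        # An explicit chromaDB_path, if present, wins over the legacy key.
        vdb.setdefault("chromaDB_path", legacy)
    return out


def mask_secrets(cfg, keys=("api_key", "api_key_env_var"), mask="***"):
    """Return a copy of cfg with non-empty values of sensitive keys replaced."""
    masked = {}
    for name, value in cfg.items():
        if isinstance(value, dict):
            masked[name] = mask_secrets(value, keys, mask)
        elif name in keys and value:
            masked[name] = mask
        else:
            masked[name] = value
    return masked


raw = {
    "llm": {"api_key": "sk-demo", "model": "gemini-2.5-flash"},
    "vector_db": {"chroma_db": "./chromaDB"},
}
print(mask_secrets(normalize_vector_db_keys(raw)))
```

Neither function modifies its input, so the original config stays usable after logging.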

API reference (short)

Primary helpers and their key parameters (one-line):

rag_kmk.initialize_rag(custom_config_path=None) -> dict
- Loads config using load_config() and returns the config dict.
rag_kmk.config.config.load_config(config_path=None) -> dict
- Loads and normalizes repository config (populates module CONFIG).
rag_kmk.config.config.mask_config(config, keys=('api_key','api_key_env_var')) -> dict
- Returns a shallow copy with sensitive values masked for safe logging.
rag_kmk.knowledge_base.document_loader.build_knowledge_base(collection_name: str, document_directory_path: Optional[str]=None, add_documents: bool=False, chromaDB_path: Optional[str]=None, cfg: Optional[dict]=None, overwrite: bool=False) -> (collection, ChromaDBStatus)
- Create (or open) a collection and optionally ingest documents.
rag_kmk.knowledge_base.document_loader.load_knowledge_base(collection_name: str, cfg: Optional[dict]=None) -> (collection or None, ChromaDBStatus)
- Open-only helper (does not create directories).
rag_kmk.vector_db.database.create_chromadb_client(chromaDB_path=None) -> {'status': str, 'client': client or None, 'error': str or None}
rag_kmk.vector_db.database.create_collection(client, collection_name) -> (result_dict, collection or None)
rag_kmk.vector_db.database.load_collection(client, collection_name) -> (result_dict, collection or None)
rag_kmk.vector_db.database.list_collection_names(client) -> {'status': str, 'collections': list, 'error': str or None}
rag_kmk.vector_db.database.summarize_collection(chroma_collection) -> {'status': str, 'summary': dict, 'error': str or None}
rag_kmk.vector_db.database.delete_collection(client, collection_name) -> {'status': str, 'success': bool, 'error': str or None}
rag_kmk.vector_db.database.ChromaDBStatus
- Enum-like statuses (CLIENT_READY, COLLECTION_CREATED, COLLECTION_LOADED, COLLECTION_LISTED, SUMMARY_READY, etc.)
rag_kmk.chat_flow.llm_interface.build_chatBot(config: Optional[dict]=None) -> ChatClient
- Lazily builds an LLM-backed ChatClient or returns a no-op client when SDK/creds missing.
rag_kmk.chat_flow.llm_interface.generate_LLM_answer(client, prompt: str, timeout_seconds: int=30, **opts) -> str
- Runs client generation with a timeout and returns text output.
rag_kmk.chat_flow.llm_interface.run_rag_pipeline(client, kb_collection, non_interactive: bool=False)
- Small interactive loop (prints to stdout); supply non_interactive=True in scripts/CI.
rag_kmk.utils.compute_fingerprint(path: str) -> str
- SHA256 hex digest for a file; raises FileNotFoundError if missing.
rag_kmk.utils.now_isoutc() -> str
- Current UTC timestamp as ISO8601 string ending with 'Z'.

If you need exact parameter details, consult the module source in rag_kmk/ (this README aims to be a concise reference).
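
For readers who want to see what the two utility helpers describe, the following is a standard-library sketch of the same behavior: a SHA256 hex digest of a file and a UTC timestamp ending in 'Z'. The names file_sha256 and utc_now_iso are hypothetical; the actual rag_kmk.utils implementations may differ in detail.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def file_sha256(path, chunk_size=65536):
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:  # raises FileNotFoundError if the file is missing
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def utc_now_iso():
    """Return the current UTC time as an ISO 8601 string ending with 'Z'."""
    return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")


# Demo on a throwaway file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fh:
    fh.write(b"hello")
print(file_sha256(demo_path))
print(utc_now_iso())
os.remove(demo_path)
```

Reading in chunks keeps memory use flat for large documents, which matters when fingerprinting a whole directory.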

Persistence & semantics

Path resolution precedence used by build_knowledge_base(), highest priority first:

1. the explicit chromaDB_path argument
2. the vector_db.chromaDB_path value in the config returned by load_config()
3. the default, ./chromaDB, created under the current working directory
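
The precedence above can be sketched as a small function. This is an illustration of the rule as stated here, not the code inside build_knowledge_base(); resolve_chroma_path and its cwd parameter are hypothetical.

```python
import os


def resolve_chroma_path(explicit_path=None, cfg=None, cwd=None):
    """Pick the ChromaDB directory: explicit argument, then config, then default."""
    if explicit_path:
        return explicit_path
    configured = (cfg or {}).get("vector_db", {}).get("chromaDB_path")
    if configured:
        return configured
    # Default: a chromaDB folder under the current working directory.
    return os.path.join(cwd or os.getcwd(), "chromaDB")


cfg = {"vector_db": {"chromaDB_path": "/data/from_config"}}
print(resolve_chroma_path("/data/explicit", cfg))
print(resolve_chroma_path(None, cfg))
print(resolve_chroma_path(None, {}, cwd="/work"))
```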

Notes on persistence behavior:
The library supports persistent storage only, so ChromaDB always works against a filesystem directory. Pass a directory as chromaDB_path or set vector_db.chromaDB_path in the config.
build_knowledge_base() creates the directory if it does not exist.

Development & testing

Run tests:

pytest -q tests

Coverage helper (the repository includes a Windows batch script):

scripts\run_coverage.bat

An environment spec exists at env-rag-backup.yml.

Contributing & CI

See docs/contributing.md for contribution guidelines.
CI workflows are under .github/workflows/.

Troubleshooting & notes

If the LLM SDK or credentials are missing, the library returns a no-op ChatClient so that the non-LLM parts of the pipeline continue to work.
generate_LLM_answer() enforces a timeout (default 30s) and raises a RuntimeError on timeout.
When debugging auth or model issues, print rag_kmk.config.config.mask_config(config) rather than the raw config to avoid leaking secrets.
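
The timeout behavior described for generate_LLM_answer() can be approximated with the standard library. The sketch below shows one common way to bound a blocking call and surface a RuntimeError on timeout; it is an assumption about the general technique, not the library's actual implementation, and call_with_timeout is a hypothetical name.

```python
import concurrent.futures
import time


def call_with_timeout(fn, *args, timeout_seconds=30, **kwargs):
    """Run fn in a worker thread and raise RuntimeError if it exceeds the timeout."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError as exc:
        raise RuntimeError(f"call timed out after {timeout_seconds}s") from exc
    finally:
        # Do not block on the worker; a timed-out call keeps running in the background.
        executor.shutdown(wait=False)


print(call_with_timeout(lambda: "ok", timeout_seconds=1))
try:
    call_with_timeout(time.sleep, 0.3, timeout_seconds=0.05)
except RuntimeError as exc:
    print(exc)
```

Note that a thread cannot be forcibly cancelled in Python, so this pattern stops waiting for the result but does not stop the underlying work.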

Logging

The library uses Python's standard logging module. By default the package is non-invasive (it will not configure the global logging handlers so host applications remain in control).

To enable file+console logging for development, set the environment variable RAG_KMK_AUTOLOG=1 before running your application. The library will read CONFIG['logging'] (see config.yaml) and create a rotating file at the configured path (default logs/rag_kmk.log) as well as stream logs to the console.
You can also programmatically initialize logging from your application using the helper rag_kmk.logging_setup.init_logging_from_config(config, force=False).

PowerShell example to run the sample runner with logging enabled:

$env:RAG_KMK_AUTOLOG = "1"
python run.py

Or programmatically (no env var), here from a POSIX shell such as bash:

python - <<'PY'
import rag_kmk.logging_setup as ls
ls.init_logging_from_config(None, force=True)
import run
PY

Log file location and rotation are configurable via CONFIG['logging'] keys: file, level, max_bytes, and backup_count.
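
As a rough picture of what file+console logging driven by these four keys involves, here is a standard-library sketch. It is not the code behind init_logging_from_config(); setup_logging, the logger name, the log format, and the fallback values for max_bytes and backup_count are assumptions made for the example.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def setup_logging(log_cfg, logger_name="rag_kmk_demo"):
    """Attach a rotating file handler and a console handler to one named logger."""
    logger = logging.getLogger(logger_name)
    # Drop handlers from an earlier call so messages are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    level = getattr(logging, str(log_cfg.get("level", "INFO")).upper(), logging.INFO)
    logger.setLevel(level)

    log_path = log_cfg.get("file", "logs/rag_kmk.log")
    os.makedirs(os.path.dirname(log_path) or ".", exist_ok=True)

    formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    file_handler = RotatingFileHandler(
        log_path,
        maxBytes=int(log_cfg.get("max_bytes", 1_000_000)),  # assumed fallback
        backupCount=int(log_cfg.get("backup_count", 3)),  # assumed fallback
        encoding="utf-8",
    )
    console_handler = logging.StreamHandler()
    for handler in (file_handler, console_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


# Demo: log into a temporary directory instead of the project tree.
demo_dir = tempfile.mkdtemp()
demo_cfg = {
    "file": os.path.join(demo_dir, "logs", "rag_kmk.log"),
    "level": "DEBUG",
    "max_bytes": 1024,
    "backup_count": 2,
}
logger = setup_logging(demo_cfg)
logger.info("logging initialised")
```

Configuring a named logger rather than the root logger matches the non-invasive default described above: the host application's own handlers are left untouched.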

What's new (changelog fragment)

All vector DB operations now require an explicit client parameter for clarity and efficiency.
README and run.py updated to reflect the new API.
Clarified persistence resolution (explicit arg > config > default) and removed references to a non-existent force_persistence parameter.

For more examples and developer notes see docs/ and run.py (canonical usage example).

Release history

0.0.55 (Oct 19, 2025), this version
0.0.54 (Oct 17, 2025)
0.0.53 (Oct 13, 2025)
0.0.52 (Oct 13, 2025)
0.0.51 (Oct 13, 2025)
0.0.49 (Oct 13, 2025)
0.0.40 (Nov 29, 2024)
0.0.38 (Sep 24, 2024)
0.0.26 (Jul 22, 2024)
0.0.25 (Jul 22, 2024)
0.0.24 (Jul 20, 2024)
0.0.22 (Jul 20, 2024)
0.0.20 (Jul 19, 2024)
0.0.12 (Jul 18, 2024)
0.0.11 (Jul 18, 2024)
0.0.9 (Jul 18, 2024)
0.0.5 (Jul 18, 2024)
0.0.4 (Jul 17, 2024)
0.0.2 (Jul 17, 2024)
0.0.1 (Jul 17, 2024)
