RAG (Retrieval-Augmented Generation) System

A Python-based RAG system that processes text files, generates embeddings, and stores them in a Postgres database with pgvector for efficient similarity search.
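As a rough illustration of what "similarity search" over stored embeddings means, the sketch below ranks rows by cosine distance in plain Python. It is not the project's code: the function names and the `(id, vector)` row shape are hypothetical, and the README does not say which distance metric the system uses (pgvector supports several, including cosine and Euclidean distance).

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: list[float], rows: list[tuple[str, list[float]]], k: int = 3):
    """Return the k rows (hypothetical (id, vector) pairs) closest to the query."""
    return sorted(rows, key=lambda row: cosine_distance(query, row[1]))[:k]


# Toy two-dimensional "embeddings"; real ones have hundreds of dimensions.
rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
nearest_ids = [row_id for row_id, _ in top_k([1.0, 0.0], rows, k=2)]
```

In the real system the database performs this ranking, so the application does not need to load every vector into memory.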

For detailed information, see:

Design Document - System architecture and requirements
Developer Guide - Detailed setup and development instructions

Features

File ingestion (source code, Markdown, plain text)
Text chunking with configurable overlap
Vector embeddings via OpenAI or Hugging Face
Postgres + pgvector for vector storage and search
Project-based organization of documents
Comprehensive Taskfile for development workflows
Dockerized Postgres database with pgvector
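To make "text chunking with configurable overlap" concrete, here is a minimal sketch that splits text into fixed-size character windows. The function name, the character-based splitting, and the default values are assumptions for illustration; the project's actual chunker may split on tokens or lines instead.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters, so content that
    straddles a boundary appears in full in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


example_chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
```

A larger overlap improves the chance that a relevant passage is retrieved intact, at the cost of storing and embedding more chunks.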

Getting Started

Prerequisites

Python 3.10+
Docker
Poetry (will be installed automatically by the setup script)

Poetry Management

This project uses Poetry for dependency management. Key Poetry commands are wrapped in Taskfile tasks:

Install dependencies:

task install

Set up development environment:

task setup-dev

Update dependencies:

task update-deps

Export requirements files:

task export-reqs

Check dependency status:

task verify-deps

The project includes both runtime and development dependencies specified in pyproject.toml.

Installation

Clone the repository:

git clone https://github.com/SpillwaveSolutions/vector-rag
cd vector-rag

Set up the development environment:

task setup-dev

Configure environment variables:

cp environment/.env.example .env
# Edit .env with your settings:
# - Database credentials
# - OpenAI API key (if using OpenAI embeddings)

Running the System

Start the Database

The system uses a Dockerized Postgres database with pgvector:

task db:up

Run Examples

With mock embeddings (no API key required):

task demo:mock

With OpenAI embeddings (requires API key in .env):

task demo:openai

Interactive Database Access

To access the database directly:

task psql

Testing

The test suite is run through Taskfile tasks:

Run all tests:

task test:all

Run integration tests:

task test:integration

Run tests with coverage report:

task test:coverage

Run a specific test:

task test:single -- tests/path/to/test_file.py::test_name

Development Workflow

Code Formatting and Linting

task format  # Runs black and isort
task typecheck  # Runs mypy
task lint  # Runs all code quality checks

Dependency Management

Update dependencies:

task update-deps

Export requirements files:

task export-reqs

Database Management

Recreate the database from scratch:

task db:recreate

Stop the database:

task db:down

Configuration

The system is configured through environment variables in .env. Key settings include:

DB_*: Database connection settings
OPENAI_API_KEY: Required for OpenAI embeddings
LOCAL_EMBEDDING: Set to true to use local SentenceTransformers
EMBEDDINGS_DIM: Vector dimension; must match the output size of the embedding model (384 for local, 1536 for OpenAI)
CHUNK_SIZE/CHUNK_OVERLAP: Text chunking parameters

License

MIT License

Contributing

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Create a Pull Request

Please ensure all tests pass and code is properly formatted before submitting PRs.

Release history

0.1.1 (this version): Mar 14, 2025
0.1.0: Jan 9, 2025

Download files

Source Distribution

vector_rag-0.1.1.tar.gz (19.3 kB), uploaded Mar 14, 2025, Source

Built Distribution

vector_rag-0.1.1-py3-none-any.whl (26.2 kB), uploaded Mar 14, 2025, Python 3

File details

Details for the file vector_rag-0.1.1.tar.gz.

File metadata

Download URL: vector_rag-0.1.1.tar.gz
Upload date: Mar 14, 2025
Size: 19.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for vector_rag-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`2d035f33ee6e95b7495cfe9def7b5eed1a4f4033064b97644cd205eca8e6fa7b`
MD5	`6951d8735d8940f8074ec9734a85e4fa`
BLAKE2b-256	`b1b1d393bfd3fc4900fa7dac879afba780d4a3163fd693bfb857f9a9c302ad80`

File details

Details for the file vector_rag-0.1.1-py3-none-any.whl.

File metadata

Download URL: vector_rag-0.1.1-py3-none-any.whl
Upload date: Mar 14, 2025
Size: 26.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for vector_rag-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7809061e8df272edd638005b91152979646d8ab15321e61d918a06265820c39d`
MD5	`7d3bec780bc8d35e32632b650b4a9eed`
BLAKE2b-256	`55830e6bebb3a6d6bda9cc4be34fda3659faf85fa70e90b7aebf9a2df6027d2c`
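The published digests can be checked against a downloaded file with the standard library. The sketch below is one way to do it; the helper names are hypothetical, and the local path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of(path: str, block_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with the published value."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (assumes the sdist was downloaded to the current directory):
# verify_sha256(
#     "vector_rag-0.1.1.tar.gz",
#     "2d035f33ee6e95b7495cfe9def7b5eed1a4f4033064b97644cd205eca8e6fa7b",
# )
```

A result of `False` means the file differs from the one that was uploaded and should not be installed.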
