RAG from documents in a local directory
Local Directory RAG
A simple tool for Retrieval-Augmented Generation (RAG) using documents from your local filesystem.
Overview
Local Directory RAG allows you to:
- Create vector embeddings from your local documents (PDF, TXT)
- Query these documents using natural language, leveraging OpenAI's language models
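The retrieval half of this workflow can be illustrated with a toy sketch. This is not the project's implementation: a bag-of-words counter stands in for a real embedding model, and the function names `embed`, `cosine`, and `retrieve` are hypothetical. It only shows the idea of ranking document chunks by vector similarity to a query before handing the best matches to a language model.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "Poetry manages the project dependencies",
    "The vector database stores document embeddings",
    "Docker Compose runs the container",
]
top = retrieve("where are embeddings stored", chunks)
print(top)
```

In a real pipeline the count vectors would be replaced by model-generated embeddings and the sorted list by a vector database lookup.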
Requirements
- Python 3.13 or higher
- OpenAI API key and other parameters (set in your .env file)
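A quick way to confirm the interpreter requirement before installing is sketched below. The helper name `python_is_supported` is hypothetical and not part of the project.

```python
import sys

# Minimum version taken from the requirements list above.
MINIMUM_PYTHON = (3, 13)


def python_is_supported(version=None) -> bool:
    """Return True when the (major, minor) version meets the minimum."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MINIMUM_PYTHON


if not python_is_supported():
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is older than 3.13")
```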
Installation
This project uses Poetry for dependency management.
- Install Poetry by following the instructions in the official documentation.
  Quick installation methods:
  # For Linux, macOS, Windows (WSL)
  curl -sSL https://install.python-poetry.org | python3 -
  # For Windows PowerShell
  (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -
- Install Project Dependencies
  # Clone the repository
  git clone https://github.com/sualeh/local-dir-rag.git
  cd local-dir-rag
- Install dependencies using Poetry
  poetry install --extras "dev"
  poetry show --tree
Configuration
Copy ".env.example" to ".env" in the project root. Then update ".env" with your OpenAI API key, the location of your documents, and the path where the vector database should be created.
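As a sanity check on the finished file, a few lines of standard-library Python can parse it and report missing entries. This is only a sketch: the variable names `OPENAI_API_KEY`, `DOCS_DIRECTORY`, and `VECTOR_DB_PATH` are assumptions for illustration, and the authoritative names are the ones in ".env.example". The parser handles simple `KEY=VALUE` lines only.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Hypothetical variable names and placeholder values, for illustration only.
sample = """
# OpenAI credentials
OPENAI_API_KEY="sk-your-key-here"
DOCS_DIRECTORY=/path/to/docs
VECTOR_DB_PATH=/path/to/vector_db
"""

settings = parse_env(sample)
required = ("OPENAI_API_KEY", "DOCS_DIRECTORY", "VECTOR_DB_PATH")
missing = [name for name in required if not settings.get(name)]
print("missing settings:", missing)
```

To check a real file, pass the contents of ".env" to `parse_env` instead of the sample string.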
Usage
- Create Vector Database
  poetry run python -m local_dir_rag.main embed --docs-directory /path/to/docs --vector-db-path /path/to/vector_db
- Query Documents
  poetry run python -m local_dir_rag.main query --vector-db-path /path/to/vector_db
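When the two commands above are driven from a script, it can help to assemble the argument lists in one place. The sketch below uses only the subcommands and flags shown above; the helper `build_command` is hypothetical and not part of the project.

```python
import shlex
from typing import List, Optional


def build_command(
    subcommand: str,
    vector_db_path: str,
    docs_directory: Optional[str] = None,
) -> List[str]:
    """Build the argument list for the embed or query subcommand."""
    if subcommand not in ("embed", "query"):
        raise ValueError(f"unknown subcommand: {subcommand}")
    argv = ["poetry", "run", "python", "-m", "local_dir_rag.main", subcommand]
    if subcommand == "embed":
        if docs_directory is None:
            raise ValueError("embed needs a documents directory")
        argv += ["--docs-directory", docs_directory]
    argv += ["--vector-db-path", vector_db_path]
    return argv


embed_cmd = build_command("embed", "/path/to/vector_db", docs_directory="/path/to/docs")
query_cmd = build_command("query", "/path/to/vector_db")
print(shlex.join(embed_cmd))
print(shlex.join(query_cmd))
```

Either list can be passed to `subprocess.run(..., check=True)` from the project root to execute the step.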
Development and Testing
- Install dependencies, as above.
- Run all tests:
  poetry run pytest
  Or, run a single test:
  poetry run pytest tests/test_document_loader.py::test_load_document
Docker Compose Usage
You can also use Docker Compose for easier management of the Local RAG container:
- Clone the project, as described above.
- Configure the ".env" file as described above.
- Run the application using Docker Compose:
  # For embedding documents
  docker-compose run local-dir-rag embed
  # For querying documents
  docker-compose run local-dir-rag query
  You can also pass additional arguments:
  docker-compose run local-dir-rag embed --docs-directory /data/docs --vector-db-path /data/vector_db
This approach simplifies volume mounting and environment variable management, especially when working with the tool regularly.
Download files
File details
Details for the file local_dir_rag-0.4.1.tar.gz.
File metadata
- Download URL: local_dir_rag-0.4.1.tar.gz
- Upload date:
- Size: 6.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 52e766bd81ab8e256a95c12d73c41504df826a5a2cc9c3f1247b48cc7677d22d |
| MD5 | b521a8e841c4f06a42f3240217ba7a99 |
| BLAKE2b-256 | df9a94c1d72eb2149d2d6e457d336f77c93f6904b9cf3162d4f00566393c71fd |
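A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch; the helper names `sha256_of` and `matches` are hypothetical, and the local path of the download is up to you.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the sdist was saved in the current directory:
# matches(Path("local_dir_rag-0.4.1.tar.gz"),
#         "52e766bd81ab8e256a95c12d73c41504df826a5a2cc9c3f1247b48cc7677d22d")
```

SHA256 is the digest to rely on here; MD5 is listed for completeness but is not collision resistant.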
Provenance
The following attestation bundles were made for local_dir_rag-0.4.1.tar.gz:
Publisher: publish-pypi.yml on sualeh/local-dir-rag
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: local_dir_rag-0.4.1.tar.gz
  - Subject digest: 52e766bd81ab8e256a95c12d73c41504df826a5a2cc9c3f1247b48cc7677d22d
- Sigstore transparency entry: 208220742
- Sigstore integration time:
- Permalink: sualeh/local-dir-rag@4e6c92f05ddff58385aaff98da4013e6569798f8
- Branch / Tag: refs/tags/v0.4.1
- Owner: https://github.com/sualeh
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@4e6c92f05ddff58385aaff98da4013e6569798f8
- Trigger Event: push
File details
Details for the file local_dir_rag-0.4.1-py3-none-any.whl.
File metadata
- Download URL: local_dir_rag-0.4.1-py3-none-any.whl
- Upload date:
- Size: 9.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a153d0146f3014eb5c0b8d12c03e23c83c43a2db05252cb22c3877fe52b8efd8 |
| MD5 | 208409d0047c05b1f7d780df2c980ffa |
| BLAKE2b-256 | 4812f6698acf59452118d8700363002d251b1cacb0a00232dbc98f30d57f7aff |
Provenance
The following attestation bundles were made for local_dir_rag-0.4.1-py3-none-any.whl:
Publisher: publish-pypi.yml on sualeh/local-dir-rag
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: local_dir_rag-0.4.1-py3-none-any.whl
  - Subject digest: a153d0146f3014eb5c0b8d12c03e23c83c43a2db05252cb22c3877fe52b8efd8
- Sigstore transparency entry: 208220746
- Sigstore integration time:
- Permalink: sualeh/local-dir-rag@4e6c92f05ddff58385aaff98da4013e6569798f8
- Branch / Tag: refs/tags/v0.4.1
- Owner: https://github.com/sualeh
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@4e6c92f05ddff58385aaff98da4013e6569798f8
- Trigger Event: push