NyRAG
NyRAG (pronounced as knee-RAG) is a simple tool for building RAG applications by crawling websites or processing documents, then deploying to Vespa for hybrid search with an integrated chat UI.
How It Works
When a user asks a question, NyRAG performs a multi-stage retrieval process:
- Query Enhancement: An LLM generates additional search queries based on the user's question and initial context to improve retrieval coverage
- Embedding Generation: Each query is converted to embeddings using the configured SentenceTransformer model
- Vespa Search: Queries are executed against Vespa using nearestNeighbor search with the best_chunk_score ranking profile to find the most relevant document chunks
- Chunk Fusion: Results from all queries are aggregated, deduplicated, and ranked by score to select the top-k most relevant chunks
- Answer Generation: The retrieved context is sent to an LLM which generates a grounded answer based only on the provided chunks
This multi-query RAG approach with chunk-level retrieval ensures answers are comprehensive and grounded in your actual content, whether from crawled websites or processed documents.
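The fusion step above can be sketched in a few lines. This is an illustrative example, not NyRAG's actual implementation: results from the generated queries are merged, deduplicated by chunk id (keeping the best score per chunk), and the top-k chunks are returned. All names here are hypothetical.

```python
# Merge search results from several queries, dedupe by chunk id
# (keeping the highest score), then keep the top-k by score.
def fuse_chunks(result_lists, top_k=5):
    best = {}
    for results in result_lists:
        for chunk_id, score, text in results:
            if chunk_id not in best or score > best[chunk_id][0]:
                best[chunk_id] = (score, text)
    ranked = sorted(best.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(cid, score, text) for cid, (score, text) in ranked[:top_k]]

# Two queries returning overlapping chunks:
q1 = [("a", 0.9, "chunk A"), ("b", 0.4, "chunk B")]
q2 = [("a", 0.7, "chunk A"), ("c", 0.8, "chunk C")]
print(fuse_chunks([q1, q2], top_k=2))  # [('a', 0.9, 'chunk A'), ('c', 0.8, 'chunk C')]
```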
LLM Support
NyRAG works with any OpenAI-compatible API, including:
- OpenRouter (100+ models from various providers)
- Ollama (local models: Llama, Mistral, Qwen, etc.)
- LM Studio (local GUI for running models)
- vLLM (high-performance local or remote inference)
- LocalAI (local OpenAI drop-in replacement)
- OpenAI (GPT-4, GPT-3.5, etc.)
- Any other service implementing the OpenAI API format
Installation
pip install nyrag
We recommend uv:
uv init --python 3.10
uv venv
uv sync
source .venv/bin/activate
uv pip install -U nyrag
For development:
git clone https://github.com/abhishekkrthakur/nyrag.git
cd nyrag
pip install -e .
Quick Start
NyRAG operates in two deployment modes (Local or Cloud) and two data modes (Web or Docs):
| Deployment | Data Mode | Description |
|---|---|---|
| Local | Web | Crawl websites → Local Vespa Docker |
| Local | Docs | Process documents → Local Vespa Docker |
| Cloud | Web | Crawl websites → Vespa Cloud |
| Cloud | Docs | Process documents → Vespa Cloud |
Local Mode
Runs Vespa in a local Docker container. Great for development and testing.
Web Crawling (Local)
export NYRAG_LOCAL=1
nyrag --config configs/example.yml
Example config for web crawling:
name: mywebsite
mode: web
start_loc: https://example.com/
exclude:
- https://example.com/admin/*
- https://example.com/private/*
crawl_params:
respect_robots_txt: true
follow_subdomains: true
user_agent_type: chrome
rag_params:
embedding_model: sentence-transformers/all-MiniLM-L6-v2
chunk_size: 1024
chunk_overlap: 50
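As a rough illustration of how chunk_size and chunk_overlap interact, consecutive chunks are overlapping windows over the text. The sketch below is character-based for simplicity; the actual splitter may well work on tokens instead.

```python
# Split text into windows of chunk_size, each sharing chunk_overlap
# characters with the previous window (illustrative only).
def split_text(text, chunk_size=1024, chunk_overlap=50):
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

text = "".join(str(i % 10) for i in range(2500))
chunks = split_text(text, chunk_size=1024, chunk_overlap=50)
print(len(chunks))  # 3
```

With 2500 characters, a window of 1024, and an overlap of 50, each step advances 974 characters, so three chunks cover the text and each pair of neighbours shares 50 characters.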
Document Processing (Local)
export NYRAG_LOCAL=1
nyrag --config configs/doc_example.yml
Example config for document processing:
name: mydocs
mode: docs
start_loc: /path/to/documents/
exclude:
- "*.csv"
doc_params:
recursive: true
file_extensions:
- .pdf
- .docx
- .txt
- .md
rag_params:
embedding_model: sentence-transformers/all-mpnet-base-v2
chunk_size: 512
chunk_overlap: 50
Chat UI (Local)
After crawling/processing is complete:
export NYRAG_CONFIG=configs/example.yml
export OPENROUTER_API_KEY=your-api-key
export OPENROUTER_MODEL=openai/gpt-5.1
uvicorn nyrag.api:app --host 0.0.0.0 --port 8000
Open http://localhost:8000/chat
Cloud Mode
Deploys to Vespa Cloud for production use.
Web Crawling (Cloud)
export NYRAG_LOCAL=0
export VESPA_CLOUD_TENANT=your-tenant
nyrag --config configs/example.yml
Document Processing (Cloud)
export NYRAG_LOCAL=0
export VESPA_CLOUD_TENANT=your-tenant
nyrag --config configs/doc_example.yml
Chat UI (Cloud)
After crawling/processing is complete:
export NYRAG_CONFIG=configs/example.yml
export VESPA_URL="https://<your-endpoint>.z.vespa-app.cloud"
export OPENROUTER_API_KEY=your-api-key
export OPENROUTER_MODEL=openai/gpt-5.1
uvicorn nyrag.api:app --host 0.0.0.0 --port 8000
Open http://localhost:8000/chat
Configuration Reference
Web Mode Parameters (crawl_params)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `respect_robots_txt` | bool | `true` | Respect robots.txt rules |
| `aggressive_crawl` | bool | `false` | Faster crawling with more concurrent requests |
| `follow_subdomains` | bool | `true` | Follow links to subdomains |
| `strict_mode` | bool | `false` | Only crawl URLs matching the start pattern |
| `user_agent_type` | str | `chrome` | One of `chrome`, `firefox`, `safari`, `mobile`, `bot` |
| `custom_user_agent` | str | `None` | Custom user agent string |
| `allowed_domains` | list | `None` | Explicitly allowed domains |
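To make the domain options concrete, here is a hypothetical sketch of how checks like follow_subdomains and allowed_domains could be combined when deciding whether to follow a link (this is not NyRAG's actual crawler code):

```python
from urllib.parse import urlparse

# Decide whether a discovered URL should be crawled, given the
# starting domain and the (hypothetical) filtering options.
def url_allowed(url, start_domain, follow_subdomains=True, allowed_domains=None):
    host = urlparse(url).hostname or ""
    if allowed_domains and host in allowed_domains:
        return True
    if host == start_domain:
        return True
    return follow_subdomains and host.endswith("." + start_domain)

print(url_allowed("https://docs.example.com/a", "example.com"))                           # True
print(url_allowed("https://docs.example.com/a", "example.com", follow_subdomains=False))  # False
print(url_allowed("https://other.org/", "example.com"))                                   # False
```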
Docs Mode Parameters (doc_params)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `recursive` | bool | `true` | Process subdirectories |
| `include_hidden` | bool | `false` | Include hidden files |
| `follow_symlinks` | bool | `false` | Follow symbolic links |
| `max_file_size_mb` | float | `None` | Maximum file size in MB |
| `file_extensions` | list | `None` | Only process files with these extensions |
RAG Parameters (rag_params)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `embedding_model` | str | `sentence-transformers/all-MiniLM-L6-v2` | Embedding model |
| `embedding_dim` | int | `384` | Embedding dimension |
| `chunk_size` | int | `1024` | Chunk size for text splitting |
| `chunk_overlap` | int | `50` | Overlap between chunks |
| `distance_metric` | str | `angular` | Distance metric |
| `max_tokens` | int | `8192` | Max tokens per document |
| `llm_base_url` | str | `None` | LLM API base URL (OpenAI-compatible) |
| `llm_model` | str | `None` | LLM model name |
| `llm_api_key` | str | `None` | LLM API key |
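Note that `embedding_dim` must match the model's output size (all-MiniLM-L6-v2 produces 384-dimensional vectors, hence the default). The "angular" metric is the angle derived from cosine similarity; a minimal sketch of that computation:

```python
import math

# Angular distance between two vectors: the arccosine of their
# cosine similarity, in [0, pi]. Smaller means more similar.
def angular_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    cos = dot / (na * nb)
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp for float safety

print(round(angular_distance([1.0, 0.0], [0.0, 1.0]), 4))  # 1.5708 (orthogonal)
```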
Environment Variables
Deployment Mode
| Variable | Description |
|---|---|
| `NYRAG_LOCAL` | `1` for local Docker, `0` for Vespa Cloud |
Local Mode
| Variable | Description |
|---|---|
| `NYRAG_VESPA_DOCKER_IMAGE` | Docker image (default: `vespaengine/vespa:latest`) |
Cloud Mode
| Variable | Description |
|---|---|
| `VESPA_CLOUD_TENANT` | Your Vespa Cloud tenant |
| `VESPA_CLOUD_APPLICATION` | Application name (optional) |
| `VESPA_CLOUD_INSTANCE` | Instance name (default: `default`) |
| `VESPA_CLOUD_API_KEY_PATH` | Path to API key file |
| `VESPA_CLIENT_CERT` | Path to mTLS certificate |
| `VESPA_CLIENT_KEY` | Path to mTLS private key |
Chat UI
| Variable | Description |
|---|---|
| `NYRAG_CONFIG` | Path to config file |
| `VESPA_URL` | Vespa endpoint URL (optional for local, required for cloud) |
| `LLM_BASE_URL` | LLM API base URL (OpenAI-compatible API) |
| `LLM_MODEL` | LLM model name |
| `LLM_API_KEY` | LLM API key |
| `OPENROUTER_API_KEY` | OpenRouter API key (alternative to `LLM_API_KEY`) |
| `OPENROUTER_MODEL` | OpenRouter model (alternative to `LLM_MODEL`) |
| `OPENROUTER_BASE_URL` | OpenRouter base URL (alternative to `LLM_BASE_URL`) |
Using Local Models
NyRAG supports running LLMs locally using any OpenAI-compatible server. Here are some popular options:
Ollama
1. Install Ollama: https://ollama.ai
2. Pull a model:
   ollama pull llama3.2
3. Configure NyRAG (option 1: environment variables):
   export LLM_BASE_URL=http://localhost:11434/v1
   export LLM_MODEL=llama3.2
   export LLM_API_KEY=dummy  # Any value works
4. Or configure in YAML:
   rag_params:
     llm_base_url: http://localhost:11434/v1
     llm_model: llama3.2
     llm_api_key: dummy
5. Start the chat UI:
   nyrag-api
LM Studio
1. Install LM Studio: https://lmstudio.ai
2. Load a model and start the server (default port: 1234)
3. Configure NyRAG:
   export LLM_BASE_URL=http://localhost:1234/v1
   export LLM_MODEL=local-model  # Model name from LM Studio
   export LLM_API_KEY=dummy
vLLM
1. Install and run vLLM:
   pip install vllm
   python -m vllm.entrypoints.openai.api_server \
     --model meta-llama/Llama-3.2-3B-Instruct \
     --port 8000
2. Configure NyRAG:
   export LLM_BASE_URL=http://localhost:8000/v1
   export LLM_MODEL=meta-llama/Llama-3.2-3B-Instruct
   export LLM_API_KEY=dummy
OpenRouter (Cloud)
For access to 100+ models without local setup:
export LLM_BASE_URL=https://openrouter.ai/api/v1
export LLM_MODEL=anthropic/claude-3.5-sonnet
export LLM_API_KEY=your-openrouter-key
Or use the legacy environment variables:
export OPENROUTER_API_KEY=your-key
export OPENROUTER_MODEL=anthropic/claude-3.5-sonnet
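All of the backends above speak the same OpenAI-style chat-completions format, which is why swapping providers only requires changing the base URL, model, and key. The sketch below only builds a request body of that shape (it does not send anything); the grounding prompt wording is an illustrative assumption, not NyRAG's actual prompt.

```python
import json

# Build an OpenAI-style chat-completions request body that grounds
# the answer in retrieved chunks (illustrative prompt wording).
def build_chat_request(model, question, context_chunks):
    system = ("Answer using only the provided context.\n\nContext:\n"
              + "\n---\n".join(context_chunks))
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
    }

body = build_chat_request("llama3.2", "What is NyRAG?", ["NyRAG is a RAG tool."])
print(json.dumps(body, indent=2))
```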