
NyRAG

NyRAG (pronounced knee-RAG) is a simple tool for building RAG applications: crawl websites or process documents, deploy the content to Vespa for hybrid search, and chat with it through an integrated UI.

(Screenshot: NyRAG Chat UI)

How It Works

When a user asks a question, NyRAG performs a multi-stage retrieval process:

  1. Query Enhancement: An LLM generates additional search queries based on the user's question and initial context to improve retrieval coverage
  2. Embedding Generation: Each query is converted to embeddings using the configured SentenceTransformer model
  3. Vespa Search: Queries are executed against Vespa using nearestNeighbor search with the best_chunk_score ranking profile to find the most relevant document chunks
  4. Chunk Fusion: Results from all queries are aggregated, deduplicated, and ranked by score to select the top-k most relevant chunks
  5. Answer Generation: The retrieved context is sent to an LLM which generates a grounded answer based only on the provided chunks

This multi-query RAG approach with chunk-level retrieval ensures answers are comprehensive and grounded in your actual content, whether from crawled websites or processed documents.
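
To make the flow concrete, here is a minimal Python sketch of steps 2–4. The helper names (search_vespa, retrieve) and the hit fields are illustrative placeholders, not NyRAG's actual internals; only the embedding model and the fuse-deduplicate-rank logic follow the description above.

from sentence_transformers import SentenceTransformer

# Embedding model from rag_params (step 2).
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def search_vespa(query_embedding, hits=10):
    """Placeholder for step 3: a Vespa nearestNeighbor query using the
    best_chunk_score ranking profile. Assumed to return hits shaped like
    {"id": ..., "chunk": ..., "score": ...}."""
    raise NotImplementedError

def retrieve(queries, top_k=5):
    # Step 2: embed the original question plus the LLM-generated query variants.
    embeddings = model.encode(queries)
    # Step 3: run each query embedding against Vespa.
    hits = [hit for emb in embeddings for hit in search_vespa(emb)]
    # Step 4: deduplicate by chunk id, keep the best score per chunk, rank, cut to top_k.
    best = {}
    for hit in hits:
        if hit["id"] not in best or hit["score"] > best[hit["id"]]["score"]:
            best[hit["id"]] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:top_k]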

LLM Support

NyRAG works with any OpenAI-compatible API, including:

  • OpenRouter (100+ models from various providers)
  • Ollama (local models: Llama, Mistral, Qwen, etc.)
  • LM Studio (local GUI for running models)
  • vLLM (high-performance local or remote inference)
  • LocalAI (local OpenAI drop-in replacement)
  • OpenAI (GPT-4, GPT-3.5, etc.)
  • Any other service implementing the OpenAI API format
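
In practice, "OpenAI-compatible" just means the server accepts standard chat-completions requests, so any OpenAI client can talk to it once pointed at the right base URL. A minimal sketch (the endpoint and model below are placeholders for whichever provider you use; this is not NyRAG's own code):

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # e.g. a local Ollama server
    api_key="dummy",                       # local servers accept any non-empty key
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)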

Installation

pip install nyrag

We recommend uv:

uv init --python 3.10
uv venv
uv sync
source .venv/bin/activate
uv pip install -U nyrag

For development:

git clone https://github.com/abhishekkrthakur/nyrag.git
cd nyrag
pip install -e .

Quick Start

NyRAG operates in two deployment modes (Local, Cloud) and two data modes (Web, Docs):

| Deployment | Data Mode | Description |
|---|---|---|
| Local | Web | Crawl websites → local Vespa Docker |
| Local | Docs | Process documents → local Vespa Docker |
| Cloud | Web | Crawl websites → Vespa Cloud |
| Cloud | Docs | Process documents → Vespa Cloud |

Local Mode

Runs Vespa in a local Docker container. Great for development and testing.

Web Crawling (Local)

export NYRAG_LOCAL=1

nyrag --config configs/example.yml

Example config for web crawling:

name: mywebsite
mode: web
start_loc: https://example.com/
exclude:
  - https://example.com/admin/*
  - https://example.com/private/*

crawl_params:
  respect_robots_txt: true
  follow_subdomains: true
  user_agent_type: chrome

rag_params:
  embedding_model: sentence-transformers/all-MiniLM-L6-v2
  chunk_size: 1024
  chunk_overlap: 50

Document Processing (Local)

export NYRAG_LOCAL=1

nyrag --config configs/doc_example.yml

Example config for document processing:

name: mydocs
mode: docs
start_loc: /path/to/documents/
exclude:
  - "*.csv"

doc_params:
  recursive: true
  file_extensions:
    - .pdf
    - .docx
    - .txt
    - .md

rag_params:
  embedding_model: sentence-transformers/all-mpnet-base-v2
  chunk_size: 512
  chunk_overlap: 50

Chat UI (Local)

After crawling/processing is complete:

export NYRAG_CONFIG=configs/example.yml
export OPENROUTER_API_KEY=your-api-key
export OPENROUTER_MODEL=openai/gpt-5.1

uvicorn nyrag.api:app --host 0.0.0.0 --port 8000

Open http://localhost:8000/chat


Cloud Mode

Deploys to Vespa Cloud for production use.

Web Crawling (Cloud)

export NYRAG_LOCAL=0
export VESPA_CLOUD_TENANT=your-tenant

nyrag --config configs/example.yml

Document Processing (Cloud)

export NYRAG_LOCAL=0
export VESPA_CLOUD_TENANT=your-tenant

nyrag --config configs/doc_example.yml

Chat UI (Cloud)

After crawling/processing is complete:

export NYRAG_CONFIG=configs/example.yml
export VESPA_URL="https://<your-endpoint>.z.vespa-app.cloud"
export OPENROUTER_API_KEY=your-api-key
export OPENROUTER_MODEL=openai/gpt-5.1

uvicorn nyrag.api:app --host 0.0.0.0 --port 8000

Open http://localhost:8000/chat


Configuration Reference

Web Mode Parameters (crawl_params)

| Parameter | Type | Default | Description |
|---|---|---|---|
| respect_robots_txt | bool | true | Respect robots.txt rules |
| aggressive_crawl | bool | false | Faster crawling with more concurrent requests |
| follow_subdomains | bool | true | Follow links to subdomains |
| strict_mode | bool | false | Only crawl URLs matching the start pattern |
| user_agent_type | str | chrome | One of: chrome, firefox, safari, mobile, bot |
| custom_user_agent | str | None | Custom user agent string |
| allowed_domains | list | None | Explicitly allowed domains |
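
As a rough illustration of how the scoping parameters interact, the predicate below mimics the documented behaviour of allowed_domains, strict_mode, and follow_subdomains. It is a hypothetical sketch, not NyRAG's crawler code.

from urllib.parse import urlparse

def should_crawl(url, start_loc, follow_subdomains=True,
                 strict_mode=False, allowed_domains=None):
    host = urlparse(url).hostname or ""
    start_host = urlparse(start_loc).hostname or ""
    if allowed_domains:
        # Only explicitly allowed domains (and their subdomains) are crawled.
        return any(host == d or host.endswith("." + d) for d in allowed_domains)
    if strict_mode:
        # Only URLs matching the start pattern.
        return url.startswith(start_loc)
    if follow_subdomains:
        return host == start_host or host.endswith("." + start_host)
    return host == start_host

print(should_crawl("https://docs.example.com/guide", "https://example.com/"))  # True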

Docs Mode Parameters (doc_params)

| Parameter | Type | Default | Description |
|---|---|---|---|
| recursive | bool | true | Process subdirectories |
| include_hidden | bool | false | Include hidden files |
| follow_symlinks | bool | false | Follow symbolic links |
| max_file_size_mb | float | None | Max file size in MB |
| file_extensions | list | None | Only process these extensions |
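
The sketch below shows how these selection options could combine when walking a directory (follow_symlinks is omitted for brevity). It is illustrative only; NyRAG's actual document loader may differ.

from pathlib import Path

def iter_documents(root, recursive=True, include_hidden=False,
                   max_file_size_mb=None, file_extensions=None):
    pattern = "**/*" if recursive else "*"
    for path in Path(root).glob(pattern):
        if not path.is_file():
            continue
        if not include_hidden and path.name.startswith("."):
            continue
        if file_extensions and path.suffix.lower() not in file_extensions:
            continue
        if max_file_size_mb and path.stat().st_size > max_file_size_mb * 1024 * 1024:
            continue
        yield path

for doc in iter_documents("/path/to/documents/", file_extensions=[".pdf", ".docx", ".txt", ".md"]):
    print(doc)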

RAG Parameters (rag_params)

| Parameter | Type | Default | Description |
|---|---|---|---|
| embedding_model | str | sentence-transformers/all-MiniLM-L6-v2 | Embedding model |
| embedding_dim | int | 384 | Embedding dimension |
| chunk_size | int | 1024 | Chunk size for text splitting |
| chunk_overlap | int | 50 | Overlap between chunks |
| distance_metric | str | angular | Distance metric |
| max_tokens | int | 8192 | Max tokens per document |
| llm_base_url | str | None | LLM API base URL (OpenAI-compatible) |
| llm_model | str | None | LLM model name |
| llm_api_key | str | None | LLM API key |
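
As a rough sketch of what chunk_size and chunk_overlap mean: each chunk starts chunk_size - chunk_overlap units after the previous one, so adjacent chunks share chunk_overlap units, and each chunk is embedded with the configured model. NyRAG's splitter may count tokens rather than characters, so treat this as an approximation rather than the actual implementation.

from sentence_transformers import SentenceTransformer

def split_text(text, chunk_size=1024, chunk_overlap=50):
    # Successive chunks start (chunk_size - chunk_overlap) characters apart.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = split_text("x" * 3000)
print([len(c) for c in chunks])  # [1024, 1024, 1024, 78]

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
print(model.encode(chunks[0]).shape)  # (384,) -> must match embedding_dim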

Environment Variables

Deployment Mode

| Variable | Description |
|---|---|
| NYRAG_LOCAL | 1 for local Docker, 0 for Vespa Cloud |

Local Mode

| Variable | Description |
|---|---|
| NYRAG_VESPA_DOCKER_IMAGE | Docker image (default: vespaengine/vespa:latest) |

Cloud Mode

| Variable | Description |
|---|---|
| VESPA_CLOUD_TENANT | Your Vespa Cloud tenant |
| VESPA_CLOUD_APPLICATION | Application name (optional) |
| VESPA_CLOUD_INSTANCE | Instance name (default: default) |
| VESPA_CLOUD_API_KEY_PATH | Path to API key file |
| VESPA_CLIENT_CERT | Path to mTLS certificate |
| VESPA_CLIENT_KEY | Path to mTLS private key |

Chat UI

| Variable | Description |
|---|---|
| NYRAG_CONFIG | Path to config file |
| VESPA_URL | Vespa endpoint URL (optional for local, required for cloud) |
| LLM_BASE_URL | LLM API base URL (OpenAI-compatible) |
| LLM_MODEL | LLM model name |
| LLM_API_KEY | LLM API key |
| OPENROUTER_API_KEY | OpenRouter API key (alternative to LLM_API_KEY) |
| OPENROUTER_MODEL | OpenRouter model (alternative to LLM_MODEL) |
| OPENROUTER_BASE_URL | OpenRouter base URL (alternative to LLM_BASE_URL) |

Using Local Models

NyRAG supports running LLMs locally using any OpenAI-compatible server. Here are some popular options:

Ollama

  1. Install Ollama: https://ollama.ai

  2. Pull a model:

    ollama pull llama3.2
    
  3. Configure NyRAG (option 1: environment variables):

    export LLM_BASE_URL=http://localhost:11434/v1
    export LLM_MODEL=llama3.2
    export LLM_API_KEY=dummy  # Any value works
    
  4. Or configure in YAML:

    rag_params:
      llm_base_url: http://localhost:11434/v1
      llm_model: llama3.2
      llm_api_key: dummy
    
  5. Start the chat UI:

    nyrag-api
    

LM Studio

  1. Install LM Studio: https://lmstudio.ai

  2. Load a model and start the server (default port: 1234)

  3. Configure NyRAG:

    export LLM_BASE_URL=http://localhost:1234/v1
    export LLM_MODEL=local-model  # Model name from LM Studio
    export LLM_API_KEY=dummy
    

vLLM

  1. Install and run vLLM:

    pip install vllm
    python -m vllm.entrypoints.openai.api_server \
      --model meta-llama/Llama-3.2-3B-Instruct \
      --port 8001  # any free port; 8000 is used by the NyRAG chat UI in the examples above
    
  2. Configure NyRAG:

    export LLM_BASE_URL=http://localhost:8001/v1
    export LLM_MODEL=meta-llama/Llama-3.2-3B-Instruct
    export LLM_API_KEY=dummy
    

OpenRouter (Cloud)

For access to 100+ models without local setup:

export LLM_BASE_URL=https://openrouter.ai/api/v1
export LLM_MODEL=anthropic/claude-3.5-sonnet
export LLM_API_KEY=your-openrouter-key

Or use the legacy environment variables:

export OPENROUTER_API_KEY=your-key
export OPENROUTER_MODEL=anthropic/claude-3.5-sonnet
