nyrag

nyrag is a simple tool for building RAG (retrieval-augmented generation) applications. It crawls websites or processes local documents, deploys the content to Vespa for semantic search, and provides an integrated chat UI.

Installation

pip install nyrag

For development:

git clone https://github.com/abhishekkrthakur/nyrag.git
cd nyrag
pip install -e .

Quick Start

nyrag operates in two deployment modes (Local or Cloud) and two data modes (Web or Docs):

| Deployment | Data Mode | Description |
|------------|-----------|-------------|
| Local | Web | Crawl websites → Local Vespa Docker |
| Local | Docs | Process documents → Local Vespa Docker |
| Cloud | Web | Crawl websites → Vespa Cloud |
| Cloud | Docs | Process documents → Vespa Cloud |

Local Mode

Runs Vespa in a local Docker container. Great for development and testing.

Web Crawling (Local)

export NYRAG_LOCAL=1

nyrag --config configs/example.yml

Example config for web crawling:

name: mywebsite
mode: web
start_loc: https://example.com/
exclude:
  - https://example.com/admin/*
  - https://example.com/private/*

crawl_params:
  respect_robots_txt: true
  follow_subdomains: true
  user_agent_type: chrome

rag_params:
  embedding_model: sentence-transformers/all-MiniLM-L6-v2
  chunk_size: 1024
  chunk_overlap: 50
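
The `exclude` entries in the config above look like glob patterns. The following is a minimal sketch of how such patterns could be matched against candidate URLs, assuming shell-style wildcards as implemented by Python's `fnmatch` module. This README does not confirm which matcher nyrag actually uses, and `is_excluded` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# Patterns copied from the example config above.
EXCLUDE = [
    "https://example.com/admin/*",
    "https://example.com/private/*",
]


def is_excluded(url: str, patterns: list[str]) -> bool:
    """Return True if the URL matches any glob pattern (hypothetical helper)."""
    # Assumption: "*" matches any run of characters, including "/".
    return any(fnmatchcase(url, pattern) for pattern in patterns)


candidates = [
    "https://example.com/docs/intro",
    "https://example.com/admin/users",
    "https://example.com/private/report.html",
]
kept = [url for url in candidates if not is_excluded(url, EXCLUDE)]
print(kept)
```

Under this assumption, only the `/docs/intro` URL survives the filter.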

Document Processing (Local)

export NYRAG_LOCAL=1

nyrag --config configs/doc_example.yml

Example config for document processing:

name: mydocs
mode: docs
start_loc: /path/to/documents/
exclude:
  - "*.csv"

doc_params:
  recursive: true
  file_extensions:
    - .pdf
    - .docx
    - .txt
    - .md

rag_params:
  embedding_model: sentence-transformers/all-mpnet-base-v2
  chunk_size: 512
  chunk_overlap: 50
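
To make the interplay of `recursive`, `file_extensions`, and `exclude` concrete, here is a rough sketch of a file-discovery step. It assumes `exclude` patterns are matched against file names with shell-style globs and that extensions are compared case-insensitively. Neither is confirmed by this README, and `find_documents` is a hypothetical helper, not part of nyrag's API.

```python
import tempfile
from fnmatch import fnmatch
from pathlib import Path


def find_documents(root, extensions, exclude, recursive=True):
    """Return a sorted list of files under root that pass the filters (hypothetical helper)."""
    root = Path(root)
    # recursive=True walks subdirectories; otherwise only the top level is listed.
    walker = root.rglob("*") if recursive else root.glob("*")
    allowed = {ext.lower() for ext in extensions}
    selected = []
    for path in walker:
        if not path.is_file():
            continue
        if path.suffix.lower() not in allowed:
            continue
        # Assumption: exclude patterns apply to the file name, not the full path.
        if any(fnmatch(path.name, pattern) for pattern in exclude):
            continue
        selected.append(path)
    return sorted(selected)


# Demonstrate on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "sub").mkdir()
    for name in ["guide.pdf", "notes.md", "data.csv", "sub/readme.txt", "sub/image.png"]:
        (base / name).write_text("sample")
    found = [
        p.relative_to(base).as_posix()
        for p in find_documents(base, [".pdf", ".docx", ".txt", ".md"], ["*.csv"])
    ]
    print(found)
```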

Chat UI (Local)

After crawling/processing is complete:

export NYRAG_CONFIG=configs/example.yml
export OPENROUTER_API_KEY=your-api-key
export OPENROUTER_MODEL=openai/gpt-5.1

uvicorn nyrag.api:app --host 0.0.0.0 --port 8000

Open http://localhost:8000/chat


Cloud Mode

Deploys to Vespa Cloud for production use.

Web Crawling (Cloud)

export NYRAG_LOCAL=0
export VESPA_CLOUD_TENANT=your-tenant

nyrag --config configs/example.yml

Document Processing (Cloud)

export NYRAG_LOCAL=0
export VESPA_CLOUD_TENANT=your-tenant

nyrag --config configs/doc_example.yml

Chat UI (Cloud)

After crawling/processing is complete:

export NYRAG_CONFIG=configs/example.yml
export VESPA_URL="https://<your-endpoint>.z.vespa-app.cloud"
export OPENROUTER_API_KEY=your-api-key
export OPENROUTER_MODEL=openai/gpt-5.1

uvicorn nyrag.api:app --host 0.0.0.0 --port 8000

Open http://localhost:8000/chat


Configuration Reference

Web Mode Parameters (crawl_params)

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| respect_robots_txt | bool | true | Respect robots.txt rules |
| aggressive_crawl | bool | false | Faster crawling with more concurrent requests |
| follow_subdomains | bool | true | Follow links to subdomains |
| strict_mode | bool | false | Only crawl URLs matching start pattern |
| user_agent_type | str | chrome | chrome, firefox, safari, mobile, bot |
| custom_user_agent | str | None | Custom user agent string |
| allowed_domains | list | None | Explicitly allowed domains |
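
The scope-related parameters above (`follow_subdomains`, `strict_mode`, `allowed_domains`) can be read as a URL filter. The sketch below shows one plausible interpretation only: it assumes `strict_mode` means a prefix match on `start_loc`, that `allowed_domains` adds extra hosts, and that subdomains are detected by host suffix. nyrag's real rules may differ, and `in_scope` is a hypothetical function.

```python
from urllib.parse import urlparse


def in_scope(url, start_url, follow_subdomains=True, strict_mode=False, allowed_domains=None):
    """Decide whether a discovered URL should be crawled (illustrative interpretation)."""
    host = (urlparse(url).hostname or "").lower()
    start_host = (urlparse(start_url).hostname or "").lower()

    # Assumption: strict mode keeps only URLs that extend the start URL.
    if strict_mode:
        return url.startswith(start_url)

    # Assumption: explicitly allowed domains are always in scope.
    if allowed_domains is not None and host in allowed_domains:
        return True

    if host == start_host:
        return True

    # A subdomain ends with ".<start host>", e.g. blog.example.com.
    return follow_subdomains and host.endswith("." + start_host)


start = "https://example.com/"
print(in_scope("https://blog.example.com/post", start))
print(in_scope("https://blog.example.com/post", start, follow_subdomains=False))
```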

Docs Mode Parameters (doc_params)

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| recursive | bool | true | Process subdirectories |
| include_hidden | bool | false | Include hidden files |
| follow_symlinks | bool | false | Follow symbolic links |
| max_file_size_mb | float | None | Max file size in MB |
| file_extensions | list | None | Only process these extensions |

RAG Parameters (rag_params)

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| embedding_model | str | sentence-transformers/all-MiniLM-L6-v2 | Embedding model |
| embedding_dim | int | 384 | Embedding dimension |
| chunk_size | int | 1024 | Chunk size for text splitting |
| chunk_overlap | int | 50 | Overlap between chunks |
| distance_metric | str | angular | Distance metric |
| max_tokens | int | 8192 | Max tokens per document |
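
To illustrate how `chunk_size` and `chunk_overlap` interact, here is a minimal sliding-window splitter. It assumes both values are measured in characters. The README does not say whether nyrag counts characters or tokens, so treat this as a sketch of the general technique rather than the tool's implementation; `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 1024, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each new window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)
print(len(chunks), [len(c) for c in chunks])
```

With the defaults, consecutive chunks start 974 characters apart, so the last 50 characters of one chunk reappear at the start of the next.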

Environment Variables

Deployment Mode

| Variable | Description |
|----------|-------------|
| NYRAG_LOCAL | 1 for local Docker, 0 for Vespa Cloud |

Local Mode

| Variable | Description |
|----------|-------------|
| NYRAG_VESPA_DOCKER_IMAGE | Docker image (default: vespaengine/vespa:latest) |

Cloud Mode

| Variable | Description |
|----------|-------------|
| VESPA_CLOUD_TENANT | Your Vespa Cloud tenant |
| VESPA_CLOUD_APPLICATION | Application name (optional) |
| VESPA_CLOUD_INSTANCE | Instance name (default: default) |
| VESPA_CLOUD_API_KEY_PATH | Path to API key file |
| VESPA_CLIENT_CERT | Path to mTLS certificate |
| VESPA_CLIENT_KEY | Path to mTLS private key |

Chat UI

| Variable | Description |
|----------|-------------|
| NYRAG_CONFIG | Path to config file |
| VESPA_URL | Vespa endpoint URL (optional for local, required for cloud) |
| OPENROUTER_API_KEY | OpenRouter API key for LLM |
| OPENROUTER_MODEL | LLM model (e.g., openai/gpt-4o) |
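
Before launching the chat UI it can help to check that the variables above are set. The sketch below is a hypothetical pre-flight helper, not part of nyrag. It assumes that `NYRAG_CONFIG`, `OPENROUTER_API_KEY`, and `OPENROUTER_MODEL` are always needed (as in the Chat UI examples), that `VESPA_URL` is needed only when `NYRAG_LOCAL=0`, and that an unset `NYRAG_LOCAL` means local mode.

```python
import os


def missing_chat_settings(env) -> list[str]:
    """Return names of chat UI variables that are unset or empty (hypothetical helper)."""
    # Assumption: these three are always needed, as in the Chat UI examples.
    required = ["NYRAG_CONFIG", "OPENROUTER_API_KEY", "OPENROUTER_MODEL"]
    # Per the table above, VESPA_URL is required only for Vespa Cloud.
    # Assumption: an unset NYRAG_LOCAL is treated as local mode here.
    if env.get("NYRAG_LOCAL", "1") == "0":
        required.append("VESPA_URL")
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_chat_settings(os.environ)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Chat UI environment looks complete.")
```

The function takes any mapping, so it can be exercised with a plain dict instead of the real environment.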
