nyrag

nyrag is a simple tool for building retrieval-augmented generation (RAG) applications. It crawls websites or processes documents, then deploys to Vespa for semantic search, with an integrated chat UI.

Installation

pip install nyrag

For development:

git clone https://github.com/abhishekkrthakur/nyrag.git
cd nyrag
pip install -e .

Quick Start

nyrag operates in two deployment modes (Local or Cloud) and two data modes (Web or Docs):

| Deployment | Data Mode | Description |
|---|---|---|
| Local | Web | Crawl websites → Local Vespa Docker |
| Local | Docs | Process documents → Local Vespa Docker |
| Cloud | Web | Crawl websites → Vespa Cloud |
| Cloud | Docs | Process documents → Vespa Cloud |

Local Mode

Runs Vespa in a local Docker container. Great for development and testing.

Web Crawling (Local)

export NYRAG_LOCAL=1

nyrag --config configs/example.yml

Example config for web crawling:

name: mywebsite
mode: web
start_loc: https://example.com/
exclude:
  - https://example.com/admin/*
  - https://example.com/private/*

crawl_params:
  respect_robots_txt: true
  follow_subdomains: true
  user_agent_type: chrome

rag_params:
  embedding_model: sentence-transformers/all-MiniLM-L6-v2
  chunk_size: 1024
  chunk_overlap: 50
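
The matching rules for the exclude list are not spelled out here. As a rough illustration only, the sketch below assumes the entries are shell-style globs and checks URLs against the two patterns from the example config. The is_excluded helper is hypothetical and not part of nyrag; the real matcher may behave differently.

```python
from fnmatch import fnmatchcase

# Patterns copied from the example web config.
EXCLUDE = [
    "https://example.com/admin/*",
    "https://example.com/private/*",
]


def is_excluded(url, patterns):
    """Return True if the URL matches any exclude pattern.

    Assumption: patterns are shell-style globs, where "*" matches any
    run of characters (including "/"). nyrag may use different rules.
    """
    return any(fnmatchcase(url, pattern) for pattern in patterns)


for url in ("https://example.com/admin/users", "https://example.com/blog/post-1"):
    print(url, "->", "skip" if is_excluded(url, EXCLUDE) else "crawl")
```

Under this assumption a URL such as https://example.com/admin (no trailing slash) would not match the first pattern, so it is worth testing patterns against a few real URLs before a long crawl.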

Document Processing (Local)

export NYRAG_LOCAL=1

nyrag --config configs/doc_example.yml

Example config for document processing:

name: mydocs
mode: docs
start_loc: /path/to/documents/
exclude:
  - "*.csv"

doc_params:
  recursive: true
  file_extensions:
    - .pdf
    - .docx
    - .txt
    - .md

rag_params:
  embedding_model: sentence-transformers/all-mpnet-base-v2
  chunk_size: 512
  chunk_overlap: 50
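
To preview which files a config like this might pick up, the following sketch walks a directory and applies the recursive, file_extensions, and exclude settings. It is an approximation under stated assumptions: exclude patterns are treated as globs on the file name, and select_files is a hypothetical helper, not a nyrag function. Hidden files, symlinks, and size limits are not handled.

```python
from fnmatch import fnmatch
from pathlib import Path


def select_files(root, extensions, exclude, recursive=True):
    """Return the sorted files under root that pass both filters.

    Assumptions: extensions are compared case-insensitively against the
    file suffix, and exclude patterns are globs matched on the file name.
    """
    root = Path(root)
    candidates = root.rglob("*") if recursive else root.glob("*")
    allowed = {ext.lower() for ext in extensions}
    selected = []
    for path in candidates:
        if not path.is_file():
            continue  # skip directories
        if path.suffix.lower() not in allowed:
            continue  # extension not listed in file_extensions
        if any(fnmatch(path.name, pattern) for pattern in exclude):
            continue  # matched an exclude pattern
        selected.append(path)
    return sorted(selected)
```

For example, select_files("/path/to/documents/", [".pdf", ".docx", ".txt", ".md"], ["*.csv"]) would list the candidate files without processing them.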

Chat UI (Local)

After crawling/processing is complete:

export NYRAG_CONFIG=configs/example.yml
export OPENROUTER_API_KEY=your-api-key
export OPENROUTER_MODEL=openai/gpt-5.1

uvicorn nyrag.api:app --host 0.0.0.0 --port 8000

Open http://localhost:8000/chat


Cloud Mode

Deploys to Vespa Cloud for production use.

Web Crawling (Cloud)

export NYRAG_LOCAL=0
export VESPA_CLOUD_TENANT=your-tenant

nyrag --config configs/example.yml

Document Processing (Cloud)

export NYRAG_LOCAL=0
export VESPA_CLOUD_TENANT=your-tenant

nyrag --config configs/doc_example.yml

Chat UI (Cloud)

After crawling/processing is complete:

export NYRAG_CONFIG=configs/example.yml
export VESPA_URL="https://<your-endpoint>.z.vespa-app.cloud"
export OPENROUTER_API_KEY=your-api-key
export OPENROUTER_MODEL=openai/gpt-5.1

uvicorn nyrag.api:app --host 0.0.0.0 --port 8000

Open http://localhost:8000/chat


Configuration Reference

Web Mode Parameters (crawl_params)

| Parameter | Type | Default | Description |
|---|---|---|---|
| respect_robots_txt | bool | true | Respect robots.txt rules |
| aggressive_crawl | bool | false | Faster crawling with more concurrent requests |
| follow_subdomains | bool | true | Follow links to subdomains |
| strict_mode | bool | false | Only crawl URLs matching start pattern |
| user_agent_type | str | chrome | chrome, firefox, safari, mobile, bot |
| custom_user_agent | str | None | Custom user agent string |
| allowed_domains | list | None | Explicitly allowed domains |
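
How follow_subdomains and allowed_domains interact is not documented above. One plausible reading, sketched below, is that allowed_domains (when set) acts as an explicit host allow-list, and otherwise the crawl stays on the start host plus, optionally, its subdomains. The in_scope function and these semantics are assumptions for illustration, not nyrag's actual logic.

```python
from urllib.parse import urlsplit


def in_scope(url, start_url, follow_subdomains=True, allowed_domains=None):
    """Decide whether a discovered link belongs to the crawl (assumed semantics)."""
    host = (urlsplit(url).hostname or "").lower()
    start_host = (urlsplit(start_url).hostname or "").lower()
    if not host:
        return False  # e.g. mailto: links have no host
    if allowed_domains is not None:
        # Assumption: an explicit list overrides the start-host rule.
        return host in {domain.lower() for domain in allowed_domains}
    if host == start_host:
        return True
    # Require a leading dot so "notexample.com" is not treated as a subdomain.
    return follow_subdomains and host.endswith("." + start_host)


print(in_scope("https://docs.example.com/a", "https://example.com/"))
print(in_scope("https://notexample.com/", "https://example.com/"))
```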

Docs Mode Parameters (doc_params)

| Parameter | Type | Default | Description |
|---|---|---|---|
| recursive | bool | true | Process subdirectories |
| include_hidden | bool | false | Include hidden files |
| follow_symlinks | bool | false | Follow symbolic links |
| max_file_size_mb | float | None | Max file size in MB |
| file_extensions | list | None | Only process these extensions |

RAG Parameters (rag_params)

| Parameter | Type | Default | Description |
|---|---|---|---|
| embedding_model | str | sentence-transformers/all-MiniLM-L6-v2 | Embedding model |
| embedding_dim | int | 384 | Embedding dimension |
| chunk_size | int | 1024 | Chunk size for text splitting |
| chunk_overlap | int | 50 | Overlap between chunks |
| distance_metric | str | angular | Distance metric |
| max_tokens | int | 8192 | Max tokens per document |
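
To make chunk_size and chunk_overlap concrete, here is a minimal sliding-window splitter. It assumes both values are measured in characters; nyrag may count tokens or split on sentence boundaries instead, so treat chunk_text as a hypothetical illustration of the overlap idea rather than the tool's splitter.

```python
def chunk_text(text, chunk_size=1024, chunk_overlap=50):
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters, so each new
    window starts chunk_size - chunk_overlap characters after the last.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# -> ['abcd', 'defg', 'ghij']
```

With the defaults (1024 and 50), each chunk after the first begins 974 characters after the previous one.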

Environment Variables

Deployment Mode

| Variable | Description |
|---|---|
| NYRAG_LOCAL | 1 for local Docker, 0 for Vespa Cloud |

Local Mode

| Variable | Description |
|---|---|
| NYRAG_VESPA_DOCKER_IMAGE | Docker image (default: vespaengine/vespa:latest) |

Cloud Mode

| Variable | Description |
|---|---|
| VESPA_CLOUD_TENANT | Your Vespa Cloud tenant |
| VESPA_CLOUD_APPLICATION | Application name (optional) |
| VESPA_CLOUD_INSTANCE | Instance name (default: default) |
| VESPA_CLOUD_API_KEY_PATH | Path to API key file |
| VESPA_CLIENT_CERT | Path to mTLS certificate |
| VESPA_CLIENT_KEY | Path to mTLS private key |

Chat UI

| Variable | Description |
|---|---|
| NYRAG_CONFIG | Path to config file |
| VESPA_URL | Vespa endpoint URL (optional for local, required for cloud) |
| OPENROUTER_API_KEY | OpenRouter API key for LLM |
| OPENROUTER_MODEL | LLM model (e.g., openai/gpt-4o) |
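
Before starting the chat UI it can help to confirm that the variables above are set. The sketch below is a hypothetical pre-flight check, not part of nyrag. It assumes that all three of NYRAG_CONFIG, OPENROUTER_API_KEY, and OPENROUTER_MODEL are needed, and that any NYRAG_LOCAL value other than 1 (including unset) means cloud mode, where VESPA_URL is required.

```python
import os


def missing_chat_env(env=None):
    """Return the names of chat UI variables that are unset or empty."""
    env = os.environ if env is None else env
    required = ["NYRAG_CONFIG", "OPENROUTER_API_KEY", "OPENROUTER_MODEL"]
    # Assumption: only NYRAG_LOCAL=1 counts as local mode; VESPA_URL is
    # documented as optional for local and required for cloud.
    if env.get("NYRAG_LOCAL") != "1":
        required.append("VESPA_URL")
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_chat_env()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Environment looks complete.")
```

Passing a plain dict instead of os.environ keeps the check easy to test.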

Download files

Source Distribution

nyrag-0.0.3.tar.gz (41.0 kB)

Uploaded Source

Built Distribution

nyrag-0.0.3-py3-none-any.whl (43.4 kB)

Uploaded Python 3

File details

Details for the file nyrag-0.0.3.tar.gz.

File metadata

  • Download URL: nyrag-0.0.3.tar.gz
  • Size: 41.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for nyrag-0.0.3.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4473703f7c7a391ba806b756cc294855d5d419a32335882ce18e7f3d94222931 |
| MD5 | 4df24daf2623b15f59c864f52d15e2d7 |
| BLAKE2b-256 | fe892ac5e208727c4ef411889fe02307503d0f9dc43249e7b01d603a1457f06b |

File details

Details for the file nyrag-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: nyrag-0.0.3-py3-none-any.whl
  • Size: 43.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for nyrag-0.0.3-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | cc45cb861ee6001331e5e3a9889a94e25c91341a981014c325ac76e0e2c46636 |
| MD5 | acd38aa50322ad18b8506f4642d1f49b |
| BLAKE2b-256 | c84e0d10060f138e40055e0c27f9f9c9aed58cfa424205c286000652b5501cbc |
