
FAISS-first RAG CLI for documents and web pages


QueryNest

QueryNest is a terminal-first, Python-based Retrieval Augmented Generation (RAG) application that allows users to ask natural language questions against external knowledge sources directly from the command line.

It is designed to be developer-friendly, fully self-hostable, and incrementally extensible, with a strong focus on local execution and minimal external dependencies.




Installation

QueryNest can be used either as a Python CLI (via PyPI) or as a Docker-based CLI.


Option 1: Install via PyPI (Python Package)

QueryNest is distributed as a Python package and can be installed directly from PyPI.

Requirements

  • Python 3.10 or higher
  • pip installed and available in PATH
  • Internet access for first-time dependency installation

Install using pip

pip install querynest-cli==2.0.0

This installs the querynest CLI in your environment.

Verify Installation

querynest --help

If installed correctly, you should see the available CLI commands.

PyPI Package

Official PyPI release: https://pypi.org/project/querynest-cli/2.0.0/


Option 2: Use via Docker (Recommended for Isolated Usage)

QueryNest is also available as a Docker image, allowing you to use the CLI without installing Python or dependencies locally.

Pull the Docker image

docker pull divyansh1552005/querynest:latest

Run QueryNest using Docker

docker run --rm divyansh1552005/querynest --help

Example: Chat with a web page

docker run --rm \
  -e GEMINI_API_KEY=YOUR_API_KEY \
  divyansh1552005/querynest chat --web "https://example.com"

Interactive mode (TTY)

docker run -it --rm \
  -e GEMINI_API_KEY=YOUR_API_KEY \
  divyansh1552005/querynest chat

Docker Security Note

Docker Scout may report OS-level CVEs inherited from the base image. QueryNest does not expose network services and is safe for CLI usage.


CLI Usage

The CLI supports:

  • Chatting with a single web page or a PDF (or folder of PDFs)
  • Automatic session creation and resume
  • Session inspection, search, rename, and deletion
  • Viewing chat history
  • Configuration management (API keys and LLM model selection)

Entry Point

After installation (editable or normal), the CLI is exposed as:

querynest

Internally, this maps to:

querynest.cli.main:main

On startup, the CLI:

  1. Runs the bootstrap process (ensures config and API key exist)
  2. Registers all subcommands
  3. Dispatches to the appropriate command handler

Command Structure

querynest
├── chat        # Core chat functionality
├── config      # Configuration management
├── history     # View chat history
└── sessions    # Session management

Each top-level command is isolated and does not share side effects with others.


1. Chat Command

Purpose

The chat command is the primary entry point for QueryNest. It allows you to start or resume a conversational session with a single knowledge source.

Supported Sources

  • One web page URL
  • One PDF file
  • One folder containing multiple PDFs

Only one source is allowed per session.

Usage

# Start chat with a web page
querynest chat --web "https://example.com"

# Start chat with a single PDF
querynest chat --pdf "/path/to/file.pdf"

# Start chat with multiple PDFs in a folder
querynest chat --pdf "/path/to/folder/"

# Force rebuild the vector index (useful if the source has been updated)
querynest chat --web "https://example.com" --force
querynest chat --pdf "/path/to/file.pdf" --force

Behavior

  • A deterministic session ID is generated from the source
  • If a session already exists for the source, it is resumed automatically
  • If not, a new session is created with rich progress feedback
  • On first creation, the user is prompted for a session name
  • Documents are loaded (with progress bars), split into chunks, embedded, and indexed using FAISS
  • A conversational chat loop is started with real-time streaming responses
  • The model in use is shown on startup and is determined by your current config (defaults to Gemini)
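The load → chunk → embed → index flow described above starts with splitting documents into chunks. As an illustrative sketch only (not QueryNest's actual implementation; the chunk size and overlap values here are assumptions), a simple overlapping-window splitter looks like this:

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows.

    The overlap preserves context across chunk boundaries, so a sentence
    cut off at the end of one chunk still appears whole in the next.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each chunk is then embedded and added to the FAISS index for the session.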

The --force Flag

querynest chat --web "https://example.com" --force

Forces a complete rebuild of the vector index even if a session already exists for the source. Use this when:

  • The web page content has been updated
  • The PDF has been modified
  • You want a fresh index without resuming the old session

This clears the existing chat history and vector index for that source and starts fresh.

Key Characteristics

  • Interactive REPL-style chat with streaming token-by-token responses
  • Plain text responses with structured formatting (headings, lists) — no markdown symbols
  • Sliding window memory for efficient conversation context
  • Automatic persistence of chat and vectors
  • Rich progress feedback during document processing
  • Multi-model support — Use any LLM through LiteLLM
  • Graceful handling of Ctrl+C and EOF

Exit

Type either of the following to end the chat:

exit
quit

2. Config Command

Purpose

Manage QueryNest configuration — API keys and LLM model selection.

Commands

Set Gemini API Key

querynest config set-gemini-key
  • Prompts securely for a new Gemini API key
  • Used exclusively for embeddings (text-embedding-004)
  • Updates the local configuration file
  • Takes effect immediately

Set LLM Model

querynest config set-llm
  • Shows a curated menu of supported LLM providers and models
  • Also supports entering a custom model string (e.g. groq/llama-3.1-8b-instant)
  • Prompts for the provider API key (skipped if Gemini is selected as LLM)
  • Available options:
    1. Gemini 2.5 Flash (default)
    2. OpenAI - GPT-4o
    3. OpenAI - GPT-4o Mini
    4. Anthropic - Claude Sonnet
    5. Groq - Llama 3.3 70B
    6. Mistral - Large
    7. Enter custom model string

Set LLM API Key (without changing model)

querynest config set-llm-key
  • Updates only the API key for the currently configured LLM provider
  • Useful when rotating API keys without switching models
  • If current LLM is Gemini, redirects to set-gemini-key

Show Current Models

querynest config show-models
  • Displays the currently configured embedding model and LLM
  • Example output:
Current Configuration:
  Embeddings : Google Gemini (text-embedding-004)
  LLM        : groq/llama-3.3-70b-versatile

3. History Command

Purpose

View the chat history associated with a session.

Usage

History can be accessed in three mutually exclusive ways:

querynest history show --session-id <SESSION_ID>
querynest history show --web "https://example.com"
querynest history show --pdf "/path/to/file.pdf"

Rules

  • Exactly one of --session-id, --web, or --pdf must be provided
  • History is read-only
  • Messages are shown in chronological order

Output

Each message is displayed with its role:

USER: ...
ASSISTANT: ...

4. Sessions Command

The sessions command provides full control and visibility over stored sessions.

4.1 List Sessions

Basic Listing

querynest sessions list

Displays:

  • Session ID
  • Session name
  • Source type (WEB / PDF)

Full Metadata

querynest sessions list --all

Displays all metadata fields for every session.

Sorting Options

Sorting flags are mutually exclusive:

querynest sessions list --recent   # Sort by last_used_at (descending)
querynest sessions list --oldest   # Sort by created_at (ascending)
querynest sessions list --name     # Sort alphabetically by name

The --all flag may be combined with any single sorting flag.


4.2 Session Information

querynest sessions info <SESSION_ID>

Displays detailed metadata for the specified session.


4.3 Rename Session

querynest sessions rename <SESSION_ID> "New Session Name"
  • Updates only the session metadata
  • Does not affect vectors or chat history

4.4 Delete Session

querynest sessions delete <SESSION_ID>
  • Requires confirmation
  • Permanently removes:
    • Vector index
    • Chat history
    • Metadata

4.5 Search Sessions

Search across stored sessions using metadata fields.

Search by Name (default)

querynest sessions search "query"

Search by Source

querynest sessions search "example.com" --source

Search by Source Type

querynest sessions search "pdf" --type

Search Everywhere

querynest sessions search "http" --all

Search is:

  • Case-insensitive
  • Partial match
  • Metadata-only (no vector loading)
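The search semantics above (case-insensitive, partial match, metadata only) could be implemented roughly as follows. This is an illustrative sketch; the field names (`name`, `source`, `type`) are assumptions about the metadata schema, not QueryNest's actual one.

```python
def search_sessions(sessions: list[dict], query: str,
                    fields: tuple[str, ...] = ("name",)) -> list[dict]:
    """Case-insensitive substring match across the given metadata fields.

    Only the metadata dicts are inspected; no vector index is loaded.
    """
    q = query.lower()
    return [s for s in sessions
            if any(q in str(s.get(f, "")).lower() for f in fields)]
```

Passing `fields=("source",)` mirrors `--source`, and passing all fields mirrors `--all`.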

Design Constraints and Guarantees

  • One session corresponds to exactly one source
  • Sessions are resumed automatically
  • Multiple PDFs are supported only via a single folder
  • JavaScript-rendered web pages are not supported
  • Image-only documents are not supported
  • Embedding model is fixed (Google Gemini) — changing it would invalidate existing indexes

Features

  • Terminal-based conversational interface with streaming responses for real-time feedback
  • Multi-model LLM support — Seamlessly switch between Gemini, OpenAI, Claude, Groq, Mistral and 100+ providers via LiteLLM
  • Rich progress bars for PDF loading, chunking, and embedding operations
  • Streaming responses — Responses stream token-by-token in real-time
  • Force re-indexing — Rebuild vector index on demand with --force
  • Query external knowledge sources using natural language
  • Support for multiple data sources:
    • Website URLs (cleaned page content)
    • PDF documents (local files or folders)
  • Retrieval Augmented Generation (RAG) pipeline
  • Conversational context awareness (sliding window memory)
  • Deterministic session creation and automatic session resume
  • Fully local storage of data and configuration
  • Bring-your-own API key model
  • No frontend, browser, or GUI dependency

Supported Data Sources

Websites

  • Accepts a website URL
  • Fetches and cleans main page content
  • Allows semantic querying over web pages

Limitations:

  • JavaScript-rendered pages are NOT supported
  • Image-only pages are NOT supported
  • Login / paywall pages are NOT supported

PDF Documents

  • Accepts a local PDF file path or folder of PDFs
  • Extracts document text with rich progress feedback
  • Enables question answering over document content

Key Features In-Depth

Multi-Model LLM Support

QueryNest supports 100+ LLM models through LiteLLM integration. Embeddings always use Google Gemini (text-embedding-004) for consistency across sessions. The LLM is fully configurable:

# Default: Gemini
querynest chat --pdf "document.pdf"

# Switch to Groq (fast + free tier)
querynest config set-llm   # select option 5

# Switch to OpenAI
querynest config set-llm   # select option 2

# Check what's currently configured
querynest config show-models

Configuration is stored in ~/.querynest/config.json and persists across sessions.
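Model strings like groq/llama-3.1-8b-instant follow LiteLLM's provider/model convention. A sketch of how the provider part could be derived (the fallback of treating a bare name as Gemini is an assumption based on Gemini being the documented default, not confirmed behavior):

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Split a LiteLLM-style model string into (provider, model_name).

    "groq/llama-3.1-8b-instant" -> ("groq", "llama-3.1-8b-instant")
    A bare name with no slash is assumed to be a Gemini model here.
    """
    provider, sep, name = model.partition("/")
    if not sep:
        return ("gemini", model)
    return (provider, name)
```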

Rich Progress Bars

Visual feedback during document processing:

  • PDF Loading: Shows file processing status with filename and progress
  • Embedding: Live progress bar for vector embedding operations (batched, 50 chunks at a time)

Example output:

Using Embeddings: Google Gemini (text-embedding-004)
Using LLM:        groq/llama-3.3-70b-versatile
Loading documents...
⠸ Embedding chunks... ━━━━━━━━━━━━━━━  45% 45/100 chunks
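The batched embedding step (50 chunks per request, per the note above) amounts to a simple batching loop with a progress callback. In this sketch, embed_batch is a stand-in for the real Gemini embedding call, and the callback shape is an assumption:

```python
from typing import Callable, Optional

def embed_in_batches(chunks: list[str],
                     embed_batch: Callable[[list[str]], list[list[float]]],
                     batch_size: int = 50,
                     on_progress: Optional[Callable[[int, int], None]] = None
                     ) -> list[list[float]]:
    """Embed chunks in fixed-size batches, reporting progress after each batch."""
    vectors: list[list[float]] = []
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        vectors.extend(embed_batch(batch))
        if on_progress:
            on_progress(min(start + batch_size, len(chunks)), len(chunks))
    return vectors
```

The progress callback is what would drive the "45/100 chunks" style display.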

Streaming Responses

LLM responses stream token-by-token in real-time with clean formatted output:

You: What is machine learning?

Thinking...

Assistant
Machine learning is a subset of artificial intelligence that enables
systems to learn and improve from experience without being explicitly
programmed...
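Token-by-token streaming of the kind shown above boils down to flushing each token to the terminal as it arrives. A minimal sketch, where token_iter stands in for the provider's streaming iterator:

```python
import sys

def stream_to_terminal(token_iter) -> str:
    """Print tokens immediately as they arrive and return the full reply."""
    parts = []
    for token in token_iter:
        parts.append(token)
        sys.stdout.write(token)
        sys.stdout.flush()  # flush per token so output appears in real time
    sys.stdout.write("\n")
    return "".join(parts)
```

Returning the assembled reply lets the caller persist it to chat history afterwards.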

High-Level Architecture

User (Terminal)
     ↓
QueryNest CLI
     ↓
Source Loader (Web / PDF)
     ↓
Text Cleaning & Normalization
     ↓
Text Chunking
     ↓
Embeddings (Google Gemini — fixed)
     ↓
Vector Store (FAISS)
     ↓
Similarity Search
     ↓
LLM (Configurable via LiteLLM)
     ↓
Terminal Response (Streamed)
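The Similarity Search step in the pipeline above ranks stored chunk vectors against the query vector. FAISS does this at scale; the core operation is similarity ranking, sketched here in pure Python purely for illustration:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float],
          chunk_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Return indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

The indices returned here select which chunks are handed to the LLM as context.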

Technical Stack

Language

  • Python 3.10+

LLM and Embeddings

  • LLM (via LiteLLM): Google Gemini (default), OpenAI, Anthropic, Groq, Mistral, and 100+ more
  • Embeddings: Google Gemini text-embedding-004 (fixed — ensures index consistency)

Vector Storage

  • FAISS (CPU-based, default)
  • Chroma (planned)

Content Extraction

  • Websites: requests, beautifulsoup4, readability-lxml
  • PDFs: pypdf

UI & Progress Feedback

  • Rich: Terminal formatting, live progress bars
  • LiteLLM: Multi-model LLM abstraction layer
  • tqdm: Progress bars for directory PDF loading

Memory Design

QueryNest separates memory into two independent systems:

1. Knowledge Memory (Vector Memory)

  • Stores embeddings of source content
  • Used only for semantic retrieval
  • Implemented using FAISS

2. Conversational Memory (Chat History)

  • Stores user–assistant messages
  • Maintains conversational continuity
  • Sliding window of recent messages (last 4 exchanges)
  • Stored as local JSON files
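A sliding window of the last 4 exchanges can be kept with a bounded deque. This sketch assumes each exchange is a (user, assistant) message pair; the actual storage format is the local JSON described above:

```python
from collections import deque

class SlidingWindowMemory:
    """Keep only the most recent N user/assistant exchanges."""

    def __init__(self, max_exchanges: int = 4):
        # deque with maxlen silently drops the oldest exchange on overflow
        self.exchanges = deque(maxlen=max_exchanges)

    def add(self, user_msg: str, assistant_msg: str) -> None:
        self.exchanges.append((user_msg, assistant_msg))

    def as_messages(self) -> list[dict]:
        """Flatten exchanges into role-tagged messages for the LLM prompt."""
        msgs = []
        for user_msg, assistant_msg in self.exchanges:
            msgs.append({"role": "user", "content": user_msg})
            msgs.append({"role": "assistant", "content": assistant_msg})
        return msgs
```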

Local Storage Structure

All persistent data is stored locally on the user's machine.

Base Directory

~/.querynest/

Directory Layout

~/.querynest/
├── config.json
└── sessions/
    └── <session_id>/
        ├── meta.json
        ├── chat.json
        └── vectors.faiss

Configuration (config.json)

{
  "gemini_api_key": "...",
  "llm_model": "groq/llama-3.3-70b-versatile",
  "llm_api_key": "..."
}

API keys are never bundled in distributed artifacts.
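Reading and updating a config.json of the shape shown above is a small JSON round-trip. A sketch (the helper names are illustrative; only the file location and keys come from the documentation):

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".querynest" / "config.json"

def load_config(path: Path = CONFIG_PATH) -> dict:
    """Return the stored config, or an empty dict if none exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {}

def save_config(config: dict, path: Path = CONFIG_PATH) -> None:
    """Persist the config, creating ~/.querynest/ on first run."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))
```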


Session Management

  • Sessions are deterministically generated using a SHA-256 hash of the input source
  • Same source results in the same session and memory
  • Enables automatic session resume without manual configuration
  • Use --force to bypass resume and rebuild from scratch
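Deterministic SHA-256 session IDs can be derived as in this sketch. Only the SHA-256-of-source part is documented; the normalization step and the 16-character truncation here are illustrative assumptions:

```python
import hashlib

def session_id_for(source: str) -> str:
    """Derive a stable session ID from a source URL or file path.

    The same source always hashes to the same ID, which is what makes
    automatic session resume possible without manual configuration.
    """
    normalized = source.strip().lower()  # assumption: trivial normalization
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
```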

Prompt Construction Strategy

Each LLM request includes:

  • Retrieved context chunks from the vector store
  • Recent conversation history (sliding window)
  • Current user query

The LLM is explicitly instructed to:

  • Answer only from the provided context
  • Use plain text formatting (no markdown symbols)
  • Respond with "I don't know" if the answer cannot be inferred
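Assembling a request from the three parts listed above might look like this sketch. The exact instruction wording is an assumption, though the "answer only from context", plain-text, and "I don't know" rules are documented:

```python
def build_prompt(context_chunks: list[str],
                 history: list[dict],
                 query: str) -> str:
    """Combine retrieved context, recent history, and the current query."""
    context = "\n\n".join(context_chunks)
    dialogue = "\n".join(f"{m['role'].upper()}: {m['content']}" for m in history)
    return (
        "Answer only from the context below, in plain text without markdown. "
        "If the answer cannot be inferred from the context, reply \"I don't know\".\n\n"
        f"Context:\n{context}\n\n"
        f"Conversation so far:\n{dialogue}\n\n"
        f"USER: {query}"
    )
```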

Roadmap

v1 – Terminal-Based Application

  • Basic terminal-based interaction using input/output
  • Support for Website and PDF sources
  • Gemini embeddings and LLM integration
  • FAISS (in-memory)
  • No persistence

v2 – Full CLI Tool

  • Professional command-based CLI interface
  • Local persistence (sessions, chat history, vectors)
  • Improved prompt handling and error management

v3 – Dockerized Self-Hosting

  • Dockerfile and Docker Compose support
  • Volume-mounted persistent storage
  • Same CLI experience inside containers

v4 – Multi-Model Support (Current)

  • LiteLLM integration for 100+ LLM providers
  • Curated model selection menu with custom model support
  • Per-provider API key management
  • Rich progress bars for embedding pipeline
  • Streaming responses
  • Force re-indexing with --force

v5 – Distribution & Introduction Website (Planned)

Distribution formats:

  • Docker Image — primary self-host method
  • pip package
  • Windows executable — .exe via PyInstaller
  • Linux packages — .rpm and .deb
  • AppImage — packaging format research and build pipeline
  • Tarball

Introduction website (TypeScript):

  • Home — project intro, tagline, quick feature highlights
  • About — what QueryNest is, how it works, the tech behind it
  • Download — all distribution options listed clearly (pip, Docker, .exe, .rpm, .deb, AppImage, Tarball)
  • Documentation — full usage guide, CLI reference, configuration options, and examples

Distribution

QueryNest is distributed through multiple formats:

  • Docker image (divyansh1552005/querynest:latest)
  • pip package (querynest-cli on PyPI)
  • Windows executable (.exe via PyInstaller) — planned
  • Linux packages (.rpm, .deb) — planned

Secrets and API keys are never bundled in distributed artifacts.


Security Principles

  • All data stored locally by default
  • No telemetry or external logging
  • No data shared externally except with the configured LLM provider

Engineering Principles

  • Clear separation of concerns
  • Incremental complexity
  • No premature optimization
  • Storage and memory abstractions for easy migration

License

QueryNest is licensed under the GNU General Public License v3 (GPL-3.0).


Status

QueryNest is under active development. APIs, CLI commands, and internal architecture may evolve across releases.


