A document ingestion and RAG query system with FAISS indexing and OCR support

PyRagix

A local-first Retrieval-Augmented Generation (RAG) system built with modern techniques from academic research and production deployments. PyRagix implements query expansion, cross-encoder reranking, hybrid search (semantic + keyword), and semantic chunking to improve retrieval quality while maintaining complete data privacy through local-only operation.

Built for both performance and privacy, PyRagix runs entirely on your infrastructure with no external API dependencies for document processing and search. Query expansion and answer generation use local LLMs via Ollama, and embedding and reranking run local Sentence Transformers models, so your documents never leave your control.

Looking for a cross-platform .NET solution? See pyragix-net!

Architecture

PyRagix implements coordinated ingestion and query pipelines that stay in sync through a shared metadata store.

Query Pipeline

flowchart TD
    Q["User query"] --> Validate["Runtime checks<br/>(config, FAISS, BM25, Ollama)"]
    Validate --> Expand{"Query expansion enabled?"}
    Expand -->|Yes| Gen["Generate rewrites via Ollama"]
    Gen --> Variants["Aggregate original + rewrites"]
    Expand -->|No| Variants
    Variants --> Embed["Batch embed variants<br/>(SentenceTransformer)"]
    Embed --> SearchFAISS["FAISS vector search<br/>per variant"]
    SearchFAISS --> Hybrid{"Hybrid search enabled?"}
    Hybrid -->|Yes| BM25["Lookup BM25 keyword scores"]
    BM25 --> Fuse["Dynamic alpha fusion<br/>(semantic + keyword)"]
    Hybrid -->|No| Rerank
    Fuse --> Rerank["Cross-encoder reranking<br/>(top-k)"]
    Rerank --> Answer["Answer generation via Ollama"]
    Answer --> Output["Final answer with cited chunks"]

Ingestion Pipeline

flowchart TD
    Start["ingest_folder CLI"] --> Env["Environment manager<br/>applies runtime settings"]
    Env --> Stale["Detect stale documents<br/>and choose strategy"]
    Stale --> Scan["Scan filesystem + skip rules<br/>(extension filter, SHA256 dedupe)"]
    Scan --> Extract["Extract text<br/>(PyMuPDF, BeautifulSoup, PaddleOCR)"]
    Extract --> Chunk["Semantic chunking<br/>(sentence-aware)"]
    Chunk --> Embed["Embed chunks<br/>(SentenceTransformer)"]
    Embed --> Index{"Existing FAISS index?"}
    Index -->|No| Create["Create index and add vectors"]
    Index -->|Yes| Update["Append vectors"]
    Create --> Persist
    Update --> Persist["Persist metadata to SQLite<br/>and processed_files log"]
    Persist --> Hybrid{"Hybrid search enabled?"}
    Hybrid -->|Yes| BuildBM25["Build/refresh BM25 index"]
    Hybrid -->|No| Done
    BuildBM25 --> Done["Pipeline complete"]

[!NOTE] Query-time hybrid weighting automatically adapts to query length, giving short queries stronger keyword bias and long-form questions more semantic focus.

The project cites roughly 20-30% better recall from query expansion, 15-25% better precision from reranking, and 30-40% better handling of structured queries from hybrid search.

Performance Optimizations:

Batch encoding of query variants for reduced embedding overhead
O(1) BM25 document lookup using hash-based indexing
Optimized FAISS nprobe parameter handling
Memory-efficient numpy array operations
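
The O(1) BM25 lookup idea can be illustrated with a small, self-contained scorer. This is a hypothetical sketch, not PyRagix's code: PyRagix uses the rank-bm25 package, while this version hand-rolls BM25 Okapi (with the non-negative IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)) and naive whitespace tokenization) so it runs on the standard library alone. The class name `TinyBM25` and method `score_by_id` are illustrative; the hash-based part is the dict from chunk ID to row position.

```python
import math
from collections import Counter


class TinyBM25:
    """Minimal BM25 Okapi scorer with O(1) chunk-ID lookup (illustrative only)."""

    def __init__(self, chunks: dict[str, str], k1: float = 1.5, b: float = 0.75) -> None:
        self.k1 = k1
        self.b = b
        self.ids = list(chunks)
        # Hash-based index: chunk ID -> row position, so each lookup is O(1)
        self.pos = {chunk_id: row for row, chunk_id in enumerate(self.ids)}
        # Naive whitespace tokenization; real tokenizers also strip punctuation
        self.docs = [Counter(chunks[cid].lower().split()) for cid in self.ids]
        self.lengths = [sum(doc.values()) for doc in self.docs]
        total = sum(self.lengths)
        self.avgdl = total / len(self.lengths) if total else 1.0
        # Document frequency per term, then a non-negative IDF variant
        df: Counter[str] = Counter()
        for doc in self.docs:
            df.update(doc.keys())
        n = len(self.docs)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def _score_row(self, row: int, terms: list[str]) -> float:
        doc = self.docs[row]
        # Length normalization: longer-than-average chunks are penalized via b
        norm = self.k1 * (1 - self.b + self.b * self.lengths[row] / self.avgdl)
        score = 0.0
        for term in terms:
            tf = doc.get(term, 0)
            if tf:
                score += self.idf[term] * tf * (self.k1 + 1) / (tf + norm)
        return score

    def score_by_id(self, query: str, chunk_ids: list[str]) -> dict[str, float]:
        """Score only the given candidates (e.g. FAISS hits), looked up by ID."""
        terms = query.lower().split()
        return {cid: self._score_row(self.pos[cid], terms) for cid in chunk_ids if cid in self.pos}


bm25 = TinyBM25({
    "a": "invoice 4471 due in March",
    "b": "quarterly revenue summary",
    "c": "March planning notes",
})
# "a" gets a positive score, "b" and "c" score 0.0, and unknown IDs are skipped
scores = bm25.score_by_id("invoice 4471", ["a", "b", "c", "missing"])
```

Scoring only the FAISS candidates by ID keeps fusion cheap; rank-bm25's `get_scores` instead returns a score for every document in the corpus.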

Key Features

Modern RAG Techniques

Query Expansion: Generates multiple query variants to capture diverse phrasing and improve recall on ambiguous questions
Cross-Encoder Reranking: Re-scores retrieved chunks using a specialized relevance model for precision
Hybrid Search: Combines semantic similarity (FAISS) with keyword matching (BM25) using dynamic weighting tuned to the query
Semantic Chunking: Respects sentence and paragraph boundaries to preserve context coherence

Privacy-First Architecture

100% Local Operation: All document processing, indexing, and search happen on your infrastructure
No External APIs: Zero dependencies on cloud services for core functionality
Data Sovereignty: Your documents never leave your network
Configurable Models: Choose and run any Ollama-compatible LLM locally

Infrastructure

Scalable Indexing: FAISS IVF indexing with automatic optimization for dataset size
Memory Efficient: Adaptive batch processing and intelligent memory management
Resumable Ingestion: Incremental updates without reprocessing entire corpus
Cross-Platform: Runs identically on Windows, Linux, and macOS
Modern Web UI: Professional TypeScript-based interface with REST API (auto-compiled via dev.sh)

Document Processing

Multi-Format Support: PDF, HTML, HTM, and images (JPEG, PNG, TIFF, BMP, WEBP)
Advanced OCR: PaddleOCR with adaptive DPI and tiled processing for large pages
Metadata Tracking: SQLite database for chunk provenance and search filtering
Batch Operations: Parallel processing with automatic retry on memory constraints

Type Safety & Architecture

PyRagix is built with extreme type safety as a foundational principle. The entire codebase passes pyright --strict with zero errors:

Strict Type Checking

Zero # type: ignore comments: All types are properly defined through stubs or Protocols
Modern Python 3.13+ syntax: Uses X | None, list[T], dict[K, V] throughout
Ultra-strict pyright configuration: 40+ type checking rules set to "error" level
No implicit Any types: Every variable and function has explicit type annotations

Protocol-Based Architecture

PyRagix uses Python's Protocol for duck-typed interfaces with third-party libraries:

# Example: PDF library interface (ingestion/models.py)
from typing import Protocol

class PDFPage(Protocol):
    """Protocol for PyMuPDF Page objects."""
    def get_text(self, option: str) -> str: ...
    def get_pixmap(self, dpi: int) -> PDFPixmap: ...  # PDFPixmap: companion Protocol (definition not shown)

Benefits:

✅ Type-safe integration with native libraries (FAISS is C++; PyMuPDF wraps the C library MuPDF)
✅ Easy mocking in tests without inheritance
✅ Clear documentation of external API contracts
✅ Structural typing instead of nominal typing
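
Structural typing means a test double only has to provide matching methods. A minimal sketch, assuming a trimmed-down `PDFPixmap` Protocol with just `width` and `height` (the real one is not shown); `FakePage`, `FakePixmap`, and `extract_text` are hypothetical names used only for illustration.

```python
from typing import Protocol


class PDFPixmap(Protocol):
    """Trimmed-down stand-in for a pixmap Protocol (illustrative members only)."""
    width: int
    height: int


class PDFPage(Protocol):
    """Protocol for PyMuPDF Page objects."""
    def get_text(self, option: str) -> str: ...
    def get_pixmap(self, dpi: int) -> PDFPixmap: ...


def extract_text(page: PDFPage) -> str:
    """Hypothetical helper that depends only on the Protocol, not on PyMuPDF."""
    return page.get_text("text").strip()


class FakePixmap:
    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height


class FakePage:
    """Satisfies PDFPage structurally: no inheritance, just matching methods."""

    def __init__(self, text: str) -> None:
        self._text = text

    def get_text(self, option: str) -> str:
        return self._text

    def get_pixmap(self, dpi: int) -> FakePixmap:
        # Pretend the page is US Letter (8.5 x 11 inches) rendered at `dpi`
        return FakePixmap(int(8.5 * dpi), 11 * dpi)


print(extract_text(FakePage("  Hello from a fake page  ")))  # prints: Hello from a fake page
```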

Custom Type Stubs

The typings/ directory contains comprehensive type stubs for libraries with incomplete typing:

faiss: FAISS C++ bindings with GPU detection
fitz (PyMuPDF): PDF manipulation
paddleocr: OCR engine
rank_bm25: BM25 algorithm
sqlite_utils: Database utilities
And more...

Pydantic v2 Data Validation

All configuration and data models use Pydantic v2 with strict validation:

# Example: Immutable metadata with validation
from pydantic import BaseModel, ConfigDict, Field

class MetadataDict(BaseModel):
    model_config = ConfigDict(frozen=True, validate_assignment=True)

    source: str
    chunk_index: int = Field(ge=0)  # Must be >= 0
    total_chunks: int
    file_type: str

Key Models:

MetadataDict: Frozen, validated chunk metadata
RAGConfig: Query pipeline configuration with type coercion
ProcessingConfig: Ingestion settings dataclass
SearchResult, DocumentChunk: Query result types

Modular Package Design

Clean separation of concerns with explicit module boundaries:

# Ingestion pipeline: ingestion/
from ingestion import (
    FAISSManager,      # Vector index management
    FileScanner,       # Document discovery
    MetadataStore,     # SQLite operations
    TextProcessor,     # Extraction pipeline
)

# Query pipeline: rag/
from rag import (
    RAGConfig,         # Configuration
    load_models,       # Model initialization
    hybrid_search,     # Multi-stage retrieval
    generate_answer,   # LLM generation
)

# Utilities: utils/
from utils import (
    BM25Index,         # Keyword search
    QueryExpander,     # Query rewriting
    Reranker,          # Cross-encoder scoring
)

This architecture ensures maintainability, testability, and type safety across 3000+ lines of strictly-typed Python code.

Quick Start

Prerequisites

Python 3.13+ with uv package manager (recommended) or pip
Ollama for local LLM inference - download from ollama.com
8GB+ RAM (16GB+ recommended for optimal performance)

[!TIP] Use uv sync --frozen in CI or shared environments to guarantee the resolved versions match the committed uv.lock.

Installation

# Clone repository
git clone https://github.com/psarno/PyRagix.git
cd PyRagix

# Install dependencies with uv (recommended - fast and reliable)
uv sync

# Or with pip (installs from pyproject.toml)
pip install -e .

# Start the Ollama server in a separate terminal (skip if the Ollama app already runs it)
ollama serve

# Pull a model for local LLM inference
ollama pull qwen2.5:7b

Basic Usage

# Ingest documents (builds FAISS + BM25 indexes)
uv run python ingest_folder.py --fresh ./docs
# Append --verbose to stream per-file timings instead of the default spinner-driven progress UI.

# query_rag.py checks that the FAISS and BM25 artifacts exist before running queries.

# Start web interface (compiles TypeScript frontend and starts server)
./dev.sh
# Open http://localhost:8000/web/

# Or use console interface
uv run python query_rag.py --verbose
# Use --no-spinner if your terminal does not support carriage returns.

Configuration

PyRagix uses settings.toml for all configuration. The file is auto-generated on first run with defaults tuned to your system. A template is available at settings.example.toml.

Enable modern RAG techniques:

[query_expansion]
ENABLE_QUERY_EXPANSION = true
QUERY_EXPANSION_COUNT = 3

[reranking]
ENABLE_RERANKING = true
RERANKER_MODEL = "cross-encoder/ms-marco-MiniLM-L-6-v2"
RERANK_TOP_K = 20

[hybrid_search]
ENABLE_HYBRID_SEARCH = true
HYBRID_ALPHA = 0.7

[semantic_chunking]
ENABLE_SEMANTIC_CHUNKING = true
SEMANTIC_CHUNK_MAX_SIZE = 1600
SEMANTIC_CHUNK_OVERLAP = 200

Query Expansion: Set ENABLE_QUERY_EXPANSION = true to generate multiple query variants. This targets paraphrased or ambiguous queries, where the project cites a 20-30% recall improvement. Adjust QUERY_EXPANSION_COUNT (default: 3) to control the number of variants.
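
The "aggregate original + rewrites" step from the query pipeline diagram can be sketched as: build the variant list, search with each variant, and keep each chunk's best score. This is illustrative only: `build_variants` and `expand_and_search` are hypothetical, the `rewrite` callable stands in for the Ollama call and `search` for FAISS, and max-score merging is one reasonable choice rather than necessarily PyRagix's. PyRagix batch-embeds all variants in a single call; this sketch loops for clarity.

```python
from collections.abc import Callable


def build_variants(query: str, rewrites: list[str], count: int) -> list[str]:
    """Original query first, then up to `count` distinct rewrites (case-insensitive)."""
    variants = [query]
    seen = {query.strip().lower()}
    for rewrite in rewrites:
        if len(variants) > count:
            break
        key = rewrite.strip().lower()
        if key and key not in seen:
            seen.add(key)
            variants.append(rewrite.strip())
    return variants


def expand_and_search(
    query: str,
    rewrite: Callable[[str, int], list[str]],  # stand-in for the Ollama rewrite call
    search: Callable[[str], list[tuple[str, float]]],  # stand-in for FAISS search
    count: int = 3,
) -> list[tuple[str, float]]:
    """Search every variant and keep each chunk's best score, best first."""
    best: dict[str, float] = {}
    for variant in build_variants(query, rewrite(query, count), count):
        for chunk_id, score in search(variant):
            if score > best.get(chunk_id, float("-inf")):
                best[chunk_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```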

Reranking: Set ENABLE_RERANKING = true to re-score retrieved chunks with a cross-encoder model. This improves precision (the project cites 15-25%) by filtering out chunks that match keywords but are semantically irrelevant. RERANK_TOP_K controls the candidate pool size (default: 20).
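
Reranking takes the top RERANK_TOP_K candidates, scores each (query, chunk) pair jointly, and re-sorts them. A hedged sketch: the `score_pairs` callable stands in for the cross-encoder; with sentence-transformers it could be the `predict` method of `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")`, which accepts a list of (query, passage) pairs. The `rerank` helper is hypothetical, and treating `final_k` as corresponding to DEFAULT_TOP_K is an assumption.

```python
from collections.abc import Callable, Sequence


def rerank(
    query: str,
    candidates: list[tuple[str, str]],  # (chunk_id, chunk_text), best-first from retrieval
    score_pairs: Callable[[list[tuple[str, str]]], Sequence[float]],
    top_k: int = 20,
    final_k: int = 7,
) -> list[tuple[str, float]]:
    """Re-score the top_k candidates with a pairwise model and keep the best final_k."""
    pool = candidates[:top_k]
    if not pool:
        return []
    scores = score_pairs([(query, text) for _, text in pool])
    ranked = sorted(
        ((chunk_id, float(score)) for (chunk_id, _), score in zip(pool, scores)),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:final_k]
```

A cross-encoder reads query and chunk together, which is why it is more precise than comparing separately computed embeddings, and also why it is only applied to a small candidate pool.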

Hybrid Search: Set ENABLE_HYBRID_SEARCH = true to combine FAISS semantic search with BM25 keyword matching. This mainly helps structured queries such as names, dates, and IDs (the project cites a 30-40% improvement). HYBRID_ALPHA sets the baseline fusion weight (0.7 = 70% semantic, 30% keyword), and PyRagix adjusts this balance per query based on query length for better recall.
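
A minimal sketch of fusing the two score lists: min-max normalize each to [0, 1], then combine as `alpha * semantic + (1 - alpha) * keyword`. The normalization choice and the thresholds in the hypothetical `dynamic_alpha` rule (a 0.15 shift toward keywords for queries of three words or fewer, 0.1 toward semantics above twelve words) are illustrative assumptions; PyRagix's actual adjustment lives in `rag/retrieval.py` and may differ.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so FAISS and BM25 values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {cid: 1.0 for cid in scores}  # all equal: no ranking signal to preserve
    return {cid: (s - lo) / (hi - lo) for cid, s in scores.items()}


def dynamic_alpha(query: str, base_alpha: float = 0.7) -> float:
    """Hypothetical length rule: short queries lean keyword, long ones lean semantic."""
    n_words = len(query.split())
    if n_words <= 3:
        alpha = base_alpha - 0.15
    elif n_words > 12:
        alpha = base_alpha + 0.1
    else:
        alpha = base_alpha
    return min(max(alpha, 0.0), 1.0)


def fuse(semantic: dict[str, float], keyword: dict[str, float], alpha: float) -> list[tuple[str, float]]:
    """alpha * semantic + (1 - alpha) * keyword; a chunk missing from one side scores 0 there."""
    sem, kw = min_max(semantic), min_max(keyword)
    fused = {
        cid: alpha * sem.get(cid, 0.0) + (1 - alpha) * kw.get(cid, 0.0)
        for cid in sem.keys() | kw.keys()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```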

Semantic Chunking: Set ENABLE_SEMANTIC_CHUNKING = true to split documents at sentence boundaries instead of fixed character counts. This preserves context coherence and improves answer quality.
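
A rough sketch of sentence-aware packing under SEMANTIC_CHUNK_MAX_SIZE and SEMANTIC_CHUNK_OVERLAP, assuming both are measured in characters: split on sentence-ending punctuation, pack whole sentences until the next one would exceed the limit, and carry trailing sentences (within the overlap budget) into the next chunk. PyRagix lists langchain-text-splitters for this step; this standard-library version, with the hypothetical `chunk_sentences` helper and a deliberately naive regex splitter, only illustrates the idea.

```python
import re

# Split after ., ! or ? followed by whitespace (a deliberately simple sentence splitter)
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_sentences(text: str, max_size: int = 1600, overlap: int = 200) -> list[str]:
    """Pack whole sentences into chunks of at most max_size characters.

    A single sentence longer than max_size still becomes its own (oversized) chunk.
    """
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        if current and len(" ".join(current + [sentence])) > max_size:
            chunks.append(" ".join(current))
            # Carry trailing sentences that fit within the overlap budget
            carried: list[str] = []
            for prev in reversed(current):
                if len(" ".join([prev] + carried)) > overlap:
                    break
                carried.insert(0, prev)
            # Drop the overlap if it would push the next chunk past max_size
            if len(" ".join(carried + [sentence])) > max_size:
                carried = []
            current = carried
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```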

Performance Impact: Enabling all features adds approximately 300-700ms per query (query expansion + hybrid fusion + reranking), which is typically small relative to LLM generation time. Features can be enabled incrementally for A/B testing.

Hardware Tuning

For memory-constrained systems (8-12GB RAM):

[embeddings]
BATCH_SIZE = 8

[threading]
TORCH_NUM_THREADS = 4

[pdf]
BASE_DPI = 100

For high-performance systems (32GB+ RAM):

[embeddings]
BATCH_SIZE = 32

[threading]
TORCH_NUM_THREADS = 12

[pdf]
BASE_DPI = 200

[faiss]
NLIST = 2048
NPROBE = 32

LLM Configuration

Customize Ollama model and generation parameters:

[llm]
OLLAMA_MODEL = "qwen2.5:7b"
TEMPERATURE = 0.1
TOP_P = 0.9
MAX_TOKENS = 500
REQUEST_TIMEOUT = 180

[retrieval]
DEFAULT_TOP_K = 7
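
A minimal sketch of how the [llm] settings could map onto a non-streaming call to Ollama's REST endpoint `POST /api/generate` (default address http://localhost:11434). The option names `temperature`, `top_p`, and `num_predict` are Ollama's; mapping MAX_TOKENS to `num_predict` and REQUEST_TIMEOUT to the HTTP timeout is an assumption about PyRagix's behavior, and `build_generate_payload`/`generate` are hypothetical helpers. PyRagix's own client in `rag/llm.py` also adds retry/backoff, which is omitted here.

```python
import json
import urllib.request
from typing import Any

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_payload(prompt: str, llm: dict[str, Any]) -> dict[str, Any]:
    """Translate [llm] settings into an Ollama /api/generate request body."""
    return {
        "model": llm.get("OLLAMA_MODEL", "qwen2.5:7b"),
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream of chunks
        "options": {
            "temperature": llm.get("TEMPERATURE", 0.1),
            "top_p": llm.get("TOP_P", 0.9),
            "num_predict": llm.get("MAX_TOKENS", 500),  # Ollama's cap on generated tokens
        },
    }


def generate(prompt: str, llm: dict[str, Any]) -> str:
    """Send the request (needs a running `ollama serve`) and return the generated text."""
    body = json.dumps(build_generate_payload(prompt, llm)).encode("utf-8")
    req = urllib.request.Request(OLLAMA_URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=llm.get("REQUEST_TIMEOUT", 180)) as resp:
        return json.loads(resp.read())["response"]
```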

Advanced Usage

Incremental Ingestion

Add new documents without reprocessing:

# Initial ingestion
uv run python ingest_folder.py ./docs

# Later: add more documents (automatically skips processed files)
uv run python ingest_folder.py ./more_docs
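
Skipping processed files comes down to content hashing (the ingestion diagram lists SHA256 dedupe). A minimal sketch, assuming a plain set of known digests in place of PyRagix's SQLite metadata and processed_files log, whose exact format isn't documented here; `file_sha256`, `new_files`, and the extension spellings in `SUPPORTED` are hypothetical.

```python
import hashlib
from collections.abc import Iterator
from pathlib import Path

# Extensions from the supported-formats list (.jpg/.tif spellings are assumptions)
SUPPORTED = {".pdf", ".html", ".htm", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp", ".webp"}


def file_sha256(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB blocks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def new_files(root: Path, known_digests: set[str]) -> Iterator[tuple[Path, str]]:
    """Yield (path, sha256) for supported files whose content has not been seen."""
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED:
            continue
        sha = file_sha256(path)
        if sha not in known_digests:
            known_digests.add(sha)  # also skips duplicate copies within the same run
            yield path, sha
```

Hashing content rather than paths means a renamed or moved file is still recognized as already ingested.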

Custom Document Filters

Skip specific file types or patterns:

[pdf]
SKIP_FILES = ["*.tmp", "backup_*", "archive/*"]
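
Glob-style patterns like these can be matched with the standard-library `fnmatch` module. A sketch assuming each pattern is tested against both the file name and its POSIX-style path relative to the ingestion root; `should_skip` is a hypothetical helper, and PyRagix's exact matching rules may differ.

```python
from fnmatch import fnmatchcase
from pathlib import PurePath


def should_skip(path: PurePath, root: PurePath, patterns: list[str]) -> bool:
    """True if the file name or its root-relative path matches any skip pattern."""
    rel = path.relative_to(root).as_posix()  # e.g. "archive/2023/report.pdf"
    # fnmatchcase behaves the same (case-sensitive) on every OS.
    # Note that "*" also matches "/", so "archive/*" covers nested folders too.
    return any(fnmatchcase(path.name, p) or fnmatchcase(rel, p) for p in patterns)
```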

FAISS Index Optimization

PyRagix ships with IVF (Inverted File) indexing enabled in the default settings for fast search on large corpora, while automatically falling back to flat indexing when the corpus is small or IVF training fails:

[faiss]
INDEX_TYPE = "ivf"
NLIST = 1024
NPROBE = 16

NLIST: Number of clusters (default: 1024). Increase for larger datasets (10k+ chunks).
NPROBE: Number of clusters searched per query (default: 16). Higher values improve recall at the cost of speed.

The system automatically falls back to flat indexing for small collections (< 2048 chunks), then upgrades to IVF as your corpus grows.
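
The flat-versus-IVF decision can be sketched with FAISS's standard index classes (`IndexFlatIP`, `IndexIVFFlat`). This is an illustrative sketch, not PyRagix's `faiss_manager.py`: the 2048-chunk threshold comes from the text above, while cosine similarity via L2-normalized inner products, catching `RuntimeError` from a failed IVF training step, and the helper names `choose_index_kind` and `build_index` are assumptions. `build_index` imports faiss and NumPy only when called.

```python
from typing import Any

FLAT_THRESHOLD = 2048  # from the text above: smaller corpora use exact (flat) search


def choose_index_kind(n_chunks: int, index_type: str = "ivf") -> str:
    """Return "ivf" only when IVF is configured and the corpus is large enough."""
    return "ivf" if index_type == "ivf" and n_chunks >= FLAT_THRESHOLD else "flat"


def build_index(vectors: Any, nlist: int = 1024, nprobe: int = 16, index_type: str = "ivf") -> Any:
    """Build a cosine-similarity index; requires faiss-cpu and numpy at call time."""
    import faiss
    import numpy as np

    x = np.array(vectors, dtype="float32", order="C")  # copy, so the caller's array is untouched
    faiss.normalize_L2(x)  # unit-length vectors make inner product equal cosine similarity
    dim = x.shape[1]
    if choose_index_kind(len(x), index_type) == "ivf":
        try:
            quantizer = faiss.IndexFlatIP(dim)
            index = faiss.IndexIVFFlat(quantizer, dim, nlist, faiss.METRIC_INNER_PRODUCT)
            index.train(x)  # k-means: learn nlist cluster centroids
            index.nprobe = nprobe  # clusters visited per query
            index.add(x)
            return index
        except RuntimeError:
            pass  # e.g. too few training points for nlist: fall back to flat search
    index = faiss.IndexFlatIP(dim)
    index.add(x)
    return index
```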

GPU Acceleration

PyRagix includes GPU detection with automatic CPU fallback:

[gpu]
GPU_ENABLED = true
GPU_DEVICE = 0
GPU_MEMORY_FRACTION = 0.8

Note: GPU FAISS requires compatible hardware and a separately installed GPU-enabled FAISS build. The default faiss-cpu dependency works without a GPU.
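
GPU detection with CPU fallback can be sketched as follows. `faiss.get_num_gpus()` is part of FAISS's Python API and returns 0 on CPU-only builds; `torch.cuda.is_available()` and `torch.cuda.device_count()` are PyTorch's. The helpers `faiss_gpu_count` and `pick_device`, and the idea that they receive GPU_ENABLED and GPU_DEVICE, are hypothetical; GPU_MEMORY_FRACTION is not modeled.

```python
def faiss_gpu_count() -> int:
    """Number of GPUs visible to FAISS; 0 if FAISS is missing or CPU-only."""
    try:
        import faiss
    except ImportError:
        return 0
    get_num_gpus = getattr(faiss, "get_num_gpus", None)  # guard for unusual builds
    return int(get_num_gpus()) if callable(get_num_gpus) else 0


def pick_device(gpu_enabled: bool, gpu_device: int = 0) -> str:
    """Torch device string for embedding models, falling back to CPU."""
    if not gpu_enabled:
        return "cpu"
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available() and 0 <= gpu_device < torch.cuda.device_count():
        return f"cuda:{gpu_device}"
    return "cpu"
```

The returned string can be passed as the `device` argument of `SentenceTransformer(...)`.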

Project Structure

PyRagix uses a modular architecture with clear separation of concerns:

PyRagix/
├── ingest_folder.py         # Document ingestion CLI wrapper
├── query_rag.py             # Console query CLI with spinner/Ollama checks
├── dev.sh                   # Frontend build + FastAPI server launcher
├── config.py                # Pydantic-backed runtime configuration
├── types_models.py          # Shared Pydantic models (MetadataDict, RAGConfig, etc.)
├── CHANGELOG.md             # Release notes
│
├── ingestion/               # Document processing pipeline
│   ├── cli.py               # CLI argument parsing and path safety guards
│   ├── environment.py       # Environment tuning and shared context creation
│   ├── faiss_manager.py     # FAISS index creation/persistence helpers
│   ├── file_scanner.py      # Extraction, chunking, embedding, persistence
│   ├── progress.py          # Spinner-based progress reporting
│   ├── pipeline.py          # Top-level orchestration + BM25 rebuild
│   └── ...                  # metadata_store.py, text_processing.py, stale_cleaner.py, etc.
│
├── rag/                     # Query-time retrieval pipeline
│   ├── configuration.py     # Runtime defaults + validation
│   ├── loader.py            # Load FAISS/metadata/embedder
│   ├── llm.py               # Ollama client with retry/backoff
│   ├── retrieval.py         # Hybrid retrieval, dynamic alpha, reranking
│   └── __init__.py          # Lazy re-exports to avoid heavy imports
│
├── utils/                   # Shared utilities
│   ├── bm25_index.py        # BM25 persistence and search helpers
│   ├── faiss_importer.py    # Centralised FAISS import/warning suppression
│   ├── faiss_types.py       # Protocols for FAISS type safety
│   ├── ollama_status.py     # Ollama health probes and caching
│   ├── query_expander.py    # Multi-query expansion via Ollama
│   ├── reranker.py          # Cross-encoder reranker wrapper
│   └── spinner.py           # Lightweight CLI spinner
│
├── web/                     # Web UI + API server
│   ├── server.py            # FastAPI server with health + visualization endpoints
│   ├── visualization_utils.py # Embedding visualization helpers
│   └── ...                  # TypeScript sources, static assets, dev scripts
│
├── tests/                   # Pytest suite
│   ├── test_rag_configuration.py   # Runtime validation coverage
│   ├── test_retrieval_dynamic_alpha.py # Dynamic hybrid alpha tests
│   └── ...                  # Ingestion/environment regression tests
└── typings/                 # Third-party type stubs (keep pyright --strict green)
    └── ...

Architecture Highlights:

Modular Packages: Clear separation between ingestion, query, and utility logic
Protocol-Based Typing: Uses Python Protocols for duck-typed interfaces (PDF libraries, OCR)
Type Safety: All code passes pyright --strict with comprehensive type stubs
Pydantic v2: Data validation and serialization throughout
Test Coverage: Pytest suite with fixtures for all major components

Dependencies

PyRagix uses modern Python 3.13+ with strict type safety. All dependencies are managed via pyproject.toml:

Core ML/AI:

torch (2.9+): Embedding model backend with CUDA support
sentence-transformers: Dense embeddings and cross-encoder reranking
transformers: HuggingFace model infrastructure
faiss-cpu (1.12+): High-performance vector search with IVF indexing
rank-bm25: BM25 keyword search for hybrid retrieval

Document Processing:

paddleocr: OCR for images and scanned documents
paddlepaddle (3.2+): PaddleOCR backend
pymupdf: PDF text extraction
beautifulsoup4: HTML parsing
langchain-text-splitters: Semantic chunking with sentence boundaries
pillow: Image processing

Data & Infrastructure:

fastapi: Web API and UI server
uvicorn: ASGI server with WebSockets
sqlite-utils: Metadata database management
pydantic: Data validation and settings management
numpy: Numerical operations

Utilities:

scikit-learn: ML utilities (used by reranker)
umap-learn: Dimensionality reduction (visualization)
psutil: System resource monitoring
requests: HTTP client
tenacity: Resilient retry/backoff decorators for Ollama and ingestion pipelines

Development Tools:

pyright: Strict static type checking
ruff: Fast Python linter and formatter
pytest: Testing framework

Installation:

# Recommended: Use uv for fast, reliable dependency management
uv sync

# Alternative: Traditional pip installation
pip install -e .

# Development dependencies
uv sync --dev

Dependencies declare minimum versions in pyproject.toml, and uv.lock pins the exact resolved versions. PyRagix requires Python 3.13+ and makes no backwards compatibility compromises.

Why PyRagix?

Privacy: Unlike cloud-based RAG services, PyRagix processes everything locally. Your documents, queries, and generated answers never leave your infrastructure.

Performance: Modern RAG techniques (query expansion, reranking, hybrid search) deliver enterprise-grade retrieval quality previously only available through expensive cloud APIs.

Flexibility: Every component is configurable and swappable. Use your preferred LLM, embedding model, or retrieval strategy.

Transparency: Open-source Python codebase with clear documentation. Understand exactly how your RAG system works.

Cost: Zero runtime costs beyond your hardware. No per-query API fees, no subscription tiers.

Control: Version your models, control your deployment, audit your data flows. Perfect for regulated industries.

Use Cases

Enterprise Knowledge Management: Index internal documentation, wikis, and knowledge bases with complete data privacy
Legal Document Analysis: Process contracts, case files, and legal research with confidentiality
Medical Research: Search clinical notes, research papers, and patient data (local processing can support HIPAA compliance in a properly secured deployment)
Software Documentation: Build internal developer knowledge bases from code, docs, and tickets
Personal Knowledge Management: Create private search engines over personal notes, books, and research

CI/CD

PyRagix includes GitHub Actions workflows for automated quality assurance:

CI Workflow: Runs on every push and pull request
- Type checking with pyright --strict
- Linting and formatting with ruff
- Full test suite with pytest
- Ensures Python 3.13+ compatibility
Publish Workflow: Automated package publishing (when configured)

All code must pass strict type checking and tests before merging.

Contributing

Contributions are welcome.

Development Setup:

git clone https://github.com/psarno/PyRagix.git
cd PyRagix
uv sync

Code Quality Standards:

PyRagix maintains strict type safety as a core principle. All code must pass pyright --strict with zero type errors:

Type Safety (Non-Negotiable):

✅ All code passes pyright --strict (zero errors, minimal warnings)
✅ Modern Python 3.13+ syntax: X | None, list[T], dict[K, V] (not Optional, List, Dict)
✅ Pydantic v2 for all data models with validation
✅ Protocol-based typing for duck-typed interfaces (PDF libraries, OCR)
✅ Comprehensive type stubs in typings/ for third-party libraries
❌ NO # type: ignore comments - use proper type stubs or cast() instead
❌ NO Any types except for legitimate sentinel values and validators

Code Structure:

Modular packages with clear separation of concerns
Protocol definitions in ingestion/models.py for external library interfaces
Pydantic models for all data validation and serialization
Pytest tests with fixtures for new features

Development Workflow:

# Type check (must pass before committing)
uv run pyright

# Run tests
uv run pytest

# Lint and format
uv run ruff check .
uv run ruff format .

Contributing Guidelines:

Update type stubs if adding new third-party library features
Add docstrings to Protocol definitions explaining their purpose
Write tests for new functionality using pytest fixtures from tests/conftest.py
Follow existing patterns: see ingestion/ and rag/ packages for examples

License

MIT License - see LICENSE for details.

Acknowledgements

PyRagix builds on the shoulders of giants:

FAISS (Meta AI Research)
Sentence Transformers (UKP Lab)
Ollama (Ollama Team)
PaddleOCR (PaddlePaddle)
LangChain (LangChain AI)

Built with privacy, performance, and pragmatism in mind.

Project details

Maintainers

psarno

Release history

0.4.1 (this version): Nov 5, 2025
0.4.0: Oct 28, 2025
0.3.1: Sep 2, 2025
0.3.0: Sep 2, 2025
0.2.0: Aug 30, 2025
0.1.1: Aug 29, 2025
0.1.0: Aug 28, 2025

Download files

Source Distribution

pyragix-0.4.1.tar.gz (36.3 kB), uploaded Nov 5, 2025

Built Distribution

pyragix-0.4.1-py3-none-any.whl (23.4 kB, Python 3), uploaded Nov 5, 2025

File details

Details for the file pyragix-0.4.1.tar.gz.

File metadata

Download URL: pyragix-0.4.1.tar.gz
Upload date: Nov 5, 2025
Size: 36.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pyragix-0.4.1.tar.gz
Algorithm	Hash digest
SHA256	`517077a027cbbf51b138eed35919707de0dbc29aae1ac6356561289e736355f1`
MD5	`419d019fe3cd2b8e0d04c9b273af064a`
BLAKE2b-256	`62ddb699efd378ec47743b92651850ba73c05821b39f392740ed5e378a2f4196`

Provenance

The following attestation bundles were made for pyragix-0.4.1.tar.gz:

Publisher: publish.yml on psarno/PyRagix

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pyragix-0.4.1.tar.gz
- Subject digest: 517077a027cbbf51b138eed35919707de0dbc29aae1ac6356561289e736355f1
- Sigstore transparency entry: 672857622
- Sigstore integration time: Nov 5, 2025
Source repository:
- Permalink: psarno/PyRagix@cebe226f4a163e10607604174beaa2fce3cfa49b
- Branch / Tag: refs/tags/v0.4.1
- Owner: https://github.com/psarno
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@cebe226f4a163e10607604174beaa2fce3cfa49b
- Trigger Event: push

File details

Details for the file pyragix-0.4.1-py3-none-any.whl.

File metadata

Download URL: pyragix-0.4.1-py3-none-any.whl
Upload date: Nov 5, 2025
Size: 23.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pyragix-0.4.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0aa07f3ad079852dfec770eaa71c7e64b1b96be7bebdae3fed7a4e6d3d904bae`
MD5	`9006ff48701e388b5db4b0adbff412fc`
BLAKE2b-256	`a4ca809001d9205cb575262e22bbb1fb41ff374afa6ee5576d4d1f4ca306cb0b`

Provenance

The following attestation bundles were made for pyragix-0.4.1-py3-none-any.whl:

Publisher: publish.yml on psarno/PyRagix

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pyragix-0.4.1-py3-none-any.whl
- Subject digest: 0aa07f3ad079852dfec770eaa71c7e64b1b96be7bebdae3fed7a4e6d3d904bae
- Sigstore transparency entry: 672857624
- Sigstore integration time: Nov 5, 2025
Source repository:
- Permalink: psarno/PyRagix@cebe226f4a163e10607604174beaa2fce3cfa49b
- Branch / Tag: refs/tags/v0.4.1
- Owner: https://github.com/psarno
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@cebe226f4a163e10607604174beaa2fce3cfa49b
- Trigger Event: push
