This blueprint serves as a reference solution for a foundational Retrieval Augmented Generation (RAG) pipeline.

Project description

NVIDIA RAG Blueprint

Retrieval-Augmented Generation (RAG) combines the reasoning power of large language models (LLMs) with real-time retrieval from trusted data sources. It grounds AI responses in enterprise knowledge, reducing hallucinations and ensuring accuracy, compliance, and freshness.
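The retrieve-then-generate idea can be illustrated with a toy sketch. This is not the blueprint's implementation: the word-overlap scorer stands in for a real embedding model and vector database, the sample documents are invented, and the function names `retrieve` and `build_grounded_prompt` are hypothetical. The sketch only shows how retrieved passages end up in the prompt that an LLM would answer from.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    return [word.strip(".,?!").lower() for word in text.split()]


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (toy stand-in for a retriever)."""
    query_counts = Counter(tokenize(query))
    scored = []
    for doc in documents:
        doc_counts = Counter(tokenize(doc))
        overlap = sum(min(query_counts[t], doc_counts[t]) for t in query_counts)
        scored.append((overlap, doc))
    # sorted() is stable, so ties keep their original document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_grounded_prompt(query: str, passages: list[str]) -> str:
    """Assemble a prompt that asks the model to answer only from the passages."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


# Invented sample data, for illustration only.
docs = [
    "The warranty period for the X100 router is 24 months.",
    "Office hours are 9am to 5pm on weekdays.",
    "The X100 router supports Wi-Fi 6.",
]
question = "What is the warranty period for the X100 router?"
passages = retrieve(question, docs)
prompt = build_grounded_prompt(question, passages)
print(prompt)
```

In a real pipeline the prompt would be sent to an LLM, and the retriever would query an index built from ingested enterprise documents.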

Overview

The NVIDIA RAG Blueprint is a reference solution and foundational starting point for building Retrieval-Augmented Generation (RAG) pipelines with NVIDIA NIM microservices. It enables enterprises to deliver natural language question answering grounded in their own data, while meeting governance, latency, and scalability requirements. Designed to be decomposable and configurable, the blueprint integrates GPU-accelerated components with NeMo Retriever models, multimodal and vision language models, and guardrailing services to provide an enterprise-ready framework. It ships with a pre-built reference UI, open-source code, and multiple deployment options, including local Docker (with and without NVIDIA-hosted endpoints) and Kubernetes, so developers can adapt and extend it to their specific needs.

Key Features

Data Ingestion
  • Multimodal content extraction - Documents with text, tables, charts, infographics, and audio. For the full list of supported file types, see [NeMo Retriever Extraction Overview](https://docs.nvidia.com/nemo/retriever/latest/extraction/overview/).
  • Custom metadata support
Search and Retrieval
  • Multi-collection searchability
  • Hybrid search with dense and sparse search
  • Reranking to further improve accuracy
  • GPU-accelerated index creation and search
  • Pluggable vector database
Query Processing
  • Query decomposition
  • Dynamic filter expression creation
Generation and Enrichment
  • Opt-in multimodal and vision language model (VLM) support in the answer generation pipeline
  • Document summarization with multiple strategies, flexible page filtering, and real-time progress tracking
  • Improve accuracy with optional reflection
  • Optional programmable guardrails for content safety
Evaluation
  • Evaluation scripts (RAGAS framework)
User Experience
  • Sample user interface
  • Multi-turn conversations
  • Multi-session support
Deployment and Operations
  • Telemetry and observability
  • Decomposable and customizable
  • NIM Operator support
  • Python library mode support
  • OpenAI-compatible APIs
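
As an illustration of the hybrid search feature listed under Search and Retrieval, the sketch below merges a dense (embedding-based) ranking and a sparse (keyword-based) ranking with reciprocal rank fusion (RRF). The text above does not say which fusion method the blueprint uses, so RRF is an assumption chosen because it is a common, simple technique. The document IDs and the constant `k=60` are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one list.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Documents ranked well by several retrievers
    accumulate the highest total score.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense retriever and a sparse retriever.
dense_ranking = ["doc_a", "doc_b", "doc_c"]
sparse_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

`doc_b` comes first because both retrievers rank it near the top. A reranking model, as listed in the features, would typically rescore the fused candidates afterwards.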

Software Components

The RAG blueprint is built from the following complementary categories of software:

  • NVIDIA NIM microservices – Deliver the core AI functionality: large-scale inference (for example, Nemotron LLM models for response generation), retrieval and reranking models, and specialized extractors for text, tables, charts, and graphics. Optional NIMs extend these capabilities with OCR, content safety, topic control, and multimodal embeddings.

  • The integration and orchestration layer – Acts as the glue that binds the system into a complete solution.

This modular design ensures efficient query processing, accurate retrieval of information, and easy customization.

NVIDIA NIM Microservices

Get Started With NVIDIA RAG Blueprint

The recommended way to get started with this Python package is to refer to this notebook.

Refer to the full documentation to learn about the following:

  • Minimum Requirements
  • Deployment Options
  • Configuration Settings
  • Common Customizations
  • Available Notebooks
  • Troubleshooting
  • Additional Resources

Blog Posts

License

This NVIDIA AI BLUEPRINT is licensed under the Apache License, Version 2.0. Use of the models in this blueprint is governed by the NVIDIA AI Foundation Models Community License.

Terms of Use

This blueprint is governed by the NVIDIA Agreements | Enterprise Software | NVIDIA Software License Agreement and the NVIDIA Agreements | Enterprise Software | Product Specific Terms for AI Product. The models are governed by the NVIDIA Agreements | Enterprise Software | NVIDIA Community Model License, and the NVIDIA RAG dataset is governed by the NVIDIA Asset License Agreement. The following models, which are built with Llama, are governed by the Llama 3.2 Community License Agreement: nvidia/llama-nemotron-embed-1b-v2, nvidia/llama-nemotron-rerank-1b-v2, and llama-3.2-nemoretriever-1b-vlm-embed-v1.

Additional Information

The Llama 3.1 Community License Agreement applies to the llama-3.1-nemotron-nano-vl-8b-v1, llama-3.1-nemoguard-8b-content-safety, and llama-3.1-nemoguard-8b-topic-control models. The Llama 3.2 Community License Agreement applies to the nvidia/llama-nemotron-embed-1b-v2, nvidia/llama-nemotron-rerank-1b-v2, and llama-3.2-nemoretriever-1b-vlm-embed-v1 models. The Llama 3.3 Community License Agreement applies to the llama-3.3-nemotron-super-49b-v1.5 model. Built with Llama. Apache 2.0 applies to NVIDIA Ingest and to the nemoretriever-page-elements-v2, nemotron-table-structure-v1, nemotron-graphic-elements-v1, paddleocr, and nemoretriever-ocr-v1 models.


Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

nvidia_rag-2.5.0-py3-none-any.whl (255.7 kB view details)

Uploaded Python 3

File details

Details for the file nvidia_rag-2.5.0-py3-none-any.whl.

File metadata

  • Download URL: nvidia_rag-2.5.0-py3-none-any.whl
  • Upload date:
  • Size: 255.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for nvidia_rag-2.5.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        4e7f41396dd1dfcd93503f6a4d0bb3902d99c7ab4f47d293b0773eda8dd86b3d
MD5           0e52aff8376532f4d8297eb89b9bad29
BLAKE2b-256   ca46ef887b1b08add07e0564c9438076f23946280f1a5a3c6c09d7b4c7105150
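
A downloaded wheel can be checked against the SHA256 digest listed above before it is installed. The following is a minimal sketch using only the Python standard library. It assumes the wheel was saved to the current directory under its published file name. The helper names `sha256_of_file` and `verify_wheel` are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for nvidia_rag-2.5.0-py3-none-any.whl.
EXPECTED_SHA256 = "4e7f41396dd1dfcd93503f6a4d0bb3902d99c7ab4f47d293b0773eda8dd86b3d"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the wheel was saved.
    wheel = Path("nvidia_rag-2.5.0-py3-none-any.whl")
    if wheel.exists():
        print("hash OK" if verify_wheel(str(wheel)) else "hash MISMATCH")
    else:
        print(f"{wheel} not found; download it first")
```

pip offers a related built-in check through hash-pinned requirements files (`--require-hashes`), which may be preferable in automated installs.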
