A modern RAG ingestion pipeline from NVIDIA

These details have not been verified by PyPI

Project description

Quick Start for NeMo Retriever Library

NeMo Retriever Library is a retrieval-augmented generation (RAG) ingestion pipeline for documents that can parse text, tables, charts, and infographics. NeMo Retriever Library parses documents, creates embeddings, optionally stores embeddings in LanceDB, and performs recall evaluation.

This quick start guide shows how to run NeMo Retriever Library entirely within local Python processes, without containers. NeMo Retriever Library supports two inference options:

Pull and run Nemotron RAG models from Hugging Face on your local GPU(s).
Make inference calls over the network to NeMo Retriever NIM endpoints, either hosted on build.nvidia.com or deployed locally.

You’ll set up a CUDA 13–compatible environment, install the library and its dependencies, and run GPU‑accelerated ingestion pipelines that convert PDFs, HTML, plain text, audio, or video into vector embeddings stored in LanceDB (on local disk), with Ray‑based scaling and built‑in recall benchmarking.

Prerequisites

Before starting, make sure your system meets the following requirements:

The host is running CUDA 13.x so that libcudart.so.13 is available.
Your GPUs are visible to the system and compatible with CUDA 13.x. If optical character recognition (OCR) fails with a libcudart.so.13 error, install the CUDA 13 runtime for your platform and update LD_LIBRARY_PATH to include the CUDA lib64 directory, then rerun the pipeline.

For example, the following command can be used to update the LD_LIBRARY_PATH value.

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64
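
One way to confirm that the runtime is discoverable before starting a pipeline is to try loading it from Python. The following is a minimal sketch and not part of NeMo Retriever Library; the helper name cuda_runtime_available is hypothetical, and it only checks that the dynamic loader can find libcudart.so.13.

```python
import ctypes


def cuda_runtime_available(soname: str = "libcudart.so.13") -> bool:
    """Return True if the dynamic loader can find and load `soname`."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library is missing or not on the loader search path.
        return False
    return True


if __name__ == "__main__":
    if cuda_runtime_available():
        print("libcudart.so.13 loaded successfully")
    else:
        print("libcudart.so.13 not found; check LD_LIBRARY_PATH")
```

LD_LIBRARY_PATH is read when the process starts, so export it in the shell before launching Python rather than setting it from inside the script.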

Set up your environment

Complete the following steps to set up your environment. You will create and activate an isolated Python virtual environment, install the NeMo Retriever Library and its dependencies, and then run the provided ingestion snippets to validate your setup.

Create and activate the NeMo Retriever Library environment

Before installing NeMo Retriever Library, create an isolated Python environment so its dependencies do not conflict with other projects on your system. In this step, you set up a new virtual environment and activate it so that all subsequent installs are scoped to NeMo Retriever Library.

In your terminal, run the following commands from any location.

For local GPU inference (Nemotron models running on your GPU), install with the [local] extra, which includes the model packages, transformers, and GPU tooling:

uv venv retriever --python 3.12
source retriever/bin/activate
uv pip install "nemo-retriever[local]==26.3.0" nv-ingest-client==26.3.0 nv-ingest==26.3.0 nv-ingest-api==26.3.0

For remote NIM inference only (no local GPU required), the base package is sufficient:

uv venv retriever --python 3.12
source retriever/bin/activate
uv pip install nemo-retriever==26.3.0 nv-ingest-client==26.3.0 nv-ingest==26.3.0 nv-ingest-api==26.3.0

This creates a dedicated Python environment and installs the nemo-retriever PyPI package, the canonical distribution for the NeMo Retriever Library.

Override Torch and Torchvision with CUDA 13 builds (local GPU only)

The [local] extra pulls PyTorch from PyPI, which defaults to a CPU build on Linux. Reinstall from the CUDA 13.0 wheel index to match the CUDA runtime required by the Nemotron model packages:

uv pip install torch==2.9.1 torchvision -i https://download.pytorch.org/whl/cu130

Skip this step if you are using remote NIM inference only.
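
To check which PyTorch build ended up in the environment after the override, you can inspect the version metadata that PyTorch exposes. This is a sketch; torch_cuda_report is a hypothetical helper, and it returns early if torch is not installed so it can run in either setup.

```python
import importlib.util


def torch_cuda_report() -> dict:
    """Summarize the installed PyTorch build without failing if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    return {
        "installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against; None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


print(torch_cuda_report())
```

After installing from the cu130 index, cuda_build would be expected to report a 13.x version; a value of None suggests the CPU build is still installed.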

Run the pipeline

The test PDF contains text, tables, charts, and images. Additional test data resides here.

Note: batch is the primary intended run_mode of operation for this library. Other modes are experimental and subject to change or removal.

The examples below use default local GPU inference (no invoke_url specified) and require the [local] extra and the CUDA 13 torch override from the setup steps above. For remote NIM inference without a local GPU, see Run with remote inference.

Ingest a test PDF

from nemo_retriever import create_ingestor
from nemo_retriever.io import to_markdown, to_markdown_by_page
from pathlib import Path

documents = [str(Path("../data/multimodal_test.pdf"))]
ingestor = create_ingestor(run_mode="batch")

# ingestion tasks are chainable and defined lazily
ingestor = (
  ingestor.files(documents)
  .extract(
    # below are the default values, but content types can be controlled
    extract_text=True,
    extract_charts=True,
    extract_tables=True,
    extract_infographics=True
  )
  .embed()
  .vdb_upload()
)

# ingestor.ingest() actually executes the pipeline
# results are returned as a ray dataset and inspectable as chunks
ray_dataset = ingestor.ingest()
chunks = ray_dataset.get_dataset().take_all()

Inspect extracts

You can inspect the text chunks, which are optimized for recall accuracy, to see how each content type was extracted into a text representation:

# page 1 raw text:
>>> chunks[0]["text"]
'TestingDocument\r\nA sample document with headings and placeholder text\r\nIntroduction\r\nThis is a placeholder document that can be used for any purpose...'

# markdown formatted table from the first page
>>> chunks[1]["text"]
'| Table | 1 |\n| This | table | describes | some | animals, | and | some | activities | they | might | be | doing | in | specific |\n| locations. |\n| Animal | Activity | Place |\n| Giraffe | Driving | a | car | At | the | beach |\n| Lion | Putting | on | sunscreen | At | the | park |\n| Cat | Jumping | onto | a | laptop | In | a | home | office |\n| Dog | Chasing | a | squirrel | In | the | front | yard |\n| Chart | 1 |'

# a chart from the first page
>>> chunks[2]["text"]
'Chart 1\nThis chart shows some gadgets, and some very fictitious costs.\nGadgets and their cost\n$160.00\n$140.00\n$120.00\n$100.00\nDollars\n$80.00\n$60.00\n$40.00\n$20.00\n$-\nPowerdrill\nBluetooth speaker\nMinifridge\nPremium desk fan\nHammer\nCost'

# markdown formatting for full pages or documents:
# document results are keyed by source filename
>>> to_markdown_by_page(chunks).keys()
dict_keys(['multimodal_test.pdf'])

# results per document are keyed by page number
>>> to_markdown_by_page(chunks)["multimodal_test.pdf"].keys()
dict_keys([1, 2, 3])

>>> to_markdown_by_page(chunks)["multimodal_test.pdf"][1]
'TestingDocument\r\nA sample document with headings and placeholder text\r\nIntroduction\r\nThis is a placeholder document that can be used for any purpose. It contains some \r\nheadings and some placeholder text to fill the space. The text is not important and contains \r\nno real value, but it is useful for testing. Below, we will have some simple tables and charts \r\nthat we can use to confirm Ingest is working as expected.\r\nTable 1\r\nThis table describes some animals, and some activities they might be doing in specific \r\nlocations.\r\nAnimal Activity Place\r\nGira@e Driving a car At the beach\r\nLion Putting on sunscreen At the park\r\nCat Jumping onto a laptop In a home o@ice\r\nDog Chasing a squirrel In the front yard\r\nChart 1\r\nThis chart shows some gadgets, and some very fictitious costs.\n\n| This | table | describes | some | animals, | and | some | activities | they | might | be | doing | in | specific |\n| locations. |\n| Animal | Activity | Place |\n| Giraffe | Driving | a | car | At | the | beach |\n| Lion | Putting | on | sunscreen | At | the | park |\n| Cat | Jumping | onto | a | laptop | In | a | home | office |\n| Dog | Chasing | a | squirrel | In | the | front | yard |\n| Chart | 1 |\n\nChart 1 This chart shows some gadgets, and some very fictitious costs. Gadgets and their cost $160.00 $140.00 $120.00 $100.00 Dollars $80.00 $60.00 $40.00 $20.00 $- Powerdrill Bluetooth speaker Minifridge Premium desk fan Hammer Cost\n\n### Table 1\n\n| This | table | describes | some | animals, | and | some | activities | they | might | be | doing | in | specific |\n| locations. |\n| Animal | Activity | Place |\n| Giraffe | Driving | a | car | At | the | beach |\n| Lion | Putting | on | sunscreen | At | the | park |\n| Cat | Jumping | onto | a | laptop | In | a | home | office |\n| Dog | Chasing | a | squirrel | In | the | front | yard |\n| Chart | 1 |\n\n### Chart 1\n\nChart 1 This chart shows some gadgets, and some very fictitious costs. 
Gadgets and their cost $160.00 $140.00 $120.00 $100.00 Dollars $80.00 $60.00 $40.00 $20.00 $- Powerdrill Bluetooth speaker Minifridge Premium desk fan Hammer Cost\n\n### Table 2\n\n| This | table | describes | some | animals, | and | some | activities | they | might | be | doing | in | specific |\n| locations. |\n| Animal | Activity | Place |\n| Giraffe | Driving | a | car | At | the | beach |\n| Lion | Putting | on | sunscreen | At | the | park |\n| Cat | Jumping | onto | a | laptop | In | a | home | office |\n| Dog | Chasing | a | squirrel | In | the | front | yard |\n| Chart | 1 |\n\n### Chart 2\n\nChart 1 This chart shows some gadgets, and some very fictitious costs. Gadgets and their cost $160.00 $140.00 $120.00 $100.00 Dollars $80.00 $60.00 $40.00 $20.00 $- Powerdrill Bluetooth speaker Minifridge Premium desk fan Hammer Cost\n\n### Table 3\n\n| This | table | describes | some | animals, | and | some | activities | they | might | be | doing | in | specific |\n| locations. |\n| Animal | Activity | Place |\n| Giraffe | Driving | a | car | At | the | beach |\n| Lion | Putting | on | sunscreen | At | the | park |\n| Cat | Jumping | onto | a | laptop | In | a | home | office |\n| Dog | Chasing | a | squirrel | In | the | front | yard |\n| Chart | 1 |\n\n### Chart 3\n\nChart 1 This chart shows some gadgets, and some very fictitious costs. Gadgets and their cost $160.00 $140.00 $120.00 $100.00 Dollars $80.00 $60.00 $40.00 $20.00 $- Powerdrill Bluetooth speaker Minifridge Premium desk fan Hammer Cost'

# full document markdown also keyed by source filename
>>> to_markdown(chunks).keys()
dict_keys(['multimodal_test.pdf'])
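
To scan many chunks at once rather than printing them one by one, a small helper can collapse whitespace and truncate each chunk. This sketch assumes only what the output above shows, namely that each chunk is a dict with a "text" key. preview_chunks is a hypothetical helper, and the sample list stands in for real pipeline output.

```python
def preview_chunks(chunks, width=40):
    """Return one truncated, whitespace-normalized line per chunk."""
    lines = []
    for index, chunk in enumerate(chunks):
        # split() with no arguments collapses \r\n, \n, and repeated spaces
        text = " ".join(chunk["text"].split())
        if len(text) > width:
            text = text[:width] + "..."
        lines.append(f"[{index}] {text}")
    return lines


# stand-in data shaped like the chunks shown above
sample_chunks = [
    {"text": "TestingDocument\r\nA sample document with headings"},
    {"text": "| Animal | Activity | Place |"},
]
for line in preview_chunks(sample_chunks):
    print(line)
```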

Because the ingestion job automatically populated a LanceDB table with all of these chunks, you can run queries to retrieve semantically relevant chunks and feed them directly into an LLM:

Run a recall query

from nemo_retriever.retriever import Retriever

retriever = Retriever(
  # default values
  lancedb_uri="lancedb",
  lancedb_table="nv-ingest",
  embedder="nvidia/llama-3.2-nv-embedqa-1b-v2",
  top_k=5,
  reranker=False
)

query = "Given their activities, which animal is responsible for the typos in my documents?"

# you can also submit a list with retriever.queries([...])
hits = retriever.query(query)

# retrieved text from the first page
>>> hits[0]
{'text': 'TestingDocument\r\nA sample document with headings and placeholder text\r\nIntroduction\r\nThis is a placeholder document that can be used for any purpose. It contains some \r\nheadings and some placeholder text to fill the space. The text is not important and contains \r\nno real value, but it is useful for testing. Below, we will have some simple tables and charts \r\nthat we can use to confirm Ingest is working as expected.\r\nTable 1\r\nThis table describes some animals, and some activities they might be doing in specific \r\nlocations.\r\nAnimal Activity Place\r\nGira@e Driving a car At the beach\r\nLion Putting on sunscreen At the park\r\nCat Jumping onto a laptop In a home o@ice\r\nDog Chasing a squirrel In the front yard\r\nChart 1\r\nThis chart shows some gadgets, and some very fictitious costs.', 'metadata': '{"page_number": 1, "pdf_page": "multimodal_test_1", "page_elements_v3_num_detections": 9, "page_elements_v3_counts_by_label": {"table": 1, "chart": 1, "title": 3, "text": 4}, "ocr_table_detections": 1, "ocr_chart_detections": 1, "ocr_infographic_detections": 0}', 'source': '{"source_id": "/home/dev/projects/NeMo-Retriever/data/multimodal_test.pdf"}', 'page_number': 1, '_distance': 1.5822279453277588}

# retrieved text of the table from the first page
>>> hits[1]
{'text': '| Table | 1 |\n| This | table | describes | some | animals, | and | some | activities | they | might | be | doing | in | specific |\n| locations. |\n| Animal | Activity | Place |\n| Giraffe | Driving | a | car | At | the | beach |\n| Lion | Putting | on | sunscreen | At | the | park |\n| Cat | Jumping | onto | a | laptop | In | a | home | office |\n| Dog | Chasing | a | squirrel | In | the | front | yard |\n| Chart | 1 |', 'metadata': '{"page_number": 1, "pdf_page": "multimodal_test_1", "page_elements_v3_num_detections": 9, "page_elements_v3_counts_by_label": {"table": 1, "chart": 1, "title": 3, "text": 4}, "ocr_table_detections": 1, "ocr_chart_detections": 1, "ocr_infographic_detections": 0}', 'source': '{"source_id": "/home/dev/projects/NeMo-Retriever/data/multimodal_test.pdf"}', 'page_number': 1, '_distance': 1.614684820175171}
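
In the hits shown above, the 'metadata' and 'source' fields are JSON-encoded strings rather than nested dicts, so they need decoding before use. The sketch below flattens one hit into a small summary; summarize_hit is a hypothetical helper, and sample_hit is a trimmed copy of the first hit above.

```python
import json


def summarize_hit(hit):
    """Decode the JSON string fields of one hit into a flat summary dict."""
    metadata = json.loads(hit["metadata"])
    source = json.loads(hit["source"])
    return {
        "source_id": source["source_id"],
        "page_number": hit["page_number"],
        "distance": hit["_distance"],
        "counts_by_label": metadata.get("page_elements_v3_counts_by_label", {}),
    }


# trimmed copy of the first hit shown above
sample_hit = {
    "text": "TestingDocument...",
    "metadata": '{"page_number": 1, "page_elements_v3_counts_by_label": '
                '{"table": 1, "chart": 1, "title": 3, "text": 4}}',
    "source": '{"source_id": "/home/dev/projects/NeMo-Retriever/data/multimodal_test.pdf"}',
    "page_number": 1,
    "_distance": 1.5822279453277588,
}
print(summarize_hit(sample_hit))
```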

Generate a query answer using an LLM

The retrieval results above can often be fed directly to an LLM for answer generation.

To do so, first install the openai client and set your build.nvidia.com API key:

uv pip install openai
export NVIDIA_API_KEY=nvapi-...

from openai import OpenAI
import os

client = OpenAI(
  base_url = "https://integrate.api.nvidia.com/v1",
  api_key = os.environ.get("NVIDIA_API_KEY")
)

hit_texts = [hit["text"] for hit in hits]
prompt = f"""
Given the following retrieved documents, answer the question: {query}

Documents:
{hit_texts}
"""

completion = client.chat.completions.create(
  model="nvidia/nemotron-3-super-120b-a12b",
  messages=[{"role":"user","content":prompt}],
  stream=False
)

answer = completion.choices[0].message.content
print(answer)

Answer:

Cat is the animal whose activity (jumping onto a laptop) matches the location of the typos, so the cat is responsible for the typos in the documents.
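
The snippet above interpolates the Python list hit_texts directly, so the prompt contains the list's repr, including brackets and escaped newlines. One alternative is to number each document and join them as plain text. The following is a sketch of that idea; build_prompt is a hypothetical helper, not part of the library.

```python
def build_prompt(query, hits):
    """Format retrieved hits as numbered plain-text documents under the question."""
    documents = "\n\n".join(
        f"[{number}] {hit['text']}" for number, hit in enumerate(hits, start=1)
    )
    return (
        f"Given the following retrieved documents, answer the question: {query}\n\n"
        f"Documents:\n{documents}\n"
    )


# stand-in hits; real hits come from retriever.query(query)
sample_hits = [
    {"text": "Cat Jumping onto a laptop In a home office"},
    {"text": "Dog Chasing a squirrel In the front yard"},
]
print(build_prompt("Which animal causes typos?", sample_hits))
```

Numbering the documents also makes it possible to ask the model to cite which document supports its answer.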

Ingest other types of content:

For PowerPoint and DOCX files, ensure libreoffice is installed by your system's package manager. LibreOffice is required to render their pages as images for the page-elements content classifier.

For example, with apt on Ubuntu:

sudo apt install -y libreoffice

For SVG files, install the optional cairosvg dependency. SVG support is available in the NeMo Retriever Library, but not in the container deployment. cairosvg requires network access to install, so it will not work in air-gapped environments.

uv pip install "nemo-retriever[multimedia]"
# or to install only the SVG dependency:
uv pip install "cairosvg>=2.7.0"

Example usage:

# docx and pptx files
documents = [str(Path(f"../data/*{ext}")) for ext in [".pptx", ".docx"]]
# mixed types of images
images = [str(Path(f"../data/*{ext}")) for ext in [".png", ".jpeg", ".bmp"]]
ingestor = (
  # above file types can be combined into a single job
  ingestor.files(documents + images)
  .extract()
)
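
Because the inputs above are glob patterns, it can help to preview which files they match before starting a job. The sketch below uses only the standard library; matching_files is a hypothetical helper, and the temporary directory stands in for ../data.

```python
import tempfile
from pathlib import Path


def matching_files(data_dir, extensions):
    """Return the sorted paths in data_dir that match any of the given extensions."""
    root = Path(data_dir)
    return sorted(str(path) for ext in extensions for path in root.glob(f"*{ext}"))


# demonstrate against a throwaway directory instead of ../data
with tempfile.TemporaryDirectory() as tmp:
    for name in ["slides.pptx", "report.docx", "notes.txt"]:
        (Path(tmp) / name).touch()
    matched = [Path(p).name for p in matching_files(tmp, [".pptx", ".docx"])]
    print(matched)
```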

Note: the split() task uses a tokenizer to split text into chunks of at most max_tokens tokens.

Render results as markdown

If you want a readable markdown view of extracted results, pass the full in-process result list to nemo_retriever.io.to_markdown. The helper now returns a dict[str, str] keyed by input filename, where each value is the document collapsed into one markdown string without per-page headers, so both single-document and multi-document runs follow the same contract.

PDF text is split at the page level.

HTML and .txt files have no natural page delimiters, so they almost always need to be paired with the .split() task.

# html and text files - include a split task to prevent texts from exceeding the embedder's max sequence length
documents = [str(Path(f"../data/*{ext}")) for ext in [".txt", ".html"]]
ingestor = (
  ingestor.files(documents)
  .extract()
  .split(max_tokens=5)  # 1024 by default, set low here to demonstrate chunking
)
results = ingestor.ingest()
markdown_docs = to_markdown(results)
# results are keyed by input filename, so print each ingested document
for filename, markdown in markdown_docs.items():
  print(filename)
  print(markdown)
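
To illustrate what a max_tokens limit does to a long text, the sketch below groups whitespace-separated words into chunks of at most max_tokens words. This is a simplified stand-in: the library's split() task uses a real tokenizer, so its chunk boundaries will differ, and split_by_max_tokens is a hypothetical name.

```python
def split_by_max_tokens(text, max_tokens=1024):
    """Split text into chunks of at most max_tokens whitespace-separated words."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_tokens])
        for start in range(0, len(words), max_tokens)
    ]


text = "HTML and plain text files have no natural page delimiters at all"
for chunk in split_by_max_tokens(text, max_tokens=5):
    print(chunk)
```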

Use to_markdown_by_page(results) when you want a nested dict[str, dict[int, str]] instead, where each filename maps to its per-page markdown strings.

For audio and video files, ensure ffmpeg is installed by your system's package manager.

For example, with apt-get on Ubuntu:

sudo apt install -y ffmpeg

# INPUT_AUDIO is a placeholder for the path to your own audio file
ingestor = create_ingestor(run_mode="batch")
ingestor = ingestor.files([str(INPUT_AUDIO)]).extract_audio()

Store extracted images and text

Use .store() to persist extracted images, tables, charts, and text to local disk or object storage (S3, MinIO, GCS via fsspec). Stored URIs are written back to the DataFrame so downstream stages (embed, VDB upload) can reference them. By default, base64 payloads are stripped after writing to reduce memory pressure.

ingestor = (
  ingestor.files(documents)
  .extract()
  .store(
    storage_uri="s3://my-bucket/citation-assets",  # or a local path
    storage_options={"key": "...", "secret": "..."},  # fsspec auth for S3/MinIO
    store_text=True,       # also write .txt files for page text and structured content
    strip_base64=True,     # free image payloads after writing (default)
  )
  .embed()
  .vdb_upload()
)

Explore Different Pipeline Options:

You can use the Nemotron RAG VL Embedder:

ingestor = (
  ingestor.files(documents)
  .extract()
  .embed(
    model_name="nvidia/llama-nemotron-embed-vl-1b-v2",
    # works with plain "text", "image", and paired "text_image" inputs
    embed_modality="text_image"
  )
)

You can use a different ingestion pipeline based on Nemotron-Parse combined with the default embedder:

ingestor = ingestor.files(documents).extract(method="nemotron_parse")

Run with remote inference, no local GPU required:

For build.nvidia.com hosted inference, make sure you have NVIDIA_API_KEY set as an environment variable.

ingestor = (
  ingestor.files(documents)
  .extract(
    # for self hosted NIMs, your URLs will depend on your NIM container DNS settings
    page_elements_invoke_url="https://ai.api.nvidia.com/v1/cv/nvidia/nemotron-page-elements-v3",
    graphic_elements_invoke_url="https://ai.api.nvidia.com/v1/cv/nvidia/nemotron-graphic-elements-v1",
    ocr_invoke_url="https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-ocr-v1",
    table_structure_invoke_url="https://ai.api.nvidia.com/v1/cv/nvidia/nemotron-table-structure-v1"
  )
  .embed(
    embed_invoke_url="https://integrate.api.nvidia.com/v1/embeddings",
    model_name="nvidia/llama-nemotron-embed-1b-v2",
    embed_modality="text",
  )
  .vdb_upload()
)
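
Since hosted inference depends on NVIDIA_API_KEY, a pipeline script can fail fast when the variable is missing rather than partway through a job. This is a minimal sketch that assumes nothing about the library itself; require_env is a hypothetical helper.

```python
import os


def require_env(name, env=None):
    """Return the value of environment variable `name`, or raise if unset or empty."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running remote inference")
    return value


# explicit mapping so the check can be exercised without real credentials
print(require_env("NVIDIA_API_KEY", env={"NVIDIA_API_KEY": "nvapi-example"}))
```

In a real script, call require_env("NVIDIA_API_KEY") with no env argument so it reads the process environment.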

Ray cluster setup

NeMo Retriever Library uses Ray Data for distributed ingestion and benchmarking. For more information, see the NeMo Ray run guide.

Local Ray cluster with dashboard

To start a Ray cluster with the dashboard on a single machine, use the following command.

ray start --head

Open http://127.0.0.1:8265 in your browser for the Ray Dashboard, and run your NeMo Retriever Library pipeline on the same machine with --ray-address auto to attach to this cluster.

See also: Connecting to a remote Ray cluster on Kubernetes.

Single‑GPU cluster on multi‑GPU nodes

To restrict Ray to a single GPU on a multi‑GPU node, use the following command.

CUDA_VISIBLE_DEVICES=0 ray start --head --num-gpus=1

Then run your pipeline as before with --ray-address auto so it connects to this single‑GPU Ray cluster. For more information, see the NeMo Ray run guide.

Running multiple NIM instances on multi‑GPU hosts

Resource heuristics (batch mode)

By default, batch mode computes resources using this order:

Auto-detected resources (Ray cluster if connected, otherwise local machine)
Environment variables
Explicit function arguments (highest precedence)

This means defaults are deterministic but easy to override when you need fixed behavior.
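
The precedence order above can be sketched as a small resolver in which each later source overrides the earlier one. This is an illustration only: resolve_count is a hypothetical helper, and EXAMPLE_GPU_COUNT is a made-up variable name, because the environment variables the library reads are not listed here.

```python
import os


def resolve_count(detected, explicit=None, env=None, env_var="EXAMPLE_GPU_COUNT"):
    """Apply the precedence: detected value < environment variable < explicit argument."""
    env = os.environ if env is None else env
    value = detected
    if env.get(env_var):
        # an environment variable overrides auto-detection
        value = int(env[env_var])
    if explicit is not None:
        # an explicit function argument has the highest precedence
        value = explicit
    return value


print(resolve_count(4, env={}))
print(resolve_count(4, env={"EXAMPLE_GPU_COUNT": "2"}))
print(resolve_count(4, explicit=1, env={"EXAMPLE_GPU_COUNT": "2"}))
```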

Default behavior

cpu_count / gpu_count are detected from Ray (cluster_resources) or local host.
Worker heuristics:
- page_elements_workers = gpu_count * page_elements_per_gpu
- detect_workers = gpu_count * ocr_per_gpu
- embed_workers = gpu_count * embed_per_gpu
- minimum of 1 per stage
Stage GPU defaults:
- If gpu_count >= 2 and concurrent_gpu_stage_count == 3, uses high-overlap values for page-elements/OCR/embed.
- Otherwise uses min(max_gpu_per_stage, gpu_count / concurrent_gpu_stage_count).
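
The worker formulas above can be written out as follows. This is a sketch under stated assumptions rather than the library's implementation: the function names and the per-GPU values passed in are hypothetical, and the high-overlap branch is not reproduced because its values are not given here.

```python
from dataclasses import dataclass


@dataclass
class WorkerPlan:
    page_elements_workers: int
    detect_workers: int
    embed_workers: int


def plan_workers(gpu_count, page_elements_per_gpu, ocr_per_gpu, embed_per_gpu):
    """Scale each stage's worker count by GPU count, with a minimum of 1 per stage."""
    return WorkerPlan(
        page_elements_workers=max(1, gpu_count * page_elements_per_gpu),
        detect_workers=max(1, gpu_count * ocr_per_gpu),
        embed_workers=max(1, gpu_count * embed_per_gpu),
    )


def fallback_gpu_per_stage(gpu_count, concurrent_gpu_stage_count, max_gpu_per_stage):
    """The 'otherwise' rule: share GPUs across concurrent stages, capped per stage."""
    return min(max_gpu_per_stage, gpu_count / concurrent_gpu_stage_count)


# hypothetical per-GPU values, chosen only for illustration
print(plan_workers(gpu_count=2, page_elements_per_gpu=2, ocr_per_gpu=1, embed_per_gpu=1))
print(fallback_gpu_per_stage(gpu_count=1, concurrent_gpu_stage_count=3, max_gpu_per_stage=1.0))
```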

Override variables

Variable	Where to set	Meaning
`override_cpu_count`, `override_gpu_count`	function args	Highest-priority CPU/GPU override

Running multiple NIM service instances on multi-GPU hosts

Start two stacks on separate GPUs

# GPU 0 stack
GPU_ID=0 \
PAGE_ELEMENTS_HTTP_PORT=8000 PAGE_ELEMENTS_GRPC_PORT=8001 PAGE_ELEMENTS_METRICS_PORT=8002 \
OCR_HTTP_PORT=8019 OCR_GRPC_PORT=8010 OCR_METRICS_PORT=8011 \
docker compose -p retriever-gpu0 up -d page-elements ocr

# GPU 1 stack
GPU_ID=1 \
PAGE_ELEMENTS_HTTP_PORT=8100 PAGE_ELEMENTS_GRPC_PORT=8101 PAGE_ELEMENTS_METRICS_PORT=8102 \
OCR_HTTP_PORT=8119 OCR_GRPC_PORT=8110 OCR_METRICS_PORT=8111 \
docker compose -p retriever-gpu1 up -d page-elements ocr

The -p project names create isolated stacks, GPU_ID pins each stack to a specific physical GPU, and distinct host ports avoid collisions between the services.
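
The two command blocks above follow a regular pattern: each stack's ports are the GPU 0 ports plus 100 times the GPU index. The sketch below generates that environment for any GPU index. The stride of 100 is inferred from the example, and stack_env is a hypothetical helper, not something the project ships.

```python
# GPU 0 port assignments, copied from the commands above
BASE_PORTS = {
    "PAGE_ELEMENTS_HTTP_PORT": 8000,
    "PAGE_ELEMENTS_GRPC_PORT": 8001,
    "PAGE_ELEMENTS_METRICS_PORT": 8002,
    "OCR_HTTP_PORT": 8019,
    "OCR_GRPC_PORT": 8010,
    "OCR_METRICS_PORT": 8011,
}


def stack_env(gpu_id, stride=100):
    """Build the environment variables for one stack pinned to gpu_id."""
    env = {"GPU_ID": str(gpu_id)}
    for name, base in BASE_PORTS.items():
        env[name] = str(base + gpu_id * stride)
    return env


def ports_collide(gpu_ids, stride=100):
    """Return True if any host port is assigned to more than one service."""
    ports = [
        value
        for gpu_id in gpu_ids
        for name, value in stack_env(gpu_id, stride).items()
        if name != "GPU_ID"
    ]
    return len(ports) != len(set(ports))


print(stack_env(1))
print(ports_collide([0, 1]))
```

The resulting dict can be merged into os.environ and passed as the env argument of subprocess.run when launching docker compose from a script.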

Check and tear down stacks

To verify that both stacks are running, use the following commands.

docker compose -p retriever-gpu0 ps
docker compose -p retriever-gpu1 ps

To stop and remove both stacks, use the following commands.

docker compose -p retriever-gpu0 down
docker compose -p retriever-gpu1 down

ViDoRe Harness Sweep

The harness includes BEIR-style ViDoRe dataset presets in nemo_retriever/harness/test_configs.yaml and a ready-made sweep definition in nemo_retriever/harness/vidore_sweep.yaml.

The ViDoRe harness datasets are configured to:

read PDFs from /datasets/nv-ingest/vidore_v3_corpus_pdf/...
ingest with embed_modality: text_image
embed at embed_granularity: page
enable extract_page_as_image: true and extract_infographics: true
evaluate with BEIR-style ndcg and recall metrics
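
For readers unfamiliar with these metrics, the sketch below computes recall@k and nDCG@k for a single query with binary relevance. It is a simplified illustration, not the harness's evaluation code; the function names are hypothetical, and it assumes each document ID appears at most once in the ranking.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    found = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return found / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    # the ideal ranking places every relevant document first
    ideal_hits = min(k, len(relevant_ids))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


ranked = ["d1", "d2", "d3"]
relevant = {"d1", "d3"}
print(recall_at_k(ranked, relevant, k=2))
print(ndcg_at_k(ranked, relevant, k=3))
```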

To run the full ViDoRe sweep:

cd ~/nv-ingest/nemo_retriever
retriever-harness sweep --runs-config harness/vidore_sweep.yaml

The same commands also work under the main CLI as retriever harness ... if you prefer a single top-level command namespace.

Harness with image/text storage

The harness can persist extracted images and text alongside other run artifacts. Set store_images_uri in test_configs.yaml (per-dataset or in active:) or via --override:

retriever harness run --dataset bo20 --preset single_gpu \
  --override store_images_uri=stored_images --override store_text=true

When store_images_uri is a relative path (like stored_images), it resolves to artifact_dir/stored_images/ so each run is isolated. Absolute paths and fsspec URIs (e.g. s3://bucket/prefix) are passed through as-is.
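
The resolution rule above can be sketched as follows. This is an illustration of the described behavior, not the harness's code; resolve_store_uri is a hypothetical helper, and /runs/run1 is a made-up artifact directory.

```python
from pathlib import PurePosixPath


def resolve_store_uri(store_images_uri, artifact_dir):
    """Resolve relative paths under artifact_dir; pass URIs and absolute paths through."""
    # URIs with a scheme (for example s3://bucket/prefix) are left unchanged
    if "://" in store_images_uri:
        return store_images_uri
    path = PurePosixPath(store_images_uri)
    if path.is_absolute():
        return store_images_uri
    # relative paths are isolated per run by nesting them under the artifact directory
    return str(PurePosixPath(artifact_dir) / path)


print(resolve_store_uri("stored_images", "/runs/run1"))
print(resolve_store_uri("s3://bucket/prefix", "/runs/run1"))
print(resolve_store_uri("/data/out", "/runs/run1"))
```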

Project details

Release history

2026.5.4.dev81 pre-release

May 4, 2026

2026.5.3.dev80 pre-release

May 3, 2026

2026.5.2.dev79 pre-release

May 2, 2026

2026.5.1.dev78 pre-release

May 1, 2026

2026.4.30.dev77 pre-release

Apr 30, 2026

2026.4.29.dev76 pre-release

Apr 29, 2026

2026.4.28.dev75 pre-release

Apr 28, 2026

2026.4.27.dev74 pre-release

Apr 27, 2026

2026.4.26.dev73 pre-release

Apr 26, 2026

2026.4.25.dev72 pre-release

Apr 25, 2026

2026.4.24.dev71 pre-release

Apr 24, 2026

2026.4.23.dev70 pre-release

Apr 23, 2026

2026.4.22.dev69 pre-release

Apr 22, 2026

2026.4.21.dev68 pre-release

Apr 21, 2026

2026.4.20.dev67 pre-release

Apr 20, 2026

This version

2026.4.19.dev66 pre-release

Apr 19, 2026

2026.4.18.dev65 pre-release

Apr 18, 2026

2026.4.17.dev64 pre-release

Apr 17, 2026

2026.4.16.dev63 pre-release

Apr 16, 2026

2026.4.15.dev62 pre-release

Apr 15, 2026

2026.4.14.dev61 pre-release

Apr 14, 2026

2026.4.13.dev60 pre-release

Apr 13, 2026

2026.4.12.dev59 pre-release

Apr 12, 2026

2026.4.11.dev58 pre-release

Apr 11, 2026

2026.4.10.dev57 pre-release

Apr 10, 2026

2026.4.9.dev56 pre-release

Apr 9, 2026

2026.4.8.dev55 pre-release

Apr 8, 2026

2026.4.7.dev54 pre-release

Apr 7, 2026

2026.4.6.dev53 pre-release

Apr 6, 2026

2026.4.5.dev52 pre-release

Apr 5, 2026

2026.4.4.dev51 pre-release

Apr 4, 2026

2026.4.3.dev50 pre-release

Apr 3, 2026

2026.4.2.dev49 pre-release

Apr 2, 2026

2026.4.1.dev48 pre-release

Apr 1, 2026

2026.3.31.dev47 pre-release

Mar 31, 2026

2026.3.30.dev46 pre-release

Mar 30, 2026

2026.3.29.dev45 pre-release

Mar 29, 2026

2026.3.28.dev44 pre-release

Mar 28, 2026

2026.3.27.dev43 pre-release

Mar 27, 2026

2026.3.26.dev42 pre-release

Mar 26, 2026

2026.3.25.dev41 pre-release

Mar 25, 2026

2026.3.24.dev40 pre-release

Mar 24, 2026

2026.3.23.dev39 pre-release

Mar 23, 2026

2026.3.22.dev38 pre-release

Mar 22, 2026

2026.3.21.dev37 pre-release

Mar 21, 2026

2026.3.20.dev36 pre-release

Mar 20, 2026

2026.3.19.dev35 pre-release

Mar 19, 2026

2026.3.18.dev34 pre-release

Mar 18, 2026

2026.3.17.dev33 pre-release

Mar 17, 2026

2026.3.16.dev32 pre-release

Mar 16, 2026

2026.3.15.dev31 pre-release

Mar 15, 2026

2026.3.14.dev30 pre-release

Mar 14, 2026

2026.3.13.dev29 pre-release

Mar 13, 2026

2026.3.12.dev28 pre-release

Mar 12, 2026

2026.3.11.dev27 pre-release

Mar 11, 2026

26.5rc1 pre-release

May 19, 2026

26.3.0

Mar 16, 2026

26.3.0rc4 pre-release

Mar 14, 2026

26.3.0rc3 pre-release

Mar 12, 2026

26.3.0rc2 pre-release

Mar 12, 2026

26.3.0rc1.post20 pre-release

Mar 11, 2026

Download files

Source Distribution

nemo_retriever-2026.4.19.dev66.tar.gz (639.3 kB view details)

Uploaded Apr 19, 2026 Source

Built Distribution

nemo_retriever-2026.4.19.dev66-py3-none-any.whl (709.0 kB view details)

Uploaded Apr 19, 2026 Python 3

File details

Details for the file nemo_retriever-2026.4.19.dev66.tar.gz.

File metadata

Download URL: nemo_retriever-2026.4.19.dev66.tar.gz
Upload date: Apr 19, 2026
Size: 639.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for nemo_retriever-2026.4.19.dev66.tar.gz
Algorithm	Hash digest
SHA256	`a78504dc4e4a6f5f20498b9c8c4fab707327a07e7e9b1b5e02a93b331b44e56b`
MD5	`24aa2bd2f55983595230c40ab475b7cd`
BLAKE2b-256	`c019cc302b301abce4735b48759bdd43b2bf7e5194fff2cf0f0eda8a19eb0e79`

File details

Details for the file nemo_retriever-2026.4.19.dev66-py3-none-any.whl.

File metadata

Download URL: nemo_retriever-2026.4.19.dev66-py3-none-any.whl
Upload date: Apr 19, 2026
Size: 709.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for nemo_retriever-2026.4.19.dev66-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3bef686d7729a8c24983af68c6ed0c1177ad5cebf67a79f315894dca5a442fbd`
MD5	`8bc78bd80f16bacf1b62a7422676b9d3`
BLAKE2b-256	`b374a673fbaddf44827b95abbeb4df3e62e1b4a104b3ad032c6342c63c31aa62`

nemo-retriever 2026.4.19.dev66
