Standalone Visual-RAG PDF Parser - text extraction and Vision-LLM figure descriptions to JSONL

visual-parser (Standalone Visual-RAG PDF Ingestion)

Python 3.12.10

visual-parser is a standalone document-ingestion tool that converts PDFs into a multi-modal JSONL knowledge base (text chunks + figure descriptions + metadata). The intended workflow is:

  1. Run visual-parser on curated PDFs to generate JSONL KB files.
  2. Run RADIANT-LLM Visual-RAG for QA over the generated KB.

Outputs (JSONL KB)

By default, the pipeline writes:

  • 01_chunks_kb.jsonl: chunked text extracted from PDFs (Nougat by default).
  • 02_visuals_kb.jsonl: figure/page visual descriptions (Vision LLM).
  • 03_metadata_kb.jsonl: document metadata rows (title/author/etc.).
  • 04_processed_pdfs.txt: a tracker so re-runs only process new PDFs (unless --rebuild).
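
Because each .jsonl file holds one JSON object per line, the KB can be inspected with the Python standard library alone. The following is a minimal sketch, not part of visual-parser itself: the helper names load_jsonl and summarize_kb are illustrative, and no record fields are assumed because the schema is not described here.

```python
import json
from pathlib import Path

# File names as listed in the Outputs section above.
KB_FILES = ("01_chunks_kb.jsonl", "02_visuals_kb.jsonl", "03_metadata_kb.jsonl")


def load_jsonl(path):
    """Return a list with one parsed JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines between records
                records.append(json.loads(line))
    return records


def summarize_kb(output_dir):
    """Count the rows in each KB file that exists in output_dir."""
    counts = {}
    for name in KB_FILES:
        path = Path(output_dir) / name
        if path.exists():
            counts[name] = len(load_jsonl(path))
    return counts
```

Calling summarize_kb on the output directory returns a dict mapping each KB file that is present to its row count, which is a quick sanity check after a run.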

API keys (.env)

Provide at least one provider:

  • OPENAI_API_KEY (OpenAI)
  • GEMINI_API_KEY (Gemini)

Optional:

  • HF_TOKEN (if you use gated Hugging Face models)
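
Before starting a long run, it can help to confirm that the .env file actually defines a provider key. The sketch below assumes the file uses one KEY=VALUE pair per line with optional # comments; the function names are hypothetical and are not part of visual-parser.

```python
from pathlib import Path

# Provider keys named in this section.
PROVIDER_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY")


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def configured_providers(path=".env"):
    """Return the provider key names that have a non-empty value."""
    env = read_env_file(path)
    return [key for key in PROVIDER_KEYS if env.get(key)]
```

An empty list from configured_providers means neither OPENAI_API_KEY nor GEMINI_API_KEY is set.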

Run with Docker (Docker Hub)

Prebuilt images are published on Docker Hub in the zev94/radiant-llm repository under the visual-parser tags:

Tag                    Description
visual-parser-latest   Always latest build (rolling)
visual-parser-2.0.0    Pinned release (Apache 2.0 release)
visual-parser-1.0      Legacy (v1.0.0, stale)

1) Install Docker

  • Docker Desktop (Windows/macOS) or Docker Engine (Linux)

2) Pull the image

docker pull zev94/radiant-llm:visual-parser-latest

3) Run (input + output on the same mounted folder)

Windows PowerShell:

docker run --rm --env-file .env `
  -v "C:\path\to\pdfs:/data" `
  zev94/radiant-llm:visual-parser-latest `
  --input-dir /data --output-dir /data

Linux / WSL:

docker run --rm --env-file .env \
  -v "/path/to/pdfs:/data" \
  zev94/radiant-llm:visual-parser-latest \
  --input-dir /data --output-dir /data

4) Run (separate output directory)

Windows PowerShell:

docker run --rm --env-file .env `
  -v "C:\path\to\pdfs:/data" `
  -v "C:\path\to\out:/out" `
  zev94/radiant-llm:visual-parser-latest `
  --input-dir /data --output-dir /out
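
For scripted runs, the same commands can be assembled from Python. The following is a sketch under stated assumptions: build_docker_command is a hypothetical helper, and it uses only the image tag, mount points, and flags shown in the examples above.

```python
IMAGE = "zev94/radiant-llm:visual-parser-latest"


def build_docker_command(pdf_dir, out_dir=None, env_file=".env", extra_args=()):
    """Build the `docker run` argument list used in the examples above.

    If out_dir is None, output is written next to the input (/data);
    otherwise a second volume is mounted at /out.
    """
    cmd = ["docker", "run", "--rm", "--env-file", str(env_file),
           "-v", f"{pdf_dir}:/data"]
    if out_dir is None:
        container_out = "/data"
    else:
        cmd += ["-v", f"{out_dir}:/out"]
        container_out = "/out"
    cmd += [IMAGE, "--input-dir", "/data", "--output-dir", container_out]
    cmd += list(extra_args)  # e.g. ["--rebuild"]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True). Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.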

Offline install (legacy .tar)

docker load -i .\visual-parser_0.1.0.tar
docker images   # use the tag printed by Docker

Model overrides (optional)

The default vision model is GPT-5.4 when using --vision-provider gpt. To choose a model explicitly, pass --vision-model on the command line:

docker run --rm --env-file .env -v "C:\path\to\pdfs:/data" `
  zev94/radiant-llm:visual-parser-latest `
  --input-dir /data --output-dir /data --vision-model gpt-5.4

Common configuration flags

To list every available flag, run the image with --help after pulling it:

docker run --rm zev94/radiant-llm:visual-parser-latest --help

For copy-paste Docker examples (vision presets, text modes, workers, rebuild), see docker-usage-examples.md.

Paths:

  • --input-dir / -i (required)
  • --output-dir / -o (default: same as input)

Text extraction:

  • --text-mode nougat|lightweight (default: nougat)
  • --nougat-model facebook/nougat-small
  • --chunk-size 500
  • --chunk-overlap 100
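
To illustrate how --chunk-size and --chunk-overlap interact, here is a minimal sliding-window sketch. It assumes both values are measured in characters; visual-parser may instead count tokens or words and may split on different boundaries, so treat this as an illustration of the general technique rather than the tool's actual chunker.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=100):
    """Split text into windows of chunk_size that overlap by chunk_overlap.

    Units are characters here (an assumption made for this sketch).
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the defaults, each window starts 400 units after the previous one, so consecutive chunks share 100 units of context.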

Vision LLM:

  • --vision-provider gpt|gemini (default: gpt)
  • --vision-model gpt-5.2 (or gpt-4o, gemini-2.5-flash, etc.)
  • --vision-detail low|high|auto
  • --reasoning-effort none|low|medium|high|xhigh
  • --metadata-pages 2

Performance / misc:

  • --max-workers 4
  • --rebuild (reprocess everything; ignore 04_processed_pdfs.txt)
  • --log-level DEBUG|INFO|WARNING|ERROR
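
The incremental behaviour behind --rebuild can be pictured as a tracker file that is consulted before each run. The sketch below is hypothetical: it assumes 04_processed_pdfs.txt stores one PDF file name per line, which is not confirmed here, and the function names are illustrative only.

```python
from pathlib import Path


def pending_pdfs(input_dir, tracker_path, rebuild=False):
    """Return the PDF names in input_dir that still need processing.

    With rebuild=True the tracker is ignored and every PDF is returned.
    """
    tracker = Path(tracker_path)
    done = set()
    if tracker.exists() and not rebuild:
        lines = tracker.read_text(encoding="utf-8").splitlines()
        done = {line.strip() for line in lines if line.strip()}
    names = (p.name for p in Path(input_dir).glob("*.pdf"))
    return sorted(name for name in names if name not in done)


def mark_processed(tracker_path, pdf_name):
    """Append one processed file name to the tracker."""
    with Path(tracker_path).open("a", encoding="utf-8") as fh:
        fh.write(pdf_name + "\n")
```

Under this model, a re-run only sees the PDFs that were added since the last run, while a rebuild reprocesses the whole folder.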

Citation

If you use RADIANT-LLM or the accompanying evaluation materials, please cite the journal article:

@article{ndum2026retrieval,
  title={A retrieval-augmented, domain-intelligent agentic framework for reliable decision support in safety-critical nuclear engineering},
  author={Ndum, Zavier Ndum and Tao, Jian and Ford, John and Yim, Mansung and Liu, Yang},
  journal={Reliability Engineering \& System Safety},
  pages={113057},
  year={2026},
  publisher={Elsevier}
}

Journal: Reliability Engineering & System Safety (2026), article 113057
Preprint: https://arxiv.org/abs/2604.22755


License

Copyright 2026 Zavier N. Ndum

This project is licensed under the Apache License 2.0. See the LICENSE file in the RADIANT_LLM repository for the full license text.
