Compile PDFs into a queryable wiki.

Project description

OpenIndex

Overview

OpenIndex parses PDF documents into a hierarchical section tree and compiles them into a persistent, cross-linked wiki that agents can query.

It combines two projects:

PageIndex — LLM-based hierarchical section extraction from PDFs
OpenKB — compiles documents into a queryable wiki with cross-document concept pages

Unlike traditional RAG, which rediscovers knowledge on every query, OpenIndex compiles once: it indexes sections, generates summaries, creates concept pages with bidirectional links, and writes a structured wiki to disk. An agent can then search that wiki to answer questions precisely.
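One way to picture the bidirectional links: given each document and the concepts its summary mentions, derive the reverse index so every concept page can list the documents that point to it. This is a minimal sketch under assumptions; the `slugify` rule, `build_links`, and the data layout are hypothetical illustrations, not OpenIndex's actual implementation.

```python
import re
from collections import defaultdict


def slugify(name: str) -> str:
    """Hypothetical rule: turn a concept name into a file-name slug."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def build_links(
    doc_concepts: dict[str, list[str]],
) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Return forward links (doc -> concept slugs) and backlinks (slug -> docs)."""
    forward: dict[str, list[str]] = {}
    backlinks: dict[str, list[str]] = defaultdict(list)
    for doc, concepts in doc_concepts.items():
        # Different spellings of one concept collapse onto the same slug/page.
        slugs = sorted({slugify(c) for c in concepts})
        forward[doc] = slugs
        for slug in slugs:
            backlinks[slug].append(doc)  # the reverse direction of each link
    return forward, dict(backlinks)


forward, backlinks = build_links({
    "paper-a": ["Retrieval-Augmented Generation", "Vector Search"],
    "paper-b": ["retrieval augmented generation"],
})
print(forward["paper-a"])  # ['retrieval-augmented-generation', 'vector-search']
print(backlinks["retrieval-augmented-generation"])  # ['paper-a', 'paper-b']
```

Because both documents map onto one slug, the shared concept page is where cross-document connections show up.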

Table of Contents

- Installation
- Usage
  - Index a PDF
  - Query the wiki
- License

Installation

From PyPI:

pip install openindex

From source:

uv pip install git+https://github.com/hienhayho/openindex.git

Usage

Set environment variables (or use a .env file):

OPENAI_MODEL_NAME=...
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_API_KEY=
OPENAI_EXTRA_BODY={}

Note: openindex works with any OpenAI-compatible API server (OpenAI, vLLM, Ollama, LM Studio, etc.). Set OPENAI_BASE_URL to point to your server.
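If you want to validate these variables before constructing any client, a small loader helps. This is a sketch; `load_openai_settings` is a hypothetical helper, not part of the openindex package. It also treats an empty `OPENAI_EXTRA_BODY=` line as `{}`, since `json.loads("")` raises an error.

```python
import json
import os
from collections.abc import Mapping


def load_openai_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: gather the OPENAI_* variables into keyword arguments."""
    model = env.get("OPENAI_MODEL_NAME")
    if not model:
        raise ValueError("OPENAI_MODEL_NAME is not set")
    # Treat a missing *or empty* OPENAI_EXTRA_BODY as "{}"; json.loads("") would fail.
    extra_body = json.loads(env.get("OPENAI_EXTRA_BODY") or "{}")
    if not isinstance(extra_body, dict):
        raise ValueError("OPENAI_EXTRA_BODY must be a JSON object")
    return {
        "model_name": model,
        "base_url": env.get("OPENAI_BASE_URL"),  # e.g. a vLLM or Ollama endpoint
        "api_key": env.get("OPENAI_API_KEY"),
        "extra_body": extra_body,
    }


settings = load_openai_settings({"OPENAI_MODEL_NAME": "my-model", "OPENAI_EXTRA_BODY": ""})
print(settings["extra_body"])  # {}
```

The keys mirror the keyword arguments used in this README's examples, so the dict could plausibly be unpacked into those constructors; that pairing is an assumption based on the examples, not a documented API.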

Index a PDF

Runs the full pipeline: section extraction → verification → tree building → summaries → wiki generation.

import os
import json
from dotenv import load_dotenv
from openindex import WikiIndex, TreeConfig

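load_dotenv()  # read OPENAI_* settings from a .env file into os.environ

index = WikiIndex(
    model_name=os.getenv("OPENAI_MODEL_NAME"),
    base_url=os.getenv("OPENAI_BASE_URL"),
    api_key=os.getenv("OPENAI_API_KEY"),
    # "or" also covers OPENAI_EXTRA_BODY being set but empty, which json.loads rejects
    extra_body=json.loads(os.getenv("OPENAI_EXTRA_BODY") or "{}"),
    config=TreeConfig(max_parallel_llm_calls=8),
)

# Run the full pipeline on paper.pdf and write the wiki to ./wiki
result = index.build_wiki_sync("paper.pdf", "./wiki")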
WikiIndex.print_result(result)

See tools/index.py for a full example.
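`TreeConfig(max_parallel_llm_calls=8)` suggests a cap on how many LLM requests run at once. A common way to enforce such a cap in asyncio is a semaphore. The sketch shows that pattern with a stand-in `fake_llm_call` coroutine instead of a real API client; what the parameter controls is an assumption inferred from its name, and this is not OpenIndex's code.

```python
import asyncio


async def fake_llm_call(section: str) -> str:
    """Stand-in for a real chat-completion request."""
    await asyncio.sleep(0.01)
    return f"summary of {section}"


async def summarize_all(sections: list[str], max_parallel: int = 8) -> tuple[list[str], int]:
    """Summarize every section with at most `max_parallel` calls in flight."""
    semaphore = asyncio.Semaphore(max_parallel)
    in_flight = 0
    peak = 0  # highest concurrency observed, to show the cap holds

    async def worker(section: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # waits here once max_parallel calls are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_llm_call(section)
            finally:
                in_flight -= 1

    # gather returns results in the same order as the input sections.
    results = await asyncio.gather(*(worker(s) for s in sections))
    return list(results), peak


summaries, peak = asyncio.run(summarize_all([f"section {i}" for i in range(20)], max_parallel=8))
print(len(summaries), peak)
```

Raising the cap trades faster indexing for more simultaneous load on the API server, which matters for rate-limited or self-hosted endpoints.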

Output wiki structure:

wiki/
├── index.md              # master catalog
├── summaries/<doc>.md    # section tree with page ranges
├── concepts/<slug>.md    # cross-document concept pages
└── sources/<doc>.json    # full per-page text
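The summaries/<doc>.md files hold a section tree with page ranges. As an illustration of the tree-building step, this sketch nests a flat list of `(level, title, start_page)` headings and ends each section where the next heading at the same or a shallower level begins. The input format, the `Section` class, `build_tree`, `render`, and the end-page rule are hypothetical; OpenIndex's own extraction is LLM-based and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    level: int
    start: int
    end: int = 0
    children: list["Section"] = field(default_factory=list)


def build_tree(headings: list[tuple[int, str, int]], last_page: int) -> list[Section]:
    """Nest flat (level, title, start_page) headings and assign page ranges."""
    sections = [Section(title, level, start) for level, title, start in headings]
    # A section runs until the next heading at the same or a shallower level.
    # That boundary page is included, since a new section may begin mid-page.
    for i, sec in enumerate(sections):
        sec.end = last_page
        for later in sections[i + 1:]:
            if later.level <= sec.level:
                sec.end = later.start
                break
    roots: list[Section] = []
    stack: list[Section] = []  # current chain of open ancestors
    for sec in sections:
        while stack and stack[-1].level >= sec.level:
            stack.pop()
        (stack[-1].children if stack else roots).append(sec)
        stack.append(sec)
    return roots


def render(nodes: list[Section], depth: int = 0) -> list[str]:
    """Render the tree as indented Markdown bullets with page ranges."""
    lines = []
    for node in nodes:
        lines.append(f"{'  ' * depth}- {node.title} (pp. {node.start}-{node.end})")
        lines.extend(render(node.children, depth + 1))
    return lines


tree = build_tree(
    [(1, "Introduction", 1), (1, "Method", 3), (2, "Indexing", 3), (2, "Querying", 5), (1, "Results", 7)],
    last_page=9,
)
print("\n".join(render(tree)))
```

With this input, "Method" covers pages 3-7 and contains "Indexing" (3-5) and "Querying" (5-7).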

Query the wiki

The query agent searches the compiled wiki to answer questions, fetching only the relevant pages.

import os
import json
from dotenv import load_dotenv
from openindex import WikiQueryAgent

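load_dotenv()  # read OPENAI_* settings from a .env file into os.environ

agent = WikiQueryAgent(
    wiki_dir="./wiki",  # the directory written by build_wiki_sync
    model_name=os.getenv("OPENAI_MODEL_NAME"),
    base_url=os.getenv("OPENAI_BASE_URL"),
    api_key=os.getenv("OPENAI_API_KEY"),
    # "or" also covers OPENAI_EXTRA_BODY being set but empty, which json.loads rejects
    extra_body=json.loads(os.getenv("OPENAI_EXTRA_BODY") or "{}"),
)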

answer = agent.ask_sync("What is RAG?")
print(answer)

See tools/query.py for a full example.
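To picture "fetching only the relevant pages": score each section of a summary tree against the question, then read just the pages inside the best-matching sections. This sketch uses plain keyword overlap in place of the agent's LLM-driven search. The section dicts and the page-number-to-text mapping (standing in for sources/<doc>.json) are assumed shapes, and `fetch_relevant_pages` is a hypothetical name.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens (a crude stand-in for real search)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def fetch_relevant_pages(
    question: str,
    sections: list[dict],
    pages: dict[int, str],
    top_k: int = 1,
) -> dict[int, str]:
    """Return {page_number: text} for the top_k sections that best match the question."""
    q = tokenize(question)

    def score(sec: dict) -> int:
        # Count question words that appear in the section title or summary.
        return len(q & tokenize(sec["title"] + " " + sec["summary"]))

    best = sorted(sections, key=score, reverse=True)[:top_k]
    selected: dict[int, str] = {}
    for sec in best:
        # Only pages inside the chosen sections' ranges are read.
        for page in range(sec["start"], sec["end"] + 1):
            if page in pages:
                selected[page] = pages[page]
    return selected


sections = [
    {"title": "Introduction", "summary": "Motivation and scope", "start": 1, "end": 1},
    {"title": "Retrieval-augmented generation", "summary": "How RAG retrieves passages", "start": 2, "end": 3},
]
pages = {1: "intro text", 2: "RAG combines retrieval with generation", 3: "more on RAG", 4: "appendix"}
print(sorted(fetch_relevant_pages("What is RAG?", sections, pages)))  # [2, 3]
```

The point of the design is that the summaries are small enough to scan on every query, while the full page text is loaded only for the sections that win.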

License

Apache 2.0. See LICENSE for details.

This project incorporates code from:

PageIndex — MIT License
OpenKB — Apache 2.0 License

Project details

Release history

- 0.1.8 (Jun 3, 2026)
- 0.1.7 (Jun 3, 2026)
- 0.1.6 (Jun 3, 2026)
- 0.1.5 (Jun 3, 2026)
- 0.1.4 (Jun 3, 2026), this version
- 0.1.3 (May 30, 2026)
- 0.1.2 (May 30, 2026)
- 0.1.1 (May 30, 2026)
- 0.1.0 (May 30, 2026)

Download files

Source Distribution

openindex-0.1.4.tar.gz (33.3 kB)

Uploaded Jun 3, 2026 Source

Built Distribution

openindex-0.1.4-py3-none-any.whl (39.8 kB)

Uploaded Jun 3, 2026 Python 3

File details

Details for the file openindex-0.1.4.tar.gz.

File metadata

Download URL: openindex-0.1.4.tar.gz
Upload date: Jun 3, 2026
Size: 33.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.10

File hashes

Hashes for openindex-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`caf8f1de4561e365eed8d8d67315d5669e50e8ea06f9c6ce4322cbce7c8838a5`
MD5	`3cc2a169a2456321e225c24d20b37500`
BLAKE2b-256	`56d5855c09413e9e732cf5abbf434599e5eedc21a5bb95f931fbb40b8aaea1e6`

File details

Details for the file openindex-0.1.4-py3-none-any.whl.

File metadata

Download URL: openindex-0.1.4-py3-none-any.whl
Upload date: Jun 3, 2026
Size: 39.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.10

File hashes

Hashes for openindex-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`91e0c1c1113fe3fa4d726a2ee1c8f965414a3128cbf56e611bb51f9b5614abf6`
MD5	`6d19a20108fd26713c9b85c39ca39c1a`
BLAKE2b-256	`c0175154a742d556940cde4e3cbf59f86553b7f831e378a396f5f56ac5fd918d`
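To check a downloaded file against the SHA256 digests listed above, hash it locally and compare. A minimal sketch using Python's standard `hashlib`; `sha256_of` and `verify` are illustrative names, and the wheel path assumes the file was downloaded into the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """True if the file's SHA256 matches the published digest (case/whitespace-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


wheel = Path("openindex-0.1.4-py3-none-any.whl")
published = "91e0c1c1113fe3fa4d726a2ee1c8f965414a3128cbf56e611bb51f9b5614abf6"
if wheel.exists():  # only check if the wheel has been downloaded here
    print("OK" if verify(wheel, published) else "MISMATCH")
```

pip can enforce the same check at install time in hash-checking mode: put `openindex==0.1.4 --hash=sha256:<digest>` in a requirements file and install with `pip install --require-hashes -r requirements.txt`.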

openindex 0.1.4
