knowai

A conversational RAG agent pipeline using LangGraph


An agentic AI pipeline for interrogating multiple large PDF reports

Set up

1. Clone this repository into a local directory of your choosing.
2. Create and activate a virtual environment.
3. Install knowai by running `pip install .` from the root directory of your clone, or install it from PyPI with `pip install knowai`.
4. Configure a .env file with the following:
- AZURE_OPENAI_API_KEY - Your API key
- AZURE_OPENAI_ENDPOINT - Your Azure endpoint
- AZURE_OPENAI_DEPLOYMENT - Your LLM deployment name (e.g., "gpt-4o")
- AZURE_EMBEDDINGS_DEPLOYMENT - Your embeddings model deployment name (defaults to "text-embedding-3-large")
- AZURE_OPENAI_API_VERSION - Your Azure OpenAI API version for the LLM deployment (e.g., "2024-02-01")
- AZURE_OPENAI_EMBEDDINGS_API_VERSION - Your Azure embeddings API version (defaults to "2024-02-01")
- VECTORSTORE_EMBEDDING_BATCH_SIZE - Optional number of chunks to embed per vectorstore write batch. Defaults to 50.
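
It can help to validate these variables before starting knowai. The sketch below is illustrative only: `load_azure_settings` is a hypothetical helper, not part of the knowai API. It assumes that the four variables without a documented default are required and uses the defaults listed above for the rest.

```python
import os
from typing import Mapping, Optional

# Variables with no documented default; this sketch treats them as required.
REQUIRED = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_API_VERSION",
)

# Defaults taken from the list above.
DEFAULTS = {
    "AZURE_EMBEDDINGS_DEPLOYMENT": "text-embedding-3-large",
    "AZURE_OPENAI_EMBEDDINGS_API_VERSION": "2024-02-01",
    "VECTORSTORE_EMBEDDING_BATCH_SIZE": "50",
}


def load_azure_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: collect settings and fail early on missing keys."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError(f"Missing required settings: {', '.join(missing)}")
    settings = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        settings[name] = env.get(name, default)
    # The batch size is numeric, so convert it once here.
    settings["VECTORSTORE_EMBEDDING_BATCH_SIZE"] = int(
        settings["VECTORSTORE_EMBEDDING_BATCH_SIZE"]
    )
    return settings
```

Calling `load_azure_settings()` with no argument reads from the process environment, so it should run after the .env file has been loaded by whatever mechanism your project uses.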

Building the vectorstore

Using the CLI (Recommended)

From the root directory of this repository, run the following from a terminal (ensuring that your virtual environment is active) to build the vectorstore:

python -m knowai.cli_vectorstore build <directory_containing_your_input_pdf_files> --metadata_parquet_path <path_to_metadata.parquet> --vectorstore_path <directory_name_for_vectorstore>

Using the Python API

You can also build vectorstores programmatically:

from knowai import get_retriever_from_directory

retriever = get_retriever_from_directory(
    directory_path="path/to/pdfs",
    persist_directory="my_vectorstore",
    metadata_parquet_path="metadata.parquet",
    k=10,
    chunk_size=1400,
    chunk_overlap=200
)

PDF ingestion and embedding batches

When building a vectorstore, KnowAI processes each PDF page by page. Text is extracted with PyMuPDF, using standard text extraction first and a block-based fallback when a page has no standard text. Page text is split into overlapping chunks, very short chunks are skipped, and each retained chunk is stored as a LangChain Document with the chunk text plus metadata from the parquet file.
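
To make the chunking step concrete, here is a minimal sketch of overlapping character windows with short-chunk filtering. It is not KnowAI's actual splitter: the function name, the purely character-based windowing, and the `min_chars` threshold are assumptions for illustration. The `chunk_size` and `chunk_overlap` defaults mirror the Python API example above.

```python
from typing import List


def split_with_overlap(
    text: str,
    chunk_size: int = 1400,
    chunk_overlap: int = 200,
    min_chars: int = 50,  # hypothetical threshold for "very short" chunks
) -> List[str]:
    """Split text into overlapping character windows, dropping short ones."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start : start + chunk_size].strip()
        if len(chunk) >= min_chars:
            chunks.append(chunk)
    return chunks
```

With these defaults, consecutive chunks share 200 characters, so a sentence that straddles a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap.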

Each chunk also gets provenance fields that make retrieval traceable:

- file_name
- source_path
- page
- chunk_index

Embedding writes are batched by default to avoid sending oversized inputs to the embeddings model. New chunks are embedded and added to FAISS in batches of up to 50 documents unless VECTORSTORE_EMBEDDING_BATCH_SIZE is set. When updating an existing vectorstore, KnowAI checks existing file_name and page metadata and skips pages already present in the FAISS store.
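
The skip-then-batch behaviour described above can be sketched in plain Python. This is an illustration under stated assumptions, not KnowAI's implementation: chunks are modelled as dictionaries, and `filter_new_chunks` and `batched` are hypothetical names.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Chunk = Dict[str, object]  # e.g. {"file_name": ..., "page": ..., "text": ...}


def filter_new_chunks(
    chunks: Iterable[Chunk], existing_pages: Set[Tuple[str, int]]
) -> List[Chunk]:
    """Drop chunks whose (file_name, page) pair is already in the store."""
    return [
        c for c in chunks if (c["file_name"], c["page"]) not in existing_pages
    ]


def batched(items: List[Chunk], batch_size: int = 50) -> Iterator[List[Chunk]]:
    """Yield consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]
```

Each batch would then be embedded and written to the store in one call, which keeps every request to the embeddings model bounded in size.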

Inspecting Vectorstores

To inspect an existing vectorstore:

python -m knowai.cli_vectorstore inspect <vectorstore_path>

Or programmatically:

from knowai import load_vectorstore, show_vectorstore_schema, list_vectorstore_files, analyze_vectorstore_chunking

# Load vectorstore
vectorstore = load_vectorstore("my_vectorstore")

# Show schema information
schema = show_vectorstore_schema(vectorstore)
print(f"Total vectors: {schema['total_vectors']}")
print(f"Metadata fields: {schema['metadata_fields']}")

# List files in vectorstore
files = list_vectorstore_files(vectorstore)
print(f"Files: {files}")

# Analyze chunking parameters used to build the vectorstore
analysis = analyze_vectorstore_chunking(vectorstore)
print(f"Estimated chunk size: {analysis['recommended_settings']['chunk_size']}")
print(f"Estimated overlap: {analysis['recommended_settings']['chunk_overlap']}")

By default, the build step creates a FAISS vectorstore named "test_faiss_store" in the root directory of your repository.

Running knowai in a simple chatbot example via Streamlit

From the root directory, run the following in a terminal after you have your virtual environment active:

streamlit run app_chat_simple.py

This will open the app in your default browser.

Using knowai

Once your vectorstore is built, you can use knowai either programmatically or through the provided Streamlit interface.

Python quick‑start

The package ships with the KnowAIAgent class for fully programmatic access inside notebooks or scripts:

from knowai.core import KnowAIAgent

# Path that you supplied with --vectorstore_path when building
VSTORE_PATH = "test_faiss_store"

agent = KnowAIAgent(vectorstore_path=VSTORE_PATH)

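# A single conversational turn. Top-level await works in a notebook;
# in a script, wrap this call in an async function and run it with asyncio.run().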
response = await agent.process_turn(
    user_question="Summarize the key findings in the 2025 maritime report",
    selected_files=["my_report.pdf"],
)

print(response["generation"])

Streaming Responses

For a more responsive user experience, you can enable streaming responses:

def stream_callback(token: str):
    """Called for each token as it's generated."""
    print(token, end='', flush=True)

response = await agent.process_turn(
    user_question="Summarize the key findings in the 2025 maritime report",
    selected_files=["my_report.pdf"],
    streaming_callback=stream_callback  # Enable streaming
)

Tokens are passed to the callback in real time as they are generated, and the complete response is still available in the returned dictionary.
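
If you want to keep the streamed tokens as well as print them, a callable object can stand in for the plain function. The sketch below assumes, as in the example above, that the callback receives one string token per call. `TokenCollector` is a hypothetical helper and not part of knowai.

```python
from typing import List


class TokenCollector:
    """Callable that optionally prints each token and keeps a copy of it."""

    def __init__(self, echo: bool = True) -> None:
        self.tokens: List[str] = []
        self.echo = echo

    def __call__(self, token: str) -> None:
        self.tokens.append(token)
        if self.echo:
            print(token, end="", flush=True)

    @property
    def text(self) -> str:
        """All tokens received so far, joined in arrival order."""
        return "".join(self.tokens)
```

An instance can be passed as `streaming_callback=collector`, after which `collector.text` holds the text streamed so far.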

The returned dictionary contains:

Key	Description
`generation`	Final answer synthesised from the selected documents.
`individual_answers`	Per‑file answers (when bypass_individual_gen=False).
`documents_by_file`	Retrieved document chunks keyed by filename.
`raw_documents_for_synthesis`	Raw text block used when bypassing individual generation.
`bypass_individual_generation`	Whether the bypass mode was used for this turn.

Token Counting Configuration

KnowAI supports two methods for token counting to manage context window limits:

Accurate Token Counting (Default)

- Uses the tiktoken library for precise token estimation
- More accurate batch sizing and context management
- Automatically falls back to the heuristic method if tiktoken is unavailable

# Default behavior (accurate token counting)
agent = KnowAIAgent(vectorstore_path=VSTORE_PATH)

# Explicit accurate token counting
agent = KnowAIAgent(
    vectorstore_path=VSTORE_PATH,
    
)

Heuristic Token Counting

- Uses character-based estimation (4 characters ≈ 1 token)
- Faster performance, suitable when approximate estimation is sufficient
- Always available as a fallback

# Use heuristic token counting
agent = KnowAIAgent(
    vectorstore_path=VSTORE_PATH,
    
)

CLI Configuration

curl -X POST http://127.0.0.1:8000/initialize \
  -H "Content-Type: application/json" \
  -d '{
    "vectorstore_s3_uri": "/path/to/vectorstore",
    
  }'

Benefits of Accurate Token Counting:

- More precise token limits and batch sizing
- Reduced risk of context overflow
- Better resource utilization
- Improved reliability with large document sets

When to Use Heuristic Counting:

- tiktoken is not available in the environment
- Performance is critical
- Approximate estimation is sufficient
- You are debugging token counting issues
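
The two counting strategies and the fallback between them can be sketched as follows. This is not KnowAI's internal code: the function names are hypothetical, the `cl100k_base` encoding is an assumption (the encoding KnowAI uses is not stated here), and rounding up in the heuristic is a choice made for this example.

```python
import math


def heuristic_token_count(text: str, chars_per_token: int = 4) -> int:
    """Estimate tokens as roughly one token per four characters."""
    return math.ceil(len(text) / chars_per_token)


def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Use tiktoken when it is importable and usable, else the heuristic."""
    try:
        import tiktoken  # optional dependency

        encoding = tiktoken.get_encoding(encoding_name)
        return len(encoding.encode(text))
    except Exception:
        # Covers a missing package as well as encoding-load failures.
        return heuristic_token_count(text)
```

The heuristic never needs an external package, which is why it can always serve as the fallback.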

Streamlit chat app

If you prefer a ready‑made UI, launch the demo:

streamlit run app_chat_simple.py

Upload or select PDF files, ask questions in the sidebar, and inspect per‑file answers or the combined response in the main panel.

For advanced configuration options (e.g., conversation history length, retriever k values, or combine thresholds) see the docstrings in knowai/core.py and knowai/agent.py.

Containerization

To build and run both the knowai service and the Svelte UI using Docker Compose:

1. Ensure Docker and Docker Compose are installed on your machine.
2. From the directory containing this README (the repo root), navigate to the Svelte example folder:
```
cd example_apps/svelte
```

2a. Compile the Svelte app and package the build as svelte-example:

npm install
npm run build
mv dist svelte-example

3. Start the services and build images:
```
docker compose up --build
```
This will:
- Build the knowai service (listening on port 8000).
- Build the ui service (Svelte app, listening on port 5173).
4. Open your browser and visit:
- FastAPI docs: http://localhost:8000/docs
- Svelte UI: http://localhost:5173
5. To stop and remove containers, press CTRL+C and then run:
```
docker compose down
```

Running the knowai CLI Locally

You can start the FastAPI micro-service locally without Docker and point it to either a local vectorstore or one hosted on S3.

Using a Local Vectorstore

Ensure you have a built FAISS vectorstore on disk (e.g., test_faiss_store).
Start the service:
```
python -m knowai.cli
```

In another terminal, initialize the session:

curl -X POST http://127.0.0.1:8000/initialize \
  -H "Content-Type: application/json" \
  -d '{"vectorstore_s3_uri":"/absolute/path/to/your/vectorstore"}'

Ask a question:

curl -X POST http://127.0.0.1:8000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "session_id":"<session_id>",
    "question":"Your question here",
    "selected_files":["file1.pdf","file2.pdf"]
  }'
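
The same request can be sent from Python with only the standard library. The sketch below mirrors the curl call above. `build_ask_payload` and `post_json` are hypothetical helper names, and because the response schema is not documented in this README, the reply is returned as a plain dictionary for you to inspect.

```python
import json
import urllib.request
from typing import List, Optional

BASE_URL = "http://127.0.0.1:8000"


def build_ask_payload(
    session_id: str, question: str, selected_files: Optional[List[str]] = None
) -> bytes:
    """Encode the JSON body used by the /ask request shown above."""
    body = {
        "session_id": session_id,
        "question": question,
        "selected_files": selected_files or [],
    }
    return json.dumps(body).encode("utf-8")


def post_json(path: str, payload: bytes, timeout: float = 120.0) -> dict:
    """POST a JSON body to the local service and decode the JSON reply."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

With the service running, `post_json("/ask", build_ask_payload(session_id, "Your question here", ["file1.pdf"]))` sends the same body as the curl command.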

Streaming API

For real-time streaming responses, use the /ask-stream endpoint:

curl -X POST http://127.0.0.1:8000/ask-stream \
  -H "Content-Type: application/json" \
  -d '{
    "session_id":"<session_id>",
    "question":"Your question here",
    "selected_files":["file1.pdf","file2.pdf"]
  }' \
  --no-buffer

This streams the response in real time using Server-Sent Events (SSE): each token is sent as soon as the LLM generates it.
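
On the client side, an SSE stream is a sequence of text lines in which `data:` lines carry the payload and a blank line ends each event. The sketch below parses that generic format. `iter_sse_data` is a hypothetical helper, and the content of each payload (plain token text or JSON) is not specified here, so it is returned as a raw string.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in an SSE text stream."""
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format allows one optional space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (starting with ":") and other fields are ignored here.
    if data_lines:
        # Lenient choice for this sketch: flush an unterminated final event.
        yield "\n".join(data_lines)
```

The function accepts any iterable of decoded lines, so it can be fed from an HTTP response body read line by line.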

For more details on streaming functionality, see docs/STREAMING.md.

Using an S3-Hosted Vectorstore

Start the service:
```
python -m knowai.cli
```

Initialize the session against your S3 bucket:

curl -X POST http://127.0.0.1:8000/initialize \
  -H "Content-Type: application/json" \
  -d '{"vectorstore_s3_uri":"s3://your-bucket/path"}'

Ask a question in a similar way:

curl -X POST http://127.0.0.1:8000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "session_id":"<session_id>",
    "question":"Another question example",
    "selected_files":[]
  }'
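
Both examples pass the location through the same vectorstore_s3_uri field, once as a local path and once as an s3:// URI. A client that wants to log or validate the value before calling /initialize could classify it as sketched below. `describe_vectorstore_location` is a hypothetical helper, and how the service itself distinguishes the two cases is not shown in this README.

```python
from urllib.parse import urlparse


def describe_vectorstore_location(uri: str) -> dict:
    """Classify a vectorstore_s3_uri value as an S3 location or a local path."""
    parsed = urlparse(uri)
    if parsed.scheme == "s3":
        return {
            "kind": "s3",
            "bucket": parsed.netloc,
            "prefix": parsed.path.lstrip("/"),
        }
    # Anything without the s3 scheme is treated as a filesystem path.
    return {"kind": "local", "path": uri}
```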

Enhanced User Feedback

KnowAI provides comprehensive feedback when the search process doesn't find relevant information in your documents.

No-Chunks Feedback

When no text chunks are retrieved from a file for a query, KnowAI informs the user explicitly:

- Individual File Level: Each file that has no matching content receives a specific message explaining that "The search did not retrieve any document chunks that match your query."
- Synthesis Level: The final response clearly states which files had no relevant content, helping users understand the scope of the search results.
- Progress Tracking: Files with no matching content are tracked separately from files with errors, so the response clearly distinguishes the two.

Example Response

When asking about "climate change impacts" across multiple reports:

I found information about climate change impacts in the following reports:

From report1.pdf (Page 15):
"Global temperatures have increased by 1.1°C since pre-industrial times..."

From report2.pdf (Page 8):
"Sea level rise is accelerating at a rate of 3.3mm per year..."

No matching content found in: report3.pdf (no matching content).

This helps users understand:

- Which files contained relevant information
- Which files were searched but had no matching content
- The specific nature of missing information

Error Handling

KnowAI distinguishes between different types of issues:

- No matching content: Files that were searched but had no relevant chunks
- Content policy violations: Issues with AI provider content filters
- Processing errors: Technical issues during document processing

Each type is handled appropriately and communicated clearly to the user.
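
One way to keep these categories separate in client code is an enumeration with one value per outcome. The sketch below is illustrative only: `FileOutcome` and `summarize_outcomes` are hypothetical names, and KnowAI's internal representation of these states is not shown in this README.

```python
from enum import Enum
from typing import Dict, List


class FileOutcome(Enum):
    ANSWERED = "answered"
    NO_MATCHING_CONTENT = "no matching content"
    CONTENT_POLICY = "content policy violation"
    PROCESSING_ERROR = "processing error"


def summarize_outcomes(outcomes: Dict[str, FileOutcome]) -> List[str]:
    """Build one note per non-answer category, listing the affected files."""
    notes = []
    for category in FileOutcome:  # iterates in definition order
        if category is FileOutcome.ANSWERED:
            continue
        files = sorted(f for f, o in outcomes.items() if o is category)
        if files:
            notes.append(f"{category.value}: {', '.join(files)}")
    return notes
```

Keeping "no matching content" apart from the error categories means a file that was searched successfully is never reported as a failure.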

Testing

Run the test suite to verify functionality:

# Run all tests
python -m pytest

# Test specific functionality
python -m pytest tests/test_prompts.py -v
python -m pytest tests/test_agent.py -v
python -m pytest tests/test_vectorstore.py -v

# Test no-chunks feedback improvements
python scripts/test_no_chunks_feedback.py

The vectorstore tests cover default embedding batching, VECTORSTORE_EMBEDDING_BATCH_SIZE overrides, incremental updates to existing FAISS stores, and metadata-aware vectorstore inspection utilities.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Individual File Processing

KnowAI supports two processing modes for handling multiple files:

Traditional Batch Processing (Default)

- All documents from all files are combined and processed together
- Faster processing, good for related content across files

Individual File Processing

- Each file is processed separately by the LLM (in parallel, max 10 concurrent), then responses are consolidated
- Ensures each file gets equal attention, better for distinct topics
- Significantly faster than sequential processing for multiple files
- Concurrency limit prevents overwhelming the LLM service

# Enable individual file processing
agent = KnowAIAgent(
    vectorstore_path=VSTORE_PATH,
    process_files_individually=True  # Enable individual processing
)

# Or enable per-request
response = await agent.process_turn(
    user_question="What are the main strategies?",
    selected_files=["file1.pdf", "file2.pdf"],
    process_files_individually=True  # Override for this request
)

When to Use Individual File Processing:

- Files contain distinct topics that should be analyzed separately
- You want to ensure each file gets equal attention from the LLM
- You want to see how each file contributes to the final answer
- You are dealing with large files that may benefit from focused analysis
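
The "in parallel, max 10 concurrent" pattern can be sketched with an asyncio semaphore. This is a generic illustration, not KnowAI's implementation: `process_files_concurrently` and the `answer_one` coroutine are hypothetical stand-ins for the per-file LLM call.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def process_files_concurrently(
    files: List[str],
    answer_one: Callable[[str], Awaitable[str]],
    max_concurrent: int = 10,
) -> Dict[str, str]:
    """Run answer_one for every file, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(file_name: str) -> str:
        # Only max_concurrent coroutines can hold the semaphore at once.
        async with semaphore:
            return await answer_one(file_name)

    answers = await asyncio.gather(*(guarded(f) for f in files))
    # gather preserves input order, so zip pairs each file with its answer.
    return dict(zip(files, answers))
```

All per-file tasks are scheduled immediately, but the semaphore keeps the number of in-flight calls bounded. That is how this approach stays faster than sequential processing without flooding the LLM service.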

For more details, see Individual File Processing Documentation.

Release history

Version	Release date
0.6.0 (this version)	May 8, 2026
0.5.0	Oct 16, 2025
0.4.0	Aug 26, 2025
0.3.0	Jul 9, 2025
0.2.3	Jul 1, 2025
0.2.2	Jun 8, 2025
0.2.1	Jun 7, 2025
0.2.0	Jun 6, 2025
0.1.0	May 18, 2025
