Enterprise-grade knowledge loading, indexing, and search for Python

These details have not been verified by PyPI

Project links

Project description

GnosisLLM Knowledge

Enterprise-grade knowledge loading, indexing, and semantic search library for Python.

Features

Semantic Search: Vector-based similarity search using OpenAI embeddings
Hybrid Search: Combine semantic and keyword (BM25) search for best results
Agentic Search: AI-powered search with reasoning and natural language answers
Agentic Memory: Conversational memory with automatic fact extraction
Multiple Loaders: Load content from websites, sitemaps, and files
Intelligent Chunking: Sentence-aware text splitting with configurable overlap
OpenSearch Backend: Production-ready with k-NN vector search
Multi-Tenancy: Index isolation for complete tenant separation (tenant-agnostic library)
Event-Driven: Observer pattern for progress tracking and monitoring
SOLID Architecture: Clean, maintainable, and extensible codebase

Installation

pip install gnosisllm-knowledge

# With OpenSearch backend
pip install gnosisllm-knowledge[opensearch]

# With all optional dependencies
pip install gnosisllm-knowledge[all]

Quick Start (CLI)

# Install
pip install gnosisllm-knowledge

# Set OpenAI API key for embeddings
export OPENAI_API_KEY=sk-...

# Setup OpenSearch with ML model
gnosisllm-knowledge setup --host localhost --port 9200
# ✓ Created connector, model, pipelines, index
# Model ID: abc123  →  Add to .env: OPENSEARCH_MODEL_ID=abc123

export OPENSEARCH_MODEL_ID=abc123

# Load content from a sitemap
gnosisllm-knowledge load https://docs.example.com/sitemap.xml
# ✓ Loaded 247 documents (1,248 chunks) in 45.3s

# Search
gnosisllm-knowledge search "how to configure authentication"
# Found 42 results (23.4ms)
# 1. Authentication Guide (92.3%)
#    To configure authentication, set AUTH_PROVIDER...

# Interactive search mode
gnosisllm-knowledge search --interactive

Quick Start (Python API)

from gnosisllm_knowledge import Knowledge

# Create instance with OpenSearch backend
knowledge = Knowledge.from_opensearch(
    host="localhost",
    port=9200,
)

# Setup backend (creates indices)
await knowledge.setup()

# Load and index a sitemap
await knowledge.load(
    "https://docs.example.com/sitemap.xml",
    collection_id="docs",
)

# Search
results = await knowledge.search("how to configure authentication")
for item in results.items:
    print(f"{item.title}: {item.score}")

CLI Commands

Setup

Configure OpenSearch with neural search capabilities:

gnosisllm-knowledge setup [OPTIONS]

Options:
  --host        OpenSearch host (default: localhost)
  --port        OpenSearch port (default: 9200)
  --use-ssl     Enable SSL connection
  --force       Clean up existing resources first
  --no-hybrid   Skip hybrid search pipeline

Load

Load and index content from URLs or sitemaps:

gnosisllm-knowledge load <URL> [OPTIONS]

Options:
  --type         Source type: website, sitemap (auto-detects)
  --index        Target index name (e.g., knowledge-tenant-123)
  --collection-id Collection grouping ID
  --batch-size   Documents per batch (default: 100)
  --max-urls     Max URLs from sitemap (default: 1000)
  --dry-run      Preview without indexing

Multi-tenancy is achieved through index isolation. Use --index with tenant-specific names (e.g., --index knowledge-tenant-123).

Search

Search indexed content with multiple modes:

gnosisllm-knowledge search <QUERY> [OPTIONS]

Options:
  --mode         Search mode: semantic, keyword, hybrid, agentic
  --index        Index to search (e.g., knowledge-tenant-123)
  --limit        Max results (default: 5)
  --collection-ids Filter by collections (comma-separated)
  --json         Output as JSON for scripting
  --interactive  Interactive search session

Multi-tenancy is achieved through index isolation. Use --index with tenant-specific names.

Architecture

gnosisllm-knowledge/
├── api/                 # High-level Knowledge facade
├── core/
│   ├── domain/          # Document, SearchQuery, SearchResult models
│   ├── interfaces/      # Protocol definitions (IContentLoader, etc.)
│   ├── events/          # Event system for progress tracking
│   └── exceptions.py    # Exception hierarchy
├── loaders/             # Content loaders (website, sitemap)
├── fetchers/            # Content fetchers (HTTP, Neoreader)
├── chunking/            # Text chunking strategies
├── backends/
│   ├── opensearch/      # OpenSearch implementation
│   └── memory/          # In-memory backend for testing
└── services/            # Indexing and search orchestration

Search Modes

from gnosisllm_knowledge import SearchMode

# Semantic search (vector similarity)
results = await knowledge.search(query, mode=SearchMode.SEMANTIC)

# Keyword search (BM25)
results = await knowledge.search(query, mode=SearchMode.KEYWORD)

# Hybrid search (default - combines both)
results = await knowledge.search(query, mode=SearchMode.HYBRID)

Agentic Search

AI-powered search with reasoning and natural language answers using OpenSearch ML agents.

Requirements: OpenSearch 3.4+ for conversational memory support.

Setup

# 1. First run standard setup (creates embedding model)
gnosisllm-knowledge setup --port 9201

# 2. Setup agentic agents (creates LLM connector, VectorDBTool, MLModelTool, agents)
gnosisllm-knowledge agentic setup
# ✓ Flow Agent ID: abc123
# ✓ Conversational Agent ID: def456

# 3. Add agent IDs to environment
export OPENSEARCH_FLOW_AGENT_ID=abc123
export OPENSEARCH_CONVERSATIONAL_AGENT_ID=def456

Usage

# Single-turn agentic search (uses flow agent)
gnosisllm-knowledge search --mode agentic "What is Typer?"

# Interactive multi-turn chat (uses conversational agent with memory)
gnosisllm-knowledge agentic chat
# You: What is Typer?
# Assistant: Typer is a library for building CLI applications...
# You: What did you just say about it?
# Assistant: I told you that Typer is a library for building CLI...

How It Works

┌─────────────────────────────────────────────────────────────────────┐
│                         User Query                                   │
│                    "What is Typer?"                                  │
└─────────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    OpenSearch ML Agent                               │
│              (Flow or Conversational)                                │
└─────────────────────────────────────────────────────────────────────┘
                              │
            ┌─────────────────┴─────────────────┐
            ▼                                   ▼
┌───────────────────────┐           ┌───────────────────────┐
│     VectorDBTool      │           │   Conversation Memory │
│  (Knowledge Search)   │           │   (Conversational     │
│                       │           │    Agent Only)        │
│  - Searches index     │           │                       │
│  - Returns context    │           │  - Stores Q&A pairs   │
│                       │           │  - Injects chat_history│
└───────────────────────┘           └───────────────────────┘
            │                                   │
            └─────────────────┬─────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                      MLModelTool (answer_generator)                  │
│                                                                      │
│  Prompt Template:                                                    │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │ Context from knowledge base:                                    │ │
│  │ ${parameters.knowledge_search.output}                           │ │
│  │                                                                 │ │
│  │ Previous conversation:        ← Only for conversational agent  │ │
│  │ ${parameters.chat_history:-}                                    │ │
│  │                                                                 │ │
│  │ Question: ${parameters.question}                                │ │
│  └────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                      AI-Generated Answer                             │
│  "Typer is a library for building CLI applications in Python..."    │
└─────────────────────────────────────────────────────────────────────┘

Agent Types

Agent	Type	Use Case	Memory
Flow	`flow`	Fast single-turn RAG, API calls	No
Conversational	`conversational_flow`	Multi-turn dialogue, chat	Yes

Key Configuration

The conversational agent requires these settings for memory to work:

# In agent registration (setup.py)
agent_body = {
    "type": "conversational_flow",
    "app_type": "rag",  # Required for memory injection
    "llm": {
        "model_id": llm_model_id,
        "parameters": {
            "message_history_limit": 10,  # Include last N messages
        },
    },
    "memory": {"type": "conversation_index"},
}

# MLModelTool prompt must include:
# ${parameters.chat_history:-}  ← Receives conversation history

Multi-Tenancy

This library is tenant-agnostic. Multi-tenancy is achieved through index isolation - each tenant gets their own OpenSearch index.

# The calling application (e.g., API) constructs tenant-specific index names
index_name = f"knowledge-{account_id}"

# Create Knowledge instance for the tenant
knowledge = Knowledge.from_opensearch(
    host="localhost",
    port=9200,
    index_prefix=index_name,  # knowledge-tenant-123
)

# Load content to tenant's isolated index
await knowledge.load(
    source="https://docs.example.com/sitemap.xml",
    collection_id="docs",
)

# Search within tenant's index (no account_id filter needed)
results = await knowledge.search(
    "query",
    collection_ids=["docs"],
)

Note: For audit purposes, you can store account_id in document metadata:

await knowledge.load(
    source="https://docs.example.com/sitemap.xml",
    document_defaults={"metadata": {"account_id": "tenant-123"}},
)

Agentic Memory

Conversational memory with automatic fact extraction using OpenSearch's ML Memory plugin.

# Setup memory connectors
gnosisllm-knowledge memory setup --openai-key sk-...

# Create container and store conversations
gnosisllm-knowledge memory container create my-memory
gnosisllm-knowledge memory store <container-id> --file messages.json --user-id alice
gnosisllm-knowledge memory recall <container-id> "user preferences" --user-id alice

from gnosisllm_knowledge import Memory, MemoryStrategy, StrategyConfig, Message

memory = Memory.from_env()

# Create container with strategies
container = await memory.create_container(
    name="agent-memory",
    strategies=[
        StrategyConfig(type=MemoryStrategy.SEMANTIC, namespace=["user_id"]),
    ],
)

# Store conversation with fact extraction
await memory.store(
    container_id=container.id,
    messages=[Message(role="user", content="I prefer dark mode")],
    user_id="alice",
    infer=True,
)

# Recall memories
result = await memory.recall(container.id, "preferences", user_id="alice")

See docs/memory.md for full documentation.

Event Tracking

from gnosisllm_knowledge import EventType

# Subscribe to events
@knowledge.events.on(EventType.DOCUMENT_INDEXED)
def on_indexed(event):
    print(f"Indexed: {event.document_id}")

@knowledge.events.on(EventType.BATCH_COMPLETED)
def on_batch(event):
    print(f"Batch complete: {event.documents_indexed} docs")

Configuration

from gnosisllm_knowledge import OpenSearchConfig

# From environment variables
config = OpenSearchConfig.from_env()

# Explicit configuration
config = OpenSearchConfig(
    host="search.example.com",
    port=443,
    use_ssl=True,
    username="admin",
    password="secret",
    embedding_model="text-embedding-3-small",
    embedding_dimension=1536,
)

knowledge = Knowledge.from_opensearch(config=config)

Requirements

Python 3.11+
OpenSearch 2.0+ (for production use)
OpenSearch 3.4+ (for agentic search with conversation memory)

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.4.6

Feb 7, 2026

0.4.5

Feb 7, 2026

0.4.4

Feb 4, 2026

This version

0.4.3

Jan 20, 2026

0.4.0

Jan 19, 2026

0.3.0

Jan 8, 2026

0.2.0

Jan 3, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gnosisllm_knowledge-0.4.3.tar.gz (151.6 kB view details)

Uploaded Jan 20, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

gnosisllm_knowledge-0.4.3-py3-none-any.whl (197.6 kB view details)

Uploaded Jan 20, 2026 Python 3

File details

Details for the file gnosisllm_knowledge-0.4.3.tar.gz.

File metadata

Download URL: gnosisllm_knowledge-0.4.3.tar.gz
Upload date: Jan 20, 2026
Size: 151.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.2.1 CPython/3.13.9 Darwin/23.4.0

File hashes

Hashes for gnosisllm_knowledge-0.4.3.tar.gz
Algorithm	Hash digest
SHA256	`a8d09342303d822b828f8e53ae74d511995e9da9cf88f3feba66c0479c8d4c3f`
MD5	`30222a4c1ec39acdb66adae94cf4ffb8`
BLAKE2b-256	`b42f4ce55e13b0caa3bfc093278bf702a8725bbddb50680698339b66a34d2d3c`

See more details on using hashes here.

File details

Details for the file gnosisllm_knowledge-0.4.3-py3-none-any.whl.

File metadata

Download URL: gnosisllm_knowledge-0.4.3-py3-none-any.whl
Upload date: Jan 20, 2026
Size: 197.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.2.1 CPython/3.13.9 Darwin/23.4.0

File hashes

Hashes for gnosisllm_knowledge-0.4.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f2ec9efecafba48dad2dd18283a602095b3459e31bc991c4098db6514e5f8e51`
MD5	`52a55e15f69971271d4761a5cfee6b89`
BLAKE2b-256	`47341cd8b3542dc74068142a745992ee733756bb8d5d4cd2fc0afc40421b2e40`

See more details on using hashes here.

gnosisllm-knowledge 0.4.3

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

GnosisLLM Knowledge

Features

Installation

Quick Start (CLI)

Quick Start (Python API)

CLI Commands

Setup

Load

Search

Architecture

Search Modes

Agentic Search

Setup

Usage

How It Works

Agent Types

Key Configuration

Multi-Tenancy

Agentic Memory

Event Tracking

Configuration

Requirements

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes