
Multi-modal knowledge library with vector and full-text search for text, code, images, and PDFs


Librarian

A personal knowledge library for AI agents, built on Arcade for the Model Context Protocol (MCP).

Overview

Librarian provides AI agents with persistent storage for text, documents, and knowledge. Agents can store information and retrieve it later through semantic and keyword search, maintaining context across conversations.

graph LR
    A[Agent Stores Info] --> B[Parser]
    B --> C[Chunker]
    C --> D[Embedder]
    D --> E[(SQLite + vec)]
    F[Agent Queries] --> G[Hybrid Search]
    E --> G
    G --> H[Relevant Context]
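The hybrid-search and MMR stages at the end of this pipeline can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Librarian's retrieval/search.py. It min-max normalizes vector and keyword scores (higher is better) and blends them with a weight alpha, mirroring HYBRID_ALPHA. It then reranks greedily with a trade-off lam, mirroring MMR_LAMBDA. All function names are hypothetical.

```python
# Minimal sketch (hypothetical names): hybrid score fusion followed by greedy MMR.
# Not Librarian's actual retrieval/search.py implementation.
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so vector and keyword scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (val - lo) / (hi - lo) for key, val in scores.items()}


def hybrid_scores(
    vector: dict[str, float], keyword: dict[str, float], alpha: float = 0.7
) -> dict[str, float]:
    """Blend normalized scores: alpha * vector + (1 - alpha) * keyword."""
    v, k = min_max(vector), min_max(keyword)
    return {
        doc: alpha * v.get(doc, 0.0) + (1 - alpha) * k.get(doc, 0.0)
        for doc in v.keys() | k.keys()
    }


def mmr(
    relevance: dict[str, float],
    embeddings: dict[str, list[float]],
    k: int = 10,
    lam: float = 0.5,
) -> list[str]:
    """Greedy MMR: lam=1 ranks purely by relevance, lam=0 purely by novelty."""
    selected: list[str] = []
    candidates = sorted(relevance)  # sorted for deterministic tie-breaking

    def mmr_score(doc: str) -> float:
        # Penalize similarity to the most similar already-selected result.
        redundancy = max(
            (cosine(embeddings[doc], embeddings[s]) for s in selected), default=0.0
        )
        return lam * relevance[doc] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With lam below 1, a near-duplicate of the top hit can be ranked below a less relevant but novel result, which is the point of MMR.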

Features

  • Persistent knowledge storage for AI agents
  • SQLite storage with sqlite-vec for vector search
  • Full-text search using FTS5 with BM25 ranking
  • Hybrid search combining semantic and keyword matching
  • Max Marginal Relevance (MMR) for diverse results
  • Configurable embedding models (local or OpenAI-compatible API)
  • Header-aware text chunking with overlap
  • Time-bounded search filters
  • CLI and MCP server interfaces
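The header-aware chunking with overlap listed above can be sketched as follows. This is a minimal sketch, assuming that markdown "#" headers mark section boundaries and that both sizes are measured in characters (as CHUNK_SIZE is). The function names are hypothetical and this is not Librarian's processing/transform code.

```python
# Minimal sketch of header-aware chunking with overlap. Assumptions: '#' headers
# start sections, and sizes are in characters (like CHUNK_SIZE / CHUNK_OVERLAP).
# Not Librarian's actual processing/transform code.
import re

HEADER = re.compile(r"^#{1,6}\s")


def split_sections(text: str) -> list[str]:
    """Start a new section at each markdown header so chunks don't straddle them."""
    sections: list[list[str]] = [[]]
    for line in text.splitlines():
        if HEADER.match(line) and sections[-1]:
            sections.append([])
        sections[-1].append(line)
    joined = ("\n".join(lines).strip() for lines in sections)
    return [section for section in joined if section]


def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split each section into windows of chunk_size sharing `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks: list[str] = []
    for section in split_sections(text):
        start = 0
        while True:
            chunks.append(section[start : start + chunk_size])
            if start + chunk_size >= len(section):
                break  # this window reached the end of the section
            start += step
    return chunks
```

Splitting at headers first keeps each chunk inside one section; the overlap means text cut at a window boundary still appears whole in at least one neighbouring chunk.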

Multi-Modal Support

Librarian supports indexing and searching across multiple file types:

Asset Type | File Extensions | Features
Text | .md, .txt | Frontmatter extraction, header-aware chunking
Code | .py, .js, .ts, .go, .rs, .java, .cpp, and more | Symbol extraction (classes, functions, methods)
PDF | .pdf | Page-based text extraction
Image | .png, .jpg, .jpeg, .gif, .webp | Metadata and EXIF extraction, optional OCR
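For Python files, symbol extraction of classes, functions, and methods can be approximated with the standard-library ast module. This is a minimal sketch only: Librarian's real code parser, the other languages it supports, and its output format are not described here, so extract_symbols and its (kind, name, line) tuples are hypothetical.

```python
# Minimal sketch: symbol extraction for Python files with the stdlib ast module.
# Assumption: Librarian's real code parser (and the other languages it handles)
# may work differently; the (kind, name, line) output shape is hypothetical.
import ast


def extract_symbols(source: str) -> list[tuple[str, str, int]]:
    """Return (kind, qualified_name, line) for classes, functions, and methods."""
    symbols: list[tuple[str, str, int]] = []

    def visit(node: ast.AST, prefix: str, in_class: bool) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                name = prefix + child.name
                symbols.append(("class", name, child.lineno))
                visit(child, name + ".", in_class=True)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                name = prefix + child.name
                kind = "method" if in_class else "function"
                symbols.append((kind, name, child.lineno))
                visit(child, name + ".", in_class=False)
            else:
                # Recurse into other statements (if/try blocks, etc.).
                visit(child, prefix, in_class)

    visit(ast.parse(source), "", in_class=False)
    return symbols
```

Qualified names such as Greeter.hello let a search hit point back to the exact symbol that matched.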

Installation

git clone https://github.com/ArcadeAI/librarian.git
cd librarian
./setup.sh

Or install manually:

uv pip install -e ".[dev]"

Optional multi-modal dependencies:

uv pip install -e ".[pdf]"      # PDF support (pypdf)
uv pip install -e ".[vision]"   # Image support (Pillow)
uv pip install -e ".[all]"      # All optional features

CLI Usage

# Add files to the library
libr add ~/notes

# Search the library
libr search "machine learning concepts"

# List sources
libr list

# View library statistics
libr index

# Rebuild the index
libr index build

MCP Server

Start the server for AI assistant integration:

# stdio transport (Claude Desktop, CLI)
libr serve stdio

# HTTP transport (Cursor, VS Code)
libr serve http --port 8000

See the Arcade MCP documentation for integration details.

Available Tools

Core Tools (always enabled):

Tool | Description
Librarian_SearchLibrary | Unified search with mode selection (hybrid, semantic, or keyword), asset-type filtering, and timeframe support
Librarian_AddToLibrary | Store new content in the library
Librarian_UpdateLibraryDoc | Update existing content
Librarian_ReadFromLibrary | Read full document content
Librarian_RemoveFromLibrary | Remove content from the library
Librarian_ListLibraryContents | List all stored content
Librarian_IndexDirectoryToLibrary | Bulk import files from a directory
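Keyword-mode search presumably relies on the FTS5 index with BM25 ranking listed under Features. A query of that kind might look like the sketch below, which uses Python's built-in sqlite3 module. The table name chunks_fts, its columns, and both helper functions are hypothetical (the real schema lives in storage/fts_store.py and is not documented here). The interpreter's SQLite build must also include FTS5.

```python
# Minimal sketch: keyword search with SQLite FTS5 and BM25 ranking.
# Assumption: the table/column names and helpers are hypothetical, not
# Librarian's schema; requires an SQLite build with FTS5 enabled.
import sqlite3


def build_index(docs: dict[str, str]) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # doc_id is UNINDEXED: stored for lookup but not tokenized for matching.
    conn.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(doc_id UNINDEXED, content)")
    conn.executemany("INSERT INTO chunks_fts (doc_id, content) VALUES (?, ?)", docs.items())
    return conn


def keyword_search(
    conn: sqlite3.Connection, query: str, limit: int = 10
) -> list[tuple[str, float]]:
    # `query` uses FTS5 query syntax; raw user input may need quoting.
    # bm25() is lower (more negative) for better matches, so ascending order
    # puts the best hits first; the score is negated so higher means better.
    rows = conn.execute(
        "SELECT doc_id, bm25(chunks_fts) AS score FROM chunks_fts "
        "WHERE chunks_fts MATCH ? ORDER BY score LIMIT ?",
        (query, limit),
    ).fetchall()
    return [(doc_id, -score) for doc_id, score in rows]
```

Negating the BM25 value gives a "higher is better" score that can be normalized and blended with vector similarity in hybrid mode.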

Optional Tools (enable with LIBRARIAN_ENABLE_OPTIONAL_TOOLS=true):

Tool | Description
Librarian_GetLibrarySources | List sources with document and chunk counts
Librarian_GetLibraryStats | Overall library statistics
Librarian_GetLibraryStructure | Filesystem structure of library sources
Librarian_GetLibrarySections | Simplified view of available storage locations
Librarian_SuggestLibraryLocation | AI-powered suggestions for where to store content

Configuration

Set via environment variables:

Variable | Default | Description
DOCUMENTS_PATH | ./documents | Root directory for files
DATABASE_PATH | ~/.librarian/index.db | SQLite database location
EMBEDDING_PROVIDER | openai | Embedding backend: local or openai
EMBEDDING_MODEL | all-MiniLM-L6-v2 | Model name for the local provider
OPENAI_API_BASE | http://localhost:7171/v1 | Base URL of the OpenAI-compatible API
OPENAI_EMBEDDING_MODEL | qwen3-embedding-06b | Model name for the API provider
CHUNK_SIZE | 512 | Maximum characters per chunk
CHUNK_OVERLAP | 50 | Overlap between consecutive chunks
SEARCH_LIMIT | 10 | Default maximum number of results
MMR_LAMBDA | 0.5 | MMR trade-off (0 = most diverse, 1 = most relevant)
HYBRID_ALPHA | 0.7 | Weight of vector vs. keyword scores (1 = vector only)

Project Structure

librarian/
├── cli.py           # Command-line interface
├── server.py        # MCP server and tool definitions
├── config.py        # Configuration management
├── indexing.py      # Document indexing service
├── types.py         # Shared type definitions
├── storage/
│   ├── database.py  # SQLite operations
│   ├── vector_store.py  # sqlite-vec search
│   └── fts_store.py     # FTS5 search
├── processing/
│   ├── embed/       # Embedding providers
│   ├── parsers/     # Document parsers (md, code, pdf, image)
│   └── transform/   # Text chunking
├── retrieval/
│   └── search.py    # Hybrid search + MMR
└── utils/
    └── timeframe.py # Time filter utilities

Development

make install    # Install dependencies
make test       # Run tests
make lint       # Run linter
make format     # Format code
make typecheck  # Type checking
make check      # All checks
make evals      # Run evaluations


License

MIT License - see LICENSE for details.



Download files


Source Distribution

agent_library-0.11.0.tar.gz (142.0 kB)

Uploaded Source

Built Distribution


agent_library-0.11.0-py3-none-any.whl (101.2 kB)

Uploaded Python 3

File details

Details for the file agent_library-0.11.0.tar.gz.

File metadata

  • Download URL: agent_library-0.11.0.tar.gz
  • Upload date:
  • Size: 142.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agent_library-0.11.0.tar.gz
Algorithm Hash digest
SHA256 75391b49a6b07c77dc6675ba15b2970eac7df1eb73b9e3a54ba336b134380c7e
MD5 2da62587547b5f24107429463eb314b2
BLAKE2b-256 fa02f072f8f5caa8d6efc7ff18702894be1b6ecebcf105a3b8198816eaa5bd0d


Provenance

The following attestation bundles were made for agent_library-0.11.0.tar.gz:

Publisher: release.yml on arcadeai-labs/librarian

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file agent_library-0.11.0-py3-none-any.whl.

File metadata

  • Download URL: agent_library-0.11.0-py3-none-any.whl
  • Upload date:
  • Size: 101.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agent_library-0.11.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b68b1e91a980e9f8a371531885f7d3e83b2da3366884f619d3bcbe85798214ff
MD5 d14d07fca97a9c128c2701ee0c0bce3b
BLAKE2b-256 201fae003bd3db1500bc0e716369c76e00b54843eca408dd5f5e4259f94e5e6f


Provenance

The following attestation bundles were made for agent_library-0.11.0-py3-none-any.whl:

Publisher: release.yml on arcadeai-labs/librarian

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
