Utility package for interacting with vectorstores

Rakam Systems Vectorstore

The vectorstore package of Rakam Systems providing vector database solutions and document processing capabilities.

Overview

rakam-systems-vectorstore provides comprehensive vector storage, embedding models, and document loading capabilities. This package depends on rakam-systems-core.

Features

  • Configuration-First Design: Change your entire vector store setup via YAML - no code changes
  • Multiple Backends: PostgreSQL with pgvector and FAISS in-memory storage
  • Flexible Embeddings: Support for SentenceTransformers, OpenAI, and Cohere
  • Document Loaders: PDF, DOCX, HTML, Markdown, CSV, and more
  • Search Capabilities: Vector search, keyword search (BM25), and hybrid search
  • Chunking: Intelligent text chunking with context preservation
  • Configuration: Comprehensive YAML/JSON configuration support

🎯 Configuration Convenience

The vectorstore package's configurable design allows you to:

  • Switch embedding models without code changes (local ↔ OpenAI ↔ Cohere)
  • Change search algorithms instantly (BM25 ↔ ts_rank ↔ hybrid)
  • Adjust search parameters (similarity metrics, top-k, hybrid weights)
  • Toggle features (hybrid search, caching, reranking)
  • Tune performance (batch sizes, chunk sizes, connection pools)
  • Swap backends (FAISS ↔ PostgreSQL) by updating config

Example: Test different embedding models to find the best accuracy/cost balance - just update your YAML config file, no code changes needed!
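
To illustrate the idea, here is a minimal sketch of config-driven backend selection. It is not the package's implementation: `EMBEDDER_FACTORIES`, `build_embedder`, the toy embedder functions, the `"openai"` value for `model_type`, and the placeholder model name are all hypothetical. Only the `embedding`, `model_type`, and `model_name` keys come from the configuration examples in this README.

```python
from typing import Callable, Dict, List

Embedder = Callable[[str], List[float]]


# Hypothetical stand-ins for real embedding backends; each returns a fake vector.
def local_embedder(model_name: str) -> Embedder:
    return lambda text: [float(len(text)), 0.0]


def remote_embedder(model_name: str) -> Embedder:
    return lambda text: [0.0, float(len(text))]


# Registry keyed by the config's `model_type` value (illustrative only).
EMBEDDER_FACTORIES: Dict[str, Callable[[str], Embedder]] = {
    "sentence_transformer": local_embedder,
    "openai": remote_embedder,  # assumed value, not confirmed by this README
}


def build_embedder(config: dict) -> Embedder:
    """Pick an embedder from the config alone, so calling code never changes."""
    embedding_cfg = config["embedding"]
    model_type = embedding_cfg["model_type"]
    if model_type not in EMBEDDER_FACTORIES:
        raise ValueError(f"Unknown model_type: {model_type!r}")
    return EMBEDDER_FACTORIES[model_type](embedding_cfg["model_name"])


config_a = {"embedding": {"model_type": "sentence_transformer",
                          "model_name": "Snowflake/snowflake-arctic-embed-m"}}
config_b = {"embedding": {"model_type": "openai",
                          "model_name": "placeholder-model"}}

embed_a = build_embedder(config_a)
embed_b = build_embedder(config_b)
```

The calling code is identical for both configurations; only the dictionary (which would be loaded from YAML in practice) differs.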

Installation

# Requires core package
pip install -e ./rakam-systems-core

# Install vectorstore package
pip install -e ./rakam-systems-vectorstore

# With specific backends
pip install -e "./rakam-systems-vectorstore[postgres]"
pip install -e "./rakam-systems-vectorstore[faiss]"
pip install -e "./rakam-systems-vectorstore[all]"

Quick Start

FAISS Vector Store (In-Memory)

from rakam_systems_vectorstore.components.vectorstore.faiss_vector_store import FaissStore
from rakam_systems_vectorstore.core import Node, NodeMetadata

# Create store
store = FaissStore(
    name="my_store",
    base_index_path="./indexes",
    embedding_model="Snowflake/snowflake-arctic-embed-m",
    initialising=True
)

# Create nodes
nodes = [
    Node(
        content="Python is great for AI",
        metadata=NodeMetadata(source_file_uuid="doc1", position=0)
    )
]

# Add and search
store.create_collection_from_nodes("my_collection", nodes)
results, _ = store.search("my_collection", "AI programming", number=5)

PostgreSQL Vector Store

import os
import django
from django.conf import settings

# Configure Django (required)
if not settings.configured:
    settings.configure(
        INSTALLED_APPS=[
            'django.contrib.contenttypes',
            'rakam_systems_vectorstore.components.vectorstore',
        ],
        DATABASES={
            'default': {
                'ENGINE': 'django.db.backends.postgresql',
                'NAME': os.getenv('POSTGRES_DB', 'vectorstore_db'),
                'USER': os.getenv('POSTGRES_USER', 'postgres'),
                'PASSWORD': os.getenv('POSTGRES_PASSWORD', 'postgres'),
                'HOST': os.getenv('POSTGRES_HOST', 'localhost'),
                'PORT': os.getenv('POSTGRES_PORT', '5432'),
            }
        },
        DEFAULT_AUTO_FIELD='django.db.models.BigAutoField',
    )
    django.setup()

from rakam_systems_vectorstore import ConfigurablePgVectorStore, VectorStoreConfig

# Create configuration
config = VectorStoreConfig(
    embedding={
        "model_type": "sentence_transformer",
        "model_name": "Snowflake/snowflake-arctic-embed-m"
    },
    search={
        "similarity_metric": "cosine",
        "enable_hybrid_search": True
    }
)

# Create and use store
store = ConfigurablePgVectorStore(config=config)
store.setup()
# `nodes` is a list of Node objects, e.g. the one built in the FAISS example above
store.add_nodes(nodes)
results = store.search("What is AI?", top_k=5)
store.shutdown()

Core Components

Vector Stores

  • ConfigurablePgVectorStore: PostgreSQL with pgvector, supports hybrid search and keyword search
  • FaissStore: In-memory FAISS-based vector search

Embeddings

  • ConfigurableEmbeddings: Supports multiple backends
    • SentenceTransformers (local)
    • OpenAI embeddings
    • Cohere embeddings

Document Loaders

  • AdaptiveLoader: Automatically detects and loads various file types
  • PdfLoader: Advanced PDF processing with Docling
  • PdfLoaderLight: Lightweight PDF to markdown conversion
  • DocLoader: Microsoft Word documents
  • OdtLoader: OpenDocument Text files
  • MdLoader: Markdown files
  • HtmlLoader: HTML files
  • EmlLoader: Email files
  • TabularLoader: CSV, Excel files
  • CodeLoader: Source code files

Chunking

  • TextChunker: Sentence-based chunking with Chonkie
  • AdvancedChunker: Context-aware chunking with heading preservation

Package Structure

rakam-systems-vectorstore/
├── src/rakam_systems_vectorstore/
│   ├── core.py                  # Node, VSFile, NodeMetadata
│   ├── config.py                # VectorStoreConfig
│   ├── components/
│   │   ├── vectorstore/         # Store implementations
│   │   │   ├── configurable_pg_vectorstore.py
│   │   │   └── faiss_vector_store.py
│   │   ├── embedding_model/     # Embedding models
│   │   │   └── configurable_embeddings.py
│   │   ├── loader/              # Document loaders
│   │   │   ├── adaptive_loader.py
│   │   │   ├── pdf_loader.py
│   │   │   ├── pdf_loader_light.py
│   │   │   └── ... (other loaders)
│   │   └── chunker/             # Text chunkers
│   │       ├── text_chunker.py
│   │       └── advanced_chunker.py
│   ├── docs/                    # Package documentation
│   └── server/                  # MCP server
└── pyproject.toml

Search Capabilities

Vector Search

Semantic similarity search using embeddings:

results = store.search("machine learning algorithms", top_k=10)
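
Conceptually, a vector search with the cosine metric ranks stored embeddings by their cosine similarity to the query embedding. The sketch below shows that ranking on toy two-dimensional vectors in pure Python. It is an illustration of the technique only: `cosine_similarity`, `top_k`, and the toy vectors are hypothetical and unrelated to how the stores compute results internally.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          doc_vecs: Dict[str, Sequence[float]],
          k: int) -> List[Tuple[str, float]]:
    """Return the k (doc_id, similarity) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings"; real ones would come from an embedding model.
toy_vectors = {
    "doc_ai": [0.9, 0.1],
    "doc_db": [0.1, 0.9],
    "doc_mixed": [0.6, 0.6],
}
ranked = top_k([1.0, 0.0], toy_vectors, k=2)
```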

Keyword Search (BM25)

Full-text search with BM25 ranking:

results = store.keyword_search(
    query="machine learning",
    top_k=10,
    ranking_algorithm="bm25"
)
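
For intuition about what BM25 ranking rewards, here is a small pure-Python sketch of the standard Okapi BM25 formula with whitespace tokenization. It is not the package's implementation (this README does not show how ranking is computed inside PostgreSQL); `bm25_scores`, the `k1` and `b` defaults, and the non-negative IDF variant are choices made for this illustration.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Non-negative IDF variant: rarer terms weigh more.
            idf = math.log((n_docs - doc_freq[term] + 0.5)
                           / (doc_freq[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "machine learning with python",
    "cooking pasta at home",
    "deep learning and machine vision",
]
scores = bm25_scores("machine learning", docs)
```

Both the first and third documents contain both query terms once, so the shorter first document scores higher because of length normalization; the second document scores zero.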

Hybrid Search

Combines vector and keyword search:

results = store.hybrid_search(
    query="neural networks",
    top_k=10,
    alpha=0.7  # 70% vector, 30% keyword
)
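
One plausible reading of `alpha`, consistent with the "70% vector, 30% keyword" comment above, is a linear interpolation between the two scores. The sketch below shows that interpolation. It is an assumption about the blending rule rather than a confirmed description of `hybrid_search`; `hybrid_scores` and the toy scores are hypothetical, and both score sets are assumed to be normalized to the same range already.

```python
from typing import Dict


def hybrid_scores(vector_scores: Dict[str, float],
                  keyword_scores: Dict[str, float],
                  alpha: float = 0.7) -> Dict[str, float]:
    """Blend scores as alpha * vector + (1 - alpha) * keyword.

    A document missing from one result set contributes 0.0 for that part.
    """
    doc_ids = set(vector_scores) | set(keyword_scores)
    return {
        doc_id: alpha * vector_scores.get(doc_id, 0.0)
        + (1 - alpha) * keyword_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


vector_scores = {"a": 0.9, "b": 0.4}
keyword_scores = {"b": 1.0, "c": 0.5}
combined = hybrid_scores(vector_scores, keyword_scores, alpha=0.7)
best = max(combined, key=combined.get)
```

Under this rule, `alpha=1.0` reduces to pure vector search and `alpha=0.0` to pure keyword search.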

Configuration

From YAML

# vectorstore_config.yaml
name: my_vectorstore

embedding:
  model_type: sentence_transformer
  model_name: Snowflake/snowflake-arctic-embed-m
  batch_size: 128
  normalize: true

database:
  host: localhost
  port: 5432
  database: vectorstore_db
  user: postgres
  password: postgres

search:
  similarity_metric: cosine
  default_top_k: 5
  enable_hybrid_search: true
  hybrid_alpha: 0.7

index:
  chunk_size: 512
  chunk_overlap: 50
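
Load the configuration in Python:

from rakam_systems_vectorstore import ConfigurablePgVectorStore, VectorStoreConfig

config = VectorStoreConfig.from_yaml("vectorstore_config.yaml")
store = ConfigurablePgVectorStore(config=config)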

Documentation

Detailed documentation, including loader-specific documentation, is available in the src/rakam_systems_vectorstore/docs/ directory.

Examples

See the examples/ai_vectorstore_examples/ directory in the main repository for complete examples:

  • Basic FAISS example
  • PostgreSQL example
  • Configurable vectorstore examples
  • PDF loader examples
  • Keyword search examples

Environment Variables

  • POSTGRES_HOST: PostgreSQL host (default: localhost)
  • POSTGRES_PORT: PostgreSQL port (default: 5432)
  • POSTGRES_DB: Database name (default: vectorstore_db)
  • POSTGRES_USER: Database user (default: postgres)
  • POSTGRES_PASSWORD: Database password
  • OPENAI_API_KEY: For OpenAI embeddings
  • COHERE_API_KEY: For Cohere embeddings
  • HUGGINGFACE_TOKEN: For private HuggingFace models
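
The PostgreSQL variables and defaults listed above can be collected into a settings dictionary along these lines. This is a minimal sketch: `postgres_settings_from_env` is a hypothetical helper, not part of the package, and the dictionary keys simply mirror the `database` section of the YAML example.

```python
import os
from typing import Mapping, Optional


def postgres_settings_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read PostgreSQL settings from the environment, applying the listed defaults."""
    env = os.environ if env is None else env
    return {
        "host": env.get("POSTGRES_HOST", "localhost"),
        "port": int(env.get("POSTGRES_PORT", "5432")),
        "database": env.get("POSTGRES_DB", "vectorstore_db"),
        "user": env.get("POSTGRES_USER", "postgres"),
        # No default is listed for the password, so it stays None when unset.
        "password": env.get("POSTGRES_PASSWORD"),
    }


# Passing a plain dict instead of os.environ keeps the example deterministic.
settings = postgres_settings_from_env({"POSTGRES_HOST": "db.internal"})
```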

License

Apache 2.0
