Rakam System Vectorstore

The vectorstore package of Rakam Systems providing vector database solutions and document processing capabilities.

Overview

rakam-system-vectorstore provides comprehensive vector storage, embedding models, and document loading capabilities. This package depends on rakam-system-core.

Features

  • Configuration-First Design: Change your entire vector store setup via YAML - no code changes
  • Multiple Backends: PostgreSQL with pgvector and FAISS in-memory storage
  • Flexible Embeddings: Support for SentenceTransformers, OpenAI, and Cohere
  • Document Loaders: PDF, DOCX, HTML, Markdown, CSV, and more
  • Search Capabilities: Vector search, keyword search (BM25), and hybrid search
  • Chunking: Intelligent text chunking with context preservation
  • Configuration: Comprehensive YAML/JSON configuration support

🎯 Configuration Convenience

The vectorstore package's configurable design allows you to:

  • Switch embedding models without code changes (local ↔ OpenAI ↔ Cohere)
  • Change search algorithms instantly (BM25 ↔ ts_rank ↔ hybrid)
  • Adjust search parameters (similarity metrics, top-k, hybrid weights)
  • Toggle features (hybrid search, caching, reranking)
  • Tune performance (batch sizes, chunk sizes, connection pools)
  • Swap backends (FAISS ↔ PostgreSQL) by updating config

Example: Test different embedding models to find the best accuracy/cost balance - just update your YAML config file, no code changes needed!
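
As a rough illustration of the idea, the sketch below merges an override onto a set of defaults using only the standard library. It does not use the package's own loader: `DEFAULT_CONFIG`, `merge_config`, the `"openai"` model type value, and the placeholder model name are all assumptions made for this example, with keys borrowed from the YAML example in this README.

```python
import copy
import json

# Hypothetical defaults; the keys mirror the YAML example in this README.
DEFAULT_CONFIG = {
    "embedding": {
        "model_type": "sentence_transformer",
        "model_name": "Snowflake/snowflake-arctic-embed-m",
    },
    "search": {"similarity_metric": "cosine", "default_top_k": 5},
}


def merge_config(base, override):
    """Return a new dict with `override` recursively merged onto `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# An override as it might appear in a JSON config file.
# "openai" and "your-model-name" are placeholder values, not confirmed options.
override = json.loads(
    '{"embedding": {"model_type": "openai", "model_name": "your-model-name"}}'
)
config = merge_config(DEFAULT_CONFIG, override)
print(config["embedding"]["model_type"])
```

Only the embedding section changes; the search settings are carried over from the defaults, which is the behaviour a config-only model swap relies on.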

Installation

# Requires core package
pip install -e ./rakam-system-core

# Install vectorstore package
pip install -e ./rakam-system-vectorstore

# With specific backends
pip install -e "./rakam-system-vectorstore[postgres]"
pip install -e "./rakam-system-vectorstore[faiss]"
pip install -e "./rakam-system-vectorstore[all]"

Quick Start

FAISS Vector Store (In-Memory)

from rakam_system_vectorstore.components.vectorstore.faiss_vector_store import FaissStore
from rakam_system_vectorstore.core import Node, NodeMetadata

# Create store
store = FaissStore(
    name="my_store",
    base_index_path="./indexes",
    embedding_model="Snowflake/snowflake-arctic-embed-m",
    initialising=True
)

# Create nodes
nodes = [
    Node(
        content="Python is great for AI",
        metadata=NodeMetadata(source_file_uuid="doc1", position=0)
    )
]

# Add and search
store.create_collection_from_nodes("my_collection", nodes)
results, _ = store.search("my_collection", "AI programming", number=5)

PostgreSQL Vector Store

import os
import django
from django.conf import settings

# Configure Django (required)
if not settings.configured:
    settings.configure(
        INSTALLED_APPS=[
            'django.contrib.contenttypes',
            'rakam_system_vectorstore.components.vectorstore',
        ],
        DATABASES={
            'default': {
                'ENGINE': 'django.db.backends.postgresql',
                'NAME': os.getenv('POSTGRES_DB', 'vectorstore_db'),
                'USER': os.getenv('POSTGRES_USER', 'postgres'),
                'PASSWORD': os.getenv('POSTGRES_PASSWORD', 'postgres'),
                'HOST': os.getenv('POSTGRES_HOST', 'localhost'),
                'PORT': os.getenv('POSTGRES_PORT', '5432'),
            }
        },
        DEFAULT_AUTO_FIELD='django.db.models.BigAutoField',
    )
    django.setup()

from rakam_system_vectorstore import ConfigurablePgVectorStore, VectorStoreConfig

# Create configuration
config = VectorStoreConfig(
    embedding={
        "model_type": "sentence_transformer",
        "model_name": "Snowflake/snowflake-arctic-embed-m"
    },
    search={
        "similarity_metric": "cosine",
        "enable_hybrid_search": True
    }
)

# Create and use store
# `nodes` is a list of Node objects, built as in the FAISS example above
store = ConfigurablePgVectorStore(config=config)
store.setup()
store.add_nodes(nodes)
results = store.search("What is AI?", top_k=5)
store.shutdown()

Core Components

Vector Stores

  • ConfigurablePgVectorStore: PostgreSQL with pgvector, supports hybrid search and keyword search
  • FaissStore: In-memory FAISS-based vector search

Embeddings

  • ConfigurableEmbeddings: Supports multiple backends
    • SentenceTransformers (local)
    • OpenAI embeddings
    • Cohere embeddings

Document Loaders

  • AdaptiveLoader: Automatically detects and loads various file types
  • PdfLoader: Advanced PDF processing with Docling
  • PdfLoaderLight: Lightweight PDF to markdown conversion
  • DocLoader: Microsoft Word documents
  • OdtLoader: OpenDocument Text files
  • MdLoader: Markdown files
  • HtmlLoader: HTML files
  • EmlLoader: Email files
  • TabularLoader: CSV, Excel files
  • CodeLoader: Source code files

Chunking

  • TextChunker: Sentence-based chunking with Chonkie
  • AdvancedChunker: Context-aware chunking with heading preservation
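
To make the role of a chunk size and overlap concrete, here is a minimal character-based sketch. It is not how TextChunker or AdvancedChunker work internally (those are sentence-based and heading-aware); `chunk_text` is a hypothetical helper, and treating the sizes as character counts is an assumption for illustration.

```python
def chunk_text(text, chunk_size=512, chunk_overlap=50):
    """Split `text` into fixed-size character windows that overlap.

    Hypothetical helper for illustration only; sizes are in characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.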

Package Structure

rakam-system-vectorstore/
├── src/rakam_system_vectorstore/
│   ├── core.py                  # Node, VSFile, NodeMetadata
│   ├── config.py                # VectorStoreConfig
│   ├── components/
│   │   ├── vectorstore/         # Store implementations
│   │   │   ├── configurable_pg_vectorstore.py
│   │   │   └── faiss_vector_store.py
│   │   ├── embedding_model/     # Embedding models
│   │   │   └── configurable_embeddings.py
│   │   ├── loader/              # Document loaders
│   │   │   ├── adaptive_loader.py
│   │   │   ├── pdf_loader.py
│   │   │   ├── pdf_loader_light.py
│   │   │   └── ... (other loaders)
│   │   └── chunker/             # Text chunkers
│   │       ├── text_chunker.py
│   │       └── advanced_chunker.py
│   ├── docs/                    # Package documentation
│   └── server/                  # MCP server
└── pyproject.toml

Search Capabilities

Vector Search

Semantic similarity search using embeddings:

results = store.search("machine learning algorithms", top_k=10)
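
For readers new to vector search, the following standalone sketch shows what ranking by cosine similarity means. It uses tiny hand-written vectors instead of real embeddings and does not reflect the store's internals; `cosine_similarity` and `top_k` are hypothetical helpers written for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=10):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings"; real models produce hundreds of dimensions.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranked = top_k([1.0, 0.0], docs, k=2)
print(ranked)
```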

Keyword Search (BM25)

Full-text search with BM25 ranking:

results = store.keyword_search(
    query="machine learning",
    top_k=10,
    ranking_algorithm="bm25"
)
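
As background on the ranking itself, below is a compact sketch of the standard Okapi BM25 formula in plain Python. It is not the package's implementation, which runs against PostgreSQL; `bm25_scores`, the whitespace tokenizer, and the `k1` and `b` defaults are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.

    Hypothetical helper: tokenization is a lowercase whitespace split.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Length normalization dampens scores for long documents.
            norm = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "machine learning is fun",
    "cooking pasta at home",
    "deep learning and machine learning",
]
scores = bm25_scores("machine learning", docs)
print(scores)
```

Documents that share no term with the query score zero, while rarer terms and shorter documents are weighted more heavily.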

Hybrid Search

Combines vector and keyword search:

results = store.hybrid_search(
    query="neural networks",
    top_k=10,
    alpha=0.7  # 70% vector, 30% keyword
)
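
One common way to blend the two result sets, sketched below, is a weighted sum of normalized scores. Whether the package fuses scores exactly this way is an assumption; `min_max_normalize` and `hybrid_scores` are hypothetical helpers that only illustrate what an alpha of 0.7 means.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_scores(vector_scores, keyword_scores, alpha=0.7):
    """Blend two score sets: alpha * vector + (1 - alpha) * keyword."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    vec = min_max_normalize(vector_scores)
    kw = min_max_normalize(keyword_scores)
    combined = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(vec) | set(kw)
    }
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)


# Cosine similarities and BM25 scores live on different scales,
# which is why both are normalized before mixing.
vector_scores = {"a": 0.9, "b": 0.1}
keyword_scores = {"a": 1.0, "b": 5.0}
ranked = hybrid_scores(vector_scores, keyword_scores, alpha=0.7)
print(ranked)
```

With alpha at 0.7 the document favoured by the vector search ranks first; lowering alpha toward 0 shifts the ranking toward the keyword result.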

Configuration

From YAML

# vectorstore_config.yaml
name: my_vectorstore

embedding:
  model_type: sentence_transformer
  model_name: Snowflake/snowflake-arctic-embed-m
  batch_size: 128
  normalize: true

database:
  host: localhost
  port: 5432
  database: vectorstore_db
  user: postgres
  password: postgres

search:
  similarity_metric: cosine
  default_top_k: 5
  enable_hybrid_search: true
  hybrid_alpha: 0.7

index:
  chunk_size: 512
  chunk_overlap: 50
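
Then load the file in Python:

from rakam_system_vectorstore import ConfigurablePgVectorStore, VectorStoreConfig

config = VectorStoreConfig.from_yaml("vectorstore_config.yaml")
store = ConfigurablePgVectorStore(config=config)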

Documentation

Detailed documentation, including loader-specific documentation, is available in the src/rakam_system_vectorstore/docs/ directory.

Examples

See the examples/ai_vectorstore_examples/ directory in the main repository for complete examples:

  • Basic FAISS example
  • PostgreSQL example
  • Configurable vectorstore examples
  • PDF loader examples
  • Keyword search examples

Environment Variables

  • POSTGRES_HOST: PostgreSQL host (default: localhost)
  • POSTGRES_PORT: PostgreSQL port (default: 5432)
  • POSTGRES_DB: Database name (default: vectorstore_db)
  • POSTGRES_USER: Database user (default: postgres)
  • POSTGRES_PASSWORD: Database password
  • OPENAI_API_KEY: For OpenAI embeddings
  • COHERE_API_KEY: For Cohere embeddings
  • HUGGINGFACE_TOKEN: For private HuggingFace models
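
The variables above can be collected in one place before configuring Django. The helper below is a hypothetical sketch, not part of the package: `postgres_settings` is an invented name, and the fallback values are the defaults listed above (no default is assumed for the password).

```python
import os


def postgres_settings(env=None):
    """Build a Django-style database settings dict from environment variables.

    Hypothetical helper; pass a plain dict as `env` to make testing easy.
    """
    env = os.environ if env is None else env
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": env.get("POSTGRES_DB", "vectorstore_db"),
        "USER": env.get("POSTGRES_USER", "postgres"),
        "PASSWORD": env.get("POSTGRES_PASSWORD"),  # None when unset
        "HOST": env.get("POSTGRES_HOST", "localhost"),
        "PORT": env.get("POSTGRES_PORT", "5432"),
    }


settings_dict = postgres_settings({"POSTGRES_HOST": "db.internal"})
print(settings_dict["HOST"], settings_dict["PORT"])
```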

License

Apache 2.0
