Utility package for interacting with vectorstores
Rakam Systems Vectorstore
The vectorstore package of Rakam Systems, providing vector database solutions and document processing capabilities.
Overview
rakam-systems-vectorstore provides comprehensive vector storage, embedding models, and document loading capabilities. This package depends on rakam-systems-core.
Features
- Configuration-First Design: Change your entire vector store setup via YAML - no code changes
- Multiple Backends: PostgreSQL with pgvector and FAISS in-memory storage
- Flexible Embeddings: Support for SentenceTransformers, OpenAI, and Cohere
- Document Loaders: PDF, DOCX, HTML, Markdown, CSV, and more
- Search Capabilities: Vector search, keyword search (BM25), and hybrid search
- Chunking: Intelligent text chunking with context preservation
- Configuration: Comprehensive YAML/JSON configuration support
๐ฏ Configuration Convenience
The vectorstore package's configurable design allows you to:
- Switch embedding models without code changes (local → OpenAI → Cohere)
- Change search algorithms instantly (BM25 → ts_rank → hybrid)
- Adjust search parameters (similarity metrics, top-k, hybrid weights)
- Toggle features (hybrid search, caching, reranking)
- Tune performance (batch sizes, chunk sizes, connection pools)
- Swap backends (FAISS → PostgreSQL) by updating config
Example: to find the best accuracy/cost balance, test different embedding models by updating your YAML config file alone; no code changes are needed.
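For instance, swapping the local SentenceTransformers model for a hosted one can be a small YAML edit. A sketch only: the `model_type: openai` value and the model name below are assumptions for illustration, not confirmed package values.

```yaml
# Local SentenceTransformers model
embedding:
  model_type: sentence_transformer
  model_name: Snowflake/snowflake-arctic-embed-m

# Hypothetical swap to a hosted model: edit the same keys, redeploy nothing
# embedding:
#   model_type: openai
#   model_name: text-embedding-3-small
```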
Installation
```bash
# Requires core package
pip install -e ./rakam-systems-core

# Install vectorstore package
pip install -e ./rakam-systems-vectorstore

# With specific backends
pip install -e "./rakam-systems-vectorstore[postgres]"
pip install -e "./rakam-systems-vectorstore[faiss]"
pip install -e "./rakam-systems-vectorstore[all]"
```
Quick Start
FAISS Vector Store (In-Memory)
```python
from rakam_systems_vectorstore.components.vectorstore.faiss_vector_store import FaissStore
from rakam_systems_vectorstore.core import Node, NodeMetadata

# Create store
store = FaissStore(
    name="my_store",
    base_index_path="./indexes",
    embedding_model="Snowflake/snowflake-arctic-embed-m",
    initialising=True,
)

# Create nodes
nodes = [
    Node(
        content="Python is great for AI",
        metadata=NodeMetadata(source_file_uuid="doc1", position=0),
    )
]

# Add and search
store.create_collection_from_nodes("my_collection", nodes)
results, _ = store.search("my_collection", "AI programming", number=5)
```
PostgreSQL Vector Store
```python
import os
import django
from django.conf import settings

# Configure Django (required)
if not settings.configured:
    settings.configure(
        INSTALLED_APPS=[
            'django.contrib.contenttypes',
            'rakam_systems_vectorstore.components.vectorstore',
        ],
        DATABASES={
            'default': {
                'ENGINE': 'django.db.backends.postgresql',
                'NAME': os.getenv('POSTGRES_DB', 'vectorstore_db'),
                'USER': os.getenv('POSTGRES_USER', 'postgres'),
                'PASSWORD': os.getenv('POSTGRES_PASSWORD', 'postgres'),
                'HOST': os.getenv('POSTGRES_HOST', 'localhost'),
                'PORT': os.getenv('POSTGRES_PORT', '5432'),
            }
        },
        DEFAULT_AUTO_FIELD='django.db.models.BigAutoField',
    )
django.setup()

from rakam_systems_vectorstore import ConfigurablePgVectorStore, VectorStoreConfig

# Create configuration
config = VectorStoreConfig(
    embedding={
        "model_type": "sentence_transformer",
        "model_name": "Snowflake/snowflake-arctic-embed-m",
    },
    search={
        "similarity_metric": "cosine",
        "enable_hybrid_search": True,
    },
)

# Create and use store
store = ConfigurablePgVectorStore(config=config)
store.setup()
store.add_nodes(nodes)
results = store.search("What is AI?", top_k=5)
store.shutdown()
```
Core Components
Vector Stores
- ConfigurablePgVectorStore: PostgreSQL with pgvector, supports hybrid search and keyword search
- FaissStore: In-memory FAISS-based vector search
Embeddings
- ConfigurableEmbeddings: Supports multiple backends
- SentenceTransformers (local)
- OpenAI embeddings
- Cohere embeddings
Document Loaders
- AdaptiveLoader: Automatically detects and loads various file types
- PdfLoader: Advanced PDF processing with Docling
- PdfLoaderLight: Lightweight PDF to markdown conversion
- DocLoader: Microsoft Word documents
- OdtLoader: OpenDocument Text files
- MdLoader: Markdown files
- HtmlLoader: HTML files
- EmlLoader: Email files
- TabularLoader: CSV, Excel files
- CodeLoader: Source code files
Chunking
- TextChunker: Sentence-based chunking with Chonkie
- AdvancedChunker: Context-aware chunking with heading preservation
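As a rough illustration of what `chunk_size` and `chunk_overlap` control, here is a character-window chunker. The package's TextChunker splits on sentence boundaries via Chonkie instead; this sketch only shows the overlap mechanics.

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = chunk_text("a" * 1000, chunk_size=512, chunk_overlap=50)
print(len(chunks))  # 3: windows start at offsets 0, 462, 924
```

The overlap means the last 50 characters of each chunk reappear at the start of the next, so context that straddles a boundary is not lost.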
Package Structure
```
rakam-systems-vectorstore/
├── src/rakam_systems_vectorstore/
│   ├── core.py              # Node, VSFile, NodeMetadata
│   ├── config.py            # VectorStoreConfig
│   ├── components/
│   │   ├── vectorstore/     # Store implementations
│   │   │   ├── configurable_pg_vectorstore.py
│   │   │   └── faiss_vector_store.py
│   │   ├── embedding_model/ # Embedding models
│   │   │   └── configurable_embeddings.py
│   │   ├── loader/          # Document loaders
│   │   │   ├── adaptive_loader.py
│   │   │   ├── pdf_loader.py
│   │   │   ├── pdf_loader_light.py
│   │   │   └── ... (other loaders)
│   │   └── chunker/         # Text chunkers
│   │       ├── text_chunker.py
│   │       └── advanced_chunker.py
│   ├── docs/                # Package documentation
│   └── server/              # MCP server
└── pyproject.toml
```
Search Capabilities
Vector Search
Semantic similarity search using embeddings:
```python
results = store.search("machine learning algorithms", top_k=10)
```
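Vector search ranks documents by how similar their embeddings are to the query embedding. A minimal, self-contained sketch of cosine-similarity ranking, illustrative only and not the package's implementation:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query: list[float], docs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k docs whose embeddings are most similar to the query."""
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)[:k]

docs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(top_k([1.0, 0.0], docs))  # ['a', 'b']
```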
Keyword Search (BM25)
Full-text search with BM25 ranking:
```python
results = store.keyword_search(
    query="machine learning",
    top_k=10,
    ranking_algorithm="bm25",
)
```
Hybrid Search
Combines vector and keyword search:
```python
results = store.hybrid_search(
    query="neural networks",
    top_k=10,
    alpha=0.7,  # 70% vector, 30% keyword
)
```
Configuration
From YAML
```yaml
# vectorstore_config.yaml
name: my_vectorstore

embedding:
  model_type: sentence_transformer
  model_name: Snowflake/snowflake-arctic-embed-m
  batch_size: 128
  normalize: true

database:
  host: localhost
  port: 5432
  database: vectorstore_db
  user: postgres
  password: postgres

search:
  similarity_metric: cosine
  default_top_k: 5
  enable_hybrid_search: true
  hybrid_alpha: 0.7

index:
  chunk_size: 512
  chunk_overlap: 50
```

```python
config = VectorStoreConfig.from_yaml("vectorstore_config.yaml")
store = ConfigurablePgVectorStore(config=config)
```
Documentation
Detailed documentation is available in the src/rakam_systems_vectorstore/docs/ directory:
Examples
See the examples/ai_vectorstore_examples/ directory in the main repository for complete examples:
- Basic FAISS example
- PostgreSQL example
- Configurable vectorstore examples
- PDF loader examples
- Keyword search examples
Environment Variables
- POSTGRES_HOST: PostgreSQL host (default: localhost)
- POSTGRES_PORT: PostgreSQL port (default: 5432)
- POSTGRES_DB: Database name (default: vectorstore_db)
- POSTGRES_USER: Database user (default: postgres)
- POSTGRES_PASSWORD: Database password
- OPENAI_API_KEY: For OpenAI embeddings
- COHERE_API_KEY: For Cohere embeddings
- HUGGINGFACE_TOKEN: For private HuggingFace models
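These variables can be read with the same `os.getenv` pattern used in the Django settings example above. A small sketch; the dict keys here are illustrative, not a package API:

```python
import os

# Read connection settings, falling back to the documented defaults.
pg_settings = {
    "host": os.getenv("POSTGRES_HOST", "localhost"),
    "port": int(os.getenv("POSTGRES_PORT", "5432")),
    "database": os.getenv("POSTGRES_DB", "vectorstore_db"),
    "user": os.getenv("POSTGRES_USER", "postgres"),
    "password": os.getenv("POSTGRES_PASSWORD", "postgres"),
}
print(pg_settings["host"])  # "localhost" unless POSTGRES_HOST is set
```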
License
Apache 2.0