Simple embedded vector database for local AI development with automatic embeddings

VittoriaDB Python SDK

VittoriaDB Python SDK is a client library for VittoriaDB, a high-performance embedded vector database built in Go. This SDK provides a clean, Pythonic interface to interact with VittoriaDB server instances, with automatic binary management and server lifecycle control.

🏗️ Architecture

VittoriaDB consists of two components:

🚀 VittoriaDB Server (Go binary): High-performance vector database engine
🐍 Python SDK (this package): Client library with automatic server management

The Python SDK can either:

Auto-manage the Go server binary (downloads, starts, stops automatically)
Connect to an existing VittoriaDB server instance

🚀 Key Features

🎯 Zero Configuration: Works immediately after installation with sensible defaults
🤖 Automatic Embeddings: Server-side text vectorization with multiple model support
📄 Document Processing: Built-in support for PDF, DOCX, TXT, MD, and HTML files
🔧 Auto Binary Management: Automatically downloads and manages VittoriaDB binaries
⚡ High Performance: HNSW indexing provides sub-millisecond search times
🐍 Pythonic API: Clean, intuitive Python interface with type hints
🔌 Dual Mode: Works with existing servers or auto-starts local instances
🤖 RAG-Ready (NEW v0.4.0): Built-in content storage for Retrieval-Augmented Generation

📦 Installation

pip install vittoriadb

The package automatically downloads the appropriate VittoriaDB binary for your platform during installation.

🚀 Server Management

Automatic Server Management (Recommended)

import vittoriadb

# SDK automatically downloads, starts, and manages the VittoriaDB server
db = vittoriadb.connect()  # auto_start=True by default
# ... use the database ...
db.close()  # Automatically stops the server
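If an exception is raised between connect() and close(), the close() call above is never reached. One way to guard against that is a minimal sketch built on the standard library's contextlib.closing, which relies only on the object having a close() method. FakeDB and run_with_cleanup are hypothetical names used so the snippet runs without a server; in real use you would pass the result of vittoriadb.connect() instead.

```python
from contextlib import closing


class FakeDB:
    """Hypothetical stand-in for the object returned by vittoriadb.connect()."""

    def __init__(self):
        self.closed = False

    def close(self):
        # The real handle is documented to stop the auto-started server here.
        self.closed = True


def run_with_cleanup(db, work):
    """Run work(db) and always call db.close(), even if work raises."""
    with closing(db) as handle:
        return work(handle)


db = FakeDB()
try:
    run_with_cleanup(db, lambda handle: 1 / 0)  # simulate a failure mid-session
except ZeroDivisionError:
    pass
print(db.closed)
```

The same effect can be had with a plain try/finally; closing() just keeps the cleanup next to the connect call.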

Manual Server Management

# Download VittoriaDB binary manually
# From: https://github.com/antonellof/VittoriaDB/releases

# Start server manually
./vittoriadb run --port 8080 --data-dir ./data

# In Python, connect to existing server
import vittoriadb
db = vittoriadb.connect(url="http://localhost:8080", auto_start=False)

Connection Options

# Auto-start with custom configuration
db = vittoriadb.connect(
    auto_start=True,
    port=9090,
    host="localhost", 
    data_dir="./my_vectors"
)

# Connect to remote server
db = vittoriadb.connect(
    url="http://remote-server:8080",
    auto_start=False
)

🚀 Quick Start

Basic Usage

import vittoriadb

# Auto-starts VittoriaDB server and connects
db = vittoriadb.connect()

# Create a collection
collection = db.create_collection(
    name="documents",
    dimensions=384,
    metric="cosine"
)

# Insert vectors with metadata
collection.insert(
    id="doc1",
    vector=[0.1, 0.2, 0.3] * 128,  # 384 dimensions
    metadata={"title": "My Document", "category": "tech"}
)

# Search for similar vectors
results = collection.search(
    vector=[0.1, 0.2, 0.3] * 128,
    limit=5,
    include_metadata=True
)

for result in results:
    print(f"ID: {result.id}, Score: {result.score:.4f}")
    print(f"Metadata: {result.metadata}")

# Close connection
db.close()
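For readers unfamiliar with metric="cosine", the sketch below shows the textbook cosine similarity between two vectors. It is an illustration only: how VittoriaDB computes or reports scores internally (for example, similarity versus distance) is not stated here, and cosine_similarity is a name chosen for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


query = [0.1, 0.2, 0.3] * 128     # same 384-dimensional vector as above
scaled = [2 * x for x in query]   # same direction, different length
print(len(query))
print(round(cosine_similarity(query, scaled), 6))
```

Because cosine similarity ignores vector length, the scaled copy scores the same as the original would against itself.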

Automatic Text Embeddings (🚀 NEW!)

import vittoriadb
from vittoriadb.configure import Configure

# Connect to VittoriaDB
db = vittoriadb.connect()

# Create collection with automatic embeddings
collection = db.create_collection(
    name="smart_docs",
    dimensions=384,
    vectorizer_config=Configure.Vectors.auto_embeddings()  # 🎯 Server-side embeddings!
)

# Insert text directly - embeddings generated automatically!
collection.insert_text(
    id="article1",
    text="Artificial intelligence is transforming how we process data.",
    metadata={"category": "AI", "source": "blog"}
)

# Batch insert multiple texts
texts = [
    {
        "id": "article2",
        "text": "Machine learning enables computers to learn from data.",
        "metadata": {"category": "ML"}
    },
    {
        "id": "article3", 
        "text": "Vector databases provide efficient similarity search.",
        "metadata": {"category": "database"}
    }
]
collection.insert_text_batch(texts)

# Search with natural language queries
results = collection.search_text(
    query="artificial intelligence and machine learning",
    limit=3
)

for result in results:
    print(f"Score: {result.score:.4f}")
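    # The "text" metadata key is assumed here; fall back to an empty string if it is absent
    text = (result.metadata or {}).get("text", "")
    print(f"Text: {text[:100]}...")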

db.close()
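When the source data is a plain list of strings, it has to be reshaped into the id/text/metadata dictionaries that insert_text_batch receives above. A minimal sketch of such a helper follows; build_text_batch and its parameters are hypothetical names, not part of the SDK.

```python
def build_text_batch(texts, id_prefix="article", metadata=None):
    """Turn plain strings into the list-of-dicts shape shown above."""
    batch = []
    for index, text in enumerate(texts, start=1):
        if not text.strip():
            continue  # skip empty strings rather than sending nothing to embed
        batch.append({
            "id": f"{id_prefix}{index}",
            "text": text,
            "metadata": dict(metadata or {}),  # copy so entries do not share one dict
        })
    return batch


batch = build_text_batch(
    [
        "Machine learning enables computers to learn from data.",
        "",
        "Vector databases provide efficient similarity search.",
    ],
    metadata={"source": "blog"},
)
for item in batch:
    print(item["id"], item["metadata"])
```

The resulting list can then be passed to collection.insert_text_batch(batch).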

Document Upload and Processing

import vittoriadb
from vittoriadb.configure import Configure

db = vittoriadb.connect()

# Create collection with vectorizer for automatic processing
collection = db.create_collection(
    name="knowledge_base",
    dimensions=384,
    vectorizer_config=Configure.Vectors.auto_embeddings()
)

# Upload and process documents automatically
result = collection.upload_file(
    file_path="research_paper.pdf",
    chunk_size=600,
    chunk_overlap=100,
    metadata={"source": "research", "year": "2024"}
)

print(f"Processed {result['chunks_created']} chunks")
print(f"Inserted {result['chunks_inserted']} vectors")

# Search the uploaded content
results = collection.search_text(
    query="machine learning algorithms",
    limit=5
)

db.close()
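To build intuition for chunk_size and chunk_overlap, here is a minimal sketch of overlapping chunking. It assumes chunks are measured in characters; whether the server splits by characters, tokens, or sentences is not stated here, and chunk_text is a hypothetical helper rather than SDK code.

```python
def chunk_text(text, chunk_size=600, chunk_overlap=100):
    """Split text into windows where each chunk repeats the last
    chunk_overlap characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "x" * 1500
chunks = chunk_text(sample, chunk_size=600, chunk_overlap=100)
print([len(c) for c in chunks])
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of storing more vectors for the same document.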

🤖 RAG (Retrieval-Augmented Generation) Support (NEW v0.4.0)

VittoriaDB now includes built-in support for RAG systems by automatically storing original text content alongside vector embeddings.

Content Storage Features

✅ Automatic Content Preservation: Original text stored with vectors
✅ No External Storage Required: Self-contained RAG solution
✅ Configurable Limits: Control storage size and behavior
✅ Fast Retrieval: Single query returns both vectors and content

RAG-Optimized Collection

import vittoriadb
from vittoriadb import ContentStorageConfig
from vittoriadb.configure import Configure

db = vittoriadb.connect()

# Create RAG-optimized collection with content storage
collection = db.create_collection(
    name="rag_documents",
    dimensions=384,
    vectorizer_config=Configure.Vectors.auto_embeddings(),
    content_storage=ContentStorageConfig(
        enabled=True,           # Store original content
        field_name="_content",  # Metadata field name
        max_size=1048576,       # 1 MiB (1,048,576 bytes) limit per document
        compressed=False        # Compression (future feature)
    )
)

# Insert documents - content automatically preserved
collection.insert_text(
    id="doc1",
    text="VittoriaDB is a high-performance vector database perfect for RAG applications...",
    metadata={"title": "VittoriaDB Guide", "category": "documentation"}
)

# Search with content retrieval for RAG
results = collection.search_text(
    query="vector database RAG",
    limit=5,
    include_content=True  # Retrieve original content for LLM context
)

# Use results for RAG
for result in results:
    print(f"Score: {result.score:.3f}")
    print(f"Content: {result.content}")  # Original text for LLM context
    print(f"Has content: {result.has_content()}")

db.close()
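Since max_size caps how much content is stored per document, it can be useful to check a text's size before inserting it. The sketch below assumes the limit is measured in UTF-8 encoded bytes rather than characters, which this page does not confirm; what the server does with oversized content is also not described here. fits_content_limit is a hypothetical helper.

```python
MAX_CONTENT_BYTES = 1048576  # same value as max_size in the example above


def fits_content_limit(text, max_size=MAX_CONTENT_BYTES):
    """Return True if the UTF-8 encoding of text is within max_size bytes.

    Assumption: the limit counts encoded bytes, not characters.
    """
    return len(text.encode("utf-8")) <= max_size


print(fits_content_limit("a" * 1048576))
print(fits_content_limit("é" * 1048576))  # each "é" takes 2 bytes in UTF-8
```

Under this assumption, non-ASCII text reaches the limit sooner than its character count suggests.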

RAG Workflow Example

# Assumes `collection` is the RAG-optimized collection created in the previous example

# 1. Store knowledge base
documents = [
    "VittoriaDB supports automatic embeddings...",
    "RAG systems combine retrieval and generation...",
    "Vector databases enable semantic search..."
]

for i, doc in enumerate(documents):
    collection.insert_text(f"kb_{i}", doc, {"type": "knowledge"})

# 2. Query with content for LLM
query = "How do vector databases work?"
results = collection.search_text(query, include_content=True)

# 3. Build context for LLM
context = "\n".join([r.content for r in results if r.has_content()])

# 4. Send to your LLM
# response = your_llm.generate(query, context)
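Step 3 above joins every retrieved passage, which can exceed an LLM's context window. A minimal sketch of one way to assemble a bounded prompt is shown below; build_prompt, its character budget, and the prompt wording are all illustrative assumptions, and the passages stand in for the r.content values from the search results.

```python
def build_prompt(query, passages, max_context_chars=2000):
    """Join retrieved passages into a context block and wrap it in a prompt.

    Passages are added in order (best match first) until the character
    budget would be exceeded.
    """
    selected, used = [], 0
    for passage in passages:
        if not passage:
            continue  # skip results that came back without content
        if used + len(passage) > max_context_chars:
            break
        selected.append(passage)
        used += len(passage)
    context = "\n".join(selected)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


prompt = build_prompt(
    "How do vector databases work?",
    [
        "Vector databases enable semantic search...",
        None,
        "RAG systems combine retrieval and generation...",
    ],
    max_context_chars=60,
)
print(prompt)
```

A character budget is only a rough proxy; a token-based budget from your LLM's tokenizer would be more precise.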

🎛️ Vectorizer Configuration

VittoriaDB supports multiple vectorizer backends for automatic embedding generation:

Sentence Transformers (Default)

from vittoriadb.configure import Configure

config = Configure.Vectors.sentence_transformers(
    model="all-MiniLM-L6-v2",
    dimensions=384
)

OpenAI Embeddings

from vittoriadb.configure import Configure

config = Configure.Vectors.openai_embeddings(
    api_key="your-openai-api-key",
    model="text-embedding-ada-002",
    dimensions=1536
)

HuggingFace Models

from vittoriadb.configure import Configure

config = Configure.Vectors.huggingface_embeddings(
    api_key="your-hf-token",  # Optional for public models
    model="sentence-transformers/all-MiniLM-L6-v2",
    dimensions=384
)

Local Ollama

from vittoriadb.configure import Configure

config = Configure.Vectors.ollama_embeddings(
    model="nomic-embed-text",
    dimensions=768,
    base_url="http://localhost:11434"
)
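Each backend above pairs a model with a fixed embedding size, and the collection's dimensions value needs to agree with it. The sketch below is a hypothetical client-side sanity check using only the model/dimension pairs listed in this section; KNOWN_DIMENSIONS and check_dimensions are not part of the SDK.

```python
# Model -> embedding size, copied from the configuration examples above.
KNOWN_DIMENSIONS = {
    "all-MiniLM-L6-v2": 384,
    "sentence-transformers/all-MiniLM-L6-v2": 384,
    "text-embedding-ada-002": 1536,
    "nomic-embed-text": 768,
}


def check_dimensions(model, collection_dimensions):
    """Raise ValueError if a known model's output size differs from the collection's."""
    expected = KNOWN_DIMENSIONS.get(model)
    if expected is None:
        return  # unknown model: nothing to compare against
    if expected != collection_dimensions:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, "
            f"but the collection expects {collection_dimensions}"
        )


check_dimensions("nomic-embed-text", 768)  # passes silently
try:
    check_dimensions("text-embedding-ada-002", 384)
except ValueError as error:
    print(error)
```

Running a check like this before create_collection surfaces a mismatch early instead of at insert time.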

📄 Document Processing

VittoriaDB supports automatic processing of various document formats:

Format	Extension	Status	Features
Plain Text	`.txt`	✅ Fully Supported	Direct text processing
Markdown	`.md`	✅ Fully Supported	Frontmatter parsing
HTML	`.html`	✅ Fully Supported	Tag stripping, metadata
PDF	`.pdf`	✅ Fully Supported	Multi-page text extraction
DOCX	`.docx`	✅ Fully Supported	Properties, text extraction

# Upload multiple document types (assumes `collection` was created with a vectorizer, as above)
for file_path in ["doc.pdf", "guide.docx", "readme.md"]:
    result = collection.upload_file(
        file_path=file_path,
        chunk_size=500,
        metadata={"batch": "docs_2024"}
    )
    print(f"Processed {file_path}: {result['chunks_inserted']} chunks")
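Before looping over a mixed folder, it can help to separate files whose extensions appear in the table above from those that do not. This is a minimal sketch with hypothetical names (SUPPORTED_EXTENSIONS, split_supported); it checks only the file extension, not the file's actual contents.

```python
from pathlib import Path

# Extensions taken from the table above.
SUPPORTED_EXTENSIONS = {".txt", ".md", ".html", ".pdf", ".docx"}


def split_supported(paths):
    """Partition paths into (supported, unsupported) by file extension."""
    supported, unsupported = [], []
    for path in map(Path, paths):
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(str(path))
        else:
            unsupported.append(str(path))
    return supported, unsupported


ok, skipped = split_supported(["doc.pdf", "guide.DOCX", "slides.pptx", "readme.md"])
print(ok)
print(skipped)
```

Only the paths in ok would then be passed to collection.upload_file.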

🔧 Advanced Configuration

Collection Configuration

# High-performance HNSW configuration (assumes `db` is an open connection)
from vittoriadb.configure import Configure

collection = db.create_collection(
    name="large_dataset",
    dimensions=1536,
    metric="cosine",
    index_type="hnsw",
    config={
        "m": 32,                # HNSW connections per node
        "ef_construction": 400,  # Construction search width
        "ef_search": 100        # Search width
    },
    vectorizer_config=Configure.Vectors.openai_embeddings(api_key="your-key")
)

Connection Options

# Connect to existing server
db = vittoriadb.connect(
    url="http://localhost:8080",
    auto_start=False
)

# Auto-start with custom configuration
db = vittoriadb.connect(
    auto_start=True,
    port=9090,
    data_dir="./my_vectors"
)

Search with Filtering

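# Search with metadata filters
query_vector = [0.1, 0.2, 0.3] * 128  # example query; length must match the collection's dimensions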
results = collection.search(
    vector=query_vector,
    limit=10,
    filter={"category": "technology", "year": 2024},
    include_metadata=True
)

# Text search with filters
results = collection.search_text(
    query="machine learning",
    limit=5,
    filter={"source": "research"}
)
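The filter semantics are not spelled out on this page. The sketch below shows one plausible reading, purely as an assumption: every key in the filter must be present in the metadata with an exactly equal value. matches_filter is a hypothetical local helper that mimics this reading in plain Python; it does not call the server.

```python
def matches_filter(metadata, conditions):
    """True if every key in conditions is present in metadata with an equal value."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in conditions.items()
    )


records = [
    {"id": "a", "metadata": {"category": "technology", "year": 2024}},
    {"id": "b", "metadata": {"category": "technology", "year": "2024"}},  # year stored as a string
    {"id": "c", "metadata": {"category": "science", "year": 2024}},
]
wanted = {"category": "technology", "year": 2024}
print([r["id"] for r in records if matches_filter(r["metadata"], wanted)])
```

If matching is strict in this way, a year stored as the string "2024" would not match the integer 2024, so it is worth keeping metadata types consistent between insert and search.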

📊 Performance and Scalability

Insert Speed: >10,000 vectors/second with flat indexing, >5,000 with HNSW
Search Speed: Sub-millisecond search times for 1M vectors using HNSW
Memory Usage: <100MB for 100,000 vectors (384 dimensions)
Scalability: Tested up to 1 million vectors, supports up to 2,048 dimensions

📋 API Reference

Collection Class

insert(id, vector, metadata=None) - Insert single vector
insert_batch(vectors) - Insert multiple vectors
insert_text(id, text, metadata=None) - Insert text (auto-vectorized with content storage)
insert_text_batch(texts) - Insert multiple texts (auto-vectorized with content storage)
search(vector, limit=10, filter=None, include_content=False) - Vector similarity search (the examples above also pass include_metadata)
search_text(query, limit=10, filter=None, include_content=False) - Text search with content retrieval
upload_file(file_path, chunk_size=500, **kwargs) - Upload and process document
get(id) - Get vector by ID
delete(id) - Delete vector by ID
count() - Get total vector count

VittoriaDB Class (Enhanced v0.4.0)

connect(url=None, auto_start=True, **kwargs) - Connect to VittoriaDB
create_collection(name, dimensions, metric="cosine", vectorizer_config=None, content_storage=None) - Create a collection, optionally with a vectorizer and content storage (the HNSW example above also passes index_type and config)
get_collection(name) - Get existing collection
list_collections() - List all collections
delete_collection(name) - Delete collection
health() - Get server health status
close() - Close connection

🤝 Contributing

We welcome contributions!

Users: Report issues and request features on GitHub Issues
Developers: See DEVELOPMENT.md for setup, building, and deployment instructions
General: Check our Contributing Guide for project guidelines

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🔗 Links

GitHub: https://github.com/antonellof/VittoriaDB
PyPI: https://pypi.org/project/vittoriadb/
Issues: https://github.com/antonellof/VittoriaDB/issues

🚀 What's Next?

🔍 Hybrid Search: Combine vector and keyword search
🔐 Authentication: User management and access control
🌐 Distributed Mode: Multi-node clustering support
📊 Analytics: Query performance monitoring and optimization
🎯 More Vectorizers: Support for additional embedding models
🗜️ Content Compression: Compress stored content to save space

📝 Changelog v0.4.0

🆕 New Features

🤖 Built-in Content Storage: Automatic preservation of original text for RAG applications
📋 ContentStorageConfig: Configurable content storage with size limits and field names
🔍 Enhanced Search: New include_content parameter for content retrieval
📊 Collection Info: Content storage configuration in collection metadata

🔧 Enhancements

⚡ Better RAG Support: No external storage required for RAG workflows
🎯 Improved API: Enhanced search methods with content retrieval options
📚 Updated Documentation: Comprehensive RAG examples and best practices

🔄 Backward Compatibility

All existing APIs work unchanged
Content storage is optional and configurable
Default behavior maintains compatibility with v0.3.x

Happy building with VittoriaDB! 🚀

Release history

0.7.0: Apr 19, 2026
0.6.3: Apr 19, 2026
0.6.2: Apr 19, 2026
0.6.1: Apr 19, 2026
0.6.0: Sep 26, 2025
0.5.0: Sep 25, 2025
0.4.0: Sep 15, 2025 (this version)
0.3.0: Sep 14, 2025
0.2.0: Sep 14, 2025
0.1.0: Sep 14, 2025

Download files

Source Distribution: vittoriadb-0.4.0.tar.gz (21.3 kB), uploaded Sep 15, 2025
Built Distribution: vittoriadb-0.4.0-py3-none-any.whl (16.8 kB), uploaded Sep 15, 2025, Python 3

File details

Details for the file vittoriadb-0.4.0.tar.gz.

File metadata

Download URL: vittoriadb-0.4.0.tar.gz
Upload date: Sep 15, 2025
Size: 21.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for vittoriadb-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`309fef18bf77e659fa57c61bdb54e4e86a8c28d2028fb6e66d0cab1936e91f6b`
MD5	`e4ddc83d5a4d00d5dc9a734c4224a62a`
BLAKE2b-256	`4680c59fe6a87421e012c67448d64f91301176e50cd044f28eb502ec87896482`


File details

Details for the file vittoriadb-0.4.0-py3-none-any.whl.

File metadata

Download URL: vittoriadb-0.4.0-py3-none-any.whl
Upload date: Sep 15, 2025
Size: 16.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for vittoriadb-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ebe11083727cab65cd0b48d62934a02dc7dc3eaa77cf9889500a7fe68c26d1ce`
MD5	`e4183876b8bf9e810c70daedb571c4ab`
BLAKE2b-256	`0ab644f4adb2b1a27a38eed9aef1004cd0c57b518455a2ec3f9525c26ae3c132`
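The published digests can be used to confirm that a downloaded file is intact. The sketch below uses Python's standard hashlib to compute a file's SHA-256 and compare it with an expected value; sha256_of and verify_sha256 are names chosen for this example, and the demo runs on a throwaway temporary file so nothing needs to be downloaded.

```python
import hashlib
import os
import tempfile


def sha256_of(path, block_size=65536):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Compare a file's SHA-256 digest with a published value."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a temporary file; its expected digest is computed in place.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name
print(verify_sha256(demo_path, hashlib.sha256(b"hello").hexdigest()))
os.remove(demo_path)
```

For the real artifact, pass the path of the downloaded file and the SHA256 value listed above for that file.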

