A custom vector storage and search solution
Dimensia
Dimensia is a high-performance vector database designed for efficient semantic search and storage of vector embeddings. It supports adding documents, performing searches, and managing collections using customizable embedding models. Dimensia suits use cases such as information retrieval, recommendation systems, and other machine learning tasks that need fast access to high-dimensional vector data.
Features
- Collections: Create and manage multiple collections of documents with associated metadata.
- Similarity Search: Perform semantic search to find the most similar documents in a collection.
- Document Management: Add, retrieve, and manage documents by ID within collections.
- Embedding Model Support: Easily integrate with models from sentence-transformers for generating vector embeddings.
- Efficient Indexing: Uses an HNSW (Hierarchical Navigable Small World) index for fast nearest-neighbor search.
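To make the "Similarity Search" and "Efficient Indexing" features concrete, here is a minimal sketch of the exact search that an approximate index such as HNSW speeds up: score every stored vector against the query and keep the best few. This is not Dimensia's implementation. The function names, the dictionary layout, and the use of cosine similarity as the metric are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def brute_force_top_k(query, vectors, top_k=2):
    """Score every stored vector against the query and return the best top_k.

    `vectors` maps a document ID to its embedding (hypothetical layout,
    not Dimensia's internal storage format).
    """
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings"; real models produce far more dimensions.
toy_vectors = {
    "1": [1.0, 0.0, 0.0],
    "2": [0.0, 1.0, 0.0],
    "3": [0.7, 0.7, 0.0],
}
top = brute_force_top_k([1.0, 0.1, 0.0], toy_vectors, top_k=2)
for doc_id, score in top:
    print(f"Document ID: {doc_id}, Similarity: {score:.3f}")
```

The brute-force scan costs time proportional to the number of stored vectors for every query. An HNSW index trades a small amount of recall for much faster lookups by walking a layered neighbor graph instead of scanning everything.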
Installation
Install Dimensia from PyPI with pip:
pip install dimensia
Usage
from dimensia import Dimensia
# Initialize the database
db = Dimensia(db_path="dimensia_db")
# Set the embedding model
db.set_embedding_model("sentence-transformers/paraphrase-MiniLM-L6-v2")
print("Embedding model set successfully.")
# Create collections
db.create_collection("collection_1", metadata_schema={"field1": "type1", "field2": "type2"})
db.create_collection("collection_2", metadata_schema={"field1": "type1", "field2": "type2"})
print("Collections created successfully.")
# Verify collections
collections = db.get_collections()
print(f"Collections: {collections}")
# Add documents to the collections
documents_1 = [
    {"id": "1", "content": "This is a document about deep learning."},
    {"id": "2", "content": "This document covers natural language processing."}
]
documents_2 = [
    {"id": "3", "content": "This document is about reinforcement learning."},
    {"id": "4", "content": "This document discusses machine learning in general."}
]
db.add_documents("collection_1", documents_1)
db.add_documents("collection_2", documents_2)
print("Documents added successfully.")
# Perform searches in collections
print("\nPerforming search in Collection 1:")
query_1 = "Tell me about NLP"
results_1 = db.search(query_1, "collection_1", top_k=2)
for result in results_1:
    print(f"Document ID: {result['document']['id']}, Similarity: {result['score']}")
print("\nPerforming search in Collection 2:")
query_2 = "What is reinforcement learning?"
results_2 = db.search(query_2, "collection_2", top_k=2)
for result in results_2:
    print(f"Document ID: {result['document']['id']}, Similarity: {result['score']}")
# Retrieve collection schema
schema_1 = db.get_collection_schema("collection_1")
print(f"Schema for Collection 1: {schema_1}")
# Retrieve a document by ID
doc_1 = db.get_document("collection_1", "1")
print(f"Retrieved Document from Collection 1: {doc_1}")
# Get vector size (dimension of the embedding)
vector_size = db.get_vector_size()
print(f"Vector size: {vector_size}")
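The search loops in the usage example read each result as a dictionary with a `document` entry and a `score` entry. Under that assumption, the small helper below turns a result list into ranked, printable lines. `format_results` is a hypothetical name, not part of Dimensia, and the sample scores are hand-written stand-ins rather than real output.

```python
def format_results(results):
    """Turn search results into ranked, printable lines.

    Assumes each result looks like {"document": {"id": ...}, "score": ...},
    the shape read by the loops in the usage example.
    """
    lines = []
    for rank, result in enumerate(results, start=1):
        doc_id = result["document"]["id"]
        score = float(result["score"])
        lines.append(f"{rank}. id={doc_id} score={score:.3f}")
    return lines


# Hand-written stand-in for the return value of db.search(...);
# the scores are made up for illustration.
sample_results = [
    {"document": {"id": "2", "content": "This document covers natural language processing."},
     "score": 0.8123},
    {"document": {"id": "1", "content": "This is a document about deep learning."},
     "score": 0.3456},
]
for line in format_results(sample_results):
    print(line)
```

Keeping the formatting in one function means a change in the result shape only needs to be handled in one place.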
Requirements
Dimensia requires the following dependencies:
numpy==1.26.4
torch==2.2.2
sentence-transformers==3.3.1
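Because the dependencies are pinned to exact versions, it can help to confirm what is installed in the current environment. The sketch below uses only the standard library's `importlib.metadata`. The `PINNED` dictionary and the `check_pins` helper are illustrative names, not something Dimensia ships.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements list above.
PINNED = {
    "numpy": "1.26.4",
    "torch": "2.2.2",
    "sentence-transformers": "3.3.1",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


for name, (expected, installed) in check_pins(PINNED).items():
    print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

The comparison is an exact string match, so a build that reports a local version suffix (some torch wheels may report something like `2.2.2+cpu`) would show up as a mismatch even though the base version agrees.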
Contributing
We welcome contributions to improve Dimensia! Please fork the repository, make your changes, and submit a pull request.
Support
If you encounter any issues or have questions, please don't hesitate to open an issue on our GitHub repository. We welcome feedback, bug reports, and feature requests!
We strive to respond as quickly as possible to all issues and questions.
File details
Details for the file dimensia-0.1.1.tar.gz.
File metadata
- Download URL: dimensia-0.1.1.tar.gz
- Upload date:
- Size: 8.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 83d6ce68596ddb4459380fe22cccc1beb993352fbf5831edf616995faee53693
MD5 | 99128aaacbaf3f31284d78e47aa6706d
BLAKE2b-256 | d28927bd4b8340de95e60c77b8a42d434190cccf5e1d232db5918c4f6eb2126c
File details
Details for the file Dimensia-0.1.1-py3-none-any.whl.
File metadata
- Download URL: Dimensia-0.1.1-py3-none-any.whl
- Upload date:
- Size: 9.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | c4d3ba9f43ffa0af0f57bdd66c5edbe3ff7b6466c581be1f96e320ee893586ad
MD5 | 4b63a147d96c6a275c448fb92a1aa0f9
BLAKE2b-256 | e80d6b9a93a6c85ba45f6b9aaa16e8ee9acca216f974c2fccfc38115924e83e3