A custom vector storage and search solution

Dimensia

Dimensia is a vector database for storing vector embeddings and running semantic search over them. It supports adding documents, searching, and managing collections, and lets you choose the embedding model. Typical use cases include information retrieval, recommendation systems, and other machine learning tasks that need fast access to high-dimensional vector data.

Features

  • Collections: Create and manage multiple collections of documents with associated metadata.
  • Similarity Search: Perform semantic search to find the most similar documents in a collection.
  • Document Management: Add, retrieve, and manage documents by ID within collections.
  • Embedding Model Support: Easily integrate with models from sentence-transformers for generating vector embeddings.
  • Efficient Indexing: Uses an HNSW (Hierarchical Navigable Small World) index for fast approximate nearest-neighbor search.
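
To make the similarity search and indexing features concrete, here is a minimal sketch of the exhaustive search that an HNSW index is designed to approximate more quickly. It is not Dimensia's implementation: the function names `cosine_similarity` and `brute_force_top_k`, the toy vectors, and the choice of cosine similarity as the metric are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def brute_force_top_k(
    query: Sequence[float],
    vectors: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the best top_k."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 3-dimensional "embeddings"; real models produce hundreds of dimensions.
toy_vectors = {
    "1": [1.0, 0.0, 0.0],
    "2": [0.0, 1.0, 0.0],
    "3": [1.0, 1.0, 0.0],
}
ranked = brute_force_top_k([1.0, 0.1, 0.0], toy_vectors, top_k=2)
print(ranked)
```

The exhaustive scan compares the query against every stored vector, so its cost grows linearly with collection size. A graph index such as HNSW trades a small amount of recall for visiting far fewer vectors per query.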

Installation

Install Dimensia from PyPI with pip:

pip install dimensia

Usage

from dimensia import Dimensia

# Initialize the database
db = Dimensia(db_path="dimensia_db")

# Set the embedding model
db.set_embedding_model("sentence-transformers/paraphrase-MiniLM-L6-v2")
print("Embedding model set successfully.")

# Create collections
db.create_collection("collection_1", metadata_schema={"field1": "type1", "field2": "type2"})
db.create_collection("collection_2", metadata_schema={"field1": "type1", "field2": "type2"})
print("Collections created successfully.")

# Verify collections
collections = db.get_collections()
print(f"Collections: {collections}")

# Add documents to the collections
documents_1 = [
    {"id": "1", "content": "This is a document about deep learning."},
    {"id": "2", "content": "This document covers natural language processing."}
]

documents_2 = [
    {"id": "3", "content": "This document is about reinforcement learning."},
    {"id": "4", "content": "This document discusses machine learning in general."}
]

db.add_documents("collection_1", documents_1)
db.add_documents("collection_2", documents_2)
print("Documents added successfully.")

# Perform searches in collections
print("\nPerforming search in Collection 1:")
query_1 = "Tell me about NLP"
results_1 = db.search(query_1, "collection_1", top_k=2)
for result in results_1:
    print(f"Document ID: {result['document']['id']}, Similarity: {result['score']}")

print("\nPerforming search in Collection 2:")
query_2 = "What is reinforcement learning?"
results_2 = db.search(query_2, "collection_2", top_k=2)
for result in results_2:
    print(f"Document ID: {result['document']['id']}, Similarity: {result['score']}")

# Retrieve collection schema
schema_1 = db.get_collection_schema("collection_1")
print(f"Schema for Collection 1: {schema_1}")

# Retrieve a document by ID
doc_1 = db.get_document("collection_1", "1")
print(f"Retrieved Document from Collection 1: {doc_1}")

# Get vector size (dimension of the embedding)
vector_size = db.get_vector_size()
print(f"Vector size: {vector_size}")
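
The search loops above read each result as a dictionary with a `document` entry and a `score` entry. Assuming that shape, and assuming that a higher score means a closer match, a small helper can filter and format results in one place. The helper name `summarize_results`, the `min_score` parameter, and the sample scores below are hypothetical and are not part of the Dimensia API.

```python
from typing import Any, Dict, List


def summarize_results(results: List[Dict[str, Any]], min_score: float = 0.0) -> List[str]:
    """Return one 'id: score' line per result whose score is at least min_score."""
    lines = []
    for result in results:
        score = float(result["score"])
        if score < min_score:
            continue  # drop weak matches
        doc = result["document"]
        lines.append(f"{doc['id']}: {score:.3f}")
    return lines


# Made-up results in the shape used by the usage example; the scores are not real output.
sample_results = [
    {"document": {"id": "2", "content": "..."}, "score": 0.71},
    {"document": {"id": "1", "content": "..."}, "score": 0.25},
]
summary = summarize_results(sample_results, min_score=0.5)
print(summary)
```

With a live database, the same helper could be called as `summarize_results(db.search(query_1, "collection_1", top_k=2), min_score=0.5)`, provided the results have the shape assumed here.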

Requirements

Dimensia depends on the following packages, pinned to these exact versions:

  • numpy==1.26.4
  • torch==2.2.2
  • sentence-transformers==3.3.1
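
Because the versions are pinned exactly, it can help to check what is installed in the current environment before debugging an import error. The sketch below uses only the standard library's `importlib.metadata`. The names `PINNED`, `installed_version`, and `check_pins` are illustrative and are not provided by Dimensia.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Pins copied from the requirements list above.
PINNED = {
    "numpy": "1.26.4",
    "torch": "2.2.2",
    "sentence-transformers": "3.3.1",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pins(pins: Dict[str, str]) -> Dict[str, str]:
    """Map each package to 'ok', 'missing', or a mismatch description."""
    report = {}
    for package, wanted in pins.items():
        found = installed_version(package)
        if found is None:
            report[package] = "missing"
        elif found == wanted:
            report[package] = "ok"
        else:
            report[package] = f"mismatch (found {found}, expected {wanted})"
    return report


report = check_pins(PINNED)
for package, status in report.items():
    print(f"{package}: {status}")
```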

Contributing

We welcome contributions to improve Dimensia! Please fork the repository, make your changes, and submit a pull request.

Support

If you encounter any issues or have questions, please don't hesitate to open an issue on our GitHub repository. We welcome feedback, bug reports, and feature requests!

We strive to respond as quickly as possible to all issues and questions.

Download files

Source Distribution

dimensia-0.1.1.tar.gz (8.5 kB view details)

Uploaded Source

Built Distribution

Dimensia-0.1.1-py3-none-any.whl (9.5 kB view details)

Uploaded Python 3

File details

Details for the file dimensia-0.1.1.tar.gz.

File metadata

  • Download URL: dimensia-0.1.1.tar.gz
  • Size: 8.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for dimensia-0.1.1.tar.gz:

  • SHA256: 83d6ce68596ddb4459380fe22cccc1beb993352fbf5831edf616995faee53693
  • MD5: 99128aaacbaf3f31284d78e47aa6706d
  • BLAKE2b-256: d28927bd4b8340de95e60c77b8a42d434190cccf5e1d232db5918c4f6eb2126c

File details

Details for the file Dimensia-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: Dimensia-0.1.1-py3-none-any.whl
  • Size: 9.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for Dimensia-0.1.1-py3-none-any.whl:

  • SHA256: c4d3ba9f43ffa0af0f57bdd66c5edbe3ff7b6466c581be1f96e320ee893586ad
  • MD5: 4b63a147d96c6a275c448fb92a1aa0f9
  • BLAKE2b-256: e80d6b9a93a6c85ba45f6b9aaa16e8ee9acca216f974c2fccfc38115924e83e3
