
An embedded vector store for semantic data storage and retrieval


SemanticStore


LIBRARY IS HIGHLY EXPERIMENTAL.


A simple, easy-to-use vector store for small hobby projects; it might be the simplest vector database on GitHub.

A versatile vector store designed for multimodal search. This store seamlessly integrates with Faiss to provide efficient similarity search capabilities. Whether you're working with image, text, or audio data, SemanticStore has you covered.

Features

  • Multimodal Support: Handle a wide range of data types, including image, text, and audio vectors.
  • Faiss Integration: Utilize the speed and efficiency of Faiss for similarity search.
  • Custom Pipelines: Customizable pipelines for processing and indexing your data.
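Faiss's exact (brute-force) indexes score a query against every stored vector. When the vectors are L2-normalized, the inner product equals cosine similarity. The sketch below reproduces that computation in NumPy only to illustrate what such a search does. It is not SemanticStore's code, and the function names are hypothetical; in practice a Faiss index such as `faiss.IndexFlatIP` does this work.

```python
import numpy as np


def normalize(vectors: np.ndarray) -> np.ndarray:
    """L2-normalize each row so that inner product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Guard against division by zero for all-zero vectors.
    return vectors / np.maximum(norms, 1e-12)


def flat_ip_search(index_vectors: np.ndarray, queries: np.ndarray, k: int):
    """Exhaustive inner-product search over every stored vector (hypothetical helper).

    Returns (scores, ids), each of shape (n_queries, k), best match first,
    mirroring the (distances, labels) pair a Faiss index's search() returns.
    """
    scores = queries @ index_vectors.T              # (n_queries, n_vectors)
    k = min(k, index_vectors.shape[0])
    ids = np.argsort(-scores, axis=1)[:, :k]        # highest score first
    top_scores = np.take_along_axis(scores, ids, axis=1)
    return top_scores, ids


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stored = normalize(rng.normal(size=(100, 384)).astype("float32"))
    query = stored[42:43] + 0.01  # a slightly perturbed copy of item 42
    scores, ids = flat_ip_search(stored, normalize(query), k=5)
    print(ids[0])  # item 42 is expected to rank first
```

Exhaustive search is exact and needs no training step, which suits the small collections this store targets; approximate Faiss indexes only pay off at much larger scales.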

Getting Started

Follow these steps to get started with the SemanticStore:

  1. Install the package into your environment
pip install semantic-store
  2. Install requirements

CLIP is not bundled with the package yet (we will try to package it in the next release), so install it separately:

pip install git+https://github.com/openai/CLIP.git
  3. Get started in Python
from SemanticStore import Store

store = Store()
store.connect('semantic.db')
store.insert('knowledge_base.txt')
store.commit()
res = store.search("what is meaning of life according to knowledge base ?", 5, modals=['text', 'image'])
print(res)
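This README does not document how `insert()` splits a text file such as `knowledge_base.txt` into the chunks it embeds. As an illustration only, a common approach is fixed-size word windows that overlap, so a sentence cut at one boundary still appears whole in the neighbouring chunk. The function name and default sizes below are hypothetical, not SemanticStore's.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `chunk_size` words that overlap by `overlap` words.

    Hypothetical helper illustrating one chunking strategy; not SemanticStore's API.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "the meaning of life is a question many texts try to answer"
    for chunk in chunk_words(sample, chunk_size=5, overlap=2):
        print(chunk)
```

Larger overlaps reduce the chance of splitting an answer across chunks, at the cost of storing and embedding more text.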

Use cases

1. Search Engines

A flask server that serves semantic search results.

from flask import Flask, jsonify
from SemanticStore import Store

app = Flask(__name__)
store = Store()
store.connect('semantic.db')

# insert() accepts remote/network URLs, so it can be integrated with storage services such as Firebase or Supabase.
store.insert('https://some/shutter/stock/image.jpeg')
store.insert('https://some/firebase/profile/photo.jpeg')
store.insert('https://some/remote/pdf.pdf')
store.commit()  # persist the inserts, as in Getting Started

@app.route("/search/<string:query>")
def search(query):
    results = store.search(query=query, top_k=5, modals=['image', 'text', 'audio'])
    search_results = []

    # Collect the file paths of the matching items, grouped by modality.
    for image in results.images:
        search_results.append(image.file_path)

    for text in results.texts:
        search_results.append(text.file_path)

    for audio in results.audios:
        search_results.append(audio.file_path)

    return jsonify({'search_results': search_results})

if __name__ == '__main__':
    app.run()
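The route above returns results grouped by modality (all images, then all texts, then all audio). If each hit carries a similarity score, which this README does not show, one alternative is a single list ranked by score. The `Hit` record and `merge_ranked` helper below are hypothetical, and ranking across modalities only makes sense if the scores come from a shared embedding space (for example, CLIP for both images and text).

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical result record; SemanticStore's real result objects may differ.
    modality: str
    file_path: str
    score: float  # higher = more similar; assumed comparable across modalities


def merge_ranked(hits_by_modality: dict[str, list[Hit]], top_k: int) -> list[str]:
    """Flatten per-modality hits into one score-ordered list, dropping duplicate paths."""
    all_hits = [hit for hits in hits_by_modality.values() for hit in hits]
    all_hits.sort(key=lambda h: h.score, reverse=True)
    seen, ranked = set(), []
    for hit in all_hits:
        if len(ranked) >= top_k:
            break
        if hit.file_path not in seen:
            seen.add(hit.file_path)
            ranked.append(hit.file_path)
    return ranked
```

Deduplicating by path matters because the same file can be matched through more than one pipeline (for instance, an image matched both by its CLIP embedding and by its caption).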

2. Building RAG and HyDE agents

# A simple RAG agent.
from SemanticStore import Store

store = Store()
store.connect('semantic.db')

store.insert('some/notes.txt')
store.insert('some/lecture.mp3')
store.insert('some/research.pdf')
store.commit()  # persist the inserts, as in Getting Started

def LLM(prompt):
    # Write your LLM connection and streaming logic here; return the response text.
    pass

query = input("Ask a query (type QUIT to exit): ")
while query != "QUIT":
    # Retrieve the most relevant chunks from the inserted text and audio files.
    query_results = store.search(query=query, top_k=5, modals=['text', 'audio'])
    context = ' '.join(query_results.texts.chunks())
    context += ' ' + ' '.join(query_results.audios.chunks())

    prompt = f"""
    GIVEN THIS CONTEXT : {context}

    ANSWER THE FOLLOWING QUERY : {query}

    """
    response = LLM(prompt)
    print(response)
    query = input("Ask a query (type QUIT to exit): ")
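The heading mentions HyDE (Hypothetical Document Embeddings), but the agent above is plain RAG: it searches with the raw question. In HyDE, an LLM first drafts a hypothetical answer, and that draft is used as the search query, because answer-like text tends to land closer to relevant chunks than a short question does. A minimal sketch, assuming you pass in your own `llm` and `search` callables; `hyde_search` is a hypothetical name, not part of SemanticStore.

```python
from typing import Callable


def hyde_search(question: str,
                llm: Callable[[str], str],
                search: Callable[[str, int], list[str]],
                top_k: int = 5) -> list[str]:
    """HyDE retrieval: search with an LLM-drafted answer instead of the raw question."""
    # 1. Ask the LLM for a plausible answer passage; it may be factually wrong,
    #    since it is only used to locate similar real passages.
    hypothetical = llm(f"Write a short passage that answers: {question}")
    # 2. Use the draft as the query, so similarity is computed answer-to-chunk.
    return search(hypothetical, top_k)
```

With SemanticStore, `search` could be a small wrapper around `store.search(query=..., top_k=..., modals=['text', 'audio'])` that extracts chunk text from the results; the exact shape of those results is not documented here. The retrieved chunks, not the hypothetical draft, then go into the prompt as context.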

3. Building Recommender Systems

A simple flask server that recommends similar images to the requested image.

from flask import Flask, jsonify
from SemanticStore import Store

app = Flask(__name__)
store = Store()
store.connect('semantic.db')

# insert() accepts remote/network URLs, so it can be integrated with storage services such as Firebase or Supabase.
store.insert('https://some/shutter/stock/image.jpeg')
store.insert('https://some/firebase/profile/photo.jpeg')
store.commit()  # persist the inserts, as in Getting Started

# The `path` converter, unlike `string`, accepts slashes, which an image URL contains.
@app.route("/recommend/<path:image_url>")
def recommend(image_url):
    # Only image-to-image search works for now; more modalities are planned (contributions welcome).
    results = store.multimodal_search(path=image_url, top_k=5, modals=['image'])

    recommendations = []
    for image in results.images:
        recommendations.append(image.file_path)

    return jsonify({'recommendations': recommendations})

if __name__ == '__main__':
    app.run()
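When the requested image is already in the store, a nearest-neighbour query typically returns that same image as its own best match. Whether SemanticStore filters it out is not stated here. A minimal sketch of doing it yourself, assuming you hold image embeddings (for example, from CLIP) in a dict keyed by file path; that layout and the `recommend_similar` name are hypothetical.

```python
import numpy as np


def recommend_similar(query_path: str,
                      embeddings: dict[str, np.ndarray],
                      top_k: int = 5) -> list[str]:
    """Return up to top_k stored paths most cosine-similar to query_path, excluding itself."""
    query = embeddings[query_path]
    query = query / np.linalg.norm(query)
    scored = []
    for path, vec in embeddings.items():
        if path == query_path:
            continue  # an item is always its own best match, so skip it
        scored.append((float(vec @ query / np.linalg.norm(vec)), path))
    scored.sort(reverse=True)  # highest cosine similarity first
    return [path for _, path in scored[:top_k]]
```

Near-duplicates (the same photo at another URL) will still rank at the top; filtering those needs a similarity threshold rather than a path check.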

Internal Architecture

To learn more, click here.

  • I need help implementing the remove function.

  • If you are a senior developer, find this project interesting, and have suggestions, please email me; suggestions will be greatly appreciated.

Models

SemanticStore uses various state-of-the-art models to process text, images and audio.

| Pipeline | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| Text | multi-qa-MiniLM-L6-cos-v1 | CLIP | - | - |
| Audio | Whisper | multi-qa-MiniLM-L6-cos-v1 | CLIP | - |
| Image | CLIP | BLIP | multi-qa-MiniLM-L6-cos-v1 | - |
| Video | Whisper | CLIP | BLIP | multi-qa-MiniLM-L6-cos-v1 |

Note: Models and pipelines in italics are still to be implemented.
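One reading of the models table is that each pipeline runs its models in order; for example, Whisper transcribes an audio file and the transcript is then embedded with multi-qa-MiniLM-L6-cos-v1. A minimal sketch of routing an inserted file to its pipeline by file extension. The extension list and the `models_for` name are hypothetical; SemanticStore's actual routing rules are not documented in this README.

```python
from pathlib import Path

# Hypothetical extension-to-pipeline mapping (an assumption, not SemanticStore's rules).
MODALITY_BY_SUFFIX = {
    ".txt": "text", ".pdf": "text",
    ".mp3": "audio", ".wav": "audio",
    ".jpeg": "image", ".jpg": "image", ".png": "image",
    ".mp4": "video",
}

# Model order per pipeline, copied from the Models table.
PIPELINE_MODELS = {
    "text": ["multi-qa-MiniLM-L6-cos-v1", "CLIP"],
    "audio": ["Whisper", "multi-qa-MiniLM-L6-cos-v1", "CLIP"],
    "image": ["CLIP", "BLIP", "multi-qa-MiniLM-L6-cos-v1"],
    "video": ["Whisper", "CLIP", "BLIP", "multi-qa-MiniLM-L6-cos-v1"],
}


def models_for(path: str) -> list[str]:
    """Return the model chain for a local path or URL, based on its file extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in MODALITY_BY_SUFFIX:
        raise ValueError(f"no pipeline for {suffix!r} files")
    return PIPELINE_MODELS[MODALITY_BY_SUFFIX[suffix]]
```

Routing on the extension keeps `insert()` to a single call for every file type, as in the examples above, at the cost of trusting file names.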

Contributing

Contributions are welcome! If you'd like to enhance the SemanticStore or fix issues, please follow these steps:

  1. Fork the repository.
  2. Create a branch: git checkout -b feature/your-feature or fix/your-fix.
  3. Commit your changes: git commit -m 'Add some feature' or git commit -m 'Fix some issue'.
  4. Push to the branch: git push origin feature/your-feature or git push origin fix/your-fix.
  5. Open a pull request.

Note: This vector store is intended for small hobby projects and personal use. It may not be suitable for large-scale or production environments.


Download files

Source distribution: semantic-store-0.0.9.tar.gz (14.9 kB)

Built distribution: semantic_store-0.0.9-py3-none-any.whl (16.5 kB, Python 3)
