An embedded vector store for semantic data storage and retrieval
SemanticStore
LIBRARY IS HIGHLY EXPERIMENTAL.
A simple, easy-to-use vector store for small hobby projects; it might be the simplest vector database on GitHub.
A versatile vector store designed for multimodal search. It integrates with Faiss to provide efficient similarity search, whether you're working with image, text, or audio data.
Features
- Multimodal Support: Handle a wide range of data types, including image, text, and audio vectors.
- Faiss Integration: Utilize the speed and efficiency of Faiss for similarity search.
- Custom Pipelines: Customizable pipelines for processing and indexing your data.
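To make the idea of similarity search concrete, here is a minimal pure-Python sketch of what a vector index does conceptually: it stores vectors and returns the `top_k` entries closest to a query, here by cosine similarity. This is not SemanticStore's or Faiss's implementation; Faiss handles this kind of lookup far more efficiently. The names `TinyIndex`, `add`, and `search` are hypothetical and exist only for this illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyIndex:
    """Hypothetical brute-force index, for illustration only."""

    def __init__(self):
        self.items = []  # list of (label, vector) pairs

    def add(self, label, vector):
        self.items.append((label, list(vector)))

    def search(self, query, top_k=5):
        """Return up to top_k (label, score) pairs, best match first."""
        scored = [(label, cosine(query, vec)) for label, vec in self.items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


index = TinyIndex()
index.add("cat.jpeg", [0.9, 0.1, 0.0])
index.add("dog.jpeg", [0.8, 0.3, 0.1])
index.add("notes.txt", [0.0, 0.2, 0.9])
hits = index.search([1.0, 0.0, 0.0], top_k=2)
print([label for label, _ in hits])  # ['cat.jpeg', 'dog.jpeg']
```

A brute-force scan like this compares the query against every stored vector, which is fine for a handful of items but is exactly the cost a dedicated library is meant to reduce.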
Getting Started
Follow these steps to get started with SemanticStore:
- Install the package into your environment
pip install semantic-store
- Install the requirements
CLIP is not yet bundled with the package (packaging it is planned for the next release), so install it separately:
pip install git+https://github.com/openai/CLIP.git
- Get started in Python
from SemanticStore import Store
store = Store()
store.connect('semantic.db')
store.insert('knowledge_base.txt')
store.commit()
res = store.search("what is meaning of life according to knowledge base ?", 5, modals=['text', 'image'])
print(res)
Use cases
1. Search Engines
A Flask server that serves semantic search results.
from flask import Flask, request, jsonify
from SemanticStore import Store

app = Flask(__name__)

store = Store()
store.connect('semantic.db')

# Remote/network files can be inserted directly, so the store is easy to
# integrate with services like Firebase or Supabase.
store.insert('https://some/shutter/stock/image.jpeg')
store.insert('https://some/firebase/profile/photo.jpeg')
store.insert('https://some/remote/pdf.pdf')

@app.route("/search/<string:query>")
def search(query):
    results = store.search(query=query, top_k=5, modals=['image', 'text', 'audio'])
    # Collect the file paths of the matches from every modality.
    search_results = []
    for image in results.images:
        search_results.append(image.file_path)
    for text in results.texts:
        search_results.append(text.file_path)
    for audio in results.audios:
        search_results.append(audio.file_path)
    return jsonify({'search_results': search_results})
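Because the query travels inside the URL path of the `/search/<query>` route, a client should percent-encode it before sending the request. The sketch below shows one way to build such a URL with the standard library. The helper name `search_url` is hypothetical, and `http://localhost:5000` is only an assumed address for a locally running server.

```python
from urllib.parse import quote


def search_url(base_url, query):
    """Build the URL for the /search/<query> route, percent-encoding the query."""
    # quote() with safe="" percent-encodes every character except letters,
    # digits and "_.-~", so spaces become %20 and "?" becomes %3F instead of
    # starting a query string.
    return f"{base_url.rstrip('/')}/search/{quote(query, safe='')}"


url = search_url("http://localhost:5000", "what is the meaning of life?")
print(url)  # http://localhost:5000/search/what%20is%20the%20meaning%20of%20life%3F
```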
2. Building RAG and HyDE agents
# A simple RAG agent.
from SemanticStore import Store

store = Store()
store.connect('semantic.db')
store.insert('some/notes.txt')
store.insert('some/lecture.mp3')
store.insert('some/research.pdf')

def LLM(prompt):
    # Write your LLM connection and streaming logic here
    pass

query = input("Ask a query (QUIT to exit): ")
while query != "QUIT":
    # Retrieve the most relevant chunks from the inserted text and audio files
    query_results = store.search(query=query, top_k=5, modals=['text', 'audio'])
    context = ' '.join(query_results.texts.chunks())
    context += ' ' + ' '.join(query_results.audios.chunks())
    prompt = f"""
    GIVEN THIS CONTEXT : {context}
    ANSWER THE FOLLOWING QUERY : {query}
    """
    response = LLM(prompt)
    print(response)
    query = input("Ask a query (QUIT to exit): ")
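The heading mentions HyDE (Hypothetical Document Embeddings), but the example above only covers plain RAG. As a rough sketch of the HyDE idea: the model first drafts a hypothetical answer, that draft is used as the search text, and the retrieved chunks then ground the final answer. The function `hyde_search` and its `llm` and `search` parameters are hypothetical; with SemanticStore, `search` could presumably wrap a call like the `store.search(...)` shown in the RAG example, but that wiring is an assumption and is left out here.

```python
def hyde_search(query, llm, search, top_k=5):
    """Retrieve with a hypothetical answer instead of the raw query.

    llm:    callable taking a prompt string and returning generated text
    search: callable taking (text, top_k) and returning a list of text chunks
    """
    # Step 1: ask the model to draft a plausible answer (it may be wrong).
    hypothetical = llm(f"Write a short passage that answers: {query}")
    # Step 2: search with the draft rather than the short question.
    chunks = search(hypothetical, top_k)
    # Step 3: answer the original query using the retrieved chunks as context.
    context = " ".join(chunks)
    return llm(f"GIVEN THIS CONTEXT : {context}\nANSWER THE FOLLOWING QUERY : {query}")


# Stand-ins so the sketch runs without a model or a store.
def fake_llm(prompt):
    return f"[generated from: {prompt[:30]}...]"


def fake_search(text, top_k):
    return [f"chunk {i}" for i in range(top_k)]


print(hyde_search("what is the meaning of life?", fake_llm, fake_search, top_k=2))
```

Passing the model and the retriever in as callables keeps the sketch independent of any particular LLM client or store API.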
3. Building Recommender Systems
A simple Flask server that recommends images similar to the requested image.
from flask import Flask, request, jsonify
from SemanticStore import Store

app = Flask(__name__)

store = Store()
store.connect('semantic.db')

# Remote/network files can be inserted directly, so the store is easy to
# integrate with services like Firebase or Supabase.
store.insert('https://some/shutter/stock/image.jpeg')
store.insert('https://some/firebase/profile/photo.jpeg')

# The image URL is passed as a query parameter (/recommend?image_url=...)
# because URLs contain slashes, which a <string:...> path converter does not match.
@app.route("/recommend")
def recommend():
    image_url = request.args.get('image_url')
    # As of now only image-to-image search works; more modalities will be added. Try contributing!
    results = store.multimodal_search(path=image_url, top_k=5, modals=['image'])
    recommendations = []
    for image in results.images:
        recommendations.append(image.file_path)
    return jsonify({'recommendations': recommendations})
Internal Architecture
To learn more click here
- I need help implementing the remove function.
- If you are a senior developer who finds this project interesting and has suggestions, please mail me; suggestions will be greatly appreciated.
Models
SemanticStore uses various state-of-the-art models to process text, images and audio.
Pipelines | Model 1 | Model 2 | Model 3 | Model 4 |
---|---|---|---|---|
Text | multi-qa-MiniLM-L6-cos-v1 | CLIP | - | - |
Audio | Whisper | multi-qa-MiniLM-L6-cos-v1 | CLIP | - |
Image | CLIP | BLIP | multi-qa-MiniLM-L6-cos-v1 | - |
Video | Whisper | CLIP | BLIP | multi-qa-MiniLM-L6-cos-v1 |
Note: Models and pipelines in italics are still to be implemented.
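One plausible reading of the table is that each pipeline passes its input through several models in sequence (for example, audio is transcribed first and the resulting text is then embedded). The sketch below illustrates only that chaining pattern. It is an assumption about the general shape, not SemanticStore's actual code: `run_pipeline`, `transcribe`, `chunk`, and `embed` are hypothetical placeholders, and no real model is called.

```python
from typing import Any, Callable, List

Stage = Callable[[Any], Any]


def run_pipeline(stages: List[Stage], item: Any) -> Any:
    """Pass item through each stage in order; each output feeds the next stage."""
    for stage in stages:
        item = stage(item)
    return item


# Placeholder stages standing in for real models (no ML involved).
def transcribe(audio_path: str) -> str:
    # A speech-to-text model would go here.
    return f"transcript of {audio_path}"


def chunk(text: str) -> List[str]:
    # Naive whitespace splitting instead of real chunking.
    return text.split()


def embed(chunks: List[str]) -> List[List[float]]:
    # Toy one-dimensional "embedding": the length of each chunk.
    return [[float(len(c))] for c in chunks]


audio_pipeline = [transcribe, chunk, embed]
vectors = run_pipeline(audio_pipeline, "lecture.mp3")
print(vectors)  # [[10.0], [2.0], [11.0]]
```

Keeping a pipeline as an ordered list of callables makes it easy to swap a stage or add one without touching the others.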
Contributing
Contributions are welcome! If you'd like to enhance the SemanticStore or fix issues, please follow these steps:
- Fork the repository.
- Create a branch: git checkout -b feature/your-feature or fix/your-fix.
- Commit your changes: git commit -m 'Add some feature' or git commit -m 'Fix some issue'.
- Push to the branch: git push origin feature/your-feature or git push origin fix/your-fix.
- Open a pull request
Note: This vector store is intended for small hobby projects and personal use. It may not be suitable for large-scale or production environments.
Hashes for semantic_store-0.0.9-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 587da90df18d02a42171c3a68bfd6223a4c1c62d6ffd7d2d6102a0f899ce5819 |
MD5 | 40e8c5d389b6d2347107547c22afeab1 |
BLAKE2b-256 | f800a2b3c85b9c55edea8c2c8cb4c0b59c053c41bcbda0e26f5bcb16b3028b54 |
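If you download the wheel manually, you can compare its digest against the SHA256 value listed above. The following is a small sketch using Python's standard `hashlib`; the helper names `sha256_of` and `matches` are hypothetical, and the file path you pass is assumed to be wherever you saved the download.

```python
import hashlib


def sha256_of(path, block_size=65536):
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size blocks until read() returns an empty bytes object.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("semantic_store-0.0.9-py3-none-any.whl", "587da90df18d02a42171c3a68bfd6223a4c1c62d6ffd7d2d6102a0f899ce5819")` should return `True` for an unmodified copy of that wheel.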