ML toolkit for developers to build document, audio, and image similarity retrieval systems with pretrained and finetunable models—ready to use out of the box.
📚 Mono-Kit Library Documentation
mono-kit is a machine learning library for building similarity retrieval systems, such as image similarity retrieval in the style of Google Lens, audio similarity retrieval in the style of hum-to-search, and RAG-style document retrieval. It accepts document, audio, and image inputs and provides pretrained embedding models as well as built-in custom models that can be finetuned. Similarity-based retrieval works out of the box, so you do not need to implement the pipeline yourself.
Available Models
mono-kit comes with powerful, production-ready models tailored for each modality:
- Image
  - Default: ResNet-50
  - Custom: Finetunable customized ResNet-50 for domain-specific tasks
- Audio
  - Default: VGGish
  - Custom: Finetunable custom Siamese network with a custom loss function for enhanced similarity learning
- Document
  - Default: all-MiniLM-L6-v2, a compact and efficient transformer model ideal for semantic document embeddings
📦 Installation
Install the library via pip:
pip install mono-kit
🔧 Initialization
mono-kit uses ChromaDB by default for embedding storage and retrieval.
Start by initializing a chromadb client:
import chromadb
client = chromadb.PersistentClient(path="path_to_save")
✅ You can use any chromadb client (e.g., EphemeralClient, HttpClient, etc.), not just PersistentClient.
⚠️ Collection Name Constraint: Each of mono_document, mono_audio, and mono_image must use a unique collection name. You can reuse a collection name across default and custom models.
📝 Text Search: mono_document
1. Initialize Document Handler
mono_docs = mono_document(client, "unique_text_collection")
2. Text Splitting and Mounting
text = """Your long text block here..."""
docs = mono_docs.text_splitter(text, (150, 200), 20, False)
for doc_id, doc in enumerate(docs):
    mono_docs.mount_document(doc, str(doc_id))
- (150, 200): Min/max chunk size in characters
- 20: Overlap in characters
- False: If True, retains sentence boundaries (optional feature)
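To make the chunk size and overlap parameters concrete, here is a minimal sketch of overlapping character chunking. It is not mono-kit's text_splitter: the function name chunk_text and its max_chars and overlap parameters are hypothetical, and it uses a single fixed maximum size instead of a min/max range.

```python
def chunk_text(text: str, max_chars: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of at most max_chars characters.

    Each window starts (max_chars - overlap) characters after the previous
    one, so consecutive windows share `overlap` characters.
    Illustrative only; this is not the mono-kit implementation.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")

    step = max_chars - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        # Stop once the current window reaches the end of the text.
        if start + max_chars >= len(text):
            break
    return chunks


sample = "abcdefghij" * 5  # 50 characters
chunks = chunk_text(sample, max_chars=20, overlap=5)
for chunk in chunks:
    print(len(chunk), chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, provided it is shorter than the overlap.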
3. Semantic Search
result = mono_docs.find_similar_documents("search query here", k=3)
print(result)
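Conceptually, a similarity search embeds the query and returns the k stored items whose embeddings are closest to it. The sketch below illustrates that idea in plain Python, assuming cosine similarity as the metric. The names cosine_similarity and top_k and the toy vectors are hypothetical; the metric and result format that mono-kit and ChromaDB actually use may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, stored, k=3):
    """Return the k (id, score) pairs with the highest similarity to the query."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings" keyed by document id (made-up values).
stored = {"0": [1.0, 0.0], "1": [0.0, 1.0], "2": [1.0, 1.0]}
results = top_k([1.0, 0.1], stored, k=2)
print(results)
```

A vector database replaces the linear scan in top_k with an index, which is what makes retrieval practical for large collections.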
🔊 Audio Search: mono_audio
1. Initialize Audio Handler
mono_aud = mono_audio(client, "unique_audio_collection")
2. Mount Audio Files
mono_aud.mount_audio("path/to/audio1.mp3")
mono_aud.mount_audio("path/to/audio2.mp3")
3. Batch Mounting
mono_aud.mount_audio_batch("path/to/audio_directory")
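If you need more control than a whole-directory mount, for example to filter by file extension, one option is to list the files yourself and pass them to mount_audio one at a time. The helper below is a hypothetical sketch: list_media_files is not part of mono-kit, and the extensions shown are an assumption rather than a list of formats the library supports.

```python
from pathlib import Path


def list_media_files(directory, extensions=(".mp3", ".wav")):
    """Return the files directly inside `directory` whose suffix matches
    one of `extensions` (case-insensitive), in sorted order."""
    root = Path(directory)
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in wanted
    )


# Possible usage with the handler created earlier (not executed here):
# for path in list_media_files("path/to/audio_directory"):
#     mono_aud.mount_audio(str(path))
```

The helper does not recurse into subdirectories; switching iterdir() to rglob("*") would include nested files.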
4. Find Similar Audio
result = mono_aud.find_similar_audio("path/to/query.mp3", k=3)
print(result)
✅ With Custom Audio Model
1. Train Custom Audio Model
x = "path/to/reference_audio"
y = "path/to/target_audio"
mono_aud.create_audio_model(directory_x=x, directory_y=y)
2. Mount and Search with Custom Model
model_path = "custom_trained_audio_embedding_model/audio_model.keras"
mono_aud.mount_audio("audio.mp3", model_path=model_path)
mono_aud.mount_audio_batch("audio_directory", model_path=model_path)
result = mono_aud.find_similar_audio("query.mp3", k=2, model_path=model_path)
print(result)
🖼️ Image Search: mono_image
1. Initialize Image Handler
mono_img = mono_image(client, "unique_image_collection")
2. Mount Images
mono_img.mount_image("path/to/image.jpg")
3. Batch Mounting
mono_img.mount_image_batch("path/to/image_directory")
4. Find Similar Images
result = mono_img.find_similar_image("path/to/query_image.jpg", k=3)
print(result)
✅ With Custom Image Model
1. Train Custom Image Model
x = "path/to/reference_images"
y = "path/to/target_images"
mono_img.create_image_model(directory_x=x, directory_y=y)
2. Mount and Search with Custom Model
model = "/path/to/custom_trained_image_embedding_model/image_model.keras"
mono_img.mount_image_batch("image_directory", model_path=model)
result = mono_img.find_similar_image("query.jpg", k=3, model_path=model)
print(result)
✅ Summary of Key Functions
| Operation | Document | Audio | Image |
|---|---|---|---|
| Mount file | mount_document | mount_audio | mount_image |
| Mount batch | — | mount_audio_batch | mount_image_batch |
| Similarity search | find_similar_documents | find_similar_audio | find_similar_image |
| Train custom model | — | create_audio_model | create_image_model |
| Use custom model | — | via model_path | via model_path |