Muzlin: a filtering toolset for semantic machine learning
When a filter cloth 🏳️ is needed rather than a simple RAG 🏴‍☠️
Deployment, Stats, & License
What is it?
Muzlin merges classical ML with advanced generative AI to efficiently filter text in the context of NLP and LLMs. It answers key questions in semantic-based workflows, such as:
Does a RAG/GraphRAG have the right context to answer a question?
Is the top-k retrieved context too dense/sparse?
Does the generated response hallucinate or deviate from the provided context?
Should new extracted text be added to an existing RAG?
Can we detect inliers and outliers in collections of text embeddings (e.g. context, user question and answers, synthetic generated data, etc…)?
Note: While production-ready, Muzlin is still evolving and subject to significant changes!
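To make the last question above concrete, here is a dependency-free toy sketch of flagging an outlier among embedding vectors by its cosine distance to the centroid of known-good vectors. It only illustrates the general idea and is not how Muzlin works internally (the Quickstart uses PyOD detectors); the vectors, the helper names `centroid` and `cosine_distance`, and the 0.5 threshold are all made up for the example.

```python
import math

def centroid(vectors):
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine_distance(a, b):
    """1 - cosine similarity; values near 0 mean the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy 3-d "embeddings": three similar vectors and one pointing elsewhere.
vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.85, 0.15, 0.05],
    [0.0, 0.1, 0.95],
]

# Build the reference point from the known-good vectors only.
center = centroid(vectors[:3])
distances = [cosine_distance(v, center) for v in vectors]

# 0.5 is an arbitrary illustrative threshold, not a recommended value.
flags = [d > 0.5 for d in distances]
```

Here only the last vector is flagged. Real embeddings have hundreds of dimensions and rarely separate this cleanly, which is why a fitted detector is the more practical choice.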
Quickstart
Install Muzlin using pip:
pip install muzlin
Create text embeddings with a pre-trained model:
import numpy as np
from muzlin.encoders import HuggingFaceEncoder

encoder = HuggingFaceEncoder()
vectors = encoder(texts)  # texts is a list of strings
vectors = np.array(vectors)
np.save('vectors', vectors)  # Writes vectors.npy
Build an anomaly detection model for filtering:
import numpy as np
from muzlin.anomaly import OutlierDetector
from pyod.models.pca import PCA

vectors = np.load('vectors.npy')  # Load pre-saved vectors
od = PCA(contamination=0.02)
clf = OutlierDetector(mlflow=False, detector=od)  # Saves joblib model
clf.fit(vectors)
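The `contamination=0.02` argument tells the PyOD detector roughly what fraction of the training vectors to treat as outliers. As a simplified illustration of that idea (a sketch, not Muzlin's or PyOD's actual code), the decision threshold can be thought of as the (1 - contamination) percentile of the training anomaly scores. The helper names `percentile` and `label_scores` and the toy scores are hypothetical.

```python
import math

def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    if lower == upper:
        return ordered[lower]
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac

def label_scores(scores, contamination):
    """Label scores above the (1 - contamination) percentile as outliers (1)."""
    threshold = percentile(scores, 100.0 * (1.0 - contamination))
    labels = [1 if s > threshold else 0 for s in scores]
    return threshold, labels

# Toy anomaly scores: nine ordinary values and one clear anomaly.
scores = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.12, 0.11, 0.10, 0.95]
threshold, labels = label_scores(scores, contamination=0.1)
```

With these toy scores only the last value lands above the threshold. A lower contamination value pushes the threshold up, so fewer training vectors are labelled as outliers.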
Filter new text using the trained model:
from muzlin.anomaly import OutlierDetector
from muzlin.encoders import HuggingFaceEncoder
import numpy as np

clf = OutlierDetector(model='outlier_detector.pkl')  # Load the model
encoder = HuggingFaceEncoder()
vector = encoder(['Who was the first man to walk on the moon?'])
vector = np.array(vector).reshape(1, -1)
label = clf.predict(vector)
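In PyOD's convention, `predict` returns 0 for inliers and 1 for outliers. Assuming Muzlin's `OutlierDetector` follows the same convention (worth confirming against the Muzlin documentation), a filtering step could route a question as sketched below. The function `route_question`, the action names, and the reason strings are hypothetical and not part of Muzlin.

```python
def route_question(question, label):
    """Decide what to do with a question given its outlier label.

    Assumes label 0 means inlier and label 1 means outlier.
    """
    if int(label) == 1:
        return {
            "question": question,
            "action": "reject",
            "reason": "question looks out of scope for the indexed context",
        }
    return {"question": question, "action": "answer", "reason": "in scope"}

# Example with a hard-coded label standing in for a detector prediction.
decision = route_question("Who was the first man to walk on the moon?", 0)
```

Since a prediction on a single vector comes back as a one-element array, pass its first element, for example `route_question(text, label[0])`.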
Integrations
Muzlin integrates with a wide array of libraries for anomaly detection, vector encoding, and graph-based setups.
Anomaly Detection |
Encoders |
Vector Index |
---|---|---|
Simple Schematic Implementation
Resources
Example Notebooks
Notebook | Description |
---|---|
 | Basic semantic vector-based outlier detection |
 | Selecting optimal thresholds using various methods |
 | Cluster-based filtering for question answering |
 | Using graph-based anomaly detection for semantic graphs like GraphRAG |
What Else?
Looking for more? Check out other useful libraries like Semantic Router, CRAG, and Scikit-LLM.
Contributing
Muzlin is still evolving! At the moment there are major changes being made and the structure of Muzlin is still being refined. For now, please leave a bug report and potential new code for any fixes or improvements. You will be added as a co-author if it is implemented.
Once this phase has been completed, anyone is welcome to contribute to Muzlin:
Please share your ideas and ask questions by opening an issue.
To contribute, first check the Issue list for the “help wanted” tag and comment on the one that you are interested in. The issue will then be assigned to you.
If the bug, feature, or documentation change is novel (not in the Issue list), you can either log a new issue or create a pull request for the new changes.
To start, fork the dev branch and add your improvement/modification/fix.
To make sure the code has the same style and standard, please refer to detector.py for an example.
Create a pull request to the dev branch and follow the PR template.
Please make sure that all code changes are accompanied by new or updated test functions. Automatic tests will be triggered. Before the pull request can be merged, make sure that all the tests pass.