Muzlin: a filtering toolset for semantic machine learning

Muzlin

When a filter cloth 🏳️ is needed rather than a simple RAG 🏴‍☠

What is it?

Muzlin merges classical ML with advanced generative AI to efficiently filter text in the context of NLP and LLMs. It answers key questions in semantic-based workflows, such as:

  • Does a RAG/GraphRAG have the right context to answer a question?

  • Is the top-k retrieved context too dense/sparse?

  • Does the generated response hallucinate or deviate from the provided context?

  • Should new extracted text be added to an existing RAG?

  • Can we detect inliers and outliers in collections of text embeddings (e.g. context, user questions and answers, synthetically generated data, etc.)?
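To make the inlier/outlier idea concrete, here is a minimal sketch that flags a question embedding as an outlier when it lies far from the centroid of the context embeddings. It is an illustrative stand-in, not Muzlin's implementation: the toy 3-dimensional vectors, the helper names (`cosine_distance`, `centroid`, `is_outlier`), and the cutoff rule are all assumptions made for this example.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Toy 3-dimensional "embeddings" standing in for encoded context chunks.
context_vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [1.0, 0.0, 0.1],
]
center = centroid(context_vectors)

# Hypothetical cutoff: the largest distance seen within the context itself,
# plus a small margin. A real detector would learn this from the data.
cutoff = max(cosine_distance(v, center) for v in context_vectors) + 0.05

def is_outlier(vector):
    """True when the vector is farther from the context centroid than the cutoff."""
    return cosine_distance(vector, center) > cutoff

on_topic = [0.85, 0.15, 0.05]   # points in roughly the same direction as the context
off_topic = [0.0, 0.1, 0.9]     # points in a very different direction

print(is_outlier(on_topic), is_outlier(off_topic))
```

A question flagged as an outlier is one the stored context probably cannot answer, which is the kind of signal the questions above ask for.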

Note: While production-ready, Muzlin is still evolving and subject to significant changes!

Quickstart

  1. Install Muzlin using pip:

    pip install muzlin
  2. Create text embeddings with a pre-trained model:

    import numpy as np
    from muzlin.encoders import HuggingFaceEncoder  # Requires torch and transformers
    
    # Placeholder input: replace with your own list of strings
    texts = ['First document chunk.', 'Second document chunk.']
    
    encoder = HuggingFaceEncoder()
    vectors = np.array(encoder(texts))  # One embedding per input string
    np.save('vectors', vectors)  # Written to vectors.npy
  3. Build an anomaly detection model for filtering:

    import numpy as np
    from muzlin.anomaly import OutlierDetector
    from pyod.models.pca import PCA
    
    vectors = np.load('vectors.npy')  # Load pre-saved vectors
    
    od = PCA(contamination=0.02)  # Assume about 2% of the training vectors are outliers
    
    clf = OutlierDetector(mlflow=False, detector=od)  # Saves a joblib model
    clf.fit(vectors)
  4. Filter new text using the trained model:

    from muzlin.anomaly import OutlierDetector
    from muzlin.encoders import HuggingFaceEncoder
    import numpy as np
    
    clf = OutlierDetector(model='outlier_detector.pkl')  # Load the model
    encoder = HuggingFaceEncoder()
    
    vector = encoder(['Who was the first man to walk on the moon?'])
    vector = np.array(vector).reshape(1, -1)
    
    label = clf.predict(vector)
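The `contamination=0.02` setting in step 3 and the `label` returned in step 4 are linked: in PyOD, the contamination rate is used during fitting to set a cutoff on the outlier scores, and labels follow the convention 0 = inlier, 1 = outlier. The sketch below illustrates that idea with plain Python and made-up scores. The helper names and the simple rank-based cutoff are assumptions for illustration, not PyOD's or Muzlin's actual code, and it is assumed here that Muzlin's `predict` follows the same label convention.

```python
def threshold_from_contamination(train_scores, contamination):
    """Pick a cutoff so that about `contamination` of the training scores lie above it.

    Simplified rank-based rule; real libraries may interpolate differently.
    """
    ordered = sorted(train_scores)
    n = len(ordered)
    # Number of training points allowed above the cutoff (keep at least one below).
    n_outliers = min(round(contamination * n), n - 1)
    return ordered[n - n_outliers - 1]

def to_label(score, threshold):
    """0 = inlier, 1 = outlier (PyOD-style convention, assumed here)."""
    return int(score > threshold)

# Made-up outlier scores for 100 training vectors: 1.0, 2.0, ..., 100.0
train_scores = [float(i) for i in range(1, 101)]
threshold = threshold_from_contamination(train_scores, contamination=0.02)

# With 2% contamination, only the two highest training scores are flagged.
flagged = sum(to_label(s, threshold) for s in train_scores)
print(threshold, flagged)
```

In a filtering workflow, a label of 1 would typically mean the question falls outside what the indexed text covers, so it can be rejected or routed elsewhere before reaching the LLM.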

Integrations

Muzlin integrates with a wide array of libraries for anomaly detection, vector encoding, and graph-based setups.

Anomaly Detection

  • Scikit-Learn

  • PyOD (vector)

  • PyGOD (graph)

  • PyThresh (thresholding)

Encoders

  • HuggingFace

  • OpenAI

  • Cohere

  • Azure

  • Google

  • Amazon Bedrock

  • Fastembed

Vector Index

  • LangChain

  • LlamaIndex

Simple Schematic Implementation

[Figure: Muzlin pipeline schematic]

Resources

Example Notebooks

  • Introduction: Basic semantic vector-based outlier detection

  • Optimal Threshold: Selecting optimal thresholds using various methods

  • Cluster-Based Filtering: Cluster-based filtering for question answering

  • Graph-Based Filtering: Using graph-based anomaly detection for semantic graphs like GraphRAG

What Else?

Looking for more? Check out other useful libraries like Semantic Router, CRAG, and Scikit-LLM.


Contributing

Muzlin is still evolving! At the moment, major changes are underway and the structure of Muzlin is still being refined. For now, please file a bug report and include any proposed code for fixes or improvements. You will be added as a co-author if it is implemented.

Once this phase is complete, anyone is welcome to contribute to Muzlin:

  • Please share your ideas and ask questions by opening an issue.

  • To contribute, first check the Issue list for the “help wanted” tag and comment on the one that you are interested in. The issue will then be assigned to you.

  • If the bug, feature, or documentation change is novel (not in the Issue list), you can either log a new issue or create a pull request for the new changes.

  • To start, fork the dev branch and add your improvement/modification/fix.

  • To keep the code style and standard consistent, please refer to detector.py as an example.

  • Create a pull request to the dev branch and follow the PR template.

  • Please make sure that all code changes are accompanied by new or updated test functions. Automatic tests will be triggered, and all of them must pass before the pull request can be merged.
