
Project description

MLX-Embeddings

MLX-Embeddings is a package for running Vision and Language Embedding models locally on your Mac using MLX.

  • Free software: GNU General Public License v3

Features

  • Generate embeddings for text using MLX models
  • Support for single-item and batch processing
  • Utilities for comparing text similarities

Installation

You can install mlx-embeddings using pip (note that MLX is designed for Apple Silicon, so an M-series Mac is required):

pip install mlx-embeddings

Usage

Single Item Embedding

To generate an embedding for a single piece of text:

import mlx.core as mx
from mlx_embeddings.utils import load

# Load the model and tokenizer
model, tokenizer = load("sentence-transformers/all-MiniLM-L6-v2")

# Prepare the text
text = "I like reading"

# Tokenize and generate embedding
input_ids = tokenizer.encode(text, return_tensors="mlx")
outputs = model(input_ids)
embeddings = outputs[0][:, 0, :]  # embedding of the first ([CLS]) token
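
For cosine-style comparisons you will typically want unit-length vectors. A minimal normalization sketch using plain mx ops (illustrative, not a package utility):

# L2-normalize the embedding (illustrative helper; assumes a non-zero norm)
norm = mx.sqrt(mx.sum(embeddings * embeddings, axis=-1, keepdims=True))
embeddings = embeddings / norm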

Comparing Multiple Texts

To compare multiple texts using their embeddings:

from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt
import seaborn as sns
import mlx.core as mx
from mlx_embeddings.utils import load

# Load the model and tokenizer
model, tokenizer = load("sentence-transformers/all-MiniLM-L6-v2")

def get_embedding(text, model, tokenizer):
    input_ids = tokenizer.encode(text, return_tensors="mlx", padding=True, truncation=True, max_length=512)
    outputs = model(input_ids)
    embeddings = outputs[0][:, 0, :][0]  # first ([CLS]) token embedding for this single input
    return embeddings

# Sample texts
texts = [
    "I like grapes",
    "I like fruits",
    "The slow green turtle crawls under the busy ant."
]

# Generate embeddings
embeddings = [get_embedding(text, model, tokenizer) for text in texts]

# Compute similarity
similarity_matrix = cosine_similarity(embeddings)

# Visualize results
def plot_similarity_matrix(similarity_matrix, labels):
    plt.figure(figsize=(5, 4))
    sns.heatmap(similarity_matrix, annot=True, cmap='coolwarm', xticklabels=labels, yticklabels=labels)
    plt.title('Similarity Matrix Heatmap')
    plt.tight_layout()
    plt.show()

labels = [f"Text {i+1}" for i in range(len(texts))]
plot_similarity_matrix(similarity_matrix, labels)
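
scikit-learn is used here only for cosine_similarity; the same computation can be done directly in MLX. A minimal sketch, assuming the per-text embeddings are stacked into one (N, dim) array:

def mlx_cosine_similarity(vectors):
    # Normalize each row to unit length, then take pairwise dot products
    norms = mx.sqrt(mx.sum(vectors * vectors, axis=-1, keepdims=True))
    normalized = vectors / norms
    return normalized @ normalized.T

similarity = mlx_cosine_similarity(mx.stack(embeddings))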

Batch Processing

For processing multiple texts at once:

from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt
import seaborn as sns
import mlx.core as mx
from mlx_embeddings.utils import load

# Load the model and tokenizer
model, tokenizer = load("sentence-transformers/all-MiniLM-L6-v2")

def get_embedding(texts, model, tokenizer):
    inputs = tokenizer.batch_encode_plus(texts, return_tensors="mlx", padding=True, truncation=True, max_length=512)
    outputs = model(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"]
    )
    return outputs[0]  # token-level embeddings, shape (batch, seq_len, dim)

def compute_and_print_similarity(embeddings):
    B, seq_len, dim = embeddings.shape
    embeddings_2d = embeddings.reshape(B, -1)  # flatten token embeddings per sequence
    similarity_matrix = cosine_similarity(embeddings_2d)

    print("Similarity matrix between sequences:")
    print(similarity_matrix)
    print("\n")

    for i in range(B):
        for j in range(i+1, B):
            print(f"Similarity between sequence {i+1} and sequence {j+1}: {similarity_matrix[i][j]:.4f}")

    return similarity_matrix

# Sample texts
texts = [
    "I like grapes",
    "I like fruits",
    "The slow green turtle crawls under the busy ant."
]

embeddings = get_embedding(texts, model, tokenizer)
similarity_matrix = compute_and_print_similarity(embeddings)

# Visualize results (reuses plot_similarity_matrix from the previous example)
labels = [f"Text {i+1}" for i in range(len(texts))]
plot_similarity_matrix(similarity_matrix, labels)
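
One caveat: flattening the (batch, seq_len, dim) output compares sequences token position by token position, padding included. A common alternative is masked mean pooling over the real tokens; a minimal sketch (the mean_pool helper is illustrative, not part of the package API):

def mean_pool(token_embeddings, attention_mask):
    # Zero out padding positions, then average over the remaining tokens
    mask = mx.expand_dims(attention_mask, -1).astype(token_embeddings.dtype)
    summed = mx.sum(token_embeddings * mask, axis=1)
    counts = mx.maximum(mx.sum(mask, axis=1), 1e-9)
    return summed / counts  # shape: (batch, dim)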

Supported Model Architectures

MLX-Embeddings supports a variety of model architectures for text embedding tasks. Here's a breakdown of the currently supported architectures:

  • XLM-RoBERTa (Cross-lingual Language Model - Robustly Optimized BERT Approach)
  • BERT (Bidirectional Encoder Representations from Transformers)
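
Any Hugging Face checkpoint built on one of these architectures should, in principle, load through the same API (the model name below is illustrative; conversion support may vary):

model, tokenizer = load("intfloat/multilingual-e5-base")  # an XLM-RoBERTa-based checkpoint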

We're continuously working to expand our support for additional model architectures. Check our GitHub repository or documentation for the most up-to-date list of supported models and their specific versions.

Contributing

Contributions to MLX-Embeddings are welcome! Please refer to our contribution guidelines for more information.

License

This project is licensed under the GNU General Public License v3.

Contact

For any questions or issues, please open an issue on the GitHub repository.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mlx_embeddings-0.0.1.tar.gz (21.7 kB)

Built Distribution

mlx_embeddings-0.0.1-py2.py3-none-any.whl (18.8 kB)

File details

Details for the file mlx_embeddings-0.0.1.tar.gz.

File metadata

  • Download URL: mlx_embeddings-0.0.1.tar.gz
  • Upload date:
  • Size: 21.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.19

File hashes

Hashes for mlx_embeddings-0.0.1.tar.gz
  • SHA256: e49ae6c8de476c0fdcd8bfc43e52fd3120f3ede76dba033866e6fd29422c7b72
  • MD5: 22badb4429de9d761a97f382b153d661
  • BLAKE2b-256: 45764eeb8f5058fbb39bf5894904847421961681129d84602e17326b112e663b

File details

Details for the file mlx_embeddings-0.0.1-py2.py3-none-any.whl.

File hashes

Hashes for mlx_embeddings-0.0.1-py2.py3-none-any.whl
  • SHA256: ef8a7ac73e1a68abc3aa4c873ccfa4d38c267c5703f46bb9dee9d95e401b717d
  • MD5: a74c10ddfcc6885f12ee096db6c5725b
  • BLAKE2b-256: 95a7a37e4e2b4799f1429647bb6b066f333138c3e9e0140787ae68b22719e029
