Train Embedding Models on Apple silicon with MLX and the Hugging Face Hub

Project description

MLX-Embeddings-LoRA

MLX-Embeddings-LoRA lets you train embedding models locally on Apple Silicon using MLX. It is built on top of mlx-embeddings and supports all models available in that package, with contrastive learning algorithms optimized for semantic search, retrieval, and similarity tasks. Supported model families include:

- Qwen3
- XLM-RoBERTa
- BERT
- ModernBERT

Features

🚀 Efficient Training Methods
- LoRA: Low-Rank Adaptation for efficient fine-tuning
- DoRA: Weight-Decomposed Low-Rank Adaptation
- Full-precision: Train all model parameters
- Quantized training: QLoRA with 4-bit, 6-bit, or 8-bit quantization

📊 Contrastive Learning Algorithms
- InfoNCE Loss: Temperature-scaled contrastive loss with in-batch negatives
- Multiple Negatives Ranking Loss: Efficient ranking with batch negatives
- Triplet Loss: Margin-based triplet optimization
- NT-Xent Loss: Normalized temperature-scaled cross entropy (SimCLR-style)

So far, only text-based embedding models and contrastive learning are supported; more features and algorithms are to come.

🔧 Flexible Dataset Support
- Hugging Face datasets
- JSONL files
- Optional negative examples (auto-generated from batch if not provided)

⚡ Apple Silicon Optimized
- Native MLX acceleration
- Memory-efficient training
- Gradient accumulation support

Installation

pip install -U mlx-embeddings-lora

Quick Start

Basic Training

mlx_embeddings_lora.train \
  --model mlx-community/all-MiniLM-L6-v2-4bit \
  --train \
  --data mlx-community/sentence-compression \
  --iters 600

With Configuration File

mlx_embeddings_lora.train --config config.yaml

Command-line flags will override corresponding values in the config file.
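
The override rule can be pictured as a simple merge: any value given on the command line replaces the one from the file, and anything left unset falls back to the file. The sketch below is illustrative only. The key names (batch_size, iters, learning_rate) and the merge_settings helper are assumptions made for this example, not the package's actual config schema or code.

```python
def merge_settings(file_config, cli_args):
    """Return file_config updated with every CLI value that was actually set.

    A CLI value of None means "flag not given", so the file value is kept.
    """
    merged = dict(file_config)
    for key, value in cli_args.items():
        if value is not None:
            merged[key] = value
    return merged


# Hypothetical values as they might be read from config.yaml.
file_config = {"batch_size": 32, "iters": 1000, "learning_rate": 5e-5}
# Hypothetical parsed flags: only --iters was passed on the command line.
cli_args = {"batch_size": None, "iters": 600, "learning_rate": None}

settings = merge_settings(file_config, cli_args)
print(settings)
```

Here only iters changes; batch_size and learning_rate keep the values from the file.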

Dataset Format

Your dataset should contain anchor-positive pairs:

JSONL Format

{"anchor": "How do I reset my password?", "positive": "What's the process for password recovery?", "negative": "What's the weather today?"}
{"anchor": "Python tutorial for beginners", "positive": "Learn Python basics step by step"}
{"anchor": "Machine learning introduction", "positive": "Getting started with ML", "negative": "JavaScript frameworks overview"}

Note: The negative field is optional. If not provided, the training algorithm will automatically use in-batch negatives from other examples in the batch.
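
A small script can help catch malformed rows before a training run. The following is a minimal sketch using only the Python standard library; it writes two of the example rows above to a file and checks that every row has a non-empty anchor and positive and, if present, a string negative. The file name train.jsonl and the validate_record and load_jsonl helpers are assumptions for illustration, not part of the package.

```python
import json
import os
import tempfile

REQUIRED_FIELDS = ("anchor", "positive")


def validate_record(record):
    """Check one training example; `negative` may be absent."""
    for field in REQUIRED_FIELDS:
        if not isinstance(record.get(field), str) or not record[field].strip():
            raise ValueError(f"missing or empty field: {field}")
    if "negative" in record and not isinstance(record["negative"], str):
        raise ValueError("negative must be a string when present")
    return record


def load_jsonl(path):
    """Read a JSONL file, validating every non-blank line."""
    with open(path, encoding="utf-8") as handle:
        return [validate_record(json.loads(line)) for line in handle if line.strip()]


examples = [
    {"anchor": "How do I reset my password?",
     "positive": "What's the process for password recovery?",
     "negative": "What's the weather today?"},
    {"anchor": "Python tutorial for beginners",
     "positive": "Learn Python basics step by step"},
]

# Write one JSON object per line; the file name train.jsonl is an assumption.
path = os.path.join(tempfile.mkdtemp(), "train.jsonl")
with open(path, "w", encoding="utf-8") as handle:
    for example in examples:
        handle.write(json.dumps(example, ensure_ascii=False) + "\n")

records = load_jsonl(path)
with_negative = sum("negative" in record for record in records)
print(f"{len(records)} records, {with_negative} with an explicit negative")
```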

Key Parameters

Training Method

--train-type: Choose training method
- lora (default): Low-Rank Adaptation
- dora: Weight-Decomposed Low-Rank Adaptation
- full: Full parameter fine-tuning

LoRA Configuration

--lora-rank: Rank of LoRA matrices (default: 16)
--lora-alpha: LoRA scaling factor (default: 32)
--lora-dropout: Dropout probability (default: 0.05)
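
To get a feel for what rank and alpha control, the sketch below follows the standard LoRA formulation, in which a frozen weight W of shape d_out x d_in is adapted by adding (alpha / rank) * B @ A, with A of shape rank x d_in and B of shape d_out x rank. The 384 x 384 layer size is a made-up example, and whether this package applies the alpha / rank scaling in exactly this way is an assumption.

```python
def lora_stats(d_in, d_out, rank, alpha):
    """Trainable-parameter count and scaling for one LoRA-adapted linear layer."""
    full_params = d_in * d_out            # frozen base weight W
    lora_params = rank * (d_in + d_out)   # A is rank x d_in, B is d_out x rank
    return {
        "full_params": full_params,
        "lora_params": lora_params,
        "trainable_fraction": lora_params / full_params,
        "scale": alpha / rank,            # multiplier applied to B @ A
    }


# Hypothetical 384 x 384 projection with the defaults listed above.
stats = lora_stats(d_in=384, d_out=384, rank=16, alpha=32)
print(stats)
```

Doubling the rank doubles the number of trainable parameters, while alpha only changes how strongly the learned update is weighted.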

Quantization

--quantize: Enable quantized training (QLoRA)
--quantize-bits: Quantization bits (4, 6, or 8)

Loss Function

--loss-type: Contrastive loss algorithm
- infonce: InfoNCE with temperature scaling (recommended)
- mnr: Multiple Negatives Ranking Loss
- triplet: Triplet loss with margin
- nt_xent: NT-Xent (SimCLR-style)
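
As an illustration of the margin-based option, here is a minimal pure-Python sketch of a triplet loss. It uses cosine distance and a margin of 0.5; both choices are assumptions for this example and may differ from what the package uses internally.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge on the gap between anchor-positive and anchor-negative distance."""
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(0.0, gap + margin)


# Toy 2-D vectors (made-up numbers).
anchor = [1.0, 0.0]
easy = triplet_loss(anchor, positive=[0.9, 0.1], negative=[0.0, 1.0])
hard = triplet_loss(anchor, positive=[0.9, 0.1], negative=[0.8, 0.3])
print(f"easy negative: {easy:.4f}, hard negative: {hard:.4f}")
```

A negative that is already farther from the anchor than the positive by at least the margin contributes zero loss; only negatives inside the margin produce a gradient.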

Training Hyperparameters

--batch-size: Training batch size (default: 32)
--learning-rate: Learning rate (default: 5e-5)
--iters: Number of training iterations (default: 1000)
--max-seq-length: Maximum sequence length (default: 512)
--gradient-accumulation-steps: Accumulate gradients over multiple steps

Core Training Parameters

# Model and data
--model <model_path>              # Model path or HF repo
--data <data_path>                # Dataset path or HF dataset name
--train-type lora                 # lora, dora, or full
--train-mode infonce              # infonce, mnr, triplet, nt_xent

# Training schedule
--batch-size 4                    # Batch size
--iters 1000                      # Training iterations
--epochs 3                        # Training epochs (ignored if iters set)
--learning-rate 1e-5              # Learning rate
--gradient-accumulation-steps 1   # Gradient accumulation

# Model architecture
--num-layers 16                   # Layers to fine-tune (-1 for all)
--max-seq-length 2048             # Maximum sequence length

# LoRA parameters
--lora-parameters '{"rank": 8, "dropout": 0.0, "scale": 10.0}'

# Optimization
--optimizer adam                  # adam, adamw, qhadam, muon
--lr-schedule cosine              # Learning rate schedule
--grad-checkpoint                 # Enable gradient checkpointing

# Quantization
--load-in-4bits                   # 4-bit quantization
--load-in-6bits                   # 6-bit quantization
--load-in-8bits                   # 8-bit quantization

# Monitoring
--steps-per-report 10             # Steps between loss reports
--steps-per-eval 200              # Steps between validation
--val-batches 25                  # Validation batches (-1 for all)
--wandb project_name              # WandB logging

# Checkpointing
--adapter-path ./adapters         # Save/load path for adapters
--save-every 100                  # Save frequency
--resume-adapter-file <path>      # Resume from checkpoint
--fuse                            # Fuse and save trained model
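
When runs are launched from a script, it can be convenient to assemble these flags from a dictionary. The sketch below builds an argument list using only flag names shown above; the build_train_command helper is hypothetical and not part of the package, and the example stops at printing the command rather than executing it.

```python
import shlex


def build_train_command(options):
    """Turn a dict of option names into an argv list for the training CLI.

    True becomes a bare switch, False/None is skipped, everything else
    is passed as "--name value".
    """
    argv = ["mlx_embeddings_lora.train"]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)
        elif value is False or value is None:
            continue
        else:
            argv.extend([flag, str(value)])
    return argv


command = build_train_command({
    "model": "mlx-community/all-MiniLM-L6-v2-4bit",
    "data": "mlx-community/sentence-compression",
    "train": True,
    "train_type": "lora",
    "batch_size": 4,
    "iters": 1000,
    "grad_checkpoint": True,
    "wandb": None,
})
print(shlex.join(command))
# To launch the run: subprocess.run(command, check=True)
```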

Advanced Features

Automatic Negative Sampling

If your dataset doesn't include negative examples, the training will automatically use in-batch negatives:

{"anchor": "Query 1", "positive": "Relevant doc 1"}
{"anchor": "Query 2", "positive": "Relevant doc 2"}
{"anchor": "Query 3", "positive": "Relevant doc 3"}

For each anchor, positives from other examples in the batch serve as negatives.
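
The idea can be sketched in a few lines of pure Python: build a similarity matrix between all anchors and all positives in the batch, then treat row i as a classification problem whose correct answer is column i. This is a generic InfoNCE formulation written for illustration, not the package's implementation; the temperature of 0.07 and the toy vectors are assumptions.

```python
import math


def normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce(anchors, positives, temperature=0.07):
    """Mean cross-entropy where row i's correct "class" is positive i.

    Every other positive in the batch acts as a negative for anchor i.
    """
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Cosine similarity of anchor i to every positive, scaled by temperature.
        logits = [sum(a * p for a, p in zip(anchor, pos)) / temperature
                  for pos in positives]
        # Numerically stable log-sum-exp.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum - logits[i]
    return total / len(anchors)


# Toy 2-D "embeddings" for a batch of three pairs (made-up numbers).
anchors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
shuffled = [aligned[1], aligned[2], aligned[0]]

print(f"aligned pairs:  {info_nce(anchors, aligned):.4f}")
print(f"shuffled pairs: {info_nce(anchors, shuffled):.4f}")
```

The loss is low when each anchor is closest to its own positive and high when the pairing is scrambled, which is the signal training exploits.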

Gradient Accumulation

For larger effective batch sizes with limited memory:

mlx_embeddings_lora.train \
  --model your-model \
  --batch-size 16 \
  --gradient-accumulation-steps 4  # Effective batch size: 64
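
The arithmetic behind the effective batch size can be checked with a toy model. The sketch below uses a one-parameter least-squares problem (not an embedding model) to show that averaging the gradients of 4 micro-batches of 16 gives the same gradient as a single batch of 64. The data and the mse_gradient helper are made up for illustration.

```python
def mse_gradient(weight, batch):
    """Gradient of mean((weight * x - y) ** 2) with respect to weight."""
    return sum(2.0 * (weight * x - y) * x for x, y in batch) / len(batch)


batch_size = 16
accumulation_steps = 4
effective_batch_size = batch_size * accumulation_steps

# Made-up data: 64 (x, y) pairs for a one-parameter model y = weight * x.
data = [(float(i), 3.0 * i + 1.0) for i in range(effective_batch_size)]
weight = 0.5

# Accumulate micro-batch gradients, averaging so the result matches one big batch.
accumulated = 0.0
for step in range(accumulation_steps):
    micro_batch = data[step * batch_size:(step + 1) * batch_size]
    accumulated += mse_gradient(weight, micro_batch) / accumulation_steps

full_batch = mse_gradient(weight, data)
print(effective_batch_size, accumulated, full_batch)
```

One caveat worth keeping in mind: this equivalence holds for losses that decompose per example. With in-batch negatives, each anchor is typically still contrasted only against the other examples in its own micro-batch, unless the implementation gathers embeddings across micro-batches.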

Model Export

After training, export your fine-tuned model and upload to Hugging Face:

mlx_embeddings_lora.export \
  --model ./output/checkpoint-1000 \
  --output ./my-finetuned-model \
  --repo username/model-name

Performance Tips

- Start with LoRA: it is more memory-efficient than full fine-tuning.
- Use in-batch negatives: skip explicit negatives for efficiency.
- Tune temperature: use lower values (0.05-0.07) for harder negatives and higher values (0.1-0.2) for softer ones.
- Batch size: larger batches provide more in-batch negatives, which generally improves results.
- Gradient accumulation: increase the effective batch size without running out of memory.
- QLoRA for large models: use 4-bit quantization for models with more than 1B parameters.
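
The temperature tip is easier to see with numbers. In a temperature-scaled contrastive loss, similarities are divided by the temperature before the softmax, so a lower temperature concentrates the probability mass on the highest-scoring candidates. The sketch below uses made-up similarity scores; it is a generic illustration, not output from the package.

```python
import math


def softmax(similarities, temperature):
    """Softmax over similarity scores divided by the temperature."""
    scaled = [s / temperature for s in similarities]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up cosine similarities: the true positive first, then three negatives.
similarities = [0.80, 0.70, 0.40, 0.10]

for temperature in (0.05, 0.1, 0.2):
    weights = softmax(similarities, temperature)
    print(temperature, [round(w, 3) for w in weights])
```

At the lowest temperature, almost all of the weight falls on the positive and the closest negative, so the easy negatives barely influence the loss; at higher temperatures the weight spreads more evenly across all candidates.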

Citation

If you use mlx-embeddings-lora in your research, please cite:

@software{mlx_embeddings_lora,
  title = {mlx-embeddings-lora: Efficient Embedding Model Training on Apple Silicon},
  author = {Gökdeniz Gülmez},
  year = {2025},
  url = {https://github.com/Goekdeniz-Guelmez/mlx-embeddings-lora}
}

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

Acknowledgments

- Built on MLX by Apple
- Extends mlx-embeddings
- Inspired by Sentence-Transformers

Project details

Release history

- 1.0.5 (Nov 13, 2025)
- 1.0.4 (Nov 13, 2025)
- 1.0.3 (Nov 13, 2025)
- 1.0.2 (Nov 13, 2025)
- 1.0.1 (Nov 13, 2025), this version
- 1.0.0 (Nov 13, 2025)
- 0.0.4 (Nov 13, 2025)
- 0.0.2 (Nov 13, 2025)
- 0.0.1 (Nov 13, 2025)

Download files

Source Distribution

mlx_embeddings_lora-1.0.1.tar.gz (20.4 kB)

Uploaded Nov 13, 2025 Source

Built Distribution

mlx_embeddings_lora-1.0.1-py3-none-any.whl (20.2 kB)

Uploaded Nov 13, 2025 Python 3

File details

Details for the file mlx_embeddings_lora-1.0.1.tar.gz.

File metadata

Download URL: mlx_embeddings_lora-1.0.1.tar.gz
Upload date: Nov 13, 2025
Size: 20.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mlx_embeddings_lora-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`cdba4925a085184eb9e14405f4601d5b0c44f07e1e10875438f55511685551c2`
MD5	`48ede053c676b4bdc517af0cb278a0b5`
BLAKE2b-256	`10181dab39ac387338d526eaa014f57f62d823af87c0d0ce4ef33055b963ef7d`

File details

Details for the file mlx_embeddings_lora-1.0.1-py3-none-any.whl.

File metadata

Download URL: mlx_embeddings_lora-1.0.1-py3-none-any.whl
Upload date: Nov 13, 2025
Size: 20.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mlx_embeddings_lora-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0382dd233e232d2d328e59fbd9871d2fdc2ed988222fce87296d2adfd046cbe2`
MD5	`9ef69b198b6337c6edf552ea3bb4bf4f`
BLAKE2b-256	`69117778e2e9fa771c51b913cd542d0907d0c27e34bd2ab54cb7b262b5218cdb`
