Embedding selection: A tool for selecting the best embedding model for your use case
Embedding Selector Framework
- This framework helps you automatically select the most suitable text embedding model for a given downstream use case.
- It analyzes task requirements (e.g., retrieval, classification, summarization), matches them against available embedding models, and evaluates performance on relevant benchmarks.
Features
- Use Case–Driven Selection: Takes a natural-language description of a use case and extracts structured metadata (e.g., languages, token limits, complexity).
- Metadata Extraction: Uses an LLM (GPT-4o) to normalize requirements into a standardized schema (parameters, memory, licensing, etc.).
- Model Matching: Filters embedding models based on attributes like size, efficiency, license, and language coverage.
- Task Alignment: Selects relevant evaluation tasks from MTEB (Massive Text Embedding Benchmark).
- Performance Evaluation: Loads benchmark results and computes average scores per candidate model.
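Metadata Extraction and Model Matching come down to two operations: turning the LLM's reply into a fixed schema, then applying each extracted requirement as a hard constraint on every candidate model. The sketch below is a minimal illustration, assuming the LLM is prompted to reply with JSON. The class names, the JSON keys (`languages`, `max_input_tokens`, `max_params_millions`, `max_memory_gb`, `allowed_licenses`) and the filtering rules are all hypothetical. They are not EmbedSelection's actual schema or API.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCaseRequirements:
    """Hypothetical schema for metadata extracted from a use-case description."""
    languages: frozenset[str]        # languages the embeddings must cover
    max_input_tokens: int            # longest input the model must accept
    max_params_millions: float       # parameter budget
    max_memory_gb: float             # memory budget
    allowed_licenses: frozenset[str] | None = None  # None means any license


@dataclass(frozen=True)
class EmbeddingModel:
    """Hypothetical per-model attributes, e.g. as listed on a leaderboard."""
    name: str
    languages: frozenset[str]
    max_input_tokens: int
    params_millions: float
    memory_gb: float
    license: str


def parse_requirements(llm_json: str) -> UseCaseRequirements:
    """Normalize the LLM's JSON reply into the schema (keys are hypothetical)."""
    data = json.loads(llm_json)
    licenses = data.get("allowed_licenses")
    return UseCaseRequirements(
        languages=frozenset(lang.lower() for lang in data["languages"]),
        max_input_tokens=int(data["max_input_tokens"]),
        max_params_millions=float(data["max_params_millions"]),
        max_memory_gb=float(data["max_memory_gb"]),
        # A missing or empty license list is treated as "no license restriction".
        allowed_licenses=frozenset(licenses) if licenses else None,
    )


def satisfies(model: EmbeddingModel, req: UseCaseRequirements) -> bool:
    """True if the model meets every hard constraint in the requirements."""
    return (
        req.languages <= model.languages  # covers all required languages
        and model.max_input_tokens >= req.max_input_tokens
        and model.params_millions <= req.max_params_millions
        and model.memory_gb <= req.max_memory_gb
        and (req.allowed_licenses is None or model.license in req.allowed_licenses)
    )


def filter_models(models: list[EmbeddingModel], req: UseCaseRequirements) -> list[EmbeddingModel]:
    """Keep only the candidates that satisfy all constraints, preserving input order."""
    return [m for m in models if satisfies(m, req)]
```

With this design, filtering is a cheap pass over model metadata. Only the survivors need the more expensive benchmark lookup.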
How It Works
The pipeline runs in sequential steps:
- Use Case Selection: Choose from predefined scenarios (chatbots, legal retrieval, recommendations, sentiment analysis, summarization, etc.) or provide your own description.
- Requirement Extraction (LLM Agent): GPT-4o parses the description into structured metadata, including:
  - Supported languages
  - Max token length
  - Memory usage & parameter limits
  - Task/domain classification
- Model Filtering: Candidate models from MTEB are filtered according to the extracted attributes.
- Task Evaluation: Candidate models are scored on the most relevant MTEB tasks (retrieval, classification, summarization, etc.) using the loaded MTEB benchmark results, averaged per model.
- Ranking & Export: Models are ranked by performance, with ties broken by efficiency, and exported to CSV for inspection.
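The evaluation, ranking and export steps can be sketched in a few functions. This is a minimal illustration. It assumes the benchmark results are already loaded into a nested dict of the form `{model: {task: score}}` and that parameter count is the efficiency measure used to break ties. The function names, the task names, the full-coverage policy and the CSV columns are hypothetical and not taken from EmbedSelection's code.

```python
from __future__ import annotations

import csv
import io
from statistics import fmean


def average_scores(results: dict[str, dict[str, float]], tasks: list[str]) -> dict[str, float]:
    """Average each model's scores over the selected tasks.

    Models missing a score on any selected task are skipped, so every
    average covers the same task set (an assumed policy).
    """
    if not tasks:
        raise ValueError("no evaluation tasks selected")
    averages = {}
    for model, task_scores in results.items():
        if all(t in task_scores for t in tasks):
            averages[model] = fmean(task_scores[t] for t in tasks)
    return averages


def rank_models(averages: dict[str, float], params_millions: dict[str, float]) -> list[str]:
    """Sort by average score (descending); equal scores go to the smaller model."""
    return sorted(averages, key=lambda m: (-averages[m], params_millions[m]))


def to_csv(ranking: list[str], averages: dict[str, float], params_millions: dict[str, float]) -> str:
    """Render the ranking as CSV text, ready to be written to a file for inspection."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["rank", "model", "average_score", "params_millions"])
    for rank, model in enumerate(ranking, start=1):
        writer.writerow([rank, model, f"{averages[model]:.2f}", params_millions[model]])
    return buf.getvalue()
```

Parameter count is only one reading of "efficiency". Memory footprint or embedding dimension would slot into the same sort key as a second tuple element.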
Usage
To use the tool, follow these steps:
- Install the package: pip install EmbedSelection
- Launch the tool: EmbedSelection
Contributing
Contributions to improve the tool are welcome! Feel free to open issues for bugs or feature requests, or submit pull requests for enhancements.
Acknowledgements
This project uses the MTEB (Massive Text Embedding Benchmark) leaderboard hosted on Hugging Face: https://huggingface.co/spaces/mteb/leaderboard
Download files
File details
Details for the file embedselection-1.0.tar.gz.
File metadata
- Download URL: embedselection-1.0.tar.gz
- Upload date:
- Size: 11.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fe681977f34c7c7b6b5be21cf92ea799072ed9f22564f3e61191d5ee34bc0232 |
| MD5 | 53d29c40339a3101ac625ad83181968a |
| BLAKE2b-256 | 9358f56df2a462c9063d6f3b64ee346728e1b1f3f8087b9196d47826da27e98b |
File details
Details for the file embedselection-1.0-py3-none-any.whl.
File metadata
- Download URL: embedselection-1.0-py3-none-any.whl
- Upload date:
- Size: 11.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 710360796aa25694d5755464d01a2622f8e1784a10b3129750e5eb89ef49585f |
| MD5 | 31c88f5912a03fb9ab59685aeca26af2 |
| BLAKE2b-256 | 226d3119e980ed82fd1ed5e983c7aa17475c64e5d28e7fc676a4aa7c3561f333 |
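The published SHA256 digests can be used to confirm that a downloaded file is intact. This is a minimal sketch using only the standard library `hashlib` module. It assumes the file has already been downloaded to a local path. The helper names are hypothetical and are not part of EmbedSelection.

```python
import hashlib
from collections.abc import Iterable


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Hash an iterable of byte chunks incrementally and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_sha256(path: str, chunk_size: int = 1 << 16) -> str:
    """Stream the file in fixed-size chunks so it never sits fully in memory."""
    with open(path, "rb") as f:
        return sha256_hex(iter(lambda: f.read(chunk_size), b""))


def matches_published(path: str, published_hex: str) -> bool:
    """Compare a local file's SHA256 with a published hex digest (case-insensitive)."""
    return file_sha256(path) == published_hex.strip().lower()


# Example (assumes the sdist was downloaded into the current directory):
# matches_published("embedselection-1.0.tar.gz",
#                   "fe681977f34c7c7b6b5be21cf92ea799072ed9f22564f3e61191d5ee34bc0232")
```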