
Embedding selection: A tool for selecting the best embedding model for your use case

Project description

Embedding Selector Framework

- This framework helps you automatically select the most suitable text embedding model for a given downstream use case.

- It analyzes task requirements (e.g., retrieval, classification, summarization), matches them against available embedding models, and evaluates performance on relevant benchmarks.


Features

  • Use Case–Driven Selection: Takes a natural-language description of a use case and extracts structured metadata (e.g., languages, token limits, complexity).
  • Metadata Extraction: Uses advanced LLM models to normalize requirements into a standardized schema (parameters, memory, licensing, etc.).
  • Model Matching: Filters embedding models based on attributes like size, efficiency, license, and language coverage.
  • Task Alignment: Selects relevant evaluation tasks from MTEB (Massive Text Embedding Benchmark).
  • Performance Evaluation: Loads benchmark results and computes average scores per candidate model.
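The metadata extraction and model matching steps amount to turning the LLM's reply into a structured requirements record and then keeping only the models whose metadata satisfies it. The sketch below illustrates that idea under stated assumptions: the Requirements and ModelInfo dataclasses, their fields, the JSON keys, and the parse_requirements and filter_models helpers are all hypothetical, not EmbedSelection's actual schema or API, and the model entries are placeholders rather than real MTEB metadata.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Requirements:
    """Hypothetical normalized requirements produced by the extraction step."""
    languages: set[str]
    min_max_tokens: int        # longest input (in tokens) the model must accept
    max_parameters_m: float    # parameter budget, in millions
    allowed_licenses: set[str] = field(default_factory=set)  # empty set = any license


@dataclass
class ModelInfo:
    """Hypothetical metadata record for one candidate embedding model."""
    name: str
    languages: set[str]
    max_tokens: int
    parameters_m: float
    license: str


def parse_requirements(raw: str) -> Requirements:
    """Convert an LLM's JSON reply (assumed keys) into a Requirements record."""
    data = json.loads(raw)
    return Requirements(
        languages=set(data["languages"]),
        min_max_tokens=int(data["max_tokens"]),
        max_parameters_m=float(data["max_parameters_m"]),
        allowed_licenses=set(data.get("licenses", [])),
    )


def filter_models(models: list[ModelInfo], req: Requirements) -> list[ModelInfo]:
    """Keep only the models that satisfy every hard requirement."""
    selected = []
    for m in models:
        if not req.languages <= m.languages:        # must cover every required language
            continue
        if m.max_tokens < req.min_max_tokens:       # context window too short
            continue
        if m.parameters_m > req.max_parameters_m:   # too large for the budget
            continue
        if req.allowed_licenses and m.license not in req.allowed_licenses:
            continue                                # license not acceptable
        selected.append(m)
    return selected


if __name__ == "__main__":
    # Stand-in for the LLM's structured output; keys are illustrative only.
    reply = ('{"languages": ["en", "de"], "max_tokens": 2048, '
             '"max_parameters_m": 1000, "licenses": ["mit", "apache-2.0"]}')
    req = parse_requirements(reply)
    # Placeholder catalogue; names and numbers are not real MTEB metadata.
    catalogue = [
        ModelInfo("model-a", {"en"}, 512, 110, "apache-2.0"),
        ModelInfo("model-b", {"en", "de", "fr"}, 8192, 560, "mit"),
        ModelInfo("model-c", {"en", "de"}, 8192, 7000, "apache-2.0"),
    ]
    print([m.name for m in filter_models(catalogue, req)])  # ['model-b']
```

Treating the requirements as hard filters keeps this step simple and predictable; softer preferences (such as favouring efficiency) fit more naturally into the later ranking step.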

How It Works

The pipeline runs in sequential steps:

  • Use Case Selection: Choose from predefined scenarios (chatbots, legal retrieval, recommendations, sentiment analysis, summarization, etc.) or provide your own description.
  • Requirement Extraction (LLM Agent): GPT-4o parses the description into structured metadata, including supported languages, maximum token length, memory usage and parameter limits, and task/domain classification.
  • Model Filtering: Candidate models from MTEB are filtered according to the extracted attributes.
  • Task Evaluation: Candidate models are benchmarked on the most relevant MTEB tasks (retrieval, classification, summarization, etc.).
  • Ranking & Export: Models are ranked by performance (with ties broken by efficiency) and exported to CSV for inspection.
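The last two steps can be pictured as averaging each candidate's scores over the selected tasks, sorting by that average, breaking ties by efficiency, and writing the result to CSV. This is a minimal sketch, not the package's implementation: the rank_models and write_csv helpers, the task IDs, and the scores are hypothetical placeholders, and parameter count is assumed here as the efficiency measure used for tie-breaking.

```python
import csv
import sys
from statistics import mean
from typing import TextIO

# (model name, average score, parameters in millions)
Row = tuple[str, float, float]


def rank_models(scores: dict[str, dict[str, float]],
                tasks: list[str],
                parameters_m: dict[str, float]) -> list[Row]:
    """Average each model's scores over the selected tasks and rank them."""
    if not tasks:
        raise ValueError("at least one task must be selected")
    rows: list[Row] = []
    for model, per_task in scores.items():
        # Only rank models with a result on every selected task,
        # so that all averages are computed over the same tasks.
        if not all(t in per_task for t in tasks):
            continue
        avg = mean(per_task[t] for t in tasks)
        rows.append((model, avg, parameters_m[model]))
    # Highest average first; on equal averages, the smaller model wins
    # (parameter count is used here as a stand-in for efficiency).
    rows.sort(key=lambda r: (-r[1], r[2]))
    return rows


def write_csv(rows: list[Row], stream: TextIO) -> None:
    """Write the ranking as CSV with a header row."""
    writer = csv.writer(stream)
    writer.writerow(["model", "avg_score", "parameters_m"])
    writer.writerows(rows)


if __name__ == "__main__":
    # Placeholder scores on two hypothetical task IDs; not real MTEB results.
    scores = {
        "model-a": {"T1": 0.5, "T2": 0.75},
        "model-b": {"T1": 0.625, "T2": 0.625},
        "model-c": {"T1": 0.9},               # no T2 result, so it is skipped
        "model-d": {"T1": 0.25, "T2": 0.5},
    }
    params = {"model-a": 335, "model-b": 110, "model-c": 22, "model-d": 33}
    ranked = rank_models(scores, ["T1", "T2"], params)
    # model-b and model-a tie at 0.625; model-b ranks first because it is smaller.
    write_csv(ranked, sys.stdout)
```

To save the ranking instead of printing it, open a file with newline="" (as the csv module documentation recommends) and pass that file object to write_csv. Skipping models with missing task results is one design choice; averaging over whatever results exist would let a model with fewer, easier tasks rank unfairly high.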

Usage

To use the tool, install the package from PyPI and then run the EmbedSelection command:

  pip install EmbedSelection

  EmbedSelection 

Contributing

Contributions to improve the tool are welcome! Feel free to open issues for bugs or feature requests, or submit pull requests for enhancements.

Acknowledgements

This project uses the MTEB benchmark from Hugging Face: https://huggingface.co/spaces/mteb/leaderboard

Release history

This version

1.0

Download files


Source Distribution

embedselection-1.0.tar.gz (11.1 kB)

Uploaded Source

Built Distribution


embedselection-1.0-py3-none-any.whl (11.4 kB)

Uploaded Python 3

File details

Details for the file embedselection-1.0.tar.gz.

File metadata

  • Download URL: embedselection-1.0.tar.gz
  • Upload date:
  • Size: 11.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for embedselection-1.0.tar.gz
Algorithm Hash digest
SHA256 fe681977f34c7c7b6b5be21cf92ea799072ed9f22564f3e61191d5ee34bc0232
MD5 53d29c40339a3101ac625ad83181968a
BLAKE2b-256 9358f56df2a462c9063d6f3b64ee346728e1b1f3f8087b9196d47826da27e98b
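The published digests let you confirm that a downloaded file is byte-for-byte the one that was uploaded. A minimal sketch using Python's standard hashlib module is shown below; the script name verify_hash.py and the sha256_of and verify helpers are hypothetical, and the expected value is the SHA256 digest listed above for embedselection-1.0.tar.gz.

```python
import hashlib
import sys

# Published SHA256 digest of embedselection-1.0.tar.gz, copied from this page.
EXPECTED_SHA256 = "fe681977f34c7c7b6b5be21cf92ea799072ed9f22564f3e61191d5ee34bc0232"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python verify_hash.py embedselection-1.0.tar.gz
    ok = verify(sys.argv[1], EXPECTED_SHA256)
    print("hash OK" if ok else "hash MISMATCH: do not install this file")
```

pip can perform the same check automatically at install time through its hash-checking mode (--require-hashes, with a --hash=sha256:... option on the requirement line).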


File details

Details for the file embedselection-1.0-py3-none-any.whl.

File metadata

  • Download URL: embedselection-1.0-py3-none-any.whl
  • Upload date:
  • Size: 11.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for embedselection-1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 710360796aa25694d5755464d01a2622f8e1784a10b3129750e5eb89ef49585f
MD5 31c88f5912a03fb9ab59685aeca26af2
BLAKE2b-256 226d3119e980ed82fd1ed5e983c7aa17475c64e5d28e7fc676a4aa7c3561f333

