sentence-transformers plugin for embcli

embcli-sbert

sbert (sentence-transformers) plugin for embcli, a command-line interface for embeddings.

Reference

Installation

pip install embcli-sbert

Quick Start

Try out the Embedding Models

# show general usage of the emb command.
emb --help

# list all available models.
emb models
SentenceTransformerModel
    Vendor: sbert
    Models:
    * sentence-transformers (aliases: sbert)
    Default Local Model: all-MiniLM-L6-v2
    See https://sbert.net/docs/sentence_transformer/pretrained_models.html for available local models.
    Model Options:

# get an embedding for an input text with an original sentence-transformers model; the default is all-MiniLM-L6-v2.
# the first run takes a while because the model is downloaded from Hugging Face Hub.
emb embed -m sbert "Embeddings are essential for semantic search and RAG apps."

# get an embedding for an input text with another model, all-mpnet-base-v2.
emb embed -m sbert/all-mpnet-base-v2 "Embeddings are essential for semantic search and RAG apps."

# get an embedding for an input text with a community model.
emb embed -m sbert/intfloat/multilingual-e5-small "Embeddings are essential for semantic search and RAG apps."

# calculate the similarity score between two texts with all-MiniLM-L6-v2. the default metric is cosine similarity.
emb simscore -m sbert "The cat drifts toward sleep." "Sleep dances in the cat's eyes."
0.8031787421988659
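
To make the default metric concrete, here is a minimal sketch of cosine similarity itself, computed with awk on toy vectors. It is not how emb computes the score internally; the `cosine` helper and the input vectors are hypothetical and only illustrate the formula dot(a, b) / (|a| * |b|).

```shell
# cosine: hypothetical helper, not part of emb.
# $1 and $2 are comma-separated vectors of equal length (toy values, not real embeddings).
cosine() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ","); m = split(b, y, ",")
    if (n != m) { print "length mismatch" > "/dev/stderr"; exit 1 }
    # accumulate the dot product and the squared norms in one pass
    for (i = 1; i <= n; i++) { dot += x[i]*y[i]; na += x[i]*x[i]; nb += y[i]*y[i] }
    printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}

cosine "1,0,0" "1,0,0"   # same direction: prints 1.0000
cosine "1,0,0" "0,1,0"   # orthogonal: prints 0.0000
```

A score near 1 means the two vectors point in nearly the same direction, which is why semantically close texts score higher. The sketch does not guard against all-zero vectors.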

Document Indexing and Search

You can use the emb command to index documents and run semantic search over them. emb uses Chroma as its default vector database.

# index example documents in the current directory.
emb ingest-sample -m sbert -c catcafe --corpus cat-names-en

# or, give the path to your own documents.
# the documents should be in a comma-separated CSV file with two columns: id and text.
emb ingest -m sbert -c catcafe -f <path-to-your-documents>

# search for a query in the indexed documents.
emb search -m sbert -c catcafe -q "Who's the naughtiest one?"
Found 5 results:
Score: 0.3956756932171536, Document ID: 25, Text: Nala: Nala is a graceful and queenly cat, often a beautiful cream or light tan color. She moves with quiet dignity and observes her surroundings with intelligent eyes. Nala is affectionate but discerning, choosing her moments for cuddles, and her loyalty to her family is unwavering, a truly regal companion.
Score: 0.39523976965995117, Document ID: 12, Text: Leo: Leo, with his magnificent mane-like ruff, carries himself with regal confidence. He is a natural leader, often surveying his domain from the highest point in the room. Affectionate on his own terms, Leo enjoys a good chin scratch and will reward loyalty with his rumbling purr and majestic presence.
Score: 0.3918249967723957, Document ID: 32, Text: Max: Max is a quintessential friendly cat, often a sturdy tabby, who is easygoing and loves everyone. He is playful in a relaxed way, enjoying a good game of chase-the-string but equally happy to lounge nearby. Max is a dependable companion, always ready with a comforting purr and a friendly nuzzle.
Score: 0.3913900431393664, Document ID: 54, Text: Jasper (II): Jasper the Second, distinct from his predecessor, is a playful and highly energetic ginger tom. He loves to chase, tumble, and explore every nook and cranny with boundless enthusiasm. Jasper is also incredibly affectionate, always ready for a cuddle after a vigorous play session, a bundle of orange joy.
Score: 0.38631855385121966, Document ID: 36, Text: Oscar: Oscar is a distinguished and somewhat opinionated cat, often a grumpy-looking but secretly soft Persian. He has his routines and prefers things a certain way but is deeply affectionate with his family. Oscar enjoys luxurious naps and will reward his humans with rumbling purrs when properly pampered.

# multilingual search
emb search -m sbert -c catcafe -q "一番のいたずら者は誰?"
Found 5 results:
Score: 0.3771080195010235, Document ID: 68, Text: Xavi: Xavi is an intelligent and agile cat, perhaps a sleek black or Oriental breed, quick on his feet and sharp in mind. He enjoys interactive toys that challenge him and loves to explore high places. Xavi is affectionate with his family, often engaging them in playful banter or quiet cuddles.
Score: 0.376757642611273, Document ID: 95, Text: Yoshi: Yoshi is a playful and endearing cat, often with a slightly goofy charm that wins everyone over. He loves interactive toys, especially those he can chase and pounce on. Yoshi is very affectionate, always eager for a pet or a warm lap, his happy purrs filling the room.
Score: 0.37384416079962984, Document ID: 81, Text: Kai: Kai is a sleek and agile cat, perhaps with exotic origins, possessing a cool and composed demeanor. He is an excellent hunter of toys and enjoys surveying his domain from high perches. Kai is affectionate with his trusted humans, offering quiet companionship and a rumbling purr, a mysteriously charming feline.
Score: 0.373308241432645, Document ID: 48, Text: Winston: Winston is a distinguished and thoughtful cat, perhaps a British Shorthair, with a calm and composed demeanor. He enjoys observing his surroundings from a comfortable perch and appreciates a predictable routine. Winston is a loyal and affectionate companion, offering quiet comfort and steadfast friendship to his household.
Score: 0.37157731687555895, Document ID: 88, Text: Remi: Remi is a charming and artistic soul, perhaps a cat with unique markings or a flair for dramatic poses. He is playful and enjoys creative games, often inventing his own. Remi is also very affectionate, loving to cuddle and purr, bringing a touch of whimsy and love to his home.
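
As a sketch of the input that `emb ingest` expects, the snippet below writes a small comma-separated file with the two columns described above. The file name `my-cats.csv` and the rows are made up for illustration. It also assumes that the file starts with an `id,text` header row and that text containing commas is wrapped in double quotes in the usual CSV way; neither detail is confirmed here, so check the embcli documentation if ingestion fails.

```shell
# Write a hypothetical two-column CSV (id, text); rows are illustrative sample data.
# Assumption: a header row names the columns, and fields with commas are double-quoted.
cat > my-cats.csv <<'EOF'
id,text
1,"Mochi: Mochi is a calm, fluffy cat who naps in sunny windows."
2,"Pixel: Pixel is a curious kitten who chases anything that moves."
EOF

# Quick sanity check: show the header and count the data rows.
head -n 1 my-cats.csv
echo "data rows: $(($(wc -l < my-cats.csv) - 1))"
```

The resulting file can then be passed to the ingest command shown above, for example `emb ingest -m sbert -c catcafe -f my-cats.csv`.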

Development

See the main README for general development instructions.

Run Tests

RUN_SBERT_TESTS=1 uv run --package embcli-sbert pytest packages/embcli-sbert/tests/

Run Linter and Formatter

uv run ruff check --fix packages/embcli-sbert
uv run ruff format packages/embcli-sbert

Run Type Checker

uv run --package embcli-sbert pyright packages/embcli-sbert

Build

uv build --package embcli-sbert

License

Apache License 2.0
