embcli-cohere

cohere plugin for embcli, a command-line interface for embeddings.

Reference

Installation

pip install embcli-cohere

Quick Start

You need a Cohere API key to use this plugin. Set the COHERE_API_KEY environment variable in a .env file in the current directory, or pass the path to an env file with the -e option.

cat .env
COHERE_API_KEY=<YOUR_COHERE_KEY>
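Before calling the API, it can help to confirm that the env file really defines the key. The following is a minimal sketch and not part of embcli: it assumes a plain KEY=VALUE file format, and the helper names read_env_file and has_cohere_key are hypothetical. How emb itself loads the file is not covered here.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_cohere_key(path=".env"):
    """Return True if the file exists and defines a non-empty COHERE_API_KEY."""
    if not os.path.isfile(path):
        return False
    return bool(read_env_file(path).get("COHERE_API_KEY"))


if __name__ == "__main__":
    # Checks the .env file in the current directory.
    print(has_cohere_key())
```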

Try out the Embedding Models

# show general usage of the emb command.
emb --help

# list all available models.
emb models
CohereEmbeddingModel
    Vendor: cohere
    Models:
    * embed-v4.0 (aliases: embed-v4)
    * embed-english-v3.0 (aliases: embed-en-v3)
    * embed-english-light-v3.0 (aliases: embed-en-light-v3)
    * embed-multilingual-v3.0 (aliases: embed-multiling-v3)
    * embed-multilingual-light-v3.0 (aliases: embed-multiling-light-v3)
    Model Options:
    * input_type (str) - The type of input, affecting how the model processes it. Options include 'search_document', 'search_query', 'classification', 'clustering', 'image'.
    * embedding_type (str) - The type of embeddings to return. Options include 'float', 'int8', 'uint8', 'binary', 'ubinary'
    * truncate (str) - How to handle text inputs that exceed the model's token limit. Options include 'none', 'start', 'end', 'middle'.

# get an embedding for an input text with the embed-v4.0 model.
emb embed -m embed-v4 "Embeddings are essential for semantic search and RAG apps."

# get an embedding for an input text with the embed-v4.0 model and input_type=search_query.
emb embed -m embed-v4 "Embeddings are essential for semantic search and RAG apps." -o input_type search_query

# get an embedding for an input text with the embed-v4.0 model and embedding_type=uint8.
emb embed -m embed-v4 "Embeddings are essential for semantic search and RAG apps." -o embedding_type uint8

# calculate the similarity score between two texts with the embed-v4.0 model. the default metric is cosine similarity.
emb simscore -m embed-v4 "The cat drifts toward sleep." "Sleep dances in the cat's eyes."
0.6656540804655765
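
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The snippet below is a plain-Python sketch of that formula on toy vectors. It is not the code emb runs, and the function name cosine_similarity is only illustrative.

```python
import math


def cosine_similarity(a, b):
    """Return dot(a, b) / (|a| * |b|) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 2-dimensional vectors; real embeddings have many more dimensions.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

The score depends only on the angle between the vectors, not on their magnitudes, so scaling either embedding leaves it unchanged.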

Document Indexing and Search

You can use the emb command to index documents and perform semantic search. emb uses Chroma as its default vector database.

# index example documents in the current directory.
emb ingest-sample -m embed-v4 -c catcafe --corpus cat-names-en

# or, you can give the path to your documents.
# the documents should be in a CSV file with two columns: id and text. the separator should be a comma.
emb ingest -m embed-v4 -c catcafe -f <path-to-your-documents>
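
One way to produce such a file is Python's csv module, which handles quoting when the text itself contains commas. This is a sketch under assumptions: the header row naming the columns is a guess that should be checked against the embcli documentation, and the file name, helper name, and sample rows are hypothetical.

```python
import csv


def write_documents_csv(path, documents):
    """Write (id, text) pairs to a comma-separated file with columns id and text."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)  # the default delimiter is a comma
        # ASSUMPTION: a header row naming the two columns is expected.
        writer.writerow(["id", "text"])
        for doc_id, text in documents:
            writer.writerow([doc_id, text])


# Hypothetical sample rows; text containing commas is quoted automatically.
write_documents_csv(
    "my_docs.csv",
    [
        (1, "Mochi: a calm, gentle cat who naps in sunny spots."),
        (2, "Pepper: a curious cat who explores every box."),
    ],
)
```

The resulting my_docs.csv can then be passed to emb ingest with the -f option.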

# search for a query in the indexed documents.
emb search -m embed-v4 -c catcafe -q "Who's the naughtiest one?"
Found 5 results:
Score: 0.4193700736149105, Document ID: 97, Text: Alfie: Alfie is a cheerful and mischievous little cat, always getting into playful trouble with a charming innocence. He loves exploring small spaces and batting at dangling objects. Alfie is incredibly affectionate, quick to purr and eager for cuddles, a delightful bundle of joy and entertainment for his humans.
Score: 0.4187830451687781, Document ID: 76, Text: Frankie: Frankie is a boisterous and playful cat, full of charm and mischief. He loves to zoom around the house and engage in energetic play sessions, especially with crinkly toys. Frankie is also very affectionate, often seeking out his humans for cuddles and purrs after his bursts of energy, a fun-loving friend.
Score: 0.41594965013771756, Document ID: 46, Text: Bandit: Bandit is a mischievous cat, often with mask-like markings, always on the lookout for his next playful heist of a toy or treat. He is clever and energetic, loving to chase and pounce. Despite his roguish name, Bandit is a loving companion who enjoys a good cuddle after his adventures.
Score: 0.41532520111462273, Document ID: 28, Text: Loki: Loki is a mischievous and clever cat, always finding new ways to entertain himself, sometimes at his humans' expense. He is a master of stealth and surprise attacks on toys. Despite his playful trickery, Loki is incredibly charming and affectionate, easily winning hearts with his roguish appeal.
Score: 0.4081888584294111, Document ID: 50, Text: Dexter: Dexter is a clever and sometimes quirky cat, always up to something interesting. He might have a fascination with running water or a particular toy he carries everywhere. Dexter is highly intelligent and enjoys interactive play, keeping his humans entertained with his unique personality and amusing antics, a truly engaging companion.

# multilingual search
emb search -m embed-v4 -c catcafe -q "一番のいたずら者は誰?"
Found 5 results:
Score: 0.4211751179260085, Document ID: 46, Text: Bandit: Bandit is a mischievous cat, often with mask-like markings, always on the lookout for his next playful heist of a toy or treat. He is clever and energetic, loving to chase and pounce. Despite his roguish name, Bandit is a loving companion who enjoys a good cuddle after his adventures.
Score: 0.41704963047944504, Document ID: 28, Text: Loki: Loki is a mischievous and clever cat, always finding new ways to entertain himself, sometimes at his humans' expense. He is a master of stealth and surprise attacks on toys. Despite his playful trickery, Loki is incredibly charming and affectionate, easily winning hearts with his roguish appeal.
Score: 0.3999017194050878, Document ID: 76, Text: Frankie: Frankie is a boisterous and playful cat, full of charm and mischief. He loves to zoom around the house and engage in energetic play sessions, especially with crinkly toys. Frankie is also very affectionate, often seeking out his humans for cuddles and purrs after his bursts of energy, a fun-loving friend.
Score: 0.3997923784831019, Document ID: 97, Text: Alfie: Alfie is a cheerful and mischievous little cat, always getting into playful trouble with a charming innocence. He loves exploring small spaces and batting at dangling objects. Alfie is incredibly affectionate, quick to purr and eager for cuddles, a delightful bundle of joy and entertainment for his humans.
Score: 0.3969699024640684, Document ID: 24, Text: Gizmo: Gizmo is an endearingly quirky cat, full of curious habits and playful antics. He might bat at imaginary foes or carry his favorite small toy everywhere. Gizmo is incredibly entertaining and loves attention, often performing his unique tricks for his amused human audience, always bringing a smile.
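
Conceptually, a search like the ones above scores every indexed document vector against the query vector and returns the best matches. The brute-force sketch below illustrates that idea on toy vectors. It is not how emb or Chroma implement search; the cosine metric and the default of five results are assumptions taken from the sample output, and all names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=5):
    """Score each (doc_id, vector) pair and return the k best as (score, doc_id)."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
    return scored[:k]


# Toy index of 2-dimensional vectors standing in for document embeddings.
toy_index = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
for score, doc_id in top_k([1.0, 0.2], toy_index, k=2):
    print(f"Score: {score}, Document ID: {doc_id}")
```

A vector database avoids this full scan by using an index, but the ranking principle is the same: documents whose embeddings point in a direction similar to the query's come first.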

Development

See the main README for general development instructions.

Run Tests

You need a Cohere API key to run the tests for the embcli-cohere package. Pass it as an environment variable; the command below also sets RUN_COHERE_TESTS=1:

COHERE_API_KEY=<YOUR_COHERE_KEY> RUN_COHERE_TESTS=1 uv run --package embcli-cohere pytest packages/embcli-cohere/tests/

Run Linter and Formatter

uv run ruff check --fix packages/embcli-cohere
uv run ruff format packages/embcli-cohere

Run Type Checker

uv run --package embcli-cohere pyright packages/embcli-cohere

Build

uv build --package embcli-cohere

License

Apache License 2.0
