jina plugin for embcli

embcli-jina

jina plugin for embcli, a command-line interface for embeddings.

Reference

Installation

pip install embcli-jina

Quick Start

You need a Jina API key to use this plugin. Set the JINA_API_KEY environment variable in a .env file in the current directory, or pass the path to an env file with the -e option.

cat .env
JINA_API_KEY=<YOUR_JINA_KEY>
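
Before calling the API, it can help to confirm that the key is actually visible. The following is a minimal sketch, not how embcli itself loads the file: `read_env_file` and `has_jina_key` are hypothetical helpers that assume a plain `KEY=VALUE` layout like the `.env` shown above.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_jina_key(path=".env"):
    """Return True if JINA_API_KEY is set in the process env or in the env file."""
    if os.environ.get("JINA_API_KEY"):
        return True
    env_path = Path(path)
    return env_path.is_file() and bool(read_env_file(env_path).get("JINA_API_KEY"))


if __name__ == "__main__":
    print("JINA_API_KEY found" if has_jina_key() else "JINA_API_KEY missing")
```

Pass a different path to `has_jina_key` if you keep the env file elsewhere and plan to use the -e option.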

Try out the Embedding Models

# show general usage of emb command.
emb --help

# list all available models.
emb models
JinaEmbeddingModel
    Vendor: jina
    Models:
    * jina-embeddings-v3 (aliases: jina-v3)
    * jina-colbert-v2 (aliases: colbert-v2)
    * jina-embeddings-v2-base-code (aliases: jina-v2-code)
    Model Options:
    * task (str) - Downstream task for which the embeddings are used. Supported tasks: 'text-matching', 'retrieval.query', 'retrieval.passage', 'separation', 'classification'. Only supported in jina-embeddings-v3.
    * late_chunking (bool) - Whether if the late chunking is applied. Only supported in jina-embeddings-v3.
    * truncate (bool) - When enabled, the model will automatically drop the tail that extends beyond the maximum context length allowed by the model instead of throwing an error. Only supported in jina-embeddings-v3.
    * dimensions (int) - The number of dimensions the resulting output embeddings should have. Only supported in jina-embeddings-v3 and jina-colbert-v2.
    * input_type (str) - The type of input to the model. Supported types: 'query', 'document' Only supported in jina-corebert-v2.
    * embedding_type (str) - The type of embeddings to return. Options include 'float', 'binary', 'ubinary'. Default is 'float'.
JinaMultiModalModel
    Vendor: jina
    Models:
    * jina-embeddings-v4 (aliases: jina-v4)
    * jina-clip-v2 (aliases: )
    Model Options:
    * task (str) - Downstream task for which the embeddings are used. Supported tasks: 'retrieval.query', 'retrieval.passage', 'text-matching', 'code.query', 'code.passage'.
    * late_chunking (bool) - Whether if the late chunking is applied. Only supported in jina-embeddings-v4.
    * truncate (bool) - When enabled, the model will automatically drop the tail that extends beyond the maximum context length allowed by the model instead of throwing an error. Only supported in jina-embeddings-v4.
    * dimensions (int) - The number of dimensions the resulting output embeddings should have.
    * embedding_type (str) - The type of embeddings to return. Options include 'float', 'binary', 'ubinary'. Default is 'float'.

# get an embedding for an input text by jina-embeddings-v3 model.
emb embed -m jina-v3 "Embeddings are essential for semantic search and RAG apps."

# get an embedding for an input text by jina-embeddings-v3 model with dimensions=512.
emb embed -m jina-v3 "Embeddings are essential for semantic search and RAG apps." -o dimensions 512

# get an embedding for an input text by jina-embeddings-v3 model with embedding_type=binary.
emb embed -m jina-v3 "Embeddings are essential for semantic search and RAG apps." -o embedding_type binary

# get an embedding for an image input by jina-embeddings-v4 model.
# assume you have an image file named `gingercat.jpeg` in the current directory.
emb embed -m jina-v4 --image gingercat.jpeg

# calculate similarity score between two texts by jina-embeddings-v3 model. the default metric is cosine similarity.
emb simscore -m jina-v3 "The cat drifts toward sleep." "Sleep dances in the cat's eyes."
0.708945856730407
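
For reference, cosine similarity, the default metric named above, is the dot product of two vectors divided by the product of their lengths. The sketch below is a plain-Python illustration of the formula, not embcli's own implementation; `cosine_similarity` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy vectors only; real embeddings have hundreds of dimensions.
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

A score near 1 means the two vectors point in almost the same direction, and a score near 0 means they are close to orthogonal.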

Document Indexing and Search

You can use the emb command to index documents and search them. emb uses LanceDB as its default vector database.

# index example documents in the current directory.
emb ingest-sample -m jina-v3 -c catcafe --corpus cat-names-en

# or, you can give the path to your documents.
# the documents should be in a CSV file with two columns: id and text. the separator should be comma.
emb ingest -m jina-v3 -c catcafe -f <path-to-your-documents>

# search for a query in the indexed documents.
emb search -m jina-v3 -c catcafe -q "Who's the naughtiest one?"
Found 5 results:
Score: 0.45097012297560646, Document ID: 12, Text: Leo: Leo, with his magnificent mane-like ruff, carries himself with regal confidence. He is a natural leader, often surveying his domain from the highest point in the room. Affectionate on his own terms, Leo enjoys a good chin scratch and will reward loyalty with his rumbling purr and majestic presence.
Score: 0.4291541094385421, Document ID: 46, Text: Bandit: Bandit is a mischievous cat, often with mask-like markings, always on the lookout for his next playful heist of a toy or treat. He is clever and energetic, loving to chase and pounce. Despite his roguish name, Bandit is a loving companion who enjoys a good cuddle after his adventures.
Score: 0.4137949268906759, Document ID: 20, Text: Pepper: Pepper is a feisty and energetic grey tabby with a spicy personality. She is quick-witted and loves to engage in playful stalking and pouncing games. Pepper is also fiercely independent but will show her affection with sudden bursts of purring and head-butts, keeping her humans on their toes.
Score: 0.40369800611316564, Document ID: 35, Text: Lucy: Lucy is a sweet-natured and playful cat, often a ginger or calico, with a bright personality. She loves attention and will often seek out her humans for cuddles and playtime. Lucy is very expressive, using chirps and meows to communicate her desires, her joyful spirit lighting up the household.
Score: 0.4031877012247693, Document ID: 3, Text: Pippin (Pip): Pippin, or Pip, is a compact dynamo, brimming with mischievous charm and boundless curiosity. He’s an intrepid explorer, always finding new hideouts or investigating forbidden territories with a twinkle in his eye. Quite vocal, Pip will happily chat about his day, his playful antics making him an endearing little rascal.

# multilingual search
emb search -m jina-v3 -c catcafe -q "一番のいたずら者は誰?"
Found 5 results:
Score: 0.41762481997209167, Document ID: 12, Text: Leo: Leo, with his magnificent mane-like ruff, carries himself with regal confidence. He is a natural leader, often surveying his domain from the highest point in the room. Affectionate on his own terms, Leo enjoys a good chin scratch and will reward loyalty with his rumbling purr and majestic presence.
Score: 0.40111028920595193, Document ID: 46, Text: Bandit: Bandit is a mischievous cat, often with mask-like markings, always on the lookout for his next playful heist of a toy or treat. He is clever and energetic, loving to chase and pounce. Despite his roguish name, Bandit is a loving companion who enjoys a good cuddle after his adventures.
Score: 0.37882908929187215, Document ID: 20, Text: Pepper: Pepper is a feisty and energetic grey tabby with a spicy personality. She is quick-witted and loves to engage in playful stalking and pouncing games. Pepper is also fiercely independent but will show her affection with sudden bursts of purring and head-butts, keeping her humans on their toes.
Score: 0.3777527161730029, Document ID: 22, Text: Simba: Simba, true to his namesake, possesses a brave and noble spirit, often seen patrolling his territory. He is a confident and affectionate leader of his household pride. While he enjoys playful roughhousing, Simba is also a gentle giant, offering comforting purrs and loyal companionship to his beloved humans.
Score: 0.37738051225556507, Document ID: 3, Text: Pippin (Pip): Pippin, or Pip, is a compact dynamo, brimming with mischievous charm and boundless curiosity. He’s an intrepid explorer, always finding new hideouts or investigating forbidden territories with a twinkle in his eye. Quite vocal, Pip will happily chat about his day, his playful antics making him an endearing little rascal.
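
To prepare your own input for `emb ingest`, the file needs the two comma-separated columns described above, id and text. The sketch below writes such a file with Python's csv module, which quotes any text that itself contains commas. It assumes a header row of `id,text`; the function name, file name, and sample rows are illustrative only.

```python
import csv


def write_documents_csv(path, documents):
    """Write (id, text) pairs to a comma-separated file with an id,text header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)  # the default delimiter is a comma
        writer.writerow(["id", "text"])  # assumed header row
        for doc_id, text in documents:
            writer.writerow([doc_id, text])


if __name__ == "__main__":
    docs = [
        (1, "Leo: a natural leader, affectionate on his own terms."),
        (2, "Bandit: clever, energetic, and always after a toy."),
    ]
    write_documents_csv("my-documents.csv", docs)
```

The resulting path can then be passed to the -f option of `emb ingest`.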

Development

See the main README for general development instructions.

Run Tests

You need to have a Jina API key to run the tests for the embcli-jina package. You can set it up as an environment variable:

JINA_API_KEY=<YOUR_JINA_KEY> RUN_JINA_TESTS=1 uv run --package embcli-jina pytest packages/embcli-jina/tests/

Run Linter and Formatter

uv run ruff check --fix packages/embcli-jina
uv run ruff format packages/embcli-jina

Run Type Checker

uv run --package embcli-jina pyright packages/embcli-jina

Build

uv build --package embcli-jina

License

Apache License 2.0
