
jina plugin for embcli


embcli-jina


Jina plugin for embcli, a command-line interface for embeddings.

Reference

Installation

pip install embcli-jina

Quick Start

You need a Jina API key to use this plugin. Set the JINA_API_KEY environment variable in a .env file in the current directory, or give the path to an env file with the -e option.

cat .env
JINA_API_KEY=<YOUR_JINA_KEY>
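Before running emb, it can help to confirm that the key is actually reachable. The sketch below is a hypothetical pre-flight helper, not part of embcli: it parses simple KEY=VALUE lines from a .env file (or from the file you would pass with -e) and, as an assumption about how the key may be resolved, also accepts a JINA_API_KEY already exported in the shell.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines from an env file into a dict.

    Minimal sketch: blank lines, comments ('#') and lines without '=' are
    skipped, and matching single or double quotes around a value are removed.
    Dedicated loaders (e.g. python-dotenv) support more syntax than this.
    """
    values: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def jina_key_available(env_path: str = ".env") -> bool:
    """Hypothetical check: is JINA_API_KEY set in the environment or the env file?"""
    if os.environ.get("JINA_API_KEY"):  # assumption: an exported key also counts
        return True
    if not Path(env_path).is_file():
        return False
    return bool(load_env_file(env_path).get("JINA_API_KEY"))


if __name__ == "__main__":
    print("JINA_API_KEY found:", jina_key_available())
```

Pass a different path, e.g. `jina_key_available("config/jina.env")`, when you plan to point emb at another env file with -e.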

Try out the Embedding Models

# show general usage of emb command.
emb --help

# list all available models.
emb models
JinaEmbeddingModel
    Vendor: jina
    Models:
    * jina-embeddings-v3 (aliases: jina-v3)
    * jina-colbert-v2 (aliases: colbert-v2)
    * jina-embeddings-v2-base-code (aliases: jina-v2-code)
    Model Options:
    * task (str) - Downstream task for which the embeddings are used. Supported tasks: 'text-matching', 'retrieval.query', 'retrieval.passage', 'separation', 'classification'. Only supported in jina-embeddings-v3.
    * late_chunking (bool) - Whether late chunking is applied. Only supported in jina-embeddings-v3.
    * truncate (bool) - When enabled, the model will automatically drop the tail that extends beyond the maximum context length allowed by the model instead of throwing an error. Only supported in jina-embeddings-v3.
    * dimensions (int) - The number of dimensions the resulting output embeddings should have. Only supported in jina-embeddings-v3 and jina-colbert-v2.
    * input_type (str) - The type of input to the model. Supported types: 'query', 'document'. Only supported in jina-colbert-v2.
JinaClipModel
    Vendor: jina
    Models:
    * jina-clip-v2 (aliases: )
    Model Options:
    * task (str) - Downstream task for which the embeddings are used. Supported tasks: 'retrieval.query', 'retrieval.passage'.
    * dimensions (int) - The number of dimensions the resulting output embeddings should have.

# get an embedding for an input text by jina-embeddings-v3 model.
emb embed -m jina-v3 "Embeddings are essential for semantic search and RAG apps."

# get an embedding for an input text by jina-embeddings-v3 model with dimensions=512.
emb embed -m jina-v3 "Embeddings are essential for semantic search and RAG apps." -o dimensions 512

# get an embedding for an image input by jina-clip-v2 model.
# assume you have an image file named `gingercat.jpeg` in the current directory.
emb embed -m jina-clip-v2 --image gingercat.jpeg

# calculate the similarity score between two texts by jina-embeddings-v3 model. the default metric is cosine similarity.
emb simscore -m jina-v3 "The cat drifts toward sleep." "Sleep dances in the cat's eyes."
0.708945856730407
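simscore reports cosine similarity by default, i.e. dot(a, b) / (|a| |b|). The sketch below computes it directly and also illustrates, client-side, what a smaller `dimensions` value means. It assumes the model was trained Matryoshka-style (as Jina describes jina-embeddings-v3), so that the leading components of a vector form a usable shorter embedding once renormalized; how the Jina API truncates server-side may differ in detail. Both function names are hypothetical, not embcli APIs.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity dot(a, b) / (|a| * |b|), in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def truncate_embedding(vec: Sequence[float], dimensions: int) -> list[float]:
    """Keep the first `dimensions` components and L2-renormalize.

    Assumption: Matryoshka-style training, where a prefix of the full
    vector is itself a meaningful (lower-resolution) embedding.
    """
    if not 0 < dimensions <= len(vec):
        raise ValueError("dimensions must be between 1 and len(vec)")
    head = list(vec[:dimensions])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


if __name__ == "__main__":
    a = [0.6, 0.8, 0.0, 0.0]
    b = [0.8, 0.6, 0.0, 0.0]
    print(round(cosine_similarity(a, b), 2))  # 0.96
    print(truncate_embedding([3.0, 4.0, 12.0], 2))  # [0.6, 0.8]
```

Because cosine similarity ignores vector length, renormalizing after truncation does not change the score; it only keeps the shortened vectors unit-length, which is convenient if you later switch to a plain dot product.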

Document Indexing and Search

You can use the emb command to index documents and perform semantic search. emb uses Chroma as the default vector database.
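To index your own documents, `emb ingest` takes a comma-separated CSV file with two columns, id and text. Free-form passages often contain commas or quotes, so writing the file with Python's csv module (which quotes such fields) is safer than joining strings by hand. Whether emb expects an `id,text` header row is an assumption here; the hypothetical helper below writes one by default and can omit it.

```python
import csv
from pathlib import Path


def write_corpus_csv(path: str, docs: dict[str, str], header: bool = True) -> None:
    """Write documents to a comma-separated CSV with columns id and text.

    csv.writer quotes fields containing commas, quotes, or newlines, so the
    text survives a round trip through any standard CSV reader.
    Assumption: a leading "id,text" header row; pass header=False to omit it.
    """
    with Path(path).open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)  # default delimiter is a comma
        if header:
            writer.writerow(["id", "text"])
        for doc_id, text in docs.items():
            writer.writerow([doc_id, text])


if __name__ == "__main__":
    write_corpus_csv(
        "my-cats.csv",
        {
            "1": "Leo: regal, confident, and fond of chin scratches.",
            "2": 'Bandit: a mischievous cat who "borrows" toys and treats.',
        },
    )
```

The resulting file can then be passed to the ingest command, e.g. `emb ingest -m jina-v3 -c catcafe -f my-cats.csv`.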

# index example documents in the current directory.
emb ingest-sample -m jina-v3 -c catcafe --corpus cat-names-en

# or, you can give the path to your documents.
# the documents should be in a CSV file with two columns: id and text. the separator should be comma.
emb ingest -m jina-v3 -c catcafe -f <path-to-your-documents>

# search for a query in the indexed documents.
emb search -m jina-v3 -c catcafe -q "Who's the naughtiest one?"
Found 5 results:
Score: 0.45097012297560646, Document ID: 12, Text: Leo: Leo, with his magnificent mane-like ruff, carries himself with regal confidence. He is a natural leader, often surveying his domain from the highest point in the room. Affectionate on his own terms, Leo enjoys a good chin scratch and will reward loyalty with his rumbling purr and majestic presence.
Score: 0.4291541094385421, Document ID: 46, Text: Bandit: Bandit is a mischievous cat, often with mask-like markings, always on the lookout for his next playful heist of a toy or treat. He is clever and energetic, loving to chase and pounce. Despite his roguish name, Bandit is a loving companion who enjoys a good cuddle after his adventures.
Score: 0.4137949268906759, Document ID: 20, Text: Pepper: Pepper is a feisty and energetic grey tabby with a spicy personality. She is quick-witted and loves to engage in playful stalking and pouncing games. Pepper is also fiercely independent but will show her affection with sudden bursts of purring and head-butts, keeping her humans on their toes.
Score: 0.40369800611316564, Document ID: 35, Text: Lucy: Lucy is a sweet-natured and playful cat, often a ginger or calico, with a bright personality. She loves attention and will often seek out her humans for cuddles and playtime. Lucy is very expressive, using chirps and meows to communicate her desires, her joyful spirit lighting up the household.
Score: 0.4031877012247693, Document ID: 3, Text: Pippin (Pip): Pippin, or Pip, is a compact dynamo, brimming with mischievous charm and boundless curiosity. He’s an intrepid explorer, always finding new hideouts or investigating forbidden territories with a twinkle in his eye. Quite vocal, Pip will happily chat about his day, his playful antics making him an endearing little rascal.

# multilingual search
emb search -m jina-v3 -c catcafe -q "一番のいたずら者は誰?"
Found 5 results:
Score: 0.41762481997209167, Document ID: 12, Text: Leo: Leo, with his magnificent mane-like ruff, carries himself with regal confidence. He is a natural leader, often surveying his domain from the highest point in the room. Affectionate on his own terms, Leo enjoys a good chin scratch and will reward loyalty with his rumbling purr and majestic presence.
Score: 0.40111028920595193, Document ID: 46, Text: Bandit: Bandit is a mischievous cat, often with mask-like markings, always on the lookout for his next playful heist of a toy or treat. He is clever and energetic, loving to chase and pounce. Despite his roguish name, Bandit is a loving companion who enjoys a good cuddle after his adventures.
Score: 0.37882908929187215, Document ID: 20, Text: Pepper: Pepper is a feisty and energetic grey tabby with a spicy personality. She is quick-witted and loves to engage in playful stalking and pouncing games. Pepper is also fiercely independent but will show her affection with sudden bursts of purring and head-butts, keeping her humans on their toes.
Score: 0.3777527161730029, Document ID: 22, Text: Simba: Simba, true to his namesake, possesses a brave and noble spirit, often seen patrolling his territory. He is a confident and affectionate leader of his household pride. While he enjoys playful roughhousing, Simba is also a gentle giant, offering comforting purrs and loyal companionship to his beloved humans.
Score: 0.37738051225556507, Document ID: 3, Text: Pippin (Pip): Pippin, or Pip, is a compact dynamo, brimming with mischievous charm and boundless curiosity. He’s an intrepid explorer, always finding new hideouts or investigating forbidden territories with a twinkle in his eye. Quite vocal, Pip will happily chat about his day, his playful antics making him an endearing little rascal.
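Conceptually, `emb search` embeds the query with the same model used at ingest time and returns the stored documents whose vectors are closest to it; the Japanese query above can match English documents because the multilingual model places both in a shared vector space. The brute-force sketch below illustrates that ranking step with cosine similarity over an in-memory index. It is a swapped-in illustration, not what emb does internally: Chroma uses an approximate nearest-neighbour (HNSW) index, and the scores emb prints need not equal raw cosine values. All names are hypothetical.

```python
import heapq
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(
    query_vec: Sequence[float],
    index: dict[str, Sequence[float]],
    top_k: int = 5,
) -> list[tuple[float, str]]:
    """Return up to top_k (score, doc_id) pairs, highest similarity first.

    Exhaustive scan: every stored vector is scored, then the best top_k
    are kept with a heap, which is O(n log k) for n documents.
    """
    scored = ((cosine(query_vec, vec), doc_id) for doc_id, vec in index.items())
    return heapq.nlargest(top_k, scored)


if __name__ == "__main__":
    toy_index = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
    for score, doc_id in search([1.0, 0.1], toy_index, top_k=2):
        print(f"Score: {score:.3f}, Document ID: {doc_id}")
```

An exhaustive scan is exact but grows linearly with the corpus, which is why vector databases such as Chroma trade a little recall for speed with approximate indexes.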

Development

See the main README for general development instructions.

Run Tests

You need to have a Jina API key to run the tests for the embcli-jina package. You can set it up as an environment variable:

JINA_API_KEY=<YOUR_JINA_KEY> RUN_JINA_TESTS=1 uv run --package embcli-jina pytest packages/embcli-jina/tests/

Run Linter and Formatter

uv run ruff check --fix packages/embcli-jina
uv run ruff format packages/embcli-jina

Run Type Checker

uv run --package embcli-jina pyright packages/embcli-jina

Build

uv build --package embcli-jina

License

Apache License 2.0



Download files


Source Distribution

embcli_jina-0.1.1.tar.gz (245.7 kB)

Uploaded Source

Built Distribution


embcli_jina-0.1.1-py3-none-any.whl (7.5 kB)

Uploaded Python 3

File details

Details for the file embcli_jina-0.1.1.tar.gz.

File metadata

  • Download URL: embcli_jina-0.1.1.tar.gz
  • Size: 245.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.9

File hashes

Hashes for embcli_jina-0.1.1.tar.gz
Algorithm Hash digest
SHA256 fe29c858c6217dc1c31b0401950cfa27ea54d35f943f71206c629c3d0868a1bc
MD5 bb995a57727b55880cefd18f93cf57c2
BLAKE2b-256 740105f2d26a0342d8cff2f60e9af08c48747cede4e8bc61d62f0ffa600874af


File details

Details for the file embcli_jina-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for embcli_jina-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5c0c720a3e8e59ba952c7224c544308b23df64d4d1615b9ff95eff37a7486965
MD5 d6c220188079f01af5a6cd834f5e62c9
BLAKE2b-256 817c3ff6aca8865bb66c8879379807781a7c9ec50cca5dc2725a82047e36f3ee

