
Standalone CLI for OSS Vector operations with DashScope embeddings


Alibaba Cloud OSS Vectors Embed CLI

Alibaba Cloud OSS Vectors Embed CLI is a standalone command-line tool that simplifies working with vector embeddings in OSS Vectors. With single commands, you can generate vector embeddings for your data by using Alibaba Cloud DashScope, then store and query them in your OSS vector index.

Alibaba Cloud OSS Vectors Embed CLI is in preview release and is subject to change.

Supported Commands

oss-vectors-embed put: Embed text, file content, or OSS objects and store them as vectors in an OSS vector index. A single put command both creates the embeddings and ingests them into your OSS vector index. You specify the input data to embed, an Alibaba Cloud DashScope embedding model ID, your OSS vector bucket name, and your OSS vector index name. Supported inputs include a text string, a local text or image file, an OSS text or image object or prefix, and a video URL or OSS video object. The command generates embeddings with the dimensions configured in your OSS vector index properties. When you ingest several objects from an OSS prefix or a local file path, the command automatically processes them in batches to maximize throughput.

Note: Each file is processed as a single embedding. Document chunking is not currently supported.

oss-vectors-embed query: Embed a query input and search for similar vectors in an OSS vector index. A single query command performs the whole similarity search. You specify your query input, an Alibaba Cloud DashScope embedding model ID, the vector bucket name, and the vector index name. The query input can be a text string, a local text or image file, a single OSS text or image object, or a video URL. The command embeds your query with the specified model and then runs a similarity search to find the most relevant matches. You can control the number of results returned, apply metadata filters to narrow the search, and choose whether to include similarity distances in the results.

Supported Input Types

Note: This CLI passes all model-specific parameters through a single, unified --dashscope-inference-params parameter. In addition, the query command takes its input from one of the following parameters:

  • --text-value: Direct text query string (preferred for text queries)
  • --text: Text file path (local file or OSS URI)
  • --image: Image file path (local file, OSS URI, or HTTP(S) URI)
  • --video: Video file path (URI)

Installation and Configuration

Prerequisites

  • Python 3.9 or higher
  • Alibaba Cloud credentials configured for running the CLI
  • An Alibaba Cloud account with permissions to use Alibaba Cloud DashScope and OSS Vectors
  • Access to an Alibaba Cloud DashScope embedding model
  • An Alibaba Cloud OSS vector bucket and vector index to store your embeddings

Quick Install (Recommended)

pip install oss-vectors-embed-cli

Development Install

# Clone the repository
git clone https://github.com/aliyun/oss-vectors-embed-cli.git
cd oss-vectors-embed-cli

# Install in development mode
pip install -e .

Note: All dependencies are automatically installed when you install the package via pip.

Quick Start

Configure credentials

  1. Set your OSS credentials as environment variables:
export OSS_ACCESS_KEY_ID="your access key id"
export OSS_ACCESS_KEY_SECRET="your access key secret"
  1. Set your DashScope API key as an environment variable:
export DASHSCOPE_API_KEY="YOUR_DASHSCOPE_API_KEY"
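Before running the examples, you can confirm that all three variables are actually exported, so the CLI process inherits them. The following is a minimal bash sketch; the function name check_oss_vectors_env is hypothetical and not part of the CLI. It uses printenv, which only reports exported variables, and that is exactly what a child process such as oss-vectors-embed sees.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oss-vectors-embed): report which of the
# credential variables from the Quick Start are missing or empty.
check_oss_vectors_env() {
  local status=0 var
  for var in OSS_ACCESS_KEY_ID OSS_ACCESS_KEY_SECRET DASHSCOPE_API_KEY; do
    # printenv prints only exported variables, i.e. what a child process
    # such as the CLI inherits; empty output means missing or empty.
    if [ -z "$(printenv "$var")" ]; then
      echo "missing: $var" >&2
      status=1
    fi
  done
  return "$status"
}

if check_oss_vectors_env; then
  echo "All credential variables are set."
fi
```

A variable assigned without export (for example DASHSCOPE_API_KEY=... on its own line) is reported as missing, because the CLI would not see it either.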

Put Examples

Examples for the Text-Embedding Model

Note: Four general-purpose text embedding models are available: text-embedding-v1, text-embedding-v2, text-embedding-v3, and text-embedding-v4. The examples below use text-embedding-v4.

  1. Embed a text string and store it as a vector in your OSS vector index:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id text-embedding-v4 \
  --text-value "Hello, world!"
  1. Process local text files:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id text-embedding-v4 \
  --text "./documents/sample.txt"
  1. Process files from a local file path using wildcard characters:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id text-embedding-v4 \
  --text "./documents/*.txt"
  1. Process files from an OSS general purpose bucket using wildcard characters:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  put \
  --region cn-hangzhou \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id text-embedding-v4 \
  --text "oss://bucket/path/*"
  1. Add metadata alongside your vectors:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  put \
  --region cn-hangzhou \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id text-embedding-v4 \
  --text "oss://my-bucket/sample.txt" \
  --metadata '{"category": "technology", "version": "1.0"}'
  1. Use custom model parameters:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id text-embedding-v4 \
  --text-value "Custom parameters" \
  --dashscope-inference-params '{"output_type": "dense", "dimension": "1024"}'
  1. Use custom vector key:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id text-embedding-v4 \
  --text-value "Custom vector key" \
  --key "text-1"
  1. Use OSS object key as vector key:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  put \
  --region cn-hangzhou \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id text-embedding-v4 \
  --text "oss://my-bucket/sample.txt" \
  --filename-as-key
  1. Use filename as vector key for batch processing:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id text-embedding-v4 \
  --text "./documents/*.txt" \
  --filename-as-key
  1. Use key prefix with custom key:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id text-embedding-v4 \
  --text-value "Use key prefix with custom key" \
  --key "text-1" \
  --key-prefix "prefix-a/"
  1. Use key prefix with filename:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id text-embedding-v4 \
  --text "./documents/sample.txt" \
  --filename-as-key \
  --key-prefix "prefix-doc/"
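The key options above combine by prepending --key-prefix to the chosen key. To predict the keys you will later look up, the bash sketch below previews them. It rests on assumptions that the documentation does not confirm: that the prefix is concatenated verbatim, and that --filename-as-key uses the object key for OSS inputs and the base file name for local inputs. preview_vector_key is a hypothetical helper, not a CLI feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: preview the vector key that `put` would write.
# Assumptions (not confirmed by the CLI documentation):
#   - --key-prefix is prepended verbatim, with no separator added;
#   - with --filename-as-key, an OSS input uses its object key and a local
#     input uses its base file name.
# Usage: preview_vector_key PREFIX key|filename VALUE
preview_vector_key() {
  local prefix=$1 mode=$2 value=$3 key
  case "$mode" in
    key)
      key=$value                    # custom --key value, used as-is
      ;;
    filename)
      case "$value" in
        oss://*)
          key=${value#oss://}       # bucket/path/file.txt
          key=${key#*/}             # path/file.txt (the object key)
          ;;
        *)
          key=${value##*/}          # local path: keep only the file name
          ;;
      esac
      ;;
    *)
      echo "mode must be 'key' or 'filename'" >&2
      return 2
      ;;
  esac
  printf '%s%s\n' "$prefix" "$key"
}

preview_vector_key "prefix-a/" key "text-1"                        # prefix-a/text-1
preview_vector_key "prefix-doc/" filename "./documents/sample.txt" # prefix-doc/sample.txt
preview_vector_key "" filename "oss://my-bucket/sample.txt"       # sample.txt
```

If the CLI's actual keys differ from these previews, trust the CLI output; the helper only encodes the assumptions listed above.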

Examples for the Multimodal-Embedding Model

Note: Four general-purpose multimodal embedding models are available: multimodal-embedding-v1, tongyi-embedding-vision-flash, tongyi-embedding-vision-plus, and qwen2.5-vl-embedding. The examples below use qwen2.5-vl-embedding.

  1. Embed a text string with a multimodal model and store it as a vector in your OSS vector index:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id qwen2.5-vl-embedding \
  --text-value "Hello, world!"
  1. Process image files using a local file path:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id qwen2.5-vl-embedding \
  --image "./images/photo.jpg"
  1. Process a video file from a URL:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id qwen2.5-vl-embedding \
  --video "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250107/lbcemt/new+video.mp4"
  1. Process files from a local file path using wildcard characters:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id qwen2.5-vl-embedding \
  --image "./documents/*.jpg"
  1. Process files from an OSS general purpose bucket using wildcard characters:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  put \
  --region cn-hangzhou \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id qwen2.5-vl-embedding \
  --image "oss://bucket/path/*"
  1. Process a video file stored in OSS by using a presigned URL:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  put \
  --region cn-hangzhou \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id qwen2.5-vl-embedding \
  --video "oss://bucket/path/example.mp4" \
  --presign-url
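Each put call takes a single --video input (a URI, or an OSS object with --presign-url), and no wildcard form is documented for video. To ingest several videos, one option is a shell loop that calls put once per URL. This is a sketch: the URL list file, the helper name put_videos, and the variables ACCOUNT_ID and OSS_VECTORS_EMBED are hypothetical; the CLI flags are the ones used in the examples above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one `put` per video URL listed in a file.
# ACCOUNT_ID and OSS_VECTORS_EMBED are hypothetical variables; set
# OSS_VECTORS_EMBED=echo for a dry run that only prints the arguments.
put_videos() {
  local url_file=$1 url
  local cli=${OSS_VECTORS_EMBED:-oss-vectors-embed}
  while IFS= read -r url || [ -n "$url" ]; do
    # Skip blank lines and comment lines starting with '#'.
    case "$url" in
      ''|'#'*) continue ;;
    esac
    # </dev/null keeps the CLI from consuming the rest of the URL list on stdin.
    "$cli" \
      --account-id "$ACCOUNT_ID" \
      --vectors-region cn-hangzhou \
      put \
      --vector-bucket-name my-bucket \
      --index-name my-index \
      --model-id qwen2.5-vl-embedding \
      --video "$url" < /dev/null || echo "failed: $url" >&2
  done < "$url_file"
}

# Example (videos.txt holds one URL per line):
#   ACCOUNT_ID=12***345 put_videos videos.txt
```

Running the calls sequentially keeps the sketch simple; a failed URL is reported on stderr and the loop continues with the next one.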

Query Examples

  1. Direct text query:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  query \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id text-embedding-v4 \
  --text-value "query text" \
  --top-k 20
  1. Query using a local text file:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  query \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id text-embedding-v4 \
  --text "./documents/query.txt" \
  --top-k 20 \
  --output table
  1. Query using an OSS text file:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  query \
  --region cn-hangzhou \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id text-embedding-v4 \
  --text "oss://my-bucket/query.txt" \
  --top-k 20 
  1. Image query:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  query \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id qwen2.5-vl-embedding \
  --image "./documents/image.jpg" \
  --top-k 20 
  1. Text: Query with metadata filters:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  query \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id text-embedding-v4 \
  --text-value "query text" \
  --filter '{"category": {"$eq": "technology"}}' \
  --top-k 20 \
  --return-metadata
  1. Text: Query with multiple metadata filters (AND):
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  query \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id text-embedding-v4 \
  --text-value "query text" \
  --filter '{"$and": [{"category": "technology"}, {"version": "1.0"}]}' \
  --top-k 20 \
  --return-metadata
  1. Text: Query with multiple metadata filters (OR):
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  query \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id text-embedding-v4 \
  --text-value "query text" \
  --filter '{"$or": [{"category": "docs"}, {"category": "guides"}]}' \
  --top-k 20
  1. Text: Query with metadata filters (comparison operators):
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  query \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id text-embedding-v4 \
  --text-value "query text" \
  --filter '{"$and": [{"category": "tech"}, {"version": {"$eq": "1.0"}}]}' \
  --top-k 20
  1. Qwen2.5: Query with custom model parameters:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  query \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id qwen2.5-vl-embedding \
  --text-value "search query with custom truncation" \
  --dashscope-inference-params '{"truncate": "END"}' \
  --top-k 20 \
  --return-distance
  1. Query in debug mode:
oss-vectors-embed \
  --account-id 12***345 \
  --vectors-region cn-hangzhou \
  --debug \
  query \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id text-embedding-v4 \
  --text-value "query text" \
  --top-k 20
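The AND filter example combines equality conditions as {"$and": [{"key": "value"}, ...]}. When the conditions come from a script, assembling that JSON by hand is error-prone. The bash sketch below (hypothetical helper and_filter) builds the same shape from key=value pairs. It performs no JSON escaping, so it assumes keys and values contain no double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an $and metadata filter from key=value pairs,
# matching the shape of the "multiple metadata filters (AND)" example.
# Limitation: no JSON escaping; keys/values must not contain " or \.
and_filter() {
  local pair key value joined=""
  for pair in "$@"; do
    key=${pair%%=*}     # text before the first '='
    value=${pair#*=}    # text after the first '='
    if [ -n "$joined" ]; then
      joined+=", "      # separate conditions after the first one
    fi
    joined+="{\"$key\": \"$value\"}"
  done
  # Single quotes keep "$and" literal in the printf format string.
  printf '{"$and": [%s]}\n' "$joined"
}

and_filter category=technology version=1.0
# prints {"$and": [{"category": "technology"}, {"version": "1.0"}]}

# Example:
#   FILTER=$(and_filter category=technology version=1.0)
#   oss-vectors-embed --account-id 12***345 --vectors-region cn-hangzhou query \
#     --vector-bucket-name my-bucket --index-name my-index \
#     --model-id text-embedding-v4 --text-value "query text" --filter "$FILTER"
```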

Command Parameters

Global Options

  • --debug: Enable debug mode with detailed logging for troubleshooting
  • --account-id: Alibaba Cloud account ID
  • --vectors-region: Region of the OSS vector bucket, such as cn-hangzhou (defaults to the configured value)
  • --vectors-endpoint: Endpoint (domain name) used to access the OSS vector bucket
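Every example repeats the global options before the subcommand. One way to avoid that is a small shell function that prepends them from environment variables. The sketch below is a convenience under stated assumptions: the function name ove and the variables OSS_VECTORS_ACCOUNT_ID, OSS_VECTORS_REGION, OSS_VECTORS_DEBUG, and OSS_VECTORS_EMBED are hypothetical and are read only by this function, not by the CLI itself. It passes only the documented flags --account-id, --vectors-region, and --debug.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: prepend the global options to an oss-vectors-embed call.
# OSS_VECTORS_ACCOUNT_ID, OSS_VECTORS_REGION, OSS_VECTORS_DEBUG and
# OSS_VECTORS_EMBED are hypothetical variables read only by this function.
ove() {
  local cli=${OSS_VECTORS_EMBED:-oss-vectors-embed}
  if [ -z "${OSS_VECTORS_ACCOUNT_ID:-}" ]; then
    echo "set OSS_VECTORS_ACCOUNT_ID first" >&2
    return 1
  fi
  # Global options go before the subcommand, as in the examples in this README.
  # cn-hangzhou is only the example region used throughout this README.
  local args=(--account-id "$OSS_VECTORS_ACCOUNT_ID"
              --vectors-region "${OSS_VECTORS_REGION:-cn-hangzhou}")
  if [ "${OSS_VECTORS_DEBUG:-0}" = "1" ]; then
    args+=(--debug)
  fi
  "$cli" "${args[@]}" "$@"
}

# Example:
#   export OSS_VECTORS_ACCOUNT_ID=12***345
#   ove query --vector-bucket-name my-bucket --index-name my-index \
#     --model-id text-embedding-v4 --text-value "query text" --top-k 20
```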

Put Command Parameters

Required:

  • --vector-bucket-name: Name of the OSS vector bucket
  • --index-name: Name of the vector index in which to store the vector embeddings
  • --model-id: DashScope model ID to use for generating embeddings (e.g., text-embedding-v4, qwen2.5-vl-embedding)

Input Options (one required):

  • --text-value: Direct text input to embed
  • --text: Text input - supports multiple input types:
    • Local file: ./document.txt
    • Local files with wildcard characters: ./data/*.txt, ~/docs/*.md
    • OSS object: oss://bucket/path/file.txt
    • OSS path with wildcard characters: oss://bucket/path/* (prefix-based, not extension-based)
  • --image: Image input - supports multiple input types:
    • Local file: ./document.jpg
    • Local wildcard: ./data/*.jpg
    • OSS object: oss://bucket/path/file.jpg
    • URI: https://path/pic.jpg
    • OSS path with wildcard characters: oss://bucket/path/* (prefix-based, not extension-based)
  • --video: Video input - supports multiple input types:
    • URI: https://path/video.mp4
    • OSS object (with --presign-url): oss://bucket/path/example.mp4
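Because OSS wildcard paths are prefix-based rather than extension-based, oss://bucket/path/* selects every object whose key starts with path/, whatever its extension. The bash sketch below (hypothetical helper oss_prefix_matches) illustrates that rule on a list of object keys read from stdin. It approximates the documented behavior and is not the CLI's matching code; in particular, it assumes that keys nested deeper under the prefix also match, as a plain prefix listing would.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given an OSS wildcard URI such as oss://bucket/path/*,
# print the object keys (one per line on stdin) that fall under its prefix.
# Mirrors the documented "prefix-based, not extension-based" rule; matching
# of nested keys is an assumption.
oss_prefix_matches() {
  local uri=$1 prefix key
  prefix=${uri#oss://}    # bucket/path/*
  prefix=${prefix#*/}     # path/*
  prefix=${prefix%\*}     # path/  (drop the trailing wildcard)
  while IFS= read -r key; do
    # A key matches when it starts with the prefix, regardless of extension.
    case "$key" in
      "$prefix"*) printf '%s\n' "$key" ;;
    esac
  done
}

# Example: prints path/a.txt and path/photo.jpg, but not other/c.txt
#   printf '%s\n' path/a.txt path/photo.jpg other/c.txt \
#     | oss_prefix_matches "oss://bucket/path/*"
```

In practice this means a text-only ingestion should point at a prefix that holds only text objects, since an extension such as .jpg does not exclude anything.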

Optional:

  • --region: Region of the OSS general purpose bucket that holds the input (takes effect only for oss:// inputs)
  • --key: Uniquely identifies each vector in the vector index (default: auto-generated UUID)
  • --key-prefix: Prefix to prepend to all vector keys (works with --key, --filename-as-key, and auto-generated UUIDs)
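  • --filename-as-key: Use the file name (for OSS inputs, the object key) as the vector key (mutually exclusive with --key)
  • --presign-url: Access an OSS input object through a presigned URL, as in the OSS video example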
  • --metadata: Additional metadata associated with the vector; provided as JSON string
  • --dashscope-inference-params: Model-specific parameters passed to DashScope (JSON format, e.g., '{"dimension": "1024"}')
  • --max-workers: Maximum parallel workers for batch processing (default: 4)
  • --batch-size: Number of vectors per OSS Vectors put_vectors call (1-500, default: 500)
  • --output: Output format (json or table, default: json)
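Because each file is processed as a single embedding, ingesting N files yields N vectors, and with --batch-size B each put_vectors call carries at most B of them, so at least ceil(N / B) calls are needed. The bash sketch below computes that lower bound and checks B against the documented 1-500 range. put_vectors_calls is a hypothetical helper; the real number of calls may be higher if the CLI sends partially filled batches (for example, per worker).

```shell
#!/usr/bin/env bash
# Hypothetical helper: minimum number of put_vectors calls for N vectors when
# each call carries at most BATCH_SIZE vectors (documented range 1-500).
# Usage: put_vectors_calls N [BATCH_SIZE]   (BATCH_SIZE defaults to 500)
put_vectors_calls() {
  local n=$1 size=${2:-500}
  if [ "$size" -lt 1 ] || [ "$size" -gt 500 ]; then
    echo "batch size must be between 1 and 500" >&2
    return 1
  fi
  # Integer ceiling division: (n + size - 1) / size
  echo $(( (n + size - 1) / size ))
}

put_vectors_calls 1200       # 3 calls at the default batch size of 500
put_vectors_calls 1200 100   # 12 calls
```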

Query Command Parameters

Core Required Parameters:

  • --vector-bucket-name: Name of the OSS vector bucket
  • --index-name: Name of the vector index
  • --model-id: DashScope model ID to use for generating embeddings (e.g., text-embedding-v4, qwen2.5-vl-embedding)

Query Input Parameters (One Required):

  • --text-value: Direct text query string
  • --text: Text file path (local file or OSS URI)
  • --image: Image file path (local file, OSS URI, or HTTP(S) URI)
  • --video: Video file path (URI)

Optional Parameters:

  • --region: Region of the OSS general purpose bucket that holds the input (takes effect only for oss:// inputs)
  • --top-k: Number of results to return (default: 30)
  • --filter: Filter expression for metadata-based filtering (JSON format with Alibaba Cloud OSS Vectors API operators)
  • --dashscope-inference-params: Model-specific parameters passed to DashScope (JSON format, e.g., '{"truncate": "END"}')
  • --return-metadata: Include metadata in results (default: true)
  • --return-distance: Include similarity distance scores
  • --output: Output format (table or json, default: json)
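--filter and --dashscope-inference-params (like --metadata for put) take JSON strings, and shell quoting mistakes such as single-quoted keys inside the JSON are easy to make. A quick local check before calling the CLI is to parse the string with Python's standard json.tool module, assuming a python3 executable is on your PATH (the CLI already requires Python 3.9 or higher). is_valid_json is a hypothetical helper; it checks only JSON syntax, not whether OSS Vectors accepts the filter operators or DashScope accepts the parameters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if $1 is syntactically valid JSON.
# Python's standard-library json.tool exits non-zero on invalid input.
is_valid_json() {
  printf '%s' "$1" | python3 -m json.tool > /dev/null 2>&1
}

FILTER='{"$and": [{"category": "technology"}, {"version": "1.0"}]}'
if is_valid_json "$FILTER"; then
  echo "filter is valid JSON"
else
  echo "filter is not valid JSON" >&2
fi
```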

Query Examples:

# Direct text query (preferred method)
oss-vectors-embed --account-id 12***345 --vectors-region cn-hangzhou query --vector-bucket-name my-bucket --index-name my-index \
  --model-id text-embedding-v4 --text-value "search text" --top-k 10

# Text file query
oss-vectors-embed --account-id 12***345 --vectors-region cn-hangzhou query --vector-bucket-name my-bucket --index-name my-index \
  --model-id text-embedding-v4 --text ./query.txt --top-k 5

# Image query
oss-vectors-embed --account-id 12***345 --vectors-region cn-hangzhou query --vector-bucket-name my-bucket --index-name my-index \
  --model-id qwen2.5-vl-embedding --image ./query-image.jpg --top-k 3



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


oss_vectors_embed_cli-0.1.1-py3-none-any.whl (40.7 kB)

Uploaded Python 3

File details

Details for the file oss_vectors_embed_cli-0.1.1-py3-none-any.whl.


File hashes

Hashes for oss_vectors_embed_cli-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c7e99246e773b91c65a2aa2c669d3ef4c44acb0fb2af842857cd142bb9068153
MD5 cc5e51cff7fadf5be6c9bbb9515fa2c0
BLAKE2b-256 dac84fc80b9ac9762df49e1e294c14650682aab7f450bb007bc180b56f21ab81

