emb3d.co command-line interface for working with embeddings.
emb3d
emb3d is a command-line utility that lets you generate embeddings using models from OpenAI, Cohere and HuggingFace.
Installation
pip install --upgrade emb3d
Quick Start ⚡️
Install the library
pip install -U emb3d
Prepare your input file
emb3d expects a JSONL file as input. Each line of the file should be a JSON object with a text key. Example input file:
{"text": "I love my dog"}
{"text": "I love my cat"}
{"text": "I love my rabbit"}
Your files can optionally include other fields, such as IDs or categorical labels; they are saved as-is in the final output file.
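If your texts live in a Python program rather than a file, the input format described above can be produced with a few lines of standard-library code. This is a minimal sketch, not part of emb3d: the helper name `write_jsonl`, the file name `inputs.jsonl`, and the extra `id` and `label` fields are all illustrative assumptions.

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line; every record must carry a "text" key."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            if "text" not in record:
                raise ValueError(f"record is missing the 'text' key: {record!r}")
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


# Hypothetical records: "id" and "label" are optional extra fields.
records = [
    {"id": 1, "label": "pets", "text": "I love my dog"},
    {"id": 2, "label": "pets", "text": "I love my cat"},
    {"id": 3, "label": "pets", "text": "I love my rabbit"},
]

write_jsonl(records, "inputs.jsonl")
```

Checking for the text key at write time surfaces a malformed record before any embedding request is made.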
Compute embeddings
The default model is OpenAI's text-embedding-ada-002. You can change the model by passing the --model-id flag.
emb3d compute inputs.jsonl
You will need to have OPENAI_API_KEY set in your environment. You can also pass it as a flag (--api-key) or set it in a config file.
emb3d config set openai_token YOUR-OPENAI-API-KEY
emb3d compute inputs.jsonl
emb3d compute inputs.jsonl --model-id embed-english-v2.0 --output-file cohere-embeddings.jsonl
For Cohere models, you will need to have COHERE_API_KEY set in your environment. You can also pass it as a flag (--api-key) or set it in a config file:
emb3d config set cohere_token YOUR-COHERE-API-KEY
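When driving emb3d from a script, the commands above can be assembled programmatically. The following is a rough sketch that uses only the subcommand and flags shown in this README (compute, --model-id, --output-file). The function names `build_compute_command` and `run_compute` are hypothetical, and the environment check is an assumption that only applies if you rely on the environment variable rather than the config file.

```python
import os
import subprocess


def build_compute_command(input_file, model_id=None, output_file=None):
    """Assemble the argv for `emb3d compute` from flags shown in this README."""
    cmd = ["emb3d", "compute", str(input_file)]
    if model_id is not None:
        cmd += ["--model-id", model_id]
    if output_file is not None:
        cmd += ["--output-file", str(output_file)]
    return cmd


def run_compute(input_file, model_id=None, output_file=None,
                key_var="OPENAI_API_KEY"):
    """Fail early if the key variable is unset, then invoke the CLI.

    Assumption: the key comes from the environment. Skip this check if you
    stored the key with `emb3d config set ...` instead.
    """
    if not os.environ.get(key_var):
        raise RuntimeError(f"{key_var} is not set in the environment")
    cmd = build_compute_command(input_file, model_id, output_file)
    subprocess.run(cmd, check=True)


# Example call for the Cohere model used above (not executed here):
# run_compute("inputs.jsonl", model_id="embed-english-v2.0",
#             output_file="cohere-embeddings.jsonl", key_var="COHERE_API_KEY")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.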
Visualize your embeddings 🔥
The last step is to visualize your embeddings. This will open a browser window with a visualization of your last computed embeddings.
emb3d visualize
You can alternatively pass the path to the computed embeddings file:
emb3d visualize run-2020-embeddings.jsonl
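Besides the browser view, the computed embeddings file can be inspected directly. The sketch below assumes the output is JSONL with one record per line; this README does not name the field that holds the vector, so `embedding_key` is a required parameter that you should set after looking at your own output file. The helper names and the toy vectors are illustrative only.

```python
import json
import math


def load_vectors(lines, embedding_key):
    """Parse JSONL lines into (record, vector) pairs.

    `embedding_key` is the field your output file stores the vector under;
    it is not documented here, so inspect the file first.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        pairs.append((record, [float(x) for x in record[embedding_key]]))
    return pairs


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-in for an output file: made-up 3-dimensional vectors under a
# hypothetical "embedding" key. Real embeddings are much longer.
sample_lines = [
    '{"text": "I love my dog", "embedding": [1.0, 0.0, 0.0]}',
    '{"text": "I love my cat", "embedding": [0.0, 1.0, 0.0]}',
]
pairs = load_vectors(sample_lines, embedding_key="embedding")
similarity = cosine_similarity(pairs[0][1], pairs[1][1])
```

With a real file, pass an open file handle as `lines`, since iterating over a text file yields one line at a time.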
Profit 💰
Usage
Usage: emb3d [OPTIONS] INPUT_FILE COMMAND [ARGS]...
Generate embeddings for fun and profit.
Arguments
  * input_file  PATH  Path to the input file. [default: None] [required]

Options
  --model-id                  TEXT     ID of the embedding model. Default is
                                       `text-embedding-ada-002`.
                                       [default: None]
  --output-file -out,-o       PATH     Path to the output file. If not provided,
                                       a default path will be suggested.
                                       [default: None]
  --api-key                   TEXT     API key for the backend. If not provided,
                                       it will be prompted or fetched from
                                       environment variables.
                                       [default: None]
  --remote --local                     Choose whether to do inference locally or
                                       with an API token. This choice is
                                       available for sentence transformer and
                                       hugging face models. If a model cannot be
                                       run locally (ex: OpenAI models), this
                                       flag is ignored.
                                       [default: remote]
  --max-concurrent-requests   INTEGER  (Remote Execution) Maximum number of
                                       concurrent requests for the embedding
                                       task. Default is 1000.
                                       [default: 1000]
  --help                               Show this message and exit.

Commands
  config  Get or set a configuration value.