CLIP inference with no big dependencies such as PyTorch, TensorFlow, Numpy or ONNX

Project description

Python bindings for clip.cpp

This package provides basic Python bindings for clip.cpp.

It requires no third-party libraries and no big dependencies such as PyTorch, TensorFlow, NumPy, or ONNX.

Install

If you are on an x86_64 Linux distribution, you can simply pip-install it:

pip install clip_cpp

A Colab notebook is available for quick experiments:

Open In Colab

If you are on another operating system or architecture, or if you want to use an instruction set other than AVX2 (e.g., AVX512), you can build the package from source. See clip.cpp for more info.

All you need to do is compile with the -DBUILD_SHARED_LIBS=ON option and copy libclip.so to examples/python_bindings/clip_cpp.
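A typical build might look like this (a rough sketch; see the clip.cpp repository README for the authoritative steps, and note that the exact output path of libclip.so can vary by platform and CMake version):

git clone --recurse-submodules https://github.com/monatis/clip.cpp
cd clip.cpp
cmake -B build -DBUILD_SHARED_LIBS=ON
cmake --build build --config Release
# copy the shared library next to the Python bindings
cp build/libclip.so examples/python_bindings/clip_cpp/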

Usage

from clip_cpp import Clip

## you can pass either a repo_id or a path to a .gguf file
## run `clip-cpp-models` in your terminal to see which models are available for download
## if you pass a repo_id that contains more than one model file,
## it's recommended to specify which file to download with `model_file`
repo_id = 'mys/ggml_CLIP-ViT-B-32-laion2B-s34B-b79K'
model_file = 'CLIP-ViT-B-32-laion2B-s34B-b79K_ggml-model-f16.gguf'

model = Clip(
    model_path_or_repo_id=repo_id,
    model_file=model_file,
    verbosity=2
)

text_2encode = 'cat on a Turtle'

tokens = model.tokenize(text_2encode)
text_embed = model.encode_text(tokens)

## load and extract embeddings of an image from the disk
image_2encode = '/path/to/cat.jpg'
image_embed = model.load_preprocess_encode_image(image_2encode)

## calculate the similarity between the image and the text
score = model.calculate_similarity(text_embed, image_embed)

# Alternatively, you can do the same in a single call:
# score = model.compare_text_and_image(text_2encode, image_2encode)

print(f"Similarity score: {score}")

Clip Class

The Clip class provides a Python interface to clip.cpp, allowing you to perform various tasks such as text and image encoding, similarity scoring, and text-image comparison. Below are the constructor and public methods of the Clip class:

Constructor

def __init__(
    self, model_path_or_repo_id: str,
    model_file: Optional[str] = None,
    revision: Optional[str] = None,
    verbosity: int = 0):
  • Description: Initializes a Clip instance with the specified CLIP model file and optional verbosity level.
  • model_path_or_repo_id (str): The path to a local CLIP model file, or a Hugging Face repo_id.
  • model_file (str, optional): If model_path_or_repo_id is a repo_id containing multiple .gguf files, specifies which file to download.
  • revision (str, optional): The Hugging Face repo revision (branch, tag, or commit) to download from.
  • verbosity (int, optional): An integer specifying the verbosity level (default is 0).
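For example, loading a model directly from a local .gguf file rather than a Hub repo (a minimal sketch; the path is a placeholder):

from clip_cpp import Clip

## loading from a local file skips the Hugging Face download entirely
model = Clip(model_path_or_repo_id='/path/to/model.gguf', verbosity=1)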

Public Methods

1. vision_config

@property
def vision_config(self) -> Dict[str, Any]:
  • Description: Retrieves the configuration parameters related to the vision component of the CLIP model.

2. text_config

@property
def text_config(self) -> Dict[str, Any]:
  • Description: Retrieves the configuration parameters related to the text component of the CLIP model.
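Both properties return plain dictionaries, which is handy for checking model hyperparameters such as embedding dimensions before encoding. A minimal sketch (the exact keys depend on the model file):

## inspect the hyperparameters baked into the model file
print(model.vision_config)
print(model.text_config)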

3. tokenize

def tokenize(self, text: str) -> List[int]:
  • Description: Tokenizes a text input into a list of token IDs.
  • text (str): The input text to be tokenized.

4. encode_text

def encode_text(
    self, tokens: List[int], n_threads: int = os.cpu_count(), normalize: bool = True
) -> List[float]:
  • Description: Encodes a list of token IDs into a text embedding.
  • tokens (List[int]): A list of token IDs obtained through tokenization.
  • n_threads (int, optional): The number of CPU threads to use for encoding (default is the number of CPU cores).
  • normalize (bool, optional): Whether or not to normalize the output vector (default is True).
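A short sketch combining tokenize and encode_text (the input text is a placeholder):

tokens = model.tokenize('a photo of a cat')
## skip normalization if you want the raw projection output
raw_embed = model.encode_text(tokens, n_threads=2, normalize=False)
print(len(raw_embed))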

5. load_preprocess_encode_image

def load_preprocess_encode_image(
    self, image_path: str, n_threads: int = os.cpu_count(), normalize: bool = True
) -> List[float]:
  • Description: Loads an image, preprocesses it, and encodes it into an image embedding.
  • image_path (str): The path to the image file to be encoded.
  • n_threads (int, optional): The number of CPU threads to use for encoding (default is the number of CPU cores).
  • normalize (bool, optional): Whether or not to normalize the output vector (default is True).
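For example (the image path is a placeholder):

## load, preprocess, and encode in a single call
image_embed = model.load_preprocess_encode_image('/path/to/cat.jpg')
print(len(image_embed))  ## embedding size, e.g. 512 for a ViT-B/32 model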

6. calculate_similarity

def calculate_similarity(
    self, text_embedding: List[float], image_embedding: List[float]
) -> float:
  • Description: Calculates the similarity score between a text embedding and an image embedding.
  • text_embedding (List[float]): The text embedding obtained from encode_text.
  • image_embedding (List[float]): The image embedding obtained from load_preprocess_encode_image.
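With normalized embeddings this behaves like a cosine similarity. A plain-Python equivalent of the math (an illustrative sketch, not necessarily the library's internal implementation):

## dot product of two unit-length vectors = cosine similarity
score = sum(t * v for t, v in zip(text_embed, image_embed))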

7. compare_text_and_image

def compare_text_and_image(
    self, text: str, image_path: str, n_threads: int = os.cpu_count()
) -> float:
  • Description: Compares a text input and an image file, returning a similarity score.
  • text (str): The input text.
  • image_path (str): The path to the image file for comparison.
  • n_threads (int, optional): The number of CPU threads to use for encoding (default is the number of CPU cores).
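This is a convenience wrapper that tokenizes the text, encodes both inputs, and scores them in one call, equivalent to the step-by-step flow in the usage section above:

score = model.compare_text_and_image('cat on a Turtle', '/path/to/cat.jpg')
print(f"Similarity score: {score}")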

8. zero_shot_label_image

def zero_shot_label_image(
        self, image_path: str, labels: List[str], n_threads: int = os.cpu_count()
    ) -> Tuple[List[float], List[int]]:
  • Description: Performs zero-shot labeling of an image against the given candidate labels, returning a tuple of sorted scores and the corresponding label indices.
  • image_path (str): The path to the image file to be labelled.
  • labels (List[str]): A list of candidate labels to be scored.
  • n_threads (int, optional): The number of CPU threads to use for encoding (default is the number of CPU cores).
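For example, scoring an image against three candidate labels (a minimal sketch; the labels and image path are placeholders):

labels = ['a cat', 'a dog', 'a bird']
scores, indices = model.zero_shot_label_image('/path/to/cat.jpg', labels)
## indices map each returned score back to its entry in `labels`
for score, idx in zip(scores, indices):
    print(f"{labels[idx]}: {score:.4f}")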

9. __del__

def __del__(self):
  • Description: Destructor that frees resources associated with the Clip instance.

With the Clip class, you can easily work with the CLIP model for various natural language understanding and computer vision tasks.

Example

A basic example can be found in the clip.cpp examples.

python example_main.py --help
usage: clip [-h] -m MODEL [-fn FILENAME] [-v VERBOSITY] -t TEXT [TEXT ...] -i IMAGE

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        path to GGML file or repo_id
  -fn FILENAME, --filename FILENAME
                        path to GGML file in the Hugging face repo
  -v VERBOSITY, --verbosity VERBOSITY
                        Level of verbosity. 0 = minimum, 2 = maximum
  -t TEXT [TEXT ...], --text TEXT [TEXT ...]
                        text to encode. Multiple values allowed. In this case, apply zero-shot labeling
  -i IMAGE, --image IMAGE
                        path to an image file
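For example, reusing the model from the usage section above (the image path is a placeholder; passing multiple -t values triggers zero-shot labeling):

python example_main.py \
    -m mys/ggml_CLIP-ViT-B-32-laion2B-s34B-b79K \
    -fn CLIP-ViT-B-32-laion2B-s34B-b79K_ggml-model-f16.gguf \
    -t "a cat" "a dog" \
    -i /path/to/cat.jpg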
