Find similar 1-token words in OpenAI's CLIP.

Project description

clip_similarwords finds words similar to a given 1-token word in OpenAI's CLIP, in less than one second.

Because OpenAI's CLIP is built on text-image similarity, its text-text similarities may reflect how similar the images typically associated with each word are, unlike WordNet or other synonym dictionaries.

Note that, for speed and storage reasons (PyPI is limited to 60 MB), words composed of 2 or more tokens are not supported.
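The ranking idea behind "similar words" can be sketched with cosine similarity over word vectors. This is a minimal illustration under stated assumptions, not the package's actual implementation: the function names `cosine_similarity` and `most_similar` are hypothetical, and the vectors are made-up 3-dimensional toy values rather than real CLIP text embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, embeddings, top_k=3):
    """Rank every other word by cosine similarity to `word`, best first."""
    query = embeddings[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up toy vectors for illustration only (NOT real CLIP embeddings).
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "dog": [0.5, 0.5, 0.0],
    "car": [0.0, 0.1, 0.9],
}

for other, score in most_similar("cat", TOY_EMBEDDINGS):
    print(f"cat -> {other} ( cos_similarity: {score:.2f} )")
```

With these toy values, "kitten" ranks above "dog", and "car" comes last, because its vector points in a nearly orthogonal direction.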

Installation

clip_similarwords can be installed with pip:

pip install clip_similarwords

or

pip install git+https://github.com/nazodane/clip_similarwords.git

Usage of the command

~/.local/bin/clip-similarwords [ word_fragment | --all ]

Usage of the module

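from clip_similarwords import CLIPTextSimilarWords

# Create the lookup object (no PyTorch or CUDA required).
clipsim = CLIPTextSimilarWords()
# Each result is a (key token, similar token, cosine similarity) triple.
for key_token, sim_token, cos_similarity in clipsim("cat"):
    print(f"{key_token} -> {sim_token} ( cos_similarity: {cos_similarity:.2f} )")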
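The triples produced in the usage example above can be post-processed like any other Python iterable. The following is a rough sketch, assuming the results are `(key_token, sim_token, cos_similarity)` triples as shown; `filter_similar` is a hypothetical helper that is not part of the package, and `sample` is made-up data standing in for real output so the snippet runs without the package installed.

```python
def filter_similar(triples, min_similarity=0.8, limit=5):
    """Keep triples at or above min_similarity, highest similarity first."""
    kept = [t for t in triples if t[2] >= min_similarity]
    kept.sort(key=lambda t: t[2], reverse=True)
    return kept[:limit]


# Made-up sample data standing in for the output of clipsim("cat").
sample = [
    ("cat", "kitten", 0.91),
    ("cat", "dog", 0.78),
    ("cat", "cats", 0.95),
    ("cat", "feline", 0.84),
]

for key_token, sim_token, cos_similarity in filter_similar(sample, min_similarity=0.8):
    print(f"{key_token} -> {sim_token} ( cos_similarity: {cos_similarity:.2f} )")
```

With the package installed, `sample` could presumably be replaced by `clipsim("cat")`, provided it yields triples in the same shape.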

Requirements for using the model

  • Linux (should also work on other environments)

Neither PyTorch nor CUDA is required.

Requirements for model generation

  • Linux
  • Python 3.10 or later
  • PyTorch 1.13 or later
  • CUDA 11.7 or later
  • 16 GB of DRAM or more
  • RTX 3060 12 GB or better

Patches and information about other environments are very welcome!

License

The code is under the MIT License. The model was converted under Japanese law.

Download files

Source Distribution

clip_similarwords-0.0.4.1.tar.gz (8.0 MB)

Uploaded Source

Built Distribution

clip_similarwords-0.0.4.1-py3-none-any.whl (8.2 MB)

Uploaded Python 3

File details

Details for the file clip_similarwords-0.0.4.1.tar.gz.

File metadata

  • Download URL: clip_similarwords-0.0.4.1.tar.gz
  • Upload date:
  • Size: 8.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/4.6.4 keyring/23.5.0 pkginfo/1.8.2 readme-renderer/34.0 requests-toolbelt/0.9.1 requests/2.25.1 rfc3986/1.5.0 tqdm/4.64.1 urllib3/1.26.12 CPython/3.10.6

File hashes

Hashes for clip_similarwords-0.0.4.1.tar.gz
Algorithm Hash digest
SHA256 ee5868804402b0c2708ef323b704e0861b50c88e6c546922aa8fef5c983e39a7
MD5 267f72fe2c541671b36ce4bf8f1185e2
BLAKE2b-256 c0cb2dd0e347be71e2f88c076203a24f002773275d8cf1bb6e8960a2469cd6ba

File details

Details for the file clip_similarwords-0.0.4.1-py3-none-any.whl.

File metadata

  • Download URL: clip_similarwords-0.0.4.1-py3-none-any.whl
  • Upload date:
  • Size: 8.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/4.6.4 keyring/23.5.0 pkginfo/1.8.2 readme-renderer/34.0 requests-toolbelt/0.9.1 requests/2.25.1 rfc3986/1.5.0 tqdm/4.64.1 urllib3/1.26.12 CPython/3.10.6

File hashes

Hashes for clip_similarwords-0.0.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ba70a5003c1d547489846d371442e445c1a2355d809cfda67689cdaff85cfb87
MD5 4648d43f8619ae4775542e0de4bd14d1
BLAKE2b-256 f324076e9bf05d4030e97b2b28c280fcac714237e70d1ab57f80293a20e7a82a
