
Topic Model Images


Concept

Concept combines CLIP with BERTopic-based techniques to perform Concept Modeling on images.

Since topics are part of conversations and text, they do not represent the context of images well. Therefore, these clusters of images are referred to as 'Concepts' instead of the traditional 'Topics'.

Thus, Concept Modeling takes inspiration from topic modeling techniques to cluster images, find common concepts and model them both visually using images and textually using topic representations.

Installation

Concept, together with its sentence-transformers dependency, can be installed from PyPI:

pip install concept

Quick Start

First, we need to download and extract the 25,000 Unsplash images used in the sentence-transformers example:

import os
import glob
import zipfile
from tqdm import tqdm
from sentence_transformers import util

# 25k images from Unsplash
img_folder = 'photos/'
if not os.path.exists(img_folder) or len(os.listdir(img_folder)) == 0:
    os.makedirs(img_folder, exist_ok=True)

    photo_filename = 'unsplash-25k-photos.zip'
    if not os.path.exists(photo_filename):  # Download the dataset if it does not exist
        util.http_get('http://sbert.net/datasets/' + photo_filename, photo_filename)

    # Extract all images
    with zipfile.ZipFile(photo_filename, 'r') as zf:
        for member in tqdm(zf.infolist(), desc='Extracting'):
            zf.extract(member, img_folder)

# Collect the paths of all extracted JPEG images
img_names = glob.glob(os.path.join(img_folder, '*.jpg'))

Next, we only need to pass images to Concept:

from concept import ConceptModel
concept_model = ConceptModel()
concepts = concept_model.fit_transform(img_names)

The resulting concepts can be visualized through concept_model.visualize_concepts().

However, to get the full experience, we need to label the concept clusters with topics. To do this, we need to create a vocabulary. We are going to feed our model 50,000 nouns from the English vocabulary:

import random
import nltk
nltk.download("wordnet")
from nltk.corpus import wordnet as wn

all_nouns = [word for synset in wn.all_synsets('n') for word in synset.lemma_names() if "_" not in word]
selected_nouns = random.sample(all_nouns, 50_000)
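Because a word can be a lemma of several synsets, all_nouns may contain repeated words, and random.sample draws by position, so repeats can end up in the sampled vocabulary. Below is a minimal sketch of one way to deduplicate before sampling and to make the draw reproducible. The helper name sample_vocabulary and the seed value are illustrative assumptions and not part of Concept. A short hand-written list stands in for the WordNet nouns.

```python
import random


def sample_vocabulary(words, k, seed=42):
    """Return up to k distinct words, sampled reproducibly.

    Hypothetical helper for illustration; not part of the Concept API.
    """
    # dict.fromkeys removes duplicates while keeping first-seen order,
    # so the result does not depend on set iteration order.
    unique_words = list(dict.fromkeys(words))
    # Cap k so random.sample never raises ValueError on a small vocabulary.
    k = min(k, len(unique_words))
    # A private Random instance leaves the global random state untouched.
    return random.Random(seed).sample(unique_words, k)


# Toy stand-in for all_nouns; "beach" and "dog" are deliberately repeated.
toy_nouns = ["beach", "dog", "beach", "mountain", "dog", "river", "city"]
vocabulary = sample_vocabulary(toy_nouns, k=4)
print(vocabulary)
```

With the real data, the call would be sample_vocabulary(all_nouns, 50_000). If fewer than 50,000 distinct words remain after deduplication, the helper returns all of them rather than failing.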

Then, we can pass in the resulting selected_nouns to Concept:

from concept import ConceptModel

concept_model = ConceptModel()
concepts = concept_model.fit_transform(img_names, docs=selected_nouns)

Again, the resulting concepts can be visualized. This time, however, concept_model.visualize_concepts() also shows the generated topics.

NOTE: Use ConceptModel(embedding_model="clip-ViT-B-32-multilingual-v1") to select a model that supports 50+ languages.

Search Concepts

We can quickly search for specific concepts by embedding a search term and finding the cluster embeddings that best represent it. As an example, let us search for the term beach and see what we can find. To do this, we simply run the following:

>>> concept_model.find_concepts("beach")
[(100, 0.277577825349102),
 (53, 0.27431058773894657),
 (95, 0.25973751319723837),
 (77, 0.2560122597417548),
 (97, 0.25361988261846297)]

Each tuple contains two values: the first is the concept cluster and the second is its similarity to the search term. The 5 most similar concepts are returned.
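The idea behind this kind of search can be sketched as a cosine-similarity ranking between one query vector and one vector per concept. The sketch below is an assumption about the general approach, not Concept's actual implementation. The function names and the tiny 3-dimensional vectors are made up for illustration, whereas a real model would compare CLIP embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_concepts(query_embedding, cluster_embeddings, top_n=5):
    """Return (concept_id, similarity) pairs, most similar first.

    Hypothetical helper; cluster_embeddings maps concept id -> vector.
    """
    scored = [
        (concept_id, cosine_similarity(query_embedding, embedding))
        for concept_id, embedding in cluster_embeddings.items()
    ]
    # Sort by similarity, highest first, and keep the top_n entries.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Made-up vectors standing in for real embeddings.
toy_clusters = {
    0: [1.0, 0.0, 0.0],
    1: [0.0, 1.0, 0.0],
    2: [1.0, 1.0, 0.0],
}
toy_query = [1.0, 0.0, 0.0]
ranking = rank_concepts(toy_query, toy_clusters, top_n=2)
print(ranking)
```

In this toy example, concept 0 points in the same direction as the query and ranks first, followed by concept 2, which only partly overlaps with it.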

Now, let us visualize those concepts to see how well the search function works:

concept_model.visualize_concepts(concepts=[100, 53, 95, 77, 97])
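Rather than copying the ids by hand, they can be pulled out of the list of tuples. Below is a small sketch that uses the search results shown earlier as literal data so that it runs on its own. The variable names are illustrative.

```python
# Output of find_concepts("beach") from the example above, pasted as literal data.
search_results = [
    (100, 0.277577825349102),
    (53, 0.27431058773894657),
    (95, 0.25973751319723837),
    (77, 0.2560122597417548),
    (97, 0.25361988261846297),
]

# Keep only the concept ids, preserving the similarity ordering.
concept_ids = [concept_id for concept_id, _ in search_results]
print(concept_ids)

# With a fitted model, these ids could then be passed on:
# concept_model.visualize_concepts(concepts=concept_ids)
```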
