# Deep Semantic Search

A Python library for embedding, indexing, and applying semantic search for text and image data.
## Features
- **Multi-modal Semantic Search**
  - Embed and index text data using Sentence Transformers (`paraphrase-multilingual-MiniLM-L12-v2`)
  - Embed and index image data using CLIP
  - Search images by image or text queries
  - Search text by semantic similarity
- **Clustering & Captioning**
  - Cluster image embeddings using PyTorch KMeans (GPU support)
  - Caption images using BLIP
  - Customizable LLM-powered topic labeling via a callback
- **Retrieval-Augmented Generation (RAG)**
  - Answer questions based on text data
  - Pluggable LLM via a callback pattern
## Installation

```bash
pip install deep-semantic-search
```

For development (quote the extras so the brackets survive shells like zsh):

```bash
pip install "deep-semantic-search[dev]"
```
## Quick Start
### Image Search

```python
from deep_semantic_search import LoadImageData, ImageIndexer, ImageSearcher

# Load images
loader = LoadImageData()
image_paths = loader.from_folder(["path/to/images"])

# Index images
indexer = ImageIndexer(image_paths)
indexer.run_index()

# Search by text
searcher = ImageSearcher(indexer)
results = searcher.search_by_text("cat on a sofa", n=5)
for path, score in results.items():
    print(f"{score:.3f}  {path}")

# Search by image
results = searcher.search_by_image("query.jpg", n=5)
```
### Text Search

```python
from deep_semantic_search import LoadTextData, TextEmbedder, TextSearch

# Load text data
loader = LoadTextData()
corpus = loader.from_folder("path/to/text/files")

# Embed
embedder = TextEmbedder()
embedder.embed(corpus)

# Search
search = TextSearch(embedder)
results = search.find_similar("your search query", top_n=5)
for r in results:
    print(f"Score: {r['score']:.3f}  {r['path']}")
```
### Image Clustering

```python
from deep_semantic_search import ImageIndexer, ImageClusterer, ImageCaptioner

indexer = ImageIndexer(image_paths)
indexer.run_index()

# Optional: use a captioner for topic labels
captioner = ImageCaptioner()

clusterer = ImageClusterer(indexer)
result = clusterer.cluster(n_clusters=5, captioner=captioner)

# Save organized clusters to disk
clusterer.save_clusters("./output/clusters")
```
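The LLM-powered topic labeling mentioned under Features is pluggable via a callback. Below is a minimal sketch of what such a callback could look like; the name `label_topic` and its signature (a list of a cluster's BLIP captions in, a short label out) are assumptions for illustration, not the library's documented API, so check the package docs for the exact hook:

```python
from collections import Counter

def label_topic(captions: list[str]) -> str:
    """Hypothetical topic-labeling callback for a cluster.

    In practice you would send the captions to an LLM and return its
    one-line summary; this stub just picks the most frequent word so
    the shape of the contract is clear.
    """
    words = [w.lower() for caption in captions for w in caption.split()]
    if not words:
        return "unlabeled"
    return Counter(words).most_common(1)[0][0]

print(label_topic(["a cat on a sofa", "a sleeping cat"]))
```

The key design point is that the library never needs to know which LLM you use; it only calls your function with captions and stores the returned label.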
### RAG (Question Answering)

```python
from deep_semantic_search import ask_question

texts = ["Document 1 content...", "Document 2 content..."]
answer = ask_question(texts, "What is the main topic?")
print(answer)

# With a custom LLM
answer = ask_question(texts, "Summarize this.", llm_fn=my_custom_llm)
```
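Since `ask_question` accepts any callable for `llm_fn`, you can plug in whichever model you like. A minimal sketch of the expected shape, assuming the callback receives the assembled prompt (retrieved context plus question) and returns the answer as a string; verify the exact signature against the library before relying on it:

```python
def my_custom_llm(prompt: str) -> str:
    """Hypothetical callback for ask_question's llm_fn parameter.

    Swap the stub body for a real LLM call (an OpenAI, Ollama, or
    other client); the contract is simply string in, string out.
    """
    # A real implementation would send `prompt` to an LLM endpoint here.
    return f"stub answer ({len(prompt)} prompt chars)"

print(my_custom_llm("Context: ...\nQuestion: Summarize this."))
```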
### Custom Data Paths

By default, metadata is stored in `~/.deep-semantic-search/`. Override it per instance:

```python
indexer = ImageIndexer(image_paths, metadata_dir="./my_project/index")
embedder = TextEmbedder(metadata_dir="./my_project/text_index")
```
## API Reference

### Image Module

- `LoadImageData` — Load image paths from folders or CSV
- `ImageIndexer` — CLIP embedding + FAISS indexing
- `ImageSearcher` — Image/text similarity search
- `ImageClusterer` — KMeans clustering with topic labeling
- `ImageCaptioner` — BLIP image captioning

### Text Module

- `LoadTextData` — Load text from folders (`.txt`/`.html`) or CSV
- `TextEmbedder` — Sentence Transformer embeddings
- `TextSearch` — Cosine similarity search

### RAG

- `ask_question()` — RAG Q&A with a pluggable LLM

### Exceptions

- `DeepSemanticSearchError` — Base exception
- `IndexNotFoundError`, `ModelLoadError`, `SearchError`, `EmbeddingError`, `ClusteringError`
## CLI Tool

The package includes `dss`, a command-line interface for all major features. After installing the package, the `dss` command is available globally.

### General Usage

```bash
dss --help              # Show all commands
dss --version           # Show version
dss <command> --help    # Help for a specific command
```

Global flags: `-v`/`--verbose` for debug output, `-q`/`--quiet` to suppress progress output.
### Image Search

Search images by text query or by image similarity:

```bash
# Search by text
dss image-search --folder ./photos --query "sunset over the ocean" --top 5

# Search by image
dss image-search --folder ./photos --query ./photos/reference.jpg --top 10

# Multiple folders, JSON output
dss image-search -f ./photos -f ./vacation --query "mountains" --format json

# Force re-indexing
dss image-search -f ./photos --query "cat" --reindex
```
### Text Search

Search text documents by semantic similarity:

```bash
dss text-search --folder ./documents "machine learning algorithms" --top 5

# CSV output
dss text-search -f ./docs "neural networks" --format csv

# Custom model
dss text-search -f ./docs "query" --model sentence-transformers/all-MiniLM-L6-v2
```
### Image Clustering

Cluster images using KMeans on CLIP embeddings:

```bash
# Basic clustering
dss image-cluster --folder ./photos --clusters 5

# With BLIP captioning for topic labels
dss image-cluster -f ./photos -k 5 --caption

# Save clustered images into organized folders
dss image-cluster -f ./photos -k 8 --caption --save-dir ./output/clusters

# JSON output
dss image-cluster -f ./photos -k 3 --format json
```
### RAG (Question Answering)

Ask questions over text documents using Retrieval-Augmented Generation:

```bash
dss ask --folder ./documents "What is the main conclusion?"

# Custom Ollama model
dss ask -f ./research "Summarize the findings" --model llama2:13b

# Adjust chunking
dss ask -f ./docs "question" --chunk-size 2000 --chunk-overlap 200
```
### Configuration

The CLI respects the following environment variables:

- `OLLAMA_LLM_MODEL` — LLM model for RAG (default: `gemma4:e4b`)
- `DEFAULT_SEARCH_FOLDER_PATH` — Default folder path

CLI flags override environment variables when provided.
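For example, to point the RAG command at a different model for a whole shell session (the model tag and folder below are placeholders, not defaults):

```shell
# Placeholder values; substitute your own model tag and folder
export OLLAMA_LLM_MODEL="llama2:13b"
export DEFAULT_SEARCH_FOLDER_PATH="$HOME/documents"
```

After this, `dss ask "question"` would use these values unless overridden by flags.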
## Requirements

- Python >= 3.10
- PyTorch, Sentence Transformers, Transformers, FAISS, LangChain, and more (installed automatically)

## License

MIT