Python client for the imgcache REST API

imgcache-client

Python client for the imgcache REST API — a centralized image cache with perceptual hashing and duplicate detection.

Installation

Install directly from the local package:

pip install ./client

Or add it to a requirements.txt:

dwilson-imgcache-client>=0.2.0

Requires: Python 3.11+, httpx>=0.27.0


Quick start

from imgcache_client import ImgCacheClient

client = ImgCacheClient("http://localhost:8010")

# Store an image
with open("photo.jpg", "rb") as f:
    entry = client.store(url="https://example.com/photo.jpg", file_bytes=f.read(), client_name="my_scraper")

print(entry["content_hash"])   # BLAKE2b hash / storage key
print(entry["perceptual_hash"]) # pHash for similarity search

# Retrieve raw bytes
img_bytes = client.get_bytes(entry["content_hash"])

# Retrieve metadata only
meta = client.get_meta(entry["content_hash"])

# Look up by source URL
meta = client.lookup("https://example.com/photo.jpg")

# Search by URL substring
results = client.search(url_contains="example.com")

# Find visually similar images
similar = client.similar(perceptual_hash=entry["perceptual_hash"], max_hamming_distance=4)

# Delete
client.delete(entry["content_hash"])

client.close()  # or use as a context manager (see below)

Use as a context manager to close the HTTP connection automatically:

with ImgCacheClient("http://localhost:8010") as client:
    # url and img_bytes stand for a source URL and raw image bytes you already hold
    entry = client.store(url=url, file_bytes=img_bytes, client_name="my_scraper")

API reference

ImgCacheClient(base_url, timeout=30.0)

Parameter  Type   Description
base_url   str    Base URL of the imgcache service, e.g. http://localhost:8010
timeout    float  Request timeout in seconds (default 30.0)

store(url, file_bytes, client_name, lookup_time=None, filename=None) → dict

Store an image in the cache.

Parameter    Type                 Description
url          str                  Source URL of the image
file_bytes   bytes                Raw binary image data
client_name  str                  Identifier for the scraper submitting the image
lookup_time  datetime (optional)  When the image was fetched from origin; defaults to utcnow()
filename     str (optional)       Original filename hint (used for Content-Type detection)

Returns the full entry metadata dict. The service responds with HTTP 201 when the image is newly stored and HTTP 200 when an identical image already existed.
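A minimal sketch of calling store for a file on disk, passing the file's name as the optional filename hint. The helper name store_file is hypothetical and not part of imgcache-client; it assumes only the store signature documented above and accepts any object that provides it.

```python
from pathlib import Path
from typing import Any, Union


def store_file(client: Any, path: Union[str, Path], url: str, client_name: str) -> dict:
    """Read an image from disk and submit it with its filename as a hint.

    store_file is a hypothetical helper, not part of imgcache-client.
    """
    path = Path(path)
    return client.store(
        url=url,
        file_bytes=path.read_bytes(),
        client_name=client_name,
        filename=path.name,  # optional hint, used for Content-Type detection
    )
```

With a real client this would be called as store_file(client, "photo.jpg", "https://example.com/photo.jpg", "my_scraper").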


get_bytes(content_hash) → bytes

Download the raw binary content of a stored image by its BLAKE2b content hash.
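Because the content hash is derived from the image bytes, a caller could in principle re-check downloaded data against it. The sketch below uses the standard library's hashlib.blake2b. The README does not state the digest size or encoding the service uses, so this assumes a hex-encoded digest and infers the size from the string length; both function names are hypothetical, and the result should be checked against a real entry before relying on it.

```python
import hashlib


def blake2b_hex(data: bytes, digest_size: int = 64) -> str:
    """Hex-encoded BLAKE2b digest of data (digest_size in bytes, 1 to 64)."""
    return hashlib.blake2b(data, digest_size=digest_size).hexdigest()


def matches_content_hash(data: bytes, content_hash: str) -> bool:
    """Return True if data hashes to content_hash.

    ASSUMPTION: content_hash is a hex string, so two characters encode one
    digest byte. The service's real encoding is not documented here.
    """
    if len(content_hash) % 2:
        return False
    digest_size = len(content_hash) // 2
    if not 1 <= digest_size <= 64:
        return False
    return blake2b_hex(data, digest_size) == content_hash.lower()
```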


get_meta(content_hash) → dict | None

Retrieve metadata for a stored image without downloading the binary. Returns None if not found.


lookup(url) → dict | None

Retrieve metadata for the most recent cached entry matching an exact source URL. Returns None if not found.
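Since lookup returns None on a miss, it can drive a simple check-then-fetch flow. The following is a sketch under stated assumptions: get_or_fetch and the fetch callable are hypothetical names, and the client is any object with the lookup, get_bytes, and store methods described in this reference.

```python
from typing import Any, Callable


def get_or_fetch(
    client: Any,
    url: str,
    fetch: Callable[[str], bytes],
    client_name: str,
) -> bytes:
    """Return image bytes for url, downloading from origin only on a cache miss."""
    meta = client.lookup(url)
    if meta is not None:
        # Cache hit: read the bytes back by content hash.
        return client.get_bytes(meta["content_hash"])
    # Cache miss: fetch from origin, then store for later callers.
    data = fetch(url)
    client.store(url=url, file_bytes=data, client_name=client_name)
    return data
```

The fetch argument could be something like lambda u: httpx.get(u).content; it is left as a parameter so the sketch does not fix how the origin is contacted.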


search(url_contains) → list[dict]

Return metadata for all cached entries whose source URL contains the given substring. Useful for finding all cached variants of a URL.

# Matches https://example.com/products/img1.jpg, /img2.jpg, etc.
results = client.search(url_contains="example.com/products")
for entry in results:
    print(entry["url"], entry["content_hash"])
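One way to use the search results is to group source URLs by content hash, which shows where several URLs served byte-identical images. This is a sketch only: urls_by_content_hash is a hypothetical helper, and it assumes search can return several entries sharing a content_hash, which the README does not confirm.

```python
from collections import defaultdict


def urls_by_content_hash(entries: list[dict]) -> dict[str, list[str]]:
    """Map each content_hash to the source URLs that produced it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for entry in entries:
        groups[entry["content_hash"]].append(entry["url"])
    return dict(groups)
```

For example, urls_by_content_hash(client.search(url_contains="example.com/products")) yields one key per distinct image.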

similar(perceptual_hash, max_hamming_distance=4) → list[dict]

Find visually similar images by comparing perceptual hashes. max_hamming_distance is the largest number of differing bits allowed between the query hash and a match, so lower values return only closer matches.

similar = client.similar(perceptual_hash="f8c0e0e0f0e0c080", max_hamming_distance=4)
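To make the distance concrete, the Hamming distance between two hashes is the count of bit positions where they differ. The sketch below computes it locally, assuming perceptual hashes are equal-length hex strings like the one in the example above; hamming_distance is a hypothetical helper, and the service's own comparison may be implemented differently.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Number of differing bits between two equal-length hex strings."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


# The last hex digit differs by one bit (0 vs 1).
print(hamming_distance("f8c0e0e0f0e0c080", "f8c0e0e0f0e0c081"))  # 1
```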

delete(content_hash) → None

Delete a cached image and its associated storage file by content hash.
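As an illustration of combining search with delete, the sketch below removes every cached image whose source URL contains a substring, with a dry-run default so nothing is deleted by accident. purge_by_url is a hypothetical helper, not part of the client, and it relies only on the search and delete methods documented here.

```python
from typing import Any


def purge_by_url(client: Any, url_contains: str, dry_run: bool = True) -> list[str]:
    """Delete images whose source URL contains url_contains.

    Returns the affected content hashes. With dry_run=True (the default)
    nothing is deleted and the hashes are only reported.
    """
    entries = client.search(url_contains=url_contains)
    # De-duplicate so each content hash is deleted at most once.
    hashes = sorted({entry["content_hash"] for entry in entries})
    if not dry_run:
        for content_hash in hashes:
            client.delete(content_hash)
    return hashes
```

Because delete works on the content hash, it may also affect entries for other URLs that share the same image; how the service handles that case is not stated here, so a dry run first is a sensible precaution.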


health() → dict

Check that the service is reachable. Returns {"status": "ok"} when healthy.
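A startup script might poll health until the service responds. The following is a minimal sketch: wait_until_healthy is a hypothetical helper, and because the README does not say which exception the client raises when the service is unreachable, it catches Exception broadly, which real code would likely narrow.

```python
import time
from typing import Any


def wait_until_healthy(client: Any, attempts: int = 5, delay: float = 1.0) -> bool:
    """Poll client.health() until it reports {"status": "ok"} or attempts run out."""
    for attempt in range(attempts):
        try:
            if client.health().get("status") == "ok":
                return True
        except Exception:
            # ASSUMPTION: any exception means "not reachable yet".
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```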


close() → None

Close the underlying HTTP connection. Called automatically when used as a context manager.


Entry schema

Fields of the entry dict returned by store, get_meta, and lookup, and of each item in the lists returned by search and similar:

Field              Type            Description
url                str             Source URL the image was fetched from
content_hash       str             BLAKE2b hash of the raw image bytes (storage key)
content_type       str             MIME type, e.g. image/jpeg
file_size_bytes    int             Size of the stored image in bytes
original_filename  str | null      Filename hint supplied at store time
width              int | null      Image width in pixels
height             int | null      Image height in pixels
perceptual_hash    str | null      pHash for similarity comparisons
client_name        str             Scraper that submitted the entry
lookup_time        str (ISO 8601)  When the image was fetched from origin
created_at         str (ISO 8601)  When the entry was stored in the cache

get_bytes returns raw bytes rather than a metadata dict.
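For type checking, the schema above could be mirrored as a TypedDict, and the ISO 8601 timestamps parsed with datetime.fromisoformat. This is a sketch based only on the field table: Entry and cache_delay_seconds are hypothetical names, the client itself returns plain dicts, and the exact timestamp format (for example, whether a UTC offset is included) is an assumption.

```python
from datetime import datetime
from typing import Optional, TypedDict


class Entry(TypedDict):
    """Shape of an entry dict, transcribed from the field table."""

    url: str
    content_hash: str
    content_type: str
    file_size_bytes: int
    original_filename: Optional[str]
    width: Optional[int]
    height: Optional[int]
    perceptual_hash: Optional[str]
    client_name: str
    lookup_time: str  # ISO 8601
    created_at: str  # ISO 8601


def cache_delay_seconds(entry: Entry) -> float:
    """Seconds between fetching the image from origin and storing it."""
    fetched = datetime.fromisoformat(entry["lookup_time"])
    stored = datetime.fromisoformat(entry["created_at"])
    return (stored - fetched).total_seconds()
```

Subtracting a timezone-aware datetime from a naive one raises TypeError, so both timestamps need to use the same convention for the subtraction to work.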

Download files

Source Distribution

dwilson_imgcache_client-0.3.0.tar.gz (3.7 kB)

Uploaded Source

Built Distribution

dwilson_imgcache_client-0.3.0-py3-none-any.whl (3.9 kB)

Uploaded Python 3

File details

Details for the file dwilson_imgcache_client-0.3.0.tar.gz.

File metadata

  • Download URL: dwilson_imgcache_client-0.3.0.tar.gz
  • Size: 3.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for dwilson_imgcache_client-0.3.0.tar.gz
Algorithm    Hash digest
SHA256       4f882c6e386167fcd8dd447c3a082a5c75a2d3aaf90a327948964a212f39031d
MD5          52e972fd5bfd69b464c00740386fb360
BLAKE2b-256  1ed89ad108307a67277bd393b902666f89ae61f9441a72bc1e8aa255284a46f0

File details

Details for the file dwilson_imgcache_client-0.3.0-py3-none-any.whl.

File hashes

Hashes for dwilson_imgcache_client-0.3.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       ce74ae0f2c13724ab0fdaf58932261ee40d978bd25ff096f21af16a4dfa7a0c3
MD5          2d2710e1c4239c35603d09c1dfa7a088
BLAKE2b-256  96a6ee421153a3ba10c10f7ea9a1621e097d1634293ef1a239185ffadbf8060b
