Python client for the imgcache REST API
imgcache-client
Python client for the imgcache REST API — a centralized image cache with perceptual hashing and duplicate detection.
Installation
Install directly from the local package:
pip install ./client
Or add it to a requirements.txt:
dwilson-imgcache-client>=0.2.0
Requires: Python 3.11+, httpx>=0.27.0
Quick start
from imgcache_client import ImgCacheClient
client = ImgCacheClient("http://localhost:8010")
# Store an image
with open("photo.jpg", "rb") as f:
    entry = client.store(url="https://example.com/photo.jpg", file_bytes=f.read(), client_name="my_scraper")
print(entry["content_hash"]) # BLAKE2b hash / storage key
print(entry["perceptual_hash"]) # pHash for similarity search
# Retrieve raw bytes
img_bytes = client.get_bytes(entry["content_hash"])
# Retrieve metadata only
meta = client.get_meta(entry["content_hash"])
# Look up by source URL
meta = client.lookup("https://example.com/photo.jpg")
# Search by URL substring
results = client.search(url_contains="example.com")
# Find visually similar images
similar = client.similar(perceptual_hash=entry["perceptual_hash"], max_hamming_distance=4)
# Delete
client.delete(entry["content_hash"])
client.close() # or use as a context manager (see below)
Use as a context manager to close the HTTP connection automatically:
with ImgCacheClient("http://localhost:8010") as client:
    entry = client.store(url=url, file_bytes=img_bytes, client_name="my_scraper")
API reference
ImgCacheClient(base_url, timeout=30.0)
| Parameter | Type | Description |
|---|---|---|
| base_url | str | Base URL of the imgcache service, e.g. http://localhost:8010 |
| timeout | float | Request timeout in seconds (default 30.0) |
store(url, file_bytes, client_name, lookup_time=None, filename=None) → dict
Store an image in the cache.
| Parameter | Type | Description |
|---|---|---|
| url | str | Source URL of the image |
| file_bytes | bytes | Raw binary image data |
| client_name | str | Identifier for the scraper submitting the image |
| lookup_time | datetime (optional) | When the image was fetched from origin; defaults to utcnow() |
| filename | str (optional) | Original filename hint (used for Content-Type detection) |
Returns the full entry metadata dict. HTTP 201 means newly stored; HTTP 200 means an identical image already existed.
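As an illustration of the optional arguments, the sketch below stores a file from disk and passes both lookup_time and filename explicitly. The helper name store_file is hypothetical, and it assumes the service accepts a timezone-aware UTC datetime for lookup_time; only the store() parameters come from the table above.

```python
from datetime import datetime, timezone
from pathlib import Path


def store_file(client, path, url, client_name):
    """Store a local image file, passing store()'s optional arguments explicitly.

    `client` is expected to be an ImgCacheClient (or any object with the same
    store() signature). This helper is illustrative, not part of the package.
    """
    path = Path(path)
    return client.store(
        url=url,
        file_bytes=path.read_bytes(),
        client_name=client_name,
        # Assumption: a timezone-aware UTC timestamp is accepted here.
        lookup_time=datetime.now(timezone.utc),
        # The file name serves as the Content-Type detection hint.
        filename=path.name,
    )
```

The returned value is whatever store() returns, i.e. the entry metadata dict.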
get_bytes(content_hash) → bytes
Download the raw binary content of a stored image by its BLAKE2b content hash.
get_meta(content_hash) → dict | None
Retrieve metadata for a stored image without downloading the binary. Returns None if not found.
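A minimal sketch of combining get_meta and get_bytes to export a cached image to disk. The helper save_image is hypothetical; it assumes the content_type field holds a MIME type that the standard mimetypes module recognises, and falls back to a .bin extension otherwise.

```python
import mimetypes
from pathlib import Path


def save_image(client, content_hash, dest_dir):
    """Write a cached image into dest_dir; return its path, or None if unknown."""
    meta = client.get_meta(content_hash)
    if meta is None:
        # get_meta returns None when the hash is not in the cache.
        return None
    # Derive a file extension from the stored MIME type, e.g. image/jpeg.
    ext = mimetypes.guess_extension(meta["content_type"]) or ".bin"
    dest = Path(dest_dir) / f"{content_hash}{ext}"
    dest.write_bytes(client.get_bytes(content_hash))
    return dest
```

Checking the metadata first avoids requesting the binary for a hash that is not in the cache.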
lookup(url) → dict | None
Retrieve metadata for the most recent cached entry matching an exact source URL. Returns None if not found.
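One common use of lookup is a cache-first fetch: check the cache by URL and only download from origin on a miss. The sketch below is one way to write that, not something the package provides. The get_or_fetch helper and its fetch callable (any function that takes a URL and returns bytes) are hypothetical.

```python
def get_or_fetch(client, url, fetch, client_name):
    """Return image bytes for url, downloading from origin only on a cache miss.

    `fetch` is a caller-supplied callable: fetch(url) -> bytes.
    """
    meta = client.lookup(url)
    if meta is not None:
        # Cache hit: reuse the stored copy, keyed by its content hash.
        return client.get_bytes(meta["content_hash"])
    # Cache miss: download from origin, then store for later callers.
    data = fetch(url)
    client.store(url=url, file_bytes=data, client_name=client_name)
    return data
```

Because lookup matches the exact source URL, variants of the same URL (for example with different query strings) count as separate cache keys in this sketch.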
search(url_contains) → list[dict]
Return metadata for all cached entries whose source URL contains the given substring. Useful for finding all cached variants of a URL.
# Matches https://example.com/products/img1.jpg, /img2.jpg, etc.
results = client.search(url_contains="example.com/products")
for entry in results:
    print(entry["url"], entry["content_hash"])
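Search results can also be post-processed on the client side. As a sketch, the hypothetical helper below groups results by content_hash to find byte-identical images that were cached under more than one URL. It relies only on the url and content_hash fields of each entry.

```python
from collections import defaultdict


def urls_by_content_hash(entries):
    """Map each content hash seen under several URLs to its sorted URL list."""
    groups = defaultdict(set)
    for entry in entries:
        groups[entry["content_hash"]].add(entry["url"])
    # Keep only hashes that appear under more than one distinct URL.
    return {h: sorted(urls) for h, urls in groups.items() if len(urls) > 1}
```

For example, urls_by_content_hash(client.search(url_contains="example.com/products")) would list exact duplicates among the product images.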
similar(perceptual_hash, max_hamming_distance=4) → list[dict]
Find visually similar images by comparing perceptual hashes. max_hamming_distance caps the Hamming distance allowed between the query hash and a match, so lower values return only closer matches.
similar = client.similar(perceptual_hash="f8c0e0e0f0e0c080", max_hamming_distance=4)
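To make the distance concrete, here is a small sketch of a bitwise Hamming distance between two perceptual hashes. It assumes the hashes are hexadecimal strings of equal length, as in the example value above; the service's own comparison is not documented here and may differ.

```python
def hamming_distance(hash_a, hash_b):
    """Count differing bits between two equal-length hexadecimal hash strings."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")
```

Under these assumptions, two hashes that differ only in their last bit have a distance of 1, and identical hashes have a distance of 0.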
delete(content_hash) → None
Delete a cached image and its associated storage file by content hash.
health() → dict
Check that the service is reachable. Returns {"status": "ok"} when healthy.
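A scraper may want to wait for the service before starting work. The sketch below polls health() a few times. The helper name, the retry defaults, and the broad exception handling are assumptions: how the client reports an unreachable service is not specified here.

```python
import time


def wait_until_healthy(client, attempts=5, delay=1.0):
    """Poll client.health() until it reports ok; return False if it never does."""
    for attempt in range(attempts):
        try:
            if client.health().get("status") == "ok":
                return True
        except Exception:
            # Assumption: an unreachable service surfaces as an exception
            # raised by the client; treat it as "not ready yet".
            pass
        if attempt < attempts - 1:
            time.sleep(delay)  # pause between attempts, not after the last one
    return False
```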
close() → None
Close the underlying HTTP connection. Called automatically when used as a context manager.
Entry schema
Fields returned by store, get_meta, lookup, search, and similar:
| Field | Type | Description |
|---|---|---|
| url | str | Source URL the image was fetched from |
| content_hash | str | BLAKE2b hash of the raw image bytes (storage key) |
| content_type | str | MIME type, e.g. image/jpeg |
| file_size_bytes | int | Size of the stored image in bytes |
| original_filename | str \| null | Filename hint supplied at store time |
| width | int \| null | Image width in pixels |
| height | int \| null | Image height in pixels |
| perceptual_hash | str \| null | pHash for similarity comparisons |
| client_name | str | Scraper that submitted the entry |
| lookup_time | str (ISO 8601) | When the image was fetched from origin |
| created_at | str (ISO 8601) | When the entry was stored in the cache |
get_bytes returns raw bytes rather than a metadata dict.
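Because lookup_time and created_at arrive as ISO 8601 strings, they need parsing before any date arithmetic. The hypothetical helper below is a sketch that computes how long after the origin fetch an entry was stored. It assumes both timestamps are either timezone-aware or both naive; mixing the two would raise a TypeError.

```python
from datetime import datetime


def cache_delay_seconds(entry):
    """Seconds between fetching the image from origin and storing it in the cache."""
    # datetime.fromisoformat accepts most ISO 8601 forms on Python 3.11+,
    # which matches the package's minimum Python version.
    fetched = datetime.fromisoformat(entry["lookup_time"])
    stored = datetime.fromisoformat(entry["created_at"])
    return (stored - fetched).total_seconds()
```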
Download files
File details
Details for the file dwilson_imgcache_client-0.3.0.tar.gz.
File metadata
- Download URL: dwilson_imgcache_client-0.3.0.tar.gz
- Upload date:
- Size: 3.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4f882c6e386167fcd8dd447c3a082a5c75a2d3aaf90a327948964a212f39031d |
| MD5 | 52e972fd5bfd69b464c00740386fb360 |
| BLAKE2b-256 | 1ed89ad108307a67277bd393b902666f89ae61f9441a72bc1e8aa255284a46f0 |
File details
Details for the file dwilson_imgcache_client-0.3.0-py3-none-any.whl.
File metadata
- Download URL: dwilson_imgcache_client-0.3.0-py3-none-any.whl
- Upload date:
- Size: 3.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ce74ae0f2c13724ab0fdaf58932261ee40d978bd25ff096f21af16a4dfa7a0c3 |
| MD5 | 2d2710e1c4239c35603d09c1dfa7a088 |
| BLAKE2b-256 | 96a6ee421153a3ba10c10f7ea9a1621e097d1634293ef1a239185ffadbf8060b |