Find Anything


Zero-shot object detection library that finds objects in rich images by matching against reference images.

Installation

Requires Python 3.12+

# With CUDA 12.8 support
pip install find-anything[cuda128] --extra-index-url https://download.pytorch.org/whl/cu128

# CPU only
pip install find-anything[cpu]

Warning

Because ultralytics (the FastSAM dependency) is outdated with respect to recent torch versions, you might need to apply a fix if you see this error:

_pickle.UnpicklingError: Weights only load failed. This file can still be loaded...

The fix: add weights_only=False in <venv_path>\site-packages\ultralytics\nn\tasks.py, line 518:

# before
return torch.load(file, map_location='cpu'), file
# after
return torch.load(file, map_location='cpu', weights_only=False), file

Make sure to only load trusted model weights as this bypasses some safety checks.
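If you prefer not to edit files inside site-packages, an alternative (a hypothetical workaround, not an official fix) is to wrap torch.load so that weights_only defaults to False before ultralytics loads the checkpoint. The helper below shows the pattern with a stand-in loader so it runs even without torch installed; fake_load is purely illustrative.

```python
import functools

def default_weights_only_false(load_fn):
    """Wrap a torch.load-style callable so weights_only defaults to False
    unless the caller passes it explicitly."""
    @functools.wraps(load_fn)
    def wrapper(*args, **kwargs):
        kwargs.setdefault("weights_only", False)
        return load_fn(*args, **kwargs)
    return wrapper

# Intended usage (run before importing ultralytics):
#   import torch
#   torch.load = default_weights_only_false(torch.load)

# Demonstration with a stand-in loader:
def fake_load(file, map_location=None, weights_only=True):
    return {"file": file, "weights_only": weights_only}

patched = default_weights_only_false(fake_load)
print(patched("FastSAM-x.pt")["weights_only"])  # False
```

The same trust caveat applies: this disables the safer default for every call that doesn't set weights_only explicitly.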

Usage

Basic Example - DINOv2 + FastSAM

from PIL import Image
from find_anything import (
    ZeroShotObjectMatcher,
    DinoV2FeatureEncoder,
    FastSAMMaskGenerator,
    DenseFeatureMaskPooler,
    TopKMaskSelector,
    BaseEmbeddingRepository,
)

device = "cuda"  # or "cpu"

encoder = DinoV2FeatureEncoder(device=device)
mask_generator = FastSAMMaskGenerator(model_path="FastSAM-x.pt", device=device)
# Adjust min_mask_area as needed - smaller values allow smaller ROIs
mask_pooler = DenseFeatureMaskPooler(min_mask_area=1, device=device)
embedding_repository = BaseEmbeddingRepository(encoder=encoder)
mask_selector = TopKMaskSelector(base_embeddings=embedding_repository, top_k=5)

matcher = ZeroShotObjectMatcher(
    encoder=encoder,
    mask_generator=mask_generator,
    mask_pooler=mask_pooler,
    mask_selector=mask_selector,
    base_embeddings=embedding_repository,
    similarity_threshold=0.5,
)

reference_images = [
    Image.open("reference1.jpg"),
    Image.open("reference2.jpg"),
]
matcher.set_base_images(reference_images)

target_image = Image.open("target.jpg")
results = matcher.forward_from_image(target_image)

for result in results:
    print(f"Match found: similarity={result.similarity:.3f}, reference_idx={result.matched_base}")
    mask = result.mask  # torch.Tensor with segmentation mask
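The returned mask can be post-processed with standard array operations. For example, assuming result.mask converts to a boolean NumPy array via .cpu().numpy() (an assumption about the tensor layout, not a documented API), a tight bounding box can be derived like this:

```python
import numpy as np

def mask_to_bbox(mask: np.ndarray):
    """Return (x_min, y_min, x_max, y_max) for a boolean HxW mask,
    or None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 3:7] = True      # rows 2..4, cols 3..6
print(mask_to_bbox(mask))  # (3, 2, 6, 4)
```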

Architecture

The detector uses a two-stage coarse-to-fine matching approach that balances computational efficiency with matching accuracy.

Stage 1: Coarse Filtering (Pooled Dense Features)

graph TD
    A["Target Image"]
    B["Mask Generator model"]
    C["Instance Masks"]
    D["Feature encoder<br/>encode_dense"]
    E["Dense Feature Map<br/>HxWxD"]
    F["Pool Features per Mask"]
    G["Candidate Embeddings"]
    H["Reference Embeddings"]
    I["Top-K Selection"]

    A --> B
    B --> C
    A --> D
    D --> E
    C --> F
    E --> F
    F --> G
    H --> I
    G --> I

The first stage processes the entire image once through the encoder to produce a dense feature map where each spatial location has a feature vector. In parallel, the mask generator produces instance segmentation masks for all objects in the scene.

For each mask, features from the dense map are pooled within the masked region to produce a single embedding per candidate object. These pooled embeddings are compared against reference embeddings, and only the best candidates proceed to the next stage.
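The pooling and selection logic of this stage can be sketched with NumPy (a simplified illustration, not the library's internal code): average the dense features inside each mask, then rank candidates by cosine similarity against the reference embeddings.

```python
import numpy as np

def pool_mask_features(dense_map, mask):
    """Average the D-dim features of an HxWxD map over a boolean HxW mask."""
    return dense_map[mask].mean(axis=0)

def top_k_candidates(candidates, references, k):
    """Rank candidate embeddings by best cosine similarity to any reference."""
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    r = references / np.linalg.norm(references, axis=1, keepdims=True)
    best = (c @ r.T).max(axis=1)          # best reference score per candidate
    order = np.argsort(best)[::-1][:k]    # indices of the top-k candidates
    return order, best[order]

rng = np.random.default_rng(0)
dense = rng.normal(size=(16, 16, 8))      # toy dense feature map
masks = [rng.random((16, 16)) > 0.7 for _ in range(6)]
cands = np.stack([pool_mask_features(dense, m) for m in masks])
refs = rng.normal(size=(2, 8))            # two reference embeddings
idx, scores = top_k_candidates(cands, refs, k=3)
print(idx.shape, scores.shape)  # (3,) (3,)
```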

Stage 2: Fine Matching (Cropped Patch Encoding)

graph TD
    A["Top-K Masks"]
    B["Original Image"]
    C["Crop Bounding Box"]
    D["Feature encoder<br/>encode_mean"]
    E["Patch Embeddings"]
    F["Reference Embeddings"]
    G["Compare and Score"]
    H["Final Similarity Scores"]

    A --> C
    B --> C
    C --> D
    D --> E
    F --> G
    E --> G
    G --> H

The second stage takes each selected candidate mask, extracts the bounding box crop from the original image, and encodes it independently. This produces a more accurate embedding:

  1. The cropped patch is resized to the model's native resolution, giving the object more pixels and detail
  2. The encoding focuses entirely on the object without background interference from pooling
  3. Spatial information is better preserved compared to averaging over irregular mask shapes
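As a sketch of the crop step (simplified; the real pipeline feeds the crop to the encoder), each selected mask is turned into a tight bounding-box crop of the original image:

```python
import numpy as np
from PIL import Image

def crop_from_mask(image: Image.Image, mask: np.ndarray) -> Image.Image:
    """Crop the tight bounding box of a boolean HxW mask from a PIL image."""
    ys, xs = np.nonzero(mask)
    # PIL's crop box is (left, upper, right, lower) with exclusive right/lower
    return image.crop((int(xs.min()), int(ys.min()),
                       int(xs.max()) + 1, int(ys.max()) + 1))

image = Image.new("RGB", (32, 32))        # toy stand-in for the target image
mask = np.zeros((32, 32), dtype=bool)
mask[4:10, 8:20] = True                   # object spans rows 4..9, cols 8..19
patch = crop_from_mask(image, mask)
print(patch.size)  # (12, 6)
```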

Why Two Stages?

The coarse stage acts as a fast filter to eliminate obviously non-matching regions. Without it, every detected mask would require a separate forward pass through the encoder, which is expensive: with possibly dozens of masks generated, this would significantly slow down the process for rich scenes.

The fine stage provides the accuracy needed for reliable matching. Pooled dense features can miss details or be contaminated by background pixels that fall within the mask but don't belong to the object.

Together the stages achieve both speed (single dense encoding + K patch encodings) and accuracy (dedicated high-resolution encoding of important regions for final candidates).
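The speed argument can be made concrete with a back-of-the-envelope count (illustrative numbers, not benchmarks): with N candidate masks and K finalists, the two-stage pipeline costs one dense pass plus K patch passes, versus N patch passes for fine matching alone.

```python
def encoder_passes(n_masks: int, top_k: int) -> tuple[int, int]:
    """Encoder forward passes: (two-stage pipeline, single-stage fine matching)."""
    two_stage = 1 + top_k   # one dense pass + one pass per finalist
    single_stage = n_masks  # one pass per detected mask
    return two_stage, single_stage

print(encoder_passes(n_masks=40, top_k=5))  # (6, 40)
```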

For the concrete implementations provided, we use DINOv2 as the feature encoder and FastSAM as the mask generator; the quality of the latter's region proposals is crucial for effective matching. I recommend using the "x" variant for best results.

Components

| Component | Role |
| --- | --- |
| FeatureEncoder | Extracts semantic features |
| MaskGenerator | Generates instance segmentation masks for all objects |
| MaskPooler | Pools dense features within mask regions |
| MaskSelector | Selects top mask candidates |
| EmbeddingRepository | Stores reference image embeddings |
| ZeroShotObjectMatcher | Orchestrates the full pipeline |

The modular design allows swapping different encoders, mask generators, or pooling/selection strategies as needed.
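To illustrate one of the swap points, the encoder contract can be described as a Protocol. This is a hypothetical interface inferred from the encode_dense/encode_mean labels in the diagrams, not the library's actual base class:

```python
from typing import Protocol, runtime_checkable
import numpy as np

@runtime_checkable
class FeatureEncoderLike(Protocol):
    """Hypothetical encoder contract inferred from the pipeline diagrams."""
    def encode_dense(self, image) -> np.ndarray: ...  # HxWxD dense feature map
    def encode_mean(self, image) -> np.ndarray: ...   # single D-dim embedding

class ConstantEncoder:
    """Toy drop-in encoder returning fixed features, useful for dry runs."""
    def encode_dense(self, image) -> np.ndarray:
        return np.zeros((4, 4, 8))
    def encode_mean(self, image) -> np.ndarray:
        return np.zeros(8)

print(isinstance(ConstantEncoder(), FeatureEncoderLike))  # True
```

Any object exposing these two methods could, in principle, stand in for DinoV2FeatureEncoder; the same idea applies to the mask generator and the pooling/selection strategies.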

License

MIT
