Find Anything


Zero-shot object detection library that finds objects in rich images by matching against reference images.

Installation

Requires Python 3.12+

# With CUDA 12.8 support
pip install find-anything[cuda128] --extra-index-url https://download.pytorch.org/whl/cu128

# CPU only
pip install find-anything[cpu]

Warning

Because ultralytics (the FastSAM dependency) is outdated with respect to recent torch versions, you might need to apply a fix if you see this error:

_pickle.UnpicklingError: Weights only load failed. This file can still be loaded...

The fix: add weights_only=False in <venv_path>\site-packages\ultralytics\nn\tasks.py, line 518:

# before
return torch.load(file, map_location='cpu'), file
# after
return torch.load(file, map_location='cpu', weights_only=False), file

Make sure to only load trusted model weights as this bypasses some safety checks.
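If you prefer not to edit files inside site-packages, an alternative (a hypothetical workaround, not an official fix) is to wrap torch.load so that weights_only defaults to False before ultralytics loads the checkpoint. The helper below shows the pattern with a stand-in loader so it runs even without torch installed; fake_load is purely illustrative.

```python
import functools

def default_weights_only_false(load_fn):
    """Wrap a torch.load-style callable so weights_only defaults to False
    unless the caller passes it explicitly."""
    @functools.wraps(load_fn)
    def wrapper(*args, **kwargs):
        kwargs.setdefault("weights_only", False)
        return load_fn(*args, **kwargs)
    return wrapper

# Intended usage (run before importing ultralytics):
#   import torch
#   torch.load = default_weights_only_false(torch.load)

# Demonstration with a stand-in loader:
def fake_load(file, map_location=None, weights_only=True):
    return {"file": file, "weights_only": weights_only}

patched = default_weights_only_false(fake_load)
print(patched("FastSAM-x.pt")["weights_only"])  # False
```

The same trust caveat applies: this disables the safer default for every call that doesn't set weights_only explicitly.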

Usage

Basic Example - DINOv2 + FastSAM

from PIL import Image
from find_anything import (
    ZeroShotObjectMatcher,
    DinoV2FeatureEncoder,
    FastSAMMaskGenerator,
    DenseFeatureMaskPooler,
    TopKMaskSelector,
    BaseEmbeddingRepository,
)

device = "cuda"  # or "cpu"

encoder = DinoV2FeatureEncoder(device=device)
mask_generator = FastSAMMaskGenerator(model_path="FastSAM-x.pt", device=device)
# Adjust min_mask_area as needed - smaller values allow smaller ROIs
mask_pooler = DenseFeatureMaskPooler(min_mask_area=1, device=device)
embedding_repository = BaseEmbeddingRepository(encoder=encoder)
mask_selector = TopKMaskSelector(base_embeddings=embedding_repository, top_k=5)

matcher = ZeroShotObjectMatcher(
    encoder=encoder,
    mask_generator=mask_generator,
    mask_pooler=mask_pooler,
    mask_selector=mask_selector,
    base_embeddings=embedding_repository,
    similarity_threshold=0.5,
)

reference_images = [
    Image.open("reference1.jpg"),
    Image.open("reference2.jpg"),
]
matcher.set_base_images(reference_images)

target_image = Image.open("target.jpg")
results = matcher.forward_from_image(target_image)

for result in results:
    print(f"Match found: similarity={result.similarity:.3f}, reference_idx={result.matched_base}")
    mask = result.mask  # torch.Tensor with segmentation mask
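The returned mask can be post-processed with standard array operations. For example, assuming result.mask converts to a boolean NumPy array via .cpu().numpy() (an assumption about the tensor layout, not a documented API), a tight bounding box can be derived like this:

```python
import numpy as np

def mask_to_bbox(mask: np.ndarray):
    """Return (x_min, y_min, x_max, y_max) for a boolean HxW mask,
    or None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 3:7] = True      # rows 2..4, cols 3..6
print(mask_to_bbox(mask))  # (3, 2, 6, 4)
```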

Architecture

The detector uses a two-stage coarse-to-fine matching approach that balances computational efficiency with matching accuracy.

Stage 1: Coarse Filtering (Pooled Dense Features)

graph TD
    A["Target Image"]
    B["Mask Generator model"]
    C["Instance Masks"]
    D["Feature encoder<br/>encode_dense"]
    E["Dense Feature Map<br/>HxWxD"]
    F["Pool Features per Mask"]
    G["Candidate Embeddings"]
    H["Reference Embeddings"]
    I["Top-K Selection"]

    A --> B
    B --> C
    A --> D
    D --> E
    C --> F
    E --> F
    F --> G
    H --> I
    G --> I

The first stage processes the entire image once through the encoder to produce a dense feature map where each spatial location has a feature vector. In parallel, the mask generator produces instance segmentation masks for all objects in the scene.

For each mask, features from the dense map are pooled within the masked region to produce a single embedding per candidate object. These pooled embeddings are compared against reference embeddings, and only the best candidates proceed to the next stage.
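The pooling and selection logic of this stage can be sketched with NumPy (a simplified illustration, not the library's internal code): average the dense features inside each mask, then rank candidates by cosine similarity against the reference embeddings.

```python
import numpy as np

def pool_mask_features(dense_map, mask):
    """Average the D-dim features of an HxWxD map over a boolean HxW mask."""
    return dense_map[mask].mean(axis=0)

def top_k_candidates(candidates, references, k):
    """Rank candidate embeddings by best cosine similarity to any reference."""
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    r = references / np.linalg.norm(references, axis=1, keepdims=True)
    best = (c @ r.T).max(axis=1)          # best reference score per candidate
    order = np.argsort(best)[::-1][:k]    # indices of the top-k candidates
    return order, best[order]

rng = np.random.default_rng(0)
dense = rng.normal(size=(16, 16, 8))      # toy dense feature map
masks = [rng.random((16, 16)) > 0.7 for _ in range(6)]
cands = np.stack([pool_mask_features(dense, m) for m in masks])
refs = rng.normal(size=(2, 8))            # two reference embeddings
idx, scores = top_k_candidates(cands, refs, k=3)
print(idx.shape, scores.shape)  # (3,) (3,)
```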

Stage 2: Fine Matching (Cropped Patch Encoding)

graph TD
    A["Top-K Masks"]
    B["Original Image"]
    C["Crop Bounding Box"]
    D["Feature encoder<br/>encode_mean"]
    E["Patch Embeddings"]
    F["Reference Embeddings"]
    G["Compare and Score"]
    H["Final Similarity Scores"]

    A --> C
    B --> C
    C --> D
    D --> E
    F --> G
    E --> G
    G --> H

The second stage takes each selected candidate mask, extracts the bounding box crop from the original image, and encodes it independently. This produces a more accurate embedding:

  1. The cropped patch is resized to the model's native resolution, giving the object more pixels and detail
  2. The encoding focuses entirely on the object without background interference from pooling
  3. Spatial information is better preserved compared to averaging over irregular mask shapes
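As a sketch of the crop step (simplified; the real pipeline feeds the crop to the encoder), each selected mask is turned into a tight bounding-box crop of the original image:

```python
import numpy as np
from PIL import Image

def crop_from_mask(image: Image.Image, mask: np.ndarray) -> Image.Image:
    """Crop the tight bounding box of a boolean HxW mask from a PIL image."""
    ys, xs = np.nonzero(mask)
    # PIL's crop box is (left, upper, right, lower) with exclusive right/lower
    return image.crop((int(xs.min()), int(ys.min()),
                       int(xs.max()) + 1, int(ys.max()) + 1))

image = Image.new("RGB", (32, 32))        # toy stand-in for the target image
mask = np.zeros((32, 32), dtype=bool)
mask[4:10, 8:20] = True                   # object spans rows 4..9, cols 8..19
patch = crop_from_mask(image, mask)
print(patch.size)  # (12, 6)
```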

Why Two Stages?

The coarse stage acts as a fast filter to eliminate obviously non-matching regions. Without it, every detected mask would require a separate forward pass through the encoder, which is expensive: with possibly dozens of masks generated, this would significantly slow down the process for rich scenes.

The fine stage provides the accuracy needed for reliable matching. Pooled dense features can miss details or be contaminated by background pixels that fall within the mask but don't belong to the object.

Together the stages achieve both speed (single dense encoding + K patch encodings) and accuracy (dedicated high-resolution encoding of important regions for final candidates).
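The speed argument can be made concrete with a back-of-the-envelope count (illustrative numbers, not benchmarks): with N candidate masks and K finalists, the two-stage pipeline costs one dense pass plus K patch passes, versus N patch passes for fine matching alone.

```python
def encoder_passes(n_masks: int, top_k: int) -> tuple[int, int]:
    """Encoder forward passes: (two-stage pipeline, single-stage fine matching)."""
    two_stage = 1 + top_k   # one dense pass + one pass per finalist
    single_stage = n_masks  # one pass per detected mask
    return two_stage, single_stage

print(encoder_passes(n_masks=40, top_k=5))  # (6, 40)
```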

For the concrete implementations provided, we use DINOv2 as the feature encoder and FastSAM as the mask generator; the quality of the latter's region proposals is crucial for effective matching. I recommend using the "x" variant for best results.

Components

| Component | Role |
| --- | --- |
| FeatureEncoder | Extracts semantic features |
| MaskGenerator | Generates instance segmentation masks for all objects |
| MaskPooler | Pools dense features within mask regions |
| MaskSelector | Selects top mask candidates |
| EmbeddingRepository | Stores reference image embeddings |
| ZeroShotObjectMatcher | Orchestrates the full pipeline |

The modular design allows swapping different encoders, mask generators, or pooling/selection strategies as needed.
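To illustrate one of the swap points, the encoder contract can be described as a Protocol. This is a hypothetical interface inferred from the encode_dense/encode_mean labels in the diagrams, not the library's actual base class:

```python
from typing import Protocol, runtime_checkable
import numpy as np

@runtime_checkable
class FeatureEncoderLike(Protocol):
    """Hypothetical encoder contract inferred from the pipeline diagrams."""
    def encode_dense(self, image) -> np.ndarray: ...  # HxWxD dense feature map
    def encode_mean(self, image) -> np.ndarray: ...   # single D-dim embedding

class ConstantEncoder:
    """Toy drop-in encoder returning fixed features, useful for dry runs."""
    def encode_dense(self, image) -> np.ndarray:
        return np.zeros((4, 4, 8))
    def encode_mean(self, image) -> np.ndarray:
        return np.zeros(8)

print(isinstance(ConstantEncoder(), FeatureEncoderLike))  # True
```

Any object exposing these two methods could, in principle, stand in for DinoV2FeatureEncoder; the same idea applies to the mask generator and the pooling/selection strategies.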

License

MIT
