Few-shot classifier for detecting eye imaging datasets

envision-classifier

SetFit few-shot classifier for identifying eye imaging datasets from scientific metadata.

Part of the EyeACT project by the FAIR Data Innovations Hub.

Installation

pip install envision-classifier

Python API

from envision_classifier import EyeImagingClassifier

# Downloads model from HuggingFace on first use
clf = EyeImagingClassifier()

# Classify a single record
result = clf.classify("Retinal OCT dataset for diabetic retinopathy")
print(result)
# {'label': 'EYE_IMAGING', 'confidence': 0.98,
#  'probabilities': {'EYE_IMAGING': 0.98, 'NEGATIVE': 0.02}}

# Classify a batch
results = clf.classify_batch([
    "Retinal fundus photography dataset for glaucoma screening",
    "COVID-19 genome sequencing data",
    {"title": "OCT images", "description": "Macular degeneration scans"},
])

# Use a local model instead of downloading
clf = EyeImagingClassifier(model_path="./my_model")
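A common follow-up is to keep only the confident positives from a batch. The sketch below assumes that `classify_batch` returns a list of dictionaries shaped like the single-record output shown above, which is not confirmed here. The helper `confident_positives` and the 0.9 threshold are hypothetical, and the sample results are illustrative stand-ins so the snippet runs without downloading the model.

```python
from typing import Any


def confident_positives(results: list[dict[str, Any]], threshold: float = 0.9) -> list[int]:
    """Return the indices of results labeled EYE_IMAGING with confidence >= threshold."""
    return [
        i
        for i, r in enumerate(results)
        if r["label"] == "EYE_IMAGING" and r["confidence"] >= threshold
    ]


# Illustrative stand-ins for classifier output (not real model predictions).
sample_results = [
    {"label": "EYE_IMAGING", "confidence": 0.98,
     "probabilities": {"EYE_IMAGING": 0.98, "NEGATIVE": 0.02}},
    {"label": "NEGATIVE", "confidence": 0.97,
     "probabilities": {"EYE_IMAGING": 0.03, "NEGATIVE": 0.97}},
    {"label": "EYE_IMAGING", "confidence": 0.62,
     "probabilities": {"EYE_IMAGING": 0.62, "NEGATIVE": 0.38}},
]

kept = confident_positives(sample_results)
print(kept)
```

Low-confidence positives, such as the third sample, could be routed to manual review instead of being discarded.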

CLI

After installing, the envision-classifier command is available:

# Classify a text string
envision-classifier classify --text "Retinal OCT dataset for diabetic retinopathy"

# Classify from a JSON file
envision-classifier classify records.json

# Pipe JSON via stdin
echo '{"title": "Fundus images", "description": "DR screening"}' | envision-classifier classify

# Train a new model from built-in training data
envision-classifier train --output ./my_model

# Show model info and training data counts
envision-classifier info
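The JSON input for the classify command can be prepared from Python. This is a minimal sketch, assuming the file may hold a list of objects with the same `title` and `description` keys used in the stdin example; the list layout is an assumption, and `write_records` is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path


def write_records(path, records):
    """Write metadata records to `path` as a JSON array and return the path."""
    path = Path(path)
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return path


# Example records reusing the field names from the stdin example above.
records = [
    {"title": "Fundus images", "description": "DR screening"},
    {"title": "OCT images", "description": "Macular degeneration scans"},
]

if __name__ == "__main__":
    out = write_records("records.json", records)
    print(f"Wrote {len(records)} records to {out}")
```

The resulting file can then be passed to `envision-classifier classify records.json`.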

Classification Labels

Label         Description
EYE_IMAGING   Actual eye imaging datasets (fundus, OCT, OCTA, cornea)
NEGATIVE      Everything else (software, non-imaging eye data, unrelated domains)

Model

  • Base model: sentence-transformers/all-mpnet-base-v2 (768-dim)
  • Training data: 891 curated examples (262 EYE_IMAGING, 629 NEGATIVE) from Zenodo, Figshare, Dryad, Kaggle, and NEI
  • Test accuracy: 0.961, EYE_IMAGING F1: 0.936
  • Spot-check: 30/33 (90.9%)
  • Model weights: fairdataihub/envision-eye-imaging-classifier
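The figures above can be sanity-checked with a little arithmetic. The sketch below recomputes the class balance and the spot-check rate from the reported counts, and includes the standard F1 definition for reference. The project's exact evaluation protocol is not stated here, so `f1_score` illustrates the metric in general and is not the project's evaluation code.

```python
# Counts as reported in the Model section.
n_eye, n_neg = 262, 629
n_total = n_eye + n_neg

eye_share = n_eye / n_total   # fraction of training examples that are positive
spot_check = 30 / 33          # spot-check hit rate


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f"total={n_total}, EYE_IMAGING share={eye_share:.1%}, spot-check={spot_check:.1%}")
# prints: total=891, EYE_IMAGING share=29.4%, spot-check=90.9%
```

With roughly 29% positives, the training set is imbalanced toward NEGATIVE, which is one reason to report the per-class F1 alongside overall accuracy.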

Multi-Repository Results

The classifier was applied across six repositories via envision-discovery:

Source     EYE_IMAGING   NEGATIVE   Total
Zenodo              60        455     515
DataCite           752      1,084   1,836
Figshare         1,049        951   2,000
Kaggle             248        484     732
Dryad               32         57      89
NEI                686        976   1,662
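To compare sources, the per-source positive rate and the overall totals can be derived from the table. This sketch only re-aggregates the counts listed above; it does not account for records that may appear in more than one source, and the variable names are illustrative.

```python
# (EYE_IMAGING, NEGATIVE) counts copied from the table above.
counts = {
    "Zenodo": (60, 455),
    "DataCite": (752, 1084),
    "Figshare": (1049, 951),
    "Kaggle": (248, 484),
    "Dryad": (32, 57),
    "NEI": (686, 976),
}

total_eye = sum(eye for eye, _ in counts.values())
total_neg = sum(neg for _, neg in counts.values())

# One row per source: counts, row total, and share labeled EYE_IMAGING.
for source, (eye, neg) in counts.items():
    total = eye + neg
    print(f"{source:<10}{eye:>7,}{neg:>8,}{total:>8,}{eye / total:>8.1%}")

grand_total = total_eye + total_neg
print(f"{'All':<10}{total_eye:>7,}{total_neg:>8,}{grand_total:>8,}{total_eye / grand_total:>8.1%}")
```

Across all six sources this gives 2,827 EYE_IMAGING and 4,007 NEGATIVE records, 6,834 in total.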

Classification uses metadata only: titles, descriptions, keywords, and the file types listed inside archives, which are inspected via HTTP Range requests. No dataset files are downloaded.
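Inspecting file types without downloading the data is possible because a ZIP archive keeps its file listing (the central directory) at the end of the file, so a reader only needs that portion. The sketch below illustrates the idea on a small in-memory archive using the standard `zipfile` module. It is a local stand-in, not the project's implementation: the HTTP Range transport is omitted, and `extension_counts` is a hypothetical helper.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def extension_counts(archive) -> Counter:
    """Count file extensions listed in a ZIP archive without extracting any member."""
    with zipfile.ZipFile(archive) as zf:
        # infolist() reads only the central directory, not the member contents.
        names = [info.filename for info in zf.infolist() if not info.is_dir()]
    return Counter(PurePosixPath(name).suffix.lower() for name in names)


# Build a tiny in-memory archive to stand in for a remote dataset file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("scans/eye_001.png", b"")
    zf.writestr("scans/eye_002.png", b"")
    zf.writestr("labels.csv", b"id,label\n")
buffer.seek(0)

print(extension_counts(buffer))
```

A listing dominated by image extensions is the kind of signal that could be added to the metadata text before classification.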

License

MIT

Download files

Source Distribution

envision_classifier-0.3.2.tar.gz (139.3 kB)

Uploaded Source

Built Distribution

envision_classifier-0.3.2-py3-none-any.whl (139.7 kB)

Uploaded Python 3

File details

Details for the file envision_classifier-0.3.2.tar.gz.

File metadata

  • Download URL: envision_classifier-0.3.2.tar.gz
  • Upload date:
  • Size: 139.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.3 CPython/3.12.13 Linux/6.17.0-1008-azure

File hashes

Hashes for envision_classifier-0.3.2.tar.gz

Algorithm     Hash digest
SHA256        2b14db1e26e2ec93fe778b32910738ae96e8ccd1877d217da15ce039dea82830
MD5           568f240bd2e1bb428f2a67d8f081c735
BLAKE2b-256   52d2ae89723106e850474afd43beafa69c9af8fdb664eb7173a8ad0ff2529886

File details

Details for the file envision_classifier-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: envision_classifier-0.3.2-py3-none-any.whl
  • Upload date:
  • Size: 139.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.3 CPython/3.12.13 Linux/6.17.0-1008-azure

File hashes

Hashes for envision_classifier-0.3.2-py3-none-any.whl

Algorithm     Hash digest
SHA256        325f7b20de980950fe855c5212a5a637c7d50f3945af06c78ef7f3d63db5a30e
MD5           f6a237624781d979ac5c6df638164f34
BLAKE2b-256   04394b6f818a2e902b006761c7c794bda7cbbf991c47a2a2ec56c05c70b1197c