Few-shot classifier for detecting eye imaging datasets
envision-classifier
SetFit few-shot classifier for identifying eye imaging datasets from scientific metadata.
Part of the EyeACT project by the FAIR Data Innovations Hub.
Installation
pip install envision-classifier
Python API
from envision_classifier import EyeImagingClassifier
# Downloads model from HuggingFace on first use
clf = EyeImagingClassifier()
# Classify a single record
result = clf.classify("Retinal OCT dataset for diabetic retinopathy")
print(result)
# {'label': 'EYE_IMAGING', 'confidence': 0.999, 'probabilities': {...}}
# Classify a batch
results = clf.classify_batch([
    "Retinal fundus photography dataset for glaucoma screening",
    "COVID-19 genome sequencing data",
    {"title": "OCT images", "description": "Macular degeneration scans"},
])
# Use a local model instead of downloading
clf = EyeImagingClassifier(model_path="./my_model")
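The result dictionaries can be post-processed with plain Python. The sketch below assumes that `classify_batch` returns a list of dicts in input order, each shaped like the `classify` output shown above, and that `probabilities` maps every label name to a float. Neither detail is documented here, so check them against real output. The helper names, the example result values, and the 0.9 threshold are hypothetical.

```python
from typing import Any

# Hypothetical results, written by hand for illustration (not real model output)
EXAMPLE_RECORDS = [
    "Retinal fundus photography dataset for glaucoma screening",
    "COVID-19 genome sequencing data",
    {"title": "OCT images", "description": "Macular degeneration scans"},
]
EXAMPLE_RESULTS = [
    {"label": "EYE_IMAGING", "confidence": 0.97,
     "probabilities": {"EYE_IMAGING": 0.97, "OTHER_EYE_DATA": 0.02, "EYE_SOFTWARE": 0.006, "NEGATIVE": 0.004}},
    {"label": "NEGATIVE", "confidence": 0.99,
     "probabilities": {"NEGATIVE": 0.99, "OTHER_EYE_DATA": 0.005, "EYE_SOFTWARE": 0.003, "EYE_IMAGING": 0.002}},
    {"label": "EYE_IMAGING", "confidence": 0.85,
     "probabilities": {"EYE_IMAGING": 0.85, "OTHER_EYE_DATA": 0.10, "EYE_SOFTWARE": 0.03, "NEGATIVE": 0.02}},
]


def select_eye_imaging(records: list[Any], results: list[dict[str, Any]],
                       min_confidence: float = 0.9) -> list[tuple[Any, float]]:
    """Keep records predicted as EYE_IMAGING with confidence >= min_confidence.

    Assumes results[i] is the classification of records[i].
    """
    if len(records) != len(results):
        raise ValueError("expected one result per record")
    selected = [
        (record, result["confidence"])
        for record, result in zip(records, results)
        if result["label"] == "EYE_IMAGING" and result["confidence"] >= min_confidence
    ]
    # Most confident matches first
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


def runner_up(result: dict[str, Any]) -> tuple[str, float]:
    """Second most likely label, assuming 'probabilities' maps label -> float."""
    ranked = sorted(result["probabilities"].items(), key=lambda kv: kv[1], reverse=True)
    return ranked[1]


if __name__ == "__main__":
    print(select_eye_imaging(EXAMPLE_RECORDS, EXAMPLE_RESULTS))
    print(runner_up(EXAMPLE_RESULTS[0]))
```

Looking at the runner-up label is a cheap way to spot borderline cases, for example records split between EYE_IMAGING and OTHER_EYE_DATA.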
CLI
After installing, the envision-classifier command is available:
# Classify a text string
envision-classifier classify --text "Retinal OCT dataset for diabetic retinopathy"
# Classify from a JSON file
envision-classifier classify records.json
# Pipe JSON via stdin
echo '{"title": "Fundus images", "description": "DR screening"}' | envision-classifier classify
# Train a new model from built-in training data
envision-classifier train --output ./my_model
# Show model info and training data counts
envision-classifier info
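The layout of `records.json` is not documented here. A plausible guess, mirroring the stdin example, is a JSON array of objects with `title` and `description` fields. The sketch below writes such a file from Python under that assumption; `write_records` is a hypothetical helper, not part of the package.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def write_records(records: list[dict[str, str]], path: str | Path) -> Path:
    """Write metadata records as a JSON array (assumed CLI input format)."""
    for i, rec in enumerate(records):
        # Same field names as the stdin example; require at least one of them
        if not (rec.get("title") or rec.get("description")):
            raise ValueError(f"record {i} has neither 'title' nor 'description'")
    path = Path(path)
    path.write_text(json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = write_records(
            [
                {"title": "Fundus images", "description": "DR screening"},
                {"title": "OCT images", "description": "Macular degeneration scans"},
            ],
            Path(tmp) / "records.json",
        )
        print(out.read_text(encoding="utf-8"))
```

The resulting file would then be passed as `envision-classifier classify records.json`.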
Classification Labels
| Label | Description |
|---|---|
| EYE_IMAGING | Actual eye imaging datasets (fundus, OCT, OCTA, cornea) |
| EYE_SOFTWARE | Code, tools, models for eye imaging (no actual data) |
| OTHER_EYE_DATA | Eye research papers, reviews, non-imaging data |
| NEGATIVE | Not eye-related |
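The table implies a simple hierarchy: three labels are eye-related, and only EYE_IMAGING denotes actual imaging data. When routing results downstream, it can help to encode that once. A minimal sketch; the `Label` enum and `parse_label` are hypothetical helpers, not the package's API.

```python
from enum import Enum


class Label(str, Enum):
    """The four classes from the table above (hypothetical helper)."""

    EYE_IMAGING = "EYE_IMAGING"        # actual eye imaging datasets
    EYE_SOFTWARE = "EYE_SOFTWARE"      # code, tools, models; no data
    OTHER_EYE_DATA = "OTHER_EYE_DATA"  # papers, reviews, non-imaging data
    NEGATIVE = "NEGATIVE"              # not eye-related

    @property
    def is_eye_related(self) -> bool:
        return self is not Label.NEGATIVE

    @property
    def has_imaging_data(self) -> bool:
        return self is Label.EYE_IMAGING


def parse_label(raw: str) -> Label:
    """Convert a 'label' string from a result dict; raises ValueError if unknown."""
    return Label(raw)


if __name__ == "__main__":
    for raw in ("EYE_IMAGING", "EYE_SOFTWARE", "NEGATIVE"):
        label = parse_label(raw)
        print(label.value, label.is_eye_related, label.has_imaging_data)
```

Mixing in `str` keeps the members comparable to the raw label strings.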
Model
- Base model: sentence-transformers/all-mpnet-base-v2 (768-dim)
- Training data: 474 curated examples (77 EYE_IMAGING, 48 EYE_SOFTWARE, 79 OTHER_EYE_DATA, 270 NEGATIVE)
- Test accuracy: 0.937, macro F1: 0.902
- Spot-check: 29/33 (87.9%)
- Model weights: fairdataihub/envision-eye-imaging-classifier
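Two quick calculations put these numbers in context. NEGATIVE makes up 270 of the 474 training examples, so if the test split has a similar balance (not stated here), always predicting NEGATIVE would score roughly 57% accuracy. Macro F1 averages per-class F1 with equal weight, so it typically sits below accuracy when the smaller classes are harder. The sketch computes both from scratch; the toy labels in it are invented for illustration and are not the model's test set.

```python
from collections import Counter

TRAINING_COUNTS = {"EYE_IMAGING": 77, "EYE_SOFTWARE": 48, "OTHER_EYE_DATA": 79, "NEGATIVE": 270}


def majority_baseline(counts: dict[str, int]) -> float:
    """Accuracy of always predicting the most frequent class."""
    return max(counts.values()) / sum(counts.values())


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p wrongly
            fn[t] += 1  # missed the true class t
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(f"majority baseline: {majority_baseline(TRAINING_COUNTS):.3f}")
    print(f"spot-check: {100 * 29 / 33:.1f}%")
    # Toy example: one minority-class miss gives accuracy 0.75 but macro F1 about 0.43
    y_true = ["NEGATIVE"] * 3 + ["EYE_SOFTWARE"]
    y_pred = ["NEGATIVE"] * 4
    print(f"macro F1: {macro_f1(y_true, y_pred):.3f}")
```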
Zenodo Classification Results
Applied to 515 Zenodo dataset records via envision-discovery:
| Class | Count |
|---|---|
| EYE_IMAGING | 120 |
| EYE_SOFTWARE | 66 |
| OTHER_EYE_DATA | 3 |
| NEGATIVE | 325 |
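Expressed as shares, the table says that a little over a third of the classified records were flagged as eye-related and a little under a quarter as actual imaging datasets. The four rows sum to 514, one fewer than the 515 records mentioned above, so this sketch divides by the tabulated total.

```python
ZENODO_COUNTS = {"EYE_IMAGING": 120, "EYE_SOFTWARE": 66, "OTHER_EYE_DATA": 3, "NEGATIVE": 325}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of the tabulated total that falls in each class."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Everything except NEGATIVE counts as eye-related
eye_related = sum(n for label, n in ZENODO_COUNTS.items() if label != "NEGATIVE")

if __name__ == "__main__":
    for label, share in shares(ZENODO_COUNTS).items():
        print(f"{label:15s} {100 * share:5.1f}%")
    print(f"eye-related: {eye_related} ({100 * eye_related / sum(ZENODO_COUNTS.values()):.1f}%)")
```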
Classification is based on metadata only (titles, descriptions, keywords, and file types inspected inside archives via HTTP Range requests) — no dataset files are downloaded.
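How envision-discovery inspects archives is not shown here. One common way to list the files inside a remote ZIP without downloading it is to fetch only its tail with an HTTP Range request, locate the End of Central Directory record, then fetch just the central directory, which stores every member's file name. The sketch illustrates that general technique with the standard library; it is not envision-discovery's code. It assumes the server honours Range requests (HTTP 206), and it skips ZIP64 archives and formats such as tar.gz, which have no central index.

```python
import io
import struct
import urllib.request
import zipfile
from collections import Counter
from pathlib import PurePosixPath

EOCD_SIG = b"PK\x05\x06"   # End of Central Directory signature
CDFH_SIG = b"PK\x01\x02"   # central directory file header signature
MAX_EOCD = 22 + 0xFFFF     # fixed EOCD size plus the largest possible comment


def fetch_range(url: str, range_spec: str) -> bytes:
    """GET part of a remote file, e.g. range_spec='bytes=-100' for the last 100 bytes."""
    req = urllib.request.Request(url, headers={"Range": range_spec})
    with urllib.request.urlopen(req) as resp:
        if resp.status != 206:
            raise RuntimeError("server did not honour the Range header")
        return resp.read()


def parse_eocd(tail: bytes) -> tuple[int, int]:
    """Return (central directory size, central directory offset) from the archive tail."""
    i = tail.rfind(EOCD_SIG)
    if i < 0:
        raise ValueError("no End of Central Directory record found")
    fields = struct.unpack_from("<4s4H2LH", tail, i)
    cd_size, cd_offset = fields[5], fields[6]
    if cd_offset == 0xFFFFFFFF:
        raise ValueError("ZIP64 archives are not handled in this sketch")
    return cd_size, cd_offset


def parse_central_directory(cd: bytes) -> list[str]:
    """Walk the 46-byte central directory headers and collect member names."""
    names, pos = [], 0
    while cd[pos:pos + 4] == CDFH_SIG:
        (flags,) = struct.unpack_from("<H", cd, pos + 8)
        name_len, extra_len, comment_len = struct.unpack_from("<3H", cd, pos + 28)
        raw = cd[pos + 46:pos + 46 + name_len]
        # Flag bit 11 marks UTF-8 names; otherwise the ZIP default is CP437
        names.append(raw.decode("utf-8" if flags & 0x800 else "cp437"))
        pos += 46 + name_len + extra_len + comment_len
    return names


def list_remote_zip(url: str) -> list[str]:
    """List member names of a remote ZIP using two Range requests."""
    cd_size, cd_offset = parse_eocd(fetch_range(url, f"bytes=-{MAX_EOCD}"))
    if cd_size == 0:
        return []
    cd = fetch_range(url, f"bytes={cd_offset}-{cd_offset + cd_size - 1}")
    return parse_central_directory(cd)


def file_types(names: list[str]) -> Counter:
    """Count lower-cased file extensions, ignoring directory entries."""
    return Counter(PurePosixPath(n).suffix.lower() for n in names if not n.endswith("/"))


if __name__ == "__main__":
    # Offline demo: build a small ZIP in memory and parse it the same way
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("scans/oct_001.dcm", b"...")
        zf.writestr("README.txt", b"...")
    data = buf.getvalue()
    cd_size, cd_offset = parse_eocd(data[-MAX_EOCD:])
    print(file_types(parse_central_directory(data[cd_offset:cd_offset + cd_size])))
```

Because the central directory sits at the end of a ZIP file, two small requests are enough regardless of archive size.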
Related
- envision-discovery -- Full pipeline (scraping + classification + export)
- Model on HuggingFace
License
MIT
Download files
File details
Details for the file envision_classifier-0.1.2.tar.gz.
File metadata
- Download URL: envision_classifier-0.1.2.tar.gz
- Size: 19.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.2 CPython/3.12.13 Linux/6.14.0-1017-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 113983d0bb9d5175f570797373b4ede1a71fac25c40cdbbbae14e171cb9a2f5c |
| MD5 | 3d2474f52092fd3859940e392a503b1a |
| BLAKE2b-256 | fcb374e6a4cffc33ddaae2b0a653ff3a9deea1240edda41af218fd7b617ac4fe |
File details
Details for the file envision_classifier-0.1.2-py3-none-any.whl.
File metadata
- Download URL: envision_classifier-0.1.2-py3-none-any.whl
- Size: 20.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.2 CPython/3.12.13 Linux/6.14.0-1017-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 255c8ed47fd682160b20ae880c198053a5e12f845d40465a4fc794f19b5d6f99 |
| MD5 | ea19ca746d0e2da99292ee1d53bd1417 |
| BLAKE2b-256 | 06278702e18e374ecd54ccc6e1d052e3350f3fa8b5e8c6f0edcff67f61caffc8 |