Easily benchmark Machine Learning models on selected tasks and datasets
sotabencheval is a framework-agnostic library containing a collection of deep learning benchmarks you can use to evaluate your models. It works with the sotabench service to record results so the community can compare model performance on different tasks, and it doubles as a continuous-integration-style service that benchmarks the models in your repository on each commit.
- ADE20K (Semantic Segmentation)
- COCO (Object Detection)
- ImageNet (Image Classification)
- SQuAD (Question Answering)
- WikiText-103 (Language Modelling)
- WMT (Machine Translation)
PRs welcome for further benchmarks!
Requires Python 3.6+.

```
pip install sotabench-eval
```
Get Benching! 🏋️
Integration is lightweight. For example, if you are evaluating an ImageNet model, you initialize an Evaluator object and (optionally) link it to an associated paper:

```python
from sotabencheval.image_classification import ImageNetEvaluator

evaluator = ImageNetEvaluator(
    model_name='FixResNeXt-101 32x48d',
    paper_arxiv_id='1906.06423')
```
Then, for each batch of predictions your model makes on ImageNet, pass a dictionary mapping image IDs to np.ndarrays of logits to the evaluator's add() method.
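Concretely, a single batch update might be sketched as below. The image IDs and random logits are placeholders standing in for real model output; the `try` guard simply lets the sketch run where sotabencheval is not installed:

```python
import numpy as np

# Placeholder batch: two validation image IDs and random 1000-class
# logits standing in for a real model's forward pass.
image_ids = ['ILSVRC2012_val_00000001', 'ILSVRC2012_val_00000002']
logits = np.random.rand(2, 1000)

# The evaluator expects {image_id: np.ndarray of logits} per batch.
predictions = dict(zip(image_ids, logits))

try:
    from sotabencheval.image_classification import ImageNetEvaluator
    evaluator = ImageNetEvaluator(
        model_name='FixResNeXt-101 32x48d',
        paper_arxiv_id='1906.06423')
    evaluator.add(predictions)  # call once per batch of predictions
    evaluator.save()            # finalise and record the results
except ImportError:
    pass  # sotabencheval not installed; the dict format above is the key part
```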
The evaluation logic just needs to be written in a sotabench.py file, and sotabench will run it on each commit and record the results.
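A minimal sotabench.py might look like the sketch below. The model-loading and data-loading details (`run_model`, the hard-coded batch of IDs) are placeholders for your own pipeline, and the import guard only exists so the sketch degrades gracefully without sotabencheval installed:

```python
# sotabench.py (sketch) -- sotabench executes this file from the
# repository root on every commit.
import numpy as np

try:
    from sotabencheval.image_classification import ImageNetEvaluator
except ImportError:  # allow the sketch to run without sotabencheval
    ImageNetEvaluator = None

def run_model(batch_ids):
    # Placeholder for a real forward pass: one 1000-way logit vector
    # per image ID in the batch.
    return np.random.rand(len(batch_ids), 1000)

def evaluate():
    evaluator = ImageNetEvaluator(
        model_name='FixResNeXt-101 32x48d',
        paper_arxiv_id='1906.06423')
    # Iterate over the validation set in batches; a single hard-coded
    # batch stands in for a real data loader here.
    for batch_ids in [['ILSVRC2012_val_00000001',
                       'ILSVRC2012_val_00000002']]:
        logits = run_model(batch_ids)
        evaluator.add(dict(zip(batch_ids, logits)))
    evaluator.save()  # record results for the sotabench leaderboard

if ImageNetEvaluator is not None:
    evaluate()
```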
All contributions welcome!