Skip to main content

General purpose multi-task classification library

Project description

Tonks

Note 6/12/20: Our team previously had a tradition of naming projects with terms or characters from the Harry Potter series, but we are disappointed by J.K. Rowling’s persistent transphobic comments. In response, we will be renaming this repository, and are working to develop an inclusive solution that minimizes disruption to our users.

Tonks is a general purpose deep learning library developed by the ShopRunner Data Science team to train multi-task image, text, or ensemble (image + text) models.

What differentiates our library is that you can train a multi-task model with different datasets for each of your tasks. For example, you could train one model to label dress length for dresses and pants length for pants.

See the docs for more details.

To quickly get started, check out one of our tutorials in the notebooks folder. In particular, the synthetic_data tutorial provides a very quick example of how the code works.

Structure

  • notebooks
    • fashion_data: a set of notebooks demonstrating training Tonks models on an open source fashion dataset consisting of images and text descriptions
    • synthetic_data: a set of notebooks demonstrating training Tonks models on a set of generated color swatches. This is meant to be an easy fast demo of the library's capabilities that can be run on CPU's.
  • tonks
    • ensemble: code for ensemble models of text and vision models
    • text: code for text models with a BERT architecture
    • vision: code for vision models with ResNet50 architectures

Installation

pip install tonks

You may get an error from the tokenizer package if you do not have a Rust compiler installed; see https://github.com/huggingface/transformers/issues/2831#issuecomment-592724471.

Notes

Currently, this library supports ResNet50 and BERT models.

In some of our documentation the terms pretrained and vanilla appear. pretrained is our shorthand for Tonks models that have been trained at least once already so their weights have been tuned for a specific use case. vanilla is our shorthand for base weights coming from transformers or PyTorch for the out-of-the-box BERT and ResNet50 models.

For our examples using text models, we use the transformers repository managed by huggingface. The most recent version is called transformers. The huggingface repo is the appropriate place to check on BERT documentation and procedures.

Development

Want to add to or fix issues in Tonks? We welcome outside input and have tried to make it easier to test. You can run everything inside a docker container with the following:

# to build the container
# NOTE: this may take a while
nvidia-docker build -t tonks .
# nvidia-docker run : basic startup with nvidia docker to access gpu
# --rm : deletes container when closed
# -p : exposes ports (ex: for jupyter notebook to work)
# bash : opens bash in the container once it starts
# "pip install jupyter && bash" : install requirements-dev and bash
nvidia-docker run \
    -it \
    --rm \
    -v "${PWD}:/tonks" \
    -p 8888:8888 \
    -p 8000:8000 \
    tonks /bin/bash -c "pip install jupyter && bash"
# run jupyter notebook
jupyter notebook --ip 0.0.0.0 --no-browser --allow-root --NotebookApp.token='' --NotebookApp.password=''

License

Copyright (c) 2020, ShopRunner

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tonks-1.0.0.tar.gz (18.2 kB view details)

Uploaded Source

Built Distribution

tonks-1.0.0-py3-none-any.whl (24.2 kB view details)

Uploaded Python 3

File details

Details for the file tonks-1.0.0.tar.gz.

File metadata

  • Download URL: tonks-1.0.0.tar.gz
  • Upload date:
  • Size: 18.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.1.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.7.6

File hashes

Hashes for tonks-1.0.0.tar.gz
Algorithm Hash digest
SHA256 8ab96099e44ccc30cb1377171ebf1a6c09df2e7a03388e975561b581b1bcf5ea
MD5 9053f3ff386569233e30b55888e2cdf0
BLAKE2b-256 934ed35b48c373a1f7d329b899f6f169a80a9952074d3bd288c7750adbf93dff

See more details on using hashes here.

File details

Details for the file tonks-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: tonks-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 24.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.1.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.7.6

File hashes

Hashes for tonks-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d4182bd648c1fe81ce2ecd1daf90764cd723e57ca73932615f2a020ae8b676f3
MD5 bc60106105c5d35fe02a5649a4131434
BLAKE2b-256 dbba0d245484c48ddf9ecb493d1b5cfb8fd205eb017ee43e12b75b52d0454bf3

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page