Utilities for managing NLP models and for processing text-related data at Wellcome Data Labs

WellcomeML utils

This package contains utility functions for common tasks at Wellcome Data Labs. In particular:

module    description
io        manipulating data, in and out of S3, and processing
ml        wrappers for processing texts, vectorisers and classifiers
spacy     common utils for converting data from and to spacy/prodigy format
mis/viz   any other utils, including Wellcome colour palettes

For more in-depth information, see the /examples folder and the release notes.

1. Quickstart

Installing from PyPI

pip install wellcomeml

This will install the "vanilla" package. To install the deep-learning functionality as well (torch/transformers/spacy-transformers):

pip install "wellcomeml[deep-learning]"
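To see whether the deep-learning extras ended up in the active environment, a check along the following lines may help. It is a minimal sketch using only the standard library; the import names in `OPTIONAL_MODULES` are assumptions derived from the packages named above, and `missing_optional_modules` is a hypothetical helper, not part of wellcomeml.

```python
from importlib.util import find_spec

# Assumed top-level import names for torch/transformers/spacy-transformers;
# adjust if the extras list of your release differs.
OPTIONAL_MODULES = ("torch", "transformers", "spacy_transformers")


def missing_optional_modules(modules=OPTIONAL_MODULES):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_optional_modules()
    if missing:
        print("Missing optional modules:", ", ".join(missing))
    else:
        print("All deep-learning modules are importable.")
```

`find_spec` only locates the modules without importing them, so the check stays fast even when torch is installed.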

Installing from a release wheel

Download the wheel from AWS and install it with pip:

pip install wellcomeml-2020.1.0-py3-none-any.whl
pip install "wellcomeml-2020.1.0-py3-none-any.whl[deep-learning]"
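The wheel filename encodes what is being installed. The sketch below splits it into the standard wheel naming fields (distribution, version, Python tag, ABI tag, platform tag). `WheelName` and `parse_wheel_name` are hypothetical helpers written for illustration, and the parser assumes a filename without the optional build tag.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel filename (without a build tag) into its five fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 dash-separated fields, got {len(parts)}")
    return WheelName(*parts)


print(parse_wheel_name("wellcomeml-2020.1.0-py3-none-any.whl"))
```

Here `py3-none-any` indicates a pure-Python wheel: any Python 3 interpreter, no ABI requirement, any platform.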

2. Development

2.1 Build local virtualenv

make

2.2 Build the wheel (and upload to AWS S3/PyPI/GitHub)

Create a GitHub token with artifact write access and export it as an environment variable:

export GITHUB_TOKEN=...

After making changes, build a new wheel with:

make dist

2.3 (Optional) Installing from other locations

pip3 install <relative path to this folder>

2.4 Transformers

Some experimental features (currently wellcomeml.ml.SemanticEquivalenceClassifier) require a version of transformers that is not compatible with spacy-transformers. To develop those features:

export WELLCOMEML_ENV=development_transformers
pip install -r requirements_transformers.txt --upgrade

On OSX, if you get a message complaining about the Rust compiler, install and initialise it with:

brew install rustup
rustup-init
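Because the two requirement sets pin incompatible versions, it can be useful to confirm what is actually installed after switching. The following is a minimal sketch using `importlib.metadata` from the standard library (Python 3.8+); `installed_versions` is a hypothetical helper, and the distribution names are assumed to match the PyPI package names.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names assumed from the packages mentioned in this section.
    for name, ver in installed_versions(("transformers", "spacy-transformers")).items():
        print(f"{name}: {ver or 'not installed'}")
```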

3. Example usage of some modules

Examples can be found in the subfolder examples.


