Utilities for managing NLP models and for processing text-related data at Wellcome Data Labs
WellcomeML utils
This package contains utility functions for common tasks at Wellcome Data Labs. In particular:
modules | description |
---|---|
io | manipulating data, in and out S3, and processing |
ml | wrappers for processing texts, vectorisers and classifiers |
spacy | common utils for converting data from and to spacy/prodigy format |
mis/viz | any other utils, including Wellcome colour palettes |
For more in-depth information, see the /examples folder and the release notes.
1. Quickstart
Installing from PyPI
pip install wellcomeml
This will install the "vanilla" package. In order to install the deep-learning functionality (torch/transformers/spacy transformers):
pip install wellcomeml[deep-learning]
Installing from a release wheel
Download the wheel from AWS and install it with pip:
pip install wellcomeml-2020.1.0-py3-none-any.whl
pip install wellcomeml-2020.1.0-py3-none-any.whl[deep-learning]
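After installing with the deep-learning extra, it can be useful to confirm that the optional dependencies are actually importable. The following is a minimal sketch, not part of the package: the helper `missing_modules` is hypothetical, and the import names `torch`, `transformers` and `spacy_transformers` are assumed from the dependency names mentioned above.

```python
from importlib.util import find_spec

# Assumed top-level import names for the optional dependencies;
# adjust them if the installed packages expose different names.
OPTIONAL_MODULES = ("torch", "transformers", "spacy_transformers")


def missing_modules(names=OPTIONAL_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing optional modules:", ", ".join(missing))
    else:
        print("All deep-learning extras appear to be installed.")
```

The check only looks modules up without importing them, so it stays fast even when large libraries are installed.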
2. Development
2.1 Build local virtualenv
make
2.2 Build the wheel (and upload to AWS S3/PyPI/GitHub)
Create a GitHub token with artifact write access and export it as an environment variable:
export GITHUB_TOKEN=...
After making changes, build a new wheel by running:
make dist
2.3 (Optional) Installing from other locations
pip3 install <relative path to this folder>
2.4 Transformers
Some experimental features (currently wellcomeml.ml.SemanticEquivalenceClassifier) require a version of transformers that is not compatible with spacy-transformers. To develop those features:
export WELLCOMEML_ENV=development_transformers
pip install -r requirements_transformers.txt --upgrade
On OSX, if you get a message complaining about the Rust compiler, install and initialise it with:
brew install rustup
rustup-init
3. Example usage of some modules
Examples can be found in the examples subfolder.
Hashes for wellcomeml-2020.5.1-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 39197c58ea4e0792462d83c24d415d90637f03091a33c3a7ab502faeaa619ed9 |
MD5 | d4fa454fa46cf0a3a90eeddce4cf1819 |
BLAKE2b-256 | bb5fb280b303b377bb96efa097a1a572712ad7309265e6a1ea8986e69e352f62 |
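A downloaded wheel can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `verify_wheel` are hypothetical and not part of wellcomeml.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for wellcomeml-2020.5.1-py3-none-any.whl
EXPECTED_SHA256 = "39197c58ea4e0792462d83c24d415d90637f03091a33c3a7ab502faeaa619ed9"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected one."""
    return sha256_of(path) == expected.lower()
```

For example, `verify_wheel("wellcomeml-2020.5.1-py3-none-any.whl")` would return `True` only if the local file matches the published digest. The digest applies to the 2020.5.1 wheel only; other releases have different hashes.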