Utilities for managing nlp models and for processing text-related data at Wellcome Data Labs
Project description
WellcomeML utils
This package contains common utility functions for usual tasks at Wellcome Data Labs, in particular functionalities for processing, embedding and classifying text data. This includes
- An intuitive sklearn-like API wrapping text vectorizers, such as Doc2vec, Bert, Scibert
- Common API for off-the-shelf classifiers to allow quick iteration (e.g. Frequency Vectorizer, Bert, Scibert, basic CNN, BiLSTM)
- Utils to download and convert academic text datasets for benchmark
For more information read the official
1. Quickstart
Installing from PyPi
pip install wellcomeml
This will install the "vanilla" package. In order to install the deep-learning functionality (torch/transformers/spacy transformers):
pip install wellcomeml[deep-learning]
Installing from a release wheel
Download the wheel from aws and pip install it:
pip install wellcomeml-2020.1.0-py3-none-any.whl
pip install wellcomeml-2020.1.0-py3-none-any.whl[deep-learning]
1.1 Installing wellcomeml[deep-learning] on windows
Torch has a different installation for windows so it will not get automatically installed with wellcomeml[deeplearning]. It needs to be installed first (this is for machines with no CUDA parallel computing platform for those that do look here https://pytorch.org/ for correct installation):
pip install torch==1.5.1+cpu torchvision==0.6.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
Then install wellcomeml[deep-learning]:
pip install wellcomeml[deep-learning]
2. Development
2.1 Build local virtualenv
make
2.2 Contributing to the docs
Make changes to the .rst files in /docs
(please do not change the ones starting by wellcomeml as those are generated automatically)
Navigate to the root repository and run
make update-docs
Verify that _build/html/index.html
has generated correctly and submit a PR.
2.3 Build the wheel (and upload to aws s3/pypi/github)
Create a github token with artifact write access and export it to the env variables:
export GITHUB_TOKEN=...
After making changes, in order to build a new release, run:
make dist
2.4 (Optional) Installing from other locations
pip3 install <relative path to this folder>
2.5 Transformers
Some experimental features (currently wellcomeml.ml.SemanticEquivalenceClassifier
) require a version of transformers
that is not compatible with spacy-transformers
. To develop those features:
export WELLCOMEML_ENV=development_transformers
pip install -r requirements_transformers.txt --upgrade
On OSX, if you get a message complaining about the rust compiler, install and initialise it with:
brew install rustup
rustup-init
3. Example usage of some modules
Examples can be found in the subfolder examples
.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for wellcomeml-2020.7.1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2669651e70976707c6b793decc37241b4852231eafb51fa2bee3864076e7fe4e |
|
MD5 | 905f9bbb0c929f0dca88b6d09e1f595e |
|
BLAKE2b-256 | 9a4fd8cd734a1b029dbc8e9524ad7773f3a6043f526ab98ab8ee5b5a99919a91 |