A system for quickly generating training data with multi-task weak supervision
Snorkel MeTaL (previously known as MuTS)
v0.1.5
Getting Started
- Quickly set up your environment
- Try out the tutorials
- View the developer guide
Motivation
This project builds on Snorkel in an attempt to understand how massively multi-task supervision and learning change the way people program. Multi-task learning (MTL) is an established technique that effectively pools samples by sharing representations across related tasks, leading to better performance with less training data (for a great primer on recent advances, see this survey). However, most existing multi-task systems rely on two or three fixed, hand-labeled training sets. Weak supervision instead opens the floodgates, allowing users to add arbitrarily many weakly supervised tasks. We call this setting massively multi-task learning, and envision models with tens or hundreds of tasks whose supervision varies widely in quality. Our goal with the Snorkel MeTaL project is to understand this new regime and the programming model it entails.
More concretely, Snorkel MeTaL is a framework for using multi-task weak supervision (MTS), provided by users in the form of labeling functions applied over unlabeled data, to train multi-task models. Snorkel MeTaL can use the output of labeling functions developed and executed in Snorkel, or take in arbitrary label matrices representing weak supervision from multiple sources of unknown quality, and then use these labels to train auto-compiled MTL networks.
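To make the notion of a label matrix concrete, here is a minimal sketch of applying a few labeling functions to unlabeled text and collecting their votes. The labeling functions, the `apply_lfs` helper, and the example documents are all hypothetical and are not part of the Snorkel MeTaL API; the sketch also assumes that 0 means "abstain" and that classes are numbered 1..k.

```python
import numpy as np
from scipy import sparse

ABSTAIN = 0  # assumed convention: 0 = abstain, classes are 1..k


def lf_contains_great(text):
    """Hypothetical labeling function: vote class 1 if 'great' appears."""
    return 1 if "great" in text.lower() else ABSTAIN


def lf_contains_awful(text):
    """Hypothetical labeling function: vote class 2 if 'awful' appears."""
    return 2 if "awful" in text.lower() else ABSTAIN


def lf_exclamation(text):
    """Hypothetical labeling function: vote class 1 if the text ends with '!'."""
    return 1 if text.endswith("!") else ABSTAIN


def apply_lfs(lfs, data):
    """Apply m labeling functions to n data points; return an [n, m] sparse matrix."""
    rows = [[lf(x) for lf in lfs] for x in data]
    return sparse.csr_matrix(np.array(rows, dtype=np.int64))


docs = ["A great movie!", "Awful pacing.", "It was fine.", "Great cast, awful script"]
L = apply_lfs([lf_contains_great, lf_contains_awful, lf_exclamation], docs)
print(L.toarray())
```

Each row holds the votes for one data point and each column the votes of one source. Abstentions are stored as zeros, which is why a sparse matrix is a natural fit when most sources abstain on most points.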
Snorkel MeTaL uses a new matrix approximation approach to learn the accuracies of diverse sources whose quality is not known in advance and which may have arbitrary dependency structures and structured multi-task outputs. This makes it significantly more scalable than our previous approaches.
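The sketch below does not implement the matrix approximation approach. It only illustrates why source accuracies are worth learning, under two simplifying assumptions: the accuracies are already known, and the sources are conditionally independent. Each vote is weighted by the binary log-odds of its source's accuracy, a standard weighted-vote rule; the `weighted_vote` function and the toy data are hypothetical.

```python
import numpy as np


def weighted_vote(L, accuracies, k):
    """Combine votes in a dense [n, m] label matrix (0 = abstain, classes 1..k).

    Each source's vote is weighted by the log-odds of its (assumed known)
    accuracy. Returns an n-dim array of labels in 1..k; ties go to the
    lowest class index.
    """
    L = np.asarray(L)
    accuracies = np.asarray(accuracies, dtype=float)
    weights = np.log(accuracies / (1.0 - accuracies))  # shape [m]
    scores = np.zeros((L.shape[0], k))
    for c in range(1, k + 1):
        # Sum the weights of all sources that voted for class c
        scores[:, c - 1] = ((L == c) * weights).sum(axis=1)
    return scores.argmax(axis=1) + 1


L_demo = np.array([[1, 2, 2], [1, 0, 2], [0, 2, 2]])
acc_demo = np.array([0.95, 0.6, 0.6])
print(weighted_vote(L_demo, acc_demo, k=2))  # [1 1 2]
```

In the first row an unweighted majority vote would pick class 2, but the single high-accuracy source outweighs the two weak ones. Knowing (or estimating) accuracies is what makes that distinction possible.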
References
- Best Reference: Training Complex Models with Multi-Task Weak Supervision [Technical Report]
- Snorkel MeTaL: Weak Supervision for Multi-Task Learning [SIGMOD DEEM 2018]
- Snorkel: Rapid Training Data Creation with Weak Supervision [VLDB 2018]
- Data Programming: Creating Large Training Sets, Quickly [NIPS 2016]
Sample Usage
This sample is for a single-task problem. For a multi-task example, see tutorials/Multitask.ipynb.
"""
n = # data points
m = # labeling functions
k = cardinality of the classification task
Load for each split:
L: an [n,m] scipy.sparse label matrix of noisy labels
Y: an n-dim numpy.ndarray of target labels
X: an n-dim iterable (e.g., a list) of end model inputs
"""
from metal.label_model import LabelModel
from metal.end_model import EndModel
# Train a label model and generate training labels
label_model = LabelModel(k)
label_model.train(L_train)
Y_train_pred = label_model.predict(L_train)
# Train a discriminative end model with the generated labels
end_model = EndModel([1000,10,2])
end_model.train(X_train, Y_train_pred, X_dev, Y_dev)
# Evaluate performance
score = end_model.score(X_test, Y_test)
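To try the sample without real data, one option is to generate a synthetic label matrix and target labels with the shapes described in the docstring. This is a minimal sketch using only NumPy and SciPy; `make_synthetic_split` and its `accuracy` and `coverage` parameters are hypothetical, and it assumes 0 means "abstain" with classes numbered 1..k. The end model inputs `X` are omitted because their format depends on the end model.

```python
import numpy as np
from scipy import sparse


def make_synthetic_split(n, m, k, accuracy=0.75, coverage=0.5, seed=0):
    """Return (L, Y): an [n, m] sparse label matrix and n true labels in 1..k.

    Each source votes on roughly `coverage` of the points and is correct on
    roughly `accuracy` of its votes. Requires k >= 2.
    """
    rng = np.random.default_rng(seed)
    Y = rng.integers(1, k + 1, size=n)        # true labels in 1..k
    votes = np.tile(Y[:, None], (1, m))       # start with every source correct
    wrong = rng.random((n, m)) > accuracy     # entries where a source errs
    # Shift erroneous votes to a different class, staying within 1..k
    offsets = rng.integers(1, k, size=(n, m))
    votes[wrong] = (votes[wrong] - 1 + offsets[wrong]) % k + 1
    votes[rng.random((n, m)) > coverage] = 0  # abstentions
    return sparse.csr_matrix(votes), Y


L_train, Y_train = make_synthetic_split(n=1000, m=10, k=2)
print(L_train.shape, Y_train.shape)
```

Because the true labels are known here, the generated `Y_train` can be compared against the label model's predictions as a sanity check.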
Setup
[1] Install Anaconda:
Instructions here: https://www.anaconda.com/download/
[2] Clone the repository:
git clone https://github.com/HazyResearch/metal.git
cd metal
[3] Create virtual environment:
conda env create -f environment.yml
source activate metal
[4] Run unit tests:
nosetests
If the tests run successfully, you should see 50+ dots followed by "OK".
Check out the tutorials to get familiar with the Snorkel MeTaL codebase!
Or, to use Snorkel MeTaL in another project, install it with pip (conda coming soon):
pip install snorkel-metal
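After installing, a quick way to confirm that the package is visible to the active environment is to look up its import name without importing it. This is a small sketch using only the standard library; the `is_installed` helper is hypothetical, and it assumes the import name is `metal`, as in the sample usage.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("metal"):
    print("metal is importable from this environment")
else:
    print("metal not found; check that the right environment is active")
```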
Developer Guidelines
First, read the Snorkel MeTaL Commandments (a design doc).
If you are interested in contributing to Snorkel MeTaL (we wholeheartedly welcome contributions via pull requests!), follow the setup guidelines above, then run the following additional command:
make dev
This will install a few additional tools that help ensure that any commits or pull requests you submit conform to our established standards. We use the following packages: isort, black, and flake8.
After running make dev to install the necessary tools, you can run make check to see if any changes you've made violate the repo standards and make fix to fix any related to isort/black. Fixes for flake8 violations will need to be made manually.
Hashes for snorkel_metal-0.1.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | bbb3f40bea821fa15c918889f30792cbec1880ca3b482fe6c500b95496a6e1f7
MD5 | 6ff60fa30560c67506a29a778063d64d
BLAKE2b-256 | 20d86a4b2a454affa459bcf67096088ee18a3b640a6719e2495a5b05fcbbdfe3