
SPEAR is a library for data programming with semi-supervision that provides facilities to programmatically label data and build training sets




Semi-Supervised Data Programming for Data Efficient Machine Learning

SPEAR is a library for data programming with semi-supervision. The package implements several recent data programming approaches and provides facilities to programmatically label data and build training sets.

Pipeline

  • Design labeling functions (LFs)
  • Generate a pickle file containing labels by passing raw data to the LFs
  • Use one of the label aggregators (LAs) to get the final labels
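
The three steps above can be sketched in plain Python. This is only an illustration of the workflow, not SPEAR's API: every name here (ABSTAIN, the lf_* rules, apply_lfs, save_labels, load_labels, majority_vote) is hypothetical, and a simple majority vote stands in for the label aggregators that SPEAR actually implements.

```python
import pickle
from collections import Counter

ABSTAIN = None  # hypothetical marker meaning "this LF has no opinion"

# Step 1: design labeling functions (hypothetical SMS spam rules).
def lf_contains_free(text):
    return "spam" if "free" in text.lower() else ABSTAIN

def lf_contains_url(text):
    return "spam" if "http" in text.lower() else ABSTAIN

def lf_short_message(text):
    return "ham" if len(text.split()) <= 4 else ABSTAIN

LFS = [lf_contains_free, lf_contains_url, lf_short_message]

# Step 2: pass raw data through the LFs and store the label matrix
# (one row per instance, one column per LF) in a pickle file.
def apply_lfs(texts, lfs=LFS):
    return [[lf(text) for lf in lfs] for text in texts]

def save_labels(matrix, path):
    with open(path, "wb") as f:
        pickle.dump(matrix, f)

def load_labels(path):
    with open(path, "rb") as f:
        return pickle.load(f)

# Step 3: aggregate the LF outputs into one final label per instance.
# Majority vote is an assumption used here for brevity; ties go to the
# label that was seen first in the row.
def majority_vote(row):
    votes = [v for v in row if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]
```

A typical use would be to call apply_lfs on a list of raw strings, save the result with save_labels, and later map majority_vote over the rows returned by load_labels.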



SPEAR provides functionality such as

  • developing LFs/rules/heuristics for quick labeling
  • comparing against several data programming approaches
  • comparing against semi-supervised data programming approaches
  • using subset selection to make the best use of annotation effort
  • storing and saving data in pickle files
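
To give a feel for what subset selection means here, the sketch below greedily picks a small, representative set of instances to send for annotation. It is an assumption-laden illustration: the facility-location objective, the function name greedy_facility_location, and the toy similarity matrix are all chosen for this example and are not taken from SPEAR or Submodlib.

```python
def greedy_facility_location(similarity, budget):
    """Greedily pick up to `budget` indices that best cover all items.

    `similarity[i][j]` is a non-negative similarity between items i and j.
    The objective being maximized is sum_i max_{j in selected} similarity[i][j].
    """
    n = len(similarity)
    selected = []
    best = [0.0] * n  # best similarity of each item to the selected set so far
    for _ in range(min(budget, n)):
        best_gain, best_j = -1.0, None
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding j: how much coverage improves per item.
            gain = sum(max(similarity[i][j] - best[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        best = [max(best[i], similarity[i][best_j]) for i in range(n)]
    return selected
```

With two tight clusters of instances, a budget of two would be expected to pick one instance from each cluster, so the annotation effort is not spent on near-duplicates.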

Labeling Functions (LFs)

  • discrete LFs - user-defined LFs that return discrete labels
  • continuous LFs - user-defined LFs that return a continuous score/confidence along with the assigned label
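
The difference between the two kinds of LF can be illustrated with a minimal sketch. The names below (ABSTAIN, SPAM, TRIGGER_WORDS, discrete_lf, continuous_lf) and the scoring rule are hypothetical and written in plain Python; they do not reproduce SPEAR's own decorators or classes.

```python
ABSTAIN = None  # hypothetical marker meaning "no label assigned"
SPAM = 1        # hypothetical class label

TRIGGER_WORDS = {"free", "win", "prize"}

def discrete_lf(text):
    """Discrete LF: return a label if any trigger word appears, else abstain."""
    words = set(text.lower().split())
    return SPAM if words & TRIGGER_WORDS else ABSTAIN

def continuous_lf(text):
    """Continuous LF: return (label, score), where the score is the
    fraction of tokens that are trigger words (an assumed confidence)."""
    words = text.lower().split()
    if not words:
        return ABSTAIN, 0.0
    score = sum(w in TRIGGER_WORDS for w in words) / len(words)
    return (SPAM, score) if score > 0 else (ABSTAIN, 0.0)
```

The discrete version only says whether the rule fired, while the continuous version also reports how strongly it fired, which an aggregator could use to weight the vote.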

Approaches Implemented

You can read this paper to learn about the approaches below.

  • Only-L
  • Learning to Reweight
  • Posterior Regularization
  • Imply Loss
  • CAGE
  • Joint Learning

The data folder for SMS and TREC can be found here. To run the notebooks or examples, place this folder in the same directory as the notebooks folder.

The zip file can also be downloaded directly from the command line using the gdown library:

pip install gdown
gdown 1CJZ73nNa7Ho0BOSDgGx9CRvXoepVSpet

Installation

  • Install the Submodlib library:

pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ submodlib

Method 1

To install the latest version of the SPEAR package from PyPI:

pip install decile-spear

Method 2

SPEAR requires Python 3.6 or later. First install submodlib. Then install SPEAR:

git clone https://github.com/decile-team/spear.git
cd spear
pip install -r requirements/requirements.txt
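
After installing with Method 1, one way to confirm that the distribution is visible to your interpreter is to query its metadata. This is a small sketch rather than an official check: the helper name installed_version is made up for this example, and importlib.metadata needs Python 3.8 or later, which is newer than the minimum version stated above.

```python
from importlib import metadata

def installed_version(dist_name="decile-spear"):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("decile-spear is not installed in this environment")
    else:
        print("decile-spear version:", version)
```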

Citation

@inproceedings{abhishek-etal-2022-spear,
    title = "{SPEAR} : Semi-supervised Data Programming in Python",
    author = "Abhishek, Guttu  and
      Ingole, Harshad  and
      Laturia, Parth  and
      Dorna, Vineeth  and
      Maheshwari, Ayush  and
      Ramakrishnan, Ganesh  and
      Iyer, Rishabh",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-demos.12",
    pages = "121--127",
}


Acknowledgment

SPEAR takes inspiration from, builds upon, and uses pieces of code from several open-source codebases, including Snorkel, Snuba, and Imply Loss. SPEAR also uses SUBMODLIB, another DECILE library, for subset selection.

Team

SPEAR is created and maintained by Ayush, Abhishek, Vineeth, Harshad, Parth, Pankaj, Rishabh Iyer, and Ganesh Ramakrishnan. We look forward to making SPEAR more community driven. Please use it and contribute to it for your research, and feel free to use it for your commercial projects. We will add the major contributors here.

Publications

[1] Abhishek et al. SPEAR : Semi-supervised Data Programming in Python, Demonstration Paper.

[2] Maheshwari et al. Learning to Robustly Aggregate Labeling Functions for Semi-supervised Data Programming, In Findings of ACL (Long Paper) 2022.

[3] Maheshwari, Ayush, et al. Data Programming using Semi-Supervision and Subset Selection, In Findings of ACL (Long Paper) 2021.

[4] Chatterjee, Oishik, Ganesh Ramakrishnan, and Sunita Sarawagi. Data Programming using Continuous and Quality-Guided Labeling Functions, In AAAI 2020.

[5] Sahay, Atul, et al. Rule augmented unsupervised constituency parsing, In Findings of ACL (Short Paper) 2021.
