FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction

Introduction

FAMIE is a comprehensive and efficient active learning (AL) toolkit for multilingual information extraction (IE). It is designed to address a fundamental problem in existing AL frameworks: annotators must wait a long time between annotation batches because model training and data selection at each AL iteration are time-consuming. With a novel proxy AL mechanism and the integration of our state-of-the-art multilingual toolkit Trankit, FAMIE needs only a few hours to provide users with a labeled dataset and a ready-to-use model for different IE tasks in over 100 languages.

If you use FAMIE in your research or products, please cite the following paper:

@misc{vannguyen2022famie,
      title={FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction}, 
      author={Nguyen, Minh Van and Ngo, Nghia Trung and Min, Bonan and Nguyen, Thien Huu},
      year={2022},
      eprint={2202.08316},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

FAMIE's technical paper: https://arxiv.org/pdf/2202.08316.pdf

FAMIE's documentation page: https://famie.readthedocs.io

FAMIE's demo website: http://nlp.uoregon.edu:9000/

Installation

FAMIE can be easily installed via one of the following methods:

Using pip

pip install famie

This command installs FAMIE and all of its dependencies automatically.

From source

git clone https://github.com/nlp-uoregon/famie.git
cd famie
pip install -e .

This clones our GitHub repository and then installs FAMIE from the local copy in editable mode.

Usage

FAMIE currently supports Named Entity Recognition and Event Detection for over 100 languages. Using FAMIE involves the following three steps:

  • Start an annotation session.
  • Annotate data for a target task.
  • Access the labeled data and a ready-to-use model returned by FAMIE.

Starting an annotation session

Running on local machines

To start an annotation session, please use the following command:

famie start

This starts a server on the user's local machine; no data or models leave that machine. Users can then access FAMIE's web interface at http://127.0.0.1:9000/

Running on remote servers

To use FAMIE on a remote server from a local machine, run famie on the server and forward famie's port (9000) to the local machine over ssh:

# On remote
famie start
# On local
ssh -NL <local-port>:localhost:<famie-port> <remote-username>@<remote-address>
# Open localhost:<local-port> on local to access FAMIE's web interface.
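
Before opening the browser, it can help to confirm that the forwarded port is actually reachable. The following is a minimal sketch using only the Python standard library; the helper name is_port_open and the port value 9000 are assumptions for illustration and are not part of FAMIE.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Assumption: 9000 stands in for the <local-port> used in the ssh command.
local_port = 9000
if is_port_open("127.0.0.1", local_port):
    print(f"Tunnel is up: open http://localhost:{local_port}/ in a browser.")
else:
    print(f"Nothing is listening on port {local_port}; check the ssh command.")
```

If the check fails, the usual suspects are a mismatched <local-port> or famie not yet running on the remote side.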

Running on Google Colab

To use FAMIE on Google Colab, use colab-ssh to create an ssh connection to the Colab VM. Then follow the same remote-local process as above to run FAMIE on the Colab notebook from your local machine.

# On Colab Notebook
## Install colab_ssh and run ngrok to get ssh address and port
!pip install colab_ssh --upgrade
from colab_ssh import launch_ssh
launch_ssh('YOUR_NGROK_AUTH_TOKEN', 'SOME_PASSWORD')  # returns an ssh-address and an ssh-port
## Run FAMIE
famie start --port <famie-port>

# On local
ssh -NL <local-port>:localhost:<famie-port> root@<ssh-address> -p <ssh-port>
# Open localhost:<local-port> on local to access FAMIE's web interface.

As an AL framework, FAMIE provides several data selection algorithms that recommend the most beneficial examples for users to label at each annotation iteration. The algorithm is chosen by passing the optional argument --selection [mnlp|badge|bertkm|random].
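
To give a feel for what an uncertainty-based selection strategy does, here is a simplified sketch in the spirit of MNLP, assuming that mnlp refers to maximum normalized log-probability and that the model exposes a probability distribution over labels for every token. The function names mnlp_score and select_batch are hypothetical, and this is not FAMIE's own implementation: each sentence is scored by the average log-probability of its most likely labels, and the least confident sentences are recommended first.

```python
import math
from typing import List, Sequence


def mnlp_score(token_probs: Sequence[Sequence[float]]) -> float:
    """Average log-probability of the most likely label at each token.

    Lower (more negative) scores mean the model is less confident.
    """
    if not token_probs:
        raise ValueError("sentence has no tokens")
    return sum(math.log(max(dist)) for dist in token_probs) / len(token_probs)


def select_batch(pool_probs: Sequence[Sequence[Sequence[float]]],
                 batch_size: int) -> List[int]:
    """Return the indices of the `batch_size` least confident sentences."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: mnlp_score(pool_probs[i]))
    return ranked[:batch_size]


# Toy pool: three sentences, two labels per token (made-up probabilities).
pool = [
    [[0.9, 0.1], [0.8, 0.2]],   # fairly confident
    [[0.5, 0.5], [0.6, 0.4]],   # uncertain
    [[0.99, 0.01]],             # very confident
]
print(select_batch(pool, batch_size=2))  # [1, 0]
```

Normalizing by sentence length keeps long sentences from being selected merely because they contain more tokens.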

Annotating data

After initiating a new project and uploading an unlabeled dataset file with an entity types file (in text format), annotators can start the annotation process. Given one annotation batch in an iteration, annotators label one sentence at a time, annotating the word spans for each label by first choosing the label and then highlighting the appropriate spans.

After each iteration finishes, users can download the annotated data and the trained model by clicking the DOWNLOAD LABELED DATA and DOWNLOAD TRAINED MODEL buttons.

Accessing labeled data and trained models

FAMIE also provides a simple code interface for working with the resulting labeled dataset and trained main model after the AL process.

import famie

# access a project via its name
p = famie.get_project('named-entity-recognition') 

# access the project's labeled data
data = p.get_labeled_data() # a Python dictionary

# export the project's labeled data to a file
p.export_labeled_data('data.json')

# export the project's trained model to a file
p.export_trained_model('model.ckpt')

# access the project's trained model
model = p.get_trained_model()

# access a trained model from file
model = famie.load_model_from_file('model.ckpt')

# use the trained model to make predictions
model.predict('Oregon is a beautiful state!')
# ['B-Location', 'O', 'O', 'O', 'O']
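
The prediction above is a list of BIO tags, one per token. For downstream use it is often handier to have entity spans instead. The helper below is a minimal sketch for that conversion; bio_to_spans is a hypothetical name and not part of FAMIE's API, and it assumes tags of the form B-<label>, I-<label>, and O as in the output shown above.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group BIO tags into (label, start, end) spans; `end` is exclusive."""
    spans: List[Tuple[str, int, int]] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        # An I- tag extends the open span only if the label matches.
        continues = prefix == "I" and name == label
        if not continues:
            if label is not None:
                spans.append((label, start, i))  # close the open span
            # B- starts a span; a stray I- is treated as a span start too.
            start, label = (i, name) if prefix in ("B", "I") else (None, None)
    if label is not None:
        spans.append((label, start, len(tags)))  # close a span at sentence end
    return spans


print(bio_to_spans(['B-Location', 'O', 'O', 'O', 'O']))
# [('Location', 0, 1)]
```

The returned indices refer to token positions, so they can be used to slice the tokenized sentence directly.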
