
Adapt Transformer-based language models to new text domains


Transformers Domain Adaptation


This toolkit improves the performance of HuggingFace transformer models on downstream NLP tasks by adapting the models to the target domain of those tasks (e.g. BERT -> LawBERT).

The overall Domain Adaptation framework can be broken down into three phases:

  1. Data Selection

    Select a subset of documents from the in-domain corpus that is likely to be beneficial for domain pre-training.

  2. Vocabulary Augmentation

    Extend the vocabulary of the transformer model with domain-specific terminology.

  3. Domain Pre-Training

    Continue pre-training the transformer model on the in-domain corpus so that it learns the linguistic nuances of the target domain.
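
To make the Data Selection phase more concrete, here is a minimal, standard-library-only sketch of the general idea: rank in-domain documents by how closely their word usage matches a small sample of task text, then keep the top fraction. This is not the toolkit's implementation or API. The function names, the `keep` parameter, the regex tokenizer, and the choice of cosine similarity over term frequencies are all assumptions made for illustration.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase and split into alphanumeric runs (a stand-in for a real tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_distribution(texts):
    """Relative term frequencies over a collection of texts."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse term-frequency dicts."""
    dot = sum(p[t] * q[t] for t in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def select_documents(in_domain_corpus, task_corpus, keep=0.5):
    """Keep the `keep` fraction of in-domain documents most similar to the task corpus.

    Hypothetical helper for illustration only; not part of the toolkit.
    """
    target = term_distribution(task_corpus)
    scored = [
        (cosine_similarity(term_distribution([doc]), target), doc)
        for doc in in_domain_corpus
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    n_keep = max(1, round(len(scored) * keep))
    return [doc for _, doc in scored[:n_keep]]


task_corpus = [
    "The court granted the motion to dismiss the appeal.",
    "The plaintiff filed a claim against the defendant.",
]
in_domain_corpus = [
    "The defendant filed an appeal with the court.",
    "Our recipe uses two cups of flour and sugar.",
    "The plaintiff's motion was granted by the judge.",
    "Football season starts next week.",
]
selected = select_documents(in_domain_corpus, task_corpus, keep=0.5)
```

With this toy data, the two legal sentences are kept and the recipe and football sentences are dropped. A real selector would likely work on the model's own tokenization and could combine several similarity or diversity measures.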

After a model is domain-adapted, it can be fine-tuned on the downstream NLP task of choice, like any pre-trained transformer model.

Components

This toolkit provides two classes, DataSelector and VocabAugmentor, to simplify the Data Selection and Vocabulary Augmentation steps respectively.
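
For the actual `DataSelector` and `VocabAugmentor` interfaces, refer to the Colab guide. As a rough illustration of what Vocabulary Augmentation involves, the sketch below finds frequent domain words that an existing vocabulary lacks. It is a simplified, standard-library-only assumption of the idea, not the `VocabAugmentor` API: `find_new_terms`, `max_new_terms`, and `min_count` are hypothetical names, and whole-word matching stands in for the subword tokenization a real model uses.

```python
from collections import Counter
import re


def find_new_terms(domain_texts, existing_vocab, max_new_terms=3, min_count=2):
    """Return the most frequent domain words missing from `existing_vocab`.

    Hypothetical helper for illustration only; not part of the toolkit.
    """
    counts = Counter(
        word
        for text in domain_texts
        for word in re.findall(r"[a-z]+", text.lower())
    )
    candidates = [
        (word, n)
        for word, n in counts.items()
        if word not in existing_vocab and n >= min_count
    ]
    # Sort by descending frequency, then alphabetically for a deterministic order.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in candidates[:max_new_terms]]


existing_vocab = {"the", "court", "a", "of", "was", "to", "filed", "in"}
domain_texts = [
    "The plaintiff filed a tort claim.",
    "The plaintiff alleged estoppel.",
    "Estoppel was raised in the tort claim.",
]
new_terms = find_new_terms(domain_texts, existing_vocab)
```

In the HuggingFace `transformers` library, new tokens such as these are typically registered with `tokenizer.add_tokens(new_terms)`, followed by `model.resize_token_embeddings(len(tokenizer))` so the embedding matrix matches the enlarged vocabulary.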

Installation

This package was developed on Python 3.6+ and can be installed using pip:

pip install transformers-domain-adaptation

Features

  • Compatible with the HuggingFace ecosystem:
    • transformers 4.x
    • tokenizers
    • datasets

Usage

Please refer to our Colab guide!


Results

TODO
