
Adapt Transformer-based language models to new text domains

Project description

Transformers Domain Adaptation

Documentation | Colab Guide


This toolkit improves the performance of HuggingFace transformer models on downstream NLP tasks by domain-adapting the models to the target domain of those tasks (e.g. BERT -> LawBERT).

The overall Domain Adaptation framework can be broken down into three phases:

  1. Data Selection

    Select a relevant subset of documents from the in-domain corpus that is likely to be beneficial for domain pre-training (see below)

  2. Vocabulary Augmentation

    Extend the vocabulary of the transformer model with domain-specific terminology

  3. Domain Pre-Training

    Continue pre-training the transformer model on the in-domain corpus so it learns the linguistic nuances of the target domain
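The phases above can be illustrated in miniature. For Data Selection, one common heuristic is to rank candidate documents by lexical similarity to a seed of in-domain text and keep the top fraction. The sketch below is a stdlib-only stand-in for the idea, not the toolkit's actual implementation; `cosine` and `select_documents` are illustrative names:

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_documents(candidates, seed_corpus, keep=0.5):
    """Keep the fraction of candidate documents most similar to the seed corpus."""
    seed_vec = Counter(tok for doc in seed_corpus for tok in doc.lower().split())
    ranked = sorted(
        candidates,
        key=lambda doc: cosine(Counter(doc.lower().split()), seed_vec),
        reverse=True,
    )
    return ranked[: max(1, int(len(ranked) * keep))]

seed = ["the court granted the motion", "the plaintiff filed an appeal"]
candidates = [
    "the appeal was dismissed by the court",
    "stir the sauce until it thickens",
    "the motion to dismiss was denied",
    "plant the seeds in early spring",
]
selected = select_documents(candidates, seed, keep=0.5)
print(selected)  # the two legal documents rank highest
```

The toolkit's DataSelector generalizes this idea with configurable similarity and diversity metrics computed in the model tokenizer's own vocabulary space.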

After a model is domain-adapted, it can be fine-tuned on the downstream NLP task of choice, like any other pre-trained transformer model.
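Domain Pre-Training typically reuses the model's original self-supervised objective, such as masked language modeling for BERT-style models. The core masking procedure can be sketched with the standard library (`mask_tokens` is an illustrative name; real implementations, e.g. BERT's, also substitute random tokens and leave some selected positions unchanged):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Randomly replace a fraction of tokens with a mask token and record
    the (position -> original token) targets the model must predict."""
    rng = random.Random(seed)
    masked, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = mask_token
            targets[i] = tok
    return masked, targets

tokens = "the court granted the motion to dismiss the appeal".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3)
print(masked)
print(targets)
```

During continued pre-training, the model sees the masked sequence and is trained to recover the original tokens at the masked positions, which is how it absorbs in-domain language.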

Components

This toolkit provides two classes, DataSelector and VocabAugmentor, to simplify the Data Selection and Vocabulary Augmentation steps respectively.
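To illustrate the idea behind Vocabulary Augmentation (a conceptual, stdlib-only sketch, not VocabAugmentor's actual API; `propose_new_tokens` is an invented name): find frequent in-domain terms absent from the model's existing vocabulary and add them, after which the model's token-embedding matrix must be resized to match:

```python
from collections import Counter

def propose_new_tokens(corpus, base_vocab, target_additions=3, min_freq=2):
    """Propose frequent in-domain terms that the base vocabulary lacks."""
    freqs = Counter(tok for doc in corpus for tok in doc.lower().split())
    candidates = [
        (tok, n) for tok, n in freqs.most_common()
        if tok not in base_vocab and n >= min_freq
    ]
    return [tok for tok, _ in candidates[:target_additions]]

base_vocab = {"the", "a", "of", "to", "and", "was", "court", "case"}
corpus = [
    "the plaintiff filed a writ of certiorari",
    "certiorari was granted to the plaintiff",
    "the estoppel argument failed and estoppel was waived",
]
new_tokens = propose_new_tokens(corpus, base_vocab)
print(new_tokens)  # domain terms such as 'certiorari' and 'estoppel'
```

In practice VocabAugmentor works against the transformer's subword vocabulary rather than whitespace tokens, and each added token requires a new (initially untrained) row in the embedding matrix, which domain pre-training then learns.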

Installation

This package requires Python 3.6+ and can be installed with pip:

pip install transformers-domain-adaptation

Features

  • Compatible with the HuggingFace ecosystem:
    • transformers 4.x
    • tokenizers
    • datasets

Usage

Please refer to our Colab guide!


Results

TODO

Download files

Download the file for your platform.

Source Distribution

transformers-domain-adaptation-0.3.1.tar.gz (14.6 kB)

Uploaded Source

Built Distribution


transformers_domain_adaptation-0.3.1-py3-none-any.whl (12.2 kB)

Uploaded Python 3

File details

Details for the file transformers-domain-adaptation-0.3.1.tar.gz.

File metadata

  • Download URL: transformers-domain-adaptation-0.3.1.tar.gz
  • Upload date:
  • Size: 14.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.1.2 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.6.9

File hashes

Hashes for transformers-domain-adaptation-0.3.1.tar.gz:

  • SHA256: 10a1e32d6586d9204a1e768dfa2937a123260bfc56bf15f49dc56a4be548c01c
  • MD5: 9c7edb6e445f9d51c6e9e08eb1271cdf
  • BLAKE2b-256: dcaf296e3fd7d68448de27996613819686d99a8d1bf7f396269b54065342b56b


File details

Details for the file transformers_domain_adaptation-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: transformers_domain_adaptation-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 12.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.1.2 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.6.9

File hashes

Hashes for transformers_domain_adaptation-0.3.1-py3-none-any.whl:

  • SHA256: 923d29a46cc4a94b5a6f6cbbfe593ee80b73de7cd782097858384863bca2daa1
  • MD5: 353c9855c554c65f52475589cb4e1e55
  • BLAKE2b-256: 42579aad30bea5bdd398861151d4b42f5287a517f0d175cdcfe236ab9743d496

