Adapt Transformer-based language models to new text domains
Project description
This toolkit improves the performance of HuggingFace transformer models on downstream NLP tasks by adapting the models to the target domain of those tasks (e.g. BERT -> LawBERT).
The overall Domain Adaptation framework can be broken down into three phases:
- Data Selection
Select a relevant subset of documents from the in-domain corpus that is likely to benefit domain pre-training (see below)
- Vocabulary Augmentation
Extend the vocabulary of the transformer model with domain-specific terminology
- Domain Pre-Training
Continue pre-training the transformer model on the in-domain corpus so that it learns the linguistic nuances of the target domain
After a model is domain-adapted, it can be fine-tuned on the downstream NLP task of choice, like any pre-trained transformer model.
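To make the Vocabulary Augmentation phase concrete, here is a toy sketch of the underlying idea: find terms that are frequent in the in-domain corpus but missing from the base model's vocabulary. This is *not* the toolkit's actual implementation (which works at the subword-tokenizer level); the corpus snippets and `new_domain_tokens` helper below are hypothetical, purely for illustration.

```python
from collections import Counter

def new_domain_tokens(corpus, base_vocab, top_k=3):
    """Toy illustration of Vocabulary Augmentation: return the most
    frequent whitespace tokens in the in-domain corpus that are
    absent from the base model's vocabulary."""
    counts = Counter(tok for doc in corpus for tok in doc.lower().split())
    candidates = [tok for tok, _ in counts.most_common() if tok not in base_vocab]
    return candidates[:top_k]

# Hypothetical legal-domain snippets and a tiny base vocabulary
corpus = [
    "the plaintiff filed a writ of certiorari",
    "certiorari was denied by the appellate court",
    "the plaintiff appealed the judgment",
]
base_vocab = {"the", "a", "of", "was", "by", "court", "filed"}

print(new_domain_tokens(corpus, base_vocab))
# → ['plaintiff', 'certiorari', 'writ']
```

The real VocabAugmentor instead learns new subword units with a HuggingFace tokenizer, but the selection principle — surface domain terms the base vocabulary cannot represent well — is the same.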
Components
This toolkit provides two classes, DataSelector and VocabAugmentor, to simplify the Data Selection and Vocabulary Augmentation steps respectively.
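The intuition behind Data Selection can likewise be sketched in a few lines: rank candidate in-domain documents by how similar they are to the downstream task's texts and keep only the top-ranked ones. The sketch below uses a single, simple metric (Jaccard similarity over word sets) with made-up data; the actual DataSelector combines several similarity and diversity metrics computed over tokenized text.

```python
def select_documents(candidates, task_texts, keep=2):
    """Toy illustration of Data Selection: rank candidate documents by
    Jaccard similarity between their word set and the downstream
    task's word set, then keep the `keep` highest-scoring documents."""
    task_vocab = {tok for text in task_texts for tok in text.lower().split()}

    def jaccard(doc):
        doc_vocab = set(doc.lower().split())
        return len(doc_vocab & task_vocab) / len(doc_vocab | task_vocab)

    return sorted(candidates, key=jaccard, reverse=True)[:keep]

# Hypothetical candidate corpus and downstream-task text
candidates = ["tort law cases", "recipes for pasta", "breach of contract claims"]
task_texts = ["contract law and tort law"]

print(select_documents(candidates, task_texts, keep=2))
# → ['tort law cases', 'breach of contract claims']
```

Selecting a task-relevant subset like this keeps domain pre-training focused and cheaper than training on the full in-domain corpus.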
Installation
This package requires Python 3.6+ and can be installed with pip:
pip install transformers-domain-adaptation
Features
- Compatible with the HuggingFace ecosystem:
  - transformers (4.x)
  - tokenizers
  - datasets
Usage
Please refer to our Colab guide!
Results
TODO
Project details
Release history
File details
Details for the file transformers-domain-adaptation-0.3.1.tar.gz.
File metadata
- Download URL: transformers-domain-adaptation-0.3.1.tar.gz
- Upload date:
- Size: 14.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.1.2 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.6.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 10a1e32d6586d9204a1e768dfa2937a123260bfc56bf15f49dc56a4be548c01c |
| MD5 | 9c7edb6e445f9d51c6e9e08eb1271cdf |
| BLAKE2b-256 | dcaf296e3fd7d68448de27996613819686d99a8d1bf7f396269b54065342b56b |
File details
Details for the file transformers_domain_adaptation-0.3.1-py3-none-any.whl.
File metadata
- Download URL: transformers_domain_adaptation-0.3.1-py3-none-any.whl
- Upload date:
- Size: 12.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.1.2 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.6.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 923d29a46cc4a94b5a6f6cbbfe593ee80b73de7cd782097858384863bca2daa1 |
| MD5 | 353c9855c554c65f52475589cb4e1e55 |
| BLAKE2b-256 | 42579aad30bea5bdd398861151d4b42f5287a517f0d175cdcfe236ab9743d496 |