A Simple Package to Train BERT-Like Models for Text Classification
The Augmented Social Scientist
This package makes it simple to train BERT-like models for text classification.
It accompanies our article "The Augmented Social Scientist: Using Sequential Transfer Learning to Annotate Millions of Texts with Human-Level Accuracy", published in Sociological Methods & Research by Salomé Do, Étienne Ollion and Rubing Shen.
To install the package
- Use pip
pip install AugmentedSocialScientist
- Or from source
git clone https://github.com/rubingshen/AugmentedSocialScientist.git
pip install ./AugmentedSocialScientist
Import BERT model
from AugmentedSocialScientist import bert
The module bert contains 3 main functions:
- bert.encode() to preprocess the data;
- bert.run_training() to train, validate and save a model;
- bert.predict_with_model() to make predictions with a saved model.
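The three functions are meant to be chained: encode the data, train and save a model, then encode new texts and predict. The sketch below shows that order only. The helper run_workflow is hypothetical, and every argument it passes (positional texts and labels for encode(), the save_model_as and model_path keywords, and the ./models/ directory) is an assumption that this description does not confirm; check the tutorial for the exact signatures before relying on it.

```python
def run_workflow(model, train_texts, train_labels, val_texts, val_labels,
                 new_texts, model_name):
    """Chain encode -> run_training -> predict_with_model.

    `model` is the imported module, e.g. `bert`.
    ASSUMPTION: all argument names and keywords used below are illustrative
    guesses, not a documented API.
    """
    # 1. Preprocess the labelled data for training and validation.
    train_loader = model.encode(train_texts, train_labels)
    val_loader = model.encode(val_texts, val_labels)

    # 2. Train, validate and save the model (keyword name is assumed).
    scores = model.run_training(train_loader, val_loader,
                                save_model_as=model_name)

    # 3. Preprocess unlabelled texts and predict with the saved model.
    #    The "./models/<name>" location is assumed.
    new_loader = model.encode(new_texts)
    predictions = model.predict_with_model(
        new_loader, model_path=f"./models/{model_name}"
    )
    return scores, predictions


# Possible usage, once the package is installed:
#   from AugmentedSocialScientist import bert
#   scores, predictions = run_workflow(bert, train_texts, train_labels,
#                                      val_texts, val_labels, new_texts, "demo")
```

Passing the module as an argument keeps the helper independent of which language model is imported.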
Tutorial
Check here for a Google Colab tutorial.
Other languages supported
The package also contains models for other languages:
- camembert for French;
- german_bert for German;
- spanish_bert for Spanish;
- xlmroberta, a multilingual model supporting 100 languages.
To use them, simply import the corresponding model and replace bert with the name of the imported model.
For example, to use the French language model camembert:
- Import the model camembert:
from AugmentedSocialScientist import camembert
- Then use the functions camembert.encode(), camembert.run_training() and camembert.predict_with_model().
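Because every model exposes the same function names, the choice of language can be reduced to choosing a module name. The sketch below is one possible way to do that. The helpers module_name_for and load_model, the language-code keys, the mapping of "en" to bert, and the fallback to xlmroberta for any unlisted language are all assumptions made for illustration; only the module names themselves come from the list above.

```python
import importlib

# Module names as listed above. The language-code keys are our own
# convention, and mapping "en" to `bert` is an assumption.
MODEL_MODULES = {
    "en": "bert",
    "fr": "camembert",
    "de": "german_bert",
    "es": "spanish_bert",
}

# Fallback for languages without a dedicated model. This is a
# simplification: xlmroberta covers 100 languages, not every language.
MULTILINGUAL_MODULE = "xlmroberta"


def module_name_for(language):
    """Return the name of the model to import for a language code."""
    return MODEL_MODULES.get(language, MULTILINGUAL_MODULE)


def load_model(language):
    """Mimic `from AugmentedSocialScientist import <name>` at runtime."""
    name = module_name_for(language)
    package = importlib.import_module("AugmentedSocialScientist")
    try:
        # The name may already be exposed as an attribute of the package.
        return getattr(package, name)
    except AttributeError:
        # Otherwise import it as a submodule.
        return importlib.import_module(f"AugmentedSocialScientist.{name}")


# Possible usage, once the package is installed:
#   model = load_model("fr")   # same object as the imported `camembert`
#   then call model.encode(), model.run_training(), model.predict_with_model()
```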
Source Distribution
Hashes for AugmentedSocialScientist-1.0.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 4eced3a1a23ac20620cf2d7adbd6a7e935153c032dd13dfab950d9d09233287a
MD5 | 32e91e42aaf24ead4ce2f6fd45249ab8
BLAKE2b-256 | 93a53c1bb73f2b171a8b2729b81cf17d87a22bed0ac305886f16ec7c67491beb