
A Simple Package to Train BERT-Like Models for Text Classification


The Augmented Social Scientist

This package makes it simple to train BERT-like models for text classification.

It accompanies our article "The Augmented Social Scientist: Using Sequential Transfer Learning to Annotate Millions of Texts with Human-Level Accuracy", published in Sociological Methods & Research by Salomé Do, Étienne Ollion and Rubing Shen.

To install the package

  • Use pip
pip install AugmentedSocialScientist
  • Or from source
git clone https://github.com/rubingshen/AugmentedSocialScientist.git  
pip install ./AugmentedSocialScientist

Import the BERT model

from AugmentedSocialScientist import bert

The module bert contains 3 main functions:

  • bert.encode() to preprocess the data;
  • bert.run_training() to train, validate and save a model;
  • bert.predict_with_model() to make predictions with a saved model.
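The three functions above are meant to be chained: encode, train, then predict. The sketch below shows one way that chain might look. The function names come from this page, but the argument lists are assumptions, not a confirmed API: that encode() takes texts and optional labels, that run_training() accepts n_epochs and save_model_as, and that predict_with_model() accepts a model_path under ./models/. The helper train_and_predict is hypothetical. Check the Colab tutorial for the exact signatures.

```python
def train_and_predict(model, train_texts, train_labels,
                      test_texts, test_labels, new_texts,
                      model_name="my_model", n_epochs=2):
    """Chain encode -> run_training -> predict_with_model.

    `model` is a module such as AugmentedSocialScientist.bert.
    All keyword arguments below are ASSUMED, not confirmed by this page.
    """
    # Preprocess labelled data for training and validation.
    train_loader = model.encode(train_texts, train_labels)
    test_loader = model.encode(test_texts, test_labels)

    # Train, validate and save the model (assumed keywords).
    scores = model.run_training(
        train_loader,
        test_loader,
        n_epochs=n_epochs,
        save_model_as=model_name,
    )

    # Preprocess unlabelled texts, then predict with the saved model.
    # The "./models/<name>" location is an assumption.
    pred_loader = model.encode(new_texts)
    predictions = model.predict_with_model(
        pred_loader, model_path=f"./models/{model_name}"
    )
    return scores, predictions
```

With the package installed, this would be called as train_and_predict(bert, ...) after running from AugmentedSocialScientist import bert. Passing the module as an argument means the same helper works for any of the language models.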

Tutorial

A Google Colab tutorial is available for this package.

Languages supported

The bert module uses BERT, a language model pre-trained on English text. The package also contains models for other languages:

  • camembert for French;
  • arabic_bert for Arabic;
  • chinese_bert for Chinese;
  • german_bert for German;
  • hindi_bert for Hindi;
  • italian_bert for Italian;
  • portuguese_bert for Portuguese;
  • russian_bert for Russian;
  • spanish_bert for Spanish;
  • swedish_bert for Swedish;
  • xlmroberta, a multilingual model supporting 100 languages.

To use them, simply import the corresponding model and replace bert with the name of the imported model.

For example, to use the French language model camembert:

  1. Import the model camembert:
from AugmentedSocialScientist import camembert
  2. Then use the functions camembert.encode(), camembert.run_training() and camembert.predict_with_model().
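When the language is only known at run time, the model can be chosen programmatically. This is a minimal sketch, not part of the package: the module names are those listed on this page, but the language keys, the fallback to xlmroberta and the helpers module_name_for and load_model_module are all hypothetical. The fallback is a convenience only, since xlmroberta may not cover every unlisted language.

```python
import importlib

# Module names as listed on this page; the language keys are hypothetical labels.
LANGUAGE_MODULES = {
    "english": "bert",
    "french": "camembert",
    "arabic": "arabic_bert",
    "chinese": "chinese_bert",
    "german": "german_bert",
    "hindi": "hindi_bert",
    "italian": "italian_bert",
    "portuguese": "portuguese_bert",
    "russian": "russian_bert",
    "spanish": "spanish_bert",
    "swedish": "swedish_bert",
}
MULTILINGUAL_MODULE = "xlmroberta"


def module_name_for(language):
    """Return the model name for a language, falling back to xlmroberta."""
    return LANGUAGE_MODULES.get(language.strip().lower(), MULTILINGUAL_MODULE)


def load_model_module(language):
    """Equivalent of `from AugmentedSocialScientist import <name>`.

    Requires the package to be installed.
    """
    name = module_name_for(language)
    package = importlib.import_module("AugmentedSocialScientist")
    try:
        # The name may be exposed directly on the package...
        return getattr(package, name)
    except AttributeError:
        # ...or may be a submodule that has not been imported yet.
        return importlib.import_module(f"AugmentedSocialScientist.{name}")
```

For example, model = load_model_module("French") would return camembert, after which model.encode(), model.run_training() and model.predict_with_model() are used exactly as with bert.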

