
Social Media NLP package for pytorch and pytorch_lightning with pre-built models


A Social Media Natural Language Processing package for PyTorch & PyTorch Lightning.





PyTorch Gleam

PyTorch Gleam builds upon PyTorch Lightning for the specific use case of Natural Language Processing on Social Media, such as Twitter. PyTorch Gleam strives to make Social Media NLP research:

  • Easy to understand
  • Easy to use
  • Easy to extend

About Me

My name is Maxwell Weinzierl, and I am a Natural Language Processing researcher at the Human Language Technology Research Institute (HLTRI) at the University of Texas at Dallas. I am currently working on my PhD, which focuses on COVID-19 and HPV vaccine misinformation, trust, and more on Social Media platforms such as Twitter. I built PyTorch Gleam to make my published research easy to reproduce and to let me iterate quickly on new research ideas.

How To Use

Step 0: Install

Simple installation from PyPI

pip install pytorch-gleam
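
To confirm that the installation succeeded, the installed version can be queried from Python. This is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of PyTorch Gleam.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "pytorch-gleam") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"pytorch-gleam: {found or 'not installed'}")
```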

Step 1: Create Experiment

Create a configs folder with a YAML experiment file. Gleam utilizes PyTorch Lightning's CLI tools to configure experiments from YAML files, which enables researchers to clearly look back and identify both hyper-parameters and model code used in their experiments. This example is from COVID-19 vaccine misinformation stance identification:

pg_examples/covid-stance.yaml

seed_everything: 0
model:
  class_path: pytorch_gleam.modeling.models.MultiClassFrameLanguageModel
  init_args:
    learning_rate: 5e-4
    pre_model_name: digitalepidemiologylab/covid-twitter-bert-v2
    label_map:
      No Stance: 0
      Accept: 1
      Reject: 2
    threshold:
      class_path: pytorch_gleam.modeling.thresholds.MultiClassThresholdModule
    metric:
      class_path: pytorch_gleam.modeling.metrics.F1PRMultiClassMetric
      init_args:
        mode: macro
        num_classes: 3
trainer:
  max_epochs: 10
  accumulate_grad_batches: 4
  check_val_every_n_epoch: 1
  deterministic: true
  num_sanity_val_steps: 1
  checkpoint_callback: false
  callbacks:
    - class_path: pytorch_gleam.callbacks.FitCheckpointCallback
data:
  class_path: pytorch_gleam.data.datasets.MultiClassFrameDataModule
  init_args:
    batch_size: 8
    max_seq_len: 128
    label_name: misinfo
    label_map:
      No Stance: 0
      Accept: 1
      Reject: 2
    tokenizer_name: digitalepidemiologylab/covid-twitter-bert-v2
    num_workers: 8
    frame_path:
      - covid19/misinfo.json
    train_path:
      - covid19/stance-train.jsonl
    val_path:
      - covid19/stance-dev.jsonl
    test_path:
      - covid19/stance-test.jsonl
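
The experiment file above repeats a few values in more than one place (the label map, the number of classes, the pretrained model name), so a quick consistency check before training may save a failed run. The sketch below is not part of PyTorch Gleam: the function names are hypothetical, the dictionary is a trimmed copy of the example config, and the rule that tokenizer_name should equal pre_model_name is an assumption based on this example.

```python
from typing import Any, Dict, List

# Trimmed copy of the example experiment, keeping only the keys that are checked.
EXAMPLE: Dict[str, Any] = {
    "model": {
        "init_args": {
            "pre_model_name": "digitalepidemiologylab/covid-twitter-bert-v2",
            "label_map": {"No Stance": 0, "Accept": 1, "Reject": 2},
            "metric": {"init_args": {"mode": "macro", "num_classes": 3}},
        }
    },
    "trainer": {"accumulate_grad_batches": 4},
    "data": {
        "init_args": {
            "batch_size": 8,
            "label_map": {"No Stance": 0, "Accept": 1, "Reject": 2},
            "tokenizer_name": "digitalepidemiologylab/covid-twitter-bert-v2",
        }
    },
}


def check_experiment(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an experiment config (empty if none)."""
    problems = []
    model_args = config["model"]["init_args"]
    data_args = config["data"]["init_args"]

    if model_args["label_map"] != data_args["label_map"]:
        problems.append("model and data label_map differ")
    if model_args["metric"]["init_args"]["num_classes"] != len(model_args["label_map"]):
        problems.append("metric num_classes does not match label_map size")
    # Assumption: the tokenizer should come from the same pretrained model.
    if model_args["pre_model_name"] != data_args["tokenizer_name"]:
        problems.append("tokenizer_name does not match pre_model_name")
    return problems


def effective_batch_size(config: Dict[str, Any]) -> int:
    """Examples per optimizer step on one device: batch_size x accumulated batches."""
    batch_size = config["data"]["init_args"]["batch_size"]
    return batch_size * config["trainer"].get("accumulate_grad_batches", 1)


if __name__ == "__main__":
    print(check_experiment(EXAMPLE))
    print(effective_batch_size(EXAMPLE))
```

With batch_size 8 and accumulate_grad_batches 4, each optimizer step on a single device sees 32 examples. To run the same checks on the real file, the dictionary could be loaded with a YAML parser such as PyYAML's yaml.safe_load.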

For more details on how to set up YAML experiment files, please see PyTorch Lightning's documentation.
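
Conceptually, each class_path / init_args pair in the experiment file names a class to import and the keyword arguments to construct it with, and nested pairs (such as threshold and metric) are built first. The sketch below illustrates that idea with standard-library classes only. It is a simplified illustration, not PyTorch Lightning's actual implementation, and the function name instantiate is hypothetical.

```python
import importlib
from typing import Any, Dict


def instantiate(spec: Dict[str, Any]) -> Any:
    """Build an object from a {"class_path": ..., "init_args": ...} mapping."""
    # Split "package.module.ClassName" into the module path and the class name.
    module_name, _, class_name = spec["class_path"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)

    # Nested specs are instantiated before being passed as keyword arguments.
    kwargs = {
        key: instantiate(value) if isinstance(value, dict) and "class_path" in value else value
        for key, value in spec.get("init_args", {}).items()
    }
    return cls(**kwargs)


if __name__ == "__main__":
    # A standard-library stand-in for a model or data module entry.
    delta = instantiate({"class_path": "datetime.timedelta", "init_args": {"hours": 2}})
    print(delta.total_seconds())
```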

Annotations for this example are provided in the VaccineLies repository under covid19 as the CoVaxLies collection: CoVaxLies. The annotations reference tweets by ID only, so you will need to retrieve the tweet texts for those IDs through the Twitter API.

Step 2: Run Experiment

Create a models folder for your saved TensorBoard logs and model weights. Determine the ID of the GPU you would like to use (multi-GPU training is supported) and provide it as a comma-separated list, with a trailing comma if the list contains a single GPU ID (for example, "1," selects the GPU with ID 1). You can also specify a plain integer, such as 1, to request that number of GPUs and let PyTorch Lightning choose the device. Run the following command to start training:

gleam-train \
  --config configs/covid-stance.yaml \
  --trainer.gpus 1 \
  --trainer.default_root_dir models/covid-stance

Your model will train, with TensorBoard logging all metrics, and a checkpoint will be saved upon completion.

Step 3: Evaluate Experiment

You can easily evaluate your system on a test collection as follows:

gleam-test \
  --config configs/covid-stance.yaml \
  --trainer.gpus 1 \
  --trainer.default_root_dir models/covid-stance
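
The training and evaluation commands above take the same arguments, so they can be assembled by one small helper when launching runs from a script. This is a minimal sketch: the helper names gpus_flag and gleam_command are hypothetical, and the only flags used are the ones shown in the commands above. The trailing comma for a single GPU ID follows the convention described in the run step.

```python
import shlex
from typing import List, Sequence, Union


def gpus_flag(gpus: Union[int, Sequence[int]]) -> str:
    """Format the --trainer.gpus value: a count (int) or a list of GPU IDs."""
    if isinstance(gpus, int):
        return str(gpus)
    ids = ",".join(str(gpu_id) for gpu_id in gpus)
    # A single ID needs a trailing comma so it is read as a list, not a count.
    return ids + "," if len(gpus) == 1 else ids


def gleam_command(
    tool: str, config: str, root_dir: str, gpus: Union[int, Sequence[int]] = 1
) -> List[str]:
    """Build the argument list for gleam-train or gleam-test."""
    return [
        tool,
        "--config", config,
        "--trainer.gpus", gpus_flag(gpus),
        "--trainer.default_root_dir", root_dir,
    ]


if __name__ == "__main__":
    for tool in ("gleam-train", "gleam-test"):
        command = gleam_command(tool, "configs/covid-stance.yaml", "models/covid-stance")
        print(shlex.join(command))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues with the trailing comma.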

Examples

These are a work-in-progress, as my original research code is a bit messy, but they will be updated soon!

COVID-19 Vaccine Misinformation Detection on Twitter
COVID-19 Vaccine Misinformation Stance Identification on Twitter
COVID-19 Misinformation Stance Identification on Twitter
Vaccine Misinformation Transfer Learning
Vaccine Hesitancy Profiling on Twitter
  • TODO


