
Cast NLP data as multiangular DeepA2 datasets and integrate them into your training pipeline

Deep Argument Analysis (deepa2)

This project provides deepa2, which

  • 🥚 takes NLP data (e.g. NLI, argument mining) as ingredients;
  • 🎂 bakes DeepA2 datasets conforming to the Deep Argument Analysis Framework;
  • 🍰 serves DeepA2 data as text2text datasets suitable for training language models.

There's a public collection of 🎂 DeepA2 datasets baked with deepa2 at the HF hub.

The Documentation describes usage options and gives background info on the Deep Argument Analysis Framework.

Quickstart

Integrating deepa2 into Your Training Pipeline

  1. Install deepa2 into your ML project's virtual environment, e.g.:
source my-projects-venv/bin/activate 
python --version  # should be ^3.7
python -m pip install deepa2
  2. Add the deepa2 preprocessor to your training pipeline. Your training script may look like this, for example:
#!/bin/bash

# configure and activate environment
...

# download deepa2 datasets and
# prepare for text2text training
# (<<< 🎂 read from --path, >>> 🍰 written to --export_path)
deepa2 serve \
    --path some-deepa2-dataset \
    --export_format csv \
    --export_path t2t

# run default training script,
# e.g., with 🤗 Transformers
# (<<< 🍰 consumed via --train_file)
python .../run_summarization.py \
    --train_file t2t/train.csv \
    --text_column "text" \
    --summary_column "target" \
    --...

# clean-up
rm -r t2t
  3. That's it.
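
Before launching training, it can help to confirm that the exported file has the columns the training script expects. The following is a minimal sketch, assuming the CSV export contains the `text` and `target` columns implied by the training flags above; `check_t2t_csv` is a hypothetical helper written for illustration and is not part of deepa2.

```python
import csv
from pathlib import Path


def check_t2t_csv(path, required=("text", "target")):
    """Return the number of data rows; raise if a required column is missing.

    Hypothetical helper (not part of deepa2). The default column names are
    taken from the --text_column / --summary_column flags shown above.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [name for name in required if name not in header]
        if missing:
            raise ValueError(f"{path}: missing column(s) {missing}")
        # Count the remaining rows (the header has already been consumed).
        return sum(1 for _ in reader)
```

Calling `check_t2t_csv("t2t/train.csv")` between the serve step and the training step would fail early if the export does not have the expected shape.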

Create DeepA2 datasets with deepa2 from existing NLP data

Install poetry.

Clone the repository:

git clone https://github.com/debatelab/deepa2-datasets.git

Install this package from within the repo's root folder:

poetry install

Bake a DeepA2 dataset, e.g.:

# <<< 🥚 raw data selected via --name, >>> 🎂 written to --export-path
poetry run deepa2 bake \
  --name esnli \
  --debug-size 100 \
  --export-path ./data/processed

Contribute a DeepA2Builder for another Dataset

We welcome contributions to this repository, especially scripts that port existing datasets to the DeepA2 Framework. Within this repo, a code module that transforms data into the DeepA2 format contains

  1. a Builder class that describes how DeepA2 examples are constructed and that implements the abstract builder.Builder interface (e.g., builder.entailmentbank_builder.EnBankBuilder);
  2. a DataLoader that provides a method for loading the raw data as a 🤗 Dataset object (e.g., builder.entailmentbank_builder.EnBankLoader); you may use deepa2.DataLoader as is if the data is already available in a form compatible with 🤗 Dataset;
  3. dataclasses that describe the features of the raw data and of the preprocessed data, and that extend the dummy classes deepa2.RawExample and deepa2.PreprocessedExample;
  4. a collection of unit tests that check the concrete Builder's methods (e.g., tests/test_enbank.py);
  5. documentation of the pipeline (e.g., docs/esnli.md).
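
The division of labour between these parts can be sketched in plain Python. This is only an illustration under stated assumptions: the base classes below are local stand-ins for the dummy classes named in the list, and every class, method, and field name (`ToyLoader`, `ToyBuilder`, `preprocess`, `build`, `premises`, `conclusion`) is hypothetical. The actual builder.Builder interface and the DeepA2 feature names are defined in the repository and documentation and may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


# Local stand-ins for the dummy base classes; NOT imported from deepa2.
@dataclass
class RawExample:
    pass


@dataclass
class PreprocessedExample:
    pass


@dataclass
class ToyRawExample(RawExample):
    """Features of the raw data (hypothetical NLI-style fields)."""
    premise: str
    hypothesis: str
    label: str


@dataclass
class ToyPreprocessedExample(PreprocessedExample):
    """Features after preprocessing (hypothetical)."""
    premises: List[str]
    conclusion: str


class ToyLoader:
    """Hypothetical loader; a real one would return a Hugging Face Dataset."""

    def load(self) -> List[ToyRawExample]:
        return [
            ToyRawExample("All men are mortal.", "Socrates is mortal.", "neutral"),
            ToyRawExample("It is raining.", "The street is wet.", "entailment"),
        ]


class ToyBuilder:
    """Hypothetical builder: keeps entailment pairs, one record per pair."""

    @staticmethod
    def preprocess(raw: List[ToyRawExample]) -> List[ToyPreprocessedExample]:
        # Keep only pairs labelled as entailment and reshape them.
        return [
            ToyPreprocessedExample(premises=[r.premise], conclusion=r.hypothesis)
            for r in raw
            if r.label == "entailment"
        ]

    @staticmethod
    def build(example: ToyPreprocessedExample) -> Dict[str, str]:
        # Illustrative output fields only; not the DeepA2 schema.
        return {
            "premises": " ".join(example.premises),
            "conclusion": example.conclusion,
        }


records = [ToyBuilder.build(ex) for ex in ToyBuilder.preprocess(ToyLoader().load())]
```

A unit test for such a module would then assert on `records`, which mirrors item 4 of the list.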

To propose building such a pipeline together, consider opening a new issue.

Citation

This repository builds on and extends the DeepA2 Framework originally presented in:

@article{betz2021deepa2,
      title={DeepA2: A Modular Framework for Deep Argument Analysis with Pretrained Neural Text2Text Language Models}, 
      author={Gregor Betz and Kyle Richardson},
      year={2021},
      eprint={2110.01509},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
