Cast NLP data as multiangular DeepA2 datasets and integrate these in training pipeline
Deep Argument Analysis (deepa2)
This project provides deepa2, which
- 🥚 takes NLP data (e.g. NLI, argument mining) as ingredients;
- 🎂 bakes DeepA2 datasets conforming to the Deep Argument Analysis Framework;
- 🍰 serves DeepA2 data as text2text datasets suitable for training language models.
There's a public collection of 🎂 DeepA2 datasets baked with deepa2 at the HF hub.
The Documentation describes usage options and gives background info on the Deep Argument Analysis Framework.
Quickstart
Integrating deepa2 into Your Training Pipeline
- Install deepa2 into your ML project's virtual environment, e.g.:
source my-projects-venv/bin/activate
python --version # should be ^3.7
python -m pip install deepa2
- Add the deepa2 preprocessor to your training pipeline. Your training script may look like this, for example:
#!/bin/bash
# configure and activate environment
...
# download deepa2 datasets and
# prepare for text2text training
# (--path: 🎂 DeepA2 dataset to read; --export_path: where the 🍰 text2text files go)
deepa2 serve \
--path some-deepa2-dataset \
--export_format csv \
--export_path t2t
# run default training script,
# e.g., with 🤗 Transformers
# (--train_file: the 🍰 text2text data exported above)
python .../run_summarization.py \
--train_file t2t/train.csv \
--text_column "text" \
--summary_column "target" \
--...
# clean-up
rm -r t2t
- That's it.
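Before starting a training run, it can help to sanity-check the exported 🍰 files. The following is a minimal sketch and not part of deepa2: it assumes the CSV export has a header row with the columns text and target (as the --text_column and --summary_column flags above suggest). The helper name load_t2t_pairs and the sample rows are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def load_t2t_pairs(csv_path):
    """Read (text, target) pairs from a text2text CSV file.

    Assumes a header row with 'text' and 'target' columns. This is an
    assumption based on the training flags, not a documented schema.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        missing = {"text", "target"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        return [(row["text"], row["target"]) for row in reader]


# Self-contained demo: made-up rows stand in for t2t/train.csv.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "train.csv"
    with open(sample_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["text", "target"])
        writer.writerow(["hypothetical source text", "hypothetical target text"])
    pairs = load_t2t_pairs(sample_path)

print(len(pairs), "example(s); first pair:", pairs[0])
```

In a real pipeline, one would point load_t2t_pairs at t2t/train.csv after the deepa2 serve step and before the clean-up.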
Create DeepA2 datasets with deepa2 from existing NLP data
Install poetry.
Clone the repository:
git clone https://github.com/debatelab/deepa2-datasets.git
Install this package from within the repo's root folder:
poetry install
Bake a DeepA2 dataset, e.g.:
# --name: 🥚 source dataset to read; --export-path: where the 🎂 DeepA2 dataset is written
poetry run deepa2 bake \
--name esnli \
--debug-size 100 \
--export-path ./data/processed
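When baking from a script rather than by hand, the command can be assembled programmatically. The sketch below is an illustration only: build_bake_command is a hypothetical helper, and it uses nothing beyond the three flags shown above (--name, --debug-size, --export-path). It only builds and prints the command, it does not run it.

```python
import shlex


def build_bake_command(name, export_path, debug_size=None):
    """Assemble the argument list for `deepa2 bake` (hypothetical helper).

    Only the flags shown in the README are used; debug_size is optional.
    """
    command = ["poetry", "run", "deepa2", "bake", "--name", name]
    if debug_size is not None:
        command += ["--debug-size", str(debug_size)]
    command += ["--export-path", export_path]
    return command


command = build_bake_command("esnli", "./data/processed", debug_size=100)
# Print a shell-safe rendering of the command for inspection.
print(" ".join(shlex.quote(part) for part in command))

# To actually run it from the repo's root folder, one could use:
#   import subprocess
#   subprocess.run(command, check=True)
```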
Contribute a DeepA2Builder for another Dataset
We welcome contributions to this repository, especially scripts that port existing datasets to the DeepA2 Framework. Within this repo, a code module that transforms data into the DeepA2 format contains
- a Builder class that describes how DeepA2 examples will be constructed and that implements the abstract builder.Builder interface (e.g., builder.entailmentbank_builder.EnBankBuilder);
- a DataLoader which provides a method for loading the raw data as a 🤗 Dataset object (e.g., builder.entailmentbank_builder.EnBankLoader) -- you may use deepa2.DataLoader as is in case the data is available in a way compatible with 🤗 Dataset;
- dataclasses which describe the features of the raw data and the preprocessed data, and which extend the dummy classes deepa2.RawExample and deepa2.PreprocessedExample;
- a collection of unit tests that check the concrete Builder's methods (e.g., tests/test_enbank.py);
- documentation of the pipeline (e.g., docs/esnli.md).
Consider opening a new issue to propose building such a pipeline collaboratively.
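The overall shape of such a module can be sketched as follows. This is a self-contained illustration under stated assumptions, not the deepa2 API: RawExample, PreprocessedExample and Builder are local stand-ins for deepa2.RawExample, deepa2.PreprocessedExample and builder.Builder, and the method names preprocess and build, the toy classes, and the output keys are all hypothetical. The actual abstract methods are defined in the repository.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RawExample:
    """Local stand-in for deepa2.RawExample."""


@dataclass
class PreprocessedExample:
    """Local stand-in for deepa2.PreprocessedExample."""


class Builder(ABC):
    """Local stand-in for builder.Builder; method names are hypothetical."""

    @abstractmethod
    def preprocess(self, raw: RawExample) -> PreprocessedExample:
        ...

    @abstractmethod
    def build(self, example: PreprocessedExample) -> Dict[str, object]:
        ...


@dataclass
class RawToyExample(RawExample):
    """Features of the raw data (here: an NLI-style pair)."""
    premise: str
    hypothesis: str


@dataclass
class PreprocessedToyExample(PreprocessedExample):
    """Features of the preprocessed data."""
    premises: List[str]
    conclusion: str


class ToyBuilder(Builder):
    """Concrete builder: turns a raw pair into a simple argument record."""

    def preprocess(self, raw: RawToyExample) -> PreprocessedToyExample:
        return PreprocessedToyExample(
            premises=[raw.premise.strip()],
            conclusion=raw.hypothesis.strip(),
        )

    def build(self, example: PreprocessedToyExample) -> Dict[str, object]:
        return {"premises": example.premises, "conclusion": example.conclusion}


def test_toy_builder() -> None:
    """Unit test for the concrete builder's methods."""
    builder = ToyBuilder()
    raw = RawToyExample(premise=" A man plays a guitar. ", hypothesis="A man makes music.")
    record = builder.build(builder.preprocess(raw))
    assert record == {
        "premises": ["A man plays a guitar."],
        "conclusion": "A man makes music.",
    }


test_toy_builder()
```

Keeping raw and preprocessed features in separate dataclasses makes each stage's inputs and outputs explicit, which in turn keeps the unit tests small.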
Citation
This repository builds on and extends the DeepA2 Framework originally presented in:
@article{betz2021deepa2,
title={DeepA2: A Modular Framework for Deep Argument Analysis with Pretrained Neural Text2Text Language Models},
author={Gregor Betz and Kyle Richardson},
year={2021},
eprint={2110.01509},
archivePrefix={arXiv},
primaryClass={cs.CL}
}