Paraphrase generation Toolbox and Benchmark
Catbird is an open-source paraphrase generation toolkit based on PyTorch.
Quick Start
Requirements and Installation
The project is based on PyTorch 1.5+ and Python 3.6+.
Install Catbird
a. Clone the repository.
git clone https://github.com/AfonsoSalgadoSousa/catbird.git
b. Install dependencies. This project uses Poetry as its package manager, so make sure you have it installed. For more information, see Poetry's official documentation. To install the dependencies, run:
poetry install
Dataset Preparation
For now, only the Quora Question Pairs (QQP) dataset is supported. We recommend downloading and extracting the dataset somewhere outside the project directory and symlinking the dataset root to $CATBIRD/data, as shown below. If your folder structure is different, you may need to change the corresponding paths in the config files.
catbird
├── catbird
├── tools
├── configs
├── data
│ ├── quora
│ │ ├── quora_duplicate_questions.tsv
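The symlink step described above could be scripted roughly as follows. This is a minimal sketch, not part of Catbird: `link_dataset` is a hypothetical helper, and the demo runs against throwaway temporary directories that stand in for your real download location and for $CATBIRD. Creating symlinks may require extra privileges on Windows.

```python
import tempfile
from pathlib import Path


def link_dataset(dataset_dir: Path, project_root: Path, name: str = "quora") -> Path:
    """Symlink dataset_dir to <project_root>/data/<name> and return the link path.

    Hypothetical helper for illustration; Catbird does not ship this function.
    """
    data_dir = project_root / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    link = data_dir / name
    if link.is_symlink():
        # Replace a stale link from an earlier run.
        link.unlink()
    elif link.exists():
        # Refuse to overwrite a real file or directory.
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(dataset_dir.resolve(), target_is_directory=True)
    return link


# Demo with temporary placeholder directories (assumed layout, not real data).
with tempfile.TemporaryDirectory() as tmp:
    downloads = Path(tmp) / "downloads" / "quora"
    downloads.mkdir(parents=True)
    (downloads / "quora_duplicate_questions.tsv").touch()

    link = link_dataset(downloads, Path(tmp) / "catbird")
    print(link.is_symlink(), sorted(p.name for p in link.iterdir()))
```

In a real setup you would pass the directory where you extracted the dataset and the path of your Catbird checkout instead of the temporary paths.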
We use the HuggingFace Datasets library to load the datasets.
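To sanity-check the downloaded file before training, a standard-library sketch like the one below can list the question pairs marked as duplicates. It is independent of Catbird's own loading code and of the HuggingFace Datasets library. The column names (`question1`, `question2`, `is_duplicate`) are an assumption based on the commonly distributed version of the TSV, and `read_paraphrase_pairs` is a hypothetical helper name.

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_paraphrase_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (question1, question2) pairs whose is_duplicate flag is "1".

    Assumes a tab-separated file with a header row containing the columns
    question1, question2 and is_duplicate.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    pairs = []
    for row in reader:
        if row["is_duplicate"] == "1":
            pairs.append((row["question1"], row["question2"]))
    return pairs


# Tiny in-memory sample that mimics the assumed file layout.
sample = io.StringIO(
    "id\tqid1\tqid2\tquestion1\tquestion2\tis_duplicate\n"
    "0\t1\t2\tHow do I learn Python?\tWhat is the best way to learn Python?\t1\n"
    "1\t3\t4\tWhat is PyTorch?\tWho founded Quora?\t0\n"
)
pairs = read_paraphrase_pairs(sample)
print(pairs)
```

For the real file, pass an open file handle instead of the sample, for example `open("data/quora/quora_duplicate_questions.tsv", newline="", encoding="utf-8")`.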
Train
poetry run python tools/train.py ${CONFIG_FILE} [optional arguments]
Example:
- Train T5 on QQP.
poetry run python tools/train.py configs/t5_quora.yaml