Sentence-level Multilingual Augmentation
SMAUG: Sentence-level Multilingual AUGmentation
smaug is a package for multilingual data augmentation. It offers transformations focused on changing specific aspects of sentences, such as named entities and numbers.
Usage
The smaug package can be used as a command line interface (CLI) or by directly importing and calling the package's Python API. To use smaug, first install it by following the instructions in the Install section below.
Command Line Interface
The CLI offers a way to read, transform, validate and write perturbed sentences to files. For more information, see the full details.
Single Transform
To apply a single transform to a set of sentences, execute the following command:
$ augment io-read-lines -p <input_file> -l <input_lang_code> <transf_name> io-write-json -p <output_file>
where:
- <transf_name> is the name of the transform to apply (see the documentation for a list of available transforms).
- <input_file> is a text file with one sentence per line.
- <input_lang_code> is a two-letter language code for the input sentences (for example, en).
- <output_file> is the JSON file to be created with the transformed sentences.
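The command above can also be driven from a Python script. The following is a minimal sketch that is not part of smaug itself: the helper names `to_lines`, `write_sentences`, `build_single_transform_cmd` and `run` are hypothetical, the input file is assumed to be UTF-8 encoded, and the `augment` executable is assumed to be on the `PATH` (for example, when the script is launched through `poetry run`).

```python
import subprocess
from pathlib import Path


def to_lines(sentences):
    """Format sentences as the expected input: one sentence per line."""
    for s in sentences:
        if "\n" in s:
            # A newline inside a sentence would split it into two inputs.
            raise ValueError(f"sentence spans multiple lines: {s!r}")
    return "".join(s + "\n" for s in sentences)


def write_sentences(path, sentences):
    """Write the input file read by io-read-lines (UTF-8 is an assumption)."""
    Path(path).write_text(to_lines(sentences), encoding="utf-8")


def build_single_transform_cmd(input_file, lang, transform, output_file):
    """Assemble the single-transform pipeline shown above as an argument list."""
    return [
        "augment",
        "io-read-lines", "-p", str(input_file), "-l", lang,
        transform,
        "io-write-json", "-p", str(output_file),
    ]


def run(cmd):
    """Run the pipeline, raising CalledProcessError if augment exits non-zero."""
    subprocess.run(cmd, check=True)
```

For example, `write_sentences("input.txt", ["Alice moved to Lisbon in 2019."])` followed by `run(build_single_transform_cmd("input.txt", "en", "<transf_name>", "output.json"))`, with `<transf_name>` replaced by a real transform name. Passing an argument list rather than a single shell string avoids quoting problems with paths that contain spaces.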
Multiple Transforms
To apply multiple transforms, just specify them in arbitrary order between the read and write operations:
$ augment io-read-lines -p <input_file> -l <input_lang_code> <transf_name_1> <transf_name_2> ... io-write-json -p <output_file>
Multiple Input Files
To read from multiple input files, also specify them in arbitrary order:
$ augment io-read-lines -p <input_file_1> -l <input_lang_code_1> io-read-lines -p <input_file_2> -l <input_lang_code_2> ... <transf_name_1> <transf_name_2> ... io-write-json -p <output_file>
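The commands above follow one pattern: each input contributes an `io-read-lines -p <file> -l <lang>` group, the transform names follow, and `io-write-json` closes the pipeline. A minimal Python sketch of that composition; `build_pipeline` and `as_shell` are hypothetical helpers (not part of smaug), and the transform names in any call are placeholders:

```python
import shlex


def build_pipeline(inputs, transforms, output_file):
    """Compose an augment command line as an argument list.

    inputs: sequence of (path, lang_code) pairs, one io-read-lines group each.
    transforms: sequence of transform names, placed between reads and write.
    """
    if not inputs:
        raise ValueError("at least one input file is required")
    cmd = ["augment"]
    for path, lang in inputs:
        cmd += ["io-read-lines", "-p", str(path), "-l", lang]
    cmd += list(transforms)
    cmd += ["io-write-json", "-p", str(output_file)]
    return cmd


def as_shell(cmd):
    """Render the argument list as one safely quoted shell string (Python 3.8+)."""
    return shlex.join(cmd)
```

Printing `as_shell(build_pipeline(...))` shows the exact command before running it, which helps when a pipeline with several inputs and transforms grows long.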
Configuration File
To simplify the operations above, the entire pipeline can also be specified in a configuration file:
$ augment --cfg <path_to_config_file>
TODO
Install
To install this package, execute the following steps:
- Install the Poetry tool for dependency management.
- Clone this git repository and install the project:
$ git clone https://github.com/DuarteMRAlves/smaug.git
$ cd smaug
$ poetry install
Built Distribution
Hashes for unbabel_smaug-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d82a6d818df45ff017218d5eccc8d581408dbc141d67104c3fee27aebef105d6
MD5 | db3db0415a776d2353a0aca29443ae73
BLAKE2b-256 | f3e3baafa6f51611b079aa6289ade0a5d01b3f6f1ad2d54c53bec424d64d26eb
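Before installing a downloaded wheel, its integrity can be checked against the SHA256 digest listed in the table above. A minimal sketch using Python's standard `hashlib` module; the function names are illustrative, and the wheel path is assumed to be wherever the file was saved:

```python
import hashlib

# SHA256 digest published for unbabel_smaug-0.1.0-py3-none-any.whl
EXPECTED_SHA256 = "d82a6d818df45ff017218d5eccc8d581408dbc141d67104c3fee27aebef105d6"


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

pip's built-in `pip hash <file>` command reports the same SHA256 digest without any custom code.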