Data augmentation for NLP

Project description

SC4001 NLarge

Purpose of Project

NLarge is a project focused on exploring and implementing various data augmentation techniques for Natural Language Processing (NLP) tasks. The primary goal is to enhance the diversity and robustness of training datasets, thereby improving the performance and generalization capabilities of NLP models. This project includes traditional data augmentation methods such as synonym replacement and random substitution, as well as advanced techniques using Large Language Models (LLMs).
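To make the two traditional techniques concrete, here is a minimal, self-contained sketch of synonym replacement and random substitution on a tokenized sentence. It is not NLarge's API. The function names `synonym_replacement` and `random_substitution`, the toy `SYNONYMS` table, and the parameters `n` and `p` are hypothetical, and a real pipeline would more likely draw synonyms from a lexical resource such as WordNet. Random substitution is interpreted here as replacing each token, with probability `p`, by a word drawn from a vocabulary; the project's own variant may differ.

```python
import random
from typing import Dict, List, Optional, Sequence

# Hypothetical toy synonym table (assumption, not part of NLarge);
# a real pipeline would typically use a lexical resource such as WordNet.
SYNONYMS: Dict[str, List[str]] = {
    "good": ["great", "fine", "decent"],
    "movie": ["film", "picture"],
    "happy": ["glad", "pleased"],
}


def synonym_replacement(tokens: Sequence[str], n: int,
                        synonyms: Dict[str, List[str]] = SYNONYMS,
                        rng: Optional[random.Random] = None) -> List[str]:
    """Replace up to n distinct tokens that have an entry in `synonyms`.

    Lookup is case-insensitive; the chosen synonym is inserted as stored.
    """
    rng = rng or random.Random()
    out = list(tokens)
    # Positions whose lower-cased token has at least one known synonym.
    candidates = [i for i, t in enumerate(out) if synonyms.get(t.lower())]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(synonyms[out[i].lower()])
    return out


def random_substitution(tokens: Sequence[str], p: float, vocab: Sequence[str],
                        rng: Optional[random.Random] = None) -> List[str]:
    """Independently replace each token, with probability p, by a random vocab word."""
    rng = rng or random.Random()
    return [rng.choice(vocab) if rng.random() < p else t for t in tokens]


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "this movie was good and I was happy".split()
    print(synonym_replacement(sentence, n=2, rng=rng))
    print(random_substitution(sentence, p=0.2, vocab=["the", "a", "very"], rng=rng))
```

Passing a seeded `random.Random` makes the augmentations reproducible, which helps when comparing models trained on augmented and unaugmented data.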

Setting Up the Virtual Environment

We use Poetry in this project for dependency management. To get started, you will need to install Poetry.

pip install poetry

Afterwards, install the project's dependencies with Poetry. By default, Poetry creates a virtual environment for the project if one does not already exist:

poetry install

Repository Contents

  • report.tex: The LaTeX document containing the detailed report of the project, including methodology, experiments, results, and analysis.
  • example/: Contains example scripts for data augmentation and model training.
  • NLarge/: The main package containing the data augmentation and model implementations.

Usage

To run the models and experiments, use the Python notebooks in the example/ directory. They contain detailed explanations and code snippets for data augmentation and model training.

Contributing

Contributions to this project are welcome. If you have any suggestions or improvements, please create a pull request or open an issue.

Download files

Download the file for your platform.

Source Distribution

nlarge-0.2.0.tar.gz (15.0 kB)

Built Distribution

nlarge-0.2.0-py3-none-any.whl (12.7 kB)

File details

Details for the file nlarge-0.2.0.tar.gz.

File metadata

  • Download URL: nlarge-0.2.0.tar.gz
  • Upload date:
  • Size: 15.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.13.0 Darwin/24.0.0

File hashes

Hashes for nlarge-0.2.0.tar.gz

  • SHA256: cfdc39aba006297f09b1d61ad0448661f1bad1037df19433163a2bf00bb60bf2
  • MD5: ae529f0771e4feaf68a615b9b5dee499
  • BLAKE2b-256: b8bb2570147e2273294b339f595523aa5c93637d17a68e0bf9bbbba885f4f9d8

File details

Details for the file nlarge-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: nlarge-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 12.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.13.0 Darwin/24.0.0

File hashes

Hashes for nlarge-0.2.0-py3-none-any.whl

  • SHA256: d01f635201576822683bbc7f1a4341b9fefc32351b65a5dd3378d9515474657d
  • MD5: 555c9e1e2604fee229d573c3272b0cf6
  • BLAKE2b-256: 7590000f29570328c8d9c1314da7a3126d676ae84b68c244f34a3c9faeaa29a8
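Before installing a downloaded archive, you can check it against the SHA256 digests listed above. The following is a minimal sketch using only Python's standard `hashlib`; the helper names `sha256_of` and `verify` are hypothetical and not part of NLarge.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the published file hashes for release 0.2.0.
EXPECTED_SHA256 = {
    "nlarge-0.2.0.tar.gz": "cfdc39aba006297f09b1d61ad0448661f1bad1037df19433163a2bf00bb60bf2",
    "nlarge-0.2.0-py3-none-any.whl": "d01f635201576822683bbc7f1a4341b9fefc32351b65a5dd3378d9515474657d",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so it never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    name = Path(path).name
    expected = EXPECTED_SHA256.get(name)
    if expected is None:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == expected
```

pip can perform the same check automatically in hash-checking mode by pinning `nlarge==0.2.0 --hash=sha256:<digest>` in a requirements file.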
