Data augmentation for NLP

Project description

SC4001 NLarge

Purpose of Project

NLarge is a project focused on exploring and implementing various data augmentation techniques for Natural Language Processing (NLP) tasks. The primary goal is to enhance the diversity and robustness of training datasets, thereby improving the performance and generalization capabilities of NLP models. This project includes traditional data augmentation methods such as synonym replacement and random substitution, as well as advanced techniques using Large Language Models (LLMs).
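To make the two traditional techniques concrete, here is a minimal sketch of synonym replacement and random substitution in plain Python. It is not NLarge's API: the function names `synonym_replace` and `random_substitute`, the toy `SYNONYMS` table, and the `prob` parameter are all assumptions made for illustration.

```python
import random

# Hypothetical toy synonym table; a real pipeline would use a lexical resource.
SYNONYMS = {
    "good": ["great", "fine"],
    "movie": ["film"],
    "bad": ["poor", "awful"],
}


def synonym_replace(text, prob=0.3, rng=None):
    """Swap each word that has a known synonym with probability `prob`."""
    if rng is None:
        rng = random.Random()
    out = []
    for word in text.split():
        candidates = SYNONYMS.get(word.lower())
        if candidates and rng.random() < prob:
            out.append(rng.choice(candidates))
        else:
            out.append(word)
    return " ".join(out)


def random_substitute(text, vocab, prob=0.1, rng=None):
    """Replace each word with a random word from `vocab` with probability `prob`."""
    if rng is None:
        rng = random.Random()
    return " ".join(
        rng.choice(vocab) if rng.random() < prob else word
        for word in text.split()
    )
```

Both functions keep the number of words unchanged, so labels attached to the original sentence can be reused for the augmented one. Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible.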

Initializing Virtual Environment

We use Poetry in this project for dependency management. To get started, you will need to install Poetry.

pip install poetry

Afterwards, install the project's dependencies with Poetry using the command below:

poetry install

Repository Contents

  • report.tex: The LaTeX document containing the detailed report of the project, including methodology, experiments, results, and analysis.
  • example/: Contains example scripts for data augmentation and model training.
  • NLarge/: The main package containing the data augmentation and model implementation.

Usage

To run the models and experiments, use the Python notebooks in the example/ directory. The notebooks contain detailed explanations and code snippets for data augmentation and model training.

Website

You can access the PyPI page of the project from the link here: PyPI page

Our GitHub repository can be found here: GitHub page

Contributing

Contributions to this project are welcome. If you have any suggestions or improvements, please create a pull request or open an issue.

Download files

Source Distribution

nlarge-0.2.1.tar.gz (15.2 kB view details)

Uploaded Source

Built Distribution

nlarge-0.2.1-py3-none-any.whl (12.8 kB view details)

Uploaded Python 3

File details

Details for the file nlarge-0.2.1.tar.gz.

File metadata

  • Download URL: nlarge-0.2.1.tar.gz
  • Upload date:
  • Size: 15.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.13.0 Darwin/24.0.0

File hashes

Hashes for nlarge-0.2.1.tar.gz
Algorithm Hash digest
SHA256 f6b845ae5c312899ef7b6fd88a8c8defdd7a34cb62837b6a35cad6a82314cca9
MD5 6f099c7e95cfe145712228468e2c8a8c
BLAKE2b-256 2546bf27d6b2565666b85054d1b0d1eaed14416527791d390d188156ea254e42
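A downloaded archive can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The following is a minimal sketch; the helper names `sha256_of_file` and `matches_digest`, and the assumption that the archive sits in the current directory, are illustrative and not part of the project.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example usage (assumes the sdist was downloaded to the current directory):
# expected = "f6b845ae5c312899ef7b6fd88a8c8defdd7a34cb62837b6a35cad6a82314cca9"
# print(matches_digest("nlarge-0.2.1.tar.gz", expected))
```

Reading in chunks keeps memory use constant regardless of archive size, which matters little for a 15.2 kB file but makes the helper reusable for larger downloads.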

File details

Details for the file nlarge-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: nlarge-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 12.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.13.0 Darwin/24.0.0

File hashes

Hashes for nlarge-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 62dd3f2e56584b0b4a567c0f7576355e3bdad166c167b3032706bc79b2fee19f
MD5 ccd7a698f773394ae818e1f6167b72c9
BLAKE2b-256 01657a11345431b7ee3db256d9831f73fd6efcbb40fe94431cc6993824014586
