Data augmentation for NLP

Project description

SC4001 NLarge

Purpose of Project

NLarge is a project focused on exploring and implementing various data augmentation techniques for Natural Language Processing (NLP) tasks. The primary goal is to enhance the diversity and robustness of training datasets, thereby improving the performance and generalization capabilities of NLP models. This project includes traditional data augmentation methods such as synonym replacement and random substitution, as well as advanced techniques using Large Language Models (LLMs).
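To make the two traditional methods named above concrete, here is a minimal, self-contained sketch in plain Python. It is not the NLarge API: the function names `synonym_replace` and `random_substitute`, the `prob` parameter, and the toy `SYNONYMS` table are all assumptions chosen for illustration. A real pipeline would typically draw synonyms from a lexical resource rather than a hand-written dictionary.

```python
import random

# Hypothetical toy synonym table, for illustration only.
SYNONYMS = {
    "good": ["great", "fine"],
    "bad": ["poor", "awful"],
    "movie": ["film"],
}


def synonym_replace(text, prob=0.3, rng=None):
    """Replace each word that has a known synonym with probability `prob`."""
    rng = rng or random.Random()
    out = []
    for word in text.split():
        candidates = SYNONYMS.get(word.lower())
        if candidates and rng.random() < prob:
            out.append(rng.choice(candidates))
        else:
            out.append(word)
    return " ".join(out)


def random_substitute(text, vocab, prob=0.1, rng=None):
    """Replace each word with a random word from `vocab` with probability `prob`."""
    rng = rng or random.Random()
    return " ".join(
        rng.choice(vocab) if rng.random() < prob else word
        for word in text.split()
    )


# Example usage (output varies with the random seed):
# synonym_replace("a good movie", prob=0.5, rng=random.Random(0))
# random_substitute("a good movie", vocab=["the", "plot", "actor"], prob=0.2)
```

Both functions keep the number of tokens unchanged, so labels attached to the original sentence can be reused for the augmented copy. Passing a seeded `random.Random` makes the augmentation reproducible.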

Initializing Virtual Environment

This project uses Poetry for dependency management. To get started, install Poetry:

pip install poetry

Then install the project's dependencies with Poetry:

poetry install

Repository Contents

  • report.tex: The LaTeX document containing the detailed report of the project, including methodology, experiments, results, and analysis.
  • example/: Contains example scripts for data augmentation and model training.
  • NLarge/: The main package containing the data augmentation and model implementation.

Usage

To run the models and experiments, use the Python notebooks in the example/ directory. The notebooks contain detailed explanations and code snippets for data augmentation and model training.

Website

The project's PyPI page is available here: PyPI page

The GitHub repository is available here: GitHub page

Contributing

Contributions to this project are welcome. If you have any suggestions or improvements, please create a pull request or open an issue.

Download files

Source Distribution

nlarge-0.2.3.tar.gz (23.7 kB view details)

Uploaded Source

Built Distribution

nlarge-0.2.3-py3-none-any.whl (18.4 kB view details)

Uploaded Python 3

File details

Details for the file nlarge-0.2.3.tar.gz.

File metadata

  • Download URL: nlarge-0.2.3.tar.gz
  • Upload date:
  • Size: 23.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.13.0 Darwin/24.0.0

File hashes

Hashes for nlarge-0.2.3.tar.gz
Algorithm Hash digest
SHA256 3a797c5697db67f00c3bca46bba84dd20e06fd86c1b250b1321f83dde9916f0b
MD5 03c0448fbcf9d5a300297a4ecff8e625
BLAKE2b-256 090857ff7b05f3cc7f9c2207fe6c16b52d07173ff45de2e82f4c4717ef87bcb6
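
The digests above can be used to check a downloaded file before installing it. The following is a minimal sketch using only Python's standard `hashlib` module. The helper names `sha256_of` and `verify` are hypothetical, and the local file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib

# SHA256 digest listed above for nlarge-0.2.3.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "3a797c5697db67f00c3bca46bba84dd20e06fd86c1b250b1321f83dde9916f0b"
)


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA256 digest matches `expected_hex`."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example usage (assumes the archive is in the current directory):
# verify("nlarge-0.2.3.tar.gz", EXPECTED_SDIST_SHA256)
```

The same check applies to the wheel, using the SHA256 digest listed in its own file details.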

File details

Details for the file nlarge-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: nlarge-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 18.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.13.0 Darwin/24.0.0

File hashes

Hashes for nlarge-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 82da19b3fefab0308c750d7026222b77478a8f60914118f29924f4237dcd0971
MD5 af589c90b55883930c46c9e753422075
BLAKE2b-256 93c8be6d07d8f773751cb280c8e6bbc6a47daa4fc8d2360a2d833ddc26969560
