Data augmentation for NLP

SC4001 NLarge

Purpose of Project

NLarge is a project focused on exploring and implementing various data augmentation techniques for Natural Language Processing (NLP) tasks. The primary goal is to enhance the diversity and robustness of training datasets, thereby improving the performance and generalization capabilities of NLP models. This project includes traditional data augmentation methods such as synonym replacement and random substitution, as well as advanced techniques using Large Language Models (LLMs).
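
As a rough illustration of the two traditional techniques named above, the sketch below implements synonym replacement and random substitution in plain Python. It is not NLarge's API: the function names, the `p` parameter, and the toy `SYNONYMS` table are all assumptions made for this example, and a real pipeline would draw synonyms from a lexical resource rather than a hand-written dictionary.

```python
import random

# Hypothetical toy synonym table, for illustration only.
SYNONYMS = {
    "good": ["great", "fine"],
    "bad": ["poor", "awful"],
    "movie": ["film"],
}


def synonym_replace(text, p=0.5, rng=None):
    """Replace each word that has a known synonym with probability p."""
    if rng is None:
        rng = random.Random()
    out = []
    for word in text.split():
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < p:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)


def random_substitute(text, vocab, p=0.1, rng=None):
    """Replace each word with a random word from vocab with probability p."""
    if rng is None:
        rng = random.Random()
    out = []
    for word in text.split():
        out.append(rng.choice(vocab) if rng.random() < p else word)
    return " ".join(out)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same output
    print(synonym_replace("a good movie with a bad ending", p=1.0, rng=rng))
    print(random_substitute("a good movie", ["story", "plot"], p=0.5, rng=rng))
```

Both functions keep the number of words unchanged, so labels attached to the original sentence can be reused for the augmented copy.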

Initializing Virtual Environment

We use Poetry in this project for dependency management. To get started, you will need to install Poetry.

pip install poetry

Afterwards, install the project's dependencies with Poetry using the command below:

poetry install

Repository Contents

  • report.tex: The LaTeX document containing the detailed report of the project, including methodology, experiments, results, and analysis.
  • example/: Contains example scripts for data augmentation and model training.
  • NLarge/: The main package containing the data augmentation and model implementation.

Usage

To run the models and experiments, use the Python notebooks in the example/ directory. The notebooks contain detailed explanations and code snippets for data augmentation and model training.

Website

You can access the PyPI page of the project from the link here: PyPI page

Our GitHub repository can be found here: GitHub page

Contributing

Contributions to this project are welcome. If you have any suggestions or improvements, please create a pull request or open an issue.

Download files

Source Distribution

nlarge-0.2.2.tar.gz (15.2 kB view details)

Uploaded Source

Built Distribution

nlarge-0.2.2-py3-none-any.whl (12.8 kB view details)

Uploaded Python 3

File details

Details for the file nlarge-0.2.2.tar.gz.

File metadata

  • Download URL: nlarge-0.2.2.tar.gz
  • Upload date:
  • Size: 15.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.13.0 Darwin/24.0.0

File hashes

Hashes for nlarge-0.2.2.tar.gz
Algorithm Hash digest
SHA256 478bf438d6a60c92625441892d8e5ac288db28902e15a9a8a7be570240db6e06
MD5 d6318eff4395eeb6bffd2c8c9bee967e
BLAKE2b-256 deb113ba582b97d3cc9d9b6fcb9dec22538e1ef3878740507b8fa317b8aa167b
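
A downloaded file can be checked against a published digest with Python's standard `hashlib` module. The sketch below is one minimal way to do it; the helper names `sha256_of` and `matches_published` are made up for this example and are not part of NLarge or PyPI tooling.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, calling `matches_published` with the path to a local copy of nlarge-0.2.2.tar.gz and the SHA256 value listed above should return True if the download is intact.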

File details

Details for the file nlarge-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: nlarge-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 12.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.13.0 Darwin/24.0.0

File hashes

Hashes for nlarge-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 c98f5c8e9945b62115098d0a121dc7eaaf91c1cb70a6fea76f727000f77ffca0
MD5 b668b20a7cd4181fb0013ace88d40163
BLAKE2b-256 89f5a5c34823f80de003a982ad9581bcd53c399f73fa39d5e614ad2aaa738bb5
