Data augmentation for NLP
SC4001 NLarge
Purpose of Project
NLarge is a project focused on exploring and implementing various data augmentation techniques for Natural Language Processing (NLP) tasks. The primary goal is to enhance the diversity and robustness of training datasets, thereby improving the performance and generalization capabilities of NLP models. This project includes traditional data augmentation methods such as synonym replacement and random substitution, as well as advanced techniques using Large Language Models (LLMs).
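The two traditional techniques named above can be sketched as follows. This is a minimal, self-contained illustration, not NLarge's own implementation. The function names `synonym_replacement` and `random_substitution`, the toy `SYNONYMS` table, and the vocabulary list are all hypothetical. It also assumes "random substitution" means replacing each token, with probability p, by a word drawn uniformly from a vocabulary.

```python
import random

# Hypothetical toy synonym table for illustration only; a real pipeline
# would usually draw synonyms from a lexical resource such as WordNet.
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "cheerful"],
    "movie": ["film"],
}


def synonym_replacement(tokens, n, synonyms=SYNONYMS, rng=None):
    """Return a copy of `tokens` with up to n words swapped for a synonym."""
    if rng is None:
        rng = random.Random()
    out = list(tokens)
    # Positions whose (lower-cased) word has at least one known synonym.
    candidates = [i for i, tok in enumerate(out) if tok.lower() in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(synonyms[out[i].lower()])
    return out


def random_substitution(tokens, p, vocab, rng=None):
    """Replace each token, independently with probability p, by a random vocab word."""
    if rng is None:
        rng = random.Random()
    return [rng.choice(vocab) if rng.random() < p else tok for tok in tokens]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the augmented output is reproducible
    sentence = "the quick reviewer was happy with the movie".split()
    print(" ".join(synonym_replacement(sentence, n=2, rng=rng)))
    print(" ".join(random_substitution(sentence, p=0.2, vocab=["great", "plot", "actor"], rng=rng)))
```

Passing a seeded `random.Random` instance keeps the augmented outputs reproducible, which helps when comparing models trained on original versus augmented data.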
Initializing Virtual Environment
We use Poetry in this project for dependency management. To get started, you will need to install Poetry.
pip install poetry
Then install the project's dependencies with Poetry:
poetry install
Repository Contents
- report.tex: The LaTeX document containing the detailed project report, including methodology, experiments, results, and analysis.
- example/: Example scripts for data augmentation and model training.
- NLarge/: The main package containing the data augmentation and model implementations.
Usage
To run the models and experiments, use the Python notebooks in the example/ directory. They contain detailed explanations and code snippets for data augmentation and model training.
Website
The project's PyPI page is available here: pypi page
The GitHub repository can be found here: github page
Contributing
Contributions to this project are welcome. If you have any suggestions or improvements, please create a pull request or open an issue.
Download files
File details
Details for the file nlarge-0.2.1.tar.gz.
File metadata
- Download URL: nlarge-0.2.1.tar.gz
- Upload date:
- Size: 15.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.13.0 Darwin/24.0.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f6b845ae5c312899ef7b6fd88a8c8defdd7a34cb62837b6a35cad6a82314cca9 |
| MD5 | 6f099c7e95cfe145712228468e2c8a8c |
| BLAKE2b-256 | 2546bf27d6b2565666b85054d1b0d1eaed14416527791d390d188156ea254e42 |
File details
Details for the file nlarge-0.2.1-py3-none-any.whl.
File metadata
- Download URL: nlarge-0.2.1-py3-none-any.whl
- Upload date:
- Size: 12.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.13.0 Darwin/24.0.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 62dd3f2e56584b0b4a567c0f7576355e3bdad166c167b3032706bc79b2fee19f |
| MD5 | ccd7a698f773394ae818e1f6167b72c9 |
| BLAKE2b-256 | 01657a11345431b7ee3db256d9831f73fd6efcbb40fe94431cc6993824014586 |
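The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's standard hashlib module. The helper names `file_digest` and `verify` and the `EXPECTED_SHA256` mapping are hypothetical. It assumes that PyPI's BLAKE2b-256 corresponds to BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path

# SHA256 digests listed in the file details for nlarge 0.2.1.
EXPECTED_SHA256 = {
    "nlarge-0.2.1.tar.gz": "f6b845ae5c312899ef7b6fd88a8c8defdd7a34cb62837b6a35cad6a82314cca9",
    "nlarge-0.2.1-py3-none-any.whl": "62dd3f2e56584b0b4a567c0f7576355e3bdad166c167b3032706bc79b2fee19f",
}


def file_digest(path, algorithm="sha256", chunk_size=1 << 16):
    """Hash a file in chunks; 'blake2b-256' maps to BLAKE2b with a 32-byte digest."""
    if algorithm == "blake2b-256":
        h = hashlib.blake2b(digest_size=32)
    else:
        h = hashlib.new(algorithm)  # e.g. "sha256" or "md5"
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never loaded fully into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected, algorithm="sha256"):
    """Return True if the file's digest matches the expected hex string."""
    return file_digest(path, algorithm) == expected.lower()


if __name__ == "__main__":
    import sys

    # Usage: python verify_nlarge.py nlarge-0.2.1.tar.gz nlarge-0.2.1-py3-none-any.whl
    for arg in sys.argv[1:]:
        name = Path(arg).name
        expected = EXPECTED_SHA256.get(name)
        if expected is None:
            print(f"{name}: no published digest listed")
        else:
            print(f"{name}: {'OK' if verify(arg, expected) else 'MISMATCH'}")
```

SHA256 is the digest to rely on here; MD5 is listed for completeness but is not collision-resistant. pip's hash-checking mode (a requirements line such as `nlarge==0.2.1 --hash=sha256:<digest>`) performs the same check automatically at install time.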