Databalancer is a Python library for balancing imbalanced text classification datasets before model training in machine learning applications.
Project description
Databalancer
Databalancer is a Python library used in machine learning applications to balance imbalanced text classification datasets before model training.
Features
- Databalancer can balance any imbalanced text classification dataset
- If the given dataset is imbalanced, balancing removes no existing data; instead, new data is generated and added to the dataset
- For a given class, the newly generated data consists of paraphrases of the existing data in that class
- By default, these paraphrases are generated using the ramsrigouthamg/t5_paraphraser model (you can read more about the model in the official Hugging Face documentation)
- Databalancer also provides a function called classCountVisualization that shows the class count distribution of a dataset
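To make the idea of balancing by addition concrete, here is a minimal sketch of how many new samples each class would need. It assumes every class is topped up to the size of the largest class, which the description above does not state explicitly. The function name `paraphrases_needed` is hypothetical and is not part of the databalancer API.

```python
from collections import Counter


def paraphrases_needed(labels):
    """Return how many new samples each class needs to match the largest class.

    Assumption: balancing means topping every class up to the majority
    class size. This is an illustration, not databalancer's actual code.
    """
    counts = Counter(labels)
    if not counts:
        return {}
    target = max(counts.values())
    # No class loses data; each class only gains (target - current) samples.
    return {label: target - n for label, n in counts.items()}


labels = ["spam", "ham", "ham", "ham", "spam", "other"]
deficits = paraphrases_needed(labels)
print(deficits)
```

Under this assumption, the deficit for each class is the number of paraphrases that would have to be generated from that class's existing texts.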
Installation
Install the databalancer package with pip
pip install databalancer
Compatibility
Databalancer is only compatible with Python 3.6.9 or above.
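If you want to check the interpreter before installing, a small sketch like the following compares the running Python version against the stated minimum. The helper name `is_supported` is hypothetical; only `sys.version_info` comes from the standard library.

```python
import sys

# Minimum version stated in the Compatibility section.
MINIMUM = (3, 6, 9)


def is_supported(version_info=sys.version_info):
    """Return True if the given (major, minor, micro, ...) tuple meets the minimum."""
    return tuple(version_info[:3]) >= MINIMUM


if not is_supported():
    print("This Python version is older than 3.6.9")
```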
Quick Start
The databalancer library provides two functions:
1 - classCountVisualization
2 - balanceDataset
classCountVisualization
# Import classCountVisualization from the 'databalancer' module
from databalancer import classCountVisualization
# Pass the name of the dataset file (here traindata.csv) to the function
classCountVisualization("traindata.csv")
Output: (image of the class count distribution, not reproduced here)
balanceDataset
# Import balanceDataset from the 'databalancer' module
from databalancer import balanceDataset
# Pass the name of the dataset file to be balanced (here traindata.csv) to the function
balanceDataset("traindata.csv")
The code above balances the dataset and stores the result ('balanced_data.csv') on the local machine.
To show the class count distribution of the balanced dataset, run the code below:
from databalancer import classCountVisualization
classCountVisualization("balanced_data.csv")
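If you prefer a plain-text check of the class counts, the following sketch counts labels in a CSV using only the standard library. The column name `label` and the helper `class_counts` are assumptions made for illustration; the CSV layout that databalancer expects is not described here, so adjust the column name to match your file.

```python
import csv
import io
from collections import Counter


def class_counts(csv_file, label_column="label"):
    """Count rows per class in an open CSV file.

    Assumption: the file has a header row and the class is stored in
    `label_column` (a hypothetical name; change it to fit your data).
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# In-memory sample standing in for a real file.
sample = io.StringIO("text,label\nhello,greeting\nbye,farewell\nhi,greeting\n")
counts = class_counts(sample)
print(counts.most_common())
```

For a file on disk, open it first, for example `with open("balanced_data.csv", newline="") as f: class_counts(f)`. A balanced dataset should show roughly equal counts for every class.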
Hashes for databalancer-0.0.7-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | e1386c0085efba8a139b886a46a8b84abc007f2af60cfdedf5933dd289cf3bf8
MD5 | e1b98036eb153a449487f640e872e49d
BLAKE2b-256 | 0df950bf6ac02a88b3b014c8fbe49a65df5200bd816796d696c1e8c44cc5cc50