ArrowTextClassifier is a simple text classification tool written in PyTorch that lets you train, summarize, and use text classification models for various tasks.
Project description
ArrowTextClassifier
ArrowTextClassifier is a Python package for text classification tasks. It offers functionality to train models, summarize them, and classify text using a convolutional neural network (CNN) architecture.
Installation
You can install ArrowTextClassifier via pip:
pip install ArrowTextClassifier
How it Works
ArrowTextClassifier implements a convolutional neural network (CNN) architecture for text classification. It tokenizes input text, embeds the tokens, applies convolutional filters over the embedded tokens to extract features, and then classifies the text into predefined categories.
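The pipeline described above (tokenize, embed, convolve, classify) can be sketched in plain Python to show how data flows through each stage. This is an illustration only, not the package's implementation: the whitespace tokenizer, the helper names (tokenize, embed, conv_max_pool, classify), the ReLU and max-over-time pooling steps, and the random toy weights are all assumptions made for the example.

```python
import random


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def embed(tokens, vocab, table):
    """Look up one embedding vector per token; unknown tokens map to index 0."""
    return [table[vocab.get(token, 0)] for token in tokens]


def conv_max_pool(embedded, kernel):
    """Slide one filter over consecutive token windows, apply ReLU,
    and keep the largest response (max-over-time pooling)."""
    width = len(kernel)
    responses = []
    for start in range(len(embedded) - width + 1):
        window = embedded[start:start + width]
        total = sum(
            weight * value
            for kernel_row, token_vector in zip(kernel, window)
            for weight, value in zip(kernel_row, token_vector)
        )
        responses.append(max(total, 0.0))
    # Texts shorter than the filter width produce no windows.
    return max(responses, default=0.0)


def classify(text, vocab, table, kernels, weights, labels):
    """Run the full pipeline and return the label with the highest score."""
    embedded = embed(tokenize(text), vocab, table)
    features = [conv_max_pool(embedded, kernel) for kernel in kernels]
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return labels[scores.index(max(scores))]


# Toy, untrained parameters (hypothetical values, for illustration only).
random.seed(0)
EMBED_DIM = 4
labels = ["normal", "toxic"]
vocab = {"<unk>": 0, "hey": 1, "there!": 2, "you": 3, "suck!": 4}
table = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in vocab]
kernels = [
    [[random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(2)]
    for _ in range(3)
]
weights = [[random.uniform(-1, 1) for _ in kernels] for _ in labels]

print(classify("Hey there!", vocab, table, kernels, weights, labels))
```

Because the weights are random and untrained, the printed label carries no meaning. Training would adjust the embedding table, the filters, and the output weights so that the scores separate the categories.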
Usage
Training
To train a text classification model, use the train_model method provided by the Model class:
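from ArrowTextClassifier import Model

# Create a named model, then train it.
# `dataset` stands for your training data in the parquet format described under "How to make a dataset".
model = Model(name="your_model_name")
model.train_model(dataset)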
How to make a dataset
To make your own custom dataset for training, create a parquet file with two columns, label and example, and one row per training sample:
Example Parquet File (rows shown as records)
{"label":"normal","example":"Hey there!"}
{"label":"normal","example":"Hi!"}
{"label":"toxic","example":"You suck!"}
After you have created the parquet file with the data in the format above, you can provide it as the dataset to start training the model.
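One way to produce such a file is with pandas, sketched below. This assumes pandas and a parquet engine (pyarrow or fastparquet) are installed. The helper names build_records and write_parquet and the file name dataset.parquet are hypothetical and not part of ArrowTextClassifier.

```python
def build_records(samples):
    """Turn (label, example) pairs into rows with the two expected columns."""
    return [{"label": label, "example": example} for label, example in samples]


def write_parquet(records, path):
    """Write the rows to a parquet file.

    Assumes pandas plus pyarrow or fastparquet are installed; the import is
    local so the rest of the sketch runs without them.
    """
    import pandas as pd

    frame = pd.DataFrame(records, columns=["label", "example"])
    frame.to_parquet(path, index=False)


samples = [
    ("normal", "Hey there!"),
    ("normal", "Hi!"),
    ("toxic", "You suck!"),
]
records = build_records(samples)

# Uncomment to write the file (hypothetical output path):
# write_parquet(records, "dataset.parquet")
```

Whether train_model expects a file path or an already loaded dataset is not stated here, so check the package documentation before passing the result.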
Summarization
To summarize a trained model, you can use the summarize method:
model.summarize(
    model_path="path_to_your_model",
    hyperparams_path="path_to_hyperparameters_file",
    vocabulary_path="path_to_vocabulary_file",
    modelSummary_write_path="path_to_write_model_summary"
)
Classification
To classify text with a trained model, use the classify method:
result = model.classify(
    model_path="path_to_your_model",
    hyperparams_path="path_to_hyperparameters_file",
    text="your_input_text",
    vocabulary_path="path_to_vocabulary_file"
)
print(result)
Getting Started
This package provides tools for text classification tasks, which you can explore and customize to fit your requirements. Refer to the documentation for detailed usage instructions. A Colab notebook is also available to help you train a custom offensive language classifier with this package.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contact
For any questions or feedback, please contact technologypower24@gmail.com, or reach me on Discord at techpowerb.
File details
Details for the file ArrowTextClassifier-1.0.3.tar.gz.
File metadata
- Download URL: ArrowTextClassifier-1.0.3.tar.gz
- Upload date:
- Size: 7.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | d128a1210cc580c66fb0b6e2f98a27b9d117193945d5c6fbc26b53f93d041697
MD5 | 80c29ad861f574fe7e106975be132599
BLAKE2b-256 | 6c1a0010a3aef31d2ce95efdf9d42bc66475060b9c9c5d57887a4446b3b79846
File details
Details for the file ArrowTextClassifier-1.0.3-py3-none-any.whl.
File metadata
- Download URL: ArrowTextClassifier-1.0.3-py3-none-any.whl
- Upload date:
- Size: 9.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3433b196ff044e80e4c5fc016e9726ae01a133dc9d8fc3b4deecbb083b1f22af
MD5 | cb8b0d04dff09bf09616d16a6e4a0c5b
BLAKE2b-256 | 2bd2e6a1111141a1abed2d53209fb18315c298b7754d049173bf12e850f64644