Classify genre and score Vietnamese poems
Vietnamese poem classification and evaluation 📜🔍
A Vietnamese poem classifier built on BertForSequenceClassification, with an accuracy of 99.7%.
This is a side project developed while building our Vietnamese poem generator.
Features
- Classify Vietnamese poems into the categories 4 chu, 5 chu, 7 chu, luc bat and 8 chu.
- Score the quality of each poem based solely on its conformity to the rigid rules of the various types of Vietnamese poem, using three criteria, Length (L), Tone (T) and Rhyme (R), combined as follows:
score = L/10 + 3T/10 + 6R/10
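The weighting can be checked by hand. The sketch below is only an illustration of the formula, not the package's own code: `combine_scores` is a hypothetical helper name, and it assumes each of L, T and R lies in the range 0 to 1.

```python
def combine_scores(length: float, tone: float, rhyme: float) -> float:
    """Weighted sum from the formula above: 10% length, 30% tone, 60% rhyme."""
    return (1 * length + 3 * tone + 6 * rhyme) / 10


# Sub-scores copied from the sample output in the Inference section.
print(round(combine_scores(1.0, 1.0, 0.5833333333333333), 2))  # 0.75
```

Because rhyme carries 60% of the weight, a poem with perfect length and tone but weak rhyme still loses a large share of its score.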
The rules for each genre are defined below:
| Genre | Length | Tone | Rhyme |
|---|---|---|---|
| 4 chu | 4 words per line; 4 lines per stanza (optional) | For each line: if the 2nd word is uneven (trắc), the 4th word is even (bằng), and vice versa | Last word (4th) of each line follows one of: continuous rhyme (gieo vần tiếp), alternating rhyme (gieo vần tréo), three-line rhyme (gieo vần ba) |
| 5 chu | 5 words per line; 4 lines per stanza (optional) | Same as "4 chu" | Same as "4 chu" |
| 7 chu | 7 words per line; 4 lines per stanza (optional) | For each line: if the 2nd word is uneven (trắc), the 4th word is even (bằng) and the 6th word is uneven (trắc); the 5th word and the last word (7th) must have different tones | The last words of the 1st, 2nd and 4th lines of each stanza must have the same tone and rhyme |
| luc bat | 6 words in odd lines; 8 words in even lines; 4 lines per stanza (optional) | For a 6-word line: if the 2nd word is uneven (trắc), the 4th word is even (bằng) and the 6th word is uneven (trắc). For an 8-word line: same pattern as the previous 6-word line; the last word (8th) must have the same tone as the 6th word but a different accent | The last word (6th) of a 6-word line must rhyme with the 6th word of the next 8-word line and with the 8th word of the previous 8-word line |
| 8 chu | 8 words per line; 4 lines per stanza (optional) | For each line: if the 3rd word is uneven (trắc), the 5th word is even (bằng) and the 8th word is uneven (trắc) | Same as "4 chu" |
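To make the Length and Tone columns concrete, here is a minimal sketch of how such checks could be written. It is not the package's implementation: the function names are hypothetical, and it assumes the tone rules mean that the listed positions must alternate between even (bằng) and uneven (trắc), whichever tone comes first. Tones are read from the diacritics: unmarked and huyền (grave) syllables are bằng, while sắc, hỏi, ngã and nặng are trắc.

```python
import unicodedata

# Combining marks of the four uneven (trắc) tones: sắc, hỏi, ngã, nặng.
TRAC_MARKS = {"\u0301", "\u0309", "\u0303", "\u0323"}


def tone_class(word: str) -> str:
    """Return 'trac' if the word carries a trắc tone mark, otherwise 'bang'."""
    # NFD splits each vowel into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    return "trac" if any(ch in TRAC_MARKS for ch in decomposed) else "bang"


def words_per_line(poem: str) -> list[int]:
    """Count whitespace-separated words on every non-empty line."""
    return [len(line.split()) for line in poem.splitlines() if line.strip()]


def tones_alternate(line: str, positions: tuple[int, ...]) -> bool:
    """Check that the tones at the 1-based positions alternate (assumed rule)."""
    words = line.split()
    if len(words) < max(positions):
        return False
    tones = [tone_class(words[p - 1]) for p in positions]
    return all(a != b for a, b in zip(tones, tones[1:]))


# The luc bat poem from the Inference section.
poem = """Người đi theo gió đuổi mây
Tôi buồn nhặt nhạnh tháng ngày lãng quên
Em theo hú bóng kim tiền
Bần thần tôi ngẫm triền miên thói đời."""

print(words_per_line(poem))  # [6, 8, 6, 8]
print(all(tones_alternate(l, (2, 4, 6)) for l in poem.splitlines()))  # True
```

Rhyme checking is left out here because it needs the rime of each syllable, which takes more than a diacritic lookup.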
Data
A collection of 171,188 Vietnamese poems across five genres: luc-bat, 5-chu, 7-chu, 8-chu, 4-chu. Download here
For more details, refer to the Acknowledgments section.
Training
The training code lives in our Vietnamese poem generator repository.
Run:
python poem_classifier_training.py
Installation
pip install vietnamese-poem-classifier
Or
pip install git+https://github.com/Anshler/vietnamese-poem-classifier
Inference
from vietnamese_poem_classifier.poem_classifier import PoemClassifier
classifier = PoemClassifier()
poem = '''Người đi theo gió đuổi mây
Tôi buồn nhặt nhạnh tháng ngày lãng quên
Em theo hú bóng kim tiền
Bần thần tôi ngẫm triền miên thói đời.'''
classifier.predict(poem)
#>> [{'label': 'luc bat', 'confidence': 0.9999017715454102, 'poem_score': 0.75, 'l_score': 1.0, 't_score': 1.0, 'r_score': 0.5833333333333333}]
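The result is a list of plain dictionaries, so it can be post-processed with ordinary Python. The sketch below is an illustration only: `summarize` is a hypothetical helper, not part of the package, and it assumes the keys shown in the sample output above. It runs on a copy of that output, so it does not need the model.

```python
def summarize(result: dict) -> str:
    """Describe one prediction and name its weakest scoring criterion."""
    sub_scores = {
        "length": result["l_score"],
        "tone": result["t_score"],
        "rhyme": result["r_score"],
    }
    weakest = min(sub_scores, key=sub_scores.get)
    return (
        f"{result['label']} (confidence {result['confidence']:.2%}), "
        f"score {result['poem_score']:.2f}, weakest criterion: {weakest}"
    )


# Copied from the sample output above.
sample_output = [{'label': 'luc bat', 'confidence': 0.9999017715454102,
                  'poem_score': 0.75, 'l_score': 1.0, 't_score': 1.0,
                  'r_score': 0.5833333333333333}]

for result in sample_output:
    print(summarize(result))
# luc bat (confidence 99.99%), score 0.75, weakest criterion: rhyme
```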
Model
The model weights are published on Hugging Face at Anshler/vietnamese-poem-classifier.
Acknowledgments
This project was inspired by the evaluation method from fsoft-ailab's SP-GPT2 Poem-Generator.
The dataset is also taken from their repository.