Skip to main content

Classify genre and score Vietnamese poems

Project description

Vietnamese poem classification and evaluation 📜🔍

A Vietnamese poem classifer using BertForSequenceClassification with the accuracy of 99.7%

This is a side project during the making of our Vietnamese poem generator

Features

  • Classify Vietnamese poem into categories of 4 chu, 5 chu, 7 chu, luc bat and 8 chu
  • Score the quality of each poem, based soldly on its conformation to the rigid rule of various types of Vietnamese poem. Using 3 criterias: Length, Tone and Rhyme as follow: score = L/10 + 3T/10 + 6R/10

The rules for each genre are defined below:

Genre Length Tone Rhyme
4 chu - 4 words per line
- 4 lines per stanza (optional)
For each line:
- If the 2nd word is uneven (trắc), the 4th word is even (bằng)
- Vice versa
Last word (4th) of each line:
- Continuous rhyme (gieo vần tiếp)
- Alternating rhyme (gieo vần tréo)
- Three-line rhyme (gieo vần ba)
5 chu - 5 words per line
- 4 lines per stanza (optional)
Same as "4 chu" Same as "4 chu"
7 chu - 7 words per line
- 4 lines per stanza (optional)
For each line:
- If the 2nd word is uneven (trắc), the 4th word is even (bằng), the 6th word is uneven (trắc)
- 5th word and last word (7th) must have different tone
The last word of 1st, 2nd, 4th line per stanza must have same tone and rhyme
luc bat - 6 words in odd line
- 8 words in even line
- 4 lines per stanza (optional)
For 6-word line:
- If the 2nd word is uneven (trắc) the 4th word is even (bằng), the 6th word is uneven (trắc)

For 8-word line:
- Must be same as previous 6-word line
- The last word (8th) mut have same tone as 6th word
The last word (6th) in 6-word line must rhyme with the 6th word in the next 8-word line and the 8th word in the previous 8-word line
8 chu - 8 words per line
- 4 lines per stanza (optional)
For each line:
- If the 3rd word is uneven (trắc), the 5th word is even (bằng), the 8th word is uneven (trắc)
Same as "4 chu"

Data

A colelction of 171188 Vietnamese poems with different genres: luc-bat, 5-chu, 7-chu, 8-chu, 4-chu. Download here

For more detail, refer to the Acknowledgments section

Training

Training code is in our repo Vietnamese poem generator

Run:

python poem_classifier_training.py

Installation

pip install vietnamese-poem-classifier

Or

pip install git+https://github.com/Anshler/vietnamese-poem-classifier

Inference

from vietnamese_poem_classifier.poem_classifier import PoemClassifier

classifier = PoemClassifier()

poem = '''Người đi theo gió đuổi mây
          Tôi buồn nhặt nhạnh tháng ngày lãng quên
          Em theo hú bóng kim tiền
          Bần thần tôi ngẫm triền miên thói đời.'''

classifier.predict(poem)

#>> [{'label': 'luc bat', 'confidence': 0.9999017715454102, 'poem_score': 0.75}]

Model

The model's weights are published at Huggingface Anshler/vietnamese-poem-classifier

Acknowledgments

This project was inspired by the evaluation method from fsoft-ailab's_ SP-GPT2 Poem-Generator

Dataset also taken from their repo

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vietnamese-poem-classifier-0.1.1.tar.gz (16.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vietnamese_poem_classifier-0.1.1-py3-none-any.whl (15.3 kB view details)

Uploaded Python 3

File details

Details for the file vietnamese-poem-classifier-0.1.1.tar.gz.

File metadata

File hashes

Hashes for vietnamese-poem-classifier-0.1.1.tar.gz
Algorithm Hash digest
SHA256 3a0fa0be3ddd55c3fbd12f7480b01a4a9674cb461c04bb9fac6a6f401e2ce154
MD5 663282ef938c2203c9f2982a5896bf98
BLAKE2b-256 74b1f7fb668b7404192548c37fbee1acd348018079661bdfc57c6719f888e43c

See more details on using hashes here.

File details

Details for the file vietnamese_poem_classifier-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for vietnamese_poem_classifier-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 daa2ece946d6701dc8633f70bd41ee848d30f745e6b3dde610476c0a7ae6de10
MD5 5c8b7c3f6fa96eb704df473db57358dd
BLAKE2b-256 2a0022c1c3b0879b4a45573b16ba14c3c9dc390d2080a429c43143e38aa2ee8c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page