
Classify genre and score Vietnamese poems


Vietnamese poem classification and evaluation 📜🔍

A Vietnamese poem classifier built on BertForSequenceClassification, with an accuracy of 99.7%.

This is a side project developed while building our Vietnamese poem generator.

Features

  • Classify Vietnamese poems into five genres: 4 chu, 5 chu, 7 chu, luc bat and 8 chu
  • Score the quality of each poem based solely on how well it conforms to the strict rules of its genre, using three criteria, Length (L), Tone (T) and Rhyme (R), weighted as follows: score = L/10 + 3T/10 + 6R/10
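As a quick check of the weighting, the formula can be written as a small Python function (poem_score is a hypothetical name for illustration, not part of the package's API). The weights 0.1, 0.3 and 0.6 sum to 1, so if each criterion score lies in [0, 1], the overall score does too. Plugging in the component scores from the example output in the Inference section reproduces its poem_score of 0.75.

```python
def poem_score(l_score: float, t_score: float, r_score: float) -> float:
    """Weighted quality score: 10% length, 30% tone, 60% rhyme."""
    return l_score / 10 + 3 * t_score / 10 + 6 * r_score / 10


# Component scores copied from the example output in the Inference section.
print(round(poem_score(1.0, 1.0, 0.5833333333333333), 4))  # 0.75
```

Because rhyme carries 60% of the weight, a poem with perfect length and tone but weak rhyme still scores low.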

The rules for each genre are defined below:

4 chu
  • Length: 4 words per line; 4 lines per stanza (optional)
  • Tone: in each line, if the 2nd word is uneven (trắc), the 4th word must be even (bằng), and vice versa
  • Rhyme: the last (4th) word of each line follows continuous rhyme (gieo vần tiếp), alternating rhyme (gieo vần tréo) or three-line rhyme (gieo vần ba)

5 chu
  • Length: 5 words per line; 4 lines per stanza (optional)
  • Tone: same as "4 chu"
  • Rhyme: same as "4 chu"

7 chu
  • Length: 7 words per line; 4 lines per stanza (optional)
  • Tone: in each line, if the 2nd word is uneven (trắc), the 4th word must be even (bằng) and the 6th word uneven (trắc); the 5th word and the last (7th) word must have different tones
  • Rhyme: the last words of the 1st, 2nd and 4th lines of each stanza must have the same tone and rhyme

luc bat
  • Length: 6 words in odd lines, 8 words in even lines; 4 lines per stanza (optional)
  • Tone: in each 6-word line, if the 2nd word is uneven (trắc), the 4th word must be even (bằng) and the 6th word uneven (trắc). Each 8-word line must follow the same pattern as the preceding 6-word line, and its last (8th) word must have the same tone as its 6th word but a different accent
  • Rhyme: the last (6th) word of a 6-word line must rhyme with the 6th word of the following 8-word line and with the 8th word of the preceding 8-word line

8 chu
  • Length: 8 words per line; 4 lines per stanza (optional)
  • Tone: in each line, if the 3rd word is uneven (trắc), the 5th word must be even (bằng) and the 8th word uneven (trắc)
  • Rhyme: same as "4 chu"
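Every tone rule above reduces to classifying each word as even (bằng) or uneven (trắc), which in Vietnamese is determined by the word's tone mark. The following is a minimal sketch, not the package's implementation. It assumes the standard grouping (no mark and huyền are bằng; sắc, hỏi, ngã and nặng are trắc) and that the pattern at the listed positions may start with either tone, as the "vice versa" in the 4 chu rule states. TONE_POSITIONS, tone_of, line_follows_tone_rule and tone_pass_rate are hypothetical names, and extra constraints such as the 7 chu 5th/7th-word rule and the luc bat 8th-word rule are not modelled.

```python
import unicodedata

# Combining marks (after NFD decomposition) for the four uneven (trắc) tones:
# sắc (acute), hỏi (hook above), ngã (tilde), nặng (dot below).
# No mark (ngang) or a grave accent (huyền) means the word is even (bằng).
TRAC_MARKS = {"\u0301", "\u0309", "\u0303", "\u0323"}

# Hypothetical table: 1-based word positions whose tones must alternate.
TONE_POSITIONS = {
    "4 chu": (2, 4),
    "5 chu": (2, 4),
    "7 chu": (2, 4, 6),
    "8 chu": (3, 5, 8),
    "luc bat": (2, 4, 6),  # used for both the 6-word and the 8-word lines
}


def tone_of(word: str) -> str:
    """Return 'T' for an uneven (trắc) word, 'B' for an even (bằng) word."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "T" if any(ch in TRAC_MARKS for ch in decomposed) else "B"


def line_follows_tone_rule(line: str, genre: str) -> bool:
    """Check that tones at the genre's key positions alternate B/T."""
    positions = TONE_POSITIONS[genre]
    words = line.split()
    if len(words) < max(positions):
        return False  # line too short to carry the pattern
    tones = [tone_of(words[p - 1]) for p in positions]
    return all(a != b for a, b in zip(tones, tones[1:]))


def tone_pass_rate(poem: str, genre: str) -> float:
    """Fraction of non-empty lines that satisfy the alternation rule."""
    lines = [ln for ln in poem.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return sum(line_follows_tone_rule(ln, genre) for ln in lines) / len(lines)


sample = """Người đi theo gió đuổi mây
Tôi buồn nhặt nhạnh tháng ngày lãng quên
Em theo hú bóng kim tiền
Bần thần tôi ngẫm triền miên thói đời."""
print(tone_pass_rate(sample, "luc bat"))  # 1.0
```

Decomposing with NFD separates vowel-quality marks (the circumflex in â, the breve in ă, the horn in ơ and ư) from tone marks, so letters like â in "mây" are not mistaken for a tone, and the check works whether the input uses precomposed or decomposed characters.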

Data

A collection of 171,188 Vietnamese poems across five genres: luc-bat, 5-chu, 7-chu, 8-chu and 4-chu. Download here

For more detail, refer to the Acknowledgments section

Training

The training code is in our Vietnamese poem generator repo.

Run:

python poem_classifier_training.py

Installation

pip install vietnamese-poem-classifier

Or

pip install git+https://github.com/Anshler/vietnamese-poem-classifier

Inference

from vietnamese_poem_classifier.poem_classifier import PoemClassifier

classifier = PoemClassifier()

poem = '''Người đi theo gió đuổi mây
          Tôi buồn nhặt nhạnh tháng ngày lãng quên
          Em theo hú bóng kim tiền
          Bần thần tôi ngẫm triền miên thói đời.'''

classifier.predict(poem)

#>> [{'label': 'luc bat', 'confidence': 0.9999017715454102, 'poem_score': 0.75, 'l_score': 1.0, 't_score': 1.0, 'r_score': 0.5833333333333333}]
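As the output above shows, predict returns a list of dictionaries with the keys label, confidence, poem_score, l_score, t_score and r_score. The following is a hedged sketch of post-processing such results; it relies only on those keys, and the summarize helper and its 0.9 confidence threshold are hypothetical choices, not part of the package.

```python
from typing import Any

# Example result copied from the output of classifier.predict(poem) above.
results: list[dict[str, Any]] = [
    {'label': 'luc bat', 'confidence': 0.9999017715454102, 'poem_score': 0.75,
     'l_score': 1.0, 't_score': 1.0, 'r_score': 0.5833333333333333},
]


def summarize(results: list[dict[str, Any]], min_confidence: float = 0.9) -> list[str]:
    """Hypothetical helper: report genre, score and weakest criterion."""
    lines = []
    for r in results:
        if r["confidence"] < min_confidence:
            lines.append(f"uncertain genre ({r['label']}, {r['confidence']:.2f})")
            continue
        criteria = {"length": r["l_score"], "tone": r["t_score"], "rhyme": r["r_score"]}
        weakest = min(criteria, key=criteria.get)  # lowest-scoring criterion
        lines.append(f"{r['label']}: score {r['poem_score']:.2f}, weakest criterion: {weakest}")
    return lines


print(summarize(results))  # ['luc bat: score 0.75, weakest criterion: rhyme']
```

Pointing to the weakest criterion is useful because the three sub-scores are weighted unequally: here the imperfect rhyme score alone pulls the overall score down to 0.75.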

Model

The model weights are published on Hugging Face at Anshler/vietnamese-poem-classifier.

Acknowledgments

This project was inspired by the evaluation method from fsoft-ailab's SP-GPT2 Poem-Generator.

The dataset is also taken from their repo.



Download files


Source Distribution

vietnamese-poem-classifier-0.1.6.tar.gz (16.3 kB)

Built Distribution


vietnamese_poem_classifier-0.1.6-py3-none-any.whl (15.5 kB)

File details

Details for the file vietnamese-poem-classifier-0.1.6.tar.gz.


Hashes for vietnamese-poem-classifier-0.1.6.tar.gz
Algorithm Hash digest
SHA256 05addeba052fc601826b5d553e031a0275c19536e588f97890e4d6a125813852
MD5 d48457e41d4d3bde955fa27a99ef1223
BLAKE2b-256 2ec274ef2e459db7a7889613acb1153bdb27304a19f3437dadfaaa794a992e57


File details

Details for the file vietnamese_poem_classifier-0.1.6-py3-none-any.whl.


Hashes for vietnamese_poem_classifier-0.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 a592bd483f2076455a3e4b224a4cf6af0c422e9613ecb6e5f2f55ff9d781bc49
MD5 3e768971a8ce6885b7d2ce21c3e9b710
BLAKE2b-256 5a8f02ea475ba531ad6d7245d35a90086dc51320142289b3d39b1c64d4b80e5d

