RhymeTagger

A simple collocation-driven recognition of rhymes. Contains pre-trained models for Czech, English, French, German, Italian, Portuguese, Russian, Slovene, and Spanish poetry.

Details in P. Plecháč (2018). A Collocation-Driven Method of Discovering Rhymes (in Czech, English, and French Poetry). In Taming the Corpus: From Inflection and Lexis to Interpretation. Cham: Springer, 79-95.

! Requires eSpeak NG to be installed

Installation

pip install rhymetagger

or

pip3 install rhymetagger
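
Since the package requires eSpeak NG, it may help to confirm that the executable is reachable before tagging. The following is a minimal sketch and not part of RhymeTagger: the helper name `find_espeak` is hypothetical, and the executable names `espeak-ng` and `espeak` are assumptions about how eSpeak NG is installed on your system.

```python
import shutil


def find_espeak(candidates=("espeak-ng", "espeak")):
    """Return the path of the first eSpeak executable found on PATH, or None.

    The candidate names are an assumption; adjust them to your installation.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if find_espeak() is None:
    print("eSpeak NG not found on PATH; install it before using RhymeTagger")
```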

Usage

To annotate poems with one of the pre-trained models:

from rhymetagger import RhymeTagger

poem = [
	"Tell me not, in mournful numbers,",
	"Life is but an empty dream!",
	"For the soul is dead that slumbers,",
	"And things are not what they seem.",
	"Life is real! Life is earnest!",
	"And the grave is not its goal;",
	"Dust thou art, to dust returnest,",
	"Was not spoken of the soul.",
	"Not enjoyment, and not sorrow,",
	"Is our destined end or way;",
	"But to act, that each tomorrow",
	"Find us farther than today.",
]

rt = RhymeTagger()
rt.load_model(model='en')

rhymes = rt.tag(poem, output_format=3) 
print(rhymes)

>> [1, 2, 1, 2, 3, 4, 3, 4, 5, 6, 5, 6]

poem = [
	"Über allen Gipfeln",
	"Ist Ruh’,",
	"In allen Wipfeln",
	"Spürest du",
	"Kaum einen Hauch;",
	"Die Vögelein schweigen im Walde.",
	"Warte nur, balde",
	"Ruhest du auch.",
]

rt = RhymeTagger()
rt.load_model(model='de')

rhymes = rt.tag(poem, output_format=3) 
print(rhymes)

>> [1, 2, 1, 2, 3, 4, 4, 3]
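
The integer labels returned with output_format=3 can be converted to the familiar letter notation. This is a minimal sketch, assuming the labels are positive integers as in the outputs above; `scheme_to_letters` is a hypothetical helper, not part of RhymeTagger, and the treatment of non-integer entries is an assumption.

```python
import string


def scheme_to_letters(scheme):
    """Map integer rhyme labels (1, 2, 3, ...) to letters (a, b, c, ...)."""
    letters = string.ascii_lowercase
    out = []
    for label in scheme:
        if not isinstance(label, int) or label < 1:
            # Assumption: any other entry (for instance None) marks a line
            # without a rhyme partner; the README does not say how such
            # lines are encoded.
            out.append("_")
        elif label <= len(letters):
            out.append(letters[label - 1])
        else:
            # Fall back to the number itself once the alphabet runs out.
            out.append(str(label))
    return "-".join(out)


print(scheme_to_letters([1, 2, 1, 2, 3, 4, 3, 4, 5, 6, 5, 6]))  # a-b-a-b-c-d-c-d-e-f-e-f
print(scheme_to_letters([1, 2, 1, 2, 3, 4, 4, 3]))              # a-b-a-b-c-d-d-c
```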

To train your own model:

from rhymetagger import RhymeTagger

rt = RhymeTagger()
rt.new_model(lang=ISO_CODE)

for poem in YOUR_CORPUS:
	rt.add_to_model(poem)

rt.train_model()
rt.save_model(PATH_TO_FILE)
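
YOUR_CORPUS is an iterable of poems in a shape that add_to_model accepts. One way to build such poems from plain text is sketched below. It assumes one poem per string with blank lines separating stanzas; that layout and the helper name `parse_poem` are assumptions, not something RhymeTagger prescribes.

```python
def parse_poem(text):
    """Split raw text into a list of stanzas, each a list of non-empty lines."""
    stanzas, current = [], []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line:
            current.append(line)
        elif current:
            # A blank line closes the stanza collected so far.
            stanzas.append(current)
            current = []
    if current:
        stanzas.append(current)
    return stanzas


raw = """Tell me not, in mournful numbers,
Life is but an empty dream!

Life is real! Life is earnest!
And the grave is not its goal;
"""
poem = parse_poem(raw)
print(len(poem), [len(stanza) for stanza in poem])  # 2 [2, 2]
```

Each parsed poem can then be passed to rt.add_to_model(poem) inside the training loop.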

Pre-trained models

model   description
cs      Czech model (trained with PoeTree.cs; 80k poems)
de      German model (trained with PoeTree.de; 75k poems)
en      English model (trained with PoeTree.en; 40k poems)
es      Spanish model (trained with PoeTree.es; 9k poems)
fr      French model (trained with PoeTree.fr; 18k poems)
it      Italian model (trained with PoeTree.it; 40k poems)
pt      Portuguese model (trained with PoeTree.pt; 5k poems)
ru      Russian model (trained with PoeTree.ru; 45k poems)
sl      Slovene model (trained with PoeTree.sl; 5k poems)

Methods

RhymeTagger.load_model(model, verbose=False)

Load one of the pre-trained models or a custom model stored in a JSON file

Parameters

model: string

either the name of one of the pre-trained models or the path to a JSON file containing a custom model

verbose: boolean

whether to print out info on model settings

RhymeTagger.tag(poem, transcribed=False, output_format=1, **kwargs)

Perform rhyme recognition

Parameters

poem: list

either a list of lines OR a list of lists (stanzas > lines). Each item may be a string holding the text of the line, OR a string holding its IPA transcription (transcribed must be True), OR a dict holding both orthography and IPA transcription {'text': ..., 'ipa': ...} (transcribed must be True)

transcribed: boolean

whether IPA transcription is passed

output_format: int

1: returns a list of indices for each line (the other lines it rhymes with)
2: returns a list of indices for each rhyme
3: returns the rhyme scheme in the classic ABBA manner, with integers instead of letters

e.g. a limerick with the rhyme scheme a-a-b-b-a would be encoded as

1: [ [1,4], [0,4], [3], [2], [0,1] ]
2: [ [0,1,4], [2,3] ]
3: [ 1,1,2,2,1 ]

**kwargs

Parameters that may be used to override settings inherited from the model (window, same_words, ngram, t_score_min, frequency_min, stanza_limit, prob_ipa_min, prob_ngram_min)
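
A mistyped override name is easy to miss, so a thin wrapper can check names against the list above before calling tag. This is only a sketch: `tag_with_overrides` and `OVERRIDABLE` are hypothetical names, not part of RhymeTagger, and how the library itself reacts to an unknown keyword is not stated here.

```python
# Setting names as listed in the description of **kwargs.
OVERRIDABLE = {
    "window", "same_words", "ngram", "t_score_min",
    "frequency_min", "stanza_limit", "prob_ipa_min", "prob_ngram_min",
}


def tag_with_overrides(tagger, poem, output_format=3, **overrides):
    """Call tagger.tag() after rejecting override names not in the list."""
    unknown = sorted(set(overrides) - OVERRIDABLE)
    if unknown:
        raise ValueError(f"not an overridable setting: {', '.join(unknown)}")
    return tagger.tag(poem, output_format=output_format, **overrides)


# Usage with a loaded model (needs rhymetagger and eSpeak NG installed):
#   rt = RhymeTagger()
#   rt.load_model(model='en')
#   rhymes = tag_with_overrides(rt, poem, window=2, same_words=False)
```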

Returns

rhymes: list

a list of rhymes in the requested format, see output_format
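
The three output formats carry the same information, so one can be derived from another. The sketch below converts a format 3 list into formats 2 and 1. It assumes every line carries an integer label and reads format 1 as each line's rhyme partners excluding the line itself; the helper names are hypothetical and not part of RhymeTagger.

```python
from collections import defaultdict


def scheme_to_groups(scheme):
    """Format 3 -> format 2: one list of line indices per rhyme."""
    groups = defaultdict(list)
    for index, label in enumerate(scheme):
        groups[label].append(index)
    # Dicts keep insertion order, so rhymes appear in order of first occurrence.
    return list(groups.values())


def groups_to_partners(groups, n_lines):
    """Format 2 -> format 1: for each line, the other lines it rhymes with."""
    partners = [[] for _ in range(n_lines)]
    for group in groups:
        for index in group:
            partners[index] = [other for other in group if other != index]
    return partners


limerick = [1, 1, 2, 2, 1]
groups = scheme_to_groups(limerick)
print(groups)                                     # [[0, 1, 4], [2, 3]]
print(groups_to_partners(groups, len(limerick)))  # [[1, 4], [0, 4], [3], [2], [0, 1]]
```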

RhymeTagger.new_model(lang, transcribed=False, window=5, syll_max=2, stress=True, vowel_length=True, ngram=1, ngram_length=3, same_words=True, t_score_min=3.078, frequency_min=3, stanza_limit=False, prob_ipa_min=0.95, prob_ngram_min=0.95, max_iter=20, verbose=True)

Initialize new model

Parameters

lang: string

ISO language code as required by eSpeak

transcribed: boolean

whether IPA transcription is passed

window: int

how many lines forward to look for rhymes

syll_max: int

maximum number of syllables taken into account

stress: boolean

whether to focus only on sounds following the last stress

vowel_length: boolean

whether vowel length should be taken into account

same_words: boolean

whether repetition of the same word counts as rhyme

ngram: int

upon which iteration to start taking character n-grams into account (one-based indexing, 0 = disregard n-grams completely)

ngram_length: int

length of the character n-grams

t_score_min: float

minimum value of t-score to add pair to train set

frequency_min: int

minimum number of pair occurrences to add to train set

stanza_limit: boolean

whether rhymes can only appear within the same stanza

prob_ipa_min: float

minimum IPA-based probability to treat pair as rhyme

prob_ngram_min: float

minimum ngram-based probability to treat pair as rhyme

max_iter: int

maximum number of training iterations

verbose: boolean

whether to print out progress
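
To make window and stanza_limit concrete, the sketch below enumerates which pairs of lines would be compared under the stated meaning of the two settings: each line is paired with up to window following lines, optionally only within its own stanza. How RhymeTagger applies them internally is not described here, so this is an illustration only; `candidate_pairs` is a hypothetical helper.

```python
def candidate_pairs(stanzas, window=5, stanza_limit=False):
    """Yield (i, j) index pairs of lines that may be compared for rhyme.

    stanzas: list of stanzas, each a list of lines. Indices count lines
    across the whole poem, starting at 0.
    """
    # Remember the stanza number of every line in poem order.
    stanza_of = [s for s, stanza in enumerate(stanzas) for _ in stanza]
    n_lines = len(stanza_of)
    for i in range(n_lines):
        # Look at most `window` lines forward from line i.
        for j in range(i + 1, min(i + window + 1, n_lines)):
            if stanza_limit and stanza_of[i] != stanza_of[j]:
                continue
            yield (i, j)


poem = [["line 0", "line 1", "line 2"], ["line 3", "line 4"]]
print(list(candidate_pairs(poem, window=2)))
# [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
print(list(candidate_pairs(poem, window=2, stanza_limit=True)))
# [(0, 1), (0, 2), (1, 2), (3, 4)]
```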

RhymeTagger.add_to_model(poem)

Feed the model with a poem

Parameters

poem: list

either a list of lines OR a list of lists (stanzas > lines). Each item may be either a string holding the text of the line OR a dict holding both orthography and IPA transcription {'text': ..., 'ipa': ...} (transcribed must be True)

RhymeTagger.train_model()

Train the model fed with poems
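
The t_score_min and frequency_min settings suggest a collocation filter on candidate pairs. The sketch below uses the standard corpus-linguistics t-score, (observed - expected) / sqrt(observed). Whether RhymeTagger computes it from exactly these counts is an assumption (the cited paper gives the details); the function names and the meaning of `total` are hypothetical.

```python
import math


def t_score(pair_count, count_a, count_b, total):
    """Standard collocation t-score: (observed - expected) / sqrt(observed).

    pair_count: how often items a and b were seen together
    count_a, count_b: how often each item was seen overall
    total: overall number of observations (assumed definition)
    """
    expected = count_a * count_b / total
    return (pair_count - expected) / math.sqrt(pair_count)


def keep_pair(pair_count, count_a, count_b, total,
              t_score_min=3.078, frequency_min=3):
    """Return True if the pair passes both thresholds (read as inclusive minimums)."""
    if pair_count < frequency_min:
        return False
    return t_score(pair_count, count_a, count_b, total) >= t_score_min


print(keep_pair(20, 50, 40, 10000))  # True: frequent and far above chance
print(keep_pair(2, 50, 40, 10000))   # False: below frequency_min
```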

RhymeTagger.save_model(file)

Save the model to a JSON file

Parameters

file: string

file path
