Scansion tool for Spanish texts

Project description

Rantanplan is a Python library for the automated scansion of Spanish poetry. Scansion is the measurement of the rhythm of the verses of a poem, and our tool achieves state-of-the-art results for mixed-metre poems. It can also identify up to 45 of the most significant Spanish stanza types. Rantanplan is fast and accurate because it is built on spaCy and spaCy-affixes.

  • Free software: Apache Software License 2.0

Installation

pip install rantanplan

Install required resources

  1. Install the spaCy language model for Spanish:

    python -m spacy download es_core_news_md
  2. Install Freeling rules for affixes:

    python -m spacy_affixes download es

Usage

Import Rantanplan

To use Rantanplan in a project:

import rantanplan

Usage example

from rantanplan.core import get_scansion

poem = """Me gustas cuando callas porque estás como ausente,
y me oyes desde lejos, y mi voz no te toca.
Parece que los ojos se te hubieran volado
y parece que un beso te cerrara la boca.

Como todas las cosas están llenas de mi alma
emerges de las cosas, llena del alma mía.
Mariposa de sueño, te pareces a mi alma,
y te pareces a la palabra melancolía."""

get_scansion(poem)

Output example

The output of Rantanplan is a complex structure, so it is broken down here for clarity.

First, Rantanplan shows a list of stanzas. Each stanza is then shown as two separate lists: a list of tokens and a list of “phonological groups”, i.e., the phonological units that form a verse after synalephas and sinaeresis are taken into account.

Tokens

If the token is a word, it shows a list of the syllables it is made of, with the following information:

  • syllable: The text of the syllable.

  • is_stressed: Whether the syllable is stressed or not.

  • is_word_end: Whether the syllable is the end of a word or not.

  • has_synalepha or has_sinaeresis: Whether or not the syllable can be conjoined with the next one.

  • stress_position: Index, starting from 0, for the stressed syllable of the word. If the index is negative, the syllable position is counted from the end of the word:

    • 0: First syllable

    • -1: Last syllable

    • -2: Penultimate syllable

    • etc

If the token is not a word, it is shown as a symbol.
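The stress_position convention above mirrors Python's own negative indexing, so it can be resolved to an absolute syllable index with a small helper. The following is a minimal sketch, not part of Rantanplan: the function name resolve_stress_index is hypothetical, and the sample token is copied from the output shown in this README.

```python
def resolve_stress_index(token):
    """Return the 0-based index of the syllable that stress_position points at.

    Hypothetical helper (not part of Rantanplan): it only converts the
    possibly negative stress_position into an absolute list index.
    """
    syllables = token["word"]
    position = token["stress_position"]
    if not -len(syllables) <= position < len(syllables):
        raise ValueError(f"stress_position {position} is out of range")
    # Negative positions count from the end, as in Python indexing.
    return position % len(syllables)


# Token for "ausente", copied from the sample output in this README.
ausente = {
    "word": [
        {"syllable": "au", "is_stressed": False},
        {"syllable": "sen", "is_stressed": True},
        {"syllable": "te", "is_stressed": False, "is_word_end": True},
    ],
    "stress_position": -2,
}

index = resolve_stress_index(ausente)
print(index, ausente["word"][index]["syllable"])
```

In the sample output, unstressed words such as "cuando" also report a stress_position of 0, so it seems safer to check the is_stressed flag of the resolved syllable before treating it as a stress.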

List of tokens example

 {'tokens': [{'word': [{'syllable': 'co', 'is_stressed': True},
    {'syllable': 'mo',
     'is_stressed': False,
     'has_synalepha': True,
     'is_word_end': True}],
   'stress_position': 0},
  {'word': [{'syllable': 'au', 'is_stressed': False}
...
  {'symbol': ','}],
...

Phonological groups

The next element of the output is a list of phonological groups. We use this term to refer to the phonological units that make up a poem when it is read, after synalephas and sinaeresis are taken into account.

Phonological groups are quite similar to the token list but have no word boundaries, because these are lost when synalephas are applied. Each syllable within phonological_groups can carry the following information:

  • syllable: The text of the syllable.

  • is_stressed: Whether the syllable is stressed or not.

  • is_word_end: Whether the syllable is the end of a word or not.

  • synalepha_index or sinaeresis_index: The index of the character where the syllable is conjoined with the next one:

    • 0: No synalepha or sinaeresis has been realised.

    • Any other number: List of indexes on the syllable, starting from 0, where the original syllable or syllables have been conjoined with the next one:

      • Example: The syllable moau was originally split at position 1:

        {'syllable': 'moau', 'is_stressed': False, 'synalepha_index': [1]}
      • Indexes of the syllable:

        m o a u

        0 1 2 3

        Splitting after position 1 (the letter o) recovers the original syllables mo and au.
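The index convention described above can be reversed in a few lines to recover the original syllables from a conjoined one. This is a minimal sketch under the assumption that each index marks the last character of an original syllable, as in the moau example; the function name split_group is hypothetical and not part of the library.

```python
def split_group(group):
    """Split a conjoined syllable back into its original syllables.

    Hypothetical helper: assumes each index in synalepha_index (or
    sinaeresis_index) is the position of the last character of an
    original syllable, as in the 'moau' example.
    """
    indexes = group.get("synalepha_index") or group.get("sinaeresis_index") or []
    text = group["syllable"]
    parts = []
    start = 0
    for index in indexes:
        parts.append(text[start:index + 1])
        start = index + 1
    parts.append(text[start:])
    return parts


print(split_group({"syllable": "moau", "is_stressed": False, "synalepha_index": [1]}))
print(split_group({"syllable": "quees", "is_stressed": False, "synalepha_index": [2]}))
print(split_group({"syllable": "gus", "is_stressed": True}))
```

A group without any index, or with an index of 0, is returned as a single-element list.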

Phonological groups example

{'phonological_groups': [{'syllable': 'Me',
  'is_stressed': False,
  'is_word_end': True},
 {'syllable': 'gus', 'is_stressed': True},
 {'syllable': 'tas', 'is_stressed': False, 'is_word_end': True},
 {'syllable': 'cuan', 'is_stressed': False},
 {'syllable': 'do', 'is_stressed': False, 'is_word_end': True},
 {'syllable': 'ca', 'is_stressed': True},
 {'syllable': 'llas', 'is_stressed': False, 'is_word_end': True},
 {'syllable': 'por', 'is_stressed': False},
 {'syllable': 'quees', 'is_stressed': False, 'synalepha_index': [2]},
 {'syllable': 'tás', 'is_stressed': True, 'is_word_end': True},
 {'syllable': 'co', 'is_stressed': False},
 {'syllable': 'moau', 'is_stressed': False, 'synalepha_index': [1]},
 {'syllable': 'sen', 'is_stressed': True},
 {'syllable': 'te', 'is_stressed': False, 'is_word_end': True}],
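To make the relation between the token list and the phonological groups concrete, the sketch below joins every syllable flagged with has_synalepha to the syllable that follows it. It is an illustration only, not Rantanplan's implementation: the function name merge_synalephas is hypothetical, it realises every possible synalepha, and it ignores has_sinaeresis, whereas the library decides which liaisons to apply.

```python
def merge_synalephas(tokens):
    """Join syllables flagged with has_synalepha to the following syllable."""
    groups = []
    pending = None  # group still waiting to absorb the next syllable
    for token in tokens:
        for syllable in token.get("word", []):  # symbols have no "word" key
            if pending is None:
                current = {"syllable": syllable["syllable"],
                           "is_stressed": syllable["is_stressed"]}
            else:
                current = pending
                # Record the last character index of the text joined so far.
                current.setdefault("synalepha_index", []).append(
                    len(current["syllable"]) - 1)
                current["syllable"] += syllable["syllable"]
                current["is_stressed"] = (current["is_stressed"]
                                          or syllable["is_stressed"])
            # Only the last joined syllable decides whether a word ends here.
            if syllable.get("is_word_end"):
                current["is_word_end"] = True
            else:
                current.pop("is_word_end", None)
            if syllable.get("has_synalepha"):
                pending = current
            else:
                groups.append(current)
                pending = None
    if pending is not None:
        groups.append(pending)
    return groups


# Tokens for "porque estás como ausente," copied from the sample output.
tokens = [
    {"word": [{"syllable": "por", "is_stressed": False},
              {"syllable": "que", "is_stressed": False,
               "has_synalepha": True, "is_word_end": True}],
     "stress_position": 0},
    {"word": [{"syllable": "es", "is_stressed": False},
              {"syllable": "tás", "is_stressed": True, "is_word_end": True}],
     "stress_position": -1},
    {"word": [{"syllable": "co", "is_stressed": False},
              {"syllable": "mo", "is_stressed": False,
               "has_synalepha": True, "is_word_end": True}],
     "stress_position": 0},
    {"word": [{"syllable": "au", "is_stressed": False},
              {"syllable": "sen", "is_stressed": True},
              {"syllable": "te", "is_stressed": False, "is_word_end": True}],
     "stress_position": -2},
    {"symbol": ","},
]

groups = merge_synalephas(tokens)
print([group["syllable"] for group in groups])
```

For this fragment the sketch yields the same quees and moau groups as the example above.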

Metrical information

Finally, at the verse level we find information about the verse itself on the rhythm key:

  • rhythm: Pattern of the unstressed (-) and stressed (+) syllables. This output can be changed with the parameter rhythm_format. You can find more information about how this parameter works in the documentation.

  • length: Proposed length for the verse.

  • length_range: Minimum and maximum verse length possible. This is calculated taking into account all possible sinaeresis and synalephas.

Metrical information example

'rhythm': {'stress': '---+----+----+-',
 'length': 14,
 'length_range': {'min_length': 13, 'max_length': 16}},
 ...
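As an illustration of how these values relate to the phonological groups, the sketch below builds a stress pattern from the is_stressed flags and applies the usual rule of Spanish versification, in which a verse is counted up to one syllable past its last stress. Both function names are hypothetical, and this is an assumption about how such values can be derived, not a description of Rantanplan's internals. The input is the first verse of the usage example.

```python
def stress_pattern(groups):
    """Render stressed syllables as '+' and unstressed ones as '-'."""
    return "".join("+" if group["is_stressed"] else "-" for group in groups)


def metrical_length(pattern):
    """Count syllables up to the last stress, plus one.

    This is the standard Spanish rule: a verse ending in an oxytone word
    gains a syllable and one ending in a proparoxytone word loses one.
    """
    last_stress = pattern.rfind("+")
    if last_stress == -1:
        return len(pattern)  # no stress found: fall back to the raw count
    return last_stress + 2


# (syllable, is_stressed) pairs copied from the phonological groups example.
EXAMPLE = [
    ("Me", False), ("gus", True), ("tas", False), ("cuan", False),
    ("do", False), ("ca", True), ("llas", False), ("por", False),
    ("quees", False), ("tás", True), ("co", False), ("moau", False),
    ("sen", True), ("te", False),
]
groups = [{"syllable": text, "is_stressed": stressed} for text, stressed in EXAMPLE]

pattern = stress_pattern(groups)
print(pattern, metrical_length(pattern))  # -+---+---+--+- 14
```

The result agrees with the stress string and length reported for this verse in the full output example.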

Stanza detection

Rantanplan is also able to detect the stanza type from a list of popular Spanish stanzas. The complete list is:

  • Cantar

  • Chamberga

  • Copla arte mayor

  • Copla arte menor

  • Copla castellana

  • Copla mixta

  • Copla real

  • Couplet

  • Cuaderna vía

  • Cuarteta

  • Cuarteto

  • Cuarteto lira

  • Décima antigua

  • Endecha real

  • Espinela

  • Estrofa francisco de la torre

  • Estrofa manriqueña

  • Estrofa sáfica

  • Estrofa sáfica unamuno

  • Haiku

  • Lira

  • Novena

  • Octava

  • Octava real

  • Octavilla

  • Ovillejo

  • Quinteto

  • Quintilla

  • Redondilla

  • Romance

  • Romance arte mayor

  • Seguidilla

  • Seguidilla compuesta

  • Seguidilla gitana

  • Septeto

  • Septeto lira

  • Septilla

  • Serventesio

  • Sexta rima

  • Sexteto

  • Sexteto lira

  • Sextilla

  • Silva arromanzada

  • Soleá

  • Sonnet

  • Tercetillo

  • Terceto

  • Terceto encadenado

  • Terceto monorrimo

When this option is enabled with the rhyme_analysis parameter, additional information about the stanza is shown in the output.

If we take this “cuarteto” for example:

Yo persigo una forma que no encuentra mi estilo,
botón de pensamiento que busca ser la rosa;
se anuncia con un beso que en mis labios se posa
al abrazo imposible de la Venus de Milo

If we call get_scansion with the rhyme_analysis parameter set to True, the following information is added to the analysis of each line:

  • structure: The name of the stanza that has been detected

  • rhyme: A letter code to match rhyming verses. In this example, verse 1 rhymes with verse 4, and verse 2 rhymes with verse 3, and a letter is assigned to verses that rhyme together as shown below:

    Yo persigo una forma que no encuentra mi estilo,  a
    botón de pensamiento que busca ser la rosa;       b
    se anuncia con un beso que en mis labios se posa  b
    al abrazo imposible de la Venus de Milo           a
  • ending: What part of the last word is rhyming.

  • ending_stress: Negative index (-1 for last, -2 for penultimate, etc.) for the vowel that carries the stress of the rhyming part.

  • rhyme_type: Whether the rhyme is consonant or assonant:
    • Consonant: All characters from the last stressed vowel to the end of the word coincide on verses that rhyme. For example:

      estILO
      mILO
    • Assonant: Same as consonant rhyme, but only the vowels need to match:

      amAdO
      cachArrO
  • rhyme_relaxation: Whether or not rules for rhyme relaxation are applied. For example, removing weak vowels in diphthongs or matching letters that are pronounced the same, such as c and z.
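The difference between consonant and assonant rhyme can be sketched as a comparison of two endings, each taken from the last stressed vowel to the end of the word. The function below is a hypothetical illustration, not the library's code: it assumes the endings have already been extracted, and it applies none of the rhyme relaxation rules mentioned above.

```python
# Map accented vowels to plain ones so that "ás" and "as" compare equal.
PLAIN_VOWELS = str.maketrans("áéíóúü", "aeiouu")


def rhyme_type(ending_a, ending_b):
    """Classify two endings as 'consonant', 'assonant' or None (no rhyme)."""
    a = ending_a.lower().translate(PLAIN_VOWELS)
    b = ending_b.lower().translate(PLAIN_VOWELS)
    if a == b:
        return "consonant"  # every character matches
    vowels_a = [char for char in a if char in "aeiou"]
    vowels_b = [char for char in b if char in "aeiou"]
    if vowels_a == vowels_b:
        return "assonant"  # only the vowels match
    return None


print(rhyme_type("ilo", "ilo"))   # estILO / mILO
print(rhyme_type("ado", "arro"))  # amAdO / cachArrO
print(rhyme_type("ilo", "osa"))
```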

Stanza detection example

'structure': 'cuarteto',
'rhyme': 'a',
'ending': 'ilo',
'ending_stress': -3,
'rhyme_type': 'consonant',
'rhyme_relaxation': True},
 ...
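The letter codes of the rhyme field can be reproduced from the endings of the verses by giving each new ending the next letter of the alphabet. This is a simplified sketch with a hypothetical function name: it matches endings by exact equality only, gives a letter even to an ending that never rhymes, and supports at most 26 distinct endings. How Rantanplan labels unrhymed verses is not covered here.

```python
import string


def rhyme_letters(endings):
    """Assign 'a', 'b', 'c'... to endings in order of first appearance."""
    letters = {}
    scheme = []
    for ending in endings:
        if ending not in letters:
            letters[ending] = string.ascii_lowercase[len(letters)]
        scheme.append(letters[ending])
    return scheme


# Endings of the four verses of the "cuarteto" used in this section.
print(rhyme_letters(["ilo", "osa", "osa", "ilo"]))  # ['a', 'b', 'b', 'a']
```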

Full output example

A complete example of Rantanplan output is shown here:

  [{'tokens': [{'word': [{'syllable': 'Me',
    'is_stressed': False,
    'is_word_end': True}],
  'stress_position': 0},
 {'word': [{'syllable': 'gus', 'is_stressed': True},
   {'syllable': 'tas', 'is_stressed': False, 'is_word_end': True}],
  'stress_position': -2},
 {'word': [{'syllable': 'cuan', 'is_stressed': False},
   {'syllable': 'do', 'is_stressed': False, 'is_word_end': True}],
  'stress_position': 0},
 {'word': [{'syllable': 'ca', 'is_stressed': True},
   {'syllable': 'llas', 'is_stressed': False, 'is_word_end': True}],
  'stress_position': -2},
 {'word': [{'syllable': 'por', 'is_stressed': False},
   {'syllable': 'que',
    'is_stressed': False,
    'has_synalepha': True,
    'is_word_end': True}],
  'stress_position': 0},
 {'word': [{'syllable': 'es', 'is_stressed': False},
   {'syllable': 'tás', 'is_stressed': True, 'is_word_end': True}],
  'stress_position': -1},
 {'word': [{'syllable': 'co', 'is_stressed': False},
   {'syllable': 'mo',
    'is_stressed': False,
    'has_synalepha': True,
    'is_word_end': True}],
  'stress_position': 0},
 {'word': [{'syllable': 'au', 'is_stressed': False},
   {'syllable': 'sen', 'is_stressed': True},
   {'syllable': 'te', 'is_stressed': False, 'is_word_end': True}],
  'stress_position': -2},
 {'symbol': ','}],
'phonological_groups': [{'syllable': 'Me',
  'is_stressed': False,
  'is_word_end': True},
 {'syllable': 'gus', 'is_stressed': True},
 {'syllable': 'tas', 'is_stressed': False, 'is_word_end': True},
 {'syllable': 'cuan', 'is_stressed': False},
 {'syllable': 'do', 'is_stressed': False, 'is_word_end': True},
 {'syllable': 'ca', 'is_stressed': True},
 {'syllable': 'llas', 'is_stressed': False, 'is_word_end': True},
 {'syllable': 'por', 'is_stressed': False},
 {'syllable': 'quees', 'is_stressed': False, 'synalepha_index': [2]},
 {'syllable': 'tás', 'is_stressed': True, 'is_word_end': True},
 {'syllable': 'co', 'is_stressed': False},
 {'syllable': 'moau', 'is_stressed': False, 'synalepha_index': [1]},
 {'syllable': 'sen', 'is_stressed': True},
 {'syllable': 'te', 'is_stressed': False, 'is_word_end': True}],
'rhythm': {'stress': '-+---+---+--+-', 'type': 'pattern', 'length': 14}},
 ...

Documentation

https://rantanplan.readthedocs.io/

Development

To run all the tests, run:

tox

Note, to combine the coverage data from all the tox environments run:

Windows

set PYTEST_ADDOPTS=--cov-append
tox

Other

PYTEST_ADDOPTS=--cov-append tox

Changelog

0.7.1 (2021-09-13)

  • Fix output.

0.7.0 (2021-09-13)

  • Added option to generate a new output format, compliant with POSTDATA ontology.

  • Updated README.

0.6.0 (2021-01-28)

  • Option to show rhyme pattern.

  • Better documentation and README

  • Fixed rhyme issue when synalepha present on rhyming syllables

  • Add PoS to the output.

  • Added more rhyme patterns to stanzas rules, better handling of diphthongs with ‘h’.

  • Refactorization, typos fixed, and added more tests.

0.5.0 (2020-09-28)

Added support for the automatic detection of most Spanish stanzas:

  • Cantar

  • Chamberga

  • Copla arte mayor

  • Copla arte menor

  • Copla castellana

  • Copla mixta

  • Copla real

  • Couplet

  • Cuaderna vía

  • Cuarteta

  • Cuarteto

  • Cuarteto lira

  • Décima antigua

  • Endecha real

  • Espinela

  • Estrofa francisco de la torre

  • Estrofa manriqueña

  • Estrofa sáfica

  • Estrofa sáfica unamuno

  • Haiku

  • Lira

  • Novena

  • Octava

  • Octava real

  • Octavilla

  • Ovillejo

  • Quinteto

  • Quintilla

  • Redondilla

  • Romance

  • Romance arte mayor

  • Seguidilla

  • Seguidilla compuesta

  • Seguidilla gitana

  • Septeto

  • Septeto lira

  • Septilla

  • Serventesio

  • Sexta rima

  • Sexteto

  • Sexteto lira

  • Sextilla

  • Silva arromanzada

  • Soleá

  • Tercetillo

  • Terceto

  • Terceto encadenado

  • Terceto monorrimo

0.4.3 (2020-03-24)

  • Added support for filtering consecutive liaisons and syllabification

  • Added missing documentation

0.4.2 (2020-03-11)

  • Added documentation

0.4.1 (2019-12-19)

  • Added ‘AUX’ to the split_on list for spacy affixes

  • Fixed syllabification exceptions, support for disabling/enabling spacy_affixes

  • Fixed multiline break

  • Fixed split verb stresses and secondary stress on ‘-mente’ adverbs

  • Fixed some issues

  • Added minimum length for ‘-mente’ adverbs

0.4.0 (2019-11-21)

  • Added SpaCy Doc input support

  • Add umlaut hiatus

  • Added new hiatus and fixed init

  • Refactoring code

  • Feat/new syllabification

  • Naming conventions

  • Adding rhyme analysis to scansion output

  • Adding ‘singleton’ behaviour to load_pipeline

  • Metre analysis w/ sinaeresis and synalephas

  • Added new workflow for syllabification, with tests

  • Post syllabification rules regexes

  • Added unit tests for all functions

0.3.0 (2019-06-18)

  • Added SpaCy Doc input support

  • Add umlaut hiatus

  • Fixed syllabification errors, affixes and the pipeline

  • Fixed hyphenator for diphthongs with u umlaut

  • Added hyphenation for explicit hiatus with umlaut vowels

  • Added new hiatus and fixed __init__

0.2.0 (2019-06-14)

  • Better hyphenator, and affixes and pipeline fixes

0.1.2 (2019-06-10)

  • Republishing on PyPI

0.1.0 (2019-07-03)

  • Project name change.

0.0.1 (2019-02-21)

  • First release on PyPI.
