Scansion tool for Spanish texts
Rantanplan is a Python library for the automated scansion of Spanish poetry. Scansion is the measurement of the rhythm of the verses of a poem, and our tool achieves state-of-the-art results for mixed-metre poems. It is also able to identify up to 45 different types of the most significant Spanish stanzas. Rantanplan is fast and accurate, as it is built on spaCy and spaCy-affixes.
Free software: Apache Software License 2.0
Installation
pip install rantanplan
Install required resources
Install the spaCy language model for Spanish:
python -m spacy download es_core_news_md
Install Freeling rules for affixes:
python -m spacy_affixes download es
Usage
Import Rantanplan
To use Rantanplan in a project:
import rantanplan
Usage example
from rantanplan.core import get_scansion
poem = """Me gustas cuando callas porque estás como ausente,
y me oyes desde lejos, y mi voz no te toca.
Parece que los ojos se te hubieran volado
y parece que un beso te cerrara la boca.
Como todas las cosas están llenas de mi alma
emerges de las cosas, llena del alma mía.
Mariposa de sueño, te pareces a mi alma,
y te pareces a la palabra melancolía."""
get_scansion(poem)
Output example
The output of Rantanplan is a complex structure, so it is broken down here for clarity.
First, Rantanplan shows a list of stanzas. Each stanza is then shown as two separate lists: a list of tokens, and a list of “phonological groups”, i.e., the phonological units that form a verse after synalephas and sinaeresis are taken into account.
Tokens
If the token is a word, it shows a list of the syllables it is made of, with the following information:
syllable: The text of the syllable.
is_stressed: Whether the syllable is stressed or not.
is_word_end: Whether the syllable is the end of a word or not.
has_synalepha or has_sinaeresis: Whether or not the syllable can be conjoined with the next one.
stress_position: Index, starting from 0, for the stressed syllable of the word. If the index is negative, the syllable position is counted from the end of the word:
0: First syllable
-1: Last syllable
-2: Penultimate syllable
etc
If the token is not a word, it is shown as a symbol.
List of tokens example
{'tokens': [{'word': [{'syllable': 'co', 'is_stressed': True},
{'syllable': 'mo',
'is_stressed': False,
'has_synalepha': True,
'is_word_end': True}],
'stress_position': 0},
{'word': [{'syllable': 'au', 'is_stressed': False}
...
{'symbol': ','}],
...
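As a small illustration of how stress_position relates to the list of syllables, the sketch below indexes a word token with it. The token dictionaries are copied from the full output example in this README; stressed_syllable is a hypothetical helper, not part of Rantanplan. In the sample output, words without any stressed syllable also report a stress_position of 0, so the helper checks is_stressed before returning anything.

```python
def stressed_syllable(token):
    """Return the text of the stressed syllable of a word token, or None.

    Hypothetical helper for illustration only; it is not a Rantanplan function.
    """
    syllables = token.get("word")
    if not syllables:
        # Symbols such as {'symbol': ','} carry no syllables.
        return None
    # Python's negative indexing matches the convention used by stress_position.
    candidate = syllables[token["stress_position"]]
    return candidate["syllable"] if candidate["is_stressed"] else None


# Tokens copied from the full output example.
gustas = {"word": [{"syllable": "gus", "is_stressed": True},
                   {"syllable": "tas", "is_stressed": False, "is_word_end": True}],
          "stress_position": -2}
estas = {"word": [{"syllable": "es", "is_stressed": False},
                  {"syllable": "tás", "is_stressed": True, "is_word_end": True}],
         "stress_position": -1}
cuando = {"word": [{"syllable": "cuan", "is_stressed": False},
                   {"syllable": "do", "is_stressed": False, "is_word_end": True}],
          "stress_position": 0}
comma = {"symbol": ","}

for token in (gustas, estas, cuando, comma):
    print(stressed_syllable(token))
```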
Phonological groups
The next element of the output is a list of phonological groups. We use this term to refer to the phonological units that make up a poem when it is read, after synalephas and sinaeresis are taken into account.
Phonological groups are quite similar to the token list, but they have no word boundaries because these are lost when synalephas are applied. Each syllable within phonological_groups can carry the following information:
syllable: The text of the syllable.
is_stressed: Whether the syllable is stressed or not.
is_word_end: Whether the syllable is the end of a word or not.
synalepha_index or sinaeresis_index: The index of the character where the syllable is conjoined with the next one:
0: No synalepha or sinaeresis has been realised.
Any other number: List of indexes on the syllable, starting from 0, where the original syllable or syllables have been conjoined with the next one:
Example: The syllable moau was originally split at position 1:
{'syllable': 'moau', 'is_stressed': False, 'synalepha_index': [1]}
Indexes of the syllable:
m o a u
0 1 2 3
We split after position 1 (the o), which tells us that the original syllables are mo and au.
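The splitting rule described above can be sketched in a few lines of Python. This is a minimal illustration that assumes each index marks the last character of an original syllable, which is consistent with the moau and quees examples in this README; split_conjoined is a hypothetical name and not part of the Rantanplan API.

```python
def split_conjoined(syllable, indexes):
    """Recover the original syllables of a conjoined phonological group.

    Assumption: every index points at the last character of an original
    syllable, so the cut is made right after it.
    """
    parts = []
    start = 0
    for index in indexes:
        parts.append(syllable[start:index + 1])
        start = index + 1
    parts.append(syllable[start:])  # whatever remains after the last cut
    return parts


print(split_conjoined("moau", [1]))   # ['mo', 'au']
print(split_conjoined("quees", [2]))  # ['que', 'es']
```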
Phonological groups example
{'phonological_groups': [{'syllable': 'Me',
'is_stressed': False,
'is_word_end': True},
{'syllable': 'gus', 'is_stressed': True},
{'syllable': 'tas', 'is_stressed': False, 'is_word_end': True},
{'syllable': 'cuan', 'is_stressed': False},
{'syllable': 'do', 'is_stressed': False, 'is_word_end': True},
{'syllable': 'ca', 'is_stressed': True},
{'syllable': 'llas', 'is_stressed': False, 'is_word_end': True},
{'syllable': 'por', 'is_stressed': False},
{'syllable': 'quees', 'is_stressed': False, 'synalepha_index': [2]},
{'syllable': 'tás', 'is_stressed': True, 'is_word_end': True},
{'syllable': 'co', 'is_stressed': False},
{'syllable': 'moau', 'is_stressed': False, 'synalepha_index': [1]},
{'syllable': 'sen', 'is_stressed': True},
{'syllable': 'te', 'is_stressed': False, 'is_word_end': True}],
Metrical information
Finally, at the verse level we find information about the verse itself under the rhythm key:
stress: Pattern of unstressed (-) and stressed (+) syllables. This output can be changed with the parameter rhythm_format. You can find more information about how this parameter works in the documentation.
length: Proposed length for the verse.
length_range: Minimum and maximum verse length possible. This is calculated taking into account all possible sinaeresis and synalephas.
Metrical information example
'rhythm': {'stress': '---+----+----+-',
'length': 14,
'length_range': {'min_length': 13, 'max_length': 16}},
...
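To make the relation between phonological groups and the stress pattern concrete, the following sketch rebuilds a pattern string from the is_stressed flags and lists the stressed positions. The groups are copied from the phonological groups example in this README, and the result matches the stress value shown in the full output example. Both function names are hypothetical, and this is not necessarily how Rantanplan computes the pattern internally.

```python
def stress_pattern(groups):
    """Build a '+'/'-' string from the is_stressed flag of each group."""
    return "".join("+" if group["is_stressed"] else "-" for group in groups)


def stressed_positions(pattern):
    """Return the 1-based positions of the stressed syllables."""
    return [i for i, mark in enumerate(pattern, start=1) if mark == "+"]


# Phonological groups of "Me gustas cuando callas porque estás como ausente,"
groups = [
    {"syllable": "Me", "is_stressed": False, "is_word_end": True},
    {"syllable": "gus", "is_stressed": True},
    {"syllable": "tas", "is_stressed": False, "is_word_end": True},
    {"syllable": "cuan", "is_stressed": False},
    {"syllable": "do", "is_stressed": False, "is_word_end": True},
    {"syllable": "ca", "is_stressed": True},
    {"syllable": "llas", "is_stressed": False, "is_word_end": True},
    {"syllable": "por", "is_stressed": False},
    {"syllable": "quees", "is_stressed": False, "synalepha_index": [2]},
    {"syllable": "tás", "is_stressed": True, "is_word_end": True},
    {"syllable": "co", "is_stressed": False},
    {"syllable": "moau", "is_stressed": False, "synalepha_index": [1]},
    {"syllable": "sen", "is_stressed": True},
    {"syllable": "te", "is_stressed": False, "is_word_end": True},
]

pattern = stress_pattern(groups)
print(pattern)                      # -+---+---+--+-
print(stressed_positions(pattern))  # [2, 6, 10, 13]
```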
Stanza detection
Rantanplan is also able to detect the stanza type from a list of popular Spanish stanzas. The complete list is:
Cantar
Chamberga
Copla arte mayor
Copla arte menor
Copla castellana
Copla mixta
Copla real
Couplet
Cuaderna vía
Cuarteta
Cuarteto
Cuarteto lira
Décima antigua
Endecha real
Espinela
Estrofa francisco de la torre
Estrofa manriqueña
Estrofa sáfica
Estrofa sáfica unamuno
Haiku
Lira
Novena
Octava
Octava real
Octavilla
Ovillejo
Quinteto
Quintilla
Redondilla
Romance
Romance arte mayor
Seguidilla
Seguidilla compuesta
Seguidilla gitana
Septeto
Septeto lira
Septilla
Serventesio
Sexta rima
Sexteto
Sexteto lira
Sextilla
Silva arromanzada
Soleá
Sonnet
Tercetillo
Terceto
Terceto encadenado
Terceto monorrimo
When this option is enabled with the rhyme_analysis parameter, additional information about the stanza is shown in the output.
If we take this “cuarteto” for example:
Yo persigo una forma que no encuentra mi estilo,
botón de pensamiento que busca ser la rosa;
se anuncia con un beso que en mis labios se posa
al abrazo imposible de la Venus de Milo
If we call get_scansion with the rhyme_analysis parameter set to True, the following information is added to the analysis of each line:
structure: The name of the stanza that has been detected
rhyme: A letter code to match rhyming verses. In this example, verse 1 rhymes with verse 4, and verse 2 rhymes with verse 3, and a letter is assigned to verses that rhyme together as shown below:
Yo persigo una forma que no encuentra mi estilo,    a
botón de pensamiento que busca ser la rosa;         b
se anuncia con un beso que en mis labios se posa    b
al abrazo imposible de la Venus de Milo             a
ending: What part of the last word is rhyming.
ending_stress: Negative index (-1 for last, -2 for penultimate, etc.) for the vowel that carries the stress of the rhyming part.
rhyme_type: Whether the rhyme is consonant or assonant:
Consonant: All characters from the last stressed vowel to the end of the word coincide on verses that rhyme. For example:
estILO mILO
Assonant: Same as consonant rhyme, but only the vowels need to match:
amAdO cachArrO
rhyme_relaxation: Whether or not rules for rhyme relaxation are applied. Examples are removing weak vowels in diphthongs, or making letters match when they are pronounced the same, such as c and z.
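The difference between consonant and assonant rhyme can be illustrated with a deliberately simplified sketch. It assumes the two endings have already been cut at the last stressed vowel (as in the ending field) and it ignores rhyme relaxation entirely; classify_rhyme is a hypothetical function, not the rule set Rantanplan applies.

```python
VOWELS = set("aeiouáéíóúü")


def classify_rhyme(ending_a, ending_b):
    """Classify two endings as 'consonant', 'assonant' or None (no rhyme).

    Simplified illustration: no rhyme relaxation, no diphthong handling.
    """
    a, b = ending_a.lower(), ending_b.lower()
    if a == b:
        # Every character from the stressed vowel onwards coincides.
        return "consonant"
    if [c for c in a if c in VOWELS] == [c for c in b if c in VOWELS]:
        # Only the vowels coincide.
        return "assonant"
    return None


print(classify_rhyme("ILO", "ILO"))   # estILO / mILO      -> consonant
print(classify_rhyme("AdO", "ArrO"))  # amAdO / cachArrO   -> assonant
print(classify_rhyme("ilo", "osa"))   # estilo / rosa      -> None
```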
Stanza detection example
'structure': 'cuarteto',
'rhyme': 'a',
'ending': 'ilo',
'ending_stress': -3,
'rhyme_type': 'consonant',
'rhyme_relaxation': True},
...
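Once the rhyme letters are available, the rhyme scheme of a stanza can be read off by joining them. The sketch below works on hand-written dictionaries that mimic only the rhyme and structure fields described above, using the letters given for the cuarteto example. It assumes these keys sit at the top level of each line's analysis, and rhyme_scheme is a hypothetical helper, not a Rantanplan function.

```python
def rhyme_scheme(lines):
    """Join the rhyme letter of every analysed line into a scheme string."""
    return "".join(line["rhyme"] for line in lines)


# Hand-written stand-ins for the per-line analysis of the cuarteto example.
# Only the fields needed here are included; real output carries many more.
cuarteto_lines = [
    {"structure": "cuarteto", "rhyme": "a"},  # ... mi estilo,
    {"structure": "cuarteto", "rhyme": "b"},  # ... ser la rosa;
    {"structure": "cuarteto", "rhyme": "b"},  # ... se posa
    {"structure": "cuarteto", "rhyme": "a"},  # ... Venus de Milo
]

print(cuarteto_lines[0]["structure"], rhyme_scheme(cuarteto_lines))
```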
Full output example
A complete example of Rantanplan output is shown here:
[{'tokens': [{'word': [{'syllable': 'Me',
'is_stressed': False,
'is_word_end': True}],
'stress_position': 0},
{'word': [{'syllable': 'gus', 'is_stressed': True},
{'syllable': 'tas', 'is_stressed': False, 'is_word_end': True}],
'stress_position': -2},
{'word': [{'syllable': 'cuan', 'is_stressed': False},
{'syllable': 'do', 'is_stressed': False, 'is_word_end': True}],
'stress_position': 0},
{'word': [{'syllable': 'ca', 'is_stressed': True},
{'syllable': 'llas', 'is_stressed': False, 'is_word_end': True}],
'stress_position': -2},
{'word': [{'syllable': 'por', 'is_stressed': False},
{'syllable': 'que',
'is_stressed': False,
'has_synalepha': True,
'is_word_end': True}],
'stress_position': 0},
{'word': [{'syllable': 'es', 'is_stressed': False},
{'syllable': 'tás', 'is_stressed': True, 'is_word_end': True}],
'stress_position': -1},
{'word': [{'syllable': 'co', 'is_stressed': False},
{'syllable': 'mo',
'is_stressed': False,
'has_synalepha': True,
'is_word_end': True}],
'stress_position': 0},
{'word': [{'syllable': 'au', 'is_stressed': False},
{'syllable': 'sen', 'is_stressed': True},
{'syllable': 'te', 'is_stressed': False, 'is_word_end': True}],
'stress_position': -2},
{'symbol': ','}],
'phonological_groups': [{'syllable': 'Me',
'is_stressed': False,
'is_word_end': True},
{'syllable': 'gus', 'is_stressed': True},
{'syllable': 'tas', 'is_stressed': False, 'is_word_end': True},
{'syllable': 'cuan', 'is_stressed': False},
{'syllable': 'do', 'is_stressed': False, 'is_word_end': True},
{'syllable': 'ca', 'is_stressed': True},
{'syllable': 'llas', 'is_stressed': False, 'is_word_end': True},
{'syllable': 'por', 'is_stressed': False},
{'syllable': 'quees', 'is_stressed': False, 'synalepha_index': [2]},
{'syllable': 'tás', 'is_stressed': True, 'is_word_end': True},
{'syllable': 'co', 'is_stressed': False},
{'syllable': 'moau', 'is_stressed': False, 'synalepha_index': [1]},
{'syllable': 'sen', 'is_stressed': True},
{'syllable': 'te', 'is_stressed': False, 'is_word_end': True}],
'rhythm': {'stress': '-+---+---+--+-', 'type': 'pattern', 'length': 14}},
...
Documentation
Development
To run all the tests, run:
tox
Note: to combine the coverage data from all the tox environments, run:
Platform | Commands |
---|---|
Windows | set PYTEST_ADDOPTS=--cov-append, then tox |
Other | PYTEST_ADDOPTS=--cov-append tox |
Changelog
0.7.1 (2021-09-13)
Fix output.
0.7.0 (2021-09-13)
Added option to generate a new output format, compliant with POSTDATA ontology.
Updated README.
0.6.0 (2021-01-28)
Option to show rhyme pattern.
Better documentation and README
Fixed rhyme issue when synalepha present on rhyming syllables
Add PoS to the output.
Added more rhyme patterns to stanzas rules, better handling of diphthongs with ‘h’.
Refactorization, typos fixed, and added more tests.
0.5.0 (2020-09-28)
Added support for the automatic detection of most Spanish stanzas:
Cantar
Chamberga
Copla arte mayor
Copla arte menor
Copla castellana
Copla mixta
Copla real
Couplet
Cuaderna vía
Cuarteta
Cuarteto
Cuarteto lira
Décima antigua
Endecha real
Espinela
Estrofa francisco de la torre
Estrofa manriqueña
Estrofa sáfica
Estrofa sáfica unamuno
Haiku
Lira
Novena
Octava
Octava real
Octavilla
Ovillejo
Quinteto
Quintilla
Redondilla
Romance
Romance arte mayor
Seguidilla
Seguidilla compuesta
Seguidilla gitana
Septeto
Septeto lira
Septilla
Serventesio
Sexta rima
Sexteto
Sexteto lira
Sextilla
Silva arromanzada
Soleá
Tercetillo
Terceto
Terceto encadenado
Terceto monorrimo
0.4.3 (2020-03-24)
Added support for filtering consecutive liaisons and syllabification
Added missing documentation
0.4.2 (2020-03-11)
Added documentation
0.4.1 (2019-12-19)
Added ‘AUX’ to the split_on list for spacy affixes
Fixed syllabification exceptions, support for disabling/enabling spacy_affixes
Fixed multiline break
Fixed split verb stresses and secondary stress on ‘-mente’ adverbs
Fixed some issues
Added minimum length for ‘-mente’ adverbs
0.4.0 (2019-11-21)
Added SpaCy Doc input support
Add umlaut hiatus
Added new hiatus and fixed init
Refactoring code
Feat/new syllabification
Naming conventions
Adding rhyme analysis to scansion output
Adding ‘singleton’ behaviour to load_pipeline
Metre analysis w/ sinaeresis and synalephas
Added new workflow for syllabification, with tests
Post syllabification rules regexes
Added unit tests for all functions
0.3.0 (2019-06-18)
Added SpaCy Doc input support
Add umlaut hiatus
Fixed syllabification errors, affixes and the pipeline
Fixed hyphenator for diphthongs with u umlaut
Added hyphenation for explicit hiatus with umlaut vowels
Added new hiatus and fixed __init__
0.2.0 (2019-06-14)
Better hyphenator, and affixes and pipeline fixes
0.1.2 (2019-06-10)
Republishing on PyPI
0.1.0 (2019-07-03)
Project name change.
0.0.1 (2019-02-21)
First release on PyPI.