Italian ATS Evaluator
italian-ats-evaluator
This is an open-source project for evaluating the performance of an Italian ATS (Automatic Text Simplifier) on a set of texts.
You can analyze a single text and extract the following features:
- Overall:
  - Number of tokens
  - Number of tokens (including punctuation)
  - Number of characters
  - Number of characters (including punctuation)
  - Number of words
  - Number of syllables
  - Number of unique lemmas
  - Number of sentences
- Part of Speech (POS) distribution
- Verbs distribution
  - Active Verbs
  - Passive Verbs
  - Reflexive Verbs
- Lexicon:
  - Italian Basic Vocabulary (NVdB), from Il Nuovo vocabolario di base della lingua italiana, Tullio De Mauro
    - All
    - FO (Fundamentals)
    - AU (High Usage)
    - AD (High Availability)
  - Difficult connectives
  - Juridical expressions
  - Latinisms
- Readability:
  - Type-Token Ratio (TTR)
  - Gulpease Index
  - Flesch-Vacca Index
  - Lexical Density
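To give an idea of what two of the readability features measure, here is a minimal sketch that computes the Type-Token Ratio and the Gulpease Index with the standard library only. It is not the package's implementation: the function name `simple_readability` and the regex-based tokenization are assumptions made for illustration, whereas the package relies on a spaCy model for tokenization. The Gulpease formula used is the commonly cited one, `89 + (300 * sentences - 10 * letters) / words`.

```python
import re


def simple_readability(text: str) -> dict:
    """Illustrative TTR and Gulpease computation (hypothetical helper, not the package API)."""
    # Naive tokenization: a word is a run of letters or digits (accented letters included).
    words = re.findall(r"[^\W_]+", text)
    if not words:
        raise ValueError("text contains no words")
    # Naive sentence split on ., ! and ?; empty fragments are discarded.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    letters = sum(len(w) for w in words)
    # Type-Token Ratio: distinct lowercase word forms divided by total words.
    ttr = len({w.lower() for w in words}) / len(words)
    # Gulpease Index, unclamped: very short texts can score above 100.
    gulpease = 89 + (300 * len(sentences) - 10 * letters) / len(words)
    return {"words": len(words), "sentences": len(sentences), "ttr": ttr, "gulpease": gulpease}


print(simple_readability("Il gatto mangia il topo."))
```

Because the tokenization here is much cruder than a spaCy pipeline, the values will not necessarily match those reported by the package.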
You can also compare two texts and get the following metrics:
- Semantic:
  - Semantic Similarity
- Character diff:
  - Edit Distance
- Token diff:
  - Number of tokens added
  - Number of tokens removed
  - Number of NVdB tokens added
  - Number of NVdB tokens removed
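The character and token diff metrics can be illustrated with a small standard-library sketch. This is an assumption about how such metrics are typically computed (Levenshtein distance and a multiset difference of lowercase tokens), not the package's own code; the names `edit_distance` and `token_diff` are hypothetical.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # Deletion, insertion, substitution (or match when cost is 0).
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def token_diff(reference_text: str, simplified_text: str) -> dict:
    """Tokens added and removed, using whitespace tokenization (an assumption)."""
    ref = Counter(reference_text.lower().split())
    simp = Counter(simplified_text.lower().split())
    return {
        "added": sorted((simp - ref).elements()),
        "removed": sorted((ref - simp).elements()),
    }


print(token_diff("Il felino mangia il roditore", "Il gatto mangia il topo"))
# {'added': ['gatto', 'topo'], 'removed': ['felino', 'roditore']}
print(edit_distance("gatto", "gatti"))
# 1
```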
Installation
pip install italian-ats-evaluator
Usage
Create the TextAnalyzer and SimplificationAnalyzer objects with the desired models.
from italian_ats_evaluator import TextAnalyzer
from italian_ats_evaluator import SimplificationAnalyzer
text_analyzer = TextAnalyzer(
    spacy_model_name="it_core_news_lg"
)
simplification_analyzer = SimplificationAnalyzer(
    spacy_model_name="it_core_news_lg",
    sentence_transformers_model_name="intfloat/multilingual-e5-base"
)
Call the analyze method on the TextAnalyzer object to evaluate the features of a text.
text_evaluation = text_analyzer.analyze("Il gatto mangia il topo.")
print(text_evaluation)
Call the analyze method on the SimplificationAnalyzer object to compare a reference text with its simplified version.
simplification_evaluation = simplification_analyzer.analyze(
    reference_text="Il felino mangia il roditore",
    simplified_text="Il gatto mangia il topo"
)
print(simplification_evaluation)
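Since the goal is to evaluate a simplifier on a set of texts, a small helper can run the comparison over many pairs. The sketch below is only a convenience wrapper: `evaluate_pairs` is a hypothetical name, and the only thing it assumes about the analyzer is the `analyze(reference_text=..., simplified_text=...)` call shown above.

```python
from typing import Any, Iterable, List, Tuple


def evaluate_pairs(analyzer: Any, pairs: Iterable[Tuple[str, str]]) -> List[Any]:
    """Call analyzer.analyze on each (reference, simplified) pair and collect the results."""
    results = []
    for reference_text, simplified_text in pairs:
        results.append(
            analyzer.analyze(
                reference_text=reference_text,
                simplified_text=simplified_text,
            )
        )
    return results


# Example usage, assuming simplification_analyzer was created as shown earlier:
# pairs = [("Il felino mangia il roditore", "Il gatto mangia il topo")]
# evaluations = evaluate_pairs(simplification_analyzer, pairs)
```

The helper returns whatever the analyzer returns for each pair, so it makes no assumption about the structure of the evaluation objects.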
Development
Create a virtual environment
python3 -m venv venv
source venv/bin/activate
Install the package in editable mode
pip install -e .
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Acknowledgements
This contribution is a result of the research conducted within the framework of the PRIN 2020 (Progetti di Rilevante Interesse Nazionale) “VerbACxSS: on analytic verbs, complexity, synthetic verbs, and simplification. For accessibility” (Prot. 2020BJKB9M), funded by the Italian Ministero dell’Università e della Ricerca.
License