Italian ATS Evaluator

italian-ats-evaluator

This is an open-source project for evaluating the performance of an Italian ATS (Automatic Text Simplifier) on a set of texts.

You can analyze a single text and extract the following features:

  • Overall:
    • Number of tokens
    • Number of tokens (including punctuation)
    • Number of characters
    • Number of characters (including punctuation)
    • Number of words
    • Number of syllables
    • Number of unique lemmas
    • Number of sentences
  • Readability:
    • Type-Token Ratio (TTR)
    • Gulpease Index
    • Flesch-Vacca Index
    • Lexical Density
  • Part of Speech (POS) distribution
  • Verbs distribution
    • Active Verbs
    • Passive Verbs
  • Italian Basic Vocabulary (NVdB), from Tullio De Mauro's Il Nuovo vocabolario di base della lingua italiana
    • All
    • FO (Fundamentals)
    • AU (High Usage)
    • AD (High Availability)
  • Expression:
    • Difficult connectives
    • Latinisms
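
To make two of the readability features above concrete, here is a minimal sketch of the Gulpease Index and the Type-Token Ratio computed from raw counts. It uses the standard published Gulpease formula; the function names are hypothetical, and the package itself may tokenize, count, or normalize differently (for example by working on lemmas or clamping the result).

```python
def gulpease_index(n_letters: int, n_words: int, n_sentences: int) -> float:
    """Standard Gulpease formula: 89 + (300 * sentences - 10 * letters) / words.

    Illustrative helper, not the package's own implementation.
    """
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return 89 + (300 * n_sentences - 10 * n_letters) / n_words


def type_token_ratio(tokens: list[str]) -> float:
    """Ratio of distinct (case-folded) tokens to total tokens.

    Assumption: types are lowercased surface forms, not lemmas.
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    lowered = [token.lower() for token in tokens]
    return len(set(lowered)) / len(lowered)


if __name__ == "__main__":
    words = "Il gatto mangia il topo".split()
    letters = sum(len(word) for word in words)
    print(gulpease_index(letters, len(words), 1))
    print(type_token_ratio(words))
```

Higher Gulpease values indicate easier text. Very short sentences can push the raw formula above 100, which is why this sketch does not clamp the result.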

You can also compare two texts and get the following metrics:

  • Semantic:
    • Semantic Similarity
  • Character diff:
    • Edit Distance
  • Token diff:
    • Number of tokens added
    • Number of tokens removed
    • Number of VdB tokens removed
    • Number of VdB tokens added
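
The character and token diff metrics above can be sketched as follows. This is only an illustration under stated assumptions: edit distance is taken to be the Levenshtein distance, and tokens are compared as multisets of whitespace-separated strings. The function names are hypothetical and the package may compute these metrics differently, for example on spaCy tokens or lemmas.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def token_diff(reference_tokens: list[str], simplified_tokens: list[str]) -> tuple[int, int]:
    """Return (tokens added, tokens removed), comparing the texts as multisets."""
    reference = Counter(reference_tokens)
    simplified = Counter(simplified_tokens)
    added = sum((simplified - reference).values())
    removed = sum((reference - simplified).values())
    return added, removed


if __name__ == "__main__":
    reference = "Il felino mangia il roditore"
    simplified = "Il gatto mangia il topo"
    print(edit_distance(reference, simplified))
    print(token_diff(reference.split(), simplified.split()))
```

Restricting the same multiset comparison to tokens that belong to a basic-vocabulary word list would give the VdB variants of the two counts.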

Installation

pip install italian-ats-evaluator

Usage

from italian_ats_evaluator import TextAnalyzer

result = TextAnalyzer(
  text="Il gatto mangia il topo",
  spacy_model_name="it_core_news_lg"
)

from italian_ats_evaluator import SimplificationAnalyzer

result = SimplificationAnalyzer(
  reference_text="Il felino mangia il roditore",
  simplified_text="Il gatto mangia il topo",
  spacy_model_name="it_core_news_lg",
  sentence_transformers_model_name="intfloat/multilingual-e5-base"
)
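
Both examples pass a spaCy model name. spaCy models are distributed as ordinary Python packages, so a missing model is a likely cause of a failure at this step. The sketch below checks whether the model package is importable and, if not, returns the standard spaCy download command. The helper name is hypothetical, and whether the package downloads models on its own is not stated here, so treat this as an optional pre-flight check.

```python
import importlib.util
from typing import Optional


def missing_model_hint(model_name: str) -> Optional[str]:
    """Return the spaCy download command if the model package is not installed.

    Returns None when a package with that name can be found.
    """
    if importlib.util.find_spec(model_name) is not None:
        return None
    return f"python -m spacy download {model_name}"


if __name__ == "__main__":
    hint = missing_model_hint("it_core_news_lg")
    if hint is not None:
        print(f"Model not found. Install it with: {hint}")
```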

Development

Create a virtual environment

python3 -m venv venv
source venv/bin/activate

Install the package in editable mode

pip install -e .

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Acknowledgements

This contribution is a result of the research conducted within the framework of the PRIN 2020 (Progetti di Rilevante Interesse Nazionale) “VerbACxSS: on analytic verbs, complexity, synthetic verbs, and simplification. For accessibility” (Prot. 2020BJKB9M), funded by the Italian Ministero dell’Università e della Ricerca.

License

MIT
