The state-of-the-art NLP toolkit for (modern) Greek

License
- OSI Approved :: Apache Software License
Natural Language
- Greek
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Text Processing :: Linguistic

Project description

`gr-nlp-toolkit`

gr-nlp-toolkit is a Python toolkit with state-of-the-art performance in (modern) Greek, supporting the following functionalities:

Named Entity Recognition (NER)
Part-of-Speech Tagging (POS Tagging)
Morphological tagging
Dependency parsing
Greeklish to Greek transliteration ("kalimera" -> "καλημερα")

Web Demo 🤗

Apart from the Python library (details below), you can also try gr-nlp-toolkit without writing any code in our web playground: https://huggingface.co/spaces/AUEB-NLP/greek-nlp-toolkit-demo

Thanks to HuggingFace 🤗 for the GPUs.

Installation

The toolkit supports Python 3.9+.

You can install it from PyPI by executing the following in the command line:

pip install gr-nlp-toolkit

Usage

Available Processors/Features

To use the toolkit, first initialize a Pipeline, specifying which task processors you need. Each processor adds the annotations for one task to the text.

For example:

To obtain Part-of-Speech and Morphological Tagging annotations, add the pos processor
To obtain Named Entity Recognition annotations, add the ner processor
To obtain Dependency Parsing annotations, add the dp processor
To enable the transliteration from Greeklish to Greek, add the g2g processor or the g2g_lite processor for a lighter but less accurate model (Greeklish to Greek transliteration example: "thessalonikh" -> "θεσσαλονίκη")
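
The Pipeline examples in this README take the processor names as a single comma-separated string. As a minimal sketch, a small helper can build that string from a list and catch misspelled names early. The helper `processor_spec` and the constant `KNOWN_PROCESSORS` are hypothetical and not part of gr-nlp-toolkit; the names come from the list above.

```python
# Processor names taken from the list above. The helper itself is
# hypothetical and is not provided by gr-nlp-toolkit.
KNOWN_PROCESSORS = ("pos", "ner", "dp", "g2g", "g2g_lite")


def processor_spec(names):
    """Join processor names into the comma-separated string passed to Pipeline."""
    unknown = [name for name in names if name not in KNOWN_PROCESSORS]
    if unknown:
        raise ValueError(f"unknown processor(s): {', '.join(unknown)}")
    # Drop duplicates while keeping the caller's order.
    unique = list(dict.fromkeys(names))
    return ",".join(unique)


spec = processor_spec(["pos", "ner", "dp"])
print(spec)  # pos,ner,dp
```

The resulting string can then be passed to the constructor, as in Pipeline(spec).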

Example Usage Scenarios

DP, POS, NER processors (input text in Greek)

from gr_nlp_toolkit import Pipeline

nlp = Pipeline("pos,ner,dp")  # Instantiate the Pipeline with the DP, POS and NER processors
doc = nlp("Η Ιταλία κέρδισε την Αγγλία στον τελικό του Euro 2020.") # Apply the pipeline to a sentence in Greek

The pipeline splits the original text into tokens and returns an annotated Document object.

# Iterate over the generated tokens
for token in doc.tokens:
  print(token.text) # the text of the token

  print(token.ner) # the named entity label in IOBES encoding : str

  print(token.upos) # the UPOS tag of the token
  print(token.feats) # the morphological features for the token

  print(token.head) # the head of the token
  print(token.deprel) # the dependency relation between the current token and its head

token.ner is set by the ner processor, token.upos and token.feats are set by the pos processor and token.head and token.deprel are set by the dp processor.
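
Since token.ner holds one IOBES label per token, multi-token entities have to be reassembled from consecutive labels. The sketch below shows one way to do that. It assumes labels of the form "B-ORG", "I-ORG", "E-ORG", "S-LOC" and "O"; the exact label strings produced by the toolkit may differ, and the labels in the example are written by hand for illustration, not taken from model output. The function `iobes_to_spans` is a hypothetical helper, not part of the library.

```python
def iobes_to_spans(words, labels):
    """Group IOBES labels into (entity_text, entity_type) pairs.

    Assumes labels look like "B-ORG", "I-ORG", "E-ORG", "S-LOC" or "O".
    """
    spans = []
    current_words, current_type = [], None
    for word, label in zip(words, labels):
        prefix, _, entity_type = label.partition("-")
        if prefix == "S":
            # Single-token entity.
            spans.append((word, entity_type))
            current_words, current_type = [], None
        elif prefix == "B":
            # Start of a multi-token entity.
            current_words, current_type = [word], entity_type
        elif prefix in ("I", "E") and current_type == entity_type:
            current_words.append(word)
            if prefix == "E":
                spans.append((" ".join(current_words), entity_type))
                current_words, current_type = [], None
        else:
            # "O" or an inconsistent tag: drop any unfinished entity.
            current_words, current_type = [], None
    return spans


# Hand-written labels for illustration only.
words = ["Η", "Ιταλία", "κέρδισε", "την", "Αγγλία", "στον", "τελικό", "του", "Euro", "2020", "."]
labels = ["O", "S-ORG", "O", "O", "S-ORG", "O", "O", "O", "B-EVENT", "E-EVENT", "O"]
print(iobes_to_spans(words, labels))
```

With a real Document, the two input lists can be built as [t.text for t in doc.tokens] and [t.ner for t in doc.tokens].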

Note that token.head is a 1-based index: token numbering starts at 1, and a value of 0 means that the token is the root of the sentence. To get the Token object that is the head of another token, access doc.tokens[head-1].
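
The head lookup can be sketched as follows. The `Token` dataclass here is a hypothetical stand-in that carries only the text and head fields, so the example runs without downloading any model; the head values are written by hand for illustration. The function `head_token` is likewise an assumption, not a library call.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Token:
    """Hypothetical stand-in with the two fields this sketch needs."""
    text: str
    head: int  # 1-based index of the head token, 0 for the root


def head_token(tokens: Sequence[Token], token: Token) -> Optional[Token]:
    """Return the head of `token`, or None when `token` is the root."""
    if token.head == 0:
        return None
    return tokens[token.head - 1]


# Hand-written heads for illustration, not toolkit output.
tokens = [Token("Η", 2), Token("Ιταλία", 3), Token("κέρδισε", 0)]
for tok in tokens:
    parent = head_token(tokens, tok)
    print(tok.text, "<-", parent.text if parent else "ROOT")
```

The explicit check for 0 matters in Python: without it, the root's lookup becomes tokens[-1], which silently returns the last token instead of raising an error.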

Greeklish to Greek Conversion (input text in Greeklish)

from gr_nlp_toolkit import Pipeline
nlp = Pipeline("g2g")  # Instantiate the pipeline with the g2g processor

doc = nlp("O Volos kai h Larisa einai sth Thessalia") # Apply the pipeline to a sentence in Greeklish
print(doc.text) # Access the transliterated text, which is "ο Βόλος και η Λάρισα είναι στη Θεσσαλία"

Use all the processors together (input text in Greeklish)

from gr_nlp_toolkit import Pipeline
nlp = Pipeline("pos,ner,dp,g2g")  # Instantiate the Pipeline with the G2G, DP, POS and NER processors

doc = nlp("O Volos kai h Larisa einai sthn Thessalia") # Apply the pipeline to a sentence in Greeklish

print(doc.text) # Print the transliterated text

# Iterate over the generated tokens
for token in doc.tokens:
  print(token.text) # the text of the token

  print(token.ner) # the named entity label in IOBES encoding : str

  print(token.upos) # the UPOS tag of the token
  print(token.feats) # the morphological features for the token

  print(token.head) # the head of the token
  print(token.deprel) # the dependency relation between the current token and its head

Paper

The software was presented as a paper at COLING 2025. Read the full technical report/paper here: https://aclanthology.org/2025.coling-demos.17/

If you use our toolkit, please cite it:

@inproceedings{loukas-etal-coling2025-greek-nlp-toolkit,
    title = "{GR}-{NLP}-{TOOLKIT}: An Open-Source {NLP} Toolkit for {M}odern {G}reek",
    author = "Loukas, Lefteris  and
      Smyrnioudis, Nikolaos  and
      Dikonomaki, Chrysa  and
      Barbakos, Spiros  and
      Toumazatos, Anastasios  and
      Koutsikakis, John  and
      Kyriakakis, Manolis  and
      Georgiou, Mary  and
      Vassos, Stavros  and
      Pavlopoulos, John  and
      Androutsopoulos, Ion",
    editor = "Rambow, Owen  and
      Wanner, Leo  and
      Apidianaki, Marianna  and
      Al-Khalifa, Hend  and
      Eugenio, Barbara Di  and
      Schockaert, Steven  and
      Mather, Brodie  and
      Dras, Mark",
    booktitle = "Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.coling-demos.17/",
    pages = "174--182",
}

Technical Notes

The first time you use a processor, its models are downloaded from Hugging Face and stored in the .cache folder. The NER, DP and POS processors are each about 500 MB, while the G2G processor is about 1.2 GB in size.
If the input text is already in Greek, the G2G (Greeklish-to-Greek) processor is skipped.
If your machine has an accelerator but you want to run the process on the CPU, you can pass the flag use_cpu=True to the Pipeline object. By default, use_cpu is set to False.
The weights of the Greeklish-to-Greek transliteration processor (ByT5) are available on Hugging Face: https://huggingface.co/AUEB-NLP/ByT5_g2g
The weights of the NER/POS/DP processors are available on Hugging Face: https://huggingface.co/AUEB-NLP/gr-nlp-toolkit
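
As a rough worked example of the sizes quoted above, the first-use download for a given processor string can be estimated by adding up the per-processor figures. This is only a sketch: it assumes the sizes simply add up and that nothing is already cached, and it leaves out g2g_lite because no size is given for it. The names `APPROX_SIZE_MB` and `estimated_download_mb` are hypothetical.

```python
# Approximate sizes in MB, taken from the note above. g2g_lite is omitted
# because the text gives no figure for it.
APPROX_SIZE_MB = {"pos": 500, "ner": 500, "dp": 500, "g2g": 1200}


def estimated_download_mb(spec):
    """Sum the approximate model sizes for a comma-separated processor string."""
    names = [name.strip() for name in spec.split(",") if name.strip()]
    return sum(APPROX_SIZE_MB[name] for name in names)


print(estimated_download_mb("pos,ner,dp,g2g"))  # 2700
```

Under these assumptions, the full "pos,ner,dp,g2g" pipeline needs roughly 2.7 GB of disk space and bandwidth on first use.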

References

The GR-NLP-TOOLKIT paper presented at COLING 2025 (see above) covers most of the methodology. Additional research details can be found in the following works:

C. Dikonimaki, "A Transformer-based natural language processing toolkit for Greek -- Part of speech tagging and dependency parsing", BSc thesis, Department of Informatics, Athens University of Economics and Business, 2021. http://nlp.cs.aueb.gr/theses/dikonimaki_bsc_thesis.pdf (POS/DP/Morphological tagging processor)
N. Smyrnioudis, "A Transformer-based natural language processing toolkit for Greek -- Named entity recognition and multi-task learning", BSc thesis, Department of Informatics, Athens University of Economics and Business, 2021. http://nlp.cs.aueb.gr/theses/smyrnioudis_bsc_thesis.pdf (NER processor)
A. Toumazatos, J. Pavlopoulos, I. Androutsopoulos, & S. Vassos, "Still All Greeklish to Me: Greeklish to Greek Transliteration." In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 15309–15319). https://aclanthology.org/2024.lrec-main.1330/ (Greeklish-to-Greek processor)

Release history

0.3.0 (this version): Apr 18, 2026
0.2.1: Jan 8, 2025
0.2.0: Jan 8, 2025
0.1.5: Aug 29, 2024
0.1.4: Aug 29, 2024
0.1.3: Aug 28, 2024
0.1.2: Aug 25, 2024
0.1.1: Aug 25, 2024
0.1.0: Aug 22, 2024
0.0.4: Aug 22, 2024
0.0.3: Jul 22, 2021
0.0.2: Jul 22, 2021
0.0.1: Jul 20, 2021

Download files

Source Distribution

gr_nlp_toolkit-0.3.0.tar.gz (34.7 kB)

Uploaded Apr 18, 2026 Source

Built Distribution

gr_nlp_toolkit-0.3.0-py3-none-any.whl (46.4 kB)

Uploaded Apr 18, 2026 Python 3

File details

Details for the file gr_nlp_toolkit-0.3.0.tar.gz.

File metadata

Download URL: gr_nlp_toolkit-0.3.0.tar.gz
Upload date: Apr 18, 2026
Size: 34.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.0

File hashes

Hashes for gr_nlp_toolkit-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`ac065afddc33ec447e413c8f990efad3d6abf571776b9fa00fa935aa68f01504`
MD5	`1176d241513c4f387a5469bb8bcdf6af`
BLAKE2b-256	`bf05a36b0aec9216db9854b2f1a0ac74d9b02495666f97bb79fa90f4463bf8bd`

File details

Details for the file gr_nlp_toolkit-0.3.0-py3-none-any.whl.

File metadata

Download URL: gr_nlp_toolkit-0.3.0-py3-none-any.whl
Upload date: Apr 18, 2026
Size: 46.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.0

File hashes

Hashes for gr_nlp_toolkit-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f250d5f8fd736e3250147d65af0083431736135cff8bbbe1fed3b7d2f8a44220`
MD5	`b425e13df5627dade8a2fbe08e27e69a`
BLAKE2b-256	`28fa05ae2abbd28b609fbbcf137ad816c2785bae1787e27d1ecf8ad0118c4473`
