
Natural language processing for Icelandic


GreynirSeq

GreynirSeq is a natural language parsing toolkit for Icelandic, focused on sequence modeling with neural networks. It is in the early stages of active development.

The modeling part (nicenlp) of GreynirSeq is built on top of the excellent fairseq from Facebook, which is itself built on top of PyTorch.

GreynirSeq is licensed under the GNU Affero General Public License v3 (AGPLv3) unless otherwise stated at the top of a file.

What's new?

  • This repository!
  • An Icelandic RoBERTa model, IceBERT, fine-tuned for NER and POS tagging.

What's on the horizon?

  • More fine-tuning tasks for Icelandic, such as constituency parsing and grammatical error detection
  • An Icelandic-English translation example

Be aware that using the CLI, or otherwise fetching the model files, downloads gigabytes of data.

Features

TL;DR give me the CLI

The greynirseq CLI can be used to run state-of-the-art POS and NER tagging for Icelandic. Run pip install greynirseq && greynirseq -h to see which options are available.

POS

❯ pip install greynirseq
❯ echo "Systurnar Guðrún og Monique átu einar um jólin á McDonalds ." | greynirseq pos --input -

nvfng nven-s c ns sfg3fþ lvfnsf aff nhfog aff ns pl
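
The tag line is easier to read when each tag sits next to its token. The following is a minimal Python sketch, not part of GreynirSeq, that pairs the two. It assumes the CLI prints exactly one tag per whitespace-separated input token, in order, which is what the example above suggests; the helper name pair_tokens_with_tags is made up for illustration.

```python
# Illustrative helper, not part of GreynirSeq.
# Assumption: one output tag per whitespace-separated input token, in order.

sentence = "Systurnar Guðrún og Monique átu einar um jólin á McDonalds ."
pos_output = "nvfng nven-s c ns sfg3fþ lvfnsf aff nhfog aff ns pl"


def pair_tokens_with_tags(sentence, tag_line):
    """Return a list of (token, tag) tuples for one sentence."""
    tokens = sentence.split()
    tags = tag_line.split()
    if len(tokens) != len(tags):
        # A mismatch means the one-tag-per-token assumption does not hold.
        raise ValueError(f"{len(tokens)} tokens but {len(tags)} tags")
    return list(zip(tokens, tags))


pairs = pair_tokens_with_tags(sentence, pos_output)
for token, tag in pairs:
    print(f"{token}\t{tag}")
```

The length check is deliberate: if the input is not pre-tokenized the way the tagger expects, silently zipping the two lists would misalign every tag after the first mismatch.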

NER

❯ pip install greynirseq
❯ echo "Systurnar Guðrún og Monique átu einar um jólin á McDonalds ." | greynirseq ner --input -

O B-Person O B-Person O O O O O B-Organization O
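
The labels above follow a B-/O pattern, so entity mentions can be recovered by grouping labelled tokens. The sketch below is an illustration in Python, not GreynirSeq code. It assumes the conventional BIO scheme, in which a multi-token entity continues with I- labels; the example output only shows B- and O, so the I- handling is an assumption. The function name extract_entities is hypothetical.

```python
# Illustrative helper, not part of GreynirSeq.
# Assumption: labels follow the conventional BIO scheme (B-X, I-X, O).

tokens = "Systurnar Guðrún og Monique átu einar um jólin á McDonalds .".split()
labels = "O B-Person O B-Person O O O O O B-Organization O".split()


def extract_entities(tokens, labels):
    """Group BIO-labelled tokens into (entity_text, entity_type) tuples."""
    entities = []
    current_tokens, current_type = [], None

    def flush():
        # Store the entity collected so far, if there is one.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            flush()
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_tokens and label[2:] == current_type:
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- label ends the current entity.
            flush()
            current_tokens, current_type = [], None
    flush()
    return entities


entities = extract_entities(tokens, labels)
print(entities)
```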

Neural Icelandic Language Processing - NIceNLP

IceBERT is an Icelandic BERT-based (RoBERTa) language model that is suitable for fine-tuning on downstream tasks.

The following fine-tuning tasks are available both through the greynirseq CLI and for loading programmatically.

  1. POS tagging
  2. NER tagging
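
This page does not show the programmatic loading API, so the sketch below does not guess at it. Instead it shows one way to reach both tasks from Python by calling the documented CLI through subprocess. The only GreynirSeq-specific parts are the pos and ner subcommands and the --input - flag taken from the CLI examples; the function names build_command and tag_sentence are hypothetical, and the sketch assumes that standard output contains only the whitespace-separated labels.

```python
# Illustrative wrapper around the documented CLI, not a GreynirSeq API.
# Assumptions: greynirseq is installed and on PATH, and stdout carries
# only the whitespace-separated labels for the input sentence.
import subprocess

SUPPORTED_TASKS = ("pos", "ner")


def build_command(task):
    """Return the argument list for a greynirseq call reading from stdin."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"unsupported task: {task!r}")
    return ["greynirseq", task, "--input", "-"]


def tag_sentence(sentence, task):
    """Pipe one pre-tokenized sentence through the CLI and return its labels."""
    result = subprocess.run(
        build_command(task),
        input=sentence + "\n",
        capture_output=True,
        text=True,
        encoding="utf-8",  # Icelandic characters must survive the pipe
        check=True,        # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.split()
```

Calling tag_sentence("Systurnar Guðrún og Monique átu einar um jólin á McDonalds .", "ner") would then be expected to return the labels shown in the NER example, provided the assumptions hold. Each call starts a new process and reloads the model, and the first call triggers the multi-gigabyte model download.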

Installation

From the Python Package Index

In a suitable virtual environment:

pip install greynirseq

Development

To install GreynirSeq in development mode, we recommend using poetry as shown below:

pip install poetry && poetry install

Development

Linting

All code is checked with Super-Linter in a GitHub Action. We recommend running it locally before pushing:

docker run -e RUN_LOCAL=true -v /path/to/local/GreynirSeq:/tmp/lint github/super-linter

Type annotation

Type annotations should be included in all code; they will soon be checked with mypy.

