A natural language parser for Icelandic
A fast, efficient natural language processor for Icelandic
Overview
Greynir is a Python 3.x package for working with Icelandic natural language text. Greynir can parse text into sentence trees, find lemmas, inflect noun phrases, assign part-of-speech tags and much more.
Greynir’s sentence trees can inter alia be used to extract information from text, for instance about people, titles, entities, facts, actions and opinions.
Full documentation for Greynir is available online (see Documentation below).
Greynir is the engine of Greynir.is, a natural-language front end for a database of 9 million sentences parsed from Icelandic news articles, and Embla, a natural-language query app for smartphones.
Greynir uses the Tokenizer package, by the same authors, to tokenize text.
Examples
Use Greynir to easily inflect noun phrases:
from reynir import NounPhrase as Nl
# Create a NounPhrase ('nafnliður') object
nl = Nl("þrír lúxus-miðar á Star Wars og tveir brimsaltir pokar af poppi")
# Print the NounPhrase in the correct case for each context
# (þf=þolfall/accusative, þgf=þágufall/dative)
print("Þú keyptir {nl:þf}.".format(nl=nl))
print("Hér er kvittunin þín fyrir {nl:þgf}.".format(nl=nl))
The program outputs the following text, correctly inflected:
Þú keyptir þrjá lúxus-miða á Star Wars og tvo brimsalta poka af poppi.
Hér er kvittunin þín fyrir þremur lúxus-miðum á Star Wars og tveimur brimsöltum pokum af poppi.
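The {nl:þf} syntax above relies on Python's standard formatting protocol: str.format() hands whatever follows the colon to the object's __format__ method. The following is a minimal sketch of that mechanism only. ToyPhrase is a hypothetical stand-in with hard-coded forms copied from the output above; it is not Greynir's NounPhrase and performs no actual inflection.

```python
class ToyPhrase:
    """Hypothetical stand-in for a noun phrase; NOT Greynir's NounPhrase."""

    def __init__(self, forms):
        # forms maps a case abbreviation (e.g. "þf") to a pre-inflected string
        self._forms = forms

    def __format__(self, spec):
        # str.format() passes the text after the colon in "{nl:þf}" as `spec`
        if spec == "":
            spec = "nf"  # assumption for this toy: default to nominative
        try:
            return self._forms[spec]
        except KeyError:
            raise ValueError("Unknown case specifier: {!r}".format(spec))


# Forms taken from the example input and output above
phrase = ToyPhrase(
    {
        "nf": "þrír lúxus-miðar",
        "þf": "þrjá lúxus-miða",
        "þgf": "þremur lúxus-miðum",
    }
)

message = "Þú keyptir {nl:þf}.".format(nl=phrase)
print(message)
```

An unknown specifier raises ValueError in this toy, which mirrors how built-in types reject format specs they do not understand.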
Use Greynir to parse a sentence:
>>> from reynir import Greynir
>>> g = Greynir()
>>> sent = g.parse_single("Ása sá sól.")
>>> print(sent.tree.view)
P                               # Root
+-S-MAIN                        # Main sentence
  +-IP                          # Inflected phrase
    +-NP-SUBJ                   # Noun phrase, subject
      +-no_et_nf_kvk: 'Ása'     # Noun, singular, nominative, feminine
    +-VP                        # Verb phrase containing arguments
      +-VP                      # Verb phrase containing verb
        +-so_1_þf_et_p3: 'sá'   # Verb, 1 accusative arg, singular, 3rd p
      +-NP-OBJ                  # Noun phrase, object
        +-no_et_þf_kvk: 'sól'   # Noun, singular, accusative, feminine
+-'.'                           # Punctuation
>>> sent.tree.nouns
['Ása', 'sól']
>>> sent.tree.verbs
['sjá']
>>> sent.tree.flat
'P S-MAIN IP NP-SUBJ no_et_nf_kvk /NP-SUBJ VP so_1_þf_et_p3
NP-OBJ no_et_þf_kvk /NP-OBJ /VP /IP /S-MAIN p /P'
>>> # The subject noun phrase (S.IP.NP also works)
>>> sent.tree.S.IP.NP_SUBJ.lemmas
['Ása']
>>> # The verb phrase
>>> sent.tree.S.IP.VP.lemmas
['sjá', 'sól']
>>> # The object within the verb phrase (S.IP.VP.NP also works)
>>> sent.tree.S.IP.VP.NP_OBJ.lemmas
['sól']
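The string returned by sent.tree.flat can be stored or post-processed without Greynir installed. The sketch below rebuilds a nested structure from such a string. It assumes, based only on the sample output above, that tokens starting with an uppercase letter open a nonterminal, tokens starting with a slash close one, and all other tokens are terminals. The parse_flat and terminals helpers are hypothetical and not part of the package.

```python
def parse_flat(flat):
    """Turn a flat tree string into nested (label, children) tuples."""
    root = ("ROOT", [])  # artificial holder for the top-level node
    stack = [root]
    for token in flat.split():
        if token.startswith("/"):
            # Closing tag: must match the innermost open nonterminal
            if len(stack) == 1:
                raise ValueError("Unexpected closing tag: " + token)
            label, _ = stack.pop()
            if label != token[1:]:
                raise ValueError("Mismatched closing tag: " + token)
        elif token[0].isupper():
            # Opening nonterminal: becomes a new nested node
            node = (token, [])
            stack[-1][1].append(node)
            stack.append(node)
        else:
            # Terminal: a leaf with no children
            stack[-1][1].append((token, []))
    if len(stack) != 1:
        raise ValueError("Unclosed nonterminal: " + stack[-1][0])
    if not root[1]:
        raise ValueError("Empty tree")
    return root[1][0]


def terminals(node):
    """Collect terminal labels in left-to-right order."""
    label, children = node
    if not children and not label[0].isupper():
        return [label]
    result = []
    for child in children:
        result.extend(terminals(child))
    return result


flat = (
    "P S-MAIN IP NP-SUBJ no_et_nf_kvk /NP-SUBJ VP so_1_þf_et_p3 "
    "NP-OBJ no_et_þf_kvk /NP-OBJ /VP /IP /S-MAIN p /P"
)
tree = parse_flat(flat)
print(tree[0])
print(terminals(tree))
```

A stack is used here because each closing tag must match the most recently opened nonterminal, which also makes malformed input easy to detect.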
Prerequisites
This package runs on CPython 3.5 or newer, and on PyPy 3.5 or newer.
To find out which version of Python you have, enter:
$ python --version
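The same check can be made from inside Python, which is useful when several interpreters are installed and it is unclear which one the python command resolves to. This is a minimal sketch; the meets_minimum helper is a hypothetical name, and the 3.5 threshold is taken from the requirement stated above.

```python
import sys

MINIMUM = (3, 5)  # minimum version stated in the prerequisites


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if meets_minimum():
    print("Python {}.{} is new enough".format(*sys.version_info[:2]))
else:
    print("Python 3.5 or newer is required")
```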
If a binary wheel package isn’t available on PyPI for your system, you may need to install the python3-dev package (or its Windows equivalent) to set up Greynir successfully. This is because a source distribution install requires a C++ compiler and linker:
$ # Debian or Ubuntu:
$ sudo apt-get install python3-dev
Depending on your system, you may also need to install libffi-dev:
$ # Debian or Ubuntu:
$ sudo apt-get install libffi-dev
Installation
To install this package, assuming Python 3 is your default Python:
$ pip install reynir
If you have git and git-lfs installed and want to be able to edit the source, do the following:
$ git clone https://github.com/mideind/ReynirPackage
$ cd ReynirPackage
$ # [ Activate your virtualenv here if you have one ]
$ git lfs install
$ git pull
$ pip install -e .
The package source code is now in ReynirPackage/src/reynir.
Note that git-lfs is required to clone and pull the full compressed binary files for the Beygingarlýsing íslensks nútímamáls (BÍN) database. If it is missing, you will get assertion errors when you try to run Greynir.
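When a repository is cloned without git-lfs, large files are typically left as small text stubs ("pointer files") whose first line is version https://git-lfs.github.com/spec/v1. The sketch below checks whether a given file is such a stub. The is_lfs_pointer helper and the file names in the demo are hypothetical; the paths of Greynir's actual data files are not assumed here, and whether a stub is the cause of a particular assertion error would still need to be confirmed.

```python
import os
import tempfile

# First line of every Git LFS pointer file
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` looks like a Git LFS pointer stub."""
    with open(path, "rb") as f:
        return f.read(len(LFS_HEADER)) == LFS_HEADER


# Demo on temporary files with made-up names
with tempfile.TemporaryDirectory() as tmp:
    stub = os.path.join(tmp, "example_stub.bin")
    with open(stub, "wb") as f:
        f.write(LFS_HEADER + b"\noid sha256:" + b"0" * 64 + b"\nsize 12345\n")
    stub_is_pointer = is_lfs_pointer(stub)

    real = os.path.join(tmp, "example_real.bin")
    with open(real, "wb") as f:
        f.write(b"\x00\x01\x02 binary payload")
    real_is_pointer = is_lfs_pointer(real)

print(stub_is_pointer, real_is_pointer)
```

If a data file in the cloned repository turns out to be a stub, running git lfs install followed by git pull, as in the installation steps above, is the documented way to fetch the full files.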
Tests
To run the built-in tests, install pytest, cd to your ReynirPackage subdirectory (and optionally activate your virtualenv), then run:
$ python -m pytest
Documentation
Please consult Greynir’s documentation for detailed installation instructions, a quickstart guide, and reference information, as well as important information about copyright and licensing.
Copyright and licensing
Greynir is copyright (C) 2020 by Miðeind ehf. The original author of this software is Vilhjálmur Þorsteinsson.
This set of programs is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This set of programs is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
The full text of the GNU General Public License v3 is included with the source and is also available at https://www.gnu.org/licenses/gpl-3.0.html.
If you would like to use this software in ways that are incompatible with the standard GNU GPLv3 license, please contact Miðeind ehf. to negotiate alternative arrangements.