FreeLing plug-in for Sparv (Språkbanken's corpus annotation pipeline)
sparv-sbx-freeling
This is a plugin for the Sparv pipeline containing a wrapper for FreeLing. Please note that this plugin has a more restrictive license than the Sparv pipeline itself!
This plugin allows you to run the Sparv pipeline and get sentence segmentation, tokenisation, baseform analysis, and part-of-speech annotations for the following languages:
- Asturian
- Catalan
- English
- French
- Galician
- German
- Italian
- Norwegian
- Portuguese
- Russian
- Slovenian
- Spanish
Furthermore, Sparv converts the FreeLing POS tags into Universal POS tags and outputs them as a separate annotation.
Some of these languages (Catalan, English, German, Portuguese and Spanish) also support named-entity recognition.
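The conversion to Universal POS tags can be pictured as a simple lookup. The sketch below is only an illustration: the tag pairs are copied from the English example output further down this page, and the names `EXAMPLE_POS_TO_UPOS` and `to_upos`, as well as the `X` fallback, are hypothetical and not part of the plugin. It is not the conversion table that Sparv actually uses.

```python
# Tag pairs copied from the example output on this page.
# Assumption: this is NOT the plugin's real conversion table, just a sample.
EXAMPLE_POS_TO_UPOS = {
    "DT": "DET",
    "VBZ": "VERB",
    "NN": "NOUN",
    "IN": "ADP",
    "WRB": "ADV",
    "TO": "PART",
    "VB": "VERB",
    "NP00SP0": "PROPN",
    "Fp": "PUNCT",
}


def to_upos(pos, fallback="X"):
    """Look up a FreeLing POS tag; unknown tags get the fallback value.

    The fallback "X" (the Universal POS tag for "other") is an assumption
    made for this sketch.
    """
    return EXAMPLE_POS_TO_UPOS.get(pos, fallback)


if __name__ == "__main__":
    for tag in ("DT", "NP00SP0", "UNKNOWN"):
        print(tag, to_upos(tag))
```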
Prerequisites
Installation
Option 1: Installation from PyPI with pipx:
pipx inject sparv-pipeline sparv-sbx-freeling
Option 2: Installation from GitHub with pipx:
pipx inject sparv-pipeline https://github.com/spraakbanken/sparv-sbx-freeling/archive/latest.tar.gz
Option 3: Manual download of plugin and installation in your sparv-pipeline virtual environment:
source [path to sparv-pipeline virtual environment]/bin/activate
pip install [path to the downloaded sparv-sbx-freeling directory]
Usage
The Sparv pipeline needs a config file describing your corpus and the desired output format. Please refer to the Sparv pipeline user manual for more details on config files and running Sparv.
Example input:
<text title="Example">
This is an example for how to run Sparv.
</text>
Example command for creating XML with annotations:
sparv run
Result file:
<?xml version="1.0" encoding="UTF-8"?>
<text lix="20.00" title="Example">
<sentence>
<token baseform="this" pos="DT" upos="DET">This</token>
<token baseform="be" pos="VBZ" upos="VERB">is</token>
<token baseform="a" pos="DT" upos="DET">an</token>
<token baseform="example" pos="NN" upos="NOUN">example</token>
<token baseform="for" pos="IN" upos="ADP">for</token>
<token baseform="how" pos="WRB" upos="ADV">how</token>
<token baseform="to" pos="TO" upos="PART">to</token>
<token baseform="run" pos="VB" upos="VERB">run</token>
<token baseform="sparv" ne_type="person" pos="NP00SP0" upos="PROPN">Sparv</token>
<token baseform="." pos="Fp" upos="PUNCT">.</token>
</sentence>
</text>
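If you want to post-process the exported file, the annotations can be read with Python's standard library. The following is a minimal sketch, assuming the export has the same element and attribute names as the result file above; the function name `read_tokens` and the shortened sample are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the result file shown above (assumption: a real export
# uses the same element and attribute names).
RESULT_XML = """<?xml version="1.0" encoding="UTF-8"?>
<text lix="20.00" title="Example">
<sentence>
<token baseform="this" pos="DT" upos="DET">This</token>
<token baseform="be" pos="VBZ" upos="VERB">is</token>
<token baseform="sparv" ne_type="person" pos="NP00SP0" upos="PROPN">Sparv</token>
<token baseform="." pos="Fp" upos="PUNCT">.</token>
</sentence>
</text>
"""


def read_tokens(xml_bytes):
    """Return one dict per <token> element, in document order."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for sentence in root.iter("sentence"):
        for token in sentence.iter("token"):
            rows.append({
                "word": token.text,
                "baseform": token.get("baseform"),
                "pos": token.get("pos"),
                "upos": token.get("upos"),
                # ne_type is only present on named entities, so this may be None.
                "ne_type": token.get("ne_type"),
            })
    return rows


tokens = read_tokens(RESULT_XML.encode("utf-8"))

if __name__ == "__main__":
    for t in tokens:
        print(t["word"], t["baseform"], t["pos"], t["upos"],
              t["ne_type"] or "-", sep="\t")
```

To read a file on disk instead, `ET.parse(path).getroot()` can replace the `ET.fromstring` call; where the export file ends up depends on your Sparv configuration.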
Additional Info about Annotations
A full list of which analyses are supported for which languages can be found here:
https://freeling-user-manual.readthedocs.io/en/latest/basics/#supported-languages
Integrating dependency parsing
FreeLing supports dependency parsing for some languages, but its output format is somewhat cumbersome to work with.
Input:
This is a sentence.
Output:
DT/top/(This this DT -) [
vb-be/modnorule/(is be VBZ -)
sn-chunk/modnorule/(sentence sentence NN -) [
DT/det/(a a DT -)
]
st-brk/modnorule/(. . Fp -)
]
It is possible to write a new parser to handle this format, but so far this has not been a priority for us.
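To give an idea of what such a parser could look like, here is a rough sketch that reads the bracketed output above into a tree. It is not part of the plugin and rests on assumptions drawn only from the single example shown: each node line has the shape `label/function/(form lemma tag extra)`, an optional trailing `[` opens a list of children, and a line containing only `]` closes it. The names `Node`, `parse_tree` and `edges` are hypothetical, and the meaning of the fourth field (`-` in the example) is left uninterpreted.

```python
import re
from dataclasses import dataclass, field

# Assumed line shape, inferred from the example output only:
#   label/function/(form lemma tag extra) [
NODE_RE = re.compile(
    r"^(?P<label>[^/]+)/(?P<function>[^/]+)/"
    r"\((?P<form>\S+) (?P<lemma>\S+) (?P<tag>\S+) (?P<extra>\S+)\)"
    r"\s*(?P<opens>\[)?$"
)


@dataclass
class Node:
    label: str
    function: str
    form: str
    lemma: str
    tag: str
    children: list = field(default_factory=list)


def parse_tree(text):
    """Parse the bracketed output into a Node tree and return the root."""
    root = None
    stack = []  # nodes whose "[" has not been closed yet
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "]":
            if not stack:
                raise ValueError("unbalanced ']'")
            stack.pop()
            continue
        match = NODE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognised line: {line!r}")
        node = Node(match["label"], match["function"],
                    match["form"], match["lemma"], match["tag"])
        if stack:
            stack[-1].children.append(node)
        elif root is None:
            root = node
        else:
            raise ValueError("more than one root node")
        if match["opens"]:
            stack.append(node)
    if stack:
        raise ValueError("missing closing ']'")
    return root


def edges(node, head=None):
    """Yield (head form, dependent form, function) triples, depth first."""
    yield (head.form if head else None, node.form, node.function)
    for child in node.children:
        yield from edges(child, node)


SAMPLE = """DT/top/(This this DT -) [
vb-be/modnorule/(is be VBZ -)
sn-chunk/modnorule/(sentence sentence NN -) [
DT/det/(a a DT -)
]
st-brk/modnorule/(. . Fp -)
]"""

tree = parse_tree(SAMPLE)

if __name__ == "__main__":
    for edge in edges(tree):
        print(edge)
```

Turning the tree into head/dependent triples is one possible design; output for other languages or FreeLing versions may differ from the single example here and would need to be checked before relying on this.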