FreeLing plug-in for Sparv (Språkbanken's corpus annotation pipeline)
sparv-freeling
This is a plugin for the Sparv pipeline containing a wrapper for FreeLing. Please note that this plugin has a more restrictive license than the Sparv pipeline!
This plugin allows you to run the Sparv pipeline and get sentence segmentation, tokenisation, baseform analysis, and part-of-speech annotations for the following languages:
- Asturian
- Catalan
- English
- French
- Galician
- German
- Italian
- Norwegian
- Portuguese
- Russian
- Slovenian
- Spanish
Furthermore, Sparv converts the FreeLing POS tags into Universal POS tags and outputs them as a separate annotation.
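To illustrate what this conversion amounts to, here is a minimal sketch of a tag lookup. The pairs are copied from the English example result file further down in this README. They are an illustrative subset only and not the plugin's actual conversion table. The names `EXAMPLE_UPOS` and `to_upos` are hypothetical, and the fallback to `X` (the Universal POS tag for "other") is an assumption.

```python
# FreeLing tag -> Universal POS tag, as seen in the example result file.
# Illustrative subset only; NOT the conversion table used by the plugin.
EXAMPLE_UPOS = {
    "DT": "DET",
    "VBZ": "VERB",
    "NN": "NOUN",
    "IN": "ADP",
    "WRB": "ADV",
    "TO": "PART",
    "VB": "VERB",
    "NP00SP0": "PROPN",
    "Fp": "PUNCT",
}


def to_upos(freeling_tag: str, default: str = "X") -> str:
    """Return the Universal POS tag for a FreeLing tag, or `default` if unknown."""
    return EXAMPLE_UPOS.get(freeling_tag, default)


if __name__ == "__main__":
    for tag in ("DT", "NN", "Fp"):
        print(tag, "->", to_upos(tag))
```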
Some of these languages (Catalan, English, German, Portuguese and Spanish) also support named-entity recognition.
Prerequisites
Installation
Option 1: Installation from PyPI with pipx:
pipx inject sparv-pipeline sparv-freeling
Option 2: Installation from GitHub with pipx:
pipx inject sparv-pipeline https://github.com/spraakbanken/sparv-freeling/archive/latest.tar.gz
Option 3: Manual download of plugin and installation in your sparv-pipeline virtual environment:
source [path to sparv-pipeline virtual environment]/bin/activate
pip install [path to the downloaded sparv-freeling directory]
Usage
The Sparv pipeline needs a config file describing your corpus and the desired output format. Please refer to the Sparv pipeline user manual for more details on config files and running Sparv.
Example input:
<text title="Example">
This is an example for how to run Sparv.
</text>
Example command for creating XML with annotations:
sparv run
Result file:
<?xml version="1.0" encoding="UTF-8"?>
<text lix="20.00" title="Example">
<sentence>
<token baseform="this" pos="DT" upos="DET">This</token>
<token baseform="be" pos="VBZ" upos="VERB">is</token>
<token baseform="a" pos="DT" upos="DET">an</token>
<token baseform="example" pos="NN" upos="NOUN">example</token>
<token baseform="for" pos="IN" upos="ADP">for</token>
<token baseform="how" pos="WRB" upos="ADV">how</token>
<token baseform="to" pos="TO" upos="PART">to</token>
<token baseform="run" pos="VB" upos="VERB">run</token>
<token baseform="sparv" ne_type="person" pos="NP00SP0" upos="PROPN">Sparv</token>
<token baseform="." pos="Fp" upos="PUNCT">.</token>
</sentence>
</text>
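The result file can be read back with any XML library. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the element and attribute names shown in the result file above. The sample is a shortened copy of that file (XML declaration omitted), and `read_tokens` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the result file shown above.
RESULT_XML = """<text lix="20.00" title="Example">
<sentence>
<token baseform="this" pos="DT" upos="DET">This</token>
<token baseform="be" pos="VBZ" upos="VERB">is</token>
<token baseform="sparv" ne_type="person" pos="NP00SP0" upos="PROPN">Sparv</token>
<token baseform="." pos="Fp" upos="PUNCT">.</token>
</sentence>
</text>"""


def read_tokens(xml_text):
    """Return one dict per token with its word form and annotations."""
    root = ET.fromstring(xml_text)
    rows = []
    for token in root.iter("token"):
        rows.append({
            "word": token.text,
            "baseform": token.get("baseform"),
            "pos": token.get("pos"),
            "upos": token.get("upos"),
            # ne_type is only present on tokens recognised as named entities.
            "ne_type": token.get("ne_type"),
        })
    return rows


if __name__ == "__main__":
    for row in read_tokens(RESULT_XML):
        print(row["word"], row["baseform"], row["pos"], row["upos"], row["ne_type"])
```

Tokens without an `ne_type` attribute get `None` for that field, so downstream code can filter named entities with a simple truthiness check.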
Additional Info about Annotations
A full list of the analyses supported for each language is available in the FreeLing user manual:
https://freeling-user-manual.readthedocs.io/en/latest/basics/#supported-languages
Integrating dependency parsing
FreeLing supports dependency parsing for some languages, but its output format is cumbersome to process.
Input:
This is a sentence.
Output:
DT/top/(This this DT -) [
vb-be/modnorule/(is be VBZ -)
sn-chunk/modnorule/(sentence sentence NN -) [
DT/det/(a a DT -)
]
st-brk/modnorule/(. . Fp -)
]
It is possible to write a new parser to handle this format, but so far this has not been a priority for us.
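As a rough idea of what such a parser could look like, here is a minimal sketch in Python. It is not part of the plugin and has only been written against the sample output above. It assumes that every node sits on its own line in the form `label/function/(form lemma tag extra)`, that a trailing `[` opens the list of children, and that a line containing only `]` closes it. The field names (`label`, `function`, `extra`) are guesses from reading the sample, and `Node`, `parse_tree` and `edges` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# One node per line: label/function/(form lemma tag extra), optionally
# followed by "[" when the node has children. Field names are assumptions.
NODE_RE = re.compile(
    r"^(?P<label>[^/]+)/(?P<function>[^/]+)/"
    r"\((?P<form>\S+) (?P<lemma>\S+) (?P<tag>\S+) (?P<extra>\S+)\)"
    r"\s*(?P<opens>\[)?$"
)


@dataclass
class Node:
    label: str
    function: str
    form: str
    lemma: str
    tag: str
    children: list = field(default_factory=list)


def parse_tree(text):
    """Parse one bracketed dependency tree and return its root Node."""
    root = None
    stack = []  # nodes whose "[" has not been closed yet
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "]":
            if not stack:
                raise ValueError("unexpected closing bracket")
            stack.pop()
            continue
        m = NODE_RE.match(line)
        if m is None:
            raise ValueError(f"unrecognised line: {line!r}")
        node = Node(m["label"], m["function"], m["form"], m["lemma"], m["tag"])
        if stack:
            stack[-1].children.append(node)
        elif root is None:
            root = node
        else:
            raise ValueError("more than one top-level node")
        if m["opens"]:
            stack.append(node)
    if stack:
        raise ValueError("unbalanced brackets")
    return root


def edges(node, head=None):
    """Yield (head form, function, dependent form) triples, depth first."""
    yield (head.form if head else None, node.function, node.form)
    for child in node.children:
        yield from edges(child, node)


SAMPLE = """DT/top/(This this DT -) [
vb-be/modnorule/(is be VBZ -)
sn-chunk/modnorule/(sentence sentence NN -) [
DT/det/(a a DT -)
]
st-brk/modnorule/(. . Fp -)
]"""

if __name__ == "__main__":
    for edge in edges(parse_tree(SAMPLE)):
        print(edge)
```

Using a stack of open nodes keeps the parser independent of indentation, since nesting is fully determined by the brackets. Turning the resulting tree into Sparv annotations would still need a mapping from nodes back to token positions, which this sketch does not attempt.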
Download files
Source Distribution
Built Distributions
Hashes for sparv_freeling-4.0.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b5ca795649c7b50d466abb9bfc81882761d3ea826c13f5a9fb3639a0af733243
MD5 | 251f1d2e7fb51f49c838605b600e98e0
BLAKE2b-256 | f21b97808076903b0f5247c700589a45ee8b866fb622e5556091ac5ab19ccca1
Hashes for sparv_freeling-4.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b6be9b52ac2a125eabf409956569ed0e8b43453ada9f28c1ea184baa3b728e17
MD5 | 5536c1eca839c79520d86eea7a8059d3
BLAKE2b-256 | 2c548101f5e8ad43430da8797370b6672ee8542d30dfa20300fb6d2b91ad9392