FreeLing plug-in for Sparv (Språkbanken's corpus annotation pipeline)

sparv-sbx-freeling

This is a plugin for the Sparv pipeline containing a wrapper for FreeLing. Please note that this plugin has a more restrictive license than the Sparv pipeline!

This plugin allows you to run the Sparv pipeline and get sentence segmentation, tokenisation, baseform analysis, and part-of-speech annotations for the following languages:

  • Asturian
  • Catalan
  • English
  • French
  • Galician
  • German
  • Italian
  • Norwegian
  • Portuguese
  • Russian
  • Slovenian
  • Spanish

Furthermore, Sparv converts the FreeLing POS tags into Universal POS tags and outputs them as a separate annotation.

Some of these languages (Catalan, English, German, Portuguese and Spanish) also support named-entity recognition.

Prerequisites

Installation

Option 1: Installation from PyPI with pipx:

pipx inject sparv-pipeline sparv-sbx-freeling

Option 2: Installation from GitHub with pipx:

pipx inject sparv-pipeline https://github.com/spraakbanken/sparv-sbx-freeling/archive/latest.tar.gz

Option 3: Manual download of plugin and installation in your sparv-pipeline virtual environment:

source [path to sparv-pipeline virtual environment]/bin/activate
pip install [path to the downloaded sparv-sbx-freeling directory]

Usage

The Sparv pipeline needs a config file describing your corpus and the desired output format. Please refer to the Sparv pipeline user manual for more details on config files and running Sparv.

Example input:

<text title="Example">
  This is an example for how to run Sparv.
</text>

Example command for creating XML with annotations:

sparv run

Result file:

<?xml version="1.0" encoding="UTF-8"?>
<text lix="20.00" title="Example">
  <sentence>
    <token baseform="this" pos="DT" upos="DET">This</token>
    <token baseform="be" pos="VBZ" upos="VERB">is</token>
    <token baseform="a" pos="DT" upos="DET">an</token>
    <token baseform="example" pos="NN" upos="NOUN">example</token>
    <token baseform="for" pos="IN" upos="ADP">for</token>
    <token baseform="how" pos="WRB" upos="ADV">how</token>
    <token baseform="to" pos="TO" upos="PART">to</token>
    <token baseform="run" pos="VB" upos="VERB">run</token>
    <token baseform="sparv" ne_type="person" pos="NP00SP0" upos="PROPN">Sparv</token>
    <token baseform="." pos="Fp" upos="PUNCT">.</token>
  </sentence>
</text>
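
The result file can be read with any XML library. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module. It is not part of Sparv or of this plugin: the `read_tokens` helper and the `result_xml` variable are illustrative names, and the embedded XML is a shortened copy of the result shown above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the result file shown above (illustrative only).
result_xml = """<?xml version="1.0" encoding="UTF-8"?>
<text lix="20.00" title="Example">
  <sentence>
    <token baseform="this" pos="DT" upos="DET">This</token>
    <token baseform="be" pos="VBZ" upos="VERB">is</token>
    <token baseform="sparv" ne_type="person" pos="NP00SP0" upos="PROPN">Sparv</token>
  </sentence>
</text>
"""


def read_tokens(xml_text):
    """Return one list of token dicts per <sentence> element."""
    # Parse from bytes so the encoding declaration in the XML header is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    sentences = []
    for sentence in root.iter("sentence"):
        tokens = []
        for token in sentence.iter("token"):
            tokens.append({
                "word": token.text,
                "baseform": token.get("baseform"),
                "pos": token.get("pos"),
                "upos": token.get("upos"),
                # ne_type is only present on tokens tagged as named entities.
                "ne_type": token.get("ne_type"),
            })
        sentences.append(tokens)
    return sentences


if __name__ == "__main__":
    for sentence in read_tokens(result_xml):
        for token in sentence:
            print(token["word"], token["baseform"], token["pos"], token["upos"])
```

Attributes that are absent on a token (such as `ne_type` on most tokens) come back as `None`, so callers can check for named entities without special-casing missing attributes.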

Additional Info about Annotations

A full list of the analyses supported for each language can be found here:

https://freeling-user-manual.readthedocs.io/en/latest/basics/#supported-languages

Integrating dependency parsing

FreeLing supports dependency parsing for some languages, but the output format is somewhat cumbersome.

Input:

This is a sentence.

Output:

DT/top/(This this DT -) [
  vb-be/modnorule/(is be VBZ -)
  sn-chunk/modnorule/(sentence sentence NN -) [
    DT/det/(a a DT -)
  ]
  st-brk/modnorule/(. . Fp -)
]

It would be possible to write a new parser to handle this format, but so far this has not been a priority for us.
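
For readers who want to experiment, here is a rough sketch of how such a parser might look in Python. It is not part of this plugin and is based only on the single example above. It assumes that every node line has the shape `label/relation/(form lemma tag sense)`, that a trailing ` [` opens a list of children, and that a line containing only `]` closes it. The field names and the `parse_dependency_tree` function are hypothetical, and the fourth field (shown as `-` in the example) is assumed to hold optional sense information.

```python
import re

# One node per line: label/relation/(form lemma tag sense), optionally followed by " [".
# This pattern is inferred from the example output only and may not cover all cases.
NODE_RE = re.compile(
    r"^(?P<label>[^/]+)/(?P<relation>[^/]+)/"
    r"\((?P<form>\S+) (?P<lemma>\S+) (?P<tag>\S+) (?P<sense>\S+)\)"
    r"(?P<opens> \[)?$"
)

FIELDS = ("label", "relation", "form", "lemma", "tag", "sense")


def parse_dependency_tree(text):
    """Parse the bracketed tree text into nested dicts with a 'children' list."""
    root = None
    stack = []  # nodes whose child list is still open
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line == "]":
            if not stack:
                raise ValueError("unbalanced ']'")
            stack.pop()
            continue
        match = NODE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognised line: {line!r}")
        node = {name: match.group(name) for name in FIELDS}
        node["children"] = []
        if stack:
            stack[-1]["children"].append(node)
        elif root is None:
            root = node
        else:
            raise ValueError("more than one root node")
        if match.group("opens"):
            stack.append(node)
    if stack:
        raise ValueError("unbalanced '['")
    return root


example = """\
DT/top/(This this DT -) [
  vb-be/modnorule/(is be VBZ -)
  sn-chunk/modnorule/(sentence sentence NN -) [
    DT/det/(a a DT -)
  ]
  st-brk/modnorule/(. . Fp -)
]
"""

if __name__ == "__main__":
    tree = parse_dependency_tree(example)
    for child in tree["children"]:
        print(child["relation"], child["form"], "->", tree["form"])
```

The sketch keeps a stack of open nodes instead of recursing, which keeps the bracket matching explicit and makes unbalanced input easy to report. Turning the resulting tree into Sparv annotations would still require mapping each node back to its token, which is not attempted here.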
