Skip to main content

FreeLing plug-in for Sparv (Språkbanken's corpus annotation pipeline)

Project description

sparv-freeling

This is a plugin for the Sparv pipeline containing a wrapper for FreeLing. Please observe that this plugin has a more restrictive license than the Sparv piepeline!

This plugin allows you to run the Sparv pipeline and get sentence segmentation, tokenisation, baseform analysis, and part-of-speech annotations for the following languages:

  • Asturian
  • Catalan
  • English
  • French
  • Galician
  • German
  • Italian
  • Norwegian
  • Portuguese
  • Russian
  • Slovenian
  • Spanish

Furthermore Sparv will convert the FreeLing POS-tags into Universal POS tags and output them as a separate annotation.

Some of these languages (Catalan, English, German, Portuguese and Spanish) also support named-entity recognition.

Prerequisites

Installation

Option 1: Installation from pypi with pipx:

pipx inject sparv-pipeline sparv-freeling

Option 2: Installation from GitHub with pipx:

pipx inject sparv-pipeline https://github.com/spraakbanken/sparv-freeling/archive/latest.tar.gz

Option 3: Manual download of plugin and installation in your sparv-pipeline virtual environment:

source [path to sparv-pipeline virtual environment]/bin/activate
pip install [path to the downloaded sparv-freeling directory]

Usage

The Sparv pipeline needs a config file describing your corpus and the desired output format. Please refer to the Sparv pipeline user manual for more details on config files and running Sparv.

Example input:

<text title="Example">
  This is an example for how to run Sparv.
</text>

Example command for creating xml with annotations:

sparv run

Result file:

<?xml version="1.0" encoding="UTF-8"?>
<text lix="20.00" title="Example">
  <sentence>
    <token baseform="this" pos="DT" upos="DET">This</token>
    <token baseform="be" pos="VBZ" upos="VERB">is</token>
    <token baseform="a" pos="DT" upos="DET">an</token>
    <token baseform="example" pos="NN" upos="NOUN">example</token>
    <token baseform="for" pos="IN" upos="ADP">for</token>
    <token baseform="how" pos="WRB" upos="ADV">how</token>
    <token baseform="to" pos="TO" upos="PART">to</token>
    <token baseform="run" pos="VB" upos="VERB">run</token>
    <token baseform="sparv" ne_type="person" pos="NP00SP0" upos="PROPN">Sparv</token>
    <token baseform="." pos="Fp" upos="PUNCT">.</token>
  </sentence>
</text>

Additional Info about Annotations

A full list of what analyses are supported for what languages can be found here:

https://freeling-user-manual.readthedocs.io/en/latest/basics/#supported-languages

Integrating dependency parsing

FreeLing supports dependency parsing for some languages. The output format is a bit cumbersome though.

Input:

This is a sentence.

Output:

DT/top/(This this DT -) [
  vb-be/modnorule/(is be VBZ -)
  sn-chunk/modnorule/(sentence sentence NN -) [
    DT/det/(a a DT -)
  ]
  st-brk/modnorule/(. . Fp -)
]

It is possible to write a new parser to handle this format but so far this has not been a priority for us.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sparv-freeling-4.0.1.tar.gz (20.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sparv_freeling-4.0.1-py3-none-any.whl (19.8 kB view details)

Uploaded Python 3

File details

Details for the file sparv-freeling-4.0.1.tar.gz.

File metadata

  • Download URL: sparv-freeling-4.0.1.tar.gz
  • Upload date:
  • Size: 20.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.25.1 setuptools/56.0.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.6

File hashes

Hashes for sparv-freeling-4.0.1.tar.gz
Algorithm Hash digest
SHA256 0465d08210da36a8022ee3c3dea59bc0b8671ce36a8b1be1872eadcb5d0b95b6
MD5 e93c01d6e1aae98d38636e33913a0383
BLAKE2b-256 805948259adebeaeaf9339fb505bdc5b3bfaa1785449001ee52e50b413886358

See more details on using hashes here.

File details

Details for the file sparv_freeling-4.0.1-py3-none-any.whl.

File metadata

  • Download URL: sparv_freeling-4.0.1-py3-none-any.whl
  • Upload date:
  • Size: 19.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.25.1 setuptools/56.0.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.6

File hashes

Hashes for sparv_freeling-4.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 9627c5b693ddde49b20c9419bd8cf8ae01e210170b2212951c2256e8b7d5be65
MD5 e47d33582c0b30292a14d540204bffd8
BLAKE2b-256 fd785789fbaa7523a3effc0b3e4a8feae3bb0a2a0054da6c2d8607b2b62877a7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page