FreeLing plug-in for Sparv (Språkbanken's corpus annotation pipeline)

sparv-freeling

This is a plugin for the Sparv pipeline containing a wrapper for FreeLing. Please note that this plugin has a more restrictive license than the Sparv pipeline!

This plugin allows you to run the Sparv pipeline and get sentence segmentation, tokenisation, baseform analysis, and part-of-speech annotations for the following languages:

  • Asturian
  • Catalan
  • English
  • French
  • Galician
  • German
  • Italian
  • Norwegian
  • Portuguese
  • Russian
  • Slovenian
  • Spanish

Furthermore, Sparv converts the FreeLing POS tags into Universal POS tags and outputs them as a separate annotation.

Some of these languages (Catalan, English, German, Portuguese and Spanish) also support named-entity recognition.

Prerequisites

Installation

Option 1: Installation from PyPI with pipx:

pipx inject sparv-pipeline sparv-freeling

Option 2: Installation from GitHub with pipx:

pipx inject sparv-pipeline https://github.com/spraakbanken/sparv-freeling/archive/latest.tar.gz

Option 3: Manual download of plugin and installation in your sparv-pipeline virtual environment:

source [path to sparv-pipeline virtual environment]/bin/activate
pip install [path to the downloaded sparv-freeling directory]

Usage

The Sparv pipeline needs a config file describing your corpus and the desired output format. Please refer to the Sparv pipeline user manual for more details on config files and running Sparv.

Example input:

<text title="Example">
  This is an example for how to run Sparv.
</text>

Example command for creating XML with annotations:

sparv run

Result file:

<?xml version="1.0" encoding="UTF-8"?>
<text lix="20.00" title="Example">
  <sentence>
    <token baseform="this" pos="DT" upos="DET">This</token>
    <token baseform="be" pos="VBZ" upos="VERB">is</token>
    <token baseform="a" pos="DT" upos="DET">an</token>
    <token baseform="example" pos="NN" upos="NOUN">example</token>
    <token baseform="for" pos="IN" upos="ADP">for</token>
    <token baseform="how" pos="WRB" upos="ADV">how</token>
    <token baseform="to" pos="TO" upos="PART">to</token>
    <token baseform="run" pos="VB" upos="VERB">run</token>
    <token baseform="sparv" ne_type="person" pos="NP00SP0" upos="PROPN">Sparv</token>
    <token baseform="." pos="Fp" upos="PUNCT">.</token>
  </sentence>
</text>
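
The result file is plain XML, so the annotations can be read back with any XML library. The following is a minimal sketch using Python's standard library, not part of the plugin itself. The function name `read_tokens` and the trimmed sample in `RESULT_XML` are assumptions made for illustration; the element and attribute names are taken from the result file shown above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the result file shown above (illustration only).
RESULT_XML = """<?xml version="1.0" encoding="UTF-8"?>
<text lix="20.00" title="Example">
  <sentence>
    <token baseform="this" pos="DT" upos="DET">This</token>
    <token baseform="be" pos="VBZ" upos="VERB">is</token>
    <token baseform="sparv" ne_type="person" pos="NP00SP0" upos="PROPN">Sparv</token>
    <token baseform="." pos="Fp" upos="PUNCT">.</token>
  </sentence>
</text>
"""


def read_tokens(xml_text):
    """Return one list of token dicts per <sentence> element."""
    # Parse from bytes so the encoding declaration is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    sentences = []
    for sentence in root.iter("sentence"):
        tokens = []
        for token in sentence.iter("token"):
            tokens.append({
                "word": token.text,
                "baseform": token.get("baseform"),
                "pos": token.get("pos"),
                "upos": token.get("upos"),
                # ne_type is only present on named entities, otherwise None.
                "ne_type": token.get("ne_type"),
            })
        sentences.append(tokens)
    return sentences


if __name__ == "__main__":
    for sentence in read_tokens(RESULT_XML):
        for tok in sentence:
            print(tok["word"], tok["baseform"], tok["pos"], tok["upos"], sep="\t")
```

For a real corpus, the same function could be given the contents of the exported file instead of the inline sample.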

Additional Info about Annotations

A full list of which analyses are supported for which languages can be found here:

https://freeling-user-manual.readthedocs.io/en/latest/basics/#supported-languages

Integrating dependency parsing

FreeLing supports dependency parsing for some languages. The output format is a bit cumbersome though.

Input:

This is a sentence.

Output:

DT/top/(This this DT -) [
  vb-be/modnorule/(is be VBZ -)
  sn-chunk/modnorule/(sentence sentence NN -) [
    DT/det/(a a DT -)
  ]
  st-brk/modnorule/(. . Fp -)
]

It is possible to write a new parser to handle this format, but so far this has not been a priority for us.
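
As a rough illustration of what such a parser might look like, here is a minimal Python sketch. It is not part of this plugin or of FreeLing, and it assumes the format is exactly as in the single example above: one node per line written as `label/function/(form lemma tag -)`, a trailing `[` opening a list of children, and a line containing only `]` closing it. The field names (`label`, `function`, `form`, `lemma`, `tag`) and both function names are made up for this sketch, and the fourth field inside the parentheses is ignored because its meaning is not described here.

```python
import re

# One node per line: label/function/(form lemma tag extra), optionally
# followed by "[" when the node has children.
NODE_RE = re.compile(
    r"^(?P<label>[^/]+)/(?P<function>[^/]+)/"
    r"\((?P<form>\S+) (?P<lemma>\S+) (?P<tag>\S+) (?P<extra>\S+)\)"
    r"\s*(?P<opens>\[)?$"
)


def parse_dependency_tree(text):
    """Parse the bracketed output into nested dicts with a 'children' list."""
    root = None
    stack = []  # nodes whose child list is currently open
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line == "]":
            if not stack:
                raise ValueError("unbalanced ']'")
            stack.pop()
            continue
        match = NODE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognised line: {line!r}")
        node = {key: match.group(key)
                for key in ("label", "function", "form", "lemma", "tag")}
        node["children"] = []
        if stack:
            stack[-1]["children"].append(node)
        elif root is None:
            root = node
        else:
            raise ValueError("more than one root node")
        if match.group("opens"):
            stack.append(node)
    if stack:
        raise ValueError("missing closing ']'")
    return root


def dependency_edges(node, head=None):
    """Yield (form, function, head form) triples in document order of the tree."""
    yield (node["form"], node["function"], head)
    for child in node["children"]:
        yield from dependency_edges(child, node["form"])


EXAMPLE = """\
DT/top/(This this DT -) [
  vb-be/modnorule/(is be VBZ -)
  sn-chunk/modnorule/(sentence sentence NN -) [
    DT/det/(a a DT -)
  ]
  st-brk/modnorule/(. . Fp -)
]
"""

if __name__ == "__main__":
    for edge in dependency_edges(parse_dependency_tree(EXAMPLE)):
        print(edge)
```

A real integration would also have to map these nodes back to Sparv tokens and handle multi-sentence output, both of which this sketch leaves out.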
