sparv-sbx-conllu

Sparv plugin for importing CoNLL-U files into Sparv.

Install

First, install Sparv as suggested:

pipx install sparv

Then install sparv-sbx-conllu with

sparv plugins install sparv-sbx-conllu

Usage

To use this plugin to import CoNLL-U files to your corpus, add the following to your config.yaml:

# file=config.yaml
import:
  importer: sbx_conllu:parse

Configuration

All annotations are exported by default, but if you want to use an annotation in another analysis, you need to specify it in your config.yaml.

By default, sparv_sbx_conllu exports the annotations text, document, sentence and token. Other annotations can be added to sbx_conllu.import_attributes.

For instance, to use the annotations token:xpos and sentence:sent_id, add them like this:

# file=config.yaml
sbx_conllu:
  import_attributes:
    - token:xpos
    - sentence:sent_id

Classes

To use annotations from sparv_sbx_conllu in other analyses, you may need to add them to classes in your config.yaml.

Example

You want to use the sentence and token from sparv_sbx_conllu and also map token:xpos as token:pos and token:pos_ud as token:upos.

# file=config.yaml
classes:
  sentence: sentence
  token: token
  "token:pos": token:xpos
  "token:upos": token:pos_ud

Common annotations from a standard CoNLL-U file

These annotations are some of the annotations you can expect from a CoNLL-U file, but it depends on the file.

| Annotation | CoNLL-U column | Always? | Comment | Example in CoNLL-U |
| --- | --- | --- | --- | --- |
| text | | yes | the whole text | |
| document | | yes | either implicit, or at least one specified | |
| sentence | | yes | must contain at least one sentence | |
| token | | yes | must contain at least one token | |
| document:id | `# newdoc id =` | no | | `# newdoc id = ID` gives document:id = ID |
| paragraph | `# newpar` or `NewPar=Yes` in the misc column | no | Can exist around sentences and inside sentences | `# newpar` or `NewPar=Yes` in the misc column |
| paragraph:id | `# newpar id =` | no | | `# newpar id = ID` gives paragraph:id = ID |
| token:id | id column | no | Always present in standard CoNLL-U | |
| token span over this form in the text | form column | no | Always present in standard CoNLL-U, may contain spaces | |
| token:baseform | lemma column | no | May contain spaces | |
| token:pos_ud | upos column | no | UD POS | |
| token:xpos | xpos column | no | custom POS (no standard) | |
| token:feats_ud | feats column | no | Dict-like values | `Case=Nom\|Gender=Fem\|Number=Sing\|Polarity=Pos` |
| token:dephead_ud | head column | no | integer | |
| token:deprel_ud | deprel column | no | UD-dep value | |
| token:deps_ud | deps column | no | At least one pair head:deprel | `2:obj\|4:obj` |
| token:misc_ud | misc column | no | Dict-like values | `SpaceAfter=No` |
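
To make the column-to-annotation mapping in the table above concrete, here is a minimal Python sketch. It is not the plugin's parser: the names `COLUMN_TO_ANNOTATION`, `parse_token_line` and `parse_dict_like` are hypothetical, and dropping unspecified (`_`) fields is an assumption made for illustration only.

```python
# Illustrative sketch only; this is NOT how sparv-sbx-conllu is implemented.
# The ten tab-separated CoNLL-U columns, paired with the annotation names
# listed in the table above. "token" stands for the span over the form.
COLUMN_TO_ANNOTATION = [
    ("id", "token:id"),
    ("form", "token"),
    ("lemma", "token:baseform"),
    ("upos", "token:pos_ud"),
    ("xpos", "token:xpos"),
    ("feats", "token:feats_ud"),
    ("head", "token:dephead_ud"),
    ("deprel", "token:deprel_ud"),
    ("deps", "token:deps_ud"),
    ("misc", "token:misc_ud"),
]


def parse_token_line(line: str) -> dict[str, str]:
    """Map one CoNLL-U token line to annotation names.

    Assumption: fields holding "_" (unspecified) are simply left out.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMN_TO_ANNOTATION):
        raise ValueError(f"expected 10 tab-separated columns, got {len(fields)}")
    return {
        annotation: value
        for (_column, annotation), value in zip(COLUMN_TO_ANNOTATION, fields)
        if value != "_"
    }


def parse_dict_like(value: str) -> dict[str, str]:
    """Split a dict-like value (feats or misc) into key/value pairs."""
    return dict(item.split("=", 1) for item in value.split("|"))


line = "1\tHunden\thund\tNOUN\tNN\tCase=Nom|Number=Sing\t2\tnsubj\t_\tSpaceAfter=No"
token = parse_token_line(line)
print(token["token:baseform"])
print(parse_dict_like(token["token:feats_ud"]))
```

The sketch keeps every value as a string; a real importer would also have to handle comment lines such as `# newdoc id =` and `# newpar`, which are not covered here.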

Known issues

Importing CoNLL-U

Sparv will log a warning if any of the following are encountered.

  • When importing CoNLL-U data, empty nodes are skipped, e.g. id = 5.1. Tracking issue
  • When importing CoNLL-U data, tokens inside multiword tokens are skipped, e.g. id = 2-3 is added, but id = 2 and id = 3 are skipped. Tracking issue
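
The two rules above can be sketched in Python as follows. This is a hedged illustration of the described behaviour, not the plugin's code: the function name `ids_to_import` is hypothetical, and the sketch assumes it receives the ids of a single sentence in file order, with each multiword range appearing before the words it covers.

```python
import re

# ID shapes used in the CoNLL-U id column.
WORD_ID = re.compile(r"^[0-9]+$")                # ordinary word, e.g. "2"
RANGE_ID = re.compile(r"^([0-9]+)-([0-9]+)$")    # multiword token, e.g. "2-3"
EMPTY_NODE_ID = re.compile(r"^[0-9]+\.[0-9]+$")  # empty node, e.g. "5.1"


def ids_to_import(ids: list[str]) -> list[str]:
    """Return the ids of one sentence that would be kept under the rules above."""
    kept: list[str] = []
    covered: set[int] = set()  # word ids that fall inside a multiword token
    for token_id in ids:
        if EMPTY_NODE_ID.match(token_id):
            continue  # empty nodes are skipped
        range_match = RANGE_ID.match(token_id)
        if range_match:
            start, end = int(range_match.group(1)), int(range_match.group(2))
            covered.update(range(start, end + 1))
            kept.append(token_id)  # the multiword token itself is added
        elif WORD_ID.match(token_id):
            if int(token_id) not in covered:
                kept.append(token_id)  # words inside a multiword token are skipped
        else:
            raise ValueError(f"unrecognised CoNLL-U id: {token_id!r}")
    return kept


print(ids_to_import(["1", "2-3", "2", "3", "4", "5", "5.1"]))
```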

Minimum Supported Python Version Policy

The Minimum Supported Python Version is fixed for a given minor (1.x) version. However, it can be increased when bumping minor versions, i.e. going from 1.0 to 1.1 allows us to increase the Minimum Supported Python Version. Users unable to increase their Python version can use an older minor version instead. Below is a list of sparv-sbx-conllu versions and their Minimum Supported Python Version:

  • v0.1: Python 3.11.

Note, however, that sparv-sbx-conllu also has dependencies, which might have different Minimum Supported Python Version policies. We try to stick to the above policy when updating dependencies, but this is not always possible.

Changelog

This project keeps a changelog.

License

This repository is licensed under the MIT license.

Development

Development prerequisites

To start developing on this repository:

  • Clone the repo (in one of the ways below):
    • git clone git@github.com:spraakbanken/sparv-sbx-conllu.git
    • git clone https://github.com/spraakbanken/sparv-sbx-conllu.git
  • Set up the environment: make dev
  • Install pre-commit hooks: pre-commit install

Do your work.

Tasks to do:

  • Test the code with make test or make test-w-coverage.
    • Snapshots can be updated with make snapshot-update
  • Lint the code with make lint.
  • Check formatting with make check-fmt.
  • Format the code with make fmt.
  • Type-check the code with make type-check.
  • Test the examples with:
    • make test-example-en_ewt-ud-test
    • make test-example-long-token-to-text
    • make test-example-no-metadata
    • make test-example-paragraph-and-document
    • make test-example-paragraph-in-sentence
    • make test-example-sentence-comments

This repo uses conventional commits.

Release a new version

  • Prepare the CHANGELOG: make prepare-release.
  • Edit CHANGELOG.md to your liking.
  • Add to git: git add --update
  • Commit with git commit -m 'chore(release): prepare release' or cog commit chore 'prepare release' release.
  • Bump version (depends on bump-my-version)
    • install with uv tool install bump-my-version
    • Major: make bumpversion part=major
    • Minor: make bumpversion part=minor
    • Patch: make bumpversion part=patch or make bumpversion
  • Push main and tags to GitHub: git push origin main --tags (assuming the remote is named origin) or make publish
  • Add metadata for Språkbanken's resource
