sparv-sbx-conllu
Sparv plugin for importing CoNLL-U files into Sparv.
Install
First, install Sparv as suggested:
pipx install sparv
Then install sparv-sbx-conllu with
sparv plugins install sparv-sbx-conllu
Usage
To use this plugin to import CoNLL-U files to your corpus, add the following to your config.yaml:
# file=config.yaml
import:
  importer: sbx_conllu:parse
Configuration
All annotations are exported by default, but if you want to use an annotation in another analysis,
you need to specify it in your config.yaml.
By default, sparv_sbx_conllu exports the annotations text, document, sentence and token.
Other annotations can be added to sbx_conllu.import_attributes.
For instance, to use the annotations token:xpos and sentence:sent_id, add them like this:
# file=config.yaml
sbx_conllu:
  import_attributes:
    - token:xpos
    - sentence:sent_id
Classes
To use annotations from sparv_sbx_conllu in other analyses, you may need to add them to classes
in your config.yaml.
Example
Suppose you want to use sentence and token from sparv_sbx_conllu, and also map token:xpos as token:pos and token:pos_ud as token:upos.
# file=config.yaml
classes:
  sentence: sentence
  token: token
  "token:pos": token:xpos
  "token:upos": token:pos_ud
Common annotations from a standard CoNLL-U file
These are some of the annotations you can expect from a CoNLL-U file; which ones are present depends on the file.
| Annotation | CoNLL-U column | Always? | Comment | Example in CoNLL-U |
| --- | --- | --- | --- | --- |
| text | | yes | the whole text | |
| document | | yes | either implicit, or at least one specified | |
| sentence | | yes | must contain at least one sentence | |
| token | | yes | must contain at least one token | |
| document:id | # newdoc id = | no | | # newdoc id = ID gives document:id = ID |
| paragraph | # newpar, or NewPar=Yes in misc column | no | Can exist around sentences and inside sentences | # newpar, or NewPar=Yes in misc column |
| paragraph:id | # newpar id = | no | | # newpar id = ID gives paragraph:id = ID |
| token:id | id column | no | Always present in standard CoNLL-U | |
| token span over this form in the text | form column | no | Always present in standard CoNLL-U, may contain spaces | |
| token:baseform | lemma column | no | May contain spaces | |
| token:pos_ud | upos column | no | UD POS | |
| token:xpos | xpos column | no | custom POS (no standard) | |
| token:feats_ud | feats column | no | Dict-like values | Case=Nom\|Gender=Fem\|Number=Sing\|Polarity=Pos |
| token:dephead_ud | head column | no | integer | |
| token:deprel_ud | deprel column | no | UD-dep value | |
| token:deps_ud | deps column | no | At least one pair head:deprel | 2:obj\|4:obj |
| token:misc_ud | misc column | no | Dict-like values | SpaceAfter=No |
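To make the column-to-annotation mapping in the table more concrete, here is a minimal Python sketch that splits one CoNLL-U token line into its ten standard columns and unpacks the dict-like feats and misc values. It is an illustration only, not the plugin's actual parser: the function names `parse_token_line` and `parse_dict_like` and the sample line are assumptions made up for this example.

```python
# Illustrative sketch only; this is NOT how sparv-sbx-conllu is implemented.
# Function names and the sample line are hypothetical.

# The ten columns of a standard CoNLL-U token line, in order.
CONLLU_COLUMNS = (
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)


def parse_dict_like(value: str) -> dict[str, str]:
    """Split a dict-like value such as 'Case=Nom|Gender=Fem' into a dict."""
    if value == "_":  # '_' marks an unspecified value in CoNLL-U
        return {}
    result = {}
    for item in value.split("|"):
        # partition never raises, even if an item lacks '='
        key, _, val = item.partition("=")
        result[key] = val
    return result


def parse_token_line(line: str) -> dict[str, str]:
    """Map the tab-separated fields of a token line to their column names."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(CONLLU_COLUMNS):
        raise ValueError(
            f"expected {len(CONLLU_COLUMNS)} fields, got {len(fields)}"
        )
    return dict(zip(CONLLU_COLUMNS, fields))


# A made-up token line for demonstration purposes.
line = "1\tHon\thon\tPRON\tPN\tCase=Nom|Number=Sing\t2\tnsubj\t_\tSpaceAfter=No"
token = parse_token_line(line)
feats = parse_dict_like(token["feats"])
misc = parse_dict_like(token["misc"])
```

In this sketch, `token["xpos"]` corresponds to what the table calls token:xpos, and `feats` to token:feats_ud; how the plugin stores these values internally is not shown here.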
Known issues
Importing CoNLL-U
Sparv will log a warning if any of the following are encountered.
- When importing CoNLL-U data, empty nodes are skipped. E.g. id = 5.1. Tracking issue
- When importing CoNLL-U data, tokens inside multiword tokens are skipped. E.g. id = 2-3 are added, but id = 2 and id = 3 are skipped. Tracking issue
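The skipping rules above can be sketched in a few lines of Python. This is a hedged illustration of the described behaviour, not the plugin's code; `classify_token_id` and `imported_ids` are hypothetical helper names.

```python
import re

# Illustrative sketch only; helper names are hypothetical and this is
# not taken from the sparv-sbx-conllu source.


def classify_token_id(token_id: str) -> str:
    """Classify a CoNLL-U ID as 'multiword' (2-3), 'empty_node' (5.1) or 'word' (4)."""
    if re.fullmatch(r"\d+-\d+", token_id):
        return "multiword"
    if re.fullmatch(r"\d+\.\d+", token_id):
        return "empty_node"
    if re.fullmatch(r"\d+", token_id):
        return "word"
    raise ValueError(f"not a valid CoNLL-U token id: {token_id!r}")


def imported_ids(token_ids: list[str]) -> list[str]:
    """Return the IDs that remain after applying the skipping rules above."""
    # First pass: collect the word IDs covered by a multiword range.
    covered: set[int] = set()
    for tid in token_ids:
        if classify_token_id(tid) == "multiword":
            start, end = (int(part) for part in tid.split("-"))
            covered.update(range(start, end + 1))

    # Second pass: drop empty nodes and words inside a multiword token.
    kept = []
    for tid in token_ids:
        kind = classify_token_id(tid)
        if kind == "empty_node":
            continue
        if kind == "word" and int(tid) in covered:
            continue
        kept.append(tid)
    return kept


ids = ["1", "2-3", "2", "3", "4", "5", "5.1"]
print(imported_ids(ids))  # ['1', '2-3', '4', '5']
```

The multiword range 2-3 is kept while its parts 2 and 3 are dropped, and the empty node 5.1 is dropped, matching the two cases listed above.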
Minimum Supported Python Version Policy
The Minimum Supported Python Version is fixed for a given minor (1.x) version. However, it can be increased when bumping minor versions, i.e. going from 1.0 to 1.1 allows us to increase the Minimum Supported Python Version. Users unable to increase their Python version can use an older minor version instead. Below is a list of sparv-sbx-conllu versions and their Minimum Supported Python Version:
- v0.1: Python 3.11.
Note, however, that sparv-sbx-conllu also has dependencies, which might have different Minimum Supported Python Version policies. We try to stick to the above policy when updating dependencies, but this is not always possible.
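If you want to check an interpreter against the minimum listed above before installing, a small sketch like the following could be used. The helper name `meets_minimum` is an assumption for this example and is not part of the plugin; the `(3, 11)` value is taken from the v0.1 entry in the list.

```python
import sys

# Minimum Supported Python Version for sparv-sbx-conllu v0.1, per the list above.
MINIMUM_PYTHON = (3, 11)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_PYTHON) -> bool:
    """Return True if the (major, minor) part of version_info is at least minimum.

    Hypothetical helper for illustration; not provided by the plugin.
    """
    return tuple(version_info[:2]) >= minimum


if not meets_minimum():
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} "
        f"is older than the required {MINIMUM_PYTHON[0]}.{MINIMUM_PYTHON[1]}"
    )
```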
Changelog
This project keeps a changelog.
License
This repository is licensed under the MIT license.
Development
Development prerequisites
To start developing on this repository:
- Clone the repo (in one of the ways below):
  git clone git@github.com:spraakbanken/sparv-sbx-conllu.git
  git clone https://github.com/spraakbanken/sparv-sbx-conllu.git
- Set up the environment: make dev
- Install pre-commit hooks: pre-commit install
Do your work.
Tasks to do:
- Test the code with make test or make test-w-coverage.
  - Snapshots can be updated with make snapshot-update
- Lint the code with make lint.
- Check formatting with make check-fmt.
- Format the code with make fmt.
- Type-check the code with make type-check.
- Test the examples with:
  make test-example-en_ewt-ud-test
  make test-example-long-token-to-text
  make test-example-no-metadata
  make test-example-paragraph-and-document
  make test-example-paragraph-in-sentence
  make test-example-sentence-comments
This repo uses conventional commits.
Release a new version
- Prepare the CHANGELOG: make prepare-release.
- Edit CHANGELOG.md to your liking.
- Add to git: git add --update
- Commit with git commit -m 'chore(release): prepare release' or cog commit chore 'prepare release' release.
- Bump version (depends on bump-my-version)
  - Install with uv tool install bump-my-version
  - Major: make bumpversion part=major
  - Minor: make bumpversion part=minor
  - Patch: make bumpversion part=patch or make bumpversion
- Push main and tags to GitHub: git push origin main --tags or make publish
  - A GitHub Actions workflow will build, test and publish the package to PyPI.
- Add metadata for Språkbanken's resource
  - Generate metadata: make generate-metadata
  - Upload the files from assets/metadata/export/sbx_metadata/utility to https://github.com/spraakbanken/metadata/tree/main/yaml/utility.