Helper for converting CoNLL-U files and uploading the corpus to the LiRI Corpus Platform (LCP)
LCP CLI module
Command-line tool for converting CoNLL-U files and uploading the resulting corpus to LCP
Installation
Make sure you have Python 3.11+ with pip installed in your local environment, then run:
pip install lcpcli
Usage
Examples:
Conversion of a CoNLL-U (Plus) corpus:
lcpcli -i ~/conll_ext/ -o ~/upload/
Data upload:
lcpcli -c ~/upload/ -k $API_KEY -s $API_SECRET -p "my project" --live
Including --live points the upload to the live instance of LCP. Leave it out if you want to add a corpus to an instance of LCP running on localhost.
Help:
lcpcli --help
lcpcli can take a corpus of CoNLL-U (Plus) files and import it into a collection created on LCP.
Besides the standard token-level CoNLL-U fields (form, lemma, upos, xpos, feats, head, deprel, deps) one can also provide document-, paragraph- and sentence-level annotations using comment lines in the files (see the CoNLL-U Format section).
CoNLL-U Format
The CoNLL-U format is documented at: https://universaldependencies.org/format.html
The LCP CLI converter treats all comments that start with # newdoc KEY = VALUE as document-level attributes, and all comments that start with # newpar KEY = VALUE as paragraph-level attributes. All other comment lines of the form # key = value are treated as sentence-level attributes.
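The three comment levels can be illustrated with a minimal Python sketch. This is not lcpcli's actual implementation: the function name classify_comment, the regular expression and the returned tuple are all assumptions made for illustration.

```python
import re

# Hypothetical helper, not part of lcpcli: sorts a CoNLL-U comment line
# into the document, paragraph or sentence level described above.
_COMMENT = re.compile(r"^#\s*(?:(newdoc|newpar)\s+)?([^=\s][^=]*?)\s*=\s*(.*)$")

_LEVELS = {"newdoc": "document", "newpar": "paragraph", None: "sentence"}


def classify_comment(line: str):
    """Return (level, key, value), or None if the line is not a key = value comment."""
    match = _COMMENT.match(line.strip())
    if match is None:
        return None
    prefix, key, value = match.groups()
    return _LEVELS[prefix], key.strip(), value.strip()


if __name__ == "__main__":
    for example in (
        "# newdoc title = Moby Dick",
        "# newpar speaker = Ishmael",
        "# sent_id = 1",
    ):
        print(classify_comment(example))
```

A comment without a key = value part, such as a bare # newdoc, returns None in this sketch; how lcpcli itself handles such lines is not stated here.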
The key-value pairs in the FEATS and MISC columns of a token line will be mapped to corresponding attributes in the LCP corpus. Additionally, if the MISC cell includes SpaceAfter=Yes or SpaceAfter=No (case-sensitive), the token will be represented with or without a trailing space character in the database, respectively.
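As a rough illustration of this mapping, the sketch below splits a FEATS or MISC cell into key-value pairs and reads the SpaceAfter value. The helper names parse_pairs and trailing_space are hypothetical and not part of lcpcli. The default used when SpaceAfter is absent is an assumption borrowed from the Universal Dependencies convention (a space follows the token), not something this README states.

```python
def parse_pairs(cell: str) -> dict[str, str]:
    """Split a cell such as 'Number=Sing|Person=3' into a dict ('_' means empty)."""
    if cell in ("_", ""):
        return {}
    pairs = {}
    for item in cell.split("|"):
        key, sep, value = item.partition("=")
        if sep:  # skip items that are not of the form key=value
            pairs[key] = value
    return pairs


def trailing_space(misc: str, default: bool = True) -> bool:
    """Return True if the token should be followed by a space.

    The comparison is case-sensitive: only 'SpaceAfter=Yes' and
    'SpaceAfter=No' are recognised. Anything else falls back to
    `default`, which is an assumption of this sketch.
    """
    value = parse_pairs(misc).get("SpaceAfter")
    if value == "Yes":
        return True
    if value == "No":
        return False
    return default


if __name__ == "__main__":
    print(parse_pairs("Number=Sing|Person=3"))
    print(trailing_space("SpaceAfter=No"))
```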
CoNLL-U Plus
CoNLL-U Plus is an extension to the CoNLL-U format documented at: https://universaldependencies.org/ext-format.html
If your files start with a comment line of the form # global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC, lcpcli will treat them as CoNLL-U Plus files and process the columns according to the names you set in that line.
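The effect of the global.columns line can be sketched in Python as follows. This is an illustration only: read_columns and token_to_dict are hypothetical names, and falling back to the ten standard CoNLL-U columns when the line is missing is an assumption about plain CoNLL-U files rather than a description of lcpcli's internals.

```python
# The ten standard CoNLL-U columns, used when no global.columns line is present.
DEFAULT_COLUMNS = [
    "ID", "FORM", "LEMMA", "UPOS", "XPOS",
    "FEATS", "HEAD", "DEPREL", "DEPS", "MISC",
]
PREFIX = "# global.columns ="


def read_columns(first_line: str) -> list[str]:
    """Return the column names declared on the first line of a file."""
    line = first_line.strip()
    if line.startswith(PREFIX):
        return line[len(PREFIX):].split()
    return list(DEFAULT_COLUMNS)


def token_to_dict(token_line: str, columns: list[str]) -> dict[str, str]:
    """Pair each tab-separated cell of a token line with its column name."""
    return dict(zip(columns, token_line.rstrip("\n").split("\t")))


if __name__ == "__main__":
    columns = read_columns("# global.columns = ID FORM UPOS MISC")
    print(token_to_dict("1\tCall\tVERB\t_", columns))
```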
CoNLL-U conversion and upload
- Create a directory that contains all your properly formatted CoNLL-U files.
- Visit an LCP instance (e.g. catchphrase) and create a new collection if you don't already have one where your corpus should go.
- Retrieve the API key and secret for your project by clicking on the button that says: "Create API Key".
- Once you have your API key and secret, you can start converting and uploading your corpus by running the following command:
lcpcli -i $CONLLU_FOLDER -o $OUTPUT_FOLDER -k $API_KEY -s $API_SECRET -p $PROJECT_NAME --live
- $CONLLU_FOLDER should point to the folder that contains your CoNLL-U files
- $OUTPUT_FOLDER should point to another folder that will be used to store the converted files to be uploaded
- $API_KEY is the key you copied from your project on LCP (still visible when you visit the page)
- $API_SECRET is the secret you copied from your project on LCP (only visible upon API Key creation)
- $PROJECT_NAME is the name of the project exactly as displayed on LCP; it is case-sensitive, and space characters should be escaped
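If you launch lcpcli from a script, one way to avoid escaping the project name by hand is to build the command as an argument list. The sketch below uses only the flags shown above. The function name build_lcpcli_command and the paths, key and secret in the example are placeholders, not real values.

```python
import shlex


def build_lcpcli_command(conllu_folder, output_folder, api_key, api_secret,
                         project_name, live=True):
    """Assemble the convert-and-upload command as a list of arguments."""
    argv = [
        "lcpcli",
        "-i", conllu_folder,
        "-o", output_folder,
        "-k", api_key,
        "-s", api_secret,
        "-p", project_name,
    ]
    if live:
        # Omit --live to target an LCP instance running on localhost.
        argv.append("--live")
    return argv


if __name__ == "__main__":
    argv = build_lcpcli_command(
        "/data/conll", "/data/upload", "KEY", "SECRET", "my project"
    )
    # shlex.join quotes the project name so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Passing the list to subprocess.run(argv, check=True) would hand the project name to lcpcli as a single argument, spaces included, without any shell quoting.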
Other input formats, rich data
Previous versions of lcpcli defined procedures to include rich annotations in CoNLL-U files, including time-anchored media files, in combination with accompanying non-CoNLL-U files. These methods are no longer supported; use an older version of lcpcli if you require those features.
lcpcli now ships with a Python module called lcpcli.builder that you can use to convert any input format. The default CoNLL-U converter included in lcpcli uses lcpcli.builder under the hood.
You can find a short tutorial on how to use the module in BUILDER.md. Further information can be found in the LCP documentation.
File details
Details for the file lcpcli-0.3.0.tar.gz.
File metadata
- Download URL: lcpcli-0.3.0.tar.gz
- Upload date:
- Size: 10.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e10f8ecbd246c478e8651ed9e317a0163b262fe3e640070d7a65f4db9e0c397c |
| MD5 | 9e1f856a520c78fdfd09db7757b91e61 |
| BLAKE2b-256 | 247149b63f354557c951bfb1299d3aead4b9a51fc6e47e0c5e7a2967db86c4c2 |
File details
Details for the file lcpcli-0.3.0-py3-none-any.whl.
File metadata
- Download URL: lcpcli-0.3.0-py3-none-any.whl
- Upload date:
- Size: 10.6 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 70c3e9934f9d519080426362185dfd31c060fa42db4c88354edc242bbafbe2b2 |
| MD5 | a705323ce3e946620675a0547f4c3aee |
| BLAKE2b-256 | aea3cabb1a53566ef47b56a76dfe243b07fd7f3b676638ad0a2c68b3ed24a8d1 |