cldflex
Convert FLEx data to CLDF-ready CSV.
Many descriptive linguists have annotated language data in a FLEx (SIL's FieldWorks Language Explorer) database; FLEx is perhaps the most popular and accessible tool for assisted segmentation and annotation.
However, a reasonably complete data export is only available in XML, which is not human-friendly and is not readily converted to other formats.
A data format growing in popularity is the CLDF standard, a table-based approach with human-readable datasets, designed to be used in CLLD apps and easily processable by any software that can read CSV files, including R, pandas or spreadsheet applications.
The goal of cldflex is to convert lexicon and corpus data stored in FLEx to CSV tables, primarily for use in CLDF datasets.
Installation
cldflex is available on PyPI:
pip install cldflex
Usage
At the moment, there are two commands: cldflex flex2csv processes .flextext (corpora) files, and cldflex lift2csv processes .lift (lexica) files.
Both commands create a number of CSV files.
One can either use cldfbench to create one's own CLDF datasets from these files, or add the --cldf argument to create (simple) datasets.
Project-specific configuration can be passed via --conf your/config.yaml.
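Because both commands write plain CSV, the results can be inspected with standard tooling. The following is a minimal sketch using only the Python standard library; the directory name `output` and the helper names `load_csv_tables` and `summarize` are hypothetical, and the actual file names and columns depend on what cldflex produces for your project.

```python
import csv
from pathlib import Path


def load_csv_tables(directory):
    """Read every *.csv file in `directory` into a list of row dicts.

    Returns a dict mapping each file's stem (name without ".csv") to its rows.
    """
    tables = {}
    for path in sorted(Path(directory).glob("*.csv")):
        # newline="" is what the csv module expects; UTF-8 is an assumption here.
        with path.open(newline="", encoding="utf-8") as handle:
            tables[path.stem] = list(csv.DictReader(handle))
    return tables


def summarize(tables):
    """Return (table name, row count, column names) for each table."""
    summary = []
    for name, rows in tables.items():
        columns = list(rows[0].keys()) if rows else []
        summary.append((name, len(rows), columns))
    return summary


if __name__ == "__main__":
    # "output" is a hypothetical directory; point it at wherever the CSV files were written.
    for name, count, columns in summarize(load_csv_tables("output")):
        print(f"{name}: {count} rows, columns: {', '.join(columns)}")
```

A quick summary like this is a convenient way to check which tables were written and which columns they contain before building a dataset from them.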
flex2csv
Basic usage:
cldflex flex2csv texts.flextext
Connect the corpus with the lexicon:
cldflex flex2csv texts.flextext --lexicon lexicon.lift
Create a CLDF dataset:
cldflex flex2csv texts.flextext --lexicon lexicon.lift --cldf
lift2csv
Extract morphemes, morphs, and entries from lexicon.lift:
cldflex lift2csv lexicon.lift
Create a CLDF dataset with a Dictionary module:
cldflex lift2csv lexicon.lift --cldf
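When several corpora or lexica need converting, the invocations above can be assembled from a script. The sketch below only builds the argument list from the options shown in this document (`--lexicon`, `--cldf`, `--conf`); the helper name `build_cldflex_command` is hypothetical and not part of cldflex, and restricting `--lexicon` to `flex2csv` is a conservative choice of this sketch, since that is the only place the option is shown above.

```python
import shlex


def build_cldflex_command(subcommand, source, lexicon=None, cldf=False, conf=None):
    """Build an argument list for a cldflex call (hypothetical helper).

    Only options that appear in the usage examples above are supported.
    """
    if subcommand not in ("flex2csv", "lift2csv"):
        raise ValueError(f"unknown subcommand: {subcommand}")
    command = ["cldflex", subcommand, str(source)]
    if lexicon is not None:
        # Assumption of this sketch: --lexicon is only used with flex2csv.
        if subcommand != "flex2csv":
            raise ValueError("--lexicon is only shown for flex2csv")
        command += ["--lexicon", str(lexicon)]
    if conf is not None:
        command += ["--conf", str(conf)]
    if cldf:
        command.append("--cldf")
    return command


if __name__ == "__main__":
    # Print the commands instead of running them; file names are placeholders.
    for command in (
        build_cldflex_command("flex2csv", "texts.flextext", lexicon="lexicon.lift", cldf=True),
        build_cldflex_command("lift2csv", "lexicon.lift", cldf=True),
    ):
        print(shlex.join(command))
```

With cldflex installed, the returned list can be passed to `subprocess.run(command, check=True)` to execute the conversion.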