Tools for converting data between different formats and the Ligt RDF specification, and for searching it
ligttools
A collection of tools for converting IGT (Interlinear Glossed Text) data between different formats, including Ligt, an RDF specification.
Overview
ligttools is a Python library and collection of command-line tools for working with Interlinear Glossed Text in RDF. It provides utilities for converting data between various commonly used formats (Toolbox, FLEx, etc.) and RDF (Resource Description Framework) using the Ligt vocabulary.
Installation
Install ligttools using pip:
git clone https://github.com/ligt-dev/ligttools.git
cd ligttools
pip install .
After installing the package, the command-line tool ligt-convert will be available on your system.
If you installed the package in a virtual environment, make sure the environment is activated before using the tool.
For Developers
For development, we recommend using uv. To set up the environment:
# Clone the repository
git clone https://github.com/ligt-dev/ligttools.git
cd ligttools
uv sync
# For development dependencies (testing, etc.)
uv sync --extra dev
Available Tools
ligt-convert
A tool for converting data between common IGT data formats and RDF-based Ligt:
# Convert from CLDF to Ligt
ligt-convert -f cldf -t ligt input.json -o output.rdf
# Convert from Ligt to Toolbox
ligt-convert -f ligt -t toolbox input.rdf -o output.txt
# You can also use long-form flags:
ligt-convert --from=cldf --to=ligt examples.csv --output=examples.ttl
# List supported formats
ligt-convert --list-formats
For advanced usage:
# Read from stdin (specify input format explicitly)
cat input.json | ligt-convert -f cldf -t ligt -o output.ttl
# Write to stdout (omit the output file)
ligt-convert -f cldf -t ligt examples.csv
# Specify the RDF serialisation as a suffix of the target format (default is Turtle)
ligt-convert -f cldf -t ligt.n3 examples.csv
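To drive ligt-convert from a Python script, you can wrap the documented flags in a subprocess call. The sketch below uses hypothetical helpers (build_convert_args and run_convert are not part of ligttools); it relies only on the -f, -t and -o flags shown above and assumes that a serialisation suffix is appended to the target format with a dot, as in ligt.n3.

```python
import subprocess
from typing import Optional


def build_convert_args(src: Optional[str], from_fmt: str, to_fmt: str,
                       output: Optional[str] = None,
                       serialisation: Optional[str] = None) -> list[str]:
    """Build an argument list for ligt-convert (hypothetical helper)."""
    # Assumption: the serialisation is chosen with a suffix such as "ligt.n3"
    target = f"{to_fmt}.{serialisation}" if serialisation else to_fmt
    args = ["ligt-convert", "-f", from_fmt, "-t", target]
    if src is not None:
        # Omitting the input file makes ligt-convert read from stdin
        args.append(src)
    if output is not None:
        # Omitting -o makes ligt-convert write to stdout
        args += ["-o", output]
    return args


def run_convert(src: Optional[str], from_fmt: str, to_fmt: str,
                output: Optional[str] = None,
                serialisation: Optional[str] = None,
                stdin_text: Optional[str] = None) -> str:
    """Run ligt-convert and return whatever it wrote to stdout."""
    result = subprocess.run(
        build_convert_args(src, from_fmt, to_fmt, output, serialisation),
        input=stdin_text,     # piped to the tool when src is None
        capture_output=True,
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

For example, run_convert("examples.csv", "cldf", "ligt", serialisation="n3") would return the N3 output as a string, provided ligt-convert is on your PATH.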
ligt-search
A simple command-line interface for searching Ligt examples across local and remote datasets and SPARQL endpoints. It also accepts additional triples containing local annotations and a table mapping string labels to external ontologies.
To extract sentences containing a morph glossed GEN from a local Turtle file:
uv run ligt-search -q ":GEN" test-data.ttl
To extract sentences containing a morph with the form "cat" and a morph with the form "ed" glossed PST from a remote SPARQL endpoint:
uv run ligt-search -q "cat ed:PST" http://sparql-endpoint-url/sparql
To extract sentences from a local data file containing a morph with the form "l" whose gloss is linked to the past-tense concept of an external ontology (here UniMorph in OLiA), using an additional mapping file:
uv run ligt-search -q "l:<https://purl.org/olia/unimorph/unimorph.owl#PST>" test-data.ttl \
test-mappings.ttl https://raw.githubusercontent.com/acoli-repo/olia/refs/heads/master/owl/experimental/unimorph/unimorph.owl
Query syntax
Disclaimer: The query language built into the tool is rudimentary and temporary. Further development of the tool, including its planned integration into a GUI application, will bring significant changes.
The anatomy of a query is as follows:
- The query specifies filters on utterances represented in the given datasets.
- Each space-separated token corresponds to a ligt:Word object, i.e. a word-like token.
- Token order and co-occurrence are not constrained by the query.
- Each token has the form <form>[:<gloss>], where the gloss can be a string literal or a URI in angle brackets.
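The token syntax above can be illustrated with a small parser. This is a hypothetical sketch, not the parser used by ligt-search (the Python API exposes QueryArg.from_token for that). It splits each token at its first colon, so forms containing a colon are not supported, and it keeps URI glosses in angle brackets, mirroring the QueryArg examples in the Python API section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedToken:
    """One query token: an optional form and an optional gloss (hypothetical)."""
    form: Optional[str]
    gloss: Optional[str]
    gloss_is_uri: bool


def parse_token(token: str) -> ParsedToken:
    # Split at the first colon only; URIs such as <https://...#PST>
    # contain further colons that must stay inside the gloss.
    form, sep, gloss = token.partition(":")
    if not sep:
        # No colon: the token constrains the form only
        return ParsedToken(form or None, None, False)
    is_uri = gloss.startswith("<") and gloss.endswith(">")
    return ParsedToken(form or None, gloss or None, is_uri)


def parse_query(query: str) -> list[ParsedToken]:
    # Tokens are separated by whitespace; their order carries no meaning
    return [parse_token(t) for t in query.split()]
```

For instance, parse_query("cat ed:PST") yields one token with the form "cat" and no gloss, and one with the form "ed" and the gloss "PST".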
enligten / ligt-serve
A Flask-based REST API that extracts IGT data from supported formats and converts it to RDF on the fly with ligt-convert. Usage:
# Start the server on default port (8080) and host (0.0.0.0)
enligten
# Start the server on specific host and port
enligten -p 5000 -h 127.0.0.1
# Alternatively, start the server using the alias that matches the naming of the other tools:
ligt-serve
Calling the API:
# Convert from CLDF to Ligt
curl "http://localhost:8080/https://raw.githubusercontent.com/cldf-datasets/apics/refs/heads/master/cldf/StructureDataset-metadata.json"
# Convert from CLDF to Ligt with format specified explicitly
curl "http://localhost:8080/https://raw.githubusercontent.com/cldf-datasets/apics/refs/heads/master/cldf/StructureDataset-metadata.json" -H "format: cldf"
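The same requests can be sent from Python with the standard library. In this sketch, build_ligt_request and fetch_ligt are hypothetical helpers; it assumes the server runs at http://localhost:8080, that the dataset URL is appended to the server URL as in the curl examples, and that the response body is UTF-8 text.

```python
from typing import Optional
from urllib.request import Request, urlopen


def build_ligt_request(dataset_url: str,
                       base: str = "http://localhost:8080",
                       fmt: Optional[str] = None) -> Request:
    """Build a request for the enligten API (hypothetical helper)."""
    # The optional "format" header mirrors: curl ... -H "format: cldf"
    headers = {"format": fmt} if fmt else {}
    return Request(f"{base.rstrip('/')}/{dataset_url}", headers=headers)


def fetch_ligt(dataset_url: str, base: str = "http://localhost:8080",
               fmt: Optional[str] = None, timeout: float = 60.0) -> str:
    """Send the request and return the converted RDF as text."""
    with urlopen(build_ligt_request(dataset_url, base, fmt), timeout=timeout) as resp:
        # Assumption: the server returns UTF-8 encoded RDF
        return resp.read().decode("utf-8")
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before contacting the server.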
Other tools (in development)
- ligt-validate: Validates data against the Ligt schema
- ligt-query: Queries RDF data using SPARQL
- ligt-visualize: Visualizes linguistic data structures
Python API
You can also use LigtTools as a Python library:
Conversion
from ligttools.converters import get_converter, get_supported_formats
# Convert CLDF data to RDF (Ligt)
cldf_converter = get_converter('cldf')
rdf_data = cldf_converter.to_rdf('examples.csv', 'output.ttl')
# Convert RDF (Ligt) back to CLDF
cldf_data = cldf_converter.from_rdf('input.ttl', 'output.csv')
# Get the list of supported formats
formats = get_supported_formats()
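To convert several files in one go, you can loop over them with a converter returned by get_converter. The helper below (convert_all) is a hypothetical sketch; it assumes only the to_rdf(input, output_path) call shown above and takes the converter as a parameter, so it works with any object that has such a method.

```python
from pathlib import Path
from typing import Iterable, Protocol


class ToRdfConverter(Protocol):
    """Anything with a to_rdf(input, output_path) method, as in the example above."""
    def to_rdf(self, input_data, output_path=None): ...


def convert_all(converter: ToRdfConverter, inputs: Iterable[str],
                out_dir: str, suffix: str = ".ttl") -> list[Path]:
    """Convert each input file to RDF, writing <stem><suffix> into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in inputs:
        dst = out / (Path(src).stem + suffix)
        # Paths are passed as strings, matching the call in the example above
        converter.to_rdf(str(src), str(dst))
        written.append(dst)
    return written
```

For instance, convert_all(get_converter('cldf'), ['a.csv', 'b.csv'], 'rdf-out') would write rdf-out/a.ttl and rdf-out/b.ttl. Inputs that share a file stem would overwrite each other's output.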
Search
Import the necessary classes and functions, then initialise a graph:
from ligttools.search import Dataset
from ligttools.search.sparql import create_graph
datasets = [
    Dataset("test-data.ttl", is_sparql=False),
    Dataset("http://sparql-endpoint-url/sparql", is_sparql=True),
    # A dataset can also be initialised from a string
    Dataset.from_string("https://remote.url/dataset.ttl"),
]
g = create_graph(datasets)
Now we can define the arguments and run the query:
from ligttools.search import QueryArg
from ligttools.search.sparql import get_results
args = [
    QueryArg("s", "PL", is_uri=False),
    QueryArg(None, "NOM"),
    QueryArg(None, "<https://purl.org/olia/unimorph/unimorph.owl#PST>", is_uri=True),
    # An argument can also be parsed from a string
    QueryArg.from_token(":PST"),
]
# We need to explicitly provide a list of remote SPARQL endpoints
endpoints = [ds.url for ds in datasets if ds.is_sparql]
for row in get_results(g, endpoints, args):
    print(row)
Supported Formats
Currently, ligttools supports the following formats:
- CLDF
- Toolbox
- FLExText
Extending ligttools
To add support for a new format:
- Create a new converter class that extends BaseConverter
- Implement the to_rdf and from_rdf methods
- Register the converter using the registration function
Example:
from ligttools.converters.base import BaseConverter
from ligttools.converters import register_converter
class ELANConverter(BaseConverter):
    def to_rdf(self, input_data, output_path=None):
        # Implementation: convert ELAN input to Ligt RDF...
        raise NotImplementedError
    def from_rdf(self, input_data, output_path=None):
        # Implementation: convert Ligt RDF back to ELAN...
        raise NotImplementedError
# Register the converter under its format name
register_converter('elan', ELANConverter)
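The registration step follows a common registry pattern: a base class fixes the interface, and a dictionary maps format names to converter classes. The sketch below illustrates that pattern in isolation; ConverterBase, register, lookup and supported are hypothetical names, and this is not ligttools' actual implementation of BaseConverter or register_converter.

```python
from abc import ABC, abstractmethod


class ConverterBase(ABC):
    """Interface every converter must implement (hypothetical)."""

    @abstractmethod
    def to_rdf(self, input_data, output_path=None): ...

    @abstractmethod
    def from_rdf(self, input_data, output_path=None): ...


# Maps lower-case format names to converter classes
_REGISTRY: dict[str, type[ConverterBase]] = {}


def register(name: str, cls: type[ConverterBase]) -> None:
    """Make a converter class available under a format name."""
    if not issubclass(cls, ConverterBase):
        raise TypeError(f"{cls.__name__} must subclass ConverterBase")
    _REGISTRY[name.lower()] = cls


def lookup(name: str) -> ConverterBase:
    """Instantiate the converter registered for a format name."""
    try:
        return _REGISTRY[name.lower()]()
    except KeyError:
        raise ValueError(f"Unsupported format: {name!r}") from None


def supported() -> list[str]:
    """List registered format names in alphabetical order."""
    return sorted(_REGISTRY)
```

Because abstract methods cannot be left out, a converter that forgets from_rdf fails at instantiation time rather than midway through a conversion.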
License
This software is licensed under the MIT License.