Tools for converting and searching data across different formats and the Ligt RDF specification

ligttools

A collection of tools for converting IGT (Interlinear Glossed Text) data between different formats, including Ligt, an RDF specification.

Overview

ligttools is a Python library and collection of command-line tools for working with Interlinear Glossed Text in RDF. It provides utilities for converting data between various commonly used formats (Toolbox, FLEx, etc.) and RDF (Resource Description Framework) using the Ligt vocabulary.

Installation

Install ligttools from source using pip:

git clone https://github.com/ligt-dev/ligttools.git
cd ligttools
pip install .

After installation, the ligt-convert command-line tool is available on your system.

If you installed the package in a virtual environment, make sure the environment is activated before using the tool.

For Developers

For development, we recommend using uv. To set up the environment:

# Clone the repository
git clone https://github.com/ligt-dev/ligttools.git
cd ligttools
uv sync

# For development dependencies (testing, etc.)
uv sync --extra dev

Available Tools

ligt-convert

A tool for converting data between common IGT data formats and RDF-based Ligt:

# Convert from CLDF to Ligt
ligt-convert -f cldf -t ligt input.json -o output.rdf

# Convert from Ligt to Toolbox 
ligt-convert -f ligt -t toolbox input.rdf -o output.json

# You can also use long-form flags:
ligt-convert --from=cldf --to=ligt examples.csv --output=examples.ttl

# List supported formats
ligt-convert --list-formats

For advanced usage:

# Read from stdin (specify input format explicitly)
cat input.json | ligt-convert -f cldf -t ligt -o output.ttl

# Write to stdout (omit the output file)
ligt-convert -f cldf -t ligt examples.csv

# Specify RDF serialisation (default is Turtle)
ligt-convert -f cldf -t ligt.n3 examples.csv
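
For scripted pipelines, the same invocations can be assembled from Python. The sketch below is a minimal illustration and not part of ligttools: build_convert_command and run_convert are hypothetical helpers that use only the flags shown above (-f, -t, -o), and running the command assumes ligt-convert is on the PATH.

```python
import subprocess
from typing import List, Optional


def build_convert_command(source_format: str, target_format: str,
                          input_path: Optional[str] = None,
                          output_path: Optional[str] = None) -> List[str]:
    """Assemble an argv list for ligt-convert (hypothetical helper)."""
    cmd = ["ligt-convert", "-f", source_format, "-t", target_format]
    if input_path is not None:   # when omitted, ligt-convert reads from stdin
        cmd.append(input_path)
    if output_path is not None:  # when omitted, ligt-convert writes to stdout
        cmd += ["-o", output_path]
    return cmd


def run_convert(cmd: List[str]) -> str:
    """Run the command and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


command = build_convert_command("cldf", "ligt", "examples.csv", "examples.ttl")
print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.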

ligt-search

A simple command-line interface for searching Ligt examples across local and remote datasets and SPARQL endpoints. It also accepts additional triples containing local annotations and a table mapping string labels to external ontologies.

To extract sentences containing a morph glossed as GEN from a local Turtle file:

uv run ligt-search -q ":GEN" test-data.ttl

To extract sentences containing a morph with the form "cat" and a morph with the form "ed" glossed as PST from a remote SPARQL endpoint:

uv run ligt-search -q "cat ed:PST" http://sparql-endpoint-url/sparql

To extract sentences from a local data file that contain a morph with the form "l" whose gloss is linked to the past-tense category of an external ontology (the mapping file and the ontology are passed as additional datasets):

uv run ligt-search -q "l:<https://purl.org/olia/unimorph/unimorph.owl#PST>" test-data.ttl \
   test-mappings.ttl https://raw.githubusercontent.com/acoli-repo/olia/refs/heads/master/owl/experimental/unimorph/unimorph.owl

Query syntax

Disclaimer: The query language built into the tool is rudimentary and temporary. It is expected to change significantly as the tool is developed further and integrated into a GUI application.

The anatomy of a query is the following:

  • The query specifies filters on utterances represented in the specified datasets
  • Each space-separated token corresponds to a ligt:Word object, i.e. a word-like token
  • The order and co-occurrence of tokens are not constrained by the query
  • Each token has the form <form>[:<gloss>], where gloss can be a string literal or a URI in angle brackets; the form may be left empty, as in :GEN
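
The token syntax above can be illustrated with a small standalone parser. This is a minimal sketch written for this README, not the parser used by ligt-search: Token and parse_query are hypothetical names, and the sketch assumes that a gloss wrapped in angle brackets is always a URI.

```python
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    """One space-separated query token (hypothetical structure)."""
    form: Optional[str]
    gloss: Optional[str]
    gloss_is_uri: bool


def parse_query(query: str) -> List[Token]:
    """Split a query string into <form>[:<gloss>] tokens."""
    tokens = []
    for raw in query.split():
        # Split on the first colon only, so colons inside a URI stay intact.
        form, _, gloss = raw.partition(":")
        is_uri = gloss.startswith("<") and gloss.endswith(">")
        tokens.append(Token(form or None, gloss or None, is_uri))
    return tokens


for token in parse_query("cat ed:PST :GEN"):
    print(token)
```

An empty form or gloss is stored as None, which mirrors the way QueryArg(None, "NOM") is used in the Python API section.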

enligten / ligt-serve

A Flask-based REST API for extracting and converting IGT data from supported formats to RDF on-the-fly with ligt-convert. Usage:

# Start the server on default port (8080) and host (0.0.0.0)
enligten

# Start the server on specific host and port
enligten -p 5000 -h 127.0.0.1

# We can also start the server using its alias that is consistent with the rest of the tools:
ligt-serve

Calling the API:

# Convert from CLDF to Ligt
curl "http://localhost:8080/https://raw.githubusercontent.com/cldf-datasets/apics/refs/heads/master/cldf/StructureDataset-metadata.json"

# Convert from CLDF to Ligt with format specified explicitly
curl "http://localhost:8080/https://raw.githubusercontent.com/cldf-datasets/apics/refs/heads/master/cldf/StructureDataset-metadata.json" -H "format: cldf"
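
The same calls can be made from Python with the standard library. The sketch below mirrors the curl commands above; build_request and fetch_rdf are hypothetical helpers, the default base address assumes a server started locally on port 8080, and decoding the response as UTF-8 is an assumption about the server's output.

```python
import urllib.request
from typing import Optional


def build_request(source_url: str, fmt: Optional[str] = None,
                  base: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a GET request that appends the dataset URL to the server address."""
    headers = {"format": fmt} if fmt else {}
    return urllib.request.Request(f"{base.rstrip('/')}/{source_url}", headers=headers)


def fetch_rdf(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the response body (assumed to be UTF-8 text)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_request(
    "https://raw.githubusercontent.com/cldf-datasets/apics/refs/heads/master/cldf/StructureDataset-metadata.json",
    fmt="cldf",
)
print(request.full_url)
```

Calling fetch_rdf(request) requires a running enligten server, so it is left out of the example run.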

Other tools (in development)

  • ligt-validate - Validates data against the Ligt schema
  • ligt-query - Query RDF data using SPARQL
  • ligt-visualize - Visualizes linguistic data structures

Python API

You can also use ligttools as a Python library:

Conversion

from ligttools.converters import get_converter, get_supported_formats

# Convert CLDF to RDF
cldf_converter = get_converter('cldf')
rdf_data = cldf_converter.to_rdf('examples.csv', 'output.ttl')

# Convert RDF back to CLDF
cldf_data = cldf_converter.from_rdf('input.ttl', 'output.csv')

# Get list of supported formats
formats = get_supported_formats()
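
When several files need converting, the to_rdf call can be wrapped in a loop. The following is a minimal sketch: convert_directory is a hypothetical helper, and it assumes, as in the example above, that to_rdf accepts an input path and an output path as strings and writes the output file itself.

```python
from pathlib import Path
from typing import List


def convert_directory(converter, source_dir: str, target_dir: str,
                      pattern: str = "*.csv") -> List[Path]:
    """Convert every matching file in source_dir to a .ttl file in target_dir."""
    out_dir = Path(target_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(Path(source_dir).glob(pattern)):
        target = out_dir / (source.stem + ".ttl")
        # Any object with a to_rdf(input, output) method works here.
        converter.to_rdf(str(source), str(target))
        written.append(target)
    return written
```

With ligttools installed, a call might look like convert_directory(get_converter('cldf'), 'data', 'rdf'), where the directory names are placeholders.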

Search

Importing the necessary functions and initialising a graph:

from ligttools.search import Dataset
from ligttools.search.sparql import create_graph

datasets = [
    Dataset("test-data.ttl", is_sparql=False),
    Dataset("http://sparql-endpoint-url/sparql", is_sparql=True),
    
    # A dataset can also be initialised from a string
    Dataset.from_string("https://remote.url/dataset.ttl")
]

g = create_graph(datasets)

Now we can define the arguments and run the query:

from ligttools.search import QueryArg
from ligttools.search.sparql import get_results

args = [
    QueryArg("s", "PL", is_uri=False),
    QueryArg(None, "NOM"),
    QueryArg(None, "<https://purl.org/olia/unimorph/unimorph.owl#PST>", is_uri=True),
    
    # An argument can also be parsed from a string
    QueryArg.from_token(":PST")
]

# We need to explicitly provide a list of remote SPARQL endpoints
endpoints = [ds.url for ds in datasets if ds.is_sparql]
for row in get_results(g, endpoints, args):
    print(row)

Supported Formats

Currently, ligttools supports the following formats:

  • CLDF
  • Toolbox
  • FLExText

Extending ligttools

To add support for a new format:

  1. Create a new converter class that extends BaseConverter
  2. Implement the to_rdf and from_rdf methods
  3. Register the converter using the registration function

Example:

from ligttools.converters.base import BaseConverter
from ligttools.converters import register_converter

class ELANConverter(BaseConverter):
    def to_rdf(self, input_data, output_path=None):
        # Implementation...
        pass

    def from_rdf(self, input_data, output_path=None):
        # Implementation...
        pass

# Register the converter
register_converter('elan', ELANConverter)
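
To show what registering a converter under a format name amounts to, here is a standalone sketch of a name-to-class registry. It is not the ligttools implementation: the BaseConverter below is a local stand-in, and register, create, available and EchoConverter are hypothetical names chosen to avoid confusion with the library's own functions.

```python
from typing import Dict, List, Type


class BaseConverter:
    """Local stand-in for the library's base class, so the sketch runs on its own."""

    def to_rdf(self, input_data, output_path=None):
        raise NotImplementedError

    def from_rdf(self, input_data, output_path=None):
        raise NotImplementedError


_REGISTRY: Dict[str, Type[BaseConverter]] = {}


def register(name: str, converter_cls: Type[BaseConverter]) -> None:
    """Map a format name to a converter class."""
    if not issubclass(converter_cls, BaseConverter):
        raise TypeError("converter must extend BaseConverter")
    _REGISTRY[name.lower()] = converter_cls


def create(name: str) -> BaseConverter:
    """Instantiate the converter registered under the given format name."""
    try:
        return _REGISTRY[name.lower()]()
    except KeyError:
        raise ValueError(f"Unsupported format: {name!r}") from None


def available() -> List[str]:
    """Return the registered format names in alphabetical order."""
    return sorted(_REGISTRY)


class EchoConverter(BaseConverter):
    """Toy converter used only to exercise the registry."""

    def to_rdf(self, input_data, output_path=None):
        return f"# converted from {input_data}"


register("echo", EchoConverter)
print(available())
```

Keeping the lookup keyed by a short format name is what lets a command-line flag such as -f select a converter without the caller importing the class.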

License

This software is licensed under the MIT License.
