Profile of proycon

wandexer

Last released Jun 9, 2026

Indexes an AnnoRepo container into an Elasticsearch index

codemeta2mp

Last released Apr 9, 2026

Converts CodeMeta metadata to a representation for the SSHOC Open Marketplace

codemeta2html

Last released Mar 18, 2026

Converts software metadata in CodeMeta format to HTML for visualisation

CodeMetaPy

Last released Mar 18, 2026

Generate and manage CodeMeta software metadata

gecco

Last released Mar 10, 2026

Gene Cluster prediction with Conditional random fields.

python-frog

Last released Feb 2, 2026

Python binding to Frog, an NLP suite for Dutch doing part-of-speech tagging, lemmatisation, morphological analysis, named-entity recognition, shallow parsing, and dependency parsing.

This is a Python binding to the tokeniser Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial as it appears. This binding makes the power of the Ucto tokeniser available to Python. Ucto itself is an advanced, extensible, regular-expression-based tokeniser written in C++ (https://languagemachines.github.io/ucto).
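
To show why regex-based tokenisation is less trivial than it looks, here is a minimal sketch of a naive regular-expression tokeniser in plain Python. It is not Ucto and not the python-ucto API; the pattern and the `tokenize` function are hypothetical. Note how it wrongly splits the abbreviation "Dr." into two tokens, the kind of case a full tokeniser like Ucto addresses with far richer, language-specific rules.

```python
import re

# Hypothetical naive tokeniser pattern; alternatives are tried left to right.
TOKEN_RE = re.compile(r"""
    \d+(?:[.,]\d+)*       # numbers such as 3.50 or 1,000
  | \w+(?:[-']\w+)*       # words, including hyphenated and apostrophe forms
  | [^\w\s]               # any other single non-space symbol (punctuation)
""", re.VERBOSE)


def tokenize(text: str) -> list:
    """Split text into tokens using the single regex above."""
    return TOKEN_RE.findall(text)


print(tokenize("Dr. Smith paid $3.50, didn't he?"))
# -> ['Dr', '.', 'Smith', 'paid', '$', '3.50', ',', "didn't", 'he', '?']
```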

text-info

Last released Jan 14, 2026

Tools to extract useful information from XML text corpora

analiticcl

Last released Jan 5, 2026

stam

Last released Jan 5, 2026

CLAM

Last released Oct 22, 2025

Turns command-line tools into fully-fledged RESTful webservices with an auto-generated web-interface for human end-users.

codemeta-server

Last released Aug 18, 2025

Web API serving software metadata using CodeMeta and schema.org; it provides a SPARQL endpoint and also offers a web interface for human users

FoLiA-tools

Last released May 8, 2025

FoLiA-tools contains various Python-based command line tools for working with FoLiA XML (Format for Linguistic Annotation)

python3-timbl

Last released May 2, 2025

Python 3 language binding for the Tilburg Memory-Based Learner

sesdiff

Last released Oct 15, 2024

Generates a shortest edit script (using Myers' diff algorithm) that shows how to get from the strings in column A to the strings in column B. It also provides the edit distance (Levenshtein). This is the Python binding.
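
To illustrate what an edit script and a Levenshtein distance are, here is a minimal sketch in plain Python. It is not sesdiff's API or output format, and it swaps in a straightforward dynamic-programming Levenshtein table with backtracking rather than Myers' algorithm; the `edit_script` function and its `'='`/`'-'`/`'+'` operation codes are hypothetical. A substitution counts as one unit of distance but is written in the script as a delete followed by an insert.

```python
from typing import List, Tuple


def edit_script(a: str, b: str) -> Tuple[int, List[Tuple[str, str]]]:
    """Return (Levenshtein distance, edit script) transforming a into b.

    Script ops: '=' keep, '-' delete, '+' insert (hypothetical notation).
    """
    n, m = len(a), len(b)
    # dist[i][j] = edit distance between a[:i] and b[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # match / substitution

    # Walk back from the bottom-right corner to recover one optimal path.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            ops.append(("=", a[i - 1])); i -= 1; j -= 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            # Substitution, recorded in reverse order because ops is reversed later.
            ops.append(("+", b[j - 1])); ops.append(("-", a[i - 1])); i -= 1; j -= 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("-", a[i - 1])); i -= 1
        else:
            ops.append(("+", b[j - 1])); j -= 1
    ops.reverse()

    # Merge consecutive identical operations into runs.
    script: List[Tuple[str, str]] = []
    for op, ch in ops:
        if script and script[-1][0] == op:
            script[-1] = (op, script[-1][1] + ch)
        else:
            script.append((op, ch))
    return dist[n][m], script


print(edit_script("kitten", "sitting"))
# -> (3, [('-', 'k'), ('+', 's'), ('=', 'itt'), ('-', 'e'), ('+', 'i'), ('=', 'n'), ('+', 'g')])
```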

FoLiA

Last released Oct 11, 2024

An extensive library for processing FoLiA documents. FoLiA stands for Format for Linguistic Annotation and is a very rich XML-based format used by various Natural Language Processing tools.

FoLiA-Linguistic-Annotation-Tool

Last released Jul 5, 2024

FLAT is a web-based linguistic annotation environment based around the FoLiA format (https://proycon.github.io/folia), a rich XML-based format for linguistic annotation. FLAT allows users to view annotated FoLiA documents and enrich them with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm.

Spacy2FoLiA

Last released Feb 27, 2024

Library that adds FoLiA (Format for Linguistic Annotation) support to spaCy

foliadocserve

Last released Feb 7, 2024

The FoLiA Document Server is a backend HTTP service to interact with documents in the FoLiA format, a rich XML-based format for linguistic annotation (http://proycon.github.io/folia). It provides an interface to efficiently edit FoLiA documents through the FoLiA Query Language (FQL).

Glem

Last released Oct 5, 2023

GLEM is a lemmatizer for Ancient Greek.

colibricore

Last released Jul 3, 2023

Colibri Core is an NLP tool as well as a C++ and Python library (all included in this package) for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
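
As a rough illustration of n-grams versus fixed-size skipgrams, here is a minimal sketch in plain Python that counts both. It is not Colibri Core's API and ignores its memory-efficient encoding and dynamic-size gaps; the `ngrams` and `skipgrams` helpers and the `_` gap marker are hypothetical. Skipgrams here keep the first and last token and replace one or more inner positions with a gap.

```python
from collections import Counter
from itertools import combinations
from typing import Iterator, List, Tuple

GAP = "_"  # hypothetical gap marker, not Colibri Core's own notation


def ngrams(tokens: List[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield all contiguous n-grams."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def skipgrams(tokens: List[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield n-token patterns with one or more fixed-size inner gaps."""
    for gram in ngrams(tokens, n):
        inner = range(1, n - 1)  # first and last positions are never gaps
        for k in range(1, n - 1):
            for positions in combinations(inner, k):
                yield tuple(GAP if i in positions else tok for i, tok in enumerate(gram))


tokens = "to be or not to be".split()
counts: Counter = Counter()
for n in (1, 2, 3):
    counts.update(ngrams(tokens, n))
counts.update(skipgrams(tokens, 3))

print(counts[("to", "be")], counts[("to", GAP, "or")])
# -> 2 1
```

Keeping the outer tokens fixed means every gap is bounded on both sides, so a pattern like `to _ or` is only counted where a real token separates the two words.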

piereling

Last released Nov 30, 2020

Piereling is a webservice and web-application to convert between a variety of document formats, mostly from and to FoLiA XML. It is intended for NLP pipelines.

CLAMServices

Last released Nov 30, 2020

A collection of CLAM webservices for several of our NLP tools

lamastats

Last released Jul 24, 2020

Simple visitor analytics application for presenting usage statistics on several components included in LaMachine.

hanzigrid

Last released Dec 28, 2019

Generate a Chinese character grid for study

WikiEnte

Last released Jun 12, 2019

Entity extraction using DBpedia through DBpedia Spotlight

PyNLPl

Last released Mar 13, 2019

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl contains modules for basic tasks, clients for interfacing with servers, and modules for parsing several file formats common in NLP, most notably FoLiA.

Maarten van Gompel

32 projects