
Converts FoLiA and TEI files to Alpino XML files



CHAT, FoLiA, PaQu metadata, plaintext and TEI to Alpino XML or PaQu metadata format

Converts CHAT, FoLiA, PaQu metadata, plaintext and TEI XML files to Alpino XML files. Each sentence in the input file is parsed separately.


Command Line

pip install corpus2alpino
corpus2alpino -s localhost:7001 folia.xml -o alpino.xml

Or from project root:

python -m corpus2alpino -s localhost:7001 folia.xml -o alpino.xml


Library

from corpus2alpino.converter import Converter
from corpus2alpino.annotators.alpino import AlpinoAnnotator
from corpus2alpino.collectors.filesystem import FilesystemCollector
from corpus2alpino.targets.memory import MemoryTarget
from corpus2alpino.writers.lassy import LassyWriter

# Connect to an Alpino server listening on localhost:7001
alpino = AlpinoAnnotator("localhost", 7001)
converter = Converter(FilesystemCollector(["folia.xml"]),
    # Not needed when using the PaQuWriter
    annotators=[alpino],
    # This can also be ConsoleTarget, FilesystemTarget
    target=MemoryTarget(),
    # Set to merge treebanks, also possible to use PaQuWriter
    writer=LassyWriter(True))

# get the Alpino XML output, combined into one treebank XML file
parses = converter.convert()
print(''.join(parses)) # <treebank><alpino_ds ... /></treebank>
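
Once the parses are joined into a single treebank string, they can be inspected with the standard library. The following is a minimal sketch, not part of corpus2alpino: `sample_output` is a hand-written stand-in for the converter output, and `sentences_from_treebank` is a hypothetical helper name. It assumes only that each `alpino_ds` element carries a `sentence` child, as Alpino XML normally does.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for ''.join(converter.convert()); a real run needs
# a reachable Alpino server and produces full dependency trees.
sample_output = (
    '<treebank>'
    '<alpino_ds version="1.3"><sentence>Dit is een zin .</sentence></alpino_ds>'
    '<alpino_ds version="1.3"><sentence>Nog een zin .</sentence></alpino_ds>'
    '</treebank>'
)


def sentences_from_treebank(treebank_xml):
    """Return the text of the <sentence> child of every <alpino_ds> element."""
    root = ET.fromstring(treebank_xml)
    return [ds.findtext("sentence") for ds in root.iter("alpino_ds")]


sentences = sentences_from_treebank(sample_output)
print(sentences)
```

Because every input sentence is parsed separately, the number of `alpino_ds` elements should match the number of sentences found in the input.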


Enrichment

Custom properties can be added to existing Lassy/Alpino files. A CSV file lists the node attributes and values to look for, together with the custom properties to assign to the matching nodes.

For example:

python -m corpus2alpino tests/example_lassy.xml -e tests/enrichment.csv -of lassy

See corpus2alpino.annotators.enrich_lassy for more information.
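
To illustrate the general idea of attribute-based enrichment, here is a small standalone sketch. It does not use corpus2alpino, and the CSV column names (`match_attribute`, `match_value`, `property`, `property_value`) are invented for this example; the actual file layout expected by `-e` is documented in corpus2alpino.annotators.enrich_lassy and may differ.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical rule table: this column layout is an assumption made for
# illustration, not the format read by corpus2alpino's -e option.
RULES_CSV = """match_attribute,match_value,property,property_value
lemma,fiets,semantic_class,vehicle
pt,ww,is_verb,yes
"""

# Hand-written miniature Lassy-style tree, used as sample input.
LASSY_XML = """<alpino_ds version="1.3">
  <node cat="top" rel="top">
    <node rel="su" pt="n" lemma="fiets" word="fiets"/>
    <node rel="hd" pt="ww" lemma="rijden" word="rijdt"/>
  </node>
  <sentence>fiets rijdt</sentence>
</alpino_ds>"""


def enrich(tree_xml, rules_csv):
    """Set a custom attribute on every node matching a rule's attribute/value."""
    root = ET.fromstring(tree_xml)
    rules = list(csv.DictReader(io.StringIO(rules_csv)))
    for node in root.iter("node"):
        for rule in rules:
            if node.get(rule["match_attribute"]) == rule["match_value"]:
                node.set(rule["property"], rule["property_value"])
    return root


enriched = enrich(LASSY_XML, RULES_CSV)
print(ET.tostring(enriched, encoding="unicode"))
```

Nodes that match no rule are left unchanged, so the enriched file remains a valid input for tools that ignore unknown attributes.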


Unit Test

python -m unittest

Upload to PyPI


Make sure setuptools, wheel and twine are installed. Then from the virtualenv:

python setup.py sdist bdist_wheel
twine upload dist/*


Installation Instructions for Ubuntu

sudo apt install libfolia-dev libxml2-dev
pip install -r requirements.txt
