Converts FoLiA and TEI files to Alpino XML files

FoLiA and TEI to Alpino XML

Converts FoLiA and TEI XML files to Alpino XML files. Each sentence in the input file is parsed separately.

Usage

Command Line

pip install corpus2alpino
corpus2alpino -s localhost:7001 folia.xml -o alpino.xml

Or from project root:

python -m corpus2alpino -s localhost:7001 folia.xml -o alpino.xml
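Both commands pass a host and port through -s, which the library example pairs with AlpinoAnnotator("localhost", 7001). Assuming this refers to a running Alpino server, a quick reachability check before starting a long conversion might look like the sketch below. The helper name alpino_server_reachable is hypothetical and not part of corpus2alpino; it only tests that something accepts TCP connections on that port, not that it is actually Alpino.

```python
import socket


def alpino_server_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper, not part of corpus2alpino. It only checks that
    some process is listening; it does not verify that it is Alpino.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Host and port taken from the usage example above.
    if alpino_server_reachable("localhost", 7001):
        print("Something is listening on localhost:7001")
    else:
        print("Nothing reachable on localhost:7001")
```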

Library

from corpus2alpino.converter import Converter
from corpus2alpino.annotators.alpino import AlpinoAnnotator
from corpus2alpino.collectors.filesystem import FilesystemCollector
from corpus2alpino.targets.memory import MemoryTarget
from corpus2alpino.writers.lassy import LassyWriter

alpino = AlpinoAnnotator("localhost", 7001)
converter = Converter(FilesystemCollector(["folia.xml"]),
    # Not needed when using the PaQuWriter
    annotators=[alpino],
    # This can also be ConsoleTarget, FilesystemTarget
    target=MemoryTarget(),
    # Set to merge treebanks, also possible to use PaQuWriter
    writer=LassyWriter(True))

# get the Alpino XML output, combined into one treebank XML file
parses = converter.convert()
print(''.join(parses)) # <treebank><alpino_ds ... /></treebank>
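The merged output can be processed further with any XML library. The following is a minimal sketch using only the standard library, assuming the output has the shape shown in the comment above (a treebank root containing alpino_ds elements) and that each alpino_ds carries a sentence child element, as is usual for Alpino XML. The function name sentences_from_treebank and the sample string are made up for illustration.

```python
import xml.etree.ElementTree as ET


def sentences_from_treebank(treebank_xml):
    """Return the text of the <sentence> element of every <alpino_ds>.

    Assumes a <treebank> root with <alpino_ds> children; an entry is
    None when a parse has no <sentence> element.
    """
    root = ET.fromstring(treebank_xml)
    return [tree.findtext("sentence") for tree in root.iter("alpino_ds")]


# Hand-written sample standing in for ''.join(parses); not real parser output.
sample = (
    "<treebank>"
    '<alpino_ds version="1.3">'
    "<node/>"
    "<sentence>Dit is een zin .</sentence>"
    "</alpino_ds>"
    "</treebank>"
)

print(sentences_from_treebank(sample))
```

With real output, pass ''.join(parses) instead of the sample string.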

Unit Test

python -m unittest

Upload to PyPI

See: https://packaging.python.org/tutorials/packaging-projects/#generating-distribution-archives

Make sure setuptools, wheel and twine are installed. Then from the virtualenv:

python setup.py sdist bdist_wheel
twine upload dist/*

Requirements
