Converts FoLiA and TEI files to Alpino XML files
FoLiA and TEI to Alpino XML
Converts FoLiA and TEI XML files to Alpino XML files. Each sentence in the input file is parsed separately.
Usage
Command Line
pip install corpus2alpino
corpus2alpino -s localhost:7001 folia.xml -o alpino.xml
Or from project root:
python -m corpus2alpino -s localhost:7001 folia.xml -o alpino.xml
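To convert a whole directory rather than a single file, the command above can be generated once per input file. The following is a minimal sketch that only uses the `-s` and `-o` options shown above; the helper names `build_command` and `batch_commands` are hypothetical and not part of corpus2alpino.

```python
from pathlib import Path
from typing import List


def build_command(server: str, input_path: str, output_path: str) -> List[str]:
    """Build the argument list for one corpus2alpino call (hypothetical helper)."""
    return ["corpus2alpino", "-s", server, input_path, "-o", output_path]


def batch_commands(server: str, input_dir: str, output_dir: str) -> List[List[str]]:
    """Return one command per *.xml file in input_dir, sorted by file name.

    Each output file keeps the name of its input file and is placed in output_dir.
    """
    out = Path(output_dir)
    return [
        build_command(server, str(path), str(out / path.name))
        for path in sorted(Path(input_dir).glob("*.xml"))
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`, assuming the Alpino server is already running and the output directory exists.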
Library
from corpus2alpino.converter import Converter
from corpus2alpino.annotators.alpino import AlpinoAnnotator
from corpus2alpino.collectors.filesystem import FilesystemCollector
from corpus2alpino.targets.memory import MemoryTarget
from corpus2alpino.writers.lassy import LassyWriter
alpino = AlpinoAnnotator("localhost", 7001)
converter = Converter(FilesystemCollector(["folia.xml"]),
                      # Not needed when using the PaQuWriter
                      annotators=[alpino],
                      # This can also be ConsoleTarget, FilesystemTarget
                      target=MemoryTarget(),
                      # Set to merge treebanks, also possible to use PaQuWriter
                      writer=LassyWriter(True))
# get the Alpino XML output, combined into one treebank XML file
parses = converter.convert()
print(''.join(parses)) # <treebank><alpino_ds ... /></treebank>
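The merged output can be processed further with the standard library. The sketch below assumes the joined string is a single `<treebank>` document whose `<alpino_ds>` children each contain a `<sentence>` element, as in the usual Alpino XML layout; the function names are hypothetical and not part of corpus2alpino.

```python
import xml.etree.ElementTree as ET
from typing import List


def split_treebank(treebank_xml: str) -> List[str]:
    """Return each <alpino_ds> element of a merged treebank as its own XML string."""
    root = ET.fromstring(treebank_xml)
    return [ET.tostring(ds, encoding="unicode") for ds in root.iter("alpino_ds")]


def sentences(treebank_xml: str) -> List[str]:
    """Return the text of the <sentence> element of every parse (assumed layout)."""
    root = ET.fromstring(treebank_xml)
    return [ds.findtext("sentence", default="") for ds in root.iter("alpino_ds")]
```

With the example above, `sentences(''.join(parses))` would list the tokenized sentences that were parsed, one per `<alpino_ds>` element.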
Unit Test
python -m unittest
Upload to PyPI
python setup.py sdist
twine upload dist/*
Requirements
- Alpino parser running as a server:
Alpino batch_command=alpino_server -notk server_port=7001
- Python 3.6 or higher (developed using 3.6.3).
- libfolia-dev
- libicu-dev
- libxml2-dev
- libticcutils2-dev
- libucto-dev
- ucto
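Because the conversion depends on a running Alpino server, it can help to check that the port is reachable before starting a long job. This is a minimal sketch using only the standard library; the function name is hypothetical, and the defaults mirror the `localhost:7001` address used in the examples above. It only confirms that something accepts TCP connections on that port, not that it is actually Alpino.

```python
import socket


def alpino_server_reachable(host: str = "localhost", port: int = 7001,
                            timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        # The connection is closed again immediately; no sentence is sent.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```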