6 projects
webarticlecurator
A crawler that downloads content from portals (news sites, forums, blogs) and converts it to the desired output format according to its configuration.
mplogger
Multiprocessing-capable, print-like logger for Python
mthasher
Calculate multiple hash digests for a piece of data in parallel, using one algorithm per thread.
html2tei
Map the HTML schemas of portals, including the tags and structures they use, to valid TEI XML using small, manually written, portal-specific configurations.
quntoken
Hungarian tokenizer based on quex and huntoken.
xtsv
A generic framework for intermodular communication based on a TSV-style format, with a REST API
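The one-algorithm-per-thread approach described for mthasher can be illustrated with a minimal sketch built only on the standard library (`hashlib` and `concurrent.futures`). This is not mthasher's API: the function name `multi_digest` and its default algorithm list are hypothetical, and the sketch hashes a single in-memory `bytes` object rather than a stream.

```python
from __future__ import annotations

import hashlib
from concurrent.futures import ThreadPoolExecutor


def multi_digest(data: bytes, algorithms=("md5", "sha1", "sha256")) -> dict[str, str]:
    """Hypothetical helper: hash `data` with several algorithms, one worker thread each."""

    def digest_one(name: str) -> tuple[str, str]:
        # Each thread builds its own hash object, so no state is shared between threads.
        h = hashlib.new(name)
        h.update(data)
        return name, h.hexdigest()

    # One worker per algorithm; pool.map keeps results in the input order.
    with ThreadPoolExecutor(max_workers=len(algorithms)) as pool:
        return dict(pool.map(digest_one, algorithms))


if __name__ == "__main__":
    for name, hexdigest in multi_digest(b"hello world").items():
        print(name, hexdigest)
```

Threads can help here because CPython's `hashlib` releases the GIL while hashing sufficiently large buffers, so the digests can be computed concurrently rather than one after another.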