Reproducible Experimentation for Computational Linguistics Use

Recluse

Author: L. Amber Wilcox-O’Hearn

Contact: amber@cs.toronto.edu

Released under the GNU Affero General Public License; see the COPYING file for details.

Introduction

Recluse (Reproducible Experimentation for Computational Linguistics Use) is a set of tools for running computational linguistics experiments reproducibly.

This version contains:

  • utils, which has a function for reading and writing Unicode text in plain or compressed files, and a function for splitting a file into smaller pieces. The latter is useful with tools that load everything into RAM, or that train on all the data they are given when training on part of it would suffice.

  • article_randomiser, which randomly but reproducibly divides a corpus into training, development, and test sets.

  • nltk_based_segmenter_tokeniser, which does sentence segmentation and word tokenisation. It is optimised for Wikipedia-style text, and it has a mode that preserves the untokenised text (modulo extra whitespace).

  • vocabulary_generator and its helper class vocabulary_cutter, which wrap SRILM to make unigram counts and then select the most frequent unigrams.
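
As an illustration of the kind of work utils does, here is a minimal sketch of opening plain or gzip-compressed UTF-8 text through one function and splitting a file into fixed-size pieces. The names open_text and split_file, the .gz suffix check, and the piece naming scheme are all assumptions made for this example; they are not Recluse's actual API.

```python
import gzip
import os


def open_text(path, mode="r"):
    """Open a UTF-8 text file, using gzip when the name ends in .gz.

    Hypothetical helper: the suffix check is an assumption of this sketch.
    """
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t", encoding="utf-8")
    return open(path, mode, encoding="utf-8")


def split_file(path, lines_per_piece, out_dir):
    """Copy `path` into consecutive pieces of at most `lines_per_piece` lines.

    Returns the list of piece paths in order. Only one line is held in
    memory at a time, so the input can be larger than RAM.
    """
    pieces = []
    out = None
    with open_text(path) as src:
        for i, line in enumerate(src):
            if i % lines_per_piece == 0:
                # Start a new piece every `lines_per_piece` lines.
                if out is not None:
                    out.close()
                piece_path = os.path.join(out_dir, "piece_%03d.txt" % len(pieces))
                pieces.append(piece_path)
                out = open_text(piece_path, "w")
            out.write(line)
    if out is not None:
        out.close()
    return pieces
```

A tool that cannot cope with the whole corpus can then be pointed at one piece, or at the first few pieces, instead of the full file.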
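
The idea behind a reproducible random split can be sketched as follows: seed a private random number generator, then draw one number per article to decide which set it goes to. The function name assign_splits, the seed handling, and the 80/10/10 default proportions are assumptions for illustration only; article_randomiser may work differently.

```python
import random


def assign_splits(article_ids, seed, proportions=(0.8, 0.1, 0.1)):
    """Assign each article to train, dev, or test (hypothetical sketch).

    The same seed and the same article order always give the same split,
    because all randomness comes from a generator seeded here.
    """
    rng = random.Random(seed)  # private generator; global state is untouched
    train_cut = proportions[0]
    dev_cut = proportions[0] + proportions[1]
    splits = {"train": [], "dev": [], "test": []}
    for article_id in article_ids:
        r = rng.random()  # uniform in [0, 1)
        if r < train_cut:
            splits["train"].append(article_id)
        elif r < dev_cut:
            splits["dev"].append(article_id)
        else:
            splits["test"].append(article_id)
    return splits
```

Recording the seed alongside the experiment is what makes the split reproducible: anyone with the same corpus, in the same order, and the same seed recovers the same three sets.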
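
Selecting a vocabulary from unigram counts can be sketched in pure Python as below. This is a stand-in that counts tokens with collections.Counter rather than calling SRILM, and the function name most_frequent_vocabulary is hypothetical; it shows the selection step only, not how vocabulary_generator drives SRILM.

```python
from collections import Counter


def most_frequent_vocabulary(tokens, size):
    """Return the `size` most frequent tokens, most frequent first.

    Pure-Python stand-in for counting unigrams and cutting the vocabulary;
    ties are broken by first occurrence, which is Counter's behaviour.
    """
    counts = Counter(tokens)
    return [word for word, _ in counts.most_common(size)]


tokens = "the cat saw the dog and the cat".split()
vocabulary = most_frequent_vocabulary(tokens, 2)
```

Here "the" occurs three times and "cat" twice, so those two words make up the vocabulary of size 2.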

Source Distribution

recluse-0.1.12-dirty.tar.gz (31.9 kB)

Hashes for recluse-0.1.12-dirty.tar.gz

Algorithm    Hash digest
SHA256       33ba8e867e1065c79329dea09b6e5e7397be8110a3349951d0879598a38fff47
MD5          958d88da96ce490bf33e4182b7bfeead
BLAKE2b-256  e1afc224bb254fed7e36af7178ea9cff036249ad6564b6b9aebd8783bac05abe