Reproducible Experimentation for Computational Linguistics Use

Recluse

Author: L. Amber Wilcox-O'Hearn

Contact: amber@cs.toronto.edu

Released under the GNU Affero General Public License; see the COPYING file for details.

==============
Introduction
==============

Recluse (Reproducible Experimentation for Computational Linguistics Use) is a set of tools for running computational linguistics experiments reproducibly.

This version contains:

* utils, which has three functions:
** open_with_unicode, for reading and writing Unicode text in regular or compressed files.
** split_file_into_chunks, for splitting a file into smaller pieces. This is useful for tools that load everything into RAM, or that train on all the data they are given when training on part of it would suffice.
** partition_by_list, which works like a combination of the string methods partition and split: it keeps the separators, as partition does, but returns all the pieces in a list.
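
To illustrate the behaviour described for partition_by_list, here is a minimal sketch in plain Python. It is not the Recluse implementation: the function name partition_keeping_separators, its arguments, and its handling of empty pieces are assumptions made for this example only.

```python
import re


def partition_keeping_separators(text, separators):
    """Split text at every separator, keeping the separators in the result.

    Hypothetical helper for illustration; not part of Recluse.
    """
    # Ignore empty separators, which would otherwise match everywhere.
    separators = [s for s in separators if s]
    if not separators:
        return [text]
    # Try longer separators first so that they win over their own prefixes.
    ordered = sorted(separators, key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(s) for s in ordered) + ")"
    # A capturing group makes re.split return the separators as well.
    pieces = re.split(pattern, text)
    # Drop the empty strings produced by adjacent or leading separators.
    return [piece for piece in pieces if piece]


print(partition_keeping_separators("a, b; c", [", ", "; "]))
# prints ['a', ', ', 'b', '; ', 'c']
```

Unlike str.partition, which stops at the first separator, this returns every piece; unlike str.split, the separators stay in the list.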

* article_randomiser, which randomly but reproducibly divides a corpus into training, development, and test sets.
* nltk_based_segmenter_tokeniser, which does sentence segmentation and word tokenisation. It is optimised for Wikipedia-style text.
* vocabulary_generator and its helper class vocabulary_cutter. Together these wrap SRILM to make unigram counts, and then select the most frequent words.
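
The ideas behind article_randomiser and vocabulary_generator can be sketched as follows. This is an illustration only, under stated assumptions: the function names, the seed argument, and the 80/10/10 proportions are hypothetical, and the unigram counts here come from collections.Counter rather than from SRILM, which is what Recluse wraps.

```python
import random
from collections import Counter


def split_reproducibly(articles, seed, proportions=(0.8, 0.1, 0.1)):
    """Shuffle articles with a fixed seed, then cut into train, dev, test.

    Hypothetical helper; the proportions are an assumption of this sketch.
    """
    rng = random.Random(seed)  # private generator, so global state is untouched
    shuffled = list(articles)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * proportions[0])
    n_dev = int(len(shuffled) * proportions[1])
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # whatever remains goes to the test set
    return train, dev, test


def most_frequent_words(tokenised_sentences, size):
    """Count unigrams and keep the `size` most frequent words.

    Stands in for the SRILM-based counting; not the Recluse code.
    """
    counts = Counter(tok for sent in tokenised_sentences for tok in sent)
    return [word for word, _ in counts.most_common(size)]


articles = ["article-%d" % i for i in range(10)]
first = split_reproducibly(articles, seed=7)
second = split_reproducibly(articles, seed=7)
print(first == second)  # the same seed yields the same split
print(most_frequent_words([["a", "b", "a"], ["a", "b", "c"]], 2))
```

Passing the seed explicitly, instead of relying on the module-level random state, is what makes the split repeatable from one run to the next.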

Source Distribution

recluse-0.2.4.tar.gz (459.3 kB)

File hashes

Hashes for recluse-0.2.4.tar.gz
Algorithm Hash digest
SHA256 9fb5aa08a8e1d3514b7b5f53cdf21894e43024eb09571752149d956c0aea6a1d
MD5 322374707b605bb048c76ac698c942c8
BLAKE2b-256 df99e2c0f8e7071e54b2f8d413b424d4334bf8e416f9c0697d9358ecc70b3d7a
