recluse

Reproducible Experimentation for Computational Linguistics Use

Project description

Recluse

Author: L. Amber Wilcox-O’Hearn

Released under the GNU Affero General Public License; see the COPYING file for details.

Introduction

Recluse (Reproducible Experimentation for Computational Linguistics Use) is a set of tools for running computational linguistics experiments reproducibly.

This version contains:

utils, which provides a function for reading and writing Unicode text in plain or compressed files, and a function for splitting a file into smaller pieces. The latter is useful for tools that load all of their input into RAM, or that train on all the data they are given when training on part of it would suffice.
article_randomiser, which reproducibly randomly divides a corpus into training, development, and test sets.
nltk_based_segmenter_tokeniser, which does sentence segmentation and word tokenisation. It is optimised for Wikipedia-style text, and it has a mode that preserves the untokenised text (modulo extra whitespace).
vocabulary_generator and its helper class vocabulary_cutter, which wrap SRILM to make unigram counts and then select the most frequent words.
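
The idea behind a reproducible random split, as described for article_randomiser, can be illustrated with a minimal sketch. This is not recluse's own code or API: the function name split_articles, the seed value, and the 60/20/20 proportions are all assumptions made for illustration. The sketch only shows how a seeded random number generator yields the same division on every run.

```python
import random


def split_articles(articles, seed, proportions=(0.6, 0.2, 0.2)):
    """Assign each article to train, dev, or test using a seeded RNG.

    Hypothetical helper for illustration; not part of recluse.
    """
    # A private generator keeps the split independent of any other
    # use of the random module: same seed and input, same split.
    rng = random.Random(seed)
    train_share, dev_share, _ = proportions
    splits = {"train": [], "dev": [], "test": []}
    for article in articles:
        draw = rng.random()  # uniform in [0, 1)
        if draw < train_share:
            splits["train"].append(article)
        elif draw < train_share + dev_share:
            splits["dev"].append(article)
        else:
            splits["test"].append(article)
    return splits


# Stand-in corpus; real input would be articles read from a file.
articles = ["article %d" % i for i in range(100)]
first = split_articles(articles, seed=7)
second = split_articles(articles, seed=7)
print(first == second)
```

Because each article gets an independent draw, the set sizes only approximate the requested proportions. The benefit is that the assignment depends on nothing but the seed and the order of the articles, so an experiment can be rerun on exactly the same sets.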

Project details

Release history

Version	Release date
0.4.4	Mar 17, 2014
0.4.3	Nov 23, 2013
0.4.2	Nov 12, 2013
0.4.1	Nov 9, 2013
0.4.0	Nov 9, 2013
0.3.2	Nov 3, 2013
0.3.1	Nov 3, 2013
0.3.0	Oct 25, 2013
0.2.4	Oct 20, 2013
0.2.2	Oct 19, 2013
0.2.1	Oct 17, 2013
0.2.0	Oct 16, 2013
0.1.21	Oct 3, 2013
0.1.20	Oct 3, 2013
0.1.19	Oct 3, 2013
0.1.16	Oct 3, 2013
0.1.14	Sep 22, 2013
0.1.14-dirty	Sep 28, 2013
0.1.12	Sep 22, 2013
0.1.12-dirty	Sep 22, 2013
0.1.11 (this version)	Sep 21, 2013
0.1.10	Sep 21, 2013
0.1.9	Sep 21, 2013
0.1.7	Sep 15, 2013
0.1.6	Sep 15, 2013
0.1.5	Sep 15, 2013
0.1.4	Sep 14, 2013
0.1.3	Sep 14, 2013
0.1.2	Sep 14, 2013
0.1.1	Sep 10, 2013
0.1.0	Sep 5, 2013
unknown	Sep 21, 2013

Download files


Source Distribution

recluse-0.1.11.tar.gz (31.9 kB)

Uploaded Sep 21, 2013 Source

Hashes for recluse-0.1.11.tar.gz

Algorithm	Hash digest
SHA256	`046047a7d953426f6b2b57cc9f0b1252dbaffd2fd2f1a2ddd41fd2b0c065a4bb`
MD5	`9be9f8893100e4337f92959761799b9c`
BLAKE2b-256	`eca2e06c5cd7084ffdf9c918aa38ecb9df6a17a88db688527350b71a9a1e8464`