recluse

Reproducible Experimentation for Computational Linguistics Use


Author: L. Amber Wilcox-O'Hearn

Contact: amber@cs.toronto.edu

Released under the GNU Affero General Public License; see the COPYING file for details.

==============
Introduction
==============

recluse (Reproducible Experimentation for Computational Linguistics USE) is a set of tools for running computational linguistics experiments reproducibly.

This version contains:

* utils, which provides four functions:

  * open_with_unicode, for reading and writing Unicode text in regular or compressed files.
  * split_file_into_chunks, for splitting a file into smaller pieces. This is needed for tools that load all their input into RAM, or that train on all the data when we would be satisfied with training on part of it.
  * partition_by_list, which works like a combination of the string methods partition and split: like partition it keeps the separators, but like split it returns a list of all the pieces rather than a three-part tuple.
  * precision_recall_f_measure, which calculates precision, recall, and F-measure.

* article_selector (intended to replace article_randomiser, below) reproducibly and randomly selects a portion of a large corpus for the experiment, divides it into training, development, and test sets, and returns an article index to those sets.
* article_randomiser, which reproducibly and randomly divides a corpus into training, development, and test sets.
* nltk_based_segmenter_tokeniser, which does sentence segmentation and word tokenisation. It is optimised for Wikipedia-type text.
* vocabulary_generator and the helper class vocabulary_cutter, which together wrap SRILM to make unigram counts and then select the most frequent words.
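The idea behind open_with_unicode can be sketched as follows. This is a minimal sketch, not recluse's actual code: the name open_text, its signature, UTF-8 as the default encoding, and choosing gzip or bzip2 from a .gz or .bz2 extension are all assumptions made for illustration.

```python
import bz2
import gzip
from typing import IO


def open_text(path: str, mode: str = "r", encoding: str = "utf-8") -> IO[str]:
    """Open a plain, gzip or bzip2 file in text mode, chosen by file extension.

    Hypothetical helper for illustration; not recluse's open_with_unicode.
    """
    if mode not in ("r", "w", "a"):
        raise ValueError("mode must be 'r', 'w' or 'a'")
    text_mode = mode + "t"  # e.g. "rt": bytes are decoded to str with the given encoding
    if path.endswith(".gz"):
        return gzip.open(path, text_mode, encoding=encoding)
    if path.endswith(".bz2"):
        return bz2.open(path, text_mode, encoding=encoding)
    return open(path, text_mode, encoding=encoding)
```

Callers then read and write str objects without caring whether the file on disk is compressed.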
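A minimal sketch of line-based chunking in the spirit of split_file_into_chunks follows. The function name split_into_line_chunks, measuring chunk size in lines, the UTF-8 encoding, and the numbered output names (corpus.txt.000, corpus.txt.001, and so on) are hypothetical choices, not recluse's interface.

```python
import itertools
import os
from typing import List


def split_into_line_chunks(path: str, lines_per_chunk: int, out_dir: str) -> List[str]:
    """Copy the lines of path into numbered files of at most lines_per_chunk lines each.

    Hypothetical helper for illustration; returns the chunk paths in order.
    """
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    base = os.path.basename(path)
    chunk_paths: List[str] = []
    with open(path, encoding="utf-8") as src:
        while True:
            # islice reads at most lines_per_chunk lines, so only one chunk is in memory.
            chunk = list(itertools.islice(src, lines_per_chunk))
            if not chunk:
                break
            chunk_path = os.path.join(out_dir, f"{base}.{len(chunk_paths):03d}")
            with open(chunk_path, "w", encoding="utf-8") as out:
                out.writelines(chunk)
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Splitting on line boundaries keeps each line (for example, one sentence per line) intact, and a tool that only needs partial data can then be given just the first chunk.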
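The behaviour described for partition_by_list can be illustrated with a hypothetical partition_all that takes a single separator string; recluse's own function may take different arguments. It splits at every occurrence of the separator, as str.split does, but keeps each separator in the result, as str.partition does.

```python
from typing import List


def partition_all(text: str, sep: str) -> List[str]:
    """Split text at every occurrence of sep, keeping each separator as its own element.

    Hypothetical helper for illustration; not recluse's partition_by_list signature.
    """
    if not sep:
        raise ValueError("empty separator")
    result: List[str] = []
    rest = text
    while True:
        # str.partition splits at the first occurrence only, so repeat on the remainder.
        head, found, rest = rest.partition(sep)
        result.append(head)
        if not found:
            break
        result.append(found)
    return result
```

For example, partition_all("a,b,,c", ",") returns ["a", ",", "b", ",", "", ",", "c"], and joining the pieces gives back the original string.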
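The quantities computed by precision_recall_f_measure follow the standard definitions P = TP / (TP + FP), R = TP / (TP + FN), and F_beta = (1 + beta^2)PR / (beta^2 P + R). The sketch below assumes counts of true positives, false positives, and false negatives as input and returns 0.0 for undefined ratios; both choices, and the name precision_recall_f, are assumptions rather than recluse's interface.

```python
from typing import Tuple


def precision_recall_f(
    true_positives: int,
    false_positives: int,
    false_negatives: int,
    beta: float = 1.0,
) -> Tuple[float, float, float]:
    """Return (precision, recall, F_beta); a zero denominator gives 0.0."""
    predicted = true_positives + false_positives  # everything the system proposed
    actual = true_positives + false_negatives  # everything in the gold standard
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall; beta = 1 gives F1.
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f
```

With 8 correct detections, 2 false alarms and 4 misses, precision is 0.8, recall is 2/3, and F1 is 16/22, about 0.727.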
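The reproducible selection performed by article_selector, and the split performed by article_randomiser, can be sketched with a seeded random generator. Everything here is an assumption for illustration: the name select_and_split, article IDs as strings, the 80/10/10 default proportions, and returning sorted ID lists rather than recluse's article index.

```python
import random
from typing import Dict, List, Sequence, Tuple


def select_and_split(
    article_ids: Sequence[str],
    sample_size: int,
    seed: int,
    fractions: Tuple[float, float, float] = (0.8, 0.1, 0.1),
) -> Dict[str, List[str]]:
    """Pick sample_size articles with a seeded RNG and divide them into three sets.

    Hypothetical helper for illustration; not recluse's article_selector API.
    """
    if sample_size > len(article_ids):
        raise ValueError("sample_size exceeds the number of articles")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)  # private generator: the result depends only on the seed
    # Sort first so that the selection does not depend on the input order.
    selected = rng.sample(sorted(article_ids), sample_size)
    n_train = int(sample_size * fractions[0])
    n_dev = int(sample_size * fractions[1])
    return {
        "training": sorted(selected[:n_train]),
        "development": sorted(selected[n_train:n_train + n_dev]),
        "test": sorted(selected[n_train + n_dev:]),  # the remainder goes to test
    }
```

Using a private random.Random rather than the module-level functions keeps other code from disturbing the sequence. Recording the Python version alongside the seed is prudent, since the random module only guarantees that random() itself repeats across versions, not methods such as sample().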
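recluse wraps SRILM to make the unigram counts; the pure-Python sketch below swaps in collections.Counter instead, so it shows only the count-then-cut idea behind vocabulary_generator and vocabulary_cutter. The name most_frequent_vocabulary, tokenised sentences as input, and alphabetical tie-breaking are assumptions.

```python
from collections import Counter
from typing import Iterable, List


def most_frequent_vocabulary(tokenised_sentences: Iterable[Iterable[str]], size: int) -> List[str]:
    """Count unigrams and keep the size most frequent words.

    Hypothetical helper for illustration; uses Counter, not SRILM.
    """
    counts: Counter = Counter()
    for sentence in tokenised_sentences:
        counts.update(sentence)
    # Sort by descending count, then alphabetically, so ties are cut deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:size]]
```

Counter.most_common would break ties by first appearance, which depends on corpus order; sorting on (-count, word) makes the cut reproducible regardless of the order in which the text was read.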

============
Dependencies
============

recluse depends on the PyPI package `regex`_, which, unlike the standard library's re module, supports Unicode character categories.

::

    sudo pip install regex

==========
Installing
==========

recluse is registered with PyPI, so it can be installed with pip::

    sudo pip install recluse

.. _regex: https://pypi.python.org/pypi/regex/

===============
Release history
===============

====================  ============
Version               Release date
====================  ============
0.4.4 (this version)  Mar 17, 2014
0.4.3                 Nov 23, 2013
0.4.2                 Nov 12, 2013
0.4.1                 Nov 9, 2013
0.4.0                 Nov 9, 2013
0.3.2                 Nov 3, 2013
0.3.1                 Nov 3, 2013
0.3.0                 Oct 25, 2013
0.2.4                 Oct 20, 2013
0.2.2                 Oct 19, 2013
0.2.1                 Oct 17, 2013
0.2.0                 Oct 16, 2013
0.1.21                Oct 3, 2013
0.1.20                Oct 3, 2013
0.1.19                Oct 3, 2013
0.1.16                Oct 3, 2013
0.1.14                Sep 22, 2013
0.1.14-dirty          Sep 28, 2013
0.1.12                Sep 22, 2013
0.1.12-dirty          Sep 22, 2013
0.1.11                Sep 21, 2013
0.1.10                Sep 21, 2013
0.1.9                 Sep 21, 2013
0.1.7                 Sep 15, 2013
0.1.6                 Sep 15, 2013
0.1.5                 Sep 15, 2013
0.1.4                 Sep 14, 2013
0.1.3                 Sep 14, 2013
0.1.2                 Sep 14, 2013
0.1.1                 Sep 10, 2013
0.1.0                 Sep 5, 2013
unknown               Sep 21, 2013
====================  ============

==============
Download files
==============

Source distribution: recluse-0.4.4.tar.gz (463.0 kB), uploaded Mar 17, 2014.

Hashes for recluse-0.4.4.tar.gz:

===========  ================================================================
Algorithm    Hash digest
===========  ================================================================
SHA256       e9d964e635a18072d1709a23b4f19372c2721dfdedb33fcf9a8d3410de6e2e9e
MD5          962aa5060ad3a8dd4ca99b12e190d1cc
BLAKE2b-256  c16c62a8cccb1ceb99bc52c969628b84d3b6e52eaa292f04cc125bd451ff336e
===========  ================================================================
