Reproducible Experimentation for Computational Linguistics Use

Recluse

Author: L. Amber Wilcox-O’Hearn

Contact: amber@cs.toronto.edu

Released under the GNU Affero General Public License; see the COPYING file for details.

Introduction

Recluse (Reproducible Experimentation for Computational Linguistics Use) is a set of tools for running computational linguistics experiments reproducibly.

This version contains:

  • utils, which provides a function for reading and writing Unicode text, whether regular or compressed.

  • article_randomiser, which reproducibly randomly divides a corpus into training, development, and test sets.

  • nltk_based_segmenter_tokeniser, which performs sentence segmentation and tokenisation. It is optimised for Wikipedia-style text and has a mode that preserves the untokenised text (modulo extra whitespace).
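
To illustrate what "reproducibly randomly" dividing a corpus can mean in practice, here is a minimal Python sketch. It is not Recluse's actual API: the function name split_corpus, the seed parameter, and the 80/10/10 proportions are all assumptions made for this example. The idea it shows is that drawing from a privately seeded random generator makes the assignment of articles to sets repeatable from run to run.

```python
import random


def split_corpus(articles, seed, proportions=(0.8, 0.1, 0.1)):
    """Assign each article to train, dev, or test using a seeded generator.

    Hypothetical helper for illustration only; not part of Recluse.
    """
    # A private generator avoids interference from other users of `random`.
    rng = random.Random(seed)
    train_share, dev_share, _ = proportions
    splits = {"train": [], "dev": [], "test": []}
    for article in articles:
        draw = rng.random()  # one draw per article, in corpus order
        if draw < train_share:
            splits["train"].append(article)
        elif draw < train_share + dev_share:
            splits["dev"].append(article)
        else:
            splits["test"].append(article)
    return splits


# Placeholder corpus; real input would be the articles themselves.
articles = ["article %d" % i for i in range(1000)]
first = split_corpus(articles, seed=7)
second = split_corpus(articles, seed=7)
assert first == second  # same seed and same input order give the same split
```

Because each article consumes exactly one draw in corpus order, the split depends only on the seed and the order of the input, so recording both is enough to regenerate the same three sets later.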


