
Parallel data preprocessing for NLP and ML.


Wrangl


Parallel data preprocessing for NLP and ML. See the documentation for details. If you find this work helpful, please consider citing:

@misc{zhong2021wrangl,
  author = {Zhong, Victor},
  title = {Wrangl: Parallel data preprocessing for NLP and ML},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/vzhong/wrangl}}
}

Wrangl uses Ray to parallelize supervised-learning dataset preprocessing and TorchBeast to parallelize reinforcement-learning environments.
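The core idea behind the supervised-learning side is a map over chunks of the dataset: split the examples into shards, preprocess each shard on a separate worker, and reassemble the results in the original order. The sketch below illustrates that pattern with the standard library's `concurrent.futures` instead of Ray, so it runs without extra dependencies. It is not Wrangl's API; `chunked`, `tokenize_chunk` and `parallel_preprocess` are hypothetical names. Threads keep the example self-contained; for CPU-bound work you would swap in a process pool, or with Ray dispatch each chunk through a `@ray.remote` function and gather the results with `ray.get`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: List[T], size: int) -> List[List[T]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def tokenize_chunk(lines: List[str]) -> List[List[str]]:
    """Hypothetical preprocessing step: lowercase and whitespace-tokenize each line."""
    return [line.lower().split() for line in lines]


def parallel_preprocess(
    items: List[T],
    fn: Callable[[List[T]], List[R]],
    chunk_size: int = 2,
    workers: int = 4,
) -> List[R]:
    """Apply `fn` to chunks of `items` concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in submission order, so flattening
        # the per-chunk outputs preserves the original item order.
        return [out for chunk in pool.map(fn, chunked(items, chunk_size)) for out in chunk]


if __name__ == "__main__":
    docs = ["Hello World", "Parallel data preprocessing", "for NLP and ML"]
    print(parallel_preprocess(docs, tokenize_chunk))
    # [['hello', 'world'], ['parallel', 'data', 'preprocessing'], ['for', 'nlp', 'and', 'ml']]
```

Because results are collected per chunk and flattened in submission order, the output lines up one-to-one with the input regardless of which worker finishes first.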

Installation

# from a local clone (use -e ".[dev]" instead to also install test and docs dependencies)
pip install -e .

# latest version from GitHub
pip install git+https://github.com/vzhong/wrangl

# PyPI release
pip install wrangl

Usage

See the examples for usage. Some common use cases are described below.

Command-line utilities

The wrangl command currently supports:

  • annotating text files
  • plotting learning curves
  • autodocumenting this package

wrangl -h

Run tests

python -m unittest discover tests
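`python -m unittest discover tests` collects every module under `tests/` whose file name matches the default pattern `test*.py` and runs the `unittest.TestCase` subclasses it finds. A minimal sketch of a test module that discovery would pick up is shown below; the file name, the `whitespace_tokenize` helper and the test class are hypothetical and not part of Wrangl's actual suite.

```python
# tests/test_tokenize.py (hypothetical file name; must match "test*.py" to be discovered)
import unittest
from typing import List


def whitespace_tokenize(line: str) -> List[str]:
    """Hypothetical preprocessing helper under test: lowercase, then split on whitespace."""
    return line.lower().split()


class TestWhitespaceTokenize(unittest.TestCase):
    """Each method whose name starts with "test" is run as a separate test."""

    def test_lowercases_and_splits(self):
        self.assertEqual(whitespace_tokenize("Hello  World"), ["hello", "world"])

    def test_empty_line(self):
        self.assertEqual(whitespace_tokenize(""), [])
```

No `unittest.main()` call is needed here, since the discover command loads and runs the test cases itself.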

Generate docs

wrangl autodoc

Source distribution: wrangl-0.0.6.tar.gz (17.3 kB)
