
Parallel data preprocessing for NLP and ML.


Wrangl

Parallel data preprocessing for NLP and ML. See the docs here. If you find this work helpful, please consider citing:

@misc{zhong2021wrangl,
  author = {Zhong, Victor},
  title = {Wrangl: Parallel data preprocessing for NLP and ML},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/vzhong/wrangl}}
}

Wrangl parallelizes supervised learning datasets with Ray and reinforcement learning environments with TorchBeast.
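Wrangl's own API is not reproduced here. As a rough, library-agnostic illustration of the idea behind dataset parallelization, the sketch below splits a corpus into chunks, preprocesses each chunk on a worker, and reassembles the results in input order. It swaps in the standard library's `concurrent.futures` in place of Ray, and the names `tokenize_batch`, `parallel_preprocess`, and `chunk_size` are hypothetical, not part of Wrangl.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, List, Sequence


# Hypothetical per-chunk preprocessing step (not part of Wrangl):
# lowercase each line and split it on whitespace.
def tokenize_batch(lines: Sequence[str]) -> List[List[str]]:
    return [line.lower().split() for line in lines]


def parallel_preprocess(
    lines: Sequence[str],
    batch_fn: Callable[[Sequence[str]], List],
    executor: Executor,
    chunk_size: int = 1000,
) -> List:
    """Apply batch_fn to fixed-size chunks of `lines` on `executor`, keeping input order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    # Split the input into contiguous chunks; each chunk is one unit of work.
    chunks = [lines[i : i + chunk_size] for i in range(0, len(lines), chunk_size)]
    results: List = []
    # Executor.map yields results in submission order, so output order matches input order.
    for batch in executor.map(batch_fn, chunks):
        results.extend(batch)
    return results


if __name__ == "__main__":
    corpus = ["Hello World", "Parallel data preprocessing", "for NLP and ML"]
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(parallel_preprocess(corpus, tokenize_batch, pool, chunk_size=2))
    # → [['hello', 'world'], ['parallel', 'data', 'preprocessing'], ['for', 'nlp', 'and', 'ml']]
```

For CPU-bound preprocessing, passing a `ProcessPoolExecutor` instead of a thread pool sidesteps the GIL; `batch_fn` must then be a picklable, module-level function, and the call should sit under an `if __name__ == "__main__":` guard. Larger chunks amortize inter-process overhead at the cost of coarser load balancing.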

Installation

# from a local clone (use ".[dev]" to also run tests and build docs)
pip install -e .

# latest development version from GitHub
pip install git+https://github.com/vzhong/wrangl

# PyPI release
pip install wrangl

Usage

See examples for usage. Here are some common use cases:

Additional utilities

Annotate data from the command line:

wannotate -h

Run tests

python -m unittest discover tests
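`python -m unittest discover tests` imports every module in `tests/` whose name matches unittest's default pattern `test*.py` and runs the `TestCase` classes it finds. A minimal sketch of such a module follows, assuming a hypothetical file `tests/test_preprocess.py`; the `normalize` function under test is an illustrative stand-in, not part of Wrangl.

```python
import unittest


# Hypothetical preprocessing function under test; in a real test module it
# would be imported from the package instead of defined here.
def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace into single spaces."""
    return " ".join(text.lower().split())


class TestNormalize(unittest.TestCase):
    def test_lowercases(self):
        self.assertEqual(normalize("Hello World"), "hello world")

    def test_collapses_whitespace(self):
        self.assertEqual(normalize("  a \t b\n"), "a b")
```

No `unittest.main()` call is needed, since the discovery command above loads the module and runs its test classes directly.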


Download files


Source Distribution

wrangl-0.0.5.tar.gz (25.1 kB)

Built Distribution

wrangl-0.0.5-py3-none-any.whl (30.5 kB)
