Parallel data preprocessing for NLP and ML.

Wrangl

Parallel data preprocessing and fast experiments for NLP and ML. See docs here.

Why?

I built this library to prototype ideas quickly. In essence, it combines Hydra, PyTorch Lightning, moolib, and Ray for fast data processing and (supervised/reinforcement) learning. The following are supported through command-line or config tweaks (i.e. no additional boilerplate code):

  • checkpointing
  • early stopping
  • auto git diffs
  • logging to S3 (along with an auto-generated seaborn plot) and wandb
  • Slurm launcher

Installation

pip install -e .  # add [dev] if you want to run tests and build docs.

# for latest
pip install git+https://github.com/vzhong/wrangl

# pypi release
pip install wrangl

If the moolib install fails because you do not have CUDA, you can try installing it yourself with env USE_CUDA=0 pip install moolib.

Usage

See the documentation for how to use Wrangl. Example projects that use Wrangl are in wrangl.examples. In particular, wrangl.examples.learn.xor_clf shows how to quickly set up a supervised classification task. For parallel data preprocessing, wrangl.examples.preprocess.using_stanza shows how to use Stanford NLP Stanza to parse text in parallel across CPU cores.
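
To illustrate the general idea of parallel preprocessing across CPU cores, here is a minimal sketch that uses only the Python standard library. It is not Wrangl's API and not the code in wrangl.examples.preprocess.using_stanza. The names tokenize and preprocess_parallel are hypothetical, and the whitespace tokenizer is only a stand-in for an expensive per-document step such as a Stanza parse.

```python
from concurrent.futures import ProcessPoolExecutor


def tokenize(text: str) -> list[str]:
    """Hypothetical stand-in for an expensive per-document step (e.g. a Stanza parse)."""
    return text.lower().split()


def preprocess_parallel(texts: list[str], num_workers: int = 2) -> list[list[str]]:
    """Apply tokenize to every text in worker processes, preserving input order."""
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        # chunksize batches documents per task to reduce inter-process overhead.
        return list(pool.map(tokenize, texts, chunksize=16))


if __name__ == "__main__":
    corpus = [
        "Wrangl processes text in parallel",
        "Each document is handled independently",
    ]
    for tokens in preprocess_parallel(corpus):
        print(tokens)
```

Because each document is processed independently, the work splits cleanly across processes. With a real parser, each worker would typically load its model once rather than once per document.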

If you find this work helpful, please consider citing:

@misc{zhong2021wrangl,
  author = {Zhong, Victor},
  title = {Wrangl: Parallel data preprocessing for NLP and ML},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/vzhong/wrangl}}
}
