Ray-based preprocessing pipeline.

Project description

chunkyp

A small and concise data preprocessing library inspired by common NLP preprocessing workflows.

Supports Ray.

Installation

chunkyp is available on PyPI.

pip install chunkyp

To install the development version, run the following:

git clone https://github.com/neophocion/chunkyp
cd chunkyp
pip install -e .

Usage

The simplest way to get started is to look at the Jupyter notebooks in the notebooks/ directory.

A small example:

from chunkyp import pipe, p

res = pipe(
    records,  # a list of dicts, or an iterator over dicts
    p('field', lambda x: x.lower()),
    p('field', lambda x: x.upper(), 'new_field'),
    p(['field1', 'field2'], lambda x, y: len(x.split()) == y, 'new_field2'),
)

res = list(res)  # materialize the results
res
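
To make the calling pattern above concrete, here is a minimal pure-Python sketch of how a pipeline like this could behave. It is not chunkyp's implementation: the names sketch_p and sketch_pipe are hypothetical, and the semantics are assumptions inferred from the example (a step with no output name writes back to its input field, a step with an output name writes to that field, and a list of input fields is passed to the function as separate arguments). Ray is not involved here.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Union

Record = Dict[str, Any]
Step = Callable[[Iterable[Record]], Iterator[Record]]


def sketch_p(
    fields: Union[str, List[str]],
    fn: Callable[..., Any],
    out: Optional[str] = None,
) -> Step:
    """Hypothetical stand-in for p: build a step that applies fn to fields."""
    in_fields = [fields] if isinstance(fields, str) else list(fields)
    if out is None:
        # Assumption: with no output name, the single input field is overwritten.
        if len(in_fields) != 1:
            raise ValueError("an output field is required when reading several fields")
        out = in_fields[0]
    target = out

    def step(records: Iterable[Record]) -> Iterator[Record]:
        for rec in records:
            new_rec = dict(rec)  # shallow copy so the input record is not mutated
            new_rec[target] = fn(*(rec[f] for f in in_fields))
            yield new_rec

    return step


def sketch_pipe(records: Iterable[Record], *steps: Step) -> Iterator[Record]:
    """Hypothetical stand-in for pipe: chain steps lazily over the records."""
    stream: Iterator[Record] = iter(records)
    for step in steps:
        stream = step(stream)
    return stream


records = [{"field": "Hello World", "count": 2}]

res = list(
    sketch_pipe(
        records,
        sketch_p("field", lambda x: x.lower()),
        sketch_p("field", lambda x: x.upper(), "new_field"),
        sketch_p(["field", "count"], lambda x, y: len(x.split()) == y, "new_field2"),
    )
)
print(res)
```

Because each step in this sketch is a generator, nothing is computed until the result is consumed, which is why the final list(...) call is needed to materialize the records.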

Download files

Source Distribution

chunkyp-0.0.2.tar.gz (4.8 kB)

Uploaded Source

Built Distribution

chunkyp-0.0.2-py3-none-any.whl (9.9 kB)

Uploaded Python 3
