Ray-based preprocessing pipeline.

Project description

chunkyp

A small and concise data preprocessing library inspired by common NLP preprocessing workflows.

Supports Ray.

Installation

chunkyp is available on PyPI.

pip install chunkyp

To install the development version, clone the repository and install it in editable mode:

git clone https://github.com/neophocion/chunkyp
cd chunkyp
pip install -e .
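
To confirm that either install route worked, the installed version can be queried from Python. This is a minimal sketch using only the standard library (Python 3.8+). The helper name `installed_version` is hypothetical and not part of chunkyp.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if chunkyp is installed, otherwise None.
print(installed_version("chunkyp"))
```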

Usage

The simplest way to get started is to look at the Jupyter notebooks in notebooks/.

A small example:

from chunkyp import pipe, p

res = pipe(
    records,  # a list of dicts, or an iterator over dicts
    p('field', lambda x: x.lower()),
    p('field', lambda x: x.upper(), 'new_field'),
    p(['field1', 'field2'], lambda x,y: len(x.split()) == y, 'new_field2'),
)

res = list(res)
res
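
To make the behaviour of the example easier to picture, here is a rough pure-Python sketch of what such a pipeline could do. It is not chunkyp's implementation. `sketch_p` and `sketch_pipe` are hypothetical stand-ins, and the semantics are assumptions read off the example: a step with no output name overwrites its input field, a step with an output name writes a new field, and the result is produced lazily. The sample record is made up for illustration.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Union

Record = Dict[str, Any]
Step = Callable[[Record], Record]


def sketch_p(fields: Union[str, List[str]],
             fn: Callable[..., Any],
             out: Optional[str] = None) -> Step:
    """Hypothetical stand-in for `p`: apply fn to the named field(s)."""
    names = [fields] if isinstance(fields, str) else list(fields)
    # Assumption: with no output name, the first input field is overwritten.
    target = out if out is not None else names[0]

    def step(record: Record) -> Record:
        updated = dict(record)  # copy so the caller's record is not mutated
        updated[target] = fn(*(record[name] for name in names))
        return updated

    return step


def sketch_pipe(records: Iterable[Record], *steps: Step) -> Iterator[Record]:
    """Hypothetical stand-in for `pipe`: lazily run each record through every step."""
    for record in records:
        for step in steps:
            record = step(record)
        yield record


# Made-up sample data, mirroring the field names used in the example above.
records = [{"field": "Hello World", "field1": "a b c", "field2": 3}]
res = list(sketch_pipe(
    records,
    sketch_p("field", lambda x: x.lower()),
    sketch_p("field", lambda x: x.upper(), "new_field"),
    sketch_p(["field1", "field2"], lambda x, y: len(x.split()) == y, "new_field2"),
))
print(res)
```

Under these assumptions, the single record ends up with `field` lowercased, `new_field` holding the uppercased value, and `new_field2` set to `True` because `"a b c"` splits into three tokens.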

Download files

Source Distribution

chunkyp-0.0.2.tar.gz (4.8 kB)

Uploaded Source

Built Distribution

chunkyp-0.0.2-py3-none-any.whl (9.9 kB)

Uploaded Python 3

File details

Details for the file chunkyp-0.0.2.tar.gz.

File metadata

  • Download URL: chunkyp-0.0.2.tar.gz
  • Upload date:
  • Size: 4.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.7

File hashes

Hashes for chunkyp-0.0.2.tar.gz
Algorithm     Hash digest
SHA256        4521b5d3fa62a874bffb06a090c65bb414dfea52471b77a2516f105607a435af
MD5           d5c232a193341465599f28c7e04033cb
BLAKE2b-256   a85dfe5598e107467a8496b282a736a190a9100eb6b0d0bd3d9fd1d6f6816e19

File details

Details for the file chunkyp-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: chunkyp-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 9.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.7

File hashes

Hashes for chunkyp-0.0.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        146ebc50b1c360f0da830d9dcaceacfb0ed61a0967f088ccfed73caa84f8334c
MD5           b9257fe661537b5cec35b4532d2d46bf
BLAKE2b-256   19d8181f47ca8e6f94956cb5cee8f2a9302486c56671801ffb7cc3d44501d7df
