Reversible String Process Pipeline

A reversible string processing pipeline, featuring reproducibility, serializability, and performance.

Installation

pip install strpipe

Usage

import strpipe as sp

p = sp.Pipe()
p.add_step_by_op_name(
    op_name='Trim',
    op_kwargs={'tokens': ['\n', '\r']},
)
p.add_step_by_op_name('CharTokenize')
p.add_step_by_op_name(
    op_name='MapStringToIndex',
    state={'你': 0, '好': 1, '早': 2},  # if a state is provided, p.fit won't change it
)

data = [
    '你好啊\n',
    '早安',
    '你早上好\n',
]

p.fit(data)
result, tx_info = p.transform(data)  # convention: tx => transform
back_data = p.inverse_transform(result, tx_info)
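
To illustrate why `transform` returns a second value, here is a minimal, self-contained sketch of a reversible trim step. It does not use strpipe itself: the helper names `trim` and `untrim` are hypothetical, and the sketch assumes every token is a single character. The idea is only that the transform records what it removed, so that the inverse can put it back.

```python
from typing import List, Tuple


def trim(text: str, tokens: List[str]) -> Tuple[str, Tuple[str, str]]:
    """Strip leading/trailing tokens and record what was removed.

    Hypothetical helper; assumes each token is a single character.
    """
    start, end = 0, len(text)
    while start < end and text[start] in tokens:
        start += 1
    while end > start and text[end - 1] in tokens:
        end -= 1
    # The second element plays the role of tx_info for this string.
    return text[start:end], (text[:start], text[end:])


def untrim(trimmed: str, removed: Tuple[str, str]) -> str:
    """Restore the original string from the trimmed text and the record."""
    prefix, suffix = removed
    return prefix + trimmed + suffix


data = ['你好啊\n', '早安', '你早上好\n']

results, tx_info = [], []
for s in data:
    out, info = trim(s, ['\n', '\r'])
    results.append(out)
    tx_info.append(info)

# Reversing with the recorded information recovers the input exactly.
back_data = [untrim(r, i) for r, i in zip(results, tx_info)]
assert back_data == data
```

Without the recorded prefix and suffix, the trimmed strings alone would not be enough to recover the trailing newlines.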

Serialization

# Save it
p.save_json('/path/of/pipe')

# Load it
p = sp.Pipe.restore_from_json('/path/of/pipe')
result, tx_info = p.transform(['你好'])
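
As a rough illustration of what serializability requires, the sketch below writes a pipeline description to JSON and reads it back. This is not strpipe's actual file format: the `steps` layout (one dictionary per step with `op_name`, `op_kwargs` and `state`) is an assumption made for the example, and only the Python standard library is used.

```python
import json
import os
import tempfile

# Hypothetical description of a pipeline: one dict per step.
# The real layout written by save_json may differ.
steps = [
    {'op_name': 'Trim', 'op_kwargs': {'tokens': ['\n', '\r']}, 'state': None},
    {'op_name': 'CharTokenize', 'op_kwargs': {}, 'state': None},
    {
        'op_name': 'MapStringToIndex',
        'op_kwargs': {},
        'state': {'你': 0, '好': 1, '早': 2},
    },
]

path = os.path.join(tempfile.mkdtemp(), 'pipe.json')

# ensure_ascii=False keeps non-ASCII keys readable in the file.
with open(path, 'w', encoding='utf-8') as f:
    json.dump(steps, f, ensure_ascii=False, indent=2)

with open(path, encoding='utf-8') as f:
    restored = json.load(f)

# The restored description matches the one that was saved.
assert restored == steps
```

A description that round-trips like this is what lets a restored pipe reproduce the same transform on new input.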

Test

$ make test

Docs

$ make docs

The docs are built into the `docs/build/html` folder. (Note: this also reinstalls the package, because the
Cython code needs to be rebuilt.)

Extend Ops

  1. Create the new op by extending BaseOp

  2. Define input_type, output_type

  3. Implement op creation

  4. Implement fit, transform, inverse_transform. If the op is stateless, the fit method should return None.

    Note: An op's functionality can often be decomposed into several functions. Write these functions in (or import them from) the toolkit package for easy reuse. Ops in the ops package are, for the most part, wrappers around functions in toolkit.

  5. Write tests

  6. Register to op_factory
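
The steps above can be sketched as follows. This is a self-contained illustration, not strpipe's real interface: `BaseOp` here is a stand-in class, `op_factory` is modeled as a plain dictionary, and the names `Lowercase`, `lower_keep_originals`, `restore_originals` and `register` are all hypothetical. The op is stateless, so `fit` returns None, and the two module-level functions play the role of toolkit functions that the op wraps.

```python
from typing import Dict, List, Optional, Tuple, Type

CaseRecord = List[Tuple[int, str]]


# Toolkit-style pure functions (hypothetical names).
def lower_keep_originals(text: str) -> Tuple[str, CaseRecord]:
    """Lowercase text, recording (index, original char) for each change."""
    chars = list(text)
    originals: CaseRecord = []
    for i, ch in enumerate(chars):
        low = ch.lower()
        # Skip characters whose lowercase form changes the length.
        if low != ch and len(low) == 1:
            originals.append((i, ch))
            chars[i] = low
    return ''.join(chars), originals


def restore_originals(text: str, originals: CaseRecord) -> str:
    """Put the recorded original characters back in place."""
    chars = list(text)
    for i, ch in originals:
        chars[i] = ch
    return ''.join(chars)


class BaseOp:
    """Stand-in for strpipe's BaseOp; the real interface may differ."""

    input_type: str
    output_type: str

    def fit(self, data):
        raise NotImplementedError

    def transform(self, data):
        raise NotImplementedError

    def inverse_transform(self, data, tx_info):
        raise NotImplementedError


class Lowercase(BaseOp):
    # Step 2: declare input and output types (values are assumptions).
    input_type = 'string'
    output_type = 'string'

    # Step 3: op creation, modeled here as a plain constructor.
    def __init__(self, state: Optional[dict] = None):
        self.state = state

    # Step 4: a stateless op has nothing to learn, so fit returns None.
    def fit(self, data: List[str]) -> None:
        return None

    def transform(self, data: List[str]) -> Tuple[List[str], List[CaseRecord]]:
        pairs = [lower_keep_originals(s) for s in data]
        return [p[0] for p in pairs], [p[1] for p in pairs]

    def inverse_transform(self, data: List[str], tx_info: List[CaseRecord]) -> List[str]:
        return [restore_originals(s, info) for s, info in zip(data, tx_info)]


# Step 6: registration, modeled as a name-to-class mapping.
op_factory: Dict[str, Type[BaseOp]] = {}


def register(name: str, op_cls: Type[BaseOp]) -> None:
    op_factory[name] = op_cls


register('Lowercase', Lowercase)

# Step 5: a round-trip check of the kind a test would make.
op = op_factory['Lowercase']()
result, tx_info = op.transform(['Hello World', 'strpipe'])
assert result == ['hello world', 'strpipe']
assert op.inverse_transform(result, tx_info) == ['Hello World', 'strpipe']
```

Keeping the logic in plain functions means the round trip can be tested without constructing an op, and other ops can reuse the same helpers.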
