
Yoctol Utterance processing utilities


UTTUT


UTTerance UTilities for dialogue systems. This package provides general-purpose utilities for processing chatbot utterance data.

BERT Pipe

To create a pipe for BERT preprocessing, please take a look at BERT.

Installation

$ pip install uttut

Usage

Let's create a Pipe to preprocess a Datum that contains an English utterance.

Build a Pipe

>>> from uttut.pipeline.pipe import Pipe

>>> p = Pipe()
>>> p.add('IntTokenWithSpace')
>>> p.add('FloatTokenWithSpace')
>>> p.add('MergeWhiteSpaceCharacters')
>>> p.add('StripWhiteSpaceCharacters')
>>> p.add('EngTokenizer')  # word-level (ref: BERT)
>>> p.add('AddSosEos', checkpoint='result_of_add_sos_eos')
>>> p.add('Pad', {'maxlen': 5})
>>> p.add(
...     'Token2Index',
...     {
...         'token2index': {
...             '<sos>': 0, '<eos>': 1,  # for AddSosEos
...             '<unk>': 2, '<pad>': 3,  # <unk> for unknown tokens, <pad> for Pad
...             '_int_': 4,  # for IntTokenWithSpace
...             '_float_': 5,  # for FloatTokenWithSpace
...             'I': 6,
...             'apples': 7,
...         },
...     },
... )

Transform

>>> from uttut.elements import Datum, Entity, Intent
>>> datum = Datum(
...     utterance='I like apples.',
...     intents=[Intent(label=1), Intent(label=2)],
...     entities=[Entity(start=7, end=13, value='apples', label=7)],
... )
>>> output_indices, intent_labels, entity_labels, label_aligner, intermediate = p.transform(datum)
>>> output_indices
[0, 6, 2, 7, 1, 3, 3]
>>> intent_labels
[1, 2]
>>> entity_labels
[0, 0, 0, 7, 0, 0, 0]

# intermediate
>>> intermediate.get_from_checkpoint('result_of_add_sos_eos')
["<sos>", "I", "like", "apples", "<eos>"]

# label_aligner
>>> label_aligner.inverse_transform(entity_labels)
[0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0]
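
In output_indices above, 'like' comes out as 2 because it is not a key in token2index and so appears to fall back to '<unk>'. The following is a minimal plain-Python sketch of that lookup, applied to the tokens stored at the 'result_of_add_sos_eos' checkpoint. It is an illustration only, not uttut's Token2Index implementation; lookup_indices and its unk_token parameter are hypothetical names.

```python
# Illustrative stand-in for a token-to-index lookup; NOT uttut's own code.
token2index = {
    '<sos>': 0, '<eos>': 1,
    '<unk>': 2, '<pad>': 3,
    '_int_': 4, '_float_': 5,
    'I': 6, 'apples': 7,
}


def lookup_indices(tokens, token2index, unk_token='<unk>'):
    """Map each token to its index, using the unknown-token index as fallback."""
    unk_index = token2index[unk_token]
    return [token2index.get(token, unk_index) for token in tokens]


# Tokens as shown at the 'result_of_add_sos_eos' checkpoint (before padding).
tokens = ['<sos>', 'I', 'like', 'apples', '<eos>']
indices = lookup_indices(tokens, token2index)
print(indices)  # [0, 6, 2, 7, 1]
```

These five values match the leading entries of output_indices; the trailing 3s are the '<pad>' index.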

Transform sequence

>>> output_sequence, label_aligner, intermediate = p.transform_sequence('I like apples.')
>>> output_sequence
[0, 6, 2, 7, 1, 3, 3]

# label_aligner
>>> label_aligner.transform([0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0])
[0, 0, 0, 7, 0, 0, 0]
>>> label_aligner.inverse_transform([0, 0, 0, 7, 0, 0, 0])
[0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0]

# intermediate
>>> intermediate.get_from_checkpoint('result_of_add_sos_eos')
["<sos>", "I", "like", "apples", "<eos>"]
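
To make the label_aligner results above easier to follow, here is a rough plain-Python sketch of converting between character-level and token-level labels. It is not uttut's implementation: the helper names, the hand-written token_spans, and the rule of taking the label of a token's first character are all assumptions made for illustration.

```python
# Illustrative sketch of label alignment; NOT uttut's label_aligner.

def chars_to_tokens(char_labels, token_spans, default=0):
    """Collapse character-level labels to one label per token.

    A span of None marks a token with no source characters
    (for example <sos>, <eos> or <pad>); it receives the default label.
    Assumption: a token takes the label of its first character.
    """
    token_labels = []
    for span in token_spans:
        if span is None:
            token_labels.append(default)
        else:
            start, _end = span
            token_labels.append(char_labels[start])
    return token_labels


def tokens_to_chars(token_labels, token_spans, n_chars, default=0):
    """Expand token-level labels back to one label per character."""
    char_labels = [default] * n_chars
    for label, span in zip(token_labels, token_spans):
        if span is not None:
            start, end = span
            char_labels[start:end] = [label] * (end - start)
    return char_labels


utterance = 'I like apples.'
# Hand-written (start, end) character spans for:
# <sos>, 'I', 'like', 'apples', <eos>, <pad>, <pad>
token_spans = [None, (0, 1), (2, 6), (7, 13), None, None, None]

char_labels = [0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0]
token_labels = chars_to_tokens(char_labels, token_spans)
restored = tokens_to_chars(token_labels, token_spans, len(utterance))

print(token_labels)  # [0, 0, 0, 7, 0, 0, 0]
print(restored)      # [0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0]
```

Characters not covered by any token, such as the spaces and the final period here, keep the default label when expanding back.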

Serialization

Serialize

>>> serialized_str = p.serialize()

Deserialize

>>> from uttut.pipeline.pipe import Pipe
>>> p = Pipe.deserialize(serialized_str)
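
If the serialized pipe needs to outlive the current process, one option is to write it to a file and read it back before deserializing. The sketch below uses only the standard library and assumes p.serialize() returns a str (suggested by the name serialized_str, but not confirmed here). The save_text and load_text helpers, the file name, and the placeholder string are hypothetical.

```python
import tempfile
from pathlib import Path


def save_text(path, text):
    """Write text to path as UTF-8."""
    Path(path).write_text(text, encoding='utf-8')


def load_text(path):
    """Read UTF-8 text from path."""
    return Path(path).read_text(encoding='utf-8')


# Placeholder standing in for the value returned by p.serialize();
# assumed to be a str. The real content will differ.
serialized_str = '{"placeholder": "serialized pipe"}'

with tempfile.TemporaryDirectory() as tmp_dir:
    pipe_path = Path(tmp_dir) / 'pipe.txt'  # hypothetical file name
    save_text(pipe_path, serialized_str)
    restored_str = load_text(pipe_path)

# restored_str could then be passed to Pipe.deserialize(restored_str).
```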

