Yoctol Utterance processing utilities
UTTUT
UTTerance UTilities for dialogue systems. This package provides general utilities for processing chatbot utterance data.
BERT Pipe
To create a pipe for BERT preprocessing, please take a look at BERT.
Installation
$ pip install uttut
Usage
Let's create a Pipe to preprocess a Datum containing an English utterance.
Build a Pipe
>>> from uttut.pipeline.pipe import Pipe
>>> p = Pipe()
>>> p.add('IntTokenWithSpace')
>>> p.add('FloatTokenWithSpace')
>>> p.add('MergeWhiteSpaceCharacters')
>>> p.add('StripWhiteSpaceCharacters')
>>> p.add('EngTokenizer') # word-level (ref: BERT)
>>> p.add('AddSosEos', checkpoint='result_of_add_sos_eos')
>>> p.add('Pad', {'maxlen': 5})
>>> p.add(
...     'Token2Index',
...     {
...         'token2index': {
...             '<sos>': 0, '<eos>': 1,  # for AddSosEos
...             '<unk>': 2, '<pad>': 3,  # for Pad
...             '_int_': 4,  # for IntTokenWithSpace
...             '_float_': 5,  # for FloatTokenWithSpace
...             'I': 6,
...             'apples': 7,
...         },
...     },
... )
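To make the role of the token2index mapping concrete, here is a minimal standalone sketch of a vocabulary lookup with an unknown-token fallback. It does not use uttut and is not the library's implementation: the helper tokens_to_indices is hypothetical, and the padded token list is an assumption inferred from the example outputs in this README.

```python
from typing import Dict, List

# Vocabulary copied from the Token2Index step in this README.
TOKEN2INDEX: Dict[str, int] = {
    '<sos>': 0, '<eos>': 1,
    '<unk>': 2, '<pad>': 3,
    '_int_': 4, '_float_': 5,
    'I': 6, 'apples': 7,
}


def tokens_to_indices(
    tokens: List[str],
    token2index: Dict[str, int],
    unk_token: str = '<unk>',
) -> List[int]:
    """Hypothetical helper (not part of uttut): map each token to its
    index, falling back to the index of unk_token for unseen tokens."""
    unk_index = token2index[unk_token]
    return [token2index.get(token, unk_index) for token in tokens]


# Assumed token sequence after tokenizing, adding <sos>/<eos>, and padding.
padded_tokens = ['<sos>', 'I', 'like', 'apples', '<eos>', '<pad>', '<pad>']
indices = tokens_to_indices(padded_tokens, TOKEN2INDEX)
print(indices)  # [0, 6, 2, 7, 1, 3, 3]
```

Because 'like' is not in the vocabulary, it maps to the index of '<unk>' (2), which is why every special token used by an earlier step needs an entry in the mapping.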
transform
>>> from uttut.elements import Datum, Entity, Intent
>>> datum = Datum(
...     utterance='I like apples.',
...     intents=[Intent(label=1), Intent(label=2)],
...     entities=[Entity(start=7, end=13, value='apples', label=7)],
... )
>>> output_indices, intent_labels, entity_labels, label_aligner, intermediate = p.transform(datum)
>>> output_indices
[0, 6, 2, 7, 1, 3, 3]
>>> intent_labels
[1, 2]
>>> entity_labels
[0, 0, 0, 7, 0, 0, 0]
# intermediate
>>> intermediate.get_from_checkpoint('result_of_add_sos_eos')
["<sos>", "I", "like", "apples", "<eos>"]
# label_aligner
>>> label_aligner.inverse_transform(entity_labels)
[0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0]
transform sequence
>>> output_sequence, label_aligner, intermediate = p.transform_sequence('I like apples.')
>>> output_sequence
[0, 6, 2, 7, 1, 3, 3]
# label_aligner
>>> label_aligner.transform([0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0])
[0, 0, 0, 7, 0, 0, 0]
>>> label_aligner.inverse_transform([0, 0, 0, 7, 0, 0, 0])
[0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0]
# intermediate
>>> intermediate.get_from_checkpoint('result_of_add_sos_eos')
["<sos>", "I", "like", "apples", "<eos>"]
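The label_aligner calls above convert between character-level labels (one per character of the utterance) and token-level labels (one per output token). The following standalone sketch illustrates that idea only; it is not uttut's implementation. The two helper functions and the character spans assigned to each token are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (start, end) character offsets with end exclusive.
# None marks tokens with no source text, such as <sos>, <eos>, <pad>.
Span = Optional[Tuple[int, int]]


def char_to_token_labels(
    char_labels: List[int], spans: List[Span], default: int = 0
) -> List[int]:
    """Hypothetical helper: give each token the label of its first character."""
    return [
        default if span is None else char_labels[span[0]]
        for span in spans
    ]


def token_to_char_labels(
    token_labels: List[int], spans: List[Span], n_chars: int, default: int = 0
) -> List[int]:
    """Hypothetical helper: spread each token label over its character span."""
    char_labels = [default] * n_chars
    for label, span in zip(token_labels, spans):
        if span is not None:
            start, end = span
            char_labels[start:end] = [label] * (end - start)
    return char_labels


utterance = 'I like apples.'
# Assumed spans for: <sos>, I, like, apples, <eos>, <pad>, <pad>
spans: List[Span] = [None, (0, 1), (2, 6), (7, 13), None, None, None]

char_labels = [0] * 7 + [7] * 6 + [0]
token_labels = char_to_token_labels(char_labels, spans)
print(token_labels)  # [0, 0, 0, 7, 0, 0, 0]

restored = token_to_char_labels(token_labels, spans, len(utterance))
print(restored)  # [0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0]
```

Taking the first character's label is a simplification; a real aligner may resolve tokens whose characters carry different labels in another way.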
Serialization
Serialize
>>> serialized_str = p.serialize()
Deserialize
>>> from uttut.pipeline.pipe import Pipe
>>> p = Pipe.deserialize(serialized_str)
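A serialized pipe is typically stored on disk between training and inference. The sketch below shows one way to do that, assuming serialize() returns a plain str (as the variable name serialized_str suggests). The helper names are hypothetical, and a stand-in string replaces the real output of p.serialize() so the example runs without uttut.

```python
import tempfile
from pathlib import Path


def save_serialized(serialized_str: str, path: Path) -> None:
    """Hypothetical helper: write the serialized pipe to a UTF-8 text file."""
    path.write_text(serialized_str, encoding='utf-8')


def load_serialized(path: Path) -> str:
    """Hypothetical helper: read the serialized pipe back as a string."""
    return path.read_text(encoding='utf-8')


# Stand-in value; in practice this would be the result of p.serialize().
serialized_str = 'stand-in for the output of p.serialize()'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'pipe.txt'
    save_serialized(serialized_str, path)
    restored_str = load_serialized(path)

# restored_str could then be passed to Pipe.deserialize(restored_str).
print(restored_str == serialized_str)  # True
```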