Yoctol Utterance processing utilities
UTTUT
UTTerance UTilities for dialogue systems. This package provides general-purpose utilities for processing chatbot utterance data.
Installation
$ pip install uttut
Usage
Let’s create a Pipe to preprocess a Datum that contains an English utterance.
>>> from uttut.pipeline.pipe import Pipe
>>> p = Pipe()
>>> p.add('IntTokenWithSpace')
>>> p.add('FloatTokenWithSpace')
>>> p.add('MergeWhiteSpaceCharacters')
>>> p.add('StripWhiteSpaceCharacters')
>>> p.add('EngTokenizer') # word-level (ref: BERT)
>>> p.add('AddSosEos')
>>> p.add('Pad')
>>> p.add(
    'Token2Index',
    {
        '<sos>': 0, '<eos>': 1,  # for AddSosEos
        '<unk>': 2, '<pad>': 3,  # for Pad
        '_int_': 4,  # for IntTokenWithSpace
        '_float_': 5,  # for FloatTokenWithSpace
        'I': 6,
        'apples': 7,
    },
)
>>> from uttut.elements import Datum, Entity, Intent
>>> datum = Datum(
    utterance='I like apples.',
    intents=[Intent(label=1), Intent(label=2)],
    entities=[Entity(start=7, end=12, value='apples', label=7)],
)
>>> output_indices, intent_labels, entity_labels, realigner = p.transform(datum)
>>> output_indices
[0, 6, 2, 7, 1, 3, 3]
>>> intent_labels
[1, 2]
>>> entity_labels
[0, 0, 0, 7, 0, 0, 0]
>>> realigner(entity_labels)
[0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0]
Serialization
Serialize
>>> serialized_str = p.serialize()
Deserialize
>>> from uttut.pipeline.pipe import Pipe
>>> p = Pipe.deserialize(serialized_str)
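The round trip above can be pictured with a small stand-alone sketch. The class `MiniPipe` below is hypothetical and is not uttut's `Pipe`; it assumes a pipeline can be described as an ordered list of step names plus parameters and stores that list as JSON. The serialized format that uttut actually uses is not described here, and the `maxlen` parameter for `Pad` is likewise an illustrative assumption.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


class MiniPipe:
    """Hypothetical stand-in for a pipeline; not uttut's Pipe."""

    def __init__(self) -> None:
        # Ordered (step name, parameters) pairs.
        self.steps: List[Tuple[str, Dict[str, Any]]] = []

    def add(self, name: str, params: Optional[Dict[str, Any]] = None) -> None:
        self.steps.append((name, dict(params or {})))

    def serialize(self) -> str:
        """Dump the step list to a JSON string."""
        return json.dumps([{'name': n, 'params': p} for n, p in self.steps])

    @classmethod
    def deserialize(cls, serialized: str) -> 'MiniPipe':
        """Rebuild a pipeline by replaying the stored steps in order."""
        pipe = cls()
        for step in json.loads(serialized):
            pipe.add(step['name'], step['params'])
        return pipe


original = MiniPipe()
original.add('AddSosEos')
original.add('Pad', {'maxlen': 7})  # 'maxlen' is an assumed parameter name

serialized_str = original.serialize()
restored = MiniPipe.deserialize(serialized_str)
print(restored.steps == original.steps)
```

Storing only names and parameters keeps the serialized string small and readable, at the cost of requiring every step to be reconstructible from its parameters alone.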