A tool for converting raw text to slot labels
bert_slot_tokenizer
Version 0.3
bert_slot_tokenizer is a tool that converts the slots of a slot filling task into other labeling formats.
Environment:
- Python 3
- Python 2
Installation:
pip install bert-slot-tokenizer
Supported formats:
- IOB format
- IOBS format
- BMES format
- SPAN format
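To make the difference between these formats concrete, here is a minimal sketch that expands SPAN-style entries into the three tag-based formats. It is not the library's implementation: `spans_to_tags` is a hypothetical helper, and the conventions it assumes (inclusive `[start, end, label]` spans, and an IOBS scheme that only adds an `S-` tag for single-token slots) are inferred from the sample output in the usage section.

```python
def spans_to_tags(num_tokens, spans, fmt="IOB"):
    """Expand inclusive [start, end, label] spans into one tag per token.

    Hypothetical helper for illustration; not part of bert_slot_tokenizer.
    """
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        # Single-token slots get an S- tag in IOBS and BMES, a B- tag in IOB.
        if start == end and fmt in ("IOBS", "BMES"):
            tags[start] = "S-" + label
            continue
        tags[start] = "B-" + label
        # BMES marks inner tokens with M-, the other schemes use I-.
        inside = "M-" if fmt == "BMES" else "I-"
        for i in range(start + 1, end + 1):
            tags[i] = inside + label
        # BMES additionally closes every multi-token slot with an E- tag.
        if fmt == "BMES":
            tags[end] = "E-" + label
    return tags


spans = [[1, 2, "name"], [4, 4, "time"]]
print(spans_to_tags(5, spans, fmt="IOB"))
# ['O', 'B-name', 'I-name', 'O', 'B-time']
print(spans_to_tags(5, spans, fmt="IOBS"))
# ['O', 'B-name', 'I-name', 'O', 'S-time']
print(spans_to_tags(5, spans, fmt="BMES"))
# ['O', 'B-name', 'E-name', 'O', 'S-time']
```

The SPAN format carries the same information as the tag sequences, only stored as token index ranges instead of one tag per token.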
Usage:
from bert_slot_tokenizer import SlotConverter
vocab_path = 'tests/test_data/example_vocab.txt'
# An example vocab file is available at https://github.com/DevRoss/bert-slot-tokenizer/blob/master/tests/test_data/example_vocab.txt
sc = SlotConverter(vocab_path, do_lower_case=True)
text = 'Too YOUNG, too simple, sometimes naive! 蛤蛤+1s蛤蛤蛤嗝'
slot = {'蛤蛤': 'name', '+1s': 'time', '嗝': '语气'}
output_text, iob_slot = sc.convert(text, slot, fmt='IOB')
output_text, iobs_slot = sc.convert(text, slot, fmt='IOBS')
output_text, bmes_slot = sc.convert(text, slot, fmt='BMES')
output_text, span_slot = sc.convert(text, slot, fmt='SPAN')
print(output_text)
# ['too', 'young', ',', 'too', 'simple', ',', 'some', '##times', 'na', '##ive', '!', '蛤', '蛤', '+', '1', '##s', '蛤', '蛤', '蛤', '嗝']
print(iob_slot)
# ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-name', 'I-name', 'B-time', 'I-time', 'I-time', 'B-name', 'I-name', 'O', 'B-语气']
print(iobs_slot)
# ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-name', 'I-name', 'B-time', 'I-time', 'I-time', 'B-name', 'I-name', 'O', 'S-语气']
print(bmes_slot)
# ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-name', 'E-name', 'B-time', 'M-time', 'E-time', 'B-name', 'E-name', 'O', 'S-语气']
print(span_slot)
# [[11, 12, 'name'], [13, 15, 'time'], [16, 17, 'name'], [19, 19, '语气']]
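As a quick way to read the SPAN output, the sketch below maps each span back to the text it covers. It hard-codes the token list and spans printed above rather than calling the library, and `spans_to_values` is a hypothetical helper, not part of bert_slot_tokenizer. It assumes span ends are inclusive and that a leading `##` marks a WordPiece continuation.

```python
def spans_to_values(tokens, spans):
    """Rebuild (label, text) pairs from WordPiece tokens and inclusive spans.

    Hypothetical helper for illustration; not part of bert_slot_tokenizer.
    """
    values = []
    for start, end, label in spans:
        pieces = tokens[start:end + 1]
        # Drop the '##' continuation marker and join the pieces without spaces.
        text = "".join(p[2:] if p.startswith("##") else p for p in pieces)
        values.append((label, text))
    return values


# Token list and spans copied from the printed output above.
tokens = ['too', 'young', ',', 'too', 'simple', ',', 'some', '##times',
          'na', '##ive', '!', '蛤', '蛤', '+', '1', '##s', '蛤', '蛤', '蛤', '嗝']
spans = [[11, 12, 'name'], [13, 15, 'time'], [16, 17, 'name'], [19, 19, '语气']]

print(spans_to_values(tokens, spans))
# [('name', '蛤蛤'), ('time', '+1s'), ('name', '蛤蛤'), ('语气', '嗝')]
```

Joining without a separator is enough for this example; a slot value made of several space-separated words would need the spaces restored from the original text.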
Closing notes:
Thanks to BERT for advancing the NLP field.
Thanks to open source.
PRs and issues are welcome.
Contact: devross1997@gmail.com
File details
Details for the file bert_slot_tokenizer-0.3.0.tar.gz.
File metadata
- Download URL: bert_slot_tokenizer-0.3.0.tar.gz
- Upload date:
- Size: 13.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.7.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | fe6051eca73de131608f420da3d6655ba1be7ca8da716d2e7732d199ea0cd370
MD5 | 7ab3beebbf1f0f204acaef89064b1625
BLAKE2b-256 | 27585e79b31379724eb3de47630bf89e1ff733eaf8fdd6eb5f14acee6e6558e9
File details
Details for the file bert_slot_tokenizer-0.3.0-py2.py3-none-any.whl.
File metadata
- Download URL: bert_slot_tokenizer-0.3.0-py2.py3-none-any.whl
- Upload date:
- Size: 14.0 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.7.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 32fa535f3bc4f998a9d7fbb384b2c796f87224314cf04fce2acd8ed9a6466cfa
MD5 | 62d6c79e8959eb248e8ce58af2aabf84
BLAKE2b-256 | 2be6f59a2f457683e7f258d1152eabf7d931a4524bb4d83b1e1c727211b8255a