A tool for converting raw text and slot annotations into slot tagging formats

Project description

bert_slot_tokenizer

Version 0.3

bert_slot_tokenizer is a tool that converts the slots of a slot filling task into other tagging formats.

Requirements:

  • Python 3
  • Python 2

Installation:

pip install bert-slot-tokenizer

Supported formats:

  • IOB format
  • IOBS format
  • BMES format
  • SPAN format
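
As a rough illustration of how the three tag-based formats differ, the sketch below builds the tag sequence for a single slot that covers a given number of tokens. The helper `tags_for_span` is hypothetical and not part of bert_slot_tokenizer; the tagging rules are inferred from the sample output in the usage example, so treat it as an assumption rather than the library's implementation.

```python
def tags_for_span(length, label, fmt):
    """Return the tags for one slot covering `length` tokens.

    Hypothetical helper for illustration only; it is not part of
    bert_slot_tokenizer.
    """
    if length < 1:
        raise ValueError("a slot must cover at least one token")
    if fmt == "IOB":
        # B- on the first token, I- on every following token.
        return ["B-" + label] + ["I-" + label] * (length - 1)
    if fmt == "IOBS":
        # Same as IOB, except single-token slots are tagged S-.
        if length == 1:
            return ["S-" + label]
        return ["B-" + label] + ["I-" + label] * (length - 1)
    if fmt == "BMES":
        # Begin / Middle / End, with S- for single-token slots.
        if length == 1:
            return ["S-" + label]
        return ["B-" + label] + ["M-" + label] * (length - 2) + ["E-" + label]
    raise ValueError("unknown format: " + fmt)


print(tags_for_span(3, "time", "IOB"))
print(tags_for_span(3, "time", "BMES"))
print(tags_for_span(1, "time", "IOBS"))
```

The SPAN format does not tag individual tokens; in the usage example it lists each slot as `[start, end, label]` with inclusive token indices.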

Usage:

from bert_slot_tokenizer import SlotConverter
vocab_path = 'tests/test_data/example_vocab.txt'
# you can find an example vocab file here --> https://github.com/DevRoss/bert-slot-tokenizer/blob/master/tests/test_data/example_vocab.txt
sc = SlotConverter(vocab_path, do_lower_case=True)
text = 'Too YOUNG, too simple, sometimes naive! 蛤蛤+1s蛤蛤蛤嗝'
slot = {'蛤蛤': 'name', '+1s': 'time', '嗝': '语气'}
output_text, iob_slot = sc.convert(text, slot, fmt='IOB')
output_text, iobs_slot = sc.convert(text, slot, fmt='IOBS')
output_text, bmes_slot = sc.convert(text, slot, fmt='BMES')
output_text, span_slot = sc.convert(text, slot, fmt='SPAN')
print(output_text)
# ['too', 'young', ',', 'too', 'simple', ',', 'some', '##times', 'na', '##ive', '!', '蛤', '蛤', '+', '1', '##s', '蛤', '蛤', '蛤', '嗝']

print(iob_slot)
# ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-name', 'I-name', 'B-time', 'I-time', 'I-time', 'B-name', 'I-name', 'O', 'B-语气']

print(iobs_slot)
# ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-name', 'I-name', 'B-time', 'I-time', 'I-time', 'B-name', 'I-name', 'O', 'S-语气']

print(bmes_slot)
# ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-name', 'E-name', 'B-time', 'M-time', 'E-time', 'B-name', 'E-name', 'O', 'S-语气']

print(span_slot)
# [[11, 12, 'name'], [13, 15, 'time'], [16, 17, 'name'], [19, 19, '语气']]
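
To make the relationship between the IOB and SPAN outputs above concrete, here is a minimal sketch that recovers inclusive `[start, end, label]` spans from a list of IOB tags. The function `iob_to_spans` is a hypothetical helper written for this illustration and is not part of bert_slot_tokenizer; it assumes well-formed tags and silently ignores an `I-` tag that does not continue an open slot.

```python
def iob_to_spans(tags):
    """Convert IOB tags into [start, end, label] spans (end inclusive).

    Hypothetical helper for illustration only; it is not part of
    bert_slot_tokenizer.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # Close the open span unless this tag continues it.
        if start is not None and tag != "I-" + label:
            spans.append([start, i - 1, label])
            start, label = None, None
        # A B- tag always opens a new span.
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    # Close a span that runs to the end of the sequence.
    if start is not None:
        spans.append([start, len(tags) - 1, label])
    return spans


iob_slot = ['O'] * 11 + ['B-name', 'I-name', 'B-time', 'I-time', 'I-time',
                         'B-name', 'I-name', 'O', 'B-语气']
print(iob_to_spans(iob_slot))
# [[11, 12, 'name'], [13, 15, 'time'], [16, 17, 'name'], [19, 19, '语气']]
```

Applied to the IOB tags printed above, the sketch yields the same spans as the `fmt='SPAN'` call.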

Final notes:

Thanks to BERT for advancing the field of NLP.

Thanks to the open-source community.

PRs and issues are welcome.

Contact: devross1997@gmail.com

Download files

Source Distribution

bert_slot_tokenizer-0.3.0.tar.gz (13.7 kB)

Uploaded Source

Built Distribution

bert_slot_tokenizer-0.3.0-py2.py3-none-any.whl (14.0 kB)

Uploaded Python 2 Python 3

File details

Details for the file bert_slot_tokenizer-0.3.0.tar.gz.

File metadata

  • Download URL: bert_slot_tokenizer-0.3.0.tar.gz
  • Upload date:
  • Size: 13.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.7.6

File hashes

Hashes for bert_slot_tokenizer-0.3.0.tar.gz
Algorithm Hash digest
SHA256 fe6051eca73de131608f420da3d6655ba1be7ca8da716d2e7732d199ea0cd370
MD5 7ab3beebbf1f0f204acaef89064b1625
BLAKE2b-256 27585e79b31379724eb3de47630bf89e1ff733eaf8fdd6eb5f14acee6e6558e9

File details

Details for the file bert_slot_tokenizer-0.3.0-py2.py3-none-any.whl.

File metadata

  • Download URL: bert_slot_tokenizer-0.3.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 14.0 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.7.6

File hashes

Hashes for bert_slot_tokenizer-0.3.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 32fa535f3bc4f998a9d7fbb384b2c796f87224314cf04fce2acd8ed9a6466cfa
MD5 62d6c79e8959eb248e8ce58af2aabf84
BLAKE2b-256 2be6f59a2f457683e7f258d1152eabf7d931a4524bb4d83b1e1c727211b8255a
