
A customized PyTorch version of pre-trained BERT models.

Project description

torchKbert

  • Our customized version of BERT for PyTorch

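Description

This is a model library that the author built by partially customizing Meelfy's pytorch_pretrained_BERT.

The project was started to make the author's own experiments more convenient, so it will not be updated frequently.

Features

Usage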

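  • Installation:

    pip install torchKbert
    
  • For typical usage examples, see the official examples directory.

  • To use hierarchical decomposition position encoding, which lets BERT handle long texts, pass is_hierarchical=True when calling the model. For example: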

    model = BertModel(config)
    encoder_outputs, _ = model(input_ids, token_ids, input_mask, is_hierarchical=True)
    
  • To use the word-level Chinese WoBERT, pass the new pre_tokenizer argument when constructing the BertTokenizer object:

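    import jieba
    from torchKbert.tokenization import BertTokenizer

    # vocab_path must point to the vocabulary file of the WoBERT checkpoint.
    tokenizer = BertTokenizer(
        vocab_file=vocab_path,
        # Segment the text into words with jieba before vocabulary lookup.
        pre_tokenizer=lambda s: jieba.cut(s, HMM=False))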
    

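    If pre_tokenizer is not passed, it defaults to None. When it is set, tokenization is word-level by default; to go back to character-level tokenization, pass pre_tokenize=False when calling tokenize:

    tokenizer.tokenize(text, pre_tokenize=False)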
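The word-level behaviour described above can be illustrated with a small, dependency-free sketch. It assumes the rule used by word-level tokenizers such as the one in bert4keras: keep a pre-segmented word if it is in the vocabulary, otherwise fall back to finer-grained tokenization. The function names, the character-level fallback (a simplification of WordPiece), and the whitespace pre-tokenizer are all hypothetical and are not taken from torchKbert's source.

```python
def char_tokenize(text, vocab, unk="[UNK]"):
    """Character-level fallback: one token per non-space character."""
    return [ch if ch in vocab else unk for ch in text if not ch.isspace()]


def tokenize(text, vocab, pre_tokenizer=None, pre_tokenize=True):
    """Word-level tokenization with a character-level fallback.

    pre_tokenizer: callable that splits text into candidate words.
    pre_tokenize:  set to False to skip word segmentation entirely.
    """
    if not pre_tokenize or pre_tokenizer is None:
        return char_tokenize(text, vocab)
    tokens = []
    for word in pre_tokenizer(text):
        if word in vocab:
            # The whole word has its own vocabulary entry: keep it intact.
            tokens.append(word)
        else:
            # Unknown word: split it into characters instead.
            tokens.extend(char_tokenize(word, vocab))
    return tokens


# Toy vocabulary and a whitespace pre-tokenizer standing in for jieba.
vocab = {"深度", "学习", "深", "度", "学", "习", "模", "型"}
word_level = tokenize("深度 学习 模型", vocab, pre_tokenizer=str.split)
char_level = tokenize("深度 学习 模型", vocab, pre_tokenizer=str.split,
                      pre_tokenize=False)
```

Here "模型" is not in the toy vocabulary, so it is split into characters, while "深度" and "学习" survive as single tokens.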
    

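Background

I had long been using pytorch_pretrained_BERT, written by Meelfy, which already makes loading pre-trained models and fine-tuning them very convenient. Later, for my own needs, I wanted to rewrite it into a version that supports hierarchical decomposition position encoding.

Su Jianlin's bert4keras already implements this feature. However, I am used to PyTorch and have not used Keras for a long time, so I decided to write my own version.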
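For readers unfamiliar with the technique, here is a minimal sketch of hierarchical decomposition as formulated for bert4keras: a trained table of n position vectors p is extended to n² positions by writing position index as i·n + j and embedding it as α·u[i] + (1 − α)·u[j], where u[k] = (p[k] − α·p[0]) / (1 − α). The value α = 0.4 is the bert4keras default and is an assumption here; the function name and the list-based representation are hypothetical and do not reflect torchKbert's internal code. Plain Python lists keep the sketch dependency-free; a real model would do the same with tensor indexing.

```python
def hierarchical_position_embeddings(table, seq_len, alpha=0.4):
    """Extend a trained position table (n rows) to seq_len <= n * n rows."""
    n = len(table)
    if seq_len > n * n:
        raise ValueError("seq_len must not exceed n * n")
    first = table[0]
    # Base vectors u[k] = (p[k] - alpha * p[0]) / (1 - alpha).
    base = [
        [(x - alpha * x0) / (1 - alpha) for x, x0 in zip(row, first)]
        for row in table
    ]
    extended = []
    for pos in range(seq_len):
        i, j = divmod(pos, n)  # pos = i * n + j
        extended.append(
            [alpha * a + (1 - alpha) * b for a, b in zip(base[i], base[j])]
        )
    return extended


# Toy table with n = 3 trained positions of dimension 2.
table = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.25]]
extended = hierarchical_position_embeddings(table, seq_len=9)
```

With this choice of u, the first n extended positions reproduce the trained table, so inputs that fit the original length see unchanged position embeddings. A 512-row table can cover up to 512² = 262144 positions.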

Updates

  • 2021.03.07: Added hierarchical decomposition position encoding.
  • 2021.05.27: Added the word-level Chinese WoBERT.

Acknowledgements
