# some useful code
This is a project for being lazy in later work! Only Python 3 is supported ☹️, but you can try it on Python 2 if you like.
## Install
Install directly from the command line:

```bash
pip install poros
```
Install from the source repository:

```bash
git clone https://github.com/diqiuzhuanzhuan/poros.git
cd poros
python setup.py install
```
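A quick smoke test after installing (a minimal sketch; it only checks that the package imports):

```python
# minimal check that the install succeeded
import poros
print(poros.__file__)  # shows where the package was installed
```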
Some of the code is adapted from other people's work, and some is my own.
## unilmv2
- create pretraining data
```python
from poros.unilmv2.dataman import PreTrainingDataMan

# vocab_file: make sure [SOS], [EOS] and [Pseudo] are in vocab_file
vocab_file = "vocab_file"  # your vocab file
ptdm = PreTrainingDataMan(vocab_file=vocab_file, max_seq_length=128,
                          max_predictions_per_seq=20, random_seed=2334)
input_file = "my_input_file"    # the file format is like bert's
output_file = "my_output_file"  # the output file is a tfrecord file
ptdm.create_pretraining_data(input_file, output_file)
dataset = ptdm.read_data_from_tfrecord(output_file, is_training=True, batch_size=8)
```
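To peek at what was produced, you can iterate over the dataset (a minimal sketch, assuming `read_data_from_tfrecord` returns a `tf.data.Dataset`, which the `fit` call below relies on):

```python
# inspect the parsed features of a single batch
for batch in dataset.take(1):
    print(batch)
```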
- create unilmv2 model and train it
```python
from poros.unilmv2.config import Unilmv2Config
from poros.unilmv2 import Unilmv2Model
from poros_train import optimization

"""
the configuration is like this:
{
  "attention_probs_dropout_prob": 0.1,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.08,
  "intermediate_size": 3072,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "type_vocab_size": 2,
  "vocab_size": 21131
}
A json file recording this configuration is recommended.
"""
json_file = "my_config_file"
unilmv2_config = Unilmv2Config.from_json_file(json_file)
unilmv2_model = Unilmv2Model(config=unilmv2_config, is_training=True)
epochs = 2000
steps_per_epoch = 15
optimizer = optimization.create_optimizer(
    init_lr=6e-4,
    num_train_steps=epochs * steps_per_epoch,
    num_warmup_steps=1500)
unilmv2_model.compile(optimizer=optimizer)
unilmv2_model.fit(dataset, epochs=epochs, steps_per_epoch=steps_per_epoch)
```
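Since the model is compiled and fitted through the Keras API, standard Keras checkpointing should work (an assumption based on the `compile`/`fit` calls above, not something the package documents):

```python
# persist the trained weights with the regular Keras mechanism
unilmv2_model.save_weights("./unilmv2_checkpoint/weights")
```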
## bert
usage:
- create pretraining data
```python
from poros.bert import create_pretraining_data

create_pretraining_data.main(
    input_file="./test_data/sample_text.txt",
    output_file="./test_data/output",
    vocab_file="./test_data/vocab.txt")
```
- pretrain the bert model
```python
from poros.bert_model import pretrain

pretrain.run(
    input_file="./test_data/output",
    bert_config_file="./test_data/bert_config.json",
    output_dir="./output")
```
- prepare a pretrained model to initialize the classifier model (the code below uses Google's chinese_L-12_H-768_A-12 checkpoint)
- prepare train.csv and dev.csv; each line has the format "id, text1, label", but note: no header row! (a sketch of this file follows the initialization code below)
- initialize the model with code like the following:
```python
from poros.bert_model.run_classifier import SimpleClassifierModel

model = SimpleClassifierModel(
    bert_config_file="./data/chinese_L-12_H-768_A-12/bert_config.json",
    vocab_file="./data/chinese_L-12_H-768_A-12/vocab.txt",
    output_dir="./output",
    max_seq_length=512,
    train_file="./data/train.csv",
    dev_file="./data/dev.csv",
    init_checkpoint="./data/chinese_L-12_H-768_A-12/bert_model.ckpt",
    label_list=["0", "1", "2", "3"]
)
```
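For reference, here is a minimal sketch that writes a train.csv in the expected headerless "id, text1, label" format (the rows are made-up examples):

```python
import csv

# hypothetical example rows in the "id, text1, label" format, no header line
rows = [
    ["0", "这部电影很好看", "1"],
    ["1", "今天天气不太好", "0"],
]
with open("./data/train.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)
```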
## sentence bert
```python
import tensorflow as tf
from poros.sentence_bert.apps import SentenceBert
from poros.sentence_bert.dataman import SnliDataMan

snli_dataman = SnliDataMan()
data = snli_dataman.batch(data_type='train', batch_size=32)
sbm = SentenceBert(loss_fn='triple_loss')
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)
sbm.compile(optimizer=optimizer)
sbm.fit(data, epochs=1, steps_per_epoch=20)
```
## poros_dataset
Some operations on tensors:
```python
import tensorflow as tf
from poros.poros_dataset import about_tensor

A = tf.constant(value=[0])
print(about_tensor.get_shape(A))
# output: [1]
```
## poros_chars
Provides a collection of small helper functions.
usage:
- convert Chinese numerals into Arabic numbers:
```python
from poros.poros_chars import chinese_to_arabic

print(chinese_to_arabic.NumberAdapter.convert("四千三百万"))
# output: 43000000
```
## Thanks
PyCharm