some useful code
Project description
This is a project that collects code for later reuse. Only Python 3 is supported ☹️; it may also work on Python 2, but that is untested.
Install

Install directly from the command line:

```shell
pip install poros
```

Or install from source:

```shell
git clone https://github.com/diqiuzhuanzhuan/poros.git
cd poros
python setup.py install
```
Some of the code is adapted from other people's work, and some is my own.
unilmv2

- create pretraining data

```python
from poros.unilmv2.dataman import PreTrainingDataMan

# vocab_file: make sure [SOS], [EOS] and [Pseudo] are in the vocab file
vocab_file = "vocab_file"  # your vocab file
ptdm = PreTrainingDataMan(vocab_file=vocab_file, max_seq_length=128, max_predictions_per_seq=20, random_seed=2334)
input_file = "my_input_file"    # the file format is the same as BERT's
output_file = "my_output_file"  # the output file is a tfrecord file
ptdm.create_pretraining_data(input_file, output_file)
dataset = ptdm.read_data_from_tfrecord(output_file, is_training=True, batch_size=8)
```
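Since the vocab file must contain the `[SOS]`, `[EOS]` and `[Pseudo]` special tokens, a quick pre-flight check can save a failed run. This is a standalone sketch (`check_vocab` is a hypothetical helper, not part of poros), assuming the vocab file holds one token per line:

```python
# Verify that the special tokens unilmv2 expects are present in the vocab file.
REQUIRED_TOKENS = {"[SOS]", "[EOS]", "[Pseudo]"}

def check_vocab(path):
    with open(path, encoding="utf-8") as f:
        vocab = {line.strip() for line in f}
    missing = REQUIRED_TOKENS - vocab
    if missing:
        raise ValueError(f"vocab file is missing special tokens: {sorted(missing)}")
    return len(vocab)
```

Run it on your vocab file before calling `create_pretraining_data`.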
- create a unilmv2 model and train it

```python
from poros.unilmv2.config import Unilmv2Config
from poros.unilmv2 import Unilmv2Model
from poros_train import optimization

"""
the configuration looks like this:
{
    "attention_probs_dropout_prob": 0.1,
    "directionality": "bidi",
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0.1,
    "hidden_size": 768,
    "initializer_range": 0.08,
    "intermediate_size": 3072,
    "max_position_embeddings": 512,
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "pooler_fc_size": 768,
    "pooler_num_attention_heads": 12,
    "pooler_num_fc_layers": 3,
    "pooler_size_per_head": 128,
    "pooler_type": "first_token_transform",
    "type_vocab_size": 2,
    "vocab_size": 21131
}
Keeping the configuration in a json file is recommended.
"""
json_file = "my_config_file"
unilmv2_config = Unilmv2Config.from_json_file(json_file)
unilmv2_model = Unilmv2Model(config=unilmv2_config, is_training=True)
epochs = 2000
steps_per_epoch = 15
optimizer = optimization.create_optimizer(init_lr=6e-4, num_train_steps=epochs * steps_per_epoch, num_warmup_steps=1500)
unilmv2_model.compile(optimizer=optimizer)
unilmv2_model.fit(dataset, epochs=epochs, steps_per_epoch=steps_per_epoch)
```
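The configuration shown above can be written out as JSON and then loaded with `Unilmv2Config.from_json_file`. A minimal sketch of producing such a file (field values copied from the example; the file name `my_config_file` is just the placeholder used above):

```python
import json

# Write the example unilmv2 configuration to a JSON file.
config = {
    "attention_probs_dropout_prob": 0.1,
    "directionality": "bidi",
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0.1,
    "hidden_size": 768,
    "initializer_range": 0.08,
    "intermediate_size": 3072,
    "max_position_embeddings": 512,
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "pooler_fc_size": 768,
    "pooler_num_attention_heads": 12,
    "pooler_num_fc_layers": 3,
    "pooler_size_per_head": 128,
    "pooler_type": "first_token_transform",
    "type_vocab_size": 2,
    "vocab_size": 21131,
}

with open("my_config_file", "w") as f:
    json.dump(config, f, indent=2)
```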
bert

usage:

- create pretraining data

```python
from poros.bert import create_pretraining_data

create_pretraining_data.main(input_file="./test_data/sample_text.txt",
                             output_file="./test_data/output", vocab_file="./test_data/vocab.txt")
```
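The input file follows BERT's pretraining format: one sentence per line, with a blank line separating documents. A sketch of writing such a file (the sentences are made-up examples):

```python
# Build a BERT-style pretraining input file: one sentence per line,
# documents separated by blank lines.
documents = [
    ["This is the first sentence of document one.",
     "This is the second sentence of document one."],
    ["Document two has a single sentence."],
]

with open("sample_text.txt", "w", encoding="utf-8") as f:
    for doc in documents:
        for sentence in doc:
            f.write(sentence + "\n")
        f.write("\n")  # blank line ends the document
```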
- pretrain a bert model

```python
from poros.bert_model import pretrain

pretrain.run(input_file="./test_data/output", bert_config_file="./test_data/bert_config.json",
             output_dir="./output")
```
- prepare a pretrained model and point the classifier model at it
- prepare train.csv and dev.csv; each row has the format "id, text, label", with no header row
- initialize the model as below
```python
from poros.bert_model.run_classifier import SimpleClassifierModel

model = SimpleClassifierModel(
    bert_config_file="./data/chinese_L-12_H-768_A-12/bert_config.json",
    vocab_file="./data/chinese_L-12_H-768_A-12/vocab.txt",
    output_dir="./output",
    max_seq_length=512,
    train_file="./data/train.csv",
    dev_file="./data/dev.csv",
    init_checkpoint="./data/chinese_L-12_H-768_A-12/bert_model.ckpt",
    label_list=["0", "1", "2", "3"]
)
```
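The headerless "id, text, label" layout expected for the train and dev files can be produced with the standard `csv` module; the rows here are made-up examples:

```python
import csv

# Write a headerless CSV in the "id, text, label" layout the classifier expects.
rows = [
    ("1", "this movie was great", "1"),
    ("2", "terrible plot and acting", "0"),
]

with open("train.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(rows)  # note: no header row
```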
sentence bert

```python
from poros.sentence_bert.apps import SentenceBert
from poros.sentence_bert.dataman import SnliDataMan
import tensorflow as tf

snli_dataman = SnliDataMan()
data = snli_dataman.batch(data_type='train', batch_size=32)
sbm = SentenceBert(loss_fn='triple_loss')
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)
sbm.compile(optimizer=optimizer)
sbm.fit(data, epochs=1, steps_per_epoch=20)
```
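`SentenceBert` is created here with `loss_fn='triple_loss'`, i.e. a triplet objective: an anchor embedding is pulled closer to a positive example than to a negative one by at least a margin. A framework-free sketch of that objective (the margin value and the helper names are assumptions, not poros internals):

```python
import math

def euclidean(u, v):
    # distance between two embedding vectors given as number sequences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    # loss = max(d(anchor, positive) - d(anchor, negative) + margin, 0)
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)
```

When the negative is already further away than the positive by more than the margin, the loss is zero and the triplet contributes no gradient.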
poros_dataset

some operations on tensors

```python
from poros.poros_dataset import about_tensor
import tensorflow as tf

A = tf.constant(value=[0])
print(about_tensor.get_shape(A))
# [1]
```
poros_chars

Provides a collection of small utility functions.

usage:

- convert Chinese numerals into Arabic numbers:

```python
from poros.poros_chars import chinese_to_arabic

print(chinese_to_arabic.NumberAdapter.convert("四千三百万"))
# 43000000
```
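The idea behind such a conversion can be shown with a simplified sketch (this is an illustration of the algorithm, not the poros implementation): digits accumulate into a running section via the small units 十/百/千, and the big units 万/亿 scale the total. Mixed 亿/万 edge cases are ignored here.

```python
# Simplified Chinese-numeral to integer conversion (illustrative only).
DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100, "千": 1000}   # small units within a section
BIG = {"万": 10**4, "亿": 10**8}            # big units that scale the total

def chinese_to_int(s):
    total = 0    # value of completed big-unit sections
    section = 0  # value accumulated below the next big unit
    digit = 0    # most recent bare digit
    for ch in s:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in UNITS:
            # "十" with no preceding digit means 1 * 10
            section += (digit or 1) * UNITS[ch]
            digit = 0
        elif ch in BIG:
            section += digit
            total = (total + section) * BIG[ch]
            digit = 0
            section = 0
    return total + section + digit
```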
Thanks
PyCharm
Download files
Source Distribution
poros-0.0.68.tar.gz (20.6 kB)
Built Distribution
poros-0.0.68-py3-none-any.whl (195.4 kB)
File details

Details for the file poros-0.0.68.tar.gz.

File metadata

- Download URL: poros-0.0.68.tar.gz
- Size: 20.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.0.0.post20200309 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.6

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | fbaebe63dfd23604a58aa314aca79bd2ef29ef81a7c63c417ef45f9d91d01655 |
| MD5 | 57513c2edcb87351fb5cb6f72e8e159c |
| BLAKE2b-256 | 288c1ca63c3e56b3bc896c6d38692029ebb3c9b287edecf91e1ff9d4d8203fcd |
File details

Details for the file poros-0.0.68-py3-none-any.whl.

File metadata

- Download URL: poros-0.0.68-py3-none-any.whl
- Size: 195.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.0.0.post20200309 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.6

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 0d74313bef85d8d87026642206136133769163b01ca1bc41d32536ea991c5c78 |
| MD5 | da9598ef1357ab6d7cefcf6a371c45f1 |
| BLAKE2b-256 | fdc4529b2ec7a2962c4258c512379e1784f600b4eb7eb343a2a5ddf04746ebe8 |