A BERT implementation for Twitter in Spanish.
Bilma
Bert In Latin aMericA
Bilma is a BERT implementation in TensorFlow, trained on the Masked Language Model (MLM) task using the https://sadit.github.io/regional-spanish-models-talk-2022/ datasets.
The regional models can be downloaded from http://geo.ingeotec.mx/~lgruiz/regional-models-bilma/. You will also need to download the vocabulary file, which is common to all models and regions.
The accuracy of the models trained on the MLM task for different regions is:
We also fine-tuned the models for emoticon prediction; the resulting accuracy is as follows:
Pre-requisites
You will need TensorFlow 2.4 or newer.
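The version requirement can be checked before loading a model. The following is a minimal sketch: the helper name `meets_minimum` is hypothetical and not part of bilma, and it only compares the major and minor components of a version string such as `tf.__version__`.

```python
import re


def meets_minimum(version, minimum=(2, 4)):
    """Return True if a version string like '2.11.0' is at least `minimum`."""
    # Only the leading "major.minor" part is compared, so suffixes
    # such as "-rc1" or a patch number are ignored.
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed")
    else:
        print("TensorFlow", tf.__version__, "ok:", meets_minimum(tf.__version__))
```

Comparing integer tuples rather than raw strings avoids the pitfall where "2.10" sorts before "2.4" lexicographically.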
Quick guide
You can see the demo notebooks for a quick guide on how to use the models.
Clone this repository and then run
bash download-emoji15-bilma.sh
to download the MX model. To load the model, use the following code:
from bilma import bilma_model
vocab_file = "vocab_file_All.txt"
model_file = "bilma_small_MX_epoch-1_classification_epochs-13.h5"
model = bilma_model.load(model_file)
tokenizer = bilma_model.tokenizer(vocab_file=vocab_file,
                                  max_length=280)
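Loading fails if the model or vocabulary file is not in the working directory, so it may help to check for both first. This is a small sketch using only the standard library; `missing_files` is a hypothetical helper, not part of bilma, and the file names are the ones used in the loading code above.

```python
from pathlib import Path


def missing_files(paths, base_dir="."):
    """Return the entries of `paths` that do not exist as files under `base_dir`."""
    base = Path(base_dir)
    return [p for p in paths if not (base / p).is_file()]


if __name__ == "__main__":
    required = [
        "vocab_file_All.txt",
        "bilma_small_MX_epoch-1_classification_epochs-13.h5",
    ]
    missing = missing_files(required)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All files found")
```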
Now you will need some text:
texts = ["Tenemos tres dias sin internet ni senal de celular en el pueblo.",
         "Incomunicados en el siglo XXI tampoco hay servicio de telefonia fija",
         "Vamos a comer unos tacos",
         "Los del banco no dejan de llamarme"]
toks = tokenizer.tokenize(texts)
With this, you are ready to use the model:
p = model.predict(toks)
tokenizer.decode_emo(p[1])
which produces the output:
Each emoji corresponds to one entry in texts.
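Because the predictions follow the order of the inputs, they can be paired back with the original texts. The sketch below assumes that `tokenizer.decode_emo` returns a sequence with one entry per text; `pair_predictions` is a hypothetical helper, and the labels shown are placeholders rather than real model output.

```python
def pair_predictions(texts, labels):
    """Pair each input text with the label predicted for it, in order."""
    if len(texts) != len(labels):
        raise ValueError("expected one label per text")
    return list(zip(texts, labels))


if __name__ == "__main__":
    texts = ["Vamos a comer unos tacos",
             "Los del banco no dejan de llamarme"]
    # Placeholder labels; in practice these would come from
    # tokenizer.decode_emo(p[1]).
    labels = ["<emoji 1>", "<emoji 2>"]
    for text, label in pair_predictions(texts, labels):
        print(label, text)
```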