A BERT implementation for Twitter in Spanish.

Project description

Bilma

Bert In Latin aMericA

Bilma is a BERT implementation in TensorFlow, trained on the Masked Language Model (MLM) task using the https://sadit.github.io/regional-spanish-models-talk-2022/ datasets.
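As a rough illustration of the Masked Language Model task, the sketch below hides a fraction of the tokens in a sentence and keeps the originals as prediction targets. It is a simplified, generic example and not Bilma's training code: the `[MASK]` symbol, the 15% rate, the whitespace tokenization, and the `mask_tokens` helper are all assumptions made for the illustration.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical mask symbol; Bilma's vocabulary may use another


def mask_tokens(tokens, mask_rate=0.15, seed=None):
    """Return (masked_tokens, labels) for a simplified MLM example.

    labels[i] holds the original token at masked positions and None elsewhere.
    """
    if not tokens:
        return [], []
    rng = random.Random(seed)
    # Mask at least one token so that short inputs still yield a target.
    n_to_mask = max(1, round(len(tokens) * mask_rate))
    positions = set(rng.sample(range(len(tokens)), n_to_mask))
    masked = [MASK_TOKEN if i in positions else tok for i, tok in enumerate(tokens)]
    labels = [tok if i in positions else None for i, tok in enumerate(tokens)]
    return masked, labels


# Whitespace splitting stands in for a real tokenizer here.
tokens = "vamos a comer unos tacos".split()
masked, labels = mask_tokens(tokens, seed=0)
print(masked)
print(labels)
```

During training, the model sees the masked sequence and is scored on how well it recovers the tokens stored in `labels`.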

The regional models can be downloaded from http://geo.ingeotec.mx/~lgruiz/regional-models-bilma/. You will also need to download the vocabulary file, which is common to all models and regions.

The accuracy of the models trained on the MLM task for different regions is:

[Figure: bilma-mlm-comp]

We also fine-tuned the models for emoticon prediction. The resulting accuracy is as follows:

[Figure: bilma-cls-comp]

Pre-requisites

You will need TensorFlow 2.4 or newer.
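A quick way to confirm the requirement is to compare the installed version against 2.4. The following is a minimal sketch; `meets_minimum` is a hypothetical helper written for this example and is not part of Bilma.

```python
import re


def meets_minimum(version, minimum=(2, 4)):
    """Return True if a version string such as '2.4.1' is at least `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


try:
    import tensorflow as tf
except ImportError:
    print("TensorFlow is not installed")
else:
    status = "OK" if meets_minimum(tf.__version__) else "too old for Bilma"
    print(f"TensorFlow {tf.__version__}: {status}")
```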

Quick guide

You can see the demo notebooks for a quick guide on how to use the models.

Clone this repository and then run

bash download-emoji15-bilma.sh

to download the MX model. You can then load the model with the following code:

from bilma import bilma_model
vocab_file = "vocab_file_All.txt"
model_file = "bilma_small_MX_epoch-1_classification_epochs-13.h5"
model = bilma_model.load(model_file)
tokenizer = bilma_model.tokenizer(vocab_file=vocab_file,
                                  max_length=280)
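If loading fails, a common cause is that the model or vocabulary file is not in the working directory. The sketch below checks for both files before loading; `missing_files` is a hypothetical helper for this example, and it assumes the download script places the files in the current directory.

```python
from pathlib import Path


def missing_files(*paths):
    """Return the subset of `paths` that do not exist as regular files."""
    return [str(p) for p in paths if not Path(p).is_file()]


missing = missing_files("vocab_file_All.txt",
                        "bilma_small_MX_epoch-1_classification_epochs-13.h5")
if missing:
    print("Missing files, run the download script first:", ", ".join(missing))
```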

Now you will need some text:

texts = ["Tenemos tres dias sin internet ni senal de celular en el pueblo.",
         "Incomunicados en el siglo XXI tampoco hay servicio de telefonia fija",
         "Vamos a comer unos tacos",
         "Los del banco no dejan de llamarme"]
toks = tokenizer.tokenize(texts)

With this, you are ready to use the model:

p = model.predict(toks)
tokenizer.decode_emo(p[1])

which produces the output:

[Figure: emoji-output]

Each emoji corresponds to one entry in texts.
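To make the decoding step more concrete, the sketch below shows one plausible reading of it: each text gets a row of per-class scores, and the emoji with the highest score is selected. This is an assumption about what `decode_emo` does, not its actual implementation; the label set, the scores, and the `decode_top_class` helper are made up for the illustration.

```python
# Hypothetical label set; the real model uses its own emoji classes.
EMOJI_LABELS = ["😂", "😍", "😡", "😭"]


def decode_top_class(scores, labels=EMOJI_LABELS):
    """Map each row of per-class scores to the label with the highest score."""
    decoded = []
    for row in scores:
        best = max(range(len(row)), key=lambda i: row[i])
        decoded.append(labels[best])
    return decoded


# Made-up scores for two texts, one row per text.
scores = [[0.1, 0.2, 0.6, 0.1],
          [0.7, 0.1, 0.1, 0.1]]
print(decode_top_class(scores))
```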
