Skip to main content

Indobenchmark toolkit for supporting IndoNLU and IndoNLG

Project description

Indobenchmark Toolkit

Pull Requests Welcome GitHub license Contributor Covenant

Indobenchmark are collections of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG) resources for Bahasa Indonesia such as Institut Teknologi Bandung, Universitas Multimedia Nusantara, The Hong Kong University of Science and Technology, Universitas Indonesia, DeepMind, Gojek, and Prosa.AI.

Research Paper

IndoNLU has been accepted by AACL-IJCNLP 2020 and you can find the details in our paper https://www.aclweb.org/anthology/2020.aacl-main.85.pdf. If you are using any component on IndoNLU including Indo4B, FastText-Indo4B, or IndoBERT in your work, please cite the following paper:

@inproceedings{wilie2020indonlu,
  title={IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding},
  author={Bryan Wilie and Karissa Vincentio and Genta Indra Winata and Samuel Cahyawijaya and X. Li and Zhi Yuan Lim and S. Soleman and R. Mahendra and Pascale Fung and Syafri Bahar and A. Purwarianti},
  booktitle={Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing},
  year={2020}
}

IndoNLG has been accepted by EMNLP 2021 and you can find the details in our paper https://arxiv.org/abs/2104.08200. If you are using any component on IndoNLG including Indo4B-Plus, IndoBART, or IndoGPT in your work, please cite the following paper:

@misc{cahyawijaya2021indonlg,
      title={IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation}, 
      author={Samuel Cahyawijaya and Genta Indra Winata and Bryan Wilie and Karissa Vincentio and Xiaohong Li and Adhiguna Kuncoro and Sebastian Ruder and Zhi Yuan Lim and Syafri Bahar and Masayu Leylia Khodra and Ayu Purwarianti and Pascale Fung},
      year={2021},
      eprint={2104.08200},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

IndoNLU and IndoNLG Models

IndoBERT and IndoBERT-lite Models

We provide 4 IndoBERT and 4 IndoBERT-lite Pretrained Language Model [Link]

FastText (Indo4B)

We provide the full uncased FastText model file (11.9 GB) and the corresponding Vector file (3.9 GB)

  • FastText model (11.9 GB) [Link]
  • Vector file (3.9 GB) [Link]

We provide smaller FastText models with smaller vocabulary for each of the 12 downstream tasks

IndoBART and IndoGPT Models

We provide IndoBART and IndoGPT Pretrained Language Model [Link]

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

indobenchmark-toolkit-0.0.3.tar.gz (9.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

indobenchmark_toolkit-0.0.3-py3-none-any.whl (8.0 kB view details)

Uploaded Python 3

File details

Details for the file indobenchmark-toolkit-0.0.3.tar.gz.

File metadata

  • Download URL: indobenchmark-toolkit-0.0.3.tar.gz
  • Upload date:
  • Size: 9.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.9.7

File hashes

Hashes for indobenchmark-toolkit-0.0.3.tar.gz
Algorithm Hash digest
SHA256 b0a951674cec15c3c6b43304310eb4a943ef9631263e433b165caa83b02eb199
MD5 6df313c7974b08b125c95a5f1070dc6f
BLAKE2b-256 ec83bbe11412066a066342f06a96313a88d6c6131291438f7b23e0e8cd7ae6cc

See more details on using hashes here.

File details

Details for the file indobenchmark_toolkit-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: indobenchmark_toolkit-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 8.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.9.7

File hashes

Hashes for indobenchmark_toolkit-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 092da59a7cfed27d68da5a5c14dbb2ffd3cb603a4f0ec13c8ec398c3dd24c89d
MD5 d642a036550b17c00c26b56a29b9eeba
BLAKE2b-256 76f3cdd53811174e72cd2d666fbaa57f891366d5852a0d223f6da98d28a14527

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page