Indobenchmark toolkit for supporting IndoNLU and IndoNLG

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Indobenchmark Toolkit

Indobenchmark is a collection of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG) resources for Bahasa Indonesia. It is a joint effort of universities and industry partners such as Institut Teknologi Bandung, Universitas Multimedia Nusantara, The Hong Kong University of Science and Technology, Universitas Indonesia, DeepMind, Gojek, and Prosa.AI.

Research Paper

IndoNLU was accepted at AACL-IJCNLP 2020; the details are in our paper (https://www.aclweb.org/anthology/2020.aacl-main.85.pdf). If you use any component of IndoNLU, including Indo4B, FastText-Indo4B, or IndoBERT, in your work, please cite the following paper:

@inproceedings{wilie2020indonlu,
  title={IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding},
  author={Bryan Wilie and Karissa Vincentio and Genta Indra Winata and Samuel Cahyawijaya and X. Li and Zhi Yuan Lim and S. Soleman and R. Mahendra and Pascale Fung and Syafri Bahar and A. Purwarianti},
  booktitle={Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing},
  year={2020}
}

IndoNLG was accepted at EMNLP 2021; the details are in our paper (https://arxiv.org/abs/2104.08200). If you use any component of IndoNLG, including Indo4B-Plus, IndoBART, or IndoGPT, in your work, please cite the following paper:

@misc{cahyawijaya2021indonlg,
      title={IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation}, 
      author={Samuel Cahyawijaya and Genta Indra Winata and Bryan Wilie and Karissa Vincentio and Xiaohong Li and Adhiguna Kuncoro and Sebastian Ruder and Zhi Yuan Lim and Syafri Bahar and Masayu Leylia Khodra and Ayu Purwarianti and Pascale Fung},
      year={2021},
      eprint={2104.08200},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

IndoNLU and IndoNLG Models

IndoBERT and IndoBERT-lite Models

We provide four IndoBERT and four IndoBERT-lite pretrained language models [Link]

IndoBERT-base
- Phase 1 [Link]
- Phase 2 [Link]
IndoBERT-large
- Phase 1 [Link]
- Phase 2 [Link]
IndoBERT-lite-base
- Phase 1 [Link]
- Phase 2 [Link]
IndoBERT-lite-large
- Phase 1 [Link]
- Phase 2 [Link]
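
The eight checkpoints listed above follow a regular variant/size/phase pattern. The sketch below shows one way to enumerate them and load one with the Hugging Face `transformers` library. The `indobenchmark/...` identifier scheme and the helper names `checkpoint_id`, `all_checkpoint_ids`, and `load_checkpoint` are assumptions made for illustration and are not stated on this page, so check them against the [Link] targets before relying on them.

```python
from typing import List

VARIANTS = ("indobert", "indobert-lite")
SIZES = ("base", "large")
PHASES = (1, 2)


def checkpoint_id(variant: str, size: str, phase: int) -> str:
    """Build an identifier such as 'indobenchmark/indobert-base-p1'.

    The naming scheme is an assumption, not taken from this page.
    """
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    if size not in SIZES:
        raise ValueError(f"unknown size: {size!r}")
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    return f"indobenchmark/{variant}-{size}-p{phase}"


def all_checkpoint_ids() -> List[str]:
    """List one identifier per variant, size, and phase combination."""
    return [checkpoint_id(v, s, p) for v in VARIANTS for s in SIZES for p in PHASES]


def load_checkpoint(variant: str, size: str, phase: int):
    """Download and load the tokenizer and encoder for one checkpoint.

    Needs the `transformers` package and network access, so the import
    is done lazily inside the function.
    """
    from transformers import AutoModel, AutoTokenizer

    name = checkpoint_id(variant, size, phase)
    return AutoTokenizer.from_pretrained(name), AutoModel.from_pretrained(name)
```

Calling `load_checkpoint("indobert", "base", 1)` would then return a tokenizer and model pair, provided the assumed identifier resolves.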

FastText (Indo4B)

We provide the full uncased FastText model file (11.9 GB) and the corresponding vector file (3.9 GB):

FastText model (11.9 GB) [Link]
Vector file (3.9 GB) [Link]
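
Because the vector file is several gigabytes, it can be useful to stream only the first few entries instead of loading everything into memory. The sketch below assumes the file uses the common fastText `.vec` text layout (a header line with vocabulary size and dimension, then one word and its values per line); this page does not confirm the layout, and `read_vec_file` is a hypothetical helper name.

```python
from typing import Dict, List, Optional


def read_vec_file(path: str, limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Read up to `limit` word vectors from a text file in .vec layout.

    Assumed layout: first line "<vocabulary size> <dimension>", then one
    line per word: the word followed by <dimension> space-separated floats.
    """
    vectors: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8", errors="ignore") as handle:
        header = handle.readline().split()
        dim = int(header[1])
        for line in handle:
            # Stop early so a multi-gigabyte file is never read in full.
            if limit is not None and len(vectors) >= limit:
                break
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                continue  # skip rows that do not match the declared dimension
            vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors
```

For example, `read_vec_file(path, limit=10000)` keeps only the first 10,000 rows, which in fastText output are usually the most frequent words.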

We also provide smaller FastText models with a reduced vocabulary for each of the 12 downstream tasks:

FastText-Indo4B [Link]
FastText-CC-ID [Link]

IndoBART and IndoGPT Models

We provide IndoBART and IndoGPT pretrained language models [Link]

IndoBART [Link]
IndoBART-v2 [Link]
IndoGPT2 [Link]

Release history

- 0.1.7: Dec 5, 2023
- 0.1.6: Dec 2, 2022
- 0.1.5: Dec 2, 2022
- 0.1.4: Jun 17, 2022
- 0.1.3: Jun 17, 2022
- 0.1.2: Jun 15, 2022
- 0.1.1: Jun 13, 2022
- 0.1.0: Jun 13, 2022
- 0.0.9: Jun 13, 2022
- 0.0.8: Jun 13, 2022
- 0.0.7: Jun 13, 2022
- 0.0.6: Apr 22, 2022
- 0.0.5: Oct 24, 2021
- 0.0.4: Oct 24, 2021
- 0.0.3: Oct 17, 2021 (this version)
- 0.0.2: Oct 17, 2021
- 0.0.1: Oct 17, 2021

Download files

Source Distribution

indobenchmark-toolkit-0.0.3.tar.gz (9.1 kB)

Uploaded Oct 17, 2021 Source

Built Distribution

indobenchmark_toolkit-0.0.3-py3-none-any.whl (8.0 kB)

Uploaded Oct 17, 2021 Python 3

File details

Details for the file indobenchmark-toolkit-0.0.3.tar.gz.

File metadata

Download URL: indobenchmark-toolkit-0.0.3.tar.gz
Upload date: Oct 17, 2021
Size: 9.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.9.7

File hashes

Hashes for indobenchmark-toolkit-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`b0a951674cec15c3c6b43304310eb4a943ef9631263e433b165caa83b02eb199`
MD5	`6df313c7974b08b125c95a5f1070dc6f`
BLAKE2b-256	`ec83bbe11412066a066342f06a96313a88d6c6131291438f7b23e0e8cd7ae6cc`

File details

Details for the file indobenchmark_toolkit-0.0.3-py3-none-any.whl.

File metadata

Download URL: indobenchmark_toolkit-0.0.3-py3-none-any.whl
Upload date: Oct 17, 2021
Size: 8.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.9.7

File hashes

Hashes for indobenchmark_toolkit-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`092da59a7cfed27d68da5a5c14dbb2ffd3cb603a4f0ec13c8ec398c3dd24c89d`
MD5	`d642a036550b17c00c26b56a29b9eeba`
BLAKE2b-256	`76f3cdd53811174e72cd2d666fbaa57f891366d5852a0d223f6da98d28a14527`
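
A downloaded file can be checked against the SHA256 digests listed above before installing it. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `matches_published_hash` are illustrative, and the local path is whatever location you saved the download to.

```python
import hashlib

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "indobenchmark-toolkit-0.0.3.tar.gz":
        "b0a951674cec15c3c6b43304310eb4a943ef9631263e433b165caa83b02eb199",
    "indobenchmark_toolkit-0.0.3-py3-none-any.whl":
        "092da59a7cfed27d68da5a5c14dbb2ffd3cb603a4f0ec13c8ec398c3dd24c89d",
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, filename: str) -> bool:
    """Compare a local file against the digest published for `filename`."""
    return sha256_of(path) == EXPECTED_SHA256[filename]
```

A `False` result from `matches_published_hash` means the local file differs from the published one and should not be installed.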
