

Project description

Malaya is a Natural-Language-Toolkit library for bahasa Malaysia (Malay), powered by deep learning with TensorFlow.

Documentation

Proper documentation is available at https://malaya.readthedocs.io/

Installing from PyPI

CPU version

$ pip install malaya

GPU version

$ pip install malaya-gpu

Only Python 3.6 and above, and TensorFlow 1.10 and above (but not the 2.x line), are supported.
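
The version constraint above can be expressed as a small helper. This is only a hedged sketch; the helper names below are made up and are not part of Malaya's API:

```python
import sys

def python_supported(version_info=sys.version_info):
    """Malaya requires Python 3.6 or newer."""
    return tuple(version_info[:2]) >= (3, 6)

def tensorflow_supported(tf_version):
    """Malaya requires TensorFlow 1.10+, but not the 2.x line."""
    major, minor = (int(part) for part in tf_version.split('.')[:2])
    return major == 1 and minor >= 10

print(tensorflow_supported('1.15'))  # True: supported
print(tensorflow_supported('2.0'))   # False: TF 2.x is not supported
```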

Features

  • Augmentation

Augment any text using a synonym dictionary, word vectors, or Transformer-Bahasa.

  • Constituency Parsing

    Transfer learning on BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa.

  • Dependency Parsing

    Transfer learning on BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa, ALXLNET-base-bahasa.

  • Emotion Analysis

    Transfer learning on BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa, ALXLNET-base-bahasa.

  • Entities Recognition

    Transfer learning on BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa, ALXLNET-base-bahasa.

  • Generator

Generate text given a context using T5-Bahasa, GPT2-Bahasa or Transformer-Bahasa.

  • Keyword Extraction

Provides RAKE, TextRank, and an attention-mechanism hybrid with Transformer-Bahasa.

  • Language Detection

Uses fastText and a sparse deep-learning model to classify Malay (formal and social media), Indonesian (formal and social media), Rojak language and Manglish.

  • Normalizer

Uses local Malaysian NLP research combined with Transformer-Bahasa to normalize any Bahasa text.

  • Num2Word

    Convert from numbers to cardinal or ordinal representation.

  • Paraphrase

Provides abstractive paraphrasing using T5-Bahasa and Transformer-Bahasa.

  • Part-of-Speech Recognition

    Transfer learning on BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa, ALXLNET-base-bahasa.

  • Relevancy Analysis

    Transfer learning on BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa, ALXLNET-base-bahasa.

  • Sentiment Analysis

    Transfer learning on BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa, ALXLNET-base-bahasa.

  • Similarity

    Using deep Encoder, Doc2Vec, BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa and ALXLNET-base-bahasa to build deep semantic similarity models.

  • Spell Correction

Uses local Malaysian NLP research combined with Transformer-Bahasa to auto-correct any Bahasa word.

  • Stemmer

Uses a state-of-the-art BPE LSTM Seq2Seq model with attention for Bahasa stemming.

  • Subjectivity Analysis

    Transfer learning on BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa, ALXLNET-base-bahasa.

  • Summarization

Provides abstractive summarization using T5-Bahasa, plus an extractive interface using Transformer-Bahasa, skip-thought, LDA, LSA and Doc2Vec.

  • Topic Modelling

Provides Transformer-Bahasa, LDA2Vec, LDA, NMF and LSA interfaces for easy topic modelling with topic visualization.

  • Toxicity Analysis

    Transfer learning on BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa, ALXLNET-base-bahasa.

  • Transformer

Provides an easy interface to load BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa, ALXLNET-base-bahasa, ELECTRA-base-bahasa and ELECTRA-small-bahasa.

  • Translation

Provides neural machine translation using Transformer models for EN to MS and MS to EN.

  • Word2Num

    Convert from cardinal or ordinal representation to numbers.

  • Word2Vec

Provides Word2Vec models pretrained on Bahasa Wikipedia and Bahasa news, with an easy interface and visualization.

  • Zero-shot classification

Provides a zero-shot classification interface using Transformer-Bahasa to classify text without any labeled training data.
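
The Keyword Extraction feature above mentions RAKE. The RAKE side can be sketched in a few self-contained lines; this is only a toy illustration with a made-up stopword list and example sentence, not Malaya's implementation:

```python
import re
from collections import defaultdict

# Toy Malay stopword list, for illustration only.
STOPWORDS = {'dan', 'di', 'yang', 'untuk', 'ini', 'itu', 'ke', 'dengan'}

def rake_keywords(text, stopwords=STOPWORDS):
    """Score candidate phrases by summed word degree/frequency (RAKE)."""
    words = re.findall(r"[a-z']+", text.lower())
    # Split the word stream into candidate phrases at stopwords.
    phrases, current = [], []
    for w in words:
        if w in stopwords:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)
    # Word score = degree(w) / freq(w); phrase score = sum of word scores.
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for w in phrase:
            freq[w] += 1
            degree[w] += len(phrase)
    scores = {w: degree[w] / freq[w] for w in freq}
    return sorted(
        ((' '.join(p), sum(scores[w] for w in p)) for p in phrases),
        key=lambda item: -item[1],
    )

ranked = rake_keywords('kerajaan malaysia dan rakyat untuk pembangunan negara')
print(ranked)
```

Multi-word phrases score higher than isolated words, which is exactly the bias that makes RAKE useful for keyword extraction.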
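
The Num2Word and Word2Num features above are inverses of each other. A self-contained sketch for a toy subset of Malay cardinals (0–999 only; not Malaya's implementation) looks like this:

```python
UNITS = ['kosong', 'satu', 'dua', 'tiga', 'empat',
         'lima', 'enam', 'tujuh', 'lapan', 'sembilan']

def num2word_ms(n):
    """Convert 0-999 to a Malay cardinal phrase (toy subset)."""
    if n < 10:
        return UNITS[n]
    if n == 10:
        return 'sepuluh'
    if n == 11:
        return 'sebelas'
    if n < 20:
        return UNITS[n - 10] + ' belas'
    if n < 100:
        tens = UNITS[n // 10] + ' puluh'
        return tens if n % 10 == 0 else tens + ' ' + UNITS[n % 10]
    hundreds = 'seratus' if n < 200 else UNITS[n // 100] + ' ratus'
    return hundreds if n % 100 == 0 else hundreds + ' ' + num2word_ms(n % 100)

WORD_VALUES = {w: i for i, w in enumerate(UNITS)}

def word2num_ms(text):
    """Parse a Malay cardinal phrase back to an int (inverse sketch)."""
    total, current = 0, 0
    for token in text.lower().split():
        if token in WORD_VALUES:
            current += WORD_VALUES[token]
        elif token == 'sepuluh':
            current += 10
        elif token == 'sebelas':
            current += 11
        elif token == 'belas':      # 'lima belas' -> 15
            current += 10
        elif token == 'puluh':      # 'empat puluh' -> 40
            current *= 10
        elif token == 'seratus':
            total += 100
        elif token == 'ratus':      # 'dua ratus' -> 200
            total += current * 100
            current = 0
    return total + current

print(num2word_ms(215))                     # dua ratus lima belas
print(word2num_ms('dua ratus lima belas'))  # 215
```

The round trip `word2num_ms(num2word_ms(n)) == n` holds for the whole 0–999 subset, which is a convenient property to test against.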
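
The Similarity feature above ultimately compares sentence embeddings; a common comparison step for such models is cosine similarity, sketched here with toy hand-written vectors (real embeddings would come from the listed encoders):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy "sentence embeddings": emb_a and emb_b point in similar directions,
# emb_c does not.
emb_a = [0.2, 0.9, 0.1]
emb_b = [0.25, 0.85, 0.05]
emb_c = [0.9, 0.1, 0.4]

print(cosine_similarity(emb_a, emb_b))  # close to 1.0
print(cosine_similarity(emb_a, emb_c))  # much smaller
```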

Pretrained Models

Malaya also releases Bahasa pretrained models; simply check Malaya/pretrained-model.

Alternatively, try the Hugging Face 🤗 Transformers library: https://huggingface.co/models?filter=ms

References

If you use our software for research, please cite:

@misc{Malaya,
  author = {Husein, Zolkepli},
  title = {Malaya: Natural-Language-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huseinzol05/malaya}}
}

Acknowledgement

Thanks to Im Big, LigBlou, Mesolitica and KeyReply for sponsoring the AWS, GCP and private cloud resources used to train Malaya models.

Contributing

Thank you for contributing to this library; it really helps a lot. Feel free to contact me with suggestions, or to contribute in other forms; we accept everything, not just code!

License

