
An easy-to-use open-source library for advanced Deep Learning and Natural Language Processing


Tamnun ML


tamnun is a Python framework for machine learning and deep learning algorithms and methods, especially in the fields of Natural Language Processing and Transfer Learning. The aim of tamnun is to provide easy-to-use interfaces for building powerful models based on the most recent SOTA methods.

For more about tamnun, feel free to read the introduction to TamnunML on Medium.

Getting Started

tamnun depends on several other machine learning and deep learning frameworks, such as pytorch and keras. To install tamnun and all of its dependencies, run:

$ git clone https://github.com/hiredscorelabs/tamnun-ml
$ cd tamnun-ml
$ python setup.py install

Or using PyPI:

pip install tamnun
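
Either way, a quick sanity check that the package is importable (this one-liner is just an illustration, not from the project docs):

$ python -c "import tamnun"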

Jump in and try out an example:

$ cd examples
$ python finetune_bert.py

Or take a look at the Jupyter notebooks here.

BERT

BERT stands for Bidirectional Encoder Representations from Transformers, a language model trained by Google and introduced in their paper. Here we use the excellent PyTorch-Pretrained-BERT library and wrap it to provide a scikit-learn style interface for easy BERT fine-tuning. At the moment, the tamnun BERT classifier supports binary and multi-class classification. To fine-tune BERT on a specific task:

from tamnun.bert import BertClassifier, BertVectorizer
from sklearn.pipeline import make_pipeline

# Chain BERT input preparation and fine-tuning into one scikit-learn pipeline
clf = make_pipeline(BertVectorizer(), BertClassifier(num_of_classes=2)).fit(train_X, train_y)
predicted = clf.predict(test_X)

Please see this notebook for a full code example.
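
Because BertClassifier also handles multi-class problems, the same pipeline extends naturally; here is a minimal sketch assuming a four-class task (the class count is an illustrative assumption):

from tamnun.bert import BertClassifier, BertVectorizer
from sklearn.pipeline import make_pipeline

# Same pipeline as above, but with four target classes instead of two (illustrative)
clf = make_pipeline(BertVectorizer(), BertClassifier(num_of_classes=4)).fit(train_X, train_y)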

Fitting (almost) any PyTorch Module using just one line

You can use the TorchEstimator object to fit almost any PyTorch module in just one line:

from torch import nn
from tamnun.core import TorchEstimator

# A single linear layer mapping 128 input features to 2 output classes
module = nn.Linear(128, 2)
clf = TorchEstimator(module, task_type='classification').fit(train_X, train_y)

See this file for a full example of fitting an nn.Linear module on the MNIST (classification of handwritten digits) dataset.
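
For context, here is a minimal end-to-end sketch with random tensors; the dummy data and the predict call are assumptions for illustration, not the library's documented example:

import torch
from torch import nn
from tamnun.core import TorchEstimator

# Dummy data: 1000 random 128-dimensional examples with binary labels (illustrative)
train_X = torch.randn(1000, 128)
train_y = torch.randint(0, 2, (1000,))

module = nn.Linear(128, 2)
clf = TorchEstimator(module, task_type='classification').fit(train_X, train_y)
predicted = clf.predict(train_X)  # assumed scikit-learn-style predict method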

Distiller Transfer Learning

This module distills a very big model (like BERT) into a much smaller one. Inspired by this paper.

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LinearRegression
from tamnun.bert import BertClassifier, BertVectorizer
from tamnun.transfer import Distiller

# Teacher: a fine-tuned BERT pipeline
bert_clf = make_pipeline(BertVectorizer(do_truncate=True, max_len=3), BertClassifier(num_of_classes=2))

# Student: a much smaller bag-of-ngrams linear model that learns to mimic the teacher's logits
distilled_clf = make_pipeline(CountVectorizer(ngram_range=(1, 3)), LinearRegression())

distiller = Distiller(teacher_model=bert_clf,
                      teacher_predict_func=bert_clf.decision_function,
                      student_model=distilled_clf).fit(train_texts, train_y, unlabeled_X=unlabeled_texts)

predicted_logits = distiller.transform(test_texts)
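
Since the student regresses the teacher's logits, recovering hard class labels takes one extra step; the argmax below is an assumed post-processing convention, not part of tamnun's API:

import numpy as np

# Pick the class with the highest predicted logit for each example (assumed convention)
predicted_classes = np.argmax(predicted_logits, axis=1)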

For full BERT distillation example see this notebook.

Support

Getting Help

You can ask questions and join the development discussion on GitHub Issues.

License

Apache License 2.0 (same as TensorFlow)
