Training of multi-label embeddings for k-shingled input sequences for TensorFlow2/Keras
keras-multilabel-embedding
The package contains a TensorFlow2/Keras class to train an embedding matrix for multi-label inputs: instead of one ID per token (as in one-hot encoding), N IDs per token can be provided as model input.
A PyTorch implementation is available at https://github.com/ulf1/torch-multilabel-embedding (pip install torch-multilabel-embedding).
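To make the idea concrete, here is a minimal plain-Python sketch of a multi-label lookup. It assumes that the rows selected by a token's IDs are combined by summation; the aggregation that `MultiLabelEmbedding` actually applies is not stated in this description. The helper names `make_embedding` and `embed_multilabel` and the toy matrix are illustrative only and are not part of the package.

```python
import random


def make_embedding(vocab_size, embed_size, seed=42):
    """Create a toy embedding matrix as a list of rows (illustrative only)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(embed_size)]
            for _ in range(vocab_size)]


def embed_multilabel(matrix, ids):
    """Combine the rows selected by `ids` into one vector.

    Assumption: rows are summed. The package may aggregate differently.
    """
    embed_size = len(matrix[0])
    out = [0.0] * embed_size
    for i in ids:
        for d in range(embed_size):
            out[d] += matrix[i][d]
    return out


matrix = make_embedding(vocab_size=5, embed_size=4)

# One ID per token: the result is simply the selected row.
single = embed_multilabel(matrix, [2])

# Three IDs for the same token: the three rows are merged into one vector.
multi = embed_multilabel(matrix, [1, 2, 4])
```

Under this assumption, a token with N IDs still yields a single vector of length `embed_size`, so downstream layers see the same shape as with an ordinary embedding.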
Usage
Multi-label embeddings with fixed number of labels
import keras_multilabel_embedding as tml
import tensorflow as tf
# a sequence of multi-label data points
x_ids = [[1, 2, 4], [0, 1, 2], [2, 1, 4], [3, 2, 1]]
x_ids = tf.constant(x_ids)
# initialize layer
layer = tml.MultiLabelEmbedding(
    vocab_size=5, embed_size=300, random_state=42)
# predict
y = layer(x_ids)
Multi-label embeddings with variable number of labels
import keras_multilabel_embedding as tml
import tensorflow as tf
# a sequence of multi-label data points
x_ids = [[1, 2, 4], [0, 1, 2], [2, 1], [3]]
# initialize layer
layer = tml.MultiLabelEmbedding(
    vocab_size=5, embed_size=300, random_state=42)
# predict
y = layer(x_ids)
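The project summary mentions k-shingled input sequences, which is one way such variable-length ID lists can arise. The sketch below is an assumption about that preprocessing step, not code from the package. The helpers `shingles` and `build_vocab` are hypothetical, and each token is mapped to the IDs of its distinct character k-shingles.

```python
def shingles(token, k):
    """Return the distinct character k-shingles of `token`, in order of first occurrence."""
    seen = []
    for i in range(len(token) - k + 1):
        s = token[i:i + k]
        if s not in seen:
            seen.append(s)
    return seen


def build_vocab(tokens, k):
    """Assign consecutive integer IDs to every k-shingle found in `tokens`."""
    vocab = {}
    for token in tokens:
        for s in shingles(token, k):
            vocab.setdefault(s, len(vocab))
    return vocab


tokens = ["abc", "bcd", "ab"]
vocab = build_vocab(tokens, k=2)

# One list of shingle IDs per token; list lengths differ with token length.
x_ids = [[vocab[s] for s in shingles(t, 2)] for t in tokens]
print(x_ids)  # [[0, 1], [1, 2], [0]]
```

In such a setup, `len(vocab)` would play the role of `vocab_size`. Tokens shorter than `k` produce an empty ID list, which a real pipeline would need to handle, for example with a reserved padding or unknown ID.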
Appendix
Installation
The keras-multilabel-embedding git repo is available as a PyPI package:
pip install keras-multilabel-embedding
Alternatively, install it directly from the GitHub repository:
pip install git+ssh://git@github.com/ulf1/keras-multilabel-embedding.git
Install a virtual environment
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir
pip install -r requirements-dev.txt --no-cache-dir
pip install -r requirements-demo.txt --no-cache-dir
(If your git repo is stored in a folder whose path contains whitespace, do not use the subfolder .venv. Use an absolute path without whitespace instead.)
Python commands
Jupyter for the examples: jupyter lab
Check syntax: flake8 --ignore=F401 --exclude=$(grep -v '^#' .gitignore | xargs | sed -e 's/ /,/g')
Run Unit Tests: PYTHONPATH=. pytest
Publish
pandoc README.md --from markdown --to rst -s -o README.rst
python setup.py sdist
twine upload -r pypi dist/*
Clean up
find . -type f -name "*.pyc" | xargs rm
find . -type d -name "__pycache__" | xargs rm -r
rm -r .pytest_cache
rm -r .venv
Support
Please open an issue for support.
Contributing
Please contribute using GitHub Flow. Create a branch, add commits, and open a pull request.
Source Distribution
Hashes for keras-multilabel-embedding-0.1.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 97366ccca0c1aa07a5d876760be4c56cfe3b1bb8f3444a5710b6b5b03896094f
MD5 | a9ce15f82f817ab80f694a1b1e6d5472
BLAKE2b-256 | cf5d00ec9180409afbd62b7954da6f662f65eb93206732670d17e6a453bfc59c