Train multi-label embeddings for k-shingled input sequences in PyTorch.
Project description
torch-multilabel-embedding
The package contains a PyTorch class to train an embedding matrix for multi-label inputs, i.e. instead of 1 ID per token (one-hot encoding), N IDs per token can be provided as model input.
A TensorFlow2/Keras implementation can be found here: https://github.com/ulf1/keras-multilabel-embedding (pip install keras-multilabel-embedding)
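To make the idea of "N IDs per token" concrete, here is a minimal pure-Python sketch of a multi-label lookup. It is not the package's implementation: the helper names `make_embedding_matrix` and `multilabel_lookup` are hypothetical, and averaging the N rows is an assumption made for illustration (the package may aggregate differently).

```python
import random


def make_embedding_matrix(vocab_size, embed_size, seed=42):
    """Hypothetical helper: a randomly initialised vocab_size x embed_size matrix."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(embed_size)]
            for _ in range(vocab_size)]


def multilabel_lookup(matrix, ids):
    """Look up one row per ID and average them into a single vector.

    Averaging is an assumption of this sketch, not a documented
    behaviour of torch-multilabel-embedding.
    """
    rows = [matrix[i] for i in ids]
    embed_size = len(matrix[0])
    return [sum(row[d] for row in rows) / len(rows) for d in range(embed_size)]


# 4 tokens, each described by N=3 IDs drawn from a vocabulary of 5
x_ids = [[1, 2, 4], [0, 1, 2], [2, 1, 4], [3, 2, 1]]
matrix = make_embedding_matrix(vocab_size=5, embed_size=3)
y = [multilabel_lookup(matrix, ids) for ids in x_ids]
print(len(y), len(y[0]))  # 4 3
```

With this kind of aggregation the order of the IDs within a token does not matter, so `[1, 2, 4]` and `[2, 1, 4]` map to the same vector up to floating-point rounding.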
Usage
import torch_multilabel_embedding as tml
import torch
# a sequence of multi-label data points
x_ids = [[1, 2, 4], [0, 1, 2], [2, 1, 4], [3, 2, 1]]
x_ids = torch.tensor(x_ids)
# initialize layer
layer = tml.MultiLabelEmbedding(
    vocab_size=5, embed_size=300, random_state=42)
# predict
y = layer(x_ids)
Appendix
Installation
The torch-multilabel-embedding package is available on PyPI, or can be installed directly from the git repo:
pip install torch-multilabel-embedding
pip install git+ssh://git@github.com/ulf1/torch-multilabel-embedding.git
Install a virtual environment
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir
pip install -r requirements-dev.txt --no-cache-dir
pip install -r requirements-demo.txt --no-cache-dir
(If your git repo is stored in a folder whose path contains whitespace, don't use the subfolder .venv. Use an absolute path without whitespace instead.)
Python commands
Jupyter for the examples: jupyter lab
Check syntax: flake8 --ignore=F401 --exclude=$(grep -v '^#' .gitignore | xargs | sed -e 's/ /,/g')
Run Unit Tests: PYTHONPATH=. pytest
Publish
pandoc README.md --from markdown --to rst -s -o README.rst
python setup.py sdist
twine upload -r pypi dist/*
Clean up
find . -type f -name "*.pyc" | xargs rm
find . -type d -name "__pycache__" | xargs rm -r
rm -r .pytest_cache
rm -r .venv
Support
Please open an issue for support.
Contributing
Please contribute using GitHub Flow. Create a branch, add commits, and open a pull request.
Download files
Source Distribution
Hashes for torch-multilabel-embedding-0.1.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 3c7c4b7e4786582aed136bbd303daf512b408ee454af17aa402f4225fbd7c8ce
MD5 | 4160995ca14441be661c2b9a13e26b64
BLAKE2b-256 | 071556bdb3fc8575ae45b3e82dfc0e433d5fe1cd4df20b55d07ce169b426fe10
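The digests above can be used to check a downloaded archive. A minimal sketch using only the standard library follows; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the local file path in the usage note is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest listed above for torch-multilabel-embedding-0.1.1.tar.gz
EXPECTED_SHA256 = (
    "3c7c4b7e4786582aed136bbd303daf512b408ee454af17aa402f4225fbd7c8ce"
)


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("torch-multilabel-embedding-0.1.1.tar.gz")` should return `True` for an unmodified copy of the source distribution saved in the current directory.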