Hashed Random Projection layer for TF2/Keras
keras-hrp
Hashed Random Projection layer for TF2/Keras.
Usage
Hashed Random Projections (HRP), binary representations, encoding/decoding for storage (notebook)
Generate an HRP layer with a new hyperplane
The random projection matrix (the hyperplane) is initialized randomly.
The initial state of the PRNG (random_state, default: 42) is fixed so that the same hyperplane is reproduced on every run.
import keras_hrp as khrp
import tensorflow as tf
BATCH_SIZE = 32
NUM_FEATURES = 64
OUTPUT_SIZE = 1024
# demo inputs
inputs = tf.random.normal(shape=(BATCH_SIZE, NUM_FEATURES))
# instantiate layer
layer = khrp.HashedRandomProjection(
    output_size=OUTPUT_SIZE,
    random_state=42  # Default: 42
)
# run it
outputs = layer(inputs)
assert outputs.shape == (BATCH_SIZE, OUTPUT_SIZE)
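For intuition, the idea behind a hashed random projection can be sketched in plain NumPy: project the inputs onto random directions and keep only the sign of each projection. This is a minimal sketch under that assumption, not the layer's actual implementation. The function name hashed_random_projection and the choice of a standard normal hyperplane are illustrative and may differ from what khrp.HashedRandomProjection does internally.

```python
import numpy as np


def hashed_random_projection(x, output_size, random_state=42):
    """Hypothetical stand-in for the layer: sign of a random projection."""
    # Seeded PRNG so the same hyperplane is drawn on every call
    rng = np.random.default_rng(random_state)
    # One random direction per output bit (assumed: standard normal entries)
    hyperplane = rng.standard_normal((x.shape[-1], output_size))
    # Keep only on which side of each hyperplane the input falls
    return (x @ hyperplane) > 0


# demo inputs with the same shapes as in the example above
x = np.random.default_rng(0).standard_normal((32, 64))
hashed = hashed_random_projection(x, output_size=1024)
```

Each row of hashed is a boolean vector of length 1024. Because the seed is fixed, calling the function again with the same inputs yields the same bits.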
Instantiate an HRP layer with a given hyperplane
import keras_hrp as khrp
import tensorflow as tf
import numpy as np
BATCH_SIZE = 32
NUM_FEATURES = 64
OUTPUT_SIZE = 1024
# demo inputs
inputs = tf.random.normal(shape=(BATCH_SIZE, NUM_FEATURES))
# create hyperplane as numpy array
myhyperplane = np.random.randn(NUM_FEATURES, OUTPUT_SIZE)
# instantiate layer
layer = khrp.HashedRandomProjection(hyperplane=myhyperplane)
# run it
outputs = layer(inputs)
assert outputs.shape == (BATCH_SIZE, OUTPUT_SIZE)
Serialize Boolean to Int8
Python and NumPy store each boolean value in at least one byte (8 bits), never in a single bit.
Some database technologies behave in a similar way and use 8 times the theoretically required storage space (e.g., a Postgres boolean uses 1 byte instead of 1 bit).
To save memory or storage space, chunks of 8 boolean vector elements can be packed into a single 1-byte int8 number.
import keras_hrp as khrp
import numpy as np
# given boolean values
hashvalues = np.array([1, 0, 1, 0, 1, 1, 0, 0])
# serialize boolean to int8
serialized = khrp.bool_to_int8(hashvalues)
# deserialize int8 to boolean
deserialized = khrp.int8_to_bool(serialized)
# check
np.testing.assert_array_equal(deserialized, hashvalues)
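The packing step can be illustrated with NumPy alone. The sketch below is not the implementation of khrp.bool_to_int8. It assumes that the vector length is a multiple of 8 and that the first element of each chunk becomes the most significant bit; the bit order used by the library may differ. The function names pack_bits_to_int8 and unpack_int8_to_bits are hypothetical.

```python
import numpy as np


def pack_bits_to_int8(bits):
    """Pack groups of 8 zero/one values into one int8 each (sketch)."""
    bits = np.asarray(bits, dtype=np.uint8)
    if bits.shape[-1] % 8 != 0:
        raise ValueError("last dimension must be a multiple of 8")
    # packbits yields uint8; reinterpret the same bytes as signed int8
    return np.packbits(bits, axis=-1).view(np.int8)


def unpack_int8_to_bits(packed):
    """Recover the zero/one values from the int8 representation (sketch)."""
    packed = np.asarray(packed, dtype=np.int8)
    # unpackbits requires uint8, so reinterpret the bytes first
    return np.unpackbits(packed.view(np.uint8), axis=-1)


bits = np.array([1, 0, 1, 0, 1, 1, 0, 0])
packed = pack_bits_to_int8(bits)
restored = unpack_int8_to_bits(packed)
```

With this scheme, a 1024-bit hash occupies 128 bytes instead of 1024 bytes.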
Appendix
Installation
The keras-hrp git repo is available as a PyPI package:
pip install keras-hrp
pip install git+ssh://git@github.com/ulf1/keras-hrp.git
Install a virtual environment
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir
pip install -r requirements-dev.txt --no-cache-dir
pip install -r requirements-demo.txt --no-cache-dir
(If your git repo is stored in a folder whose path contains whitespace, don't use the subfolder .venv. Use an absolute path without whitespace instead.)
Python commands
- Jupyter for the examples:
jupyter lab
- Check syntax:
flake8 --ignore=F401 --exclude=$(grep -v '^#' .gitignore | xargs | sed -e 's/ /,/g')
- Run Unit Tests:
PYTHONPATH=. pytest
Publish
# pandoc README.md --from markdown --to rst -s -o README.rst
python setup.py sdist
twine upload -r pypi dist/*
Clean up
find . -type f -name "*.pyc" | xargs rm
find . -type d -name "__pycache__" | xargs rm -r
rm -r .pytest_cache
rm -r .venv
Support
Please open an issue for support.
Contributing
Please contribute using GitHub Flow. Create a branch, add commits, and open a pull request.
Acknowledgements
The "Evidence" project was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 433249742 (GU 798/27-1; GE 1119/11-1).
Maintenance
- Until 31 Aug 2023 (v0.1.0), the code repository was maintained within the DFG project 433249742.
- Since 01 Sep 2023 (v0.2.0), the code repository has been maintained by @ulf1.
Citation
Please cite the arXiv Preprint when using this software for any purpose.
@misc{hamster2023rediscovering,
title={Rediscovering Hashed Random Projections for Efficient Quantization of Contextualized Sentence Embeddings},
author={Ulf A. Hamster and Ji-Ung Lee and Alexander Geyken and Iryna Gurevych},
year={2023},
eprint={2304.02481},
archivePrefix={arXiv},
primaryClass={cs.CL}
}