# RETVec: Resilient & Efficient Text Vectorizer
## Overview

RETVec is a next-generation text vectorizer designed to offer built-in adversarial resilience using robust word embeddings. Read the paper here: https://arxiv.org/abs/2302.09207.
RETVec is trained to be resilient against character manipulations, including insertions, deletions, typos, homoglyphs, LEET substitution, and more. The RETVec model is trained on top of a novel character embedding that can encode all UTF-8 characters and words. As a result, RETVec works out of the box on over 100 languages without a lookup table or a fixed vocabulary size. RETVec is also a layer, so it can be inserted into any TensorFlow model without a separate preprocessing step.
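The idea of a vocabulary-free character representation can be illustrated with a small sketch. This section does not specify how characters are encoded, so the code below assumes one plausible scheme: each character's Unicode code point is written out as a fixed-width vector of bits. Because the bit pattern is computed from the code point itself, no lookup table is needed and no character is ever out of vocabulary. `encode_char`, `encode_word`, the 24-bit width, and the 16-character word length are hypothetical choices for illustration, not the `retvec` package's API.

```python
from typing import List

# Assumption: 24 bits per character. The largest Unicode code point,
# U+10FFFF, fits in 21 bits, so 24 bits covers every character.
BITS_PER_CHAR = 24


def encode_char(ch: str, bits: int = BITS_PER_CHAR) -> List[int]:
    """Return the code point of `ch` as a fixed-width list of 0/1 bits, MSB first."""
    cp = ord(ch)
    return [(cp >> shift) & 1 for shift in range(bits - 1, -1, -1)]


def encode_word(word: str, max_chars: int = 16) -> List[List[int]]:
    """Encode a word as exactly `max_chars` rows of bits (truncate or zero-pad)."""
    rows = [encode_char(ch) for ch in word[:max_chars]]
    # Pad with all-zero rows so every word has the same shape.
    rows.extend([0] * BITS_PER_CHAR for _ in range(max_chars - len(rows)))
    return rows


if __name__ == "__main__":
    print(encode_char("A"))          # code point 65 as 24 bits
    print(len(encode_word("héllo")))  # always max_chars rows
```

Visually identical homoglyphs such as Latin `a` (U+0061) and Cyrillic `а` (U+0430) still receive distinct, valid encodings under this scheme; making their downstream representations similar would be left to the trained model rather than to this fixed encoding step.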
### Getting started
#### Installation
You can use pip to install the TensorFlow version of RETVec:
```shell
pip install retvec
```
RETVec has been tested on TensorFlow 2.6+ and Python 3.7+.
### Basic Usage
`training/train_tf_retvec_models.py` is the RETVec model training script. Example usage:

```shell
python train_tf_retvec_models.py --train_config <train_config_path> --model_config <model_config_path> --output_dir <output_path>
```

Configurations for our base models are in the `configs/` folder.
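If you launch training from another script (for example, to sweep several configurations), the documented command can be assembled programmatically. This is a minimal sketch that uses only the three flags shown in the example usage; the default script path assumes you run it from the repository root, and the config file names in the `__main__` block are hypothetical placeholders for real files in `configs/`.

```python
import shlex
import subprocess
import sys
from typing import List


def build_train_command(train_config: str, model_config: str, output_dir: str,
                        script: str = "training/train_tf_retvec_models.py") -> List[str]:
    """Assemble argv for the training script using its three documented flags."""
    return [
        sys.executable, script,
        "--train_config", train_config,
        "--model_config", model_config,
        "--output_dir", output_dir,
    ]


def run_training(cmd: List[str]) -> None:
    """Run the training command, raising CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths: substitute actual config files from configs/.
    cmd = build_train_command("configs/train_config.json",
                              "configs/model_config.json",
                              "output/")
    # shlex.quote keeps this compatible with Python 3.7 (shlex.join is 3.8+).
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing an argument list (rather than a single shell string) to `subprocess.run` avoids shell-quoting problems when paths contain spaces.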
### Colab
- Training and releasing a new RETVec model: `notebooks/train_and_relase_a_rewnet.ipynb`
- Hello world: `notebooks/hello_world.ipynb`
## Disclaimer

This is not an official Google product.
## Details for `retvec-1.0.1.tar.gz`

File metadata:

- Download URL: retvec-1.0.1.tar.gz
- Upload date:
- Size: 26.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | cc307531ede17166dad78cd01729118507ac7b86090d1e2cbb006f97986ee074 |
| MD5 | 494be4109a57dc4efc2a843545271246 |
| BLAKE2b-256 | 7605744255da2636ebff138ba00e7ef62708341a9b1e19d24bdf2a21d51a0e3a |
## Details for `retvec-1.0.1-py3-none-any.whl`

File metadata:

- Download URL: retvec-1.0.1-py3-none-any.whl
- Upload date:
- Size: 40.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 3307472805d5861ea1ffcbf8adf278f3a132d5e4ea2671ea28ef13ec5ad62a4e |
| MD5 | 93639edf4b9900b43431019a379aca62 |
| BLAKE2b-256 | 2a861caedb3af968653d630917ea2c53844eca790c24a0057d75ae70d6ae70a7 |
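Before installing a downloaded distribution file, you can check it against the SHA256 digests listed above. The sketch below uses only the standard-library `hashlib` module and streams the file in chunks so it is never loaded into memory all at once; the `EXPECTED_SHA256` mapping and the helper names are illustrative, not part of RETVec.

```python
import hashlib
from pathlib import Path

# SHA256 digests for the two retvec 1.0.1 distribution files listed above.
EXPECTED_SHA256 = {
    "retvec-1.0.1.tar.gz":
        "cc307531ede17166dad78cd01729118507ac7b86090d1e2cbb006f97986ee074",
    "retvec-1.0.1-py3-none-any.whl":
        "3307472805d5861ea1ffcbf8adf278f3a132d5e4ea2671ea28ef13ec5ad62a4e",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

For example, after `pip download retvec==1.0.1 --no-deps -d dist`, call `verify(Path("dist/retvec-1.0.1-py3-none-any.whl"))`. pip can also enforce digests itself in hash-checking mode, using `--hash=sha256:<digest>` entries in a requirements file.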