# RETVec: Resilient & Efficient Text Vectorizer

## Overview

RETVec is a next-gen text vectorizer designed to offer built-in adversarial resilience using robust word embeddings. Read the paper here: https://arxiv.org/abs/2302.09207.

RETVec is trained to be resilient against character manipulations including insertion, deletion, typos, homoglyphs, LEET substitution, and more. The RETVec model is trained on top of a novel character embedding which can encode all UTF-8 characters and words. As a result, RETVec works out-of-the-box on over 100 languages without a lookup table or a fixed vocabulary size. RETVec is also implemented as a layer, so it can be inserted into any TensorFlow model without a separate pre-processing step.
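To illustrate how a character encoding can work without a lookup table, here is a minimal sketch in plain Python. It is not the library's implementation: it simply writes each character's Unicode code point as a fixed-width bit vector, so any string in any script gets a representation of the same shape. The 24-bit width and the 16-character word limit follow the description in the paper and are assumptions here, not details stated in this README; the function names are hypothetical.

```python
from typing import List

# Assumptions for illustration (taken from the paper, not from this README):
BITS_PER_CHAR = 24  # wide enough for any Unicode code point (max 0x10FFFF)
MAX_CHARS = 16      # fixed number of characters kept per word


def encode_char(ch: str, bits: int = BITS_PER_CHAR) -> List[int]:
    """Return the code point of `ch` as `bits` binary digits, least significant first."""
    code_point = ord(ch)
    return [(code_point >> i) & 1 for i in range(bits)]


def encode_word(
    word: str, max_chars: int = MAX_CHARS, bits: int = BITS_PER_CHAR
) -> List[List[int]]:
    """Encode `word` as a max_chars x bits matrix, truncating or zero-padding."""
    rows = [encode_char(ch, bits) for ch in word[:max_chars]]
    padding = [[0] * bits for _ in range(max_chars - len(rows))]
    return rows + padding


if __name__ == "__main__":
    # A clean word, a LEET variant, a homoglyph variant (Cyrillic U+0435), and a
    # non-Latin word all encode to the same shape with no vocabulary involved.
    for text in ["hello", "h3llo", "h\u0435llo", "\u65e5\u672c\u8a9e"]:
        matrix = encode_word(text)
        print(repr(text), len(matrix), "x", len(matrix[0]))
```

The encoding only guarantees that every input has a representation. In this sketch the perturbed words still produce different matrices from the clean word; mapping them to nearby embeddings is the job of the trained RETVec model, which this example does not reproduce.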

### Getting started

#### Installation

You can use pip to install the TensorFlow version of RETVec:

`pip install retvec`

RETVec has been tested on TensorFlow 2.6+ and Python 3.7+.
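After installing, it can be useful to confirm that the environment matches the versions above. The following is a small sketch using only the standard library; the helper names are hypothetical, and `importlib.metadata` itself requires Python 3.8 or newer.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

MIN_PYTHON = (3, 7)  # minimum Python version stated in this README


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def python_is_supported(min_version: Tuple[int, int] = MIN_PYTHON) -> bool:
    """Check whether the running interpreter is at least `min_version`."""
    return sys.version_info >= min_version


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    for name in ("retvec", "tensorflow"):
        print(name, installed_version(name) or "not installed")
```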

### Basic Usage

`training/train_tf_retvec_models.py` is the RETVec model training script. Example usage:

`python train_tf_retvec_models.py --train_config <train_config_path> --model_config <model_config_path> --output_dir <output_path>`

Configurations for our base models are under the `configs/` folder.

### Colab

Colab for training and releasing a new RETVec model: notebooks/train_and_relase_a_rewnet.ipynb

Hello world colab: notebooks/hello_world.ipynb

## Disclaimer

This is not an official Google product.
