A Python library for training a Transformer neural network to solve the Running Key Cipher, a well-known problem in cryptography.

Welcome to Text Recovery Project 👋

🚀 Objective

The main goal of the project is to study whether a Transformer neural network can “read” meaningful text in the columns that can be compiled for a Running Key Cipher. You can read more about the problem here.

In addition, a second, rather fun 😅 goal is to train a model large enough to handle the case described below. Consider the original sentence:

Hello, my name is Zendaya Maree Stoermer Coleman but you can just call me Zendaya.

The columns for this sentence are compiled so that each of the last seven contains ten to thirteen letters of the English alphabet, while every other column contains two to five. The last seven characters are therefore much harder to “read” than the rest. However, we can guess from the meaning of the sentence that they spell the name Zendaya. In other words, the goal is also to train a model that can understand the context and correctly “read” the last word.
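
The column layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's own data pipeline: the function name `build_columns`, its parameters, the uniform random choice of decoy letters, and the alphabetical ordering inside each column are all hypothetical.

```python
import random
import string

ALPHABET = string.ascii_lowercase


def build_columns(plaintext, hard_tail=7, easy=(2, 5), hard=(10, 13), seed=0):
    """Return (letters, columns); each column hides one true letter among decoys.

    Hypothetical helper: the real project may build columns differently.
    """
    rng = random.Random(seed)
    # Keep English letters only, as the project does for its training data.
    letters = [ch for ch in plaintext.lower() if ch in ALPHABET]
    columns = []
    for i, letter in enumerate(letters):
        # The last `hard_tail` positions get wide columns, all others narrow ones.
        low, high = hard if i >= len(letters) - hard_tail else easy
        size = rng.randint(low, high)
        decoys = rng.sample([ch for ch in ALPHABET if ch != letter], size - 1)
        # Sort so that the position inside a column does not reveal the true letter.
        columns.append("".join(sorted(decoys + [letter])))
    return letters, columns


sentence = ("Hello, my name is Zendaya Maree Stoermer Coleman "
            "but you can just call me Zendaya.")
letters, columns = build_columns(sentence)
print(columns[:3], columns[-3:])
```

A model that solves the task has to pick one letter from every column so that the result is meaningful text; for the wide final columns, only the context of the sentence narrows the choice.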

⚙ Installation

TRecover requires Python 3.8 or higher and supports both Windows and Linux platforms.

  1. Clone the repository:
    git clone https://github.com/alex-snd/TRecover.git && cd TRecover

  2. Create a virtual environment:

    • Windows:
    python -m venv venv
    
    • Linux:
    python3 -m venv venv
    
  3. Activate the virtual environment:

    • Windows:
    venv\Scripts\activate.bat
    
    • Linux:
    source venv/bin/activate
    
  4. Install the package inside this virtual environment:

    • Just to run the demo:
    pip install -e ".[demo]"
    
    • To train the Transformer:
    pip install -e ".[train]"
    
    • For development and training:
    pip install -e ".[dev]"
    
  5. Initialize the project's environment:

    trecover init
    

    For more options use:

    trecover init --help
    

👀 Demo

  • 🤗 Hugging Face
    You can play with a pre-trained model hosted here.
  • 🐳 Docker Compose
    • Pull from Docker Hub:
      docker-compose -f docker/compose/scalable-service.yml up
      
    • Build from source:
      docker-compose -f docker/compose/scalable-service-build.yml up
      
  • 💻 Local (requires Docker)
    • Download the pretrained model:
      trecover download artifacts
      
    • Launch the service:
      trecover up
      

🗃️ Data

The model was trained on the WikiText and WikiQA datasets, from which all characters except English letters were removed.
You can download the cleaned dataset:

trecover download data
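
For readers who want to prepare their own text in the same spirit, the cleaning step can be approximated as follows. This is a minimal sketch, not the project's actual preprocessing code: the function name `keep_english_letters` is hypothetical, and lowercasing the result is an assumption that the source does not state.

```python
import re


def keep_english_letters(text, lowercase=True):
    """Drop every character that is not an English letter (A-Z, a-z)."""
    cleaned = re.sub(r"[^A-Za-z]", "", text)
    # Lowercasing is an assumption made for this sketch.
    return cleaned.lower() if lowercase else cleaned


sample = "Hello, my name is Zendaya!"
print(keep_english_letters(sample))
```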

💪 Train

To quickly start training the model, open the Jupyter Notebook.

  • 🕸️ Distributed
    TODO
  • 💻 Local
    After the dataset is downloaded, you can start training the model:
    trecover train local \
    --project-name {project_name} \
    --exp-mark {exp_mark} \
    --train-dataset-size {train_dataset_size} \
    --val-dataset-size {val_dataset_size} \
    --vis-dataset-size {vis_dataset_size} \
    --test-dataset-size {test_dataset_size} \
    --batch-size {batch_size} \
    --n-workers {n_workers} \
    --min-noise {min_noise} \
    --max-noise {max_noise} \
    --lr {lr} \
    --n-epochs {n_epochs} \
    --epoch-seek {epoch_seek} \
    --accumulation-step {accumulation_step} \
    --penalty-coefficient {penalty_coefficient} \
    --pe-max-len {pe_max_len} \
    --n-layers {n_layers} \
    --d-model {d_model} \
    --n-heads {n_heads} \
    --d-ff {d_ff} \
    --dropout {dropout}
    
    For more information use trecover train local --help
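
    When launching many experiments, it can be convenient to assemble the command above from a dictionary of settings. The sketch below only builds and prints the command line; it does not run it. The helper `build_train_command` is hypothetical, the option names are taken from the flags listed above, and the values are placeholders chosen for illustration rather than recommended settings.

```python
import shlex


def build_train_command(options):
    """Map {'batch_size': 32} to ['trecover', 'train', 'local', '--batch-size', '32']."""
    command = ["trecover", "train", "local"]
    for name, value in options.items():
        # Flags use dashes where the dictionary keys use underscores.
        command += ["--" + name.replace("_", "-"), str(value)]
    return command


# Placeholder values for illustration only.
options = {"project_name": "demo", "batch_size": 32, "lr": 0.001, "n_epochs": 10}
command = build_train_command(options)
print(shlex.join(command))
```

    The resulting list can be passed to `subprocess.run` once the package is installed and the values are replaced with real ones.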

✔️ Related work

TODO: what was done, tech stack.

🤝 Contributing

Contributions, issues and feature requests are welcome.
Feel free to check the issues page if you want to contribute.

👏 Show your support

Please don't hesitate to ⭐️ this repository if you find it cool!

📜 License

Copyright © 2022 Alexander Shulga.
This project is Apache 2.0 licensed.
