softmax-one - Pytorch

Quiet Attention - A Novel Modification to Softmax Function for Attention Mechanism

(\text{softmax}_1(x))_i = \frac{\exp(x_i)}{1 + \sum_j \exp(x_j)}

The attention mechanism has been a groundbreaking innovation in deep learning and forms the backbone of Transformer models, which power state-of-the-art language models such as GPT-4 and LLaMA. However, the traditional attention mechanism has a persistent off-by-one bug that can make models harder to compress and deploy.

Quiet Attention is a small tweak to the traditional softmax function that lets attention heads express 'no preference' and remain quiet. Adding 1 to the denominator lets the whole output vector tend to zero if the head prefers, rather than forcing the attention head to make an annotation.

This is based on "Attention Is Off By One" by Evan Miller.

Formula

Here's the modified formula for the softmax function, also referred to as "Softmax1" or "Quiet Attention" formula:

(\text{softmax}_1(x))_i = \frac{\exp(x_i)}{1 + \sum_j \exp(x_j)}
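One way to read the formula, offered here as a short derivation rather than a quotation from the source: Softmax1 behaves like ordinary softmax over x with one extra logit fixed at 0 whose output is thrown away. Summing the formula over i shows how much weight is left for the real entries:

```latex
\sum_i (\text{softmax}_1(x))_i
  = \frac{\sum_j \exp(x_j)}{1 + \sum_j \exp(x_j)}
  = 1 - \frac{1}{1 + \sum_j \exp(x_j)} < 1
```

The total tends to 0 when every x_j is very negative and tends to 1 when at least one x_j is very large, so the head can range from silent to fully committed.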

Architecture

The critical difference between Softmax1 and traditional softmax lies in their negative limit behavior. When all the entries in a vector are significantly less than zero and the model wants to avoid an annotation altogether, softmax_one lets every output approach zero, whereas softmax still forces the outputs to sum to 1.

Softmax1 essentially provides an 'escape hatch' when the attention head wants to remain quiet. The total output weight from Softmax1 varies with the input vector, anywhere between 0 and 1, whereas softmax always emits a total weight of 1. This may improve the model's performance, especially when dealing with noisy inputs.
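The limit behavior described above can be checked with a few lines of plain Python. This is a minimal sketch that follows the formula directly; the helper names softmax and softmax_one_ref are illustrative and are not the package's API.

```python
import math

def softmax(xs):
    """Standard softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_one_ref(xs):
    """Softmax1 straight from the formula: exp(x_i) / (1 + sum_j exp(x_j))."""
    exps = [math.exp(x) for x in xs]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps]

quiet = [-10.0, -10.0, -10.0]   # nothing worth attending to
confident = [8.0, 0.0, -2.0]    # one clearly preferred position

# Total weight emitted by each function for each input
print(sum(softmax(quiet)), sum(softmax_one_ref(quiet)))
print(sum(softmax(confident)), sum(softmax_one_ref(confident)))
```

For the quiet vector, softmax still hands out a total weight of 1 (a third to each entry), while Softmax1 emits a total on the order of 1e-4. For the confident vector both totals are close to 1, so the escape hatch costs little when the head has a clear preference.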

Installation

Clone the repository:

git clone https://github.com/kyegomez/AttentionIsOFFByOne.git
cd AttentionIsOFFByOne
pip3 install -r requirements.txt
python3 example.py

Unit Tests

This repository contains unit tests that aim to cover a wide range of scenarios and ensure the reliability of the solution. You can run the tests using the following command:

python -m unittest test.py

Benchmarks

A benchmarking suite is included to compare the performance of the softmax_one function with the PyTorch native softmax function. We provide metrics across different tensor sizes to understand how they perform under varying loads.

To run the benchmarks, use the following command:

python benchmark.py

You can find the results in the benchmarks/results/ directory. The results include execution time and memory usage for each function across a variety of tensor sizes.

Usage

You can use the Softmax1 function just like you would use the traditional softmax function. Here's a simple example:

import torch
from softmax_one.softmax_one import softmax_one

x = torch.randn(5)
y = softmax_one(x, dim=0)
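As a sketch of where the function would typically sit, the snippet below swaps it into scaled dot-product attention. The quiet_attention helper and the tensor shapes are assumptions made for illustration, and softmax_one is redefined locally (with dim defaulting to -1) only so the snippet runs on its own; it is not a copy of the package's code.

```python
import math
import torch

def softmax_one(x, dim=-1):
    # Local stand-in so the sketch is self-contained.
    # Shift by max(x, 0) so neither exp() call can overflow.
    m = x.max(dim=dim, keepdim=True).values.clamp(min=0)
    exp_x = torch.exp(x - m)
    return exp_x / (torch.exp(-m) + exp_x.sum(dim=dim, keepdim=True))

def quiet_attention(q, k, v):
    """Scaled dot-product attention with softmax_one in place of softmax."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = softmax_one(scores, dim=-1)
    return weights @ v, weights

q = torch.randn(2, 4, 8)   # (batch, query positions, head dim)
k = torch.randn(2, 6, 8)   # (batch, key positions, head dim)
v = torch.randn(2, 6, 8)
out, weights = quiet_attention(q, k, v)

print(out.shape)              # same shape as q
print(weights.sum(dim=-1))    # each row sums to less than 1
```

Because the rows of the weight matrix no longer sum to 1, the output for a query can shrink toward zero when none of the keys match it.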

Implementation

import torch

# softmax_one adds 1 to the softmax denominator, so the outputs can sum to less
# than 1 and an attention head can stay quiet when no position is relevant.
def softmax_one(x, dim=None, _stacklevel=3, dtype=None):
    # optional cast, mirroring the dtype argument of F.softmax
    if dtype is not None:
        x = x.to(dtype)
    # subtract the max (floored at 0) for numerical stability
    m = x.max(dim=dim, keepdim=True).values.clamp(min=0)
    # compute exponentials of the shifted input
    exp_x = torch.exp(x - m)
    # after shifting by m, the added 1 becomes exp(-m) in the denominator
    return exp_x / (torch.exp(-m) + exp_x.sum(dim=dim, keepdim=True))
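When the maximum is subtracted for stability, the 1 in the denominator has to be rescaled too: dividing the numerator and denominator of exp(x_i) / (1 + sum_j exp(x_j)) by exp(m) gives exp(x_i - m) / (exp(-m) + sum_j exp(x_j - m)). The plain-Python sketch below checks that identity against the unshifted formula. The names softmax_one_naive and softmax_one_stable are illustrative, and flooring the shift at 0 is a choice made here so that exp(-m) cannot overflow.

```python
import math

def softmax_one_naive(xs):
    """Direct formula; overflows for large inputs."""
    exps = [math.exp(x) for x in xs]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps]

def softmax_one_stable(xs):
    """Shifted formula: exp(x_i - m) / (exp(-m) + sum_j exp(x_j - m))."""
    m = max(max(xs), 0.0)          # floor at 0 so exp(-m) <= 1
    exps = [math.exp(x - m) for x in xs]
    denom = math.exp(-m) + sum(exps)
    return [e / denom for e in exps]

# Both forms agree on moderate inputs
xs = [2.0, -1.0, 0.5]
for a, b in zip(softmax_one_naive(xs), softmax_one_stable(xs)):
    assert math.isclose(a, b, rel_tol=1e-9)

# The shifted form also handles inputs where math.exp(x) would overflow
print(softmax_one_stable([1000.0, 999.0]))
```

A check like this is also a convenient reference when testing a tensor implementation, since it relies on nothing beyond the formula itself.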

Contributions

Contributions are welcome! Please submit a pull request or create an issue if you have any improvements or find any bugs.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Experiments

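It's really slow in pure Python; I plan to implement it in CUDA. In the run below, softmax_one is between roughly 2.4x and 50x slower than F.softmax, depending on tensor size.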

INFO:root:Running benchmark for tensor size (10, 10)...
INFO:root:F.softmax time: 0.0022182464599609375 s
INFO:root:softmax_one time: 0.04441571235656738 s
INFO:root:Running benchmark for tensor size (100, 100)...
INFO:root:F.softmax time: 0.01704573631286621 s
INFO:root:softmax_one time: 0.07482171058654785 s
INFO:root:Running benchmark for tensor size (1000, 1000)...
INFO:root:F.softmax time: 0.060335397720336914 s
INFO:root:softmax_one time: 3.0616047382354736 s
INFO:root:Running benchmark for tensor size (10000, 10000)...
INFO:root:F.softmax time: 52.80402970314026 s
INFO:root:softmax_one time: 128.78072810173035 s
INFO:root:Chart display is off.

