softmax-one - PyTorch

Quiet Attention - A Novel Modification to Softmax Function for Attention Mechanism

(\text{softmax}_1(x))_i = \frac{\exp(x_i)}{1 + \sum_j \exp(x_j)}

The attention mechanism has been a groundbreaking innovation in deep learning and forms the backbone of Transformer models, which power state-of-the-art language models like GPT-4 and LLaMA. However, there is a persistent off-by-one bug in the traditional attention mechanism that can make models harder to compress and deploy.

Quiet Attention is a small tweak to the traditional softmax function that lets an attention head express 'no preference' and remain quiet. Adding one to the denominator lets the output vector tend to zero when the head prefers, rather than forcing the head to make an annotation.

The idea comes from Evan Miller's essay "Attention Is Off By One".

Formula

Here's the modified formula for the softmax function, also referred to as "Softmax1" or "Quiet Attention" formula:

(\text{softmax}_1(x))_i = \frac{\exp(x_i)}{1 + \sum_j \exp(x_j)}
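
Two consequences follow directly from this definition. Writing S for the sum of exponentials (a symbol introduced here only for convenience), the total weight is always strictly below one, and it vanishes when every entry is strongly negative:

S = \sum_j \exp(x_j), \qquad \sum_i (\text{softmax}_1(x))_i = \frac{S}{1 + S} < 1

\lim_{S \to 0} \frac{S}{1 + S} = 0

For comparison, the traditional softmax has total weight S / S = 1 for every input, so it cannot produce an all-zero output.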

Architecture

The critical difference between Softmax1 and traditional softmax lies in their negative limit behavior. When every entry in a vector is far below zero and the model wants to avoid an annotation altogether, Softmax1 lets all outputs approach zero, whereas traditional softmax still forces them to sum to one.

Softmax1 essentially provides an 'escape hatch' when the attention head wants to remain quiet. The total output weight from Softmax1 varies with the input vector, as opposed to softmax, which always emits a total weight of one. This may improve the model's performance, especially when dealing with noisy inputs.
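
The limit behavior described above can be checked numerically. The following is a minimal sketch, not the packaged code: `softmax_one_sketch` is a hypothetical stand-in written directly from the formula, and the two input vectors are arbitrary illustrations.

```python
import torch


def softmax_one_sketch(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Illustrative softmax1: exp(x_i) / (1 + sum_j exp(x_j))."""
    # Shift by m = max(max(x), 0) so no exponent exceeds 0.
    # After the shift, the "+1" in the denominator becomes exp(-m).
    m = x.max(dim=dim, keepdim=True).values.clamp(min=0)
    exp_x = torch.exp(x - m)
    return exp_x / (torch.exp(-m) + exp_x.sum(dim=dim, keepdim=True))


quiet = torch.tensor([-20.0, -21.0, -22.0])  # every score strongly negative
loud = torch.tensor([5.0, 1.0, -3.0])        # one clearly preferred entry

# Traditional softmax always distributes a total weight of 1.
softmax_total_quiet = torch.softmax(quiet, dim=0).sum().item()
# Softmax1 can stay close to zero on the quiet vector ...
softmax_one_total_quiet = softmax_one_sketch(quiet, dim=0).sum().item()
# ... and behaves almost like softmax when one entry dominates.
softmax_one_total_loud = softmax_one_sketch(loud, dim=0).sum().item()

print(softmax_total_quiet, softmax_one_total_quiet, softmax_one_total_loud)
```

On the quiet vector the Softmax1 weights sum to nearly zero, while on the loud vector they sum to just under one.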

Installation

Clone the repository, install the dependencies, and run the example:

git clone https://github.com/kyegomez/AttentionIsOFFByOne.git
cd AttentionIsOFFByOne
pip3 install -r requirements.txt
python3 example.py

Unit Tests

This repository contains unit tests that aim to cover a wide range of scenarios for softmax_one. You can run the tests using the following command:

python -m unittest test.py

Benchmarks

A benchmarking suite is included to compare the performance of the softmax_one function with the PyTorch native softmax function. We provide metrics across different tensor sizes to understand how they perform under varying loads.

To run the benchmarks, use the following command:

python benchmark.py

You can find the results in the benchmarks/results/ directory. The results include execution time and memory usage for each function across a variety of tensor sizes.

Usage

You can use the Softmax1 function just like you would use the traditional softmax function. Here's a simple example:

import torch
from softmax_one.softmax_one import softmax_one

x = torch.randn(5)
y = softmax_one(x, dim=0)
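
To show where the function would sit inside an attention head, here is a hedged sketch of single-head scaled dot-product attention. `quiet_attention` is a hypothetical helper and not part of the package. The local `softmax_one` is a stand-in written from the formula so that the snippet runs without the package installed, and the tensor shapes are arbitrary.

```python
import math

import torch


def softmax_one(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Local stand-in for the packaged function (assumption: same formula).
    m = x.max(dim=dim, keepdim=True).values.clamp(min=0)
    exp_x = torch.exp(x - m)
    return exp_x / (torch.exp(-m) + exp_x.sum(dim=dim, keepdim=True))


def quiet_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor):
    """Hypothetical single-head attention using softmax1 over the key axis."""
    # Scaled dot-product scores, shape (batch, queries, keys).
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    # Each row of weights may sum to less than 1.
    weights = softmax_one(scores, dim=-1)
    return weights @ v, weights


torch.manual_seed(0)
q = torch.randn(2, 4, 8)  # (batch, query positions, head dim)
k = torch.randn(2, 6, 8)  # (batch, key positions, head dim)
v = torch.randn(2, 6, 8)
out, weights = quiet_attention(q, k, v)
```

The only change from standard attention is the normalization call, so the output shape is unchanged while each row of `weights` is free to sum to less than one.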

Implementation

import torch

# Define the softmax_one function with an added one in the denominator, which
# lets the outputs sum to less than one when every input is strongly negative
def softmax_one(x, dim=None, _stacklevel=3, dtype=None):
    # shift by m = max(max(x), 0) for stability, so no exponent exceeds 0
    m = x.max(dim=dim, keepdim=True).values.clamp(min=0)
    # compute the shifted exponentials
    exp_x = torch.exp(x - m)
    # the shift rescales the added one to exp(-m), which keeps the result
    # equal to exp(x) / (1 + sum(exp(x)))
    return exp_x / (torch.exp(-m) + exp_x.sum(dim=dim, keepdim=True))
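
One way to cross-check an implementation is to note that the Softmax1 formula equals a standard softmax over the input with one extra logit fixed at zero, with that extra output then discarded. The sketch below assumes normalization over the last dimension; `softmax_one_via_padding` is a hypothetical name and not part of the package.

```python
import torch
import torch.nn.functional as F


def softmax_one_via_padding(x: torch.Tensor) -> torch.Tensor:
    """softmax1 over the last dimension, built from the native softmax."""
    # Append one extra logit fixed at 0; exp(0) = 1 supplies the "+1" term.
    padded = F.pad(x, (0, 1), value=0.0)
    # Normalize over the padded axis, then drop the extra column.
    return F.softmax(padded, dim=-1)[..., :-1]


x = torch.tensor([[1.0, 2.0, 3.0], [-30.0, -31.0, -32.0]])
# Direct evaluation of the formula, safe here because the inputs are small.
naive = torch.exp(x) / (1 + torch.exp(x).sum(dim=-1, keepdim=True))
padded_result = softmax_one_via_padding(x)
```

Because this variant relies on the native softmax kernel, it may also be a useful baseline when comparing speed, although no timings are claimed here.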

Contributions

Contributions are welcome! Please submit a pull request or create an issue if you have any improvements or find any bugs.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Experiments

The pure Python implementation is much slower than the native softmax, so I plan to implement it in CUDA.

INFO:root:Running benchmark for tensor size (10, 10)...
INFO:root:F.softmax time: 0.0022182464599609375 s
INFO:root:softmax_one time: 0.04441571235656738 s
INFO:root:Running benchmark for tensor size (100, 100)...
INFO:root:F.softmax time: 0.01704573631286621 s
INFO:root:softmax_one time: 0.07482171058654785 s
INFO:root:Running benchmark for tensor size (1000, 1000)...
INFO:root:F.softmax time: 0.060335397720336914 s
INFO:root:softmax_one time: 3.0616047382354736 s
INFO:root:Running benchmark for tensor size (10000, 10000)...
INFO:root:F.softmax time: 52.80402970314026 s
INFO:root:softmax_one time: 128.78072810173035 s
INFO:root:Chart display is off.

Release history

0.0.2 (this version): Aug 28, 2023
0.0.1: Aug 28, 2023

