
HeptaPod Non-Linear Transformer

The HeptaPod Non-Linear Transformer is a novel deep learning architecture inspired by the linguistic capabilities of the Heptapods from the movie "Arrival". This transformer aims to generate text non-linearly in all directions simultaneously, revolutionizing the way we think about sequence generation.

Install

pip3 install --upgrade nonlinear-transformer

Usage

import torch
from heptapod.model import NonLinearTransformer

# A 10x10 matrix of token ids drawn from a 100-token vocabulary
x = torch.randint(0, 100, (10, 10))

model = NonLinearTransformer(
    vocab_size=100,   # size of the token vocabulary
    embed_size=128,   # embedding dimension of each token
    matrix_dim=10,    # side length of the square 2D token matrix
    heads=8,          # number of attention heads
    window_size=3,    # local attention window (3x3 neighborhood)
    iterations=2,     # number of processing iterations
)

out = model(x)
print(out.shape)

Training

  • We're still smoothing out some rough spots in the training pipeline, so contributions and help with training are welcome.

Table of Contents

  • Introduction
  • Architecture Overview
  • Implementation
  • Deep Dive
  • Further Work
  • Architectural Conclusion
  • License
  • Todo

Introduction

Traditional transformers generate sequences linearly, token by token. The HeptaPod Non-Linear Transformer, however, works with 2D matrices of tokens, where each token is influenced by its neighbors in all directions. This architecture is designed to generate text resembling the Heptapod's logograms, which convey meaning non-linearly.

Architecture Overview

The main components of the HeptaPod Non-Linear Transformer are:

2D Rotary Embeddings

Positional information is crucial for transformers. Unlike 1D embeddings used in traditional transformers, the HeptaPod transformer uses 2D rotary embeddings. These embeddings capture both row-wise and column-wise positional information, ensuring every token understands its position in the 2D matrix.
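The package's own embedding code isn't reproduced here, but one minimal way to realize 2D rotary embeddings is to rotate half of each token's channels by its row index and the other half by its column index. The sketch below is illustrative only (the function names rotate_half, rotary_1d, and rotary_2d are assumptions, not the library's API) and assumes tokens arranged as a (batch, height, width, dim) tensor:

import torch

def rotate_half(x):
    # Split the last dimension in two and rotate: (a, b) -> (-b, a)
    a, b = x.chunk(2, dim=-1)
    return torch.cat((-b, a), dim=-1)

def rotary_1d(x, pos, base=10000):
    # x: (..., d) with d even; pos: integer positions broadcastable to x[..., 0]
    d = x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, dtype=torch.float32, device=x.device) / d))
    angles = pos[..., None].float() * inv_freq          # (..., d/2)
    angles = torch.cat((angles, angles), dim=-1)         # (..., d)
    return x * angles.cos() + rotate_half(x) * angles.sin()

def rotary_2d(x):
    # Rotate the first half of the channels by row position and the second half
    # by column position. x: (batch, H, W, dim) with dim divisible by 4.
    b, h, w, d = x.shape
    assert d % 4 == 0, "dim must split into two even halves"
    x_row, x_col = x.split(d // 2, dim=-1)
    rows = torch.arange(h, device=x.device).view(1, h, 1).expand(b, h, w)
    cols = torch.arange(w, device=x.device).view(1, 1, w).expand(b, h, w)
    return torch.cat((rotary_1d(x_row, rows), rotary_1d(x_col, cols)), dim=-1)

x = torch.randn(2, 10, 10, 128)
print(rotary_2d(x).shape)  # torch.Size([2, 10, 10, 128])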

Local 2D Attention

Instead of attending to all tokens in the sequence, the Local 2D Attention mechanism focuses on a localized window around each token. Each token attends only to its immediate neighbors, defined by a specified window size. This localized attention ensures that each token gathers context from its surroundings, making the generation process truly non-linear.
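As a rough single-head sketch of this idea (the class name and interface are assumptions, not the package's actual module), the local neighborhoods can be gathered with F.unfold so that every token attends only to its window_size x window_size surroundings:

import torch
import torch.nn.functional as F
from torch import nn

class Local2DAttention(nn.Module):
    # Illustrative sketch: each token attends only to the window_size x window_size
    # neighborhood centered on it.
    def __init__(self, dim, window_size=3):
        super().__init__()
        assert window_size % 2 == 1, "use an odd window so the token sits at the center"
        self.window_size = window_size
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.scale = dim ** -0.5

    def forward(self, x):                                  # x: (batch, H, W, dim)
        b, h, w, d = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)

        def local_windows(t):
            # (b, H, W, d) -> (b, H*W, k*k, d): each token's neighborhood as keys/values
            t = t.permute(0, 3, 1, 2).contiguous()          # (b, d, H, W)
            t = F.unfold(t, self.window_size, padding=self.window_size // 2)
            t = t.reshape(b, d, self.window_size ** 2, h * w)
            return t.permute(0, 3, 2, 1)

        k, v = local_windows(k), local_windows(v)
        q = q.reshape(b, h * w, 1, d)                       # one query per token
        attn = (q @ k.transpose(-2, -1)) * self.scale       # (b, H*W, 1, k*k)
        out = attn.softmax(dim=-1) @ v                      # (b, H*W, 1, d)
        return out.reshape(b, h, w, d)

attn = Local2DAttention(dim=128, window_size=3)
tokens = torch.randn(2, 10, 10, 128)
print(attn(tokens).shape)  # torch.Size([2, 10, 10, 128])

With window_size=3 each token attends to at most 9 positions regardless of matrix size, so the cost grows linearly with the number of tokens rather than quadratically.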

Non-Linear Transformer Block

This is the core of the architecture. Each block consists of:

  1. Layer normalization
  2. Local 2D attention mechanism
  3. A feed-forward neural network

These blocks can be stacked to deepen the architecture, allowing the model to learn more complex patterns and relationships in the data.
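A minimal pre-norm sketch of such a block, reusing the Local2DAttention module sketched above (the class name and the feed-forward expansion factor are assumptions, not the package's implementation), could look like this:

from torch import nn

class NonLinearBlock(nn.Module):
    # Sketch of one block: layer norm, local 2D attention, then a feed-forward
    # network, each wrapped in a residual connection.
    def __init__(self, dim, window_size=3, ff_mult=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = Local2DAttention(dim, window_size)   # from the sketch above
        self.norm2 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, dim * ff_mult),
            nn.GELU(),
            nn.Linear(dim * ff_mult, dim),
        )

    def forward(self, x):                                # x: (batch, H, W, dim)
        x = x + self.attn(self.norm1(x))
        x = x + self.ff(self.norm2(x))
        return x

# Blocks can be stacked to deepen the model:
blocks = nn.Sequential(*[NonLinearBlock(128) for _ in range(4)])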

Implementation

The implementation is done in PyTorch, one of the leading deep learning libraries. The design ensures modularity, allowing easy customization and experimentation.

Key features:

  1. Modular design: Each component, like the Local 2D Attention mechanism, is implemented as a separate module, allowing for easy modifications and replacements.
  2. Extensibility: The architecture is designed to be easily extensible. You can stack multiple Non-Linear Transformer Blocks to increase the model's depth.

Remember to adjust hyperparameters such as embed_size, matrix_dim, heads, window_size, and iterations to suit your dataset and requirements.

Deep Dive

Architecture Details

Token Representation in 2D

The representation of tokens in a 2D matrix is the foundation of the HeptaPod Non-Linear Transformer. Unlike traditional transformers that work with 1D sequences, this architecture treats the input as a 2D grid, which inherently facilitates capturing relationships in multiple dimensions, both row-wise and column-wise.

Hierarchical Processing

One potential advancement to this model is the introduction of hierarchical processing. After processing the entire matrix at a given resolution, the model could further abstract the matrix into larger "chunks" or "blocks", treating each chunk as a super-token. This hierarchical processing can help in capturing broader context, much like pooling layers in CNNs.
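As an illustration of the idea (not part of the current package), non-overlapping neighborhoods of the token matrix could simply be average-pooled into super-tokens before being processed again at the coarser resolution:

import torch
import torch.nn.functional as F

def to_super_tokens(x, chunk=2):
    # Illustrative: pool non-overlapping chunk x chunk neighborhoods of the token
    # matrix into "super-tokens", shrinking each spatial side by the chunk factor.
    # x: (batch, H, W, dim) -> (batch, H // chunk, W // chunk, dim)
    x = x.permute(0, 3, 1, 2)                  # (batch, dim, H, W)
    x = F.avg_pool2d(x, kernel_size=chunk)
    return x.permute(0, 2, 3, 1)

coarse = to_super_tokens(torch.randn(2, 10, 10, 128))
print(coarse.shape)  # torch.Size([2, 5, 5, 128])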

Local vs. Global Attention

While the primary focus is on local attention, there could be merit in periodically applying global attention to capture long-range dependencies. A hybrid approach, where certain layers (or certain heads within layers) employ global attention, could offer a balance between local context and global understanding.

Conditional Masking

Considering the non-linear nature of the text, it might be beneficial to apply conditional masks during training. Rather than always attending to the same local window, the model could be trained to decide where to look based on the token's content, allowing dynamic context windows.

Potential Methods for Improvement

Adaptive Window Sizes

While a fixed window size offers simplicity, an adaptive window mechanism that adjusts the size based on the token's context can capture varying degrees of local information.

Multi-Scale Representation

Just as multi-scale feature maps are beneficial in image processing tasks, using multi-scale token representations could offer richer context. This involves processing the input matrix at different resolutions and integrating the results.

Cross-Attention Between Hierarchies

If hierarchical processing is employed, introducing cross-attention mechanisms between different hierarchies can ensure better information flow.

Sparse Attention Mechanisms

To efficiently capture long-range dependencies without the computational cost of global attention, sparse attention mechanisms, such as those used in the Longformer, could be integrated.

Further Work

Integration with Vision Models

Given the 2D nature of the input, there's potential synergy with vision models. Combining the HeptaPod Non-Linear Transformer with architectures like Vision Transformers (ViTs) could yield models that excel in tasks involving both text and images.

Transfer Learning & Pre-training

Exploring pre-training strategies on large corpora can make the HeptaPod Non-Linear Transformer more versatile. Fine-tuning on specific tasks post pre-training can lead to better performance, leveraging knowledge from vast amounts of data.

Feedback Loops

Introducing feedback loops where the output is recursively fed back as input can help in refining the generated matrix, potentially leading to more coherent outputs.
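A hedged sketch of this idea, assuming the model maps a (batch, H, W) matrix of token ids to per-cell logits of shape (batch, H, W, vocab_size), could simply re-decode the whole matrix for a few passes:

def refine(model, tokens, steps=3):
    # Illustrative only: feed the model's own greedy output back in as the next
    # input, letting the generated matrix settle over a few refinement passes.
    for _ in range(steps):
        logits = model(tokens)            # assumed shape: (batch, H, W, vocab_size)
        tokens = logits.argmax(dim=-1)    # greedy re-decoding of every cell
    return tokens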

Custom Loss Functions

Given the non-linear generation process, custom loss functions that reward coherent formation in multiple directions can be beneficial. This would be in addition to the traditional token prediction losses.
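One possible formulation (purely illustrative; the coherence term and its weighting are open design choices, not the project's loss) combines the usual cross-entropy with a term that encourages neighboring cells to produce similar predictive distributions:

import torch.nn.functional as F

def nonlinear_loss(logits, targets, coherence_weight=0.1):
    # logits: (batch, H, W, vocab_size); targets: (batch, H, W) token ids.
    b, h, w, v = logits.shape
    ce = F.cross_entropy(logits.reshape(-1, v), targets.reshape(-1))

    # Penalize large jumps between the predictive distributions of adjacent cells,
    # row-wise and column-wise, to reward coherent formation in both directions.
    probs = logits.softmax(dim=-1)
    row_coherence = (probs[:, 1:, :, :] - probs[:, :-1, :, :]).abs().mean()
    col_coherence = (probs[:, :, 1:, :] - probs[:, :, :-1, :]).abs().mean()
    return ce + coherence_weight * (row_coherence + col_coherence)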

Token Merging Strategies

After generation, there is potential in exploring strategies that merge or group tokens in the 2D matrix into super-tokens, condensing the information and making it more interpretable.

Architectural Conclusion

The HeptaPod Non-Linear Transformer represents a paradigm shift in sequence generation. While the foundation is promising, the architecture offers numerous avenues for exploration, innovation, and improvement. As with any novel approach, iterative research, experimentation, and collaboration with the broader research community will be pivotal in realizing its full potential.

License

This project is licensed under the MIT License. This ensures that the HeptaPod Non-Linear Transformer is free for all to use, modify, and distribute. We believe in open-source and encourage innovations and improvements to the concept.

Todo

  • Implement the 2D non-linear training script and train the model
  • Benchmark the model and improve the non-linear structures

