High order layers in pytorch
Project description
Piecewise Polynomial and Fourier Layers in PyTorch
This is a PyTorch implementation of my tensorflow repository and is more complete due to the flexibility of PyTorch.
Lagrange Polynomial, Piecewise Lagrange Polynomial, Discontinuous Piecewise Lagrange Polynomial, Fourier Series, sum and product layers in PyTorch. The sparsity of using piecewise polynomial layers means that by adding new segments the representational power of your network increases, but the time to complete a forward step remains constant. Implementation includes simple fully connected layers, convolution layers and deconvolutional layers using these models. This is a PyTorch implementation of this paper including extension to Fourier Series and convolutional neural networks.
Idea
The idea is extremely simple  instead of a single weight at the synapse, use nweights. The nweights describe a piecewise polynomial (or other complex function) and each of the nweights can be updated independently. A Lagrange polynomial and Gauss Lobatto points are used to minimize oscillations of the polynomial. The same approach can be applied to any "functional" synapse, and I also have Fourier series synapses in this repo as well. This can be implemented as construction of a polynomial or Fourier kernel followed by a standard pytorch layer where a linear activation is used.
In the image below each "link" instead of being a single weight, is a function of both x and a set of weights. These functions can consist of an orthogonal basis functions for efficient approximation.
Why
Using higher order polynomial representations might allow networks with much fewer total weights. In physics, higher order methods can be much more efficient. Spectral and discontinuous galerkin methods are examples of this. Note that a standard neural network with relu activations is piecewise linear. Here there are no bias weights and the "nonlinearity" is in the synapse. Also, I've included discontinuous layers, in physics there are many problems that are discontinuous (most nonlinear hyperbolic conservation laws form discontinuities) in this case the method becomes a subgradient descent.
In addition, it's well known that the dendrites are also computational units in neurons, for example Dendritic action potentials and computation in human layer 2/3 cortical neurons and this is a simple way to add more computational power into the artificial neural network model. In addition it's been shown that a single pyramidal has the same computational capacity as a 5 to 8 layer convolutional NN, Single cortical neurons as deep artificial neural networks
Speed
Solving with relu layers is faster, however, sparsity may mean that there is a speed advantage in using the piecewise polynomial approach when there are many segments. There do seem to be situations where the piecewise polynomial approach is significantly better than standard relu layers. Also, combining these layers with standard relu inputs, or using piecewise polynomial layer as inputs especially for implicit representation type problems (or as "positional embeddings") and in natural language problems seems to be useful.
Fully Connected Layer Types
All polynomials are Lagrange polynomials with Chebyshev interpolation points.
A helper function is provided in selecting and switching between these layers
from high_order_layers_torch.layers import *
layer1 = high_order_fc_layers(
layer_type=layer_type,
n=n,
in_features=784,
out_features=100,
segments=segments,
alpha=linear_part
)
where layer_type
is one of
layer_type  representation 

continuous  piecewise polynomial using sum at the neuron 
continuous_prod  piecewise polynomial using products at the neuron 
discontinuous  discontinuous piecewise polynomial with sum at the neuron 
discontinuous_prod  discontinous piecewise polynomial with product at the neuron 
polynomial  single polynomial (non piecewise) with sum at the neuron 
polynomial_prod  single polynomial (non piecewise) with product at the neuron 
product  Product 
fourier  fourier series with sum at the neuron 
n
is the number of interpolation points per segment for polynomials or the number of frequencies for fourier series, segments
is the number of segments for piecewise polynomials, alpha
is used in product layers and when set to 1 keeps the linear part of the product, when set to 0 it subtracts the linear part from the product.
Convolutional Layer Types
conv_layer = high_order_convolution_layers(layer_type=layer_type, n=n, in_channels=3, out_channels=6, kernel_size=5, segments=segments, rescale_output=rescale_output, periodicity=periodicity)
All polynomials are Lagrange polynomials with Chebyshev interpolation points.
layer_type  representation 

continuous(1d,2d)  piecewise continuous polynomial 
discontinuous(1d,2d)  piecewise discontinuous polynomial 
polynomial(1d,2d)  single polynomial 
fourier(1d,2d)  fourier series convolution 
h and p refinement
p refinement is taking an existing network and increasing the polynomial order of that network without changing the network output. This allow the user to train a network at low polynomial order and then use that same network to initialize a network with higher polynomial order. This is particularly useful since a high order polynomial network will often converge poorly without the right initialization, the lower order network provides a good initial solution. The function for changing the order of a network is
from high_order_layers_torch.networks import interpolate_high_order_mlp
interpolate_high_order_mlp(
network_in: HighOrderMLP, network_out: HighOrderMLP
current implementation only works with high order MLPs, not with convnets. A similar function exists for h refinement. h refinement is refining the number of segments in a layer, and is used for similar reasoning. Layers with lots of segments may be slow to converge so the user starts with a small number of segments (1 or 2) and then increases the number of segments (h) using the lower initialization. The following function currently only works for high order MLPs, not with convnets
from high_order_layers_torch.network import hp_refine_high_order_mlp
hp_refine_high_order_mlp(
network_in: HighOrderMLP, network_out: HighOrderMLP
)
Installing
Installing locally
This repo uses poetry, so run
poetry install
and then
poetry shell
Installing from pypi
pip install highorderlayerstorch
or
poetry add highorderlayerstorch
Examples
Simple function approximation
Approximating a simple function using a single input and single output (single layer) with no hidden layers to approximate a function using continuous and discontinuous piecewise polynomials (with 5 pieces) and simple polynomials and fourier series. The standard approach using ReLU is non competitive. To see more complex see the implicit representation page here.
python examples/function_example.py
XOR : 0.5 for x*y > 0 else 0.5
Simple XOR problem using the standard network structure (2 inputs 2 hidden 1 output) this will also work with no hidden layers. The function is discontinuous along the axis and we try and fit that function. Using piecewise discontinuous layers the model can match the function exactly. With piecewise continuous it doesn't work quite as well. Polynomial doesn't work well at all (expected).
MNIST (convolutional)
python examples/mnist.py max_epochs=1 train_fraction=0.1 layer_type=continuous2d n=4 segments=2
CIFAR100 (convolutional)
python examples/cifar100.py m max_epochs=20 train_fraction=1.0 layer_type=polynomial segments=2 n=7 nonlinearity=False rescale_output=False periodicity=2.0 lr=0.001 linear_output=False
Variational Autoencoder
Still a WIP. Does work, but needs improvement.
python examples/variational_autoencoder.py m max_epochs=300 train_fraction=1.0
run with nevergrad for parameter tuning
python examples/variational_autoencoder.py m
Invariant MNIST (fully connected)
Without polynomial refinement
python examples/invariant_mnist.py max_epochs=100 train_fraction=1 mlp.layer_type=continuous mlp.n=5 mlp.p_refine=False mlp.hidden.layers=4
with polynomial refinement (prefinement)
python examples/invariant_mnist.py max_epochs=100 train_fraction=1 layer_type=mlp.continuous mlp.n=2 mlp.target_n=5 mlp.p_refine=True
I've also added hp refinement, but it needs a lot of testing.
Implicit Representation
An example of implicit representation for image compression, language generation can be found here. I intend to explore generative models in natural language further here
PDEs in Fluid Dynamics
An example using implicit representation to solve hyperbolic (nonlinear) wave equations can be found here
Natural Language Generation
Examples using these networks for natural language generation can be found here
Generative music
Work in progress here
Test and Coverage
After installing and running
poetry shell
run
pytest
for coverage, run
coverage run m pytest
and then
coverage report
A note on the unit
The layers used here do not require additional activation functions and use a simple sum or product in place of the activation. I almost always use sum units, but product units are performed in this manner
$$ product=1+\prod_{i}(1 + f_{i})+(1\alpha)\sum_{i}f_{i} $$
The 1 is added to each function output to as each of the sub products is also computed. The linear part is controlled by the alpha parameter.
Notes on normalization
Although you can use batchnorm, layernorm etc... I've found that you can actually just use the infinity norm ("max_abs" norm) which has no parameters for this formulation (same approach seems not to work very well for standard relu networks  but need to investigate this further). The max_abs normalization is defined this way
normalized_x = x/(max(abs(x))+eps)
where the normalization is done per sample (as opposed to per batch). The way the layers are formulated, we don't want the neuron values to extend beyond [1, 1] as the polynomial values grow rapidly beyond that range. I also use mirror periodicity to keep the values within from growing rapidly. We want the values to cover the entire range [1, 1] of the polynomials as the weights are packed towards the edges of each segment. Normalizing by the sample L2 norm pushes most of the values towards zero, which I don't want.
Reference
@misc{Loverich2020,
author = {Loverich, John},
title = {High Order Layers Torch},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/jloveric/highorderlayerstorch}},
}
Project details
Release history Release notifications  RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for high_order_layers_torch1.3.2.tar.gz
Algorithm  Hash digest  

SHA256  6f3d7f1a7568f9732b9d72108fb4d51ef960a2267ea55600d4dbbed07deac565 

MD5  39d820bddfec7ad9360bae370735147b 

BLAKE2b256  0789404164d785aff136fe90dbc09ba44e110b13c4ff70cb5459805baf8fc6fb 
Hashes for high_order_layers_torch1.3.2py3noneany.whl
Algorithm  Hash digest  

SHA256  13e925af0fb0469467d28f24e015b37028a9a53d6cdcd0feb4438ccff5c5777e 

MD5  ea8b947cd6269298595e5fb68f1a7ea1 

BLAKE2b256  4133b34fd8636a43c8a2be52e46ab20bea818b6f2d0a0d21accfa24cbb85260f 