HDTorch: Accelerating Hyperdimensional Computing with GP-GPUs for Design Space Exploration

# HDTorch

HDTorch is a PyTorch-based hyperdimensional (HD) computing library for HD learning. It includes custom CUDA extensions that speed up hypervector operations, namely bit-(un)packing and bit-array summation in the horizontal and vertical dimensions.

In the paper HDTorch: Accelerating Hyperdimensional Computing with GP-GPUs for Design Space Exploration (ICCAD 2022), we demonstrate HDTorch’s utility by analyzing four HDC benchmark datasets in terms of accuracy, runtime, and memory consumption, utilizing both classical and online HD training methodologies.

## Installation

HDTorch is hosted on PyPI and can be installed via the following command:

    pip install hdtorch


## Basics of Hyperdimensional computing (HDC)

HD computing is a machine learning strategy whose defining feature is its representation of data points as long ('hyper') vectors, which enables learning by 'accumulation' of vectors belonging to the same class. HD computing relies on two conditions: first, any two randomly generated HD vectors are orthogonal with high probability; second, a vector generated by accumulation is more similar to its components than to a vector not of its class.

Binary and bipolar vectors are two common flavors of HD vector, consisting of values 0/1 and -1/1, respectively. In practice, ternary (-1, 0, 1) or integer/float vectors are sometimes used; however, this library focuses on binary and bipolar vectors.
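
The two conditions above can be checked numerically. The following is a minimal sketch in plain PyTorch on the CPU; it does not use the HDTorch API, and the helper `hamming_similarity` is a hypothetical name introduced only for this illustration. For binary vectors, 'orthogonal' corresponds to agreeing on about half of the bits.

```python
import torch

torch.manual_seed(0)
D = 10000  # hypervector dimension

def hamming_similarity(a, b):
    """Fraction of matching bits: 1.0 = identical, about 0.5 = unrelated."""
    return (a == b).float().mean().item()

# Condition 1: two independent random binary vectors agree on about half their bits.
a = torch.randint(0, 2, (D,), dtype=torch.int8)
b = torch.randint(0, 2, (D,), dtype=torch.int8)
sim_random = hamming_similarity(a, b)

# Condition 2: a bundle (bitwise majority vote) stays similar to each of its
# components, but not to a vector that was never accumulated into it.
components = torch.randint(0, 2, (5, D), dtype=torch.int8)
bundle = (components.sum(dim=0) > components.shape[0] / 2).to(torch.int8)
sim_member = hamming_similarity(bundle, components[0])
sim_outsider = hamming_similarity(bundle, a)

print(f"random pair:         {sim_random:.3f}")
print(f"bundle vs. member:   {sim_member:.3f}")
print(f"bundle vs. outsider: {sim_outsider:.3f}")
```

With five bundled vectors, the expected similarity between the bundle and one of its members is 11/16 (about 0.69), while unrelated vectors stay near 0.5.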

The typical HD workflow consists of several steps:

1. Initialize basis vectors in memory that will be used to encode features. They represent the basic units we need, such as class vectors. If we have more complex data, such as EEG data, where we also have channels, we can have basis vectors for each of the channels too.

2. Data (feature) values have to be discretized into several bins. Each of those values will have its own vector that was initialized in the previous step.

3. Discretized features are encoded to HD vectors, so that each sample of features is now represented by a single HD vector.

4. Learning is performed using all encoded data samples. Several approaches to learning are possible, but the simplest, classic approach is to accumulate all HD vectors representing samples of the same class. After accumulation and normalization to regain binary vectors, these vectors are called the 'model' vectors of the classes. A more complex form of training is 'online' training, in which the class vectors are updated after every data point: each encoded vector is multiplied by its similarity to the target class before it is accumulated into that class.

5. Inference is performed by first encoding a test sample to an HD vector, then comparing it with the learned 'model' vectors. Comparison can be done via various metrics such as cosine, dot, or Hamming similarity, but for binary vectors, Hamming is the most memory- and computation-friendly. The label of the most similar 'model' vector is given as the prediction.
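
The classic training and inference steps above can be sketched end to end in plain PyTorch. This is a minimal illustration, not HDTorch's implementation: steps 1 to 3 are replaced by a synthetic stand-in (a hidden prototype vector per class with random bit flips), and the names `noisy_samples` and `predict` are hypothetical.

```python
import torch

torch.manual_seed(0)
D, num_classes, per_class = 10000, 3, 21

# Stand-in for steps 1-3: each class has a hidden prototype vector, and every
# encoded sample is that prototype with 30% of its bits flipped at random.
prototypes = torch.randint(0, 2, (num_classes, D), dtype=torch.int8)

def noisy_samples(label, count):
    flips = (torch.rand(count, D) < 0.3).to(torch.int8)
    return prototypes[label] ^ flips

train = torch.cat([noisy_samples(c, per_class) for c in range(num_classes)])
labels = torch.arange(num_classes).repeat_interleave(per_class)

# Step 4 (classic training): accumulate the vectors of each class, then
# normalize by majority vote to regain binary model vectors.
model = torch.stack([
    (train[labels == c].sum(dim=0) > per_class / 2).to(torch.int8)
    for c in range(num_classes)
])

# Step 5 (inference): predict the class whose model vector has the smallest
# Hamming distance to the encoded test sample.
def predict(samples):
    distances = (samples[:, None, :] != model[None, :, :]).sum(dim=2)
    return distances.argmin(dim=1)

test = torch.cat([noisy_samples(c, 5) for c in range(num_classes)])
test_labels = torch.arange(num_classes).repeat_interleave(5)
accuracy = (predict(test) == test_labels).float().mean().item()
print(f"accuracy: {accuracy:.2f}")
```

An odd number of samples per class is used here so that the majority vote never ties.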

## Generating Hypervectors

Encoding data such as a set of features to HD vectors can be done in several ways, but in most of them the first step is to generate basis hypervectors, which are then combined to produce the final HD vector representation of the original data.

Here we provide several ways to initialize basis hypervectors, since the data that they represent can have different structures and relationships. For example, if we want to represent different categories with no inter-relationship, we can generate each HD vector randomly and independently. In contrast, values with inter-relationships may be mapped such that the distance between values is proportional to the distance between corresponding vectors.

The options our code currently supports for generating a set of basis vectors are:

• 'random' - where every vector is randomly and independently generated

• 'sandwich' - where every two neighboring vectors have half of the vector the same, while the rest of the vector is random. Vectors therefore share 50% similarity with their neighbors but not with vectors farther away.

• 'scale' - also called 'level' initialization, where the distance between the values that the vectors represent is mapped to the similarity between those vectors.

• 'scaleWithRadius' - similar to 'scale' initialization, but only for vectors that are closer than the given 'radius' distance. Thus, vectors closer than 'radius' are similar in proportion to their distance, but beyond this 'radius' they are orthogonal.

Example of basis vectors generation:

    import hdtorch

    # Generate 5 random hypervectors with dimension 10000 (not packed, on 'cuda')
    vecs = hdtorch.HDmodel.generateBasisHDVectors('random',5,10000,0,'cuda')

    # Generate 20 hypervectors with dimension 500 (not packed, on 'cuda') in which
    # two vectors are similar in inverse proportion to their distance. Bits that
    # differ between neighboring vectors are chosen in increasing order (instead of
    # randomly) and the whole vector is eventually flipped. If the factor at the end
    # were e.g. 2, only half of the total vector would be flipped.
    vecs = hdtorch.HDmodel.generateBasisHDVectors('scaleNoRand1',20,500,0,'cuda')

    # Generate 100 hypervectors with dimension 10000 (not packed, on 'cuda') that
    # are similar in proportion to their distance up to the surrounding 10 vectors,
    # with all vectors further apart than that nearly orthogonal
    vecs = hdtorch.HDmodel.generateBasisHDVectors('scaleWithRadius10',100,10000,0,'cuda')
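
To illustrate what a 'scale' (level) initialization produces, the sketch below rebuilds the behavior described in the comment for 'scaleNoRand1' in plain PyTorch: neighboring levels differ in a fixed block of bits, chosen in increasing order. It is an assumption-based reconstruction, not HDTorch's code; the function `level_vectors` and its `factor` parameter are hypothetical names.

```python
import torch

def level_vectors(num_levels, D, factor=1, seed=0):
    """Hypothetical helper: level l differs from level 0 in its first l * step bits.

    With factor=1 the last level is the first level fully flipped; with
    factor=2 only half of the vector is flipped at the last level.
    """
    gen = torch.Generator().manual_seed(seed)
    base = torch.randint(0, 2, (D,), dtype=torch.int8, generator=gen)
    step = D // (factor * (num_levels - 1))  # bits flipped between neighbors
    vecs = base.repeat(num_levels, 1)
    for level in range(1, num_levels):
        vecs[level, : level * step] ^= 1  # flip the first level * step bits
    return vecs

def hamming_distance(a, b):
    return (a != b).sum().item()

vecs = level_vectors(num_levels=21, D=10000)
print(hamming_distance(vecs[0], vecs[1]))   # 500: neighbors differ in `step` bits
print(hamming_distance(vecs[0], vecs[10]))  # 5000: distance grows with the level gap
print(hamming_distance(vecs[0], vecs[20]))  # 10000: whole vector flipped
```

Because the Hamming distance grows linearly with the gap between levels, nearby values map to similar vectors and distant values to dissimilar ones.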


## Custom CUDA functions

In order to significantly lower computation time and memory usage when operating with hypervectors, we implemented custom CUDA functions for packing, unpacking and manipulating them.

These functions are used as follows:

    import hdtorch

    # Generate random HD vectors of dimension 10000
    vecs = hdtorch.HDmodel.generateBasisHDVectors('random',5,10000,0,'cuda')

    # Compress vectors to arrays with dimension [5,313], dtype = int32
    # (CUDA accelerated, 8x memory reduction). Dimension 313 is a result of ceil(10000/32)
    packed_vecs = hdtorch.pack(vecs)

    # Decompress vector to array with dimension [5,10000], dtype = int8 (CUDA accelerated)
    unpacked_vecs = hdtorch.unpack(packed_vecs, 10000)
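
For readers who want to see what packing does to the data, here is a CPU-only sketch of bit-packing in plain PyTorch. It is not the CUDA kernel shipped with HDTorch: the helpers `pack_bits` and `unpack_bits` are hypothetical, and the bit order within each 32-bit word (least significant bit first) is an assumption that may differ from the library's layout.

```python
import torch

BITS = 32
# Single-bit masks 2^0 ... 2^31, kept in int64 so the top bit does not overflow.
MASKS = torch.tensor([1 << k for k in range(BITS)], dtype=torch.int64)

def pack_bits(vecs):
    """Pack a [N, D] tensor of 0/1 values into [N, ceil(D/32)] int32 words."""
    n, d = vecs.shape
    words = -(-d // BITS)  # ceil(D / 32)
    padded = torch.zeros(n, words * BITS, dtype=torch.int64)
    padded[:, :d] = vecs
    packed = (padded.view(n, words, BITS) * MASKS).sum(dim=2)
    # Map values >= 2^31 to their two's-complement int32 representation.
    packed = torch.where(packed >= 2**31, packed - 2**32, packed)
    return packed.to(torch.int32)

def unpack_bits(packed, d):
    """Invert pack_bits, returning a [N, D] int8 tensor of 0/1 values."""
    bits = (packed.to(torch.int64).unsqueeze(2) & MASKS) != 0
    return bits.reshape(packed.shape[0], -1)[:, :d].to(torch.int8)

vecs = torch.randint(0, 2, (5, 10000), dtype=torch.int8)
packed = pack_bits(vecs)
restored = unpack_bits(packed, 10000)

bytes_before = vecs.numel() * vecs.element_size()     # 5 * 10000 * 1 = 50000
bytes_after = packed.numel() * packed.element_size()  # 5 * 313 * 4 = 6260
print(packed.shape, bytes_before, bytes_after, torch.equal(vecs, restored))
```

The ratio 50000 / 6260 is just under 8, matching the 8x memory reduction quoted above; the small shortfall comes from padding 10000 bits up to 313 full words.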


Next, as encoding and learning in HDC are based on bitwise summing of vectors in the horizontal and vertical dimensions, we implement these functions for packed vectors. This additionally reduces computation time for encoding and training.

These functions are used as follows:

    import hdtorch

    # Generate random HD vectors of dimension 10000
    vecs = hdtorch.HDmodel.generateBasisHDVectors('random',5,10000,0,'cuda')

    # Compress vectors to arrays with dimension [5,313], dtype = int32 (CUDA accelerated, 8x memory reduction).
    packed_vecs = hdtorch.pack(vecs)

    # Horizontal summation of packed vector (CUDA accelerated), the result is array with dimension [5]
    h_count = hdtorch.hcount(packed_vecs)

    # Vertical summation of packed vector (CUDA accelerated), the result is array with dimension [10000]
    v_count = hdtorch.vcount(packed_vecs,10000)
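
The result shapes above suggest that the horizontal and vertical counts correspond to row and column sums of the unpacked bits. Under that assumption, the following plain-PyTorch sketch (CPU, unpacked vectors, no HDTorch calls) shows the equivalent reductions and why each matters: the horizontal count of an XOR gives a Hamming distance for inference, and the vertical count followed by a threshold gives a bundled vector for training.

```python
import torch

torch.manual_seed(0)
vecs = torch.randint(0, 2, (5, 10000), dtype=torch.int8)

# Horizontal count: number of set bits in each vector -> shape [5].
h_count = vecs.sum(dim=1)

# Vertical count: how many vectors have each bit position set -> shape [10000].
v_count = vecs.sum(dim=0)

# Inference use: the Hamming distance between two binary vectors is the
# horizontal count of their XOR.
hamming = (vecs[0] ^ vecs[1]).sum()

# Training use: bundling is a vertical count followed by a majority threshold.
bundled = (v_count > vecs.shape[0] / 2).to(torch.int8)

print(h_count.shape, v_count.shape, hamming.item(), bundled.shape)
```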


## Data encoding

In order to learn from training data or infer test data labels, data has to be encoded to HD vectors. This means that instead of having data in the form of a 2D matrix [numSampl, numFeat] where each column is one feature, we represent it in the form of a 2D matrix of corresponding HD vectors [numSampl, D]. For every sample, numFeat features are encoded to one D-dimensional hypervector.

There are many proposed encoding algorithms, but the most typical is what we call 'FeatXORVal', where each feature has an ID vector and n value vectors, where n is the number of levels to which data samples are discretized. Data is encoded by binding the feature ID vector to the value vector corresponding to the data's discretized value, typically via the XOR function. Finally, the bound vectors are bundled over all features, generally by bitwise summing and normalizing by the number of summed vectors to regain binary vectors.

This method is demonstrated in the code below:

    import torch
    import hdtorch

    numFeat=30
    D=10000
    numSegmentationLevels=20

    # initialize data (100 samples, with 30 features, having values between 0 and 256)
    features=torch.randint(0,256,(100, numFeat)).to(device='cuda')

    # initialize basis vectors
    # randomly generated feature ID vectors, 1 for each of 30 features, with D=10000, not packed
    featureIDs = hdtorch.HDmodel.generateBasisHDVectors('random',numFeat,D,0,'cuda')
    # feature value vectors generated with the 'scale' method, 1 for each of 20 possible values, with D=10000, not packed
    featureVals = hdtorch.HDmodel.generateBasisHDVectors('scaleNoRand1',numSegmentationLevels,D,0,'cuda')

    # normalize and discretize data
    minFeat=torch.min(features, dim=0)[0]
    maxFeat=torch.max(features, dim=0)[0]
    featuresNorm = hdtorch.HDutil.normalizeAndDiscretizeData(features, minFeat, maxFeat, numSegmentationLevels)

    # encode features using the 'FeatXORVal' approach
    (encodedData, _) = hdtorch.HDencoding.EncodeDataToVectors(featuresNorm, featureIDs, featureVals, 'binary', 0, 'FeatXORVal', D)
    # or using e.g. the 'FeatPermute' approach
    (encodedData, _) = hdtorch.HDencoding.EncodeDataToVectors(featuresNorm, featureIDs, featureVals, 'binary', 0, 'FeatPermute', D)
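
To make the binding and bundling steps of 'FeatXORVal' concrete, the sketch below spells them out in plain PyTorch on the CPU. It is an illustration under stated assumptions rather than the code behind `EncodeDataToVectors`: the min-max discretization is a guess at what `normalizeAndDiscretizeData` does, the value vectors are random instead of 'scale'-initialized, and all variable names are introduced only for this example.

```python
import torch

torch.manual_seed(0)
num_samples, num_feat, D, num_levels = 4, 30, 10000, 20

features = torch.randint(0, 256, (num_samples, num_feat))

# Discretize: min-max normalize each feature, then map it to a level index.
min_feat = features.min(dim=0).values
max_feat = features.max(dim=0).values
span = (max_feat - min_feat).clamp(min=1)  # avoid division by zero
levels = ((features - min_feat) * (num_levels - 1) / span).round().long()

# Basis vectors: one ID vector per feature, one value vector per level.
feature_ids = torch.randint(0, 2, (num_feat, D), dtype=torch.int8)
feature_vals = torch.randint(0, 2, (num_levels, D), dtype=torch.int8)

# Bind: XOR each feature's ID vector with the vector of its discretized value.
bound = feature_ids.unsqueeze(0) ^ feature_vals[levels]  # [samples, feat, D]

# Bundle: sum bitwise over features and threshold at half the feature count
# (with an even feature count, exact ties are resolved to 0 here).
encoded = (bound.sum(dim=1) > num_feat / 2).to(torch.int8)  # [samples, D]

print(levels.min().item(), levels.max().item())
print(encoded.shape)
```

Each row of `encoded` is one D-dimensional hypervector, so the [numSampl, numFeat] input matrix becomes a [numSampl, D] matrix as described above.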


## HD computing learning and inference

Finally, to show how HD vectors are used for learning and inference, we walk through the whole process in a training and inference example using MNIST data:

    import torch
    import hdtorch
    from torchvision import datasets
    import torchvision.transforms as transforms

    # Setting various parameters
    class HDParams():
        HDFlavor = 'binary'  # 'binary' (0/1) or 'bipol' (-1/1)
        D = 10000  # dimension of hypervectors
        numFeat = 784
        numClasses = 10
        device = 'cuda'  # device to use ('cpu' or 'cuda')
        packed = True
        numSegmentationLevels = 20  # number of levels to which data is discretized
        similarityType = 'hamming'  # 'hamming' or 'cosine': similarity measure used for comparing HD vectors
        levelVecType = 'random'  # 'random', 'sandwich', 'scaleNoRand1', 'scaleNoRand2', 'scaleRand1', 'scaleRand2', ... 'scaleWithRadius3': how HD vectors are initialized
        IDVecType = 'random'
        encodingStrat = 'FeatXORVal'  # 'FeatXORVal', 'FeatAppend' or 'FeatPermute': how data is encoded to HD vectors
    hdParams = HDParams()
    batchSize = 1000  # learn in batches

    t = transforms.Compose([transforms.ToTensor(), transforms.ConvertImageDtype(torch.uint8)])
    kwargs = {'num_workers': 1, 'pin_memory': True} if HDParams.device == 'cuda' else {}
    dataTrain = datasets.MNIST(root = './data', train = True, transform = t, download = True)
    dataTest  = datasets.MNIST(root = './data', train = False, transform = t, download = True)
    # Data loaders (assumed setup: standard PyTorch DataLoader over both splits)
    trainLoader = torch.utils.data.DataLoader(dataTrain, batch_size=batchSize, shuffle=True, **kwargs)
    testLoader = torch.utils.data.DataLoader(dataTest, batch_size=batchSize, **kwargs)

    # Calculate min and max values on the train set - they are also used for normalizing the test set
    # (assumed: per-feature min/max taken directly from the raw training images)
    trainPixels = dataTrain.data.view(-1,784)
    minFeat = trainPixels.min(dim=0)[0].to(HDParams.device)
    maxFeat = trainPixels.max(dim=0)[0].to(HDParams.device)

    # Initialize HD classifier
    HDModel = hdtorch.HD_classifier(HDParams)

    # Training HD model in batches
    print("Training Model")
    for x, (data, labels) in enumerate(trainLoader):
        print(f'Training batch {x}')
        data = data.to(HDParams.device).view(-1,784)
        data = hdtorch.HDutil.normalizeAndDiscretizeData(data, minFeat, maxFeat, HDParams.numSegmentationLevels)
        HDModel.trainModelVecOnData(data,labels.to(HDParams.device))

    # Testing performance
    print("Testing Model")
    for x, (data, labels) in enumerate(testLoader):
        data = data.to(HDParams.device).view(-1,784)
        data = hdtorch.HDutil.normalizeAndDiscretizeData(data, minFeat, maxFeat, HDParams.numSegmentationLevels)
        (testPredictions,testDistances) = HDModel.givePrediction(data)
        acc_test = (testPredictions == labels.to(HDParams.device)).sum().item()/len(labels)
        print(f'Batch {x}: Acc: {acc_test}')


## Documentation

More documentation on HDTorch's individual features can be found on its Read the Docs page.

## Citations

If you like this work and use it in your own research, please cite the following paper:

    @INPROCEEDINGS{iccad2022,
      author={Simon, William Andrew and Pale, Una and Teijeiro, Tomas and Atienza, David},
      booktitle={2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)},
      title={HDTorch: Accelerating Hyperdimensional Computing with GP-GPUs for Design Space Exploration},
      year={2022}}

