
Embedding Tool

An embedding toolkit that performs multiple embedding processes: low-dimensional embedding (dimension reduction), categorical variable embedding, and financial time-series embedding.

Install

pip install embedding-tool

from embedding_tool.core import *

How to use

Dimension Reduction: dimensionReducer class

The class performs dimensionality reduction, pre-processing the data and comparing the reconstruction error of PCA against one- and two-layer autoencoders.

Input data: The input matrix has a size of 863 $\times$ 768.

print ("Data's size: ", testing_data.shape)
Data's size:  (863, 768)

Performing dimension reduction: we will reduce the number of dimensions from 768 to 2. A learning rate of 0.002 will be used by the Adam optimizer when fitting the autoencoder models.

dim_reducer = dimensionReducer(testing_data, 2, 0.002)
dim_reducer.fit()
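For intuition, the PCA branch of this comparison can be sketched with numpy alone: project the centered data onto the top-k principal components, map back, and measure the reconstruction MSE. This is a hypothetical illustration (names and code are not the package's actual implementation):

```python
import numpy as np

def pca_reconstruction_mse(X, k):
    """Project X onto its top-k principal components and report the
    mean squared error of reconstructing X from that projection."""
    Xc = X - X.mean(axis=0)                    # center the columns
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                          # k-dimensional embedding
    X_hat = Z @ Vt[:k]                         # map back to original space
    return float(np.mean((Xc - X_hat) ** 2)), Z

# Small random stand-in for the 863 x 768 matrix above.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))

mse_2, Z2 = pca_reconstruction_mse(X, 2)   # lossy: keeps 2 components
mse_8, _ = pca_reconstruction_mse(X, 8)    # full rank: exact reconstruction
```

Keeping fewer components raises the reconstruction error, which is exactly the quantity the class tabulates for each method.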

Calculating the MSE of the reconstructed vectors

dim_reducer.rmse_result
          PCA       1AE      2AE
MSE  0.740122  0.741265  0.65168
dim_reducer.rmse_result.T.sort_values('MSE').head(1).values[0][0]
0.6516801665399286

Here we can see that the two-layer autoencoder has the best performance, with the lowest MSE of about 0.65.
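The same selection can be done without pandas. Below, a plain dict stands in for the rmse_result table (a hypothetical mirror of the values above, not the package's data structure):

```python
# Hypothetical stand-in for dim_reducer.rmse_result: method -> MSE.
rmse_result = {"PCA": 0.740122, "1AE": 0.741265, "2AE": 0.65168}

# Pick the method with the lowest reconstruction MSE.
best_method = min(rmse_result, key=rmse_result.get)
best_mse = rmse_result[best_method]
```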

Observing the loss for each epoch: If the MSE doesn't converge fast enough, we can adjust the learning rate parameter. The default is 0.002. Try increasing it to 0.005 if the loss doesn't converge, or decreasing it to 0.001 if it converges too fast and oscillates.

dim_reducer.plot_autoencoder_performance()

[Plots: training loss per epoch for the 1-layer and 2-layer autoencoders]
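To see how the Adam learning rate shapes this loss curve, here is a numpy-only sketch of a 1-layer linear autoencoder trained with Adam. All names are hypothetical; this is a rough illustration of the mechanics, not the package's actual code:

```python
import numpy as np

def fit_linear_autoencoder(X, k, lr=0.002, epochs=400, seed=0):
    """Train a 1-layer linear autoencoder (encoder W1, decoder W2)
    with Adam; return the per-epoch reconstruction MSE curve."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    params = [rng.normal(scale=0.1, size=(d, k)),   # W1: encoder
              rng.normal(scale=0.1, size=(k, d))]   # W2: decoder
    m = [np.zeros_like(p) for p in params]          # Adam 1st moments
    v = [np.zeros_like(p) for p in params]          # Adam 2nd moments
    b1, b2, eps = 0.9, 0.999, 1e-8
    losses = []
    for t in range(1, epochs + 1):
        W1, W2 = params
        Z = X @ W1                                  # encode
        err = Z @ W2 - X                            # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        grads = [X.T @ (err @ W2.T) * (2 / err.size),   # dL/dW1
                 Z.T @ err * (2 / err.size)]            # dL/dW2
        for i in range(2):                          # Adam update step
            m[i] = b1 * m[i] + (1 - b1) * grads[i]
            v[i] = b2 * v[i] + (1 - b2) * grads[i] ** 2
            mhat = m[i] / (1 - b1 ** t)
            vhat = v[i] / (1 - b2 ** t)
            params[i] -= lr * mhat / (np.sqrt(vhat) + eps)
    return losses

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 16))
X = X - X.mean(axis=0)

slow = fit_linear_autoencoder(X, 2, lr=0.002)
fast = fit_linear_autoencoder(X, 2, lr=0.005)
```

Plotting `slow` and `fast` against the epoch index shows the trade-off described above: a larger learning rate drops the loss faster but is more prone to oscillation near the minimum.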

Result (Reduced Dimension Output): There are three outputs from three different methods: PCA, 1-layer AE, and 2-layer AE.

### Embedding from PCA
dim_reducer.dfLowDimPCA.head()
            0         1
0  -16.078718 -6.701481
1   -8.858150  9.354204
2    4.305739 -0.464707
3  -11.514311 -0.687461
4    1.212006  6.537965
### Embedding from 1-layer autoencoder
dim_reducer.dfLowDim1AE.head()
           0         1
0  -6.178097  4.734626
1   2.075333  5.529111
2   0.953502 -1.667776
3  -2.488155  4.001960
4   3.183654  0.589496
### Embedding from 2-layers autoencoder
dim_reducer.dfLowDim2AE.head()
           0          1
0  32.622066  54.652271
1  35.649811  40.493984
2  15.314294   5.869064
3  19.667603  37.821194
4  36.183212  25.429262

Plotting the embedding

### Embedding from 2-layers autoencoder
plot_output(dim_reducer.dfLowDim2AE)

[Plot: scatter of the 2-layer autoencoder embedding]

### Embedding from 1-layer autoencoder
plot_output(dim_reducer.dfLowDim1AE)

[Plot: scatter of the 1-layer autoencoder embedding]

