
A package for generalized online tensor decomposition and completion


GOCPT: Generalized Online CP Tensor Learning

Real-world tensors (multi-dimensional data) can evolve in various ways (as shown below). We provide the GOCPT package to handle the most general online Canonical Polyadic (CP) tensor factorization and completion problems.

  • GOCPT unifies most existing tensor factorization and completion scenarios and supports the following evolving patterns in the online setting:
    • mode growth along one or more dimensions
    • missing value filling for incomplete tensors
    • value updates for previously incorrect inputs
    • rank changes during the evolution
  • This package provides two models for handling generalized online tensor factorization or completion problems. These two models have comparable or better accuracy/fitness/complexity than previous baselines.
    • GOCPT: needs to preserve all previous data and provides a more accurate fit
    • GOCPTE (economy version): needs no previous data and runs fast
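
For readers new to the CP model, the sketch below shows what a rank-R factorization represents: one factor matrix per tensor mode, with the full tensor recovered as a sum of R rank-one terms. It uses NumPy only. `cp_reconstruct` is a hypothetical helper written for illustration and is not part of the GOCPT API.

```python
import numpy as np

def cp_reconstruct(factors):
    """Rebuild a full tensor from CP factors (hypothetical helper, not a GOCPT function).

    factors: list of matrices; the n-th matrix has shape (I_n, R).
    """
    R = factors[0].shape[1]
    shape = tuple(A.shape[0] for A in factors)
    X = np.zeros(shape)
    for r in range(R):
        # outer product of the r-th column of every factor matrix
        term = factors[0][:, r]
        for A in factors[1:]:
            term = np.multiply.outer(term, A[:, r])
        X += term
    return X

rng = np.random.default_rng(0)
factors = [rng.random((I, 3)) for I in (4, 5, 6)]  # rank-3 factors of a 4x5x6 tensor
X = cp_reconstruct(factors)
print(X.shape)  # (4, 5, 6)
```

Online factorization then amounts to keeping such a list of factor matrices up to date as the tensor changes.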

1. Package Installation

$ pip install GOCPT

For help, type "GOCPT" at the command line and the help message will be printed.

$ GOCPT

We provide the following modules:

  • GOCPT.datasets: external real tensor loader, synthetic tensor generation
  • GOCPT.otf: baseline models for common online tensor factorization
  • GOCPT.otc: baseline models for common online tensor completion
  • GOCPT.gotf: our models for generalized online tensor factorization
  • GOCPT.gotc: our models for generalized online tensor completion
  • GOCPT.simulation: our simulation environments for all online tensor settings

2. Generalized Online Tensor Factorization (GOTF)

The GOTF setting is a generalization of the online tensor factorization (OTF) setting. Here, multiple evolution patterns can appear simultaneously:

  • mode growth: one or more tensor modes can grow
  • rank change: the rank of the maintained factors can increase or decrease
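
The two patterns above can be pictured with plain NumPy. This README does not describe how GOCPT handles them internally, so the sketch below shows only one simple possibility: new slices are appended along the last mode, and factor matrices are truncated or padded with small random columns when the rank changes. `grow_last_mode` and `resize_rank` are hypothetical helper names.

```python
import numpy as np

def grow_last_mode(X_old, X_new):
    """Mode growth: append new slices along the last mode."""
    return np.concatenate([X_old, X_new], axis=-1)

def resize_rank(factors, new_R, scale=1e-2, seed=0):
    """Rank change: truncate or pad every factor matrix to new_R columns.

    Padding with small random columns is an assumption made for
    illustration; it is not taken from the GOCPT source.
    """
    rng = np.random.default_rng(seed)
    resized = []
    for A in factors:
        I, R = A.shape
        if new_R <= R:
            resized.append(A[:, :new_R].copy())
        else:
            extra = scale * rng.standard_normal((I, new_R - R))
            resized.append(np.hstack([A, extra]))
    return resized

rng = np.random.default_rng(1)
X_old, X_new = rng.random((4, 5, 6)), rng.random((4, 5, 3))
X_all = grow_last_mode(X_old, X_new)

factors = [rng.random((I, 5)) for I in X_all.shape]  # rank 5
bigger = resize_rank(factors, new_R=7)
smaller = resize_rank(factors, new_R=4)
print(X_all.shape, bigger[0].shape, smaller[0].shape)  # (4, 5, 9) (4, 7) (4, 4)
```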

We provide the following two models for the GOTF setting:

  • gotf.GOCPTE: needs no previous data and runs fast
  • gotf.GOCPT: needs to preserve all previous data and provides a more accurate fit

and the following models for the common OTF setting:

  • Supporting arbitrary-order tensors: otf.OnlineCPD, otf.CPStream, otf.MAST
  • Supporting only third-order tensors with one new slice at each step: otf.SDT, otf.RLST

A Quick Example

from GOCPT import datasets, simulation, otf, gotf
import numpy as np

# load Columbia FACE 3D tensor
X = datasets.FACE_3D()

# simulate one mode growth (typically the last mode)
# use 30% as preparation and increase 3 slices at each step
[X_0, X_inc_ls] = simulation.growth_1d(X, prep=0.3, inc=3)

# initialize the model
model = gotf.GOCPTE(X_0, R=5, iters=100)

# update model with new slices, new_R can change during for-loop
for increments in X_inc_ls[:10]:
    new_R = np.random.choice([5,6,7,8])
    model.update(increments, new_R=new_R)

3. Generalized Online Tensor Completion (GOTC)

The GOTC setting is a generalization of the online tensor completion (OTC) setting. Here, multiple evolution patterns can appear simultaneously:

  • mode growth: one or more tensor modes can grow
  • value update: previously observed entries may change due to new information
  • missing filling: some previous missing values may be received (delayed feedback)
  • rank change: the rank of the maintained factors can increase or decrease
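
Value updates and missing fills both arrive as a list of coordinates plus a list of new values. The sketch below shows how such a pair could be written into a masked tensor. It assumes the mask is 1 for observed entries and 0 for missing ones, and that coordinates are given as one index tuple per entry; neither convention is confirmed by this README. `apply_entries` is a hypothetical helper.

```python
import numpy as np

def apply_entries(X, mask, coords, values):
    """Write values at coords and mark those entries as observed.

    Hypothetical helper; the coordinate layout (one index tuple per entry)
    is an assumption.
    """
    X, mask = X.copy(), mask.copy()
    idx = tuple(np.asarray(coords).T)  # one index array per mode
    X[idx] = values
    mask[idx] = 1
    return X, mask

X = np.zeros((2, 3, 4))
mask = np.zeros((2, 3, 4))
X[0, 0, 0], mask[0, 0, 0] = 1.0, 1  # a single observed entry

# value update: a previously observed entry is corrected
X, mask = apply_entries(X, mask, coords=[(0, 0, 0)], values=[1.5])
# missing fill: two previously missing entries arrive late
X, mask = apply_entries(X, mask, coords=[(1, 2, 3), (0, 1, 2)], values=[0.7, -0.2])
print(int(mask.sum()))  # 3
```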

We provide the following two models for the GOTC setting:

  • gotc.GOCPTE: needs no previous data and runs fast
  • gotc.GOCPT: needs to preserve all previous data and provides a more accurate fit

and the following models for the OTC setting: otc.OLSTEC, otc.OnlineSGD

A Quick Example

from GOCPT import datasets, simulation, otc, gotc
import numpy as np

# load synthetic data
[masked_X, mask] = datasets.syn_data(R=5, size=(5, 10, 15, 100), dist='unif', sparsity=0.95)

# simulate one mode growth (typically the last mode)
# use 30% as preparation and increase 7 slices at each step
[[X_0, mask_0], [X_inc_ls, mask_inc_ls]] = simulation.growth_1d([masked_X, mask], prep=0.3, inc=7)

# initialize the model
model = gotc.GOCPTE([X_0, mask_0], R=5, iters=100)

# update model with new slices, new_R can change during for-loop
for increments in zip(X_inc_ls, mask_inc_ls):
    # simulate value update, missing filling, and change of rank
    new_value_update = simulation.value_update(model.X, model.mask, percent=10, amp=0.05)
    new_missing_fill = simulation.missing_fill(model.X, model.mask, percent=10, \
        factors=model.factors)
    new_R = np.random.choice([5,6,7,8])
    model.update(increments, new_R=new_R, value_update=new_value_update, \
        miss_fill=new_missing_fill)

4. How to use the modules?

4.1. Data Loader GOCPT.datasets

  • datasets.GCSS
  • datasets.FACE_3D

The tensor data is formatted as the np.ndarray type (the current version is compatible with numpy only; we are building a torch version to support CUDA). Tensors can be loaded from external data or generated synthetically by our scripts with various distributions.

# create data from external sources
from GOCPT import datasets
GCSS = datasets.GCSS()
FACE_3D = datasets.FACE_3D()
"""
INPUT:
    - <int> R: tensor rank
    - <list/tuple> size: tensor mode specification, e.g., [5,10,15]
    - <str> dist: in which distribution
    - <float> sparsity: sparsity of the tensor, default is None
OUTPUT:
    if sparsity is not None:
        - <tensor> masked_X
        - <tensor> mask
    if sparsity is None:
        - <tensor> full_tensor
"""
# creating full synthetic tensors
full_tensor = datasets.syn_data(R=5, size=(5,10,15,20), dist='unif')

# creating incomplete synthetic tensors with mask
[masked_tensor, mask] = datasets.syn_data(R=5, size=(15, 5, 8), dist='normal', sparsity=0.95)
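
To make the masked output concrete, here is a NumPy-only sketch of how a low-rank tensor with a random observation mask could be generated. It is not the implementation of `datasets.syn_data`. The helper `make_masked_low_rank` and its `missing_rate` parameter are hypothetical, and this README does not say whether `sparsity` counts missing or observed entries.

```python
import numpy as np

def make_masked_low_rank(R, size, missing_rate, seed=0):
    """Generate a rank-R tensor and hide a random fraction of its entries.

    Hypothetical stand-in for illustration; mask is 1 where an entry is
    observed and 0 where it is missing (an assumed convention).
    """
    size = tuple(size)
    rng = np.random.default_rng(seed)
    factors = [rng.random((I, R)) for I in size]
    # build an einsum spec such as 'az,bz,cz->abc' for any tensor order
    letters = 'abcdefgh'[:len(size)]
    spec = ','.join(c + 'z' for c in letters) + '->' + letters
    X = np.einsum(spec, *factors)
    mask = (rng.random(size) >= missing_rate).astype(float)
    return X * mask, mask

masked_X, mask = make_masked_low_rank(R=5, size=(15, 5, 8), missing_rate=0.95)
print(masked_X.shape, mask.mean())  # shape and observed fraction
```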

4.2. Simulation GOCPT.simulation

For real-world applications, the base tensor (sometimes with a tensor mask), new tensor slices per time step, rank changes, new value updates, and new missing fillings will be given. To simulate all these settings, we provide the following functions:

  • simulation.growth_1d: currently, only growth along one mode is supported
  • simulation.value_update: simulate value updates in previous entries
  • simulation.missing_fill: simulate values to fill previous missing entries
"""
INPUT:
    - <tensor> X or [<tensor>, <tensor>] masked_X, mask: for factorization or completion
    - <float> prep: the fraction of preparation data (along the last mode), e.g., 0.3
    - <int> inc: how many new slices at the next step (along the last mode)
OUTPUT:
    - <tensor> X_0 or [<tensor>, <tensor>] X_0, mask_0: the preparation tensor
    - <tensor list> X_inc_ls or [<tensor list>, <tensor list>] X_inc_ls, mask_inc_ls : a list \
    of new tensors that appear later
"""
# simulate mode growth for factorization
[X_0, X_inc_ls] = simulation.growth_1d(X, prep=0.3, inc=3)
# simulate mode growth for completion
[[X_0, mask_0], [X_inc_ls, mask_inc_ls]] = simulation.growth_1d([masked_X, mask], prep=0.3, inc=3)
"""
INPUT:
    - <tensor> X: the masked tensor
    - <tensor> mask: the mask itself
    - <int> or <float> percent: percentage of changed elements or how many elements to change
    - <float> amp: the amplitude of uniform noise to the value
OUTPUT:
    - <list> coords: coordinate list of the changed elements
    - <list> values: new value list of the changed elements
"""
new_value_update = simulation.value_update(model.X, model.mask, percent=10, amp=0.05)
"""
INPUT:
    - <tensor> X: the masked tensor
    - <tensor> mask: the mask itself
    - <int> or <float> percent: percentage of elements to fill or how many elements to fill
    - <matrix list> factors: [optional] using factors during the \
        simulation can provide a smoothed missing filling. If factors is None, then \
        we randomly sample existing elements
OUTPUT:
    - <list> coords: coordinate list of the changed elements
    - <list> values: new value list of the changed elements
"""
new_missing_fill = simulation.missing_fill(model.X, model.mask, percent=10, factors=model.factors)
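
The mode-growth simulation can be sketched with NumPy slicing: the first `prep` fraction along the last mode becomes the preparation tensor, and the remainder is cut into chunks of `inc` slices. This is only a guess at the behavior described in the docstring above, not the code of `simulation.growth_1d`. In particular, keeping a shorter final chunk is an assumption, and `split_last_mode` is a hypothetical name.

```python
import numpy as np

def split_last_mode(X, prep, inc):
    """Split X along its last mode into a preparation tensor and increments.

    Hypothetical helper; the final increment may hold fewer than inc slices.
    """
    T = X.shape[-1]
    n0 = int(round(T * prep))  # number of slices used for preparation
    X_0 = X[..., :n0]
    X_inc_ls = [X[..., t:t + inc] for t in range(n0, T, inc)]
    return X_0, X_inc_ls

X = np.arange(2 * 3 * 20, dtype=float).reshape(2, 3, 20)
X_0, X_inc_ls = split_last_mode(X, prep=0.3, inc=3)
print(X_0.shape, len(X_inc_ls), X_inc_ls[-1].shape)  # (2, 3, 6) 5 (2, 3, 2)
```

For the completion setting, the same split would be applied to the mask so that each increment carries its own mask slices.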

4.3. Factorization Model Config GOCPT.gotf, GOCPT.otf

All models in this package are fed the initial tensor and store an initial list of factors. Then, as the tensor evolves, the model (a list of low-rank factors) is updated based on all the accessible new information (from up to two different evolutions). To calculate the reconstruction error (percentage of fitness, PoF), all the information is still stored in the model class, though only gotf.GOCPT uses it during updates.

  • the stats of initialization and each update will be summarized after optimization
# model initialization
"""
INPUT:
    - <tensor> X_0: the initial tensor
    - <int> R: tensor rank
    - <int> iters: [optional, default=50] number of initial cpd-als iterations
"""
model = gotf.GOCPT(X_0, R=5)
model = gotf.GOCPTE(X_0, R=5, alpha=1) # a special **alpha** is for weighting the previous results
baseline = otf.BASELINE(X_0, R=5) # BASELINE is a placeholder for an otf model, e.g., otf.OnlineCPD

# model update
"""
INPUT:
    - <tensor> increments: new tensor slices (along the last mode)
    - <int> new_R: [optional] new tensor rank, only works for GOCPT.gotf models
"""
model.update(increments, new_R=new_R)
baseline.update(increments)
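
The percentage of fitness mentioned above is not defined in this README. A common definition, assumed here, is PoF = 1 - ||X - X_hat||_F / ||X||_F, where X_hat is the tensor rebuilt from the current factors. The sketch below computes it with NumPy. `percentage_of_fitness` is a hypothetical helper and may differ from the statistic the package reports.

```python
import numpy as np

def percentage_of_fitness(X, factors):
    """PoF = 1 - ||X - X_hat||_F / ||X||_F (assumed definition)."""
    # build an einsum spec such as 'az,bz,cz->abc' for any tensor order
    letters = 'abcdefgh'[:X.ndim]
    spec = ','.join(c + 'z' for c in letters) + '->' + letters
    X_hat = np.einsum(spec, *factors)
    # with no axis or ord, np.linalg.norm returns the 2-norm of the flattened array
    return 1.0 - np.linalg.norm(X - X_hat) / np.linalg.norm(X)

rng = np.random.default_rng(0)
true_factors = [rng.random((I, 3)) for I in (4, 5, 6)]
X = np.einsum('az,bz,cz->abc', *true_factors)

pof_exact = percentage_of_fitness(X, true_factors)
noisy_factors = [A + 0.1 * rng.standard_normal(A.shape) for A in true_factors]
pof_noisy = percentage_of_fitness(X, noisy_factors)
print(pof_exact, pof_noisy)  # exact factors give 1.0; perturbed factors give less
```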

4.4. Completion Model Config GOCPT.gotc, GOCPT.otc

All models in this package are fed the initial tensor and the mask and store an initial list of factors. Then, as the tensor evolves, the model (a list of low-rank factors) is updated based on all the accessible new information (from up to four different evolutions). To calculate the reconstruction error (percentage of fitness, PoF), all the information is still stored in the model class, though only gotc.GOCPT uses it during updates.

  • the stats of initialization and each update will be summarized after optimization
# model initialization
"""
INPUT:
    - [<tensor>, <tensor>] X_0, mask_0: the initial tensor and initial mask
    - <int> R: tensor rank
    - <int> iters: [optional, default=50] number of initial cpc-als iterations
"""
model = gotc.GOCPT([X_0, mask_0], R=5)
model = gotc.GOCPTE([X_0, mask_0], R=5, alpha=1) # a special **alpha** here is for weighting previous results
baseline = otc.BASELINE([X_0, mask_0], R=5) # BASELINE is a placeholder for an otc model, e.g., otc.OLSTEC

# model update
"""
INPUT:
    - [<tensor>, <tensor>] X_increments, mask_increments: new tensor and mask slices (along the last mode)
    - <int> new_R: [optional] new tensor rank, only works for GOCPT.gotc models
"""
kwargs = {
    "new_R": new_R, # new rank change
    "value_update": [value_update_coordinates, value_update_values], # new value updates
    "missing_fill": [missing_fill_coordinates, missing_fill_values]  # new missing fill
}
model.update([X_increments, mask_increments], **kwargs)
baseline.update([X_increments, mask_increments])

5. Future work

  • We plan to refactor the code into a torch version and support CUDA to accelerate computation.
  • We will consider more tensor evolving patterns to extend the coverage of our package.
  • As a long-term plan, we plan to support sparse tensor implementations for subsequent research.
