Skip to main content

A Python module for data fusion built on top of factorized models.

Project description

Travis

scikit-fusion is a Python module for data fusion based on recent collective latent factor models.

Dependencies

scikit-fusion is tested to work under Python 3.

The required dependencies to build the software are Numpy >= 1.7, SciPy >= 0.12, PyGraphviz >= 1.3 (needed only for drawing data fusion graphs) and Joblib >= 0.8.4.

Install

This package uses distutils, which is the default way of installing python modules. To install in your home directory, use:

python setup.py install --user

To install for all users on Unix/Linux:

python setup.py build
sudo python setup.py install

For development mode use:

python setup.py develop

Usage

Let’s generate three random data matrices describing three different object types:

>>> import numpy as np
>>> R12 = np.random.rand(50, 100)
>>> R13 = np.random.rand(50, 40)
>>> R23 = np.random.rand(100, 40)

Next, we define our data fusion graph:

>>> from skfusion import fusion
>>> t1 = fusion.ObjectType('Type 1', 10)
>>> t2 = fusion.ObjectType('Type 2', 20)
>>> t3 = fusion.ObjectType('Type 3', 30)
>>> relations = [fusion.Relation(R12, t1, t2),
                 fusion.Relation(R13, t1, t3),
                 fusion.Relation(R23, t2, t3)]
>>> fusion_graph = fusion.FusionGraph()
>>> fusion_graph.add_relations_from(relations)

and then collectively infer the latent data model:

>>> fuser = fusion.Dfmf()
>>> fuser.fuse(fusion_graph)
>>> print(fuser.factor(t1).shape)
(50, 10)

Afterwards new data might arrive:

>>> new_R12 = np.random.rand(10, 100)
>>> new_R13 = np.random.rand(10, 40)

for which we define the fusion graph:

>>> new_relations = [fusion.Relation(new_R12, t1, t2),
                     fusion.Relation(new_R13, t1, t3)]
>>> new_graph = fusion.FusionGraph(new_relations)

and transform new objects to the latent space induced by the fuser:

>>> transformer = fusion.DfmfTransform()
>>> transformer.transform(t1, new_graph, fuser)
>>> print(transformer.factor(t1).shape)
(10, 10)

scikit-fusion is distributed with a few working data fusion scenarios:

>>> from skfusion import datasets
>>> dicty = datasets.load_dicty()
>>> print(dicty)
FusionGraph(Object types: 3, Relations: 3)
>>> print(dicty.object_types)
{ObjectType(GO term), ObjectType(Experimental condition), ObjectType(Gene)}
>>> print(dicty.relations)
{Relation(ObjectType(Gene), ObjectType(GO term)),
 Relation(ObjectType(Gene), ObjectType(Gene)),
 Relation(ObjectType(Gene), ObjectType(Experimental condition))}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scikit-fusion-0.2.1.tar.gz (6.8 MB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page