Skip to main content

A Fast and Flexible Graph-Fused Lasso Solver

Project description

A Fast, Flexible Algorithm for the Graph-Fused Lasso
----------------------------------------------------

<p align="center">
<img src="https://github.com/tansey/gfl/blob/master/img/example1.png?raw=true" alt="Example GFL Solution"/>
</p>

The goal in the graph-fused lasso (GFL) is to find a solution to the following convex optimization problem:

<p align="center">
<img src="https://github.com/tansey/gfl/blob/master/img/eq1.png?raw=true" alt="GFL Convex Optimization Problem"/>
</p>

where __l__ is a smooth, convex loss function. The problem assumes you are given a graph structure of edges and nodes, where each node corresponds to a variable and edges between nodes correspond to constraints on the first differences between the variables. The objective function then seeks to find a solution to the above problem that minimizes the loss function over the vertices plus the sum of the first differences defined by the set of edges __E__.

The solution implemented here is based on the graph-theoretic trail decomposition and ADMM algorithm implemented in [1]. The code relies on a slightly modified version of a linear-time dynamic programming solution to the 1-d (i.e. chain) GFL [2].

Python requirements
===================
The python wrapper requires `numpy`, `scipy`, and `networkx` to be able to run everything.

Installing
==========
The package can be installed via Pip:

`pip install pygfl`

or directly from source:

```
python setup.py build
python setup.py install
```

Note that the installation has not been tested on anything other than Mac OS X and Ubuntu. The underlying solver is implemented in pure C and should be cross-platform compatible.

Running
=======
There are two steps in running the GFL solver (once installed). First, you need to decompose your graph into a set of trails then you need to run the C-based GFL solver.

#### 1) Trail decomposition
Suppose you have a graph file like the one in `example/edges.csv`, where each edge is specified on a new line, with a comma separating vertices:

```
0,1
1,2
3,4
2,4
5,4
6,0
3,6
...
```

You can then decompose this graph by running the command line `maketrails` script:

```
trails file --infile example/edges.csv --savet example/trails.csv
```

This will create a file in `example/trails.csv` containing a set of distinct, non-overlapping trails which minimally decomposes the original graph. Next you need to run the solver.

#### 2) Solving via ADMM
Given a set of trails in `example/trails.csv` and a vector of observations in `example/data.csv`, you can run the `graphfl` script to execute the GFL solver:

```
graphfl example/data.csv example/edges.csv --trails example/trails.csv --o example/smoothed.csv
```

This will run a solution path to auto-tune the value of the penalty parameter (the λ in equation 1). The results will be saved in `example/smoothed.csv`. The results should look something like the image at the top of the readme.


Compiling the C solver lib separately
=====================================
To compile the C solver as a standalone library, you just need to run the make file from the `cpp` directory:

`make all`

Then you will need to make sure that you have the `cpp/lib` directory in your `LD_LIBRARY_PATH`:

`export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/my/path/to/gfl/cpp/lib/`

Note the above instructions are for *nix users.

Licensing
=========
This library / package is distributed under the GNU Lesser General Public License, version 3. Note that a subset of code from [2] was modified and is included in the C source.

References
==========
[1] W. Tansey and J. G. Scott. "[A Fast and Flexible Algorithm for the Graph-Fused Lasso](http://arxiv.org/abs/1505.06475)," arXiv:1505.06475, May 2015.

[2] [glmgen](https://github.com/statsmaths/glmgen)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pygfl-1.0.1-cp27-none-macosx_10_9_x86_64.whl (28.0 kB view details)

Uploaded CPython 2.7macOS 10.9+ x86-64

File details

Details for the file pygfl-1.0.1-cp27-none-macosx_10_9_x86_64.whl.

File metadata

File hashes

Hashes for pygfl-1.0.1-cp27-none-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 6f635ee77d77aa8d6cb0fe63ffe4976d02318bcb560d9facaaa774f8cc9f3aa5
MD5 254b6b4a77f7cbe56eabe5ede390e9a0
BLAKE2b-256 3c587bcf034ae5a55cb8faee18e7264edcd9ddaafb3549afb2de166bbec8e6b3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page