compress graphs with answer-set-programming
Project description
# PowerGrASP
Graph compression.
Note that this is a full reimplementation of PowerGrASP,
taking advantage of ASP and Python lifting and simplifications.
For the version published in 2017, see [this repository](https://github.com/aluriak/PowerGrASP-1).
## CLI
python -m powergrasp mygraph.gml -o compressed.bbl
### help !
python -m powergrasp --help
## API
```python
import powergrasp
with open('compressed.bbl', 'w') as fd:
for line in powergrasp.compress_by_cc('mygraph.gml'):
fd.write(line + '\n')
```
### help !
Sorry, no technical doc for the moment.
## Bubble files usage
The bubble file is the main output of PowerGrASP, and describes a power graph.
The bubble is a format handled by the [CyOog plugin](http://www.biotec.tu-dresden.de/research/schroeder/powergraphs/download-cytoscape-plugin.html) for [Cytoscape 2](http://cytoscape.org/), allowing one to load a bubble formatted file and to visualize the corresponding graph.
As a consequence, you have to [install Cytoscape 2](http://chianti.ucsd.edu/Cyto-2%5F8%5F3/), and put the `CyOog.jar` file into `path/to/cytoscape/install/dir/plugins/`.
Another way to go is to convert the bubble to other formats handling hierarchical graphs, like [gexf](https://gephi.org/gexf/format/),
[dot](https://www.graphviz.org/doc/info/lang.html) or the API of [cytoscape.js](http://js.cytoscape.org/).
This is a task implemented by [bubbletools](https://github.com/Aluriak/bubble-tools/).
## Configuration
PowerGrASP has some [configuration values](powergrasp/constants.py),
that can be overwritten by a `powergrasp.cfg` file placed in the working directory.
The configuration may be printed in stdout using:
python -m powergrasp --config
The config file may be either in ini or json format, as shown in [`test/powergrasp.oneshot.cfg`](test/powergrasp.oneshot.cfg) and [`test/powergrasp.manyoptions.cfg`](test/powergrasp.manyoptions.cfg).
Configuration allow user to define how much text will be outputed by powergrasp,
and to tune the core compression and related optimizations.
See complete list in the [options section](#Options).
## installation
pip install powergrasp
On random error, try adding `--no-cache-dir` at the end of the command,
and check that you are using python 3.
You must have [`clingo`](https://potassco.org/doc/start/) in your path. Depending on your OS, it might be done with a system installation,
or through [downloading](https://github.com/potassco/clingo/releases) and (compilation and) manual installation.
## Changelog
- 8.17
- support for [recipes options](#recipes), like `breakable` or `last`
- 8.11
- slight bounds improvements for non-star-biclique motif
- statistics/timers now provides time needed to search for each motif
- support for [recipes](#recipes), that small file that allows to define the compressions to do
- 8.10
- option [parallel cc compression](#parallel-cc-compression) to enable parallel compression of connected components
- option [bubble embeds cc](#bubble-embeds-cc), to put each cc in a dedicated powernode
- option [bubble with simple edges](#bubble-with-simple-edges), to keep or discard the simple edges from output
- statistics on time and compression metrics are computed by cc and for all graph. See new options.
- option [cc statistic file](#cc-statistic-file), to retrieve stats and metrics for connected components.
- option [global statistics](#global-statistics), to compute statistics/metrics over all connected components.
- option [bubble with statistics](#bubble-with-statistics), to include stats and metrics in output bubble file
- [motif type order](#motif-type-order) option, allowing user to tune the order in which motifs are searched.
- [parallel motif search](#parallel-motif-search) option, making motifs search in different threads, and the overhaul compression much slower.
- CLI: --config flag to print the full configuration in stdout
- handling spaces instead of _ in fields names for INI format
- do not filter out nodes alone in their connected component (cf [keep-single-nodes option](#keep-single-nodes))
- perf gain: the first motif to reach the best score is compressed, instead of the last
- improve handling of statistic files, that now contains cc idx and motif bounds
- expose `__version__` attribute in the package
- add a list of available options in the README
- graph filtering: optimization leading to general performance gain when enabled
- improvements on {low,upp}erbounds computations
- 8.9
- many bugfixes
- CLINGO_OPTIONS can be a dict, mapping motif name to arbitrary parameters
- 8.8
- bugfix: lowerbound can't go under 2 (3 for cliques)
- clique search: lowerbound is initially computed as the maximal degree of node of clustering coefficient equals to 1
- use new versions of clyngor and bubbletools
- 8.7
- raise error when an invalid key is found in config file
- StarSearch is dedicated to find stars, simplifying BicliqueSearch's job (enabled by default, user may use Biclique instead of both Star and NonStarBiclique)
- specify a prefix for all (power) nodes with OUTPUT_NODE_PREFIX config field
- arbitrary options for clingo using CLINGO_OPTIONS config field
- bugfix on string options given in ini file, when unecessary quotes are kept
- 8.6
- bugfixes for INI format
- 8.5
- config may now be given in INI format
- improve logging
- config allow optimization target between memory or CPU (currently define a constraint implementation in ASP)
- bugfix about edges yielded by ASP and multithreading parameter
- no statistics file to write by default
- timer for output writing
- config allow user to define the number of CPU for clingo to use
- simplify ASP code by removing useless arg in block/4 atoms
- 8.4
- replace ASP code constraint to achieve much more efficient grounding
- configuration allow for improved lowerbound computation of bicliques
- generated bubble allow client to give heading comments
- handle keyboard interruption during search with grace
- implement timers and statistics recording
- enable multishot motif search by default
- various bugfixes
## Recipes
A recipe is a description of the motifs to compress.
For instance, the [data/recipe-test.txt](data/recipe-test.txt):
biclique c g h i
biclique a b d e
biclique a b f
biclique c d e
Using the `--recipe` option in CLI, user can provide a recipe for its own graph,
allowing to specify prior motifs to search.
The order of second and third columns does not have any influence.
The first column usually contains the motif types separated by comma
(altough it is not necessary), and allows to provide some options (separated by comma):
- `primer`: allows the program to extend the given motif if possible
- `breakable`: allows the program to compress the line with multiple motifs
- `optional`: if not found, ignore instead of stopping
- `last`: stop compression here (useful to avoid the compression process to take over)
The recipe file will be read line by line.
Once all lines have been applied, the normal compression compression takes over
(unless `last` option is used).
For a living example, see [data/recipe-option-test.txt](data/recipe-option-test.txt).
## Options
Description of all powergrasp options.
All values example are given for INI format.
### test integrity
Run some integrity tests during runtime to ensure that compression is working well.
May slow the compression a lot, and is mainly a debugging tool.
Default value:
test_integrity = true
### show story
Print in stdout the main steps of compression.
Default value:
show_story = true
### show motif handling
Print in stdout the motif transformation.
Default value:
show_motif_handling = false
### timers
Maintain and print in stdout some timers.
Default value:
timers = true
### statistics file
A file in which some statistics will be written in CSV format, giving among others size of compressed motifs, and their compression times (if *timers* option is enabled).
Default value:
statistics_file = None
Example value:
statistics_file = statistics.csv
### cc statistic file
Filename in which the statistics about each connected component will be written.
Data will contain one row for each connected component, with the following data:
- connected component index
- the convertion rate
- the edge reduction
- the compression rate
- time needed to compress the connected component (if TIMERS is enabled)
Default value stands for *no file*:
cc_statistic_file = None
Example value:
cc_statistic_file = statistics-cc.csv
### global statistics
When enabled, statistics and metrics computation on the overall graph will be performed at the end of compression.
Default value:
global_statistics = True
### bubble with statistics
When enabled, output bubble will contain statistics about each connected component
and the overall graph (if global_statistics is enabled).
Default value:
bubble_with_statistics = True
### bubble for each step
Generate and save a bubble representation of the graph at each step.
Mainly used for debugging.
Default value:
bubble_for_each_step = false
### output node prefix
Prefix to add to all (power)nodes names in output.
Default value:
output_node_prefix = ''
### show debug
Show full trace of the compression. Useful for debugging.
Default value:
show_debug = False
### covered edges from ASP
Recover covered edges from ASP. If falsy, will ask motif searcher to compute the edges, which may be quicker.
Default value:
covered_edges_from_asp = False
### bubble with nodes
Nodes and sets are optional in the output bubble.
Default value:
bubble_with_nodes = True
bubble_with_sets = True
### bubble poweredge factor
Edges in bubble are associated to a factor.
Default value:
bubble_poweredge_factor = '1.0'
bubble_edge_factor = '1.0'
### bubble embeds cc
If enabled, put each connected component in a dedicated powernode.
Default value:
bubble_embeds_cc = no
### bubble simplify quotes
When possible, delete the quotes around identifiers in the bubble. May lead to node name collision.
Default value:
bubble_simplify_quotes = True
### bubble with simple edges
If disabled, will discard simple (i.e. non-power) edges of output.
Default value:
bubble_with_simple_edges = True
### config file
Load options found in given config file, if it exists.
Default value:
config_file = 'powergrasp.cfg'
### multishot motif search
Search for multiple motif in a single search.
Accelerate the solving for graph with lots of equivalent motifs. It is generally a good option to enable.
Default value:
multishot_motif_search = True
### biclique lowerbound maxnei
Optimization on biclique lowerbound computation. Can be costly. Deactivate with 2. With value at n, up to n neighbors are considered.
Default value:
biclique_lowerbound_maxnei = 2
### clingo options
Arbitrary parameters to give to clingo (note that some, like multithreading or optmode, may already be set by other options).
Default value:
clingo_options = {}
Give a particular configuration for bicliques search:
clingo_options = {'no-star-biclique': '--configuration=handy'}
### clingo multithreading
Number of CPU available to clingo, or 0 for autodetect number of CPU.
Default value:
clingo_multithreading = 1
Use as many thread there are available CPU:
clingo_multithreading = 0
Specify a competing search between 4 threads:
clingo_multithreading = '4,compete'
### parallel cc compression
To use to compress connected components in different processes.
Default value (optimized in memory):
parallel_cc_compression = 1
Compress with 4 processes :
parallel_cc_compression = 4
Compress with one process for each connected component :
parallel_cc_compression = 0
### use star motif
Two different motifs for stars and bicliques, so the search space for bicliques is smaller. Yields good performance improvements on big graphs.
Default value:
use_star_motif = True
### optimize for memory
When possible, prefer memory over CPU.
Default value:
optimize_for_memory = False
### graph filtering
Ignore edges dynamically determined as impossible to compress.
Default value:
graph_filtering = True
### keep single nodes
If a node is found to be alone in its connected component, it will be kept nonetheless.
Default value:
keep_single_nodes = True
### parallel motif search
Use multithreading to search for all motifs in the same time, instead of sequentially.
Interestingly, this yields very poor results, and it's even worse with multiprocessing.
parallel_motif_search = False
### motif type order
Define in which order the motifs are searched, e.g. cliques, then bicliques then stars.
motif_type_order = star,clique,non-star-biclique,biclique
Instead of expliciting the exact motif names and order, it is also possible to specify a bound-based order:
motif_type_order = greatest-upperbound-first
Or any variation with `worst`, `lowerbound` and `last`.
Graph compression.
Note that this is a full reimplementation of PowerGrASP,
taking advantage of ASP and Python lifting and simplifications.
For the version published in 2017, see [this repository](https://github.com/aluriak/PowerGrASP-1).
## CLI
python -m powergrasp mygraph.gml -o compressed.bbl
### help !
python -m powergrasp --help
## API
```python
import powergrasp
with open('compressed.bbl', 'w') as fd:
for line in powergrasp.compress_by_cc('mygraph.gml'):
fd.write(line + '\n')
```
### help !
Sorry, no technical doc for the moment.
## Bubble files usage
The bubble file is the main output of PowerGrASP, and describes a power graph.
The bubble is a format handled by the [CyOog plugin](http://www.biotec.tu-dresden.de/research/schroeder/powergraphs/download-cytoscape-plugin.html) for [Cytoscape 2](http://cytoscape.org/), allowing one to load a bubble formatted file and to visualize the corresponding graph.
As a consequence, you have to [install Cytoscape 2](http://chianti.ucsd.edu/Cyto-2%5F8%5F3/), and put the `CyOog.jar` file into `path/to/cytoscape/install/dir/plugins/`.
Another way to go is to convert the bubble to other formats handling hierarchical graphs, like [gexf](https://gephi.org/gexf/format/),
[dot](https://www.graphviz.org/doc/info/lang.html) or the API of [cytoscape.js](http://js.cytoscape.org/).
This is a task implemented by [bubbletools](https://github.com/Aluriak/bubble-tools/).
## Configuration
PowerGrASP has some [configuration values](powergrasp/constants.py),
that can be overwritten by a `powergrasp.cfg` file placed in the working directory.
The configuration may be printed in stdout using:
python -m powergrasp --config
The config file may be either in ini or json format, as shown in [`test/powergrasp.oneshot.cfg`](test/powergrasp.oneshot.cfg) and [`test/powergrasp.manyoptions.cfg`](test/powergrasp.manyoptions.cfg).
Configuration allow user to define how much text will be outputed by powergrasp,
and to tune the core compression and related optimizations.
See complete list in the [options section](#Options).
## installation
pip install powergrasp
On random error, try adding `--no-cache-dir` at the end of the command,
and check that you are using python 3.
You must have [`clingo`](https://potassco.org/doc/start/) in your path. Depending on your OS, it might be done with a system installation,
or through [downloading](https://github.com/potassco/clingo/releases) and (compilation and) manual installation.
## Changelog
- 8.17
- support for [recipes options](#recipes), like `breakable` or `last`
- 8.11
- slight bounds improvements for non-star-biclique motif
- statistics/timers now provides time needed to search for each motif
- support for [recipes](#recipes), that small file that allows to define the compressions to do
- 8.10
- option [parallel cc compression](#parallel-cc-compression) to enable parallel compression of connected components
- option [bubble embeds cc](#bubble-embeds-cc), to put each cc in a dedicated powernode
- option [bubble with simple edges](#bubble-with-simple-edges), to keep or discard the simple edges from output
- statistics on time and compression metrics are computed by cc and for all graph. See new options.
- option [cc statistic file](#cc-statistic-file), to retrieve stats and metrics for connected components.
- option [global statistics](#global-statistics), to compute statistics/metrics over all connected components.
- option [bubble with statistics](#bubble-with-statistics), to include stats and metrics in output bubble file
- [motif type order](#motif-type-order) option, allowing user to tune the order in which motifs are searched.
- [parallel motif search](#parallel-motif-search) option, making motifs search in different threads, and the overhaul compression much slower.
- CLI: --config flag to print the full configuration in stdout
- handling spaces instead of _ in fields names for INI format
- do not filter out nodes alone in their connected component (cf [keep-single-nodes option](#keep-single-nodes))
- perf gain: the first motif to reach the best score is compressed, instead of the last
- improve handling of statistic files, that now contains cc idx and motif bounds
- expose `__version__` attribute in the package
- add a list of available options in the README
- graph filtering: optimization leading to general performance gain when enabled
- improvements on {low,upp}erbounds computations
- 8.9
- many bugfixes
- CLINGO_OPTIONS can be a dict, mapping motif name to arbitrary parameters
- 8.8
- bugfix: lowerbound can't go under 2 (3 for cliques)
- clique search: lowerbound is initially computed as the maximal degree of node of clustering coefficient equals to 1
- use new versions of clyngor and bubbletools
- 8.7
- raise error when an invalid key is found in config file
- StarSearch is dedicated to find stars, simplifying BicliqueSearch's job (enabled by default, user may use Biclique instead of both Star and NonStarBiclique)
- specify a prefix for all (power) nodes with OUTPUT_NODE_PREFIX config field
- arbitrary options for clingo using CLINGO_OPTIONS config field
- bugfix on string options given in ini file, when unecessary quotes are kept
- 8.6
- bugfixes for INI format
- 8.5
- config may now be given in INI format
- improve logging
- config allow optimization target between memory or CPU (currently define a constraint implementation in ASP)
- bugfix about edges yielded by ASP and multithreading parameter
- no statistics file to write by default
- timer for output writing
- config allow user to define the number of CPU for clingo to use
- simplify ASP code by removing useless arg in block/4 atoms
- 8.4
- replace ASP code constraint to achieve much more efficient grounding
- configuration allow for improved lowerbound computation of bicliques
- generated bubble allow client to give heading comments
- handle keyboard interruption during search with grace
- implement timers and statistics recording
- enable multishot motif search by default
- various bugfixes
## Recipes
A recipe is a description of the motifs to compress.
For instance, the [data/recipe-test.txt](data/recipe-test.txt):
biclique c g h i
biclique a b d e
biclique a b f
biclique c d e
Using the `--recipe` option in CLI, user can provide a recipe for its own graph,
allowing to specify prior motifs to search.
The order of second and third columns does not have any influence.
The first column usually contains the motif types separated by comma
(altough it is not necessary), and allows to provide some options (separated by comma):
- `primer`: allows the program to extend the given motif if possible
- `breakable`: allows the program to compress the line with multiple motifs
- `optional`: if not found, ignore instead of stopping
- `last`: stop compression here (useful to avoid the compression process to take over)
The recipe file will be read line by line.
Once all lines have been applied, the normal compression compression takes over
(unless `last` option is used).
For a living example, see [data/recipe-option-test.txt](data/recipe-option-test.txt).
## Options
Description of all powergrasp options.
All values example are given for INI format.
### test integrity
Run some integrity tests during runtime to ensure that compression is working well.
May slow the compression a lot, and is mainly a debugging tool.
Default value:
test_integrity = true
### show story
Print in stdout the main steps of compression.
Default value:
show_story = true
### show motif handling
Print in stdout the motif transformation.
Default value:
show_motif_handling = false
### timers
Maintain and print in stdout some timers.
Default value:
timers = true
### statistics file
A file in which some statistics will be written in CSV format, giving among others size of compressed motifs, and their compression times (if *timers* option is enabled).
Default value:
statistics_file = None
Example value:
statistics_file = statistics.csv
### cc statistic file
Filename in which the statistics about each connected component will be written.
Data will contain one row for each connected component, with the following data:
- connected component index
- the convertion rate
- the edge reduction
- the compression rate
- time needed to compress the connected component (if TIMERS is enabled)
Default value stands for *no file*:
cc_statistic_file = None
Example value:
cc_statistic_file = statistics-cc.csv
### global statistics
When enabled, statistics and metrics computation on the overall graph will be performed at the end of compression.
Default value:
global_statistics = True
### bubble with statistics
When enabled, output bubble will contain statistics about each connected component
and the overall graph (if global_statistics is enabled).
Default value:
bubble_with_statistics = True
### bubble for each step
Generate and save a bubble representation of the graph at each step.
Mainly used for debugging.
Default value:
bubble_for_each_step = false
### output node prefix
Prefix to add to all (power)nodes names in output.
Default value:
output_node_prefix = ''
### show debug
Show full trace of the compression. Useful for debugging.
Default value:
show_debug = False
### covered edges from ASP
Recover covered edges from ASP. If falsy, will ask motif searcher to compute the edges, which may be quicker.
Default value:
covered_edges_from_asp = False
### bubble with nodes
Nodes and sets are optional in the output bubble.
Default value:
bubble_with_nodes = True
bubble_with_sets = True
### bubble poweredge factor
Edges in bubble are associated to a factor.
Default value:
bubble_poweredge_factor = '1.0'
bubble_edge_factor = '1.0'
### bubble embeds cc
If enabled, put each connected component in a dedicated powernode.
Default value:
bubble_embeds_cc = no
### bubble simplify quotes
When possible, delete the quotes around identifiers in the bubble. May lead to node name collision.
Default value:
bubble_simplify_quotes = True
### bubble with simple edges
If disabled, will discard simple (i.e. non-power) edges of output.
Default value:
bubble_with_simple_edges = True
### config file
Load options found in given config file, if it exists.
Default value:
config_file = 'powergrasp.cfg'
### multishot motif search
Search for multiple motif in a single search.
Accelerate the solving for graph with lots of equivalent motifs. It is generally a good option to enable.
Default value:
multishot_motif_search = True
### biclique lowerbound maxnei
Optimization on biclique lowerbound computation. Can be costly. Deactivate with 2. With value at n, up to n neighbors are considered.
Default value:
biclique_lowerbound_maxnei = 2
### clingo options
Arbitrary parameters to give to clingo (note that some, like multithreading or optmode, may already be set by other options).
Default value:
clingo_options = {}
Give a particular configuration for bicliques search:
clingo_options = {'no-star-biclique': '--configuration=handy'}
### clingo multithreading
Number of CPU available to clingo, or 0 for autodetect number of CPU.
Default value:
clingo_multithreading = 1
Use as many thread there are available CPU:
clingo_multithreading = 0
Specify a competing search between 4 threads:
clingo_multithreading = '4,compete'
### parallel cc compression
To use to compress connected components in different processes.
Default value (optimized in memory):
parallel_cc_compression = 1
Compress with 4 processes :
parallel_cc_compression = 4
Compress with one process for each connected component :
parallel_cc_compression = 0
### use star motif
Two different motifs for stars and bicliques, so the search space for bicliques is smaller. Yields good performance improvements on big graphs.
Default value:
use_star_motif = True
### optimize for memory
When possible, prefer memory over CPU.
Default value:
optimize_for_memory = False
### graph filtering
Ignore edges dynamically determined as impossible to compress.
Default value:
graph_filtering = True
### keep single nodes
If a node is found to be alone in its connected component, it will be kept nonetheless.
Default value:
keep_single_nodes = True
### parallel motif search
Use multithreading to search for all motifs in the same time, instead of sequentially.
Interestingly, this yields very poor results, and it's even worse with multiprocessing.
parallel_motif_search = False
### motif type order
Define in which order the motifs are searched, e.g. cliques, then bicliques then stars.
motif_type_order = star,clique,non-star-biclique,biclique
Instead of expliciting the exact motif names and order, it is also possible to specify a bound-based order:
motif_type_order = greatest-upperbound-first
Or any variation with `worst`, `lowerbound` and `last`.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
powergrasp-0.8.18.tar.gz
(34.3 kB
view details)
Built Distribution
File details
Details for the file powergrasp-0.8.18.tar.gz
.
File metadata
- Download URL: powergrasp-0.8.18.tar.gz
- Upload date:
- Size: 34.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.1 setuptools/39.2.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 5b8b359ec03407633a8d35939c71f17d36111e3c767c13258f52831e3a77bcf5 |
|
MD5 | 6534c64ba470abfab724b57d6535d950 |
|
BLAKE2b-256 | ea86ddcbbdfe099a5fb03e65926e08e42a6d0a38701777d2635b4cd8d666eb5a |
File details
Details for the file powergrasp-0.8.18-py3-none-any.whl
.
File metadata
- Download URL: powergrasp-0.8.18-py3-none-any.whl
- Upload date:
- Size: 41.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.1 setuptools/39.2.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1ccb6c648162a9e52221a45cb7bec4e980b645caa81b4edc8e84be2b5768718b |
|
MD5 | 506b05fe64f694e76b6bc50682bd9a30 |
|
BLAKE2b-256 | 2d6b742421567b90d4223f35a8804d7b69dd92ce7a53d641e6549119e94f33fb |