compress graphs with answer-set-programming
Project description
# PowerGrASP
Graph compression.
Note that this is a full reimplementation of PowerGrASP,
taking advantage of ASP and Python lifting and simplifications.
For the version published in 2017, see [this repository](https://github.com/aluriak/PowerGrASP-1).
## CLI
python -m powergrasp mygraph.gml -o compressed.bbl
### help !
python -m powergrasp --help
## API
```python
import powergrasp
with open('compressed.bbl', 'w') as fd:
for line in powergrasp.compress_by_cc('mygraph.gml'):
fd.write(line + '\n')
```
### help !
Sorry, no technical doc for the moment.
## Configuration
PowerGrASP has some [configuration values](powergrasp/constants.py),
that can be overwritten by a `powergrasp.cfg` file placed in the working directory.
The configuration may be printed in stdout using:
python -m powergrasp --config
The config file may be either in ini or json format, as shown in [`test/powergrasp.oneshot.cfg`](test/powergrasp.oneshot.cfg) and [`test/powergrasp.manyoptions.cfg`](test/powergrasp.manyoptions.cfg).
Configuration allow user to define how much text will be outputed by powergrasp,
and to tune the core compression and related optimizations.
See complete list in the [options section](#Options).
## installation
pip install powergrasp
On random error, try adding `--no-cache-dir` at the end of the command.
## TODO
- [x] unit tests
- [x] CLI
- [x] run on big graph
- [x] search multiple motif in the same run (speed up observed on bio graph)
- [x] timers for solving, full run, output writing
- [ ] timers for extraction
- [ ] technical documentation
- [ ] search and specific compression of trees
- [ ] search and specific compression of triangle-free graphs
- [x] search multiple motif in the same run (speed up observed on bio graph)
## Changelog
- 8.10
- option [parallel cc compression](#parallel-cc-compression) to enable parallel compression of connected components
- option [bubble embeds cc](#bubble-embeds-cc), to put each cc in a dedicated powernode
- option [bubble with simple edges](#bubble-with-simple-edges), to keep or discard the simple edges from output
- statistics on time and compression metrics are computed by cc and for all graph. See new options.
- option [cc statistic file](#cc-statistic-file), to retrieve stats and metrics for connected components.
- option [global statistics](#global-statistics), to compute statistics/metrics over all connected components.
- option [bubble with statistics](#bubble-with-statistics), to include stats and metrics in output bubble file
- [motif type order](#motif-type-order) option, allowing user to tune the order in which motifs are searched.
- [parallel motif search](#parallel-motif-search) option, making motifs search in different threads, and the overhaul compression much slower.
- CLI: --config flag to print the full configuration in stdout
- handling spaces instead of _ in fields names for INI format
- do not filter out nodes alone in their connected component (cf [keep-single-nodes option](#keep-single-nodes))
- perf gain: the first motif to reach the best score is compressed, instead of the last
- improve handling of statistic files, that now contains cc idx and motif bounds
- expose `__version__` attribute in the package
- add a list of available options in the README
- graph filtering: optimization leading to general performance gain when enabled
- improvements on {low,upp}erbounds computations
- 8.9
- many bugfixes
- CLINGO_OPTIONS can be a dict, mapping motif name to arbitrary parameters
- 8.8
- bugfix: lowerbound can't go under 2 (3 for cliques)
- clique search: lowerbound is initially computed as the maximal degree of node of clustering coefficient equals to 1
- use new versions of clyngor and bubbletools
- 8.7
- raise error when an invalid key is found in config file
- StarSearch is dedicated to find stars, simplifying BicliqueSearch's job (enabled by default, user may use Biclique instead of both Star and NonStarBiclique)
- specify a prefix for all (power) nodes with OUTPUT_NODE_PREFIX config field
- arbitrary options for clingo using CLINGO_OPTIONS config field
- bugfix on string options given in ini file, when unecessary quotes are kept
- 8.6
- bugfixes for INI format
- 8.5
- config may now be given in INI format
- improve logging
- config allow optimization target between memory or CPU (currently define a constraint implementation in ASP)
- bugfix about edges yielded by ASP and multithreading parameter
- no statistics file to write by default
- timer for output writing
- config allow user to define the number of CPU for clingo to use
- simplify ASP code by removing useless arg in block/4 atoms
- 8.4
- replace ASP code constraint to achieve much more efficient grounding
- configuration allow for improved lowerbound computation of bicliques
- generated bubble allow client to give heading comments
- handle keyboard interruption during search with grace
- implement timers and statistics recording
- enable multishot motif search by default
- various bugfixes
## Options
Description of all powergrasp options.
All values example are given for INI format.
### test integrity
Run some integrity tests during runtime to ensure that compression is working well.
May slow the compression a lot, and is mainly a debugging tool.
Default value:
test_integrity = true
### show story
Print in stdout the main steps of compression.
Default value:
show_story = true
### show motif handling
Print in stdout the motif transformation.
Default value:
show_motif_handling = false
### timers
Maintain and print in stdout some timers.
Default value:
timers = true
### statistics file
A file in which some statistics will be written in CSV format, giving among others size of compressed motifs, and their compression times (if *timers* option is enabled).
Default value:
statistics_file = None
Example value:
statistics_file = statistics.csv
### cc statistic file
Filename in which the statistics about each connected component will be written.
Data will contain one row for each connected component, with the following data:
- connected component index
- the convertion rate
- the edge reduction
- the compression rate
- time needed to compress the connected component (if TIMERS is enabled)
Default value stands for *no file*:
cc_statistic_file = None
Example value:
cc_statistic_file = statistics-cc.csv
### global statistics
When enabled, statistics and metrics computation on the overall graph will be performed at the end of compression.
Default value:
global_statistics = True
### bubble with statistics
When enabled, output bubble will contain statistics about each connected component
and the overall graph (if global_statistics is enabled).
Default value:
bubble_with_statistics = True
### bubble for each step
Generate and save a bubble representation of the graph at each step.
Mainly used for debugging.
Default value:
bubble_for_each_step = false
### output node prefix
Prefix to add to all (power)nodes names in output.
Default value:
output_node_prefix = ''
### show debug
Show full trace of the compression. Useful for debugging.
Default value:
show_debug = False
### covered edges from ASP
Recover covered edges from ASP. If falsy, will ask motif searcher to compute the edges, which may be quicker.
Default value:
covered_edges_from_asp = False
### bubble with nodes
Nodes and sets are optional in the output bubble.
Default value:
bubble_with_nodes = True
bubble_with_sets = True
### bubble poweredge factor
Edges in bubble are associated to a factor.
Default value:
bubble_poweredge_factor = '1.0'
bubble_edge_factor = '1.0'
### bubble embeds cc
If enabled, put each connected component in a dedicated powernode.
Default value:
bubble_embeds_cc = no
### bubble simplify quotes
When possible, delete the quotes around identifiers in the bubble. May lead to node name collision.
Default value:
bubble_simplify_quotes = True
### bubble with simple edges
If disabled, will discard simple (i.e. non-power) edges of output.
Default value:
bubble_with_simple_edges = True
### config file
Load options found in given config file, if it exists.
Default value:
config_file = 'powergrasp.cfg'
### multishot motif search
Search for multiple motif in a single search.
Accelerate the solving for graph with lots of equivalent motifs. It is generally a good option to enable.
Default value:
multishot_motif_search = True
### biclique lowerbound maxnei
Optimization on biclique lowerbound computation. Can be costly. Deactivate with 2. With value at n, up to n neighbors are considered.
Default value:
biclique_lowerbound_maxnei = 2
### clingo options
Arbitrary parameters to give to clingo (note that some, like multithreading or optmode, may already be set by other options).
Default value:
clingo_options = {}
Give a particular configuration for bicliques search:
clingo_options = {'no-star-biclique': '--configuration=handy'}
### clingo multithreading
Number of CPU available to clingo, or 0 for autodetect number of CPU.
Default value:
clingo_multithreading = 1
Use as many thread there are available CPU:
clingo_multithreading = 0
Specify a competing search between 4 threads:
clingo_multithreading = '4,compete'
### parallel cc compression
To use to compress connected components in different processes.
Default value (optimized in memory):
parallel_cc_compression = 1
Compress with 4 processes :
parallel_cc_compression = 4
Compress with one process for each connected component :
parallel_cc_compression = 0
### use star motif
Two different motifs for stars and bicliques, so the search space for bicliques is smaller. Yields good performance improvements on big graphs.
Default value:
use_star_motif = True
### optimize for memory
When possible, prefer memory over CPU.
Default value:
optimize_for_memory = False
### graph filtering
Ignore edges dynamically determined as impossible to compress.
Default value:
graph_filtering = True
### keep single nodes
If a node is found to be alone in its connected component, it will be kept nonetheless.
Default value:
keep_single_nodes = True
### parallel motif search
Use multithreading to search for all motifs in the same time, instead of sequentially.
Interestingly, this yields very poor results, and it's even worse with multiprocessing.
parallel_motif_search = False
### motif type order
Define in which order the motifs are searched, e.g. cliques, then bicliques then stars.
motif_type_order = star,clique,non-star-biclique,biclique
Instead of expliciting the exact motif names and order, it is also possible to specify a bound-based order:
motif_type_order = greatest-upperbound-first
Or any variation with `worst`, `lowerbound` and `last`.
Graph compression.
Note that this is a full reimplementation of PowerGrASP,
taking advantage of ASP and Python lifting and simplifications.
For the version published in 2017, see [this repository](https://github.com/aluriak/PowerGrASP-1).
## CLI
python -m powergrasp mygraph.gml -o compressed.bbl
### help !
python -m powergrasp --help
## API
```python
import powergrasp
with open('compressed.bbl', 'w') as fd:
for line in powergrasp.compress_by_cc('mygraph.gml'):
fd.write(line + '\n')
```
### help !
Sorry, no technical doc for the moment.
## Configuration
PowerGrASP has some [configuration values](powergrasp/constants.py),
that can be overwritten by a `powergrasp.cfg` file placed in the working directory.
The configuration may be printed in stdout using:
python -m powergrasp --config
The config file may be either in ini or json format, as shown in [`test/powergrasp.oneshot.cfg`](test/powergrasp.oneshot.cfg) and [`test/powergrasp.manyoptions.cfg`](test/powergrasp.manyoptions.cfg).
Configuration allow user to define how much text will be outputed by powergrasp,
and to tune the core compression and related optimizations.
See complete list in the [options section](#Options).
## installation
pip install powergrasp
On random error, try adding `--no-cache-dir` at the end of the command.
## TODO
- [x] unit tests
- [x] CLI
- [x] run on big graph
- [x] search multiple motif in the same run (speed up observed on bio graph)
- [x] timers for solving, full run, output writing
- [ ] timers for extraction
- [ ] technical documentation
- [ ] search and specific compression of trees
- [ ] search and specific compression of triangle-free graphs
- [x] search multiple motif in the same run (speed up observed on bio graph)
## Changelog
- 8.10
- option [parallel cc compression](#parallel-cc-compression) to enable parallel compression of connected components
- option [bubble embeds cc](#bubble-embeds-cc), to put each cc in a dedicated powernode
- option [bubble with simple edges](#bubble-with-simple-edges), to keep or discard the simple edges from output
- statistics on time and compression metrics are computed by cc and for all graph. See new options.
- option [cc statistic file](#cc-statistic-file), to retrieve stats and metrics for connected components.
- option [global statistics](#global-statistics), to compute statistics/metrics over all connected components.
- option [bubble with statistics](#bubble-with-statistics), to include stats and metrics in output bubble file
- [motif type order](#motif-type-order) option, allowing user to tune the order in which motifs are searched.
- [parallel motif search](#parallel-motif-search) option, making motifs search in different threads, and the overhaul compression much slower.
- CLI: --config flag to print the full configuration in stdout
- handling spaces instead of _ in fields names for INI format
- do not filter out nodes alone in their connected component (cf [keep-single-nodes option](#keep-single-nodes))
- perf gain: the first motif to reach the best score is compressed, instead of the last
- improve handling of statistic files, that now contains cc idx and motif bounds
- expose `__version__` attribute in the package
- add a list of available options in the README
- graph filtering: optimization leading to general performance gain when enabled
- improvements on {low,upp}erbounds computations
- 8.9
- many bugfixes
- CLINGO_OPTIONS can be a dict, mapping motif name to arbitrary parameters
- 8.8
- bugfix: lowerbound can't go under 2 (3 for cliques)
- clique search: lowerbound is initially computed as the maximal degree of node of clustering coefficient equals to 1
- use new versions of clyngor and bubbletools
- 8.7
- raise error when an invalid key is found in config file
- StarSearch is dedicated to find stars, simplifying BicliqueSearch's job (enabled by default, user may use Biclique instead of both Star and NonStarBiclique)
- specify a prefix for all (power) nodes with OUTPUT_NODE_PREFIX config field
- arbitrary options for clingo using CLINGO_OPTIONS config field
- bugfix on string options given in ini file, when unecessary quotes are kept
- 8.6
- bugfixes for INI format
- 8.5
- config may now be given in INI format
- improve logging
- config allow optimization target between memory or CPU (currently define a constraint implementation in ASP)
- bugfix about edges yielded by ASP and multithreading parameter
- no statistics file to write by default
- timer for output writing
- config allow user to define the number of CPU for clingo to use
- simplify ASP code by removing useless arg in block/4 atoms
- 8.4
- replace ASP code constraint to achieve much more efficient grounding
- configuration allow for improved lowerbound computation of bicliques
- generated bubble allow client to give heading comments
- handle keyboard interruption during search with grace
- implement timers and statistics recording
- enable multishot motif search by default
- various bugfixes
## Options
Description of all powergrasp options.
All values example are given for INI format.
### test integrity
Run some integrity tests during runtime to ensure that compression is working well.
May slow the compression a lot, and is mainly a debugging tool.
Default value:
test_integrity = true
### show story
Print in stdout the main steps of compression.
Default value:
show_story = true
### show motif handling
Print in stdout the motif transformation.
Default value:
show_motif_handling = false
### timers
Maintain and print in stdout some timers.
Default value:
timers = true
### statistics file
A file in which some statistics will be written in CSV format, giving among others size of compressed motifs, and their compression times (if *timers* option is enabled).
Default value:
statistics_file = None
Example value:
statistics_file = statistics.csv
### cc statistic file
Filename in which the statistics about each connected component will be written.
Data will contain one row for each connected component, with the following data:
- connected component index
- the convertion rate
- the edge reduction
- the compression rate
- time needed to compress the connected component (if TIMERS is enabled)
Default value stands for *no file*:
cc_statistic_file = None
Example value:
cc_statistic_file = statistics-cc.csv
### global statistics
When enabled, statistics and metrics computation on the overall graph will be performed at the end of compression.
Default value:
global_statistics = True
### bubble with statistics
When enabled, output bubble will contain statistics about each connected component
and the overall graph (if global_statistics is enabled).
Default value:
bubble_with_statistics = True
### bubble for each step
Generate and save a bubble representation of the graph at each step.
Mainly used for debugging.
Default value:
bubble_for_each_step = false
### output node prefix
Prefix to add to all (power)nodes names in output.
Default value:
output_node_prefix = ''
### show debug
Show full trace of the compression. Useful for debugging.
Default value:
show_debug = False
### covered edges from ASP
Recover covered edges from ASP. If falsy, will ask motif searcher to compute the edges, which may be quicker.
Default value:
covered_edges_from_asp = False
### bubble with nodes
Nodes and sets are optional in the output bubble.
Default value:
bubble_with_nodes = True
bubble_with_sets = True
### bubble poweredge factor
Edges in bubble are associated to a factor.
Default value:
bubble_poweredge_factor = '1.0'
bubble_edge_factor = '1.0'
### bubble embeds cc
If enabled, put each connected component in a dedicated powernode.
Default value:
bubble_embeds_cc = no
### bubble simplify quotes
When possible, delete the quotes around identifiers in the bubble. May lead to node name collision.
Default value:
bubble_simplify_quotes = True
### bubble with simple edges
If disabled, will discard simple (i.e. non-power) edges of output.
Default value:
bubble_with_simple_edges = True
### config file
Load options found in given config file, if it exists.
Default value:
config_file = 'powergrasp.cfg'
### multishot motif search
Search for multiple motif in a single search.
Accelerate the solving for graph with lots of equivalent motifs. It is generally a good option to enable.
Default value:
multishot_motif_search = True
### biclique lowerbound maxnei
Optimization on biclique lowerbound computation. Can be costly. Deactivate with 2. With value at n, up to n neighbors are considered.
Default value:
biclique_lowerbound_maxnei = 2
### clingo options
Arbitrary parameters to give to clingo (note that some, like multithreading or optmode, may already be set by other options).
Default value:
clingo_options = {}
Give a particular configuration for bicliques search:
clingo_options = {'no-star-biclique': '--configuration=handy'}
### clingo multithreading
Number of CPU available to clingo, or 0 for autodetect number of CPU.
Default value:
clingo_multithreading = 1
Use as many thread there are available CPU:
clingo_multithreading = 0
Specify a competing search between 4 threads:
clingo_multithreading = '4,compete'
### parallel cc compression
To use to compress connected components in different processes.
Default value (optimized in memory):
parallel_cc_compression = 1
Compress with 4 processes :
parallel_cc_compression = 4
Compress with one process for each connected component :
parallel_cc_compression = 0
### use star motif
Two different motifs for stars and bicliques, so the search space for bicliques is smaller. Yields good performance improvements on big graphs.
Default value:
use_star_motif = True
### optimize for memory
When possible, prefer memory over CPU.
Default value:
optimize_for_memory = False
### graph filtering
Ignore edges dynamically determined as impossible to compress.
Default value:
graph_filtering = True
### keep single nodes
If a node is found to be alone in its connected component, it will be kept nonetheless.
Default value:
keep_single_nodes = True
### parallel motif search
Use multithreading to search for all motifs in the same time, instead of sequentially.
Interestingly, this yields very poor results, and it's even worse with multiprocessing.
parallel_motif_search = False
### motif type order
Define in which order the motifs are searched, e.g. cliques, then bicliques then stars.
motif_type_order = star,clique,non-star-biclique,biclique
Instead of expliciting the exact motif names and order, it is also possible to specify a bound-based order:
motif_type_order = greatest-upperbound-first
Or any variation with `worst`, `lowerbound` and `last`.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
powergrasp-0.8.10.tar.gz
(28.1 kB
view hashes)
Built Distribution
Close
Hashes for powergrasp-0.8.10-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 21f0868b6e8396e1a3a1e019a8a51e29000c7c97c8937486e1dc61646cafb3eb |
|
MD5 | e5d2cf2adef9847c6b34a84210487c5e |
|
BLAKE2b-256 | 87711efd04e26fa0a6ab12a65b40deed64995926fdad04bdfad55438b7d6617a |