
AlgBench: A Python utility to run benchmarks for the empirical evaluation of algorithms.

There are a number of challenges when performing benchmarks for (long-running) algorithms.

  • Saving all information requires a lot of boilerplate code and often you forget something.
  • If you add further instances or want to compare an additional parameter, you have to check which data is already available in order to skip existing entries. The same applies if you need to interrupt and resume the benchmark.
  • Just piping the results into a file can create a huge amount of data, no longer fitting into memory.
  • Proper benchmarks often take days or even weeks to run, such that parallelization is necessary (e.g., with slurm) which requires a thread-safe database.
  • Many file formats and databases are difficult to access or impossible to repair once corrupted.

AlgBench tries to ease your life by

  • saving a lot of the information and data (function arguments, return values, runtime, environment information, stdout, etc.) automatically with a single line of code
  • remembering which function arguments have already run and skipping those
  • providing a compressible database to save memory, and saving highly redundant information, e.g., of the environment, only once
  • providing an NFS-compatible parallel database and compatibility with distribution libraries, such as slurminade
  • using a simple format based on JSON and Zip to allow simple parsing and even repairing broken databases by hand

There is a predecessor project, called AeMeasure. AeMeasure made saving the data easy, but required more boilerplate code and reading the data was more difficult and less efficient.

Other things you should know about for empirical/experimental evaluations

I consider the following tools essential for empirical evaluations (of algorithms); a small sketch of how they play together follows the list:

  • pandas: Simple and powerful tool for working with data tables. Do your experiments and parse the important data into a pandas DataFrame.
  • seaborn and matplotlib: Creating beautiful plots from pandas DataFrames with little work.
  • JupyterLab: Interactive Python+Markdown documents. Great for analyzing data and sharing the insights. Works great with pandas and seaborn.
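
To give a concrete impression of how these tools play together, here is a minimal sketch (the DataFrame and its columns are made up for illustration; normally you would extract the table from your benchmark results, as shown further below):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical measurements; in practice this table would be
# extracted from the benchmark database (see `read_as_pandas` below).
df = pd.DataFrame(
    {
        "strategy": ["largest_first", "largest_first", "smallest_last", "smallest_last"],
        "runtime": [0.0021, 0.0025, 0.0030, 0.0028],
    }
)

# One box per strategy, comparing the runtime distributions.
sns.boxplot(data=df, x="strategy", y="runtime")
plt.tight_layout()
plt.savefig("runtime_by_strategy.pdf")
```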

AlgBench essentially takes over the part of saving the information from the runs and lets you easily extract pandas DataFrames from it. For very simple studies, you could also save your data directly into a pandas DataFrame, but for nearly every serious experiment you will run into the problems mentioned at the beginning.

Note that the actual algorithms can also be written in another, more efficient programming language. It is reasonably easy to create Python bindings, e.g., for C++ with PyBind11, or to just call the binaries from Python.
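
If you go the route of calling a native binary, a minimal sketch with Python's standard subprocess module could look as follows (the solver name and its command-line flags are hypothetical):

```python
import subprocess


def run_solver(instance_path: str, time_limit_s: int = 60) -> str:
    # Call a hypothetical native solver binary and capture its stdout,
    # which can then be parsed and returned from the benchmarked function.
    result = subprocess.run(
        ["./my_solver", instance_path, f"--time-limit={time_limit_s}"],
        capture_output=True,
        text=True,
        check=True,  # raise if the solver exits with a non-zero code
    )
    return result.stdout
```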

Publishable evaluations often require extensive experiments that are best performed on a cluster of shared workstations. Many institutes and companies use slurm to schedule and distribute the workloads. The data is usually shared via a network file system (NFS), for which AlgBench is designed. While you usually also have databases available, they are not made for just dumping in all the data you may need for analysis and potentially for debugging. We developed an additional tool, slurminade, that allows you to distribute your experiments with just a few additional lines. You can see this in an example: original script vs. script with slurminade.
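
As a rough sketch of how such a distribution can look (the partition name and instance names below are placeholders; see the slurminade documentation for the details of its configuration):

```python
import slurminade

# Placeholder slurm configuration; adapt to your cluster.
slurminade.update_default_configuration(partition="alg")


@slurminade.slurmify()
def run_for_instance(instance_name: str):
    # run the (AlgBench-wrapped) experiments for one instance here
    ...


if __name__ == "__main__":
    for instance_name in ["graph_0", "graph_1", "graph_2"]:
        # without slurm, this is an ordinary function call;
        # with slurm available, it becomes a distributed job
        run_for_instance.distribute(instance_name)
```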

Let me further recommend the book A Guide To Experimental Algorithmics by Catherine McGeoch, which gives a good introduction to the big picture of performing empirical evaluations of algorithms. If you want to know more about actually implementing complex algorithms for difficult problems, I recommend reading In Pursuit of the Traveling Salesman by Bill Cook or The Traveling Salesman Problem: A Computational Study by Applegate et al. to really go into the details. The Traveling Salesman Problem is an excellent example for this, because it has probably received more attention than any other NP-hard combinatorial problem. However, it can also be intimidating, as you probably won't have the funds to investigate any problem as deeply as the Traveling Salesman Problem has been investigated. Maybe you want to read some papers from the SIAM Symposium on Algorithm Engineering and Experiments (ALENEX) to see how smaller studies can be performed (though, for most papers, you will find aspects that could be improved).

Installation

You can install AlgBench using pip

pip install -U algbench

Usage

There is one important class `Benchmark` to run the benchmark, and two important functions `describe` and `read_as_pandas` to analyze the results.

 
class Benchmark(builtins.object)
    Benchmark(path: str) -> None
 
This is the heart of the library. It allows you to run, save, and load
a benchmark.
 
The function `add` will run a configuration, if it is not
already in the database. You can also split this into `check` and
`run`. This may be advised if you want to distribute the execution.
 
```python
from algbench import Benchmark
 
benchmark = Benchmark("./test_benchmark")
 
 
def f(x, _test=2, default="default"):
    print(x)  # here you would run your algorithm
    return {"r1": x, "r2": "test"}
 
 
benchmark.add(f, 1, _test=None)
benchmark.add(f, 2)
benchmark.add(f, 3, _test=None)
 
benchmark.compress()
 
for entry in benchmark:
    print(entry["parameters"], entry["data"])
```
 
The following functions are thread-safe:
- exists
- run
- add
- insert
- front
- __iter__
 
Don't call any of the other functions while the benchmark is
running. It could lead to data loss.
 
  Methods defined here:
__init__(self, path: str) -> None
Just specify the path where the database should be stored, and
everything else happens magically.
Make sure not to use the same path for different
databases, as they will get mixed.
__iter__(self) -> Generator[Dict, NoneType, NoneType]
Iterate over all entries in the benchmark.
Use `front` to get a preview of what an entry looks like.
add(self, func: Callable, *args, **kwargs)
Will add the function call with the arguments
to the benchmark if not yet contained.
 
Combination of `check` and `run`.
Will only call `run` if the arguments are not
yet in the benchmark.
clear(self)
Clears all entries of the benchmark, without deleting
the benchmark itself. You can continue to use it afterwards.
 
NOT THREAD-SAFE!
compress(self)
Compress the data of the benchmark to take less disk space.
 
NOT THREAD-SAFE!
delete(self)
Delete the benchmark and all its files. Do not use it afterwards,
there are no files left to write results into.
If you just want to delete the content, use `clear`.
 
NOT THREAD-SAFE!
delete_if(self, condition: Callable[[Dict], bool])
Delete entries if a specific condition is met.
This is currently inefficient, as a copy of the
benchmark is always created.
Use `front` to get a preview of what an entry that is
passed to the condition looks like.
 
NOT THREAD-SAFE!
exists(self, func: Callable, *args, **kwargs) -> bool
Use this function to check if an entry already exists and thus
does not have to be run again. If you want to have multiple
samples, add a sample index argument.
Caveat: This function may have false negatives, i.e., it may say that an entry
  does not exist despite it existing (only for fresh data).
front(self) -> Optional[Dict]
Return the first entry of the benchmark.
Useful for checking its content.
insert(self, entry: Dict)
Insert a raw entry, as returned by `__iter__` or `front`.
repair(self)
Repairs the benchmark in case it has some broken entries.
 
NOT THREAD-SAFE!
run(self, func: Callable, *args, **kwargs)
Will add the function call with the arguments
to the benchmark.

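For distributed execution, the class description suggests splitting `add` into a check and a run step; with the methods listed above, that could look roughly like the following sketch, reusing `f` and `benchmark` from the docstring example (note that `exists` may report false negatives for fresh data):

```python
for x in [4, 5, 6]:
    # only dispatch work for configurations that are not yet in the database
    if not benchmark.exists(f, x):
        benchmark.run(f, x)  # roughly what `add` does in a single call
```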

You can find a complete example for graph coloring in the repository; the important parts are shown below.

Running a benchmark

from _utils import InstanceDb
from algbench import Benchmark
import networkx as nx

benchmark = Benchmark("03_benchmark_data")
instances = InstanceDb("./01_instances.zip")


def load_instance_and_run(instance_name: str, alg_params):
    # load the instance outside the actual measurement
    g = instances[instance_name]

    def eval_greedy_alg(instance_name: str, alg_params, _instance: nx.Graph):
        # arguments starting with `_` are not saved.
        coloring = nx.coloring.greedy_coloring.greedy_color(_instance, **alg_params)
        return {  # the returned values are saved to the database
            "num_vertices": _instance.number_of_nodes(),
            "num_edges": _instance.number_of_edges(),
            "coloring": coloring,
            "n_colors": max(coloring.values()) + 1,
        }

    benchmark.add(eval_greedy_alg, instance_name, alg_params, g)


alg_params_to_evaluate = [
    {"strategy": "largest_first", "interchange": True},
    {"strategy": "largest_first", "interchange": False},
    {"strategy": "random_sequential", "interchange": True},
    {"strategy": "random_sequential", "interchange": False},
    {"strategy": "smallest_last", "interchange": True},
    {"strategy": "smallest_last", "interchange": False},
    {"strategy": "independent_set"},
    {"strategy": "connected_sequential_bfs", "interchange": True},
    {"strategy": "connected_sequential_bfs", "interchange": False},
    {"strategy": "connected_sequential_dfs", "interchange": True},
    {"strategy": "connected_sequential_dfs", "interchange": False},
    {"strategy": "saturation_largest_first"},
]

if __name__ == "__main__":
    for instance_name in instances:
        print(instance_name)
        for conf in alg_params_to_evaluate:
            load_instance_and_run(instance_name, conf)
    benchmark.compress()

Analyzing the data

from algbench import describe, read_as_pandas, Benchmark

describe("./03_benchmark_data/")
Output
 result:
| num_vertices: 68
| num_edges: 697
| coloring:
|| 0: 7
|| 1: 8
|| 2: 2
|| 3: 5
|| 4: 3
|| 5: 7
|| 6: 7
|| 7: 6
|| 8: 5
|| 9: 4
|| 10: 5
|| 11: 4
|| 12: 0
|| 13: 6
|| 14: 0
|| 15: 3
|| 16: 5
|| 17: 5
|| 18: 7
|| 19: 0
|| ...
| n_colors: 9
 timestamp: 2023-05-25T21:58:39.201553
 runtime: 0.002952098846435547
 stdout: 
 stderr: 
 env_fingerprint: 53ad3b5b29d082d7e2bca6881ec9fe35fe441ae1
 args_fingerprint: 10ce65b7a61d5ecbfcb1f4e390d72122f7a1f6ec
 parameters:
| func: eval_greedy_alg
| args:
|| instance_name: graph_0
|| alg_params:
||| strategy: largest_first
||| interchange: True
 argv: ['02_run_benchmark.py']
 env:
| hostname: workstation-r7
| python_version: 3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0]
| python: /home/krupke/anaconda3/envs/mo310/bin/python3
| cwd: /home/krupke/Repositories/AlgBench/examples/graph_coloring
| environment: [{'name': 'virtualenv', 'path': '/home/krupke/.local/lib/python3.10/site-pack...
| git_revision: 5357426feb4b49174c313ffa33e2cadf6a83e226
| python_file: /home/krupke/Repositories/AlgBench/examples/graph_coloring/02_run_benchmark.py
# we can also see the raw data of the first entry using `front`
Benchmark("./03_benchmark_data/").front()
Output
{'result': {'num_vertices': 68,
  'num_edges': 697,
  'coloring': {'0': 7,
   '1': 8,
   '2': 2,
   '3': 5,
   '4': 3,
   '5': 7,
   '6': 7,
   '7': 6,
   '8': 5,
   '9': 4,
   '10': 5,
   '11': 4,
   '12': 0,
   '13': 6,
   '14': 0,
   '15': 3,
   '16': 5,
   '17': 5,
   '18': 7,
   '19': 0,
   '20': 2,
   '21': 3,
    ...},
  'n_colors': 9},
 'timestamp': '2023-05-25T21:58:39.201553',
 'runtime': 0.002952098846435547,
 'stdout': '',
 'stderr': '',
 'env_fingerprint': '53ad3b5b29d082d7e2bca6881ec9fe35fe441ae1',
 'args_fingerprint': '10ce65b7a61d5ecbfcb1f4e390d72122f7a1f6ec',
 'parameters': {'func': 'eval_greedy_alg',
  'args': {'instance_name': 'graph_0',
   'alg_params': {'strategy': 'largest_first', 'interchange': True}}},
 'argv': ['02_run_benchmark.py'],
 'env': {'hostname': 'workstation-r7',
  'python_version': '3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0]',
  'python': '/home/krupke/anaconda3/envs/mo310/bin/python3',
  'cwd': '/home/krupke/Repositories/AlgBench/examples/graph_coloring',
  'environment': [{'name': 'virtualenv',
    'path': '/home/krupke/.local/lib/python3.10/site-packages',
    'version': '20.14.1'},
   {'name': 'cfgv',
    'path': '/home/krupke/.local/lib/python3.10/site-packages',
    'version': '3.3.1'},
  ...],
  'git_revision': '5357426feb4b49174c313ffa33e2cadf6a83e226',
  'python_file': '/home/krupke/Repositories/AlgBench/examples/graph_coloring/02_run_benchmark.py'}}
# we can extract a full pandas table using `read_as_pandas`
t = read_as_pandas(
    "./03_benchmark_data/",
    lambda result: {
        "instance": result["parameters"]["args"]["instance_name"],
        "strategy": result["parameters"]["args"]["alg_params"]["strategy"],
        "interchange": result["parameters"]["args"]["alg_params"].get(
            "interchange", None
        ),
        "colors": result["result"]["n_colors"],
        "runtime": result["runtime"],
        "num_vertices": result["result"]["num_vertices"],
        "num_edges": result["result"]["num_edges"],
    },
)
print(t)
Output
       instance                  strategy interchange  colors   runtime ...
0       graph_0             largest_first        True       9  0.002952  
1       graph_0             largest_first       False      10  0.000183
2       graph_0         random_sequential        True       9  0.003562
3       graph_0         random_sequential       False      12  0.000173
4       graph_0             smallest_last        True       9  0.003813
...         ...                       ...         ...     ...       ...
5995  graph_499  connected_sequential_bfs        True       3  0.000216
5996  graph_499  connected_sequential_bfs       False       3  0.000132
5997  graph_499  connected_sequential_dfs        True       3  0.000231
5998  graph_499  connected_sequential_dfs       False       4  0.000132
5999  graph_499  saturation_largest_first        None       3  0.000202


[6000 rows x 7 columns]
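
From here on, the usual pandas workflow applies. For example, a quick aggregation over the table above could look like this (column names taken from the output above):

```python
# Average number of colors and runtime per strategy/interchange combination.
# dropna=False keeps the rows where `interchange` is None
# (e.g., for saturation_largest_first).
summary = (
    t.groupby(["strategy", "interchange"], dropna=False)[["colors", "runtime"]]
    .mean()
    .sort_values("colors")
)
print(summary)
```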

Which information is saved?

The following information is saved automatically:

  • function name
  • all arguments that do not begin with "_"
  • the returned values
  • runtime
  • current date and time
  • hostname
  • Python version
  • Python binary path
  • current working directory
  • stdout and stderr
  • all installed modules and their versions
  • git revision
  • path of the python file

On doing good empirical evaluations of algorithms

To get a feeling for the interesting instances and parameters, or generally for where to look deeper, you should first perform an explorative study. For such an explorative study, select some random parameters and instances, and just look at how the numbers turn out. Iteratively change the parameters and instances until you know what to evaluate properly. At that point, you can state some research questions and design corresponding workhorse studies to answer them.

Here are some general hints:

  • Create a separate folder for every study.
  • Add a README.md into each folder that describes the study. At least describe in a sentence who created the study, when, and in which context.
  • Have separate, numbered files for preparing, running, processing, checking, and evaluating the study.
  • Extract a simplified pandas table from the database with only the important data (e.g., stdout or environment information is only necessary for debugging and does not need to be shared for evaluation). You can save pandas tables as .json.zip so that they are small and can simply be added to your Git, even when the full data is too large; see the sketch after this list.
  • The file for checking the generated data should also describe it.
  • Use a separate Jupyter-notebook for each family of plots you want to generate.
  • Save the plots into files whose names you can easily trace back to the generating notebook. You will probably copy them later into some paper, and half a year later, when you receive the reviews and want to make some changes, you will have to find the code that generated them.
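
A minimal sketch of the .json.zip round trip mentioned above (the file name is just an example; this assumes a reasonably recent pandas that infers the zip compression from the file extension):

```python
import pandas as pd

# Save only the simplified table, compressed, so it can be
# committed to Git next to the evaluation notebooks.
t.to_json("04_simplified_results.json.zip")

# ... later, e.g., in a Jupyter notebook:
t = pd.read_json("04_simplified_results.json.zip")
```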

Using Git LFS for the data

The data are large binary files. Use Git LFS to add them to your repository more efficiently.

You can find a guide on how to install Git LFS on the Git LFS website.

Run

git lfs install

to set up Git LFS and

git lfs track "*.zip"

to manage all zips via LFS.

Alternatively, you can also just edit .gitattributes by hand

*.zip filter=lfs diff=lfs merge=lfs -text

Finally, add .gitattributes to git via

git add .gitattributes

Version History

  • 1.1.0 Some changes for efficiency turned out to be less robust in case of, e.g., keyboard interrupt. Fixed that.
  • 1.0.0 Changing the database layout, making it more efficient (breaking change!).
  • 0.2.0 Changing database slightly to contain meta data and doing more caching. Saving some more information.
  • 0.1.3 Fixed bug in arg fingerprint set.
  • 0.1.2 Fixed bug with empty rows in pandas table.
  • 0.1.1 Fixed bug with delete_if.
  • 0.1.0 First complete version
