
Estimate Asymptotic Runtime Complexity from Bytecode Executions


The Python Performance Analysis Library (py-pal) is a profiling tool for the Python programming language. With py-pal one can approximate the time complexity (big O notation) of Python functions in an empirical way. The arguments of the function and the executed opcodes serve as a basis for the analysis.

See the documentation for more details.

Installation

Requirements

This project requires CPython >= 3.7 and a C compiler.

Install py-pal via pip by running:

pip install py-pal

or

python -m pip install py-pal

Command line usage of the py-pal module

python -m py_pal <target-module/file>

or

py-pal <target-module/file>

There are multiple aliases for the same command: py-pal, py_pal and pypal. When py-pal is executed this way, all functions called in the code are captured and analyzed. The output is a pandas DataFrame.
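For example, given a hypothetical target file bubble_sort.py (the file name and its contents are illustrative, not part of py-pal):

# bubble_sort.py -- illustrative target script
import random

def bubble_sort(items):
    # Sort a list in place using bubble sort (O(n^2)).
    n = len(items)
    for i in range(n):
        for j in range(n - i - 1):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items

# Call the function with inputs of growing size so that py-pal can relate
# argument size to the number of executed opcodes.
for size in (10, 100, 1000):
    bubble_sort([random.randint(0, 1000) for _ in range(size)])

Running py-pal bubble_sort.py then prints a DataFrame with a complexity estimate for each traced function, including bubble_sort.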

See the help message:

py-pal -h

py-pal can perform cost analysis on a line-by-line basis:

py-pal <file> -l/--line

The --separate flag can be used to examine the cost function of individual arguments (caution: this analysis assumes that the arguments are independent):

py-pal <file> -s/--separate

The output can be noisy; to limit it, use --filter-function to select only the desired functions from the result. Regular expressions are supported:

py-pal <file> -ff/--filter-function .*_sort

Similarly, the result can be filtered by module with --filter-module, e.g. to exclude importlib modules:

py-pal <file> -fm/--filter-module "^(?!<frozen.*>).*"

To save the results to a specified folder, use --out:

py-pal <file> -o/--out results

The output format can be changed with --format:

py-pal <file> -o/--out results --format json

With the additional --plot flag, the cost functions in the result set are also stored as images:

py-pal <file> -o/--out results -p/--plot

For the --log-level flag, see the development docs.

Example: creating plots for selected functions:

py-pal tests/examples/sort.py -o results -p -ff sort

Programmatic usage of the py-pal module

To profile a single function and get a complexity estimate, use profile_function:

from py_pal.core import profile_function
from py_pal.data_collection.opcode_metric import OpcodeMetric
from py_pal.datagen import gen_random_growing_lists
from algorithms.sort import bubble_sort  # example function under test

# Run bubble_sort on generated lists of growing size and estimate its complexity.
profile_function(OpcodeMetric(), gen_random_growing_lists(), bubble_sort)

The profile decorator:

from py_pal.analysis.estimator import AllArgumentEstimator
from py_pal.core import profile, DecoratorStore

@profile
def test():
    pass

# The annotated function must be called at some point
test()

estimator = AllArgumentEstimator(DecoratorStore.get_call_stats(), DecoratorStore.get_opcode_stats())
res = estimator.export()

The profile decorator makes it possible to annotate Python functions so that only the annotated functions are profiled; it acts like a whitelist filter.

Another possibility is to use the context-manager protocol:

from py_pal.analysis.estimator import AllArgumentEstimator
from py_pal.data_collection.tracer import Tracer

with Tracer() as t:
    pass  # code to profile goes here

estimator = AllArgumentEstimator(t.get_call_stats(), t.get_opcode_stats())
res = estimator.export()

# Do something with the resulting DataFrame
print(res)

The most verbose way to use the py-pal API:

from py_pal.analysis.estimator import AllArgumentEstimator
from py_pal.data_collection.tracer import Tracer


t = Tracer()
t.trace()

# Your function
pass

t.stop()
estimator = AllArgumentEstimator(t.get_call_stats(), t.get_opcode_stats())
res = estimator.export()

# Do something with the resulting DataFrame
print(res)

All examples instantiate a tracer object that is responsible for collecting the data. After execution, the collected data is passed to the analysis module. Finally, an estimate of the asymptotic runtime of the functions contained in the code is obtained.

Modes

In the current version, py-pal offers only a profiling mode. Although py_pal.datagen provides some functions for generating inputs, py-pal must be combined with appropriate test cases to realize a performance-testing mode; automatic detection and generation of suitable test inputs does not exist at the moment. A sketch of such a combination follows below.
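A minimal sketch of such a combination, using the documented Tracer, AllArgumentEstimator and py_pal.datagen helpers (the function under test is illustrative, and it is assumed here that gen_random_growing_lists() returns an iterable of input lists):

from py_pal.analysis.estimator import AllArgumentEstimator
from py_pal.data_collection.tracer import Tracer
from py_pal.datagen import gen_random_growing_lists

def function_under_test(items):
    # Illustrative stand-in for the code whose complexity should be estimated.
    return sorted(items)

# Drive the function with generated inputs of growing size while tracing.
with Tracer() as t:
    for data in gen_random_growing_lists():  # assumption: yields input lists
        function_under_test(data)

estimator = AllArgumentEstimator(t.get_call_stats(), t.get_opcode_stats())
print(estimator.export())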

Limitations

The profiling approach implemented by py-pal does not distinguish between different threads executing a Python function; profiling a Python script that uses threads is therefore problematic. The bytecode counting strategy increases the counters of all Python functions on the current call stack regardless of which thread is executing them, so the data points will not accurately reflect what happened during the profiled execution of the script.

Licensing Notes

This work integrates some code from the big_O project. More specifically, most code in py_pal.analysis.complexity, py_pal.datagen and py_pal.analysis.estimator.Estimator.infer_complexity is adapted from big_O.

Changelog

What’s New in Py-PAL 1.3.0

Command line interface changes:

  • Renamed -f/--function to -ff/--filter-function

  • Added -fm/--filter-module functionality to filter results by module

Py-PAL 1.2.0

  • Improved the statistics and plotting output

Command line interface changes:

  • Deprecated the --save flag in favor of -o/--out

  • Renamed -V/--visualize to -p/--plot

  • Changed the functionality of -f/--function from executing and profiling a specific function inside a Python file to applying the analysis to a selected function. Regular expressions are supported.

Py-PAL 1.1.0

  • Improved Data Collection: The heuristic for determining the size of function arguments has been improved.

  • More tests

  • More documentation

  • More argument generation functions in py_pal.datagen

  • Replaced command line option --debug with --log-level for more configurable log output

Refactoring

Project structure changed; the overall CLI interface is unchanged. API changes:

  • py_pal.tracer moved to py_pal.data_collection.tracer

  • py_pal.complexity and py_pal.estimator moved to the py_pal.analysis package.

  • py_pal.analysis.estimator.Estimator now takes call and opcode stats as arguments.

Py-PAL 1.0.0

  • More thorough testing across different combinations of requirements and Python versions.

  • Bug fixes

Py-PAL 0.2.1

Refactoring

The estimator module was refactored, which introduces a slight change to the API. Classes inheriting from Estimator now only specify how to transform the collected data with respect to the arguments of the function.

Instead of ComplexityEstimator, you should use the AllArgumentEstimator class. Additionally, there is the SeparateArgumentEstimator, which is experimental. A sketch of both classes is shown below.
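A minimal sketch of the refactored API, assuming the current module layout (py_pal.analysis.estimator) and that SeparateArgumentEstimator takes the same call and opcode statistics as AllArgumentEstimator:

from py_pal.analysis.estimator import AllArgumentEstimator, SeparateArgumentEstimator
from py_pal.data_collection.tracer import Tracer

with Tracer() as t:
    sorted([3, 1, 2])  # code to profile (illustrative)

# Estimate one cost function over all arguments combined ...
all_args = AllArgumentEstimator(t.get_call_stats(), t.get_opcode_stats()).export()

# ... or one per argument, assuming the arguments are independent (experimental).
per_arg = SeparateArgumentEstimator(t.get_call_stats(), t.get_opcode_stats()).export()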

Py-PAL 0.1.6

More accurate Data Collection

The Tracer is enhanced to measure built-in function calls with the AdvancedOpcodeMetric.

Opcodes resembling a function call, e.g. CALL_FUNCTION, are filtered for built-in function calls. If the called function is found in the complexity mapping, a synthetic opcode weight is assigned. A built-in function call is evaluated using its argument and a pre-defined runtime complexity, e.g. O(n log n) for sort().

  • The feature is enabled by default

  • The calculation adds performance overhead and can be disabled by providing an OpcodeMetric instance to the Tracer (see the sketch after this list)

  • The AdvancedOpcodeMetric instance assigned to the Tracer provides statistics about how many built-in function calls were observed and how many were found in the complexity map
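A minimal sketch of switching the metric, assuming AdvancedOpcodeMetric is exported from py_pal.data_collection.opcode_metric and that the Tracer accepts the metric as a constructor argument (both are assumptions; only OpcodeMetric's import path appears in the examples above):

from py_pal.data_collection.opcode_metric import OpcodeMetric
from py_pal.data_collection.opcode_metric import AdvancedOpcodeMetric  # assumed import path
from py_pal.data_collection.tracer import Tracer

# Assumption: the metric is passed to the Tracer constructor.
default_tracer = Tracer(AdvancedOpcodeMetric())  # weights built-in calls (default behaviour)
plain_tracer = Tracer(OpcodeMetric())            # disables the extra bookkeeping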

Bugfixes

  • Cleaning data after normalization introduced wrong data points
