Scalene: A high-resolution, low-overhead CPU and memory profiler for Python

scalene: a high-performance CPU and memory profiler for Python

by Emery Berger


About Scalene

Scalene is a high-performance CPU and memory profiler for Python that does a few things other Python profilers do not and cannot do. It runs with far less overhead than most other profilers, in some cases by orders of magnitude, while delivering far more detailed information.

  1. Scalene is fast. It uses sampling rather than instrumentation or Python's tracing facilities. Its overhead is typically no more than 10-20% (and often less).
  2. Scalene is precise. Unlike most other Python profilers, Scalene performs CPU profiling at the line level, pointing to the specific lines of code that are responsible for the execution time in your program. This level of detail can be much more useful than the function-level profiles returned by most profilers.
  3. Scalene separates out time spent running in Python from time spent in native code (including libraries). Most Python programmers aren't going to optimize the performance of native code (which is usually either in the Python implementation or external libraries), so this helps developers focus their optimization efforts on the code they can actually improve.
  4. Scalene profiles memory usage. In addition to tracking CPU usage, Scalene also points to the specific lines of code responsible for memory growth. It accomplishes this via an included specialized memory allocator.
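
To make the contrast between sampling and tracing concrete, here is a minimal sketch of a line-level sampling profiler. It is an illustration only and not Scalene's implementation: the names sample_lines and spin and the 10 ms interval are assumptions chosen for this example, and it relies on signal.setitimer, which is only available on Unix-like systems such as Linux and Mac OS X.

```python
import collections
import signal
import time


def sample_lines(func, interval=0.01):
    """Run func() and count which source line is executing at each timer tick.

    Hypothetical helper for illustration; must be called from the main thread.
    """
    counts = collections.Counter()

    def on_tick(signum, frame):
        # `frame` is the Python frame that the timer signal interrupted.
        if frame is not None:
            counts[(frame.f_code.co_filename, frame.f_lineno)] += 1

    previous = signal.signal(signal.SIGVTALRM, on_tick)
    # ITIMER_VIRTUAL counts only CPU time spent by this process in user mode.
    signal.setitimer(signal.ITIMER_VIRTUAL, interval, interval)
    try:
        func()
    finally:
        signal.setitimer(signal.ITIMER_VIRTUAL, 0)  # stop the timer
        if previous is not None:
            signal.signal(signal.SIGVTALRM, previous)
    return counts


def spin(cpu_seconds=0.3):
    """Burn roughly `cpu_seconds` of CPU time in pure Python (demo workload)."""
    end = time.process_time() + cpu_seconds
    total = 0
    while time.process_time() < end:
        for i in range(10000):
            total += i * i
    return total
```

Calling sample_lines(spin).most_common(3) lists the lines that were interrupted most often. The profiled code runs at full speed between ticks, which is why a sampling design can keep overhead low compared with a tracer that is invoked for every executed line.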

Installation

Scalene is distributed as a pip package and works on Linux and Mac OS X platforms. You can install it as follows:

  % pip install scalene

NOTE: Currently, installing Scalene with pip does not install its memory profiling library, so you will only be able to use it for CPU profiling. To take advantage of its memory profiling capability, you will need to download the Scalene source repository.

Usage

The following command runs Scalene on a provided example program, performing only line-level CPU profiling.

  % python -m scalene test/testme.py

To perform both line-level CPU and memory profiling, you first need to build the specialized memory allocator by running make:

  % make

Profiling on a Mac OS X system:

  % DYLD_INSERT_LIBRARIES=$PWD/libscalene.dylib PYTHONMALLOC=malloc python -m scalene test/testme.py

Profiling on a Linux system:

  % LD_PRELOAD=$PWD/libscalene.so PYTHONMALLOC=malloc python -m scalene test/testme.py
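
If you launch profiling runs from a script, the two platform-specific command lines above can be assembled programmatically. The following is a small sketch under stated assumptions: the helper names scalene_env and scalene_command are hypothetical, and lib_dir is assumed to be the directory where make produced libscalene.dylib or libscalene.so.

```python
import os
import platform
import sys


def scalene_env(lib_dir, system=None):
    """Return a copy of the environment with the allocator preloaded.

    Hypothetical helper: mirrors the shell commands shown above.
    """
    system = system or platform.system()
    env = dict(os.environ)
    # Route Python's allocations through malloc so the preloaded
    # library can observe them.
    env["PYTHONMALLOC"] = "malloc"
    if system == "Darwin":  # Mac OS X
        env["DYLD_INSERT_LIBRARIES"] = os.path.join(lib_dir, "libscalene.dylib")
    elif system == "Linux":
        env["LD_PRELOAD"] = os.path.join(lib_dir, "libscalene.so")
    else:
        raise RuntimeError("unsupported platform: " + system)
    return env


def scalene_command(script):
    """Build the argument list equivalent to `python -m scalene <script>`."""
    return [sys.executable, "-m", "scalene", script]
```

For example, subprocess.run(scalene_command("test/testme.py"), env=scalene_env(os.getcwd())) would be the scripted counterpart of the commands above when run from the directory containing the built library.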

Comparison to Other Profilers

Performance and Features

Below is a table comparing various profilers to scalene, running on an example Python program (benchmarks/julia1_nopil.py) from the book High Performance Python, by Gorelick and Ozsvald. All of these were run on a 2016 MacBook Pro.

Profiler                 | Time (seconds) | Slowdown | Line-level?    | CPU? | Separates Python from native? | Memory? | Unmodified code?
------------------------------------------------------------------------------------------------------------------------------------------------
original program         | 6.71s          | 1.0x     |                |      |                               |         |
cProfile                 | 11.04s         | 1.65x    | function-level | yes  |                               |         | yes
Profile                  | 202.26s        | 30.14x   | function-level | yes  |                               |         | yes
pyinstrument             | 9.83s          | 1.46x    | function-level | yes  |                               |         | yes
line_profiler            | 78.0s          | 11.62x   | yes            | yes  |                               |         | needs @profile decorators
pprofile (deterministic) | 403.67s        | 60.16x   | yes            | yes  |                               |         | yes
pprofile (statistical)   | 7.47s          | 1.11x    | yes            | yes  |                               |         | yes
yappi (CPU)              | 127.53s        | 19.01x   | function-level | yes  |                               |         | yes
yappi (wallclock)        | 21.45s         | 3.2x     | function-level | yes  |                               |         | yes
scalene (CPU only)       | 6.98s          | 1.04x    | yes            | yes  | yes                           |         | yes
scalene (CPU + memory)   | 7.68s          | 1.14x    | yes            | yes  | yes                           | yes     | yes
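
The Slowdown column is the profiled run time divided by the run time of the original program (6.71s), rounded to two decimal places. The short sketch below reproduces that arithmetic for a few rows; the function name slowdown and the dictionary layout are illustrative choices, while the timings are copied from the table.

```python
BASELINE_SECONDS = 6.71  # run time of the original, unprofiled program

# Run times in seconds, taken from the table above.
timings = {
    "cProfile": 11.04,
    "pprofile (deterministic)": 403.67,
    "scalene (CPU only)": 6.98,
    "scalene (CPU + memory)": 7.68,
}


def slowdown(profiled_seconds, baseline_seconds=BASELINE_SECONDS):
    """Return the slowdown factor relative to the unprofiled run."""
    return round(profiled_seconds / baseline_seconds, 2)


if __name__ == "__main__":
    for name, seconds in timings.items():
        print(f"{name}: {slowdown(seconds)}x")
```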

Output

Scalene prints annotated source code for the program being profiled and any modules it uses in the same directory or subdirectories. Here is a snippet from pystone.py, just using CPU profiling:

benchmarks/pystone.py: % of CPU time =  98.78% out of   3.47s.
         | CPU %    | CPU %    | 
  Line   | (Python) | (C)      | [benchmarks/pystone.py]
--------------------------------------------------------------------------------
  [... lines omitted ...]
   137   |   0.87%  |   0.13%  | def Proc1(PtrParIn):
   138   |   1.46%  |   0.36%  |     PtrParIn.PtrComp = NextRecord = PtrGlb.copy()
   139   |          |          |     PtrParIn.IntComp = 5
   140   |   0.87%  |   0.04%  |     NextRecord.IntComp = PtrParIn.IntComp
   141   |   1.46%  |   0.30%  |     NextRecord.PtrComp = PtrParIn.PtrComp
   142   |   2.33%  |   0.26%  |     NextRecord.PtrComp = Proc3(NextRecord.PtrComp)
   143   |   1.46%  |  -0.00%  |     if NextRecord.Discr == Ident1:
   144   |   0.29%  |   0.04%  |         NextRecord.IntComp = 6
   145   |   1.75%  |   0.40%  |         NextRecord.EnumComp = Proc6(PtrParIn.EnumComp)
   146   |   1.75%  |   0.29%  |         NextRecord.PtrComp = PtrGlb.PtrComp
   147   |   0.58%  |   0.12%  |         NextRecord.IntComp = Proc7(NextRecord.IntComp, 10)
   148   |          |          |     else:
   149   |          |          |         PtrParIn = NextRecord.copy()
   150   |   0.87%  |   0.15%  |     NextRecord.PtrComp = None
   151   |          |          |     return PtrParIn

And here is an example with memory profiling enabled, running the Julia benchmark.

benchmarks/julia1_nopil.py: % of CPU time =  99.22% out of  12.06s.
         | CPU %    | CPU %    | Memory (MB) |
  Line   | (Python) | (C)      |             | [benchmarks/julia1_nopil.py]
--------------------------------------------------------------------------------
     1   |          |          |             | # Pasted from Chapter 2, High Performance Python - O'Reilly Media;
     2   |          |          |             | # minor modifications for Python 3 by Emery Berger
     3   |          |          |             | 
     4   |          |          |             | """Julia set generator without optional PIL-based image drawing"""
     5   |          |          |             | import time
     6   |          |          |             | # area of complex space to investigate
     7   |          |          |             | x1, x2, y1, y2 = -1.8, 1.8, -1.8, 1.8
     8   |          |          |             | c_real, c_imag = -0.62772, -.42193
     9   |          |          |             | 
    10   |          |          |             | #@profile
    11   |          |          |             | def calculate_z_serial_purepython(maxiter, zs, cs):
    12   |          |          |             |     """Calculate output list using Julia update rule"""
    13   |   0.08%  |   0.02%  |      0.06   |     output = [0] * len(zs)
    14   |   0.25%  |   0.01%  |      9.50   |     for i in range(len(zs)):
    15   |          |          |             |         n = 0
    16   |   1.34%  |   0.05%  |     -9.88   |         z = zs[i]
    17   |   0.50%  |   0.01%  |     -8.44   |         c = cs[i]
    18   |   1.25%  |   0.04%  |             |         while abs(z) < 2 and n < maxiter:
    19   |  68.67%  |   2.27%  |     42.50   |             z = z * z + c
    20   |  18.46%  |   0.74%  |    -33.62   |             n += 1
    21   |          |          |             |         output[i] = n
    22   |          |          |             |     return output

Positive memory numbers indicate total memory allocation in megabytes; negative memory numbers indicate memory reclamation. Note that because of the way Python's memory management works, frequent allocation and de-allocation (as in lines 19-20 above) show up as high positive memory on one line followed by an (approximately) corresponding negative memory on the following line(s).
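
One way to read these paired numbers is to add them up. The sketch below sums the Memory (MB) values copied from the Julia output above; treating the sum as net growth is an interpretation made here for illustration, and the helper name net_mb is hypothetical.

```python
# Memory (MB) column from the Julia output above, keyed by source line.
memory_mb = {13: 0.06, 14: 9.50, 16: -9.88, 17: -8.44, 19: 42.50, 20: -33.62}


def net_mb(deltas, first_line, last_line):
    """Sum the per-line memory figures for lines first_line..last_line."""
    return sum(mb for line, mb in deltas.items() if first_line <= line <= last_line)


if __name__ == "__main__":
    # Lines 19-20: 42.50 MB allocated, 33.62 MB reclaimed.
    print(f"lines 19-20: {net_mb(memory_mb, 19, 20):.2f} MB")
    # Whole function body shown in the snippet.
    print(f"lines 13-20: {net_mb(memory_mb, 13, 20):.2f} MB")
```

Under this reading, the large allocation on line 19 is mostly offset by the reclamation on line 20, leaving 8.88 MB for the pair and 0.12 MB across lines 13-20.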

Acknowledgements

Logo created by Sophia Berger.
