
A memory profiler for data batch processing applications.


The Fil memory profiler for Python

Your code reads some data, processes it, and uses too much memory. In order to reduce memory usage, you need to figure out:

  1. Where peak memory usage is, also known as the high-water mark.
  2. What code was responsible for allocating the memory that was present at that peak moment.

That's exactly what Fil will help you find. Fil is an open source memory profiler designed for data processing applications written in Python, and includes native support for Jupyter.

At the moment it only runs on Linux and macOS, and while it supports threading, it does not yet support multiprocessing or multiple processes in general.

"Within minutes of using your tool, I was able to identify a major memory bottleneck that I never would have thought existed. The ability to track memory allocated via the Python interface and also C allocation is awesome, especially for my NumPy / Pandas programs."

—Derrick Kondo

For more information, including an example of the output, see https://pythonspeed.com/products/filmemoryprofiler/

Fil vs. other Python memory tools

There are two distinct patterns of Python usage, each with its own source of memory problems.

In a long-running server, memory usage can grow indefinitely due to memory leaks. That is, some memory is not being freed.

  • If the issue is in Python code, tools like tracemalloc and Pympler can tell you which objects are leaking and what is preventing them from being freed.
  • If you're leaking memory in C code, you can use tools like Valgrind.

Fil, however, is not aimed at memory leaks, but at the other use case: data processing applications. These applications load in data, process it somehow, and then finish running.

The problem with these applications is that they can, on purpose or by mistake, allocate huge amounts of memory. It might get freed soon after, but if you allocate 16GB RAM and only have 8GB in your computer, the lack of leaks doesn't help you.

Fil will therefore tell you, in an easy to understand way:

  1. Where peak memory usage is, also known as the high-water mark.
  2. What code was responsible for allocating the memory that was present at that peak moment.
  3. All of the above for C/Fortran/C++/whatever extensions too, even those that don't use Python's memory allocation API (tracemalloc only tracks allocations made through Python's memory APIs).

This allows you to optimize that code in a variety of ways.
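
As a rough illustration of the difference between a leak and a high-water mark, here is a minimal sketch that uses the standard library's tracemalloc rather than Fil. The function name `measure_peak` and the 50 MiB size are made up for the example, and tracemalloc only sees memory allocated through Python's memory APIs.

```python
import tracemalloc


def measure_peak(size_bytes):
    """Allocate and free a temporary buffer; return (current, peak) traced bytes."""
    tracemalloc.start()
    try:
        temporary = bytearray(size_bytes)  # large, short-lived allocation
        del temporary                      # freed again, so this is not a leak
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current, peak


current, peak = measure_peak(50 * 1024 * 1024)
print(f"still allocated: {current} bytes, high-water mark: {peak} bytes")
```

Nothing is leaked here, yet the peak still includes the full temporary buffer, and the peak is what determines whether the program fits in RAM.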

Installation

Assuming you're on macOS or Linux, and are using Python 3.6 or later, you can use either Conda or pip (or any tool that is pip-compatible and can install manylinux2010 wheels).

Conda

To install on Conda:

$ conda install -c conda-forge filprofiler

Pip

To install the latest version of Fil you'll need Pip 19 or newer. You can check like this:

$ pip --version
pip 19.3.0

If you're using something older than v19, you can upgrade by doing:

$ pip install --upgrade pip

If that doesn't work, try running that in a virtualenv.

Assuming you have a new enough version of pip:

$ pip install filprofiler

Using Fil

Measuring peak (high-water mark) memory usage in Jupyter

To measure memory usage of some code in Jupyter you need to do three things:

  1. Use an alternative kernel, "Python 3 with Fil". You can choose this kernel when you create a new notebook, or you can switch an existing notebook in the Kernel menu; there should be a "Change Kernel" option in there in both Jupyter Notebook and JupyterLab.
  2. Load the extension by doing %load_ext filprofiler.
  3. Add the %%filprofile magic to the top of the cell with the code you wish to profile.

Screenshot of JupyterLab

Measuring peak (high-water mark) memory usage for Python scripts

Instead of doing:

$ python yourscript.py --input-file=yourfile

Just do:

$ fil-profile run yourscript.py --input-file=yourfile

And it will generate a report.

As of version 0.11, you can also run it like this:

$ python -m filprofiler run yourscript.py --input-file=yourfile
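
If you launch profiling runs from another Python program (an assumption; only the shell form is documented above), the same two command lines can be built as argument lists. This is a small sketch: `fil_command` is a hypothetical helper, and it uses `sys.executable` where the shell example uses `python`.

```python
import shlex
import sys


def fil_command(script, script_args, use_module=False):
    """Build the argument list for profiling `script` with Fil."""
    if use_module:
        # Equivalent of: python -m filprofiler run ...
        prefix = [sys.executable, "-m", "filprofiler", "run"]
    else:
        # Equivalent of: fil-profile run ...
        prefix = ["fil-profile", "run"]
    return prefix + [script] + list(script_args)


cmd = fil_command("yourscript.py", ["--input-file=yourfile"])
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```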

Debugging out-of-memory crashes

First, run free to figure out how much memory is available—in this case about 6.3GB—and then set a corresponding limit on virtual memory with ulimit:

$ free -h
       total   used   free  shared  buff/cache  available
Mem:   7.7Gi  1.1Gi  6.3Gi    50Mi       334Mi      6.3Gi
Swap:  3.9Gi  3.0Gi  871Mi
$ ulimit -Sv 6300000
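
On Linux the limit can also be derived programmatically instead of being read off the free output by eye. The following is a sketch under stated assumptions: it parses /proc/meminfo-style text, the sample numbers are invented, and the 95% headroom factor is an arbitrary choice, not something Fil requires.

```python
def available_kib(meminfo_text):
    """Return the MemAvailable value (in KiB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("no MemAvailable line found")


def suggested_limit_kib(meminfo_text, percent=95):
    """Leave some headroom below what is currently available (integer math)."""
    return available_kib(meminfo_text) * percent // 100


# Invented sample; on a real Linux machine use open("/proc/meminfo").read().
SAMPLE = """\
MemTotal:        8074012 kB
MemFree:         6605000 kB
MemAvailable:    6606000 kB
"""
print(f"ulimit -Sv {suggested_limit_kib(SAMPLE)}")
```

The printed value is in the same 1024-byte units that bash's ulimit -v expects.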

Then, run your program under Fil, and it will generate an SVG at the point in time when memory runs out:

$ fil-profile run oom.py 
...
=fil-profile= Wrote memory usage flamegraph to fil-result/2020-06-15T12:37:13.033/out-of-memory.svg

Reducing memory usage in your code

You've found where memory usage is coming from—now what?

If you're using data processing or scientific computing libraries, I have written a relevant guide to reducing memory usage.

How Fil works

Fil uses the LD_PRELOAD/DYLD_INSERT_LIBRARIES mechanism to preload a shared library at process startup. This shared library captures all memory allocations and deallocations and keeps track of them.

At the same time, Fil uses the Python tracing infrastructure (used e.g. by cProfile and coverage.py) to figure out which Python callstack/backtrace is responsible for each allocation.
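
To make the bookkeeping concrete, here is a toy model in pure Python. It is not Fil's implementation: the class name `AllocationTracker`, its methods, and the use of tuples as callstacks are all invented for illustration. It records each allocation with the callstack responsible for it, and remembers what was live at the moment of peak usage.

```python
class AllocationTracker:
    """Toy model: track live allocations and remember the state at peak."""

    def __init__(self):
        self.live = {}               # address -> (size, callstack)
        self.current = 0             # bytes currently allocated
        self.peak = 0                # high-water mark in bytes
        self.peak_by_callstack = {}  # callstack -> bytes live at the peak

    def on_malloc(self, address, size, callstack):
        self.live[address] = (size, callstack)
        self.current += size
        if self.current > self.peak:
            self.peak = self.current
            self.peak_by_callstack = self._sum_by_callstack()

    def on_free(self, address):
        size, _ = self.live.pop(address)
        self.current -= size

    def _sum_by_callstack(self):
        totals = {}
        for size, callstack in self.live.values():
            totals[callstack] = totals.get(callstack, 0) + size
        return totals


tracker = AllocationTracker()
tracker.on_malloc(0x1000, 100, ("main", "load"))
tracker.on_malloc(0x2000, 300, ("main", "process"))
tracker.on_free(0x1000)
tracker.on_malloc(0x3000, 50, ("main", "load"))
print(tracker.peak, tracker.peak_by_callstack)
```

Re-summing every live allocation on each new peak is simple but slow; a real profiler would need something cheaper.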

For performance reasons, only the largest allocations are reported, with a minimum of 99% of allocated memory reported. The remaining <1% is highly unlikely to be relevant when trying to reduce usage; it's effectively noise.
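
One way to read the 99% rule is "keep the largest entries until they cover at least 99% of the total". The sketch below implements that reading; it is an assumption about the selection logic, not Fil's actual code, and `largest_allocations` is a hypothetical name.

```python
def largest_allocations(sizes_by_callstack, min_percent=99):
    """Return the largest (callstack, size) pairs covering >= min_percent of the total."""
    total = sum(sizes_by_callstack.values())
    kept, covered = [], 0
    by_size = sorted(sizes_by_callstack.items(), key=lambda item: item[1], reverse=True)
    for callstack, size in by_size:
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= min_percent * total:
            break
        kept.append((callstack, size))
        covered += size
    return kept


example = {"a": 9000, "b": 900, "c": 90, "d": 10}
print(largest_allocations(example))
```

In this example the two largest entries already account for 99% of the 10000 bytes, so the two small ones are dropped.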

Fil and threading, with notes on NumPy and Zarr

In general, Fil will track allocations in threads correctly.

First, if you start a thread via Python, running Python code, that thread will get its own callstack for tracking who is responsible for a memory allocation.

Second, if you start a C thread, the calling Python code is considered responsible for any memory allocations in that thread. This works fine... except for thread pools. If you start a pool of threads that are not Python threads, the Python code that created those threads will be responsible for all allocations created during the thread pool's lifetime.

Therefore, in order to ensure correct memory tracking, Fil disables thread pools in BLAS (used by NumPy), BLOSC (used e.g. by Zarr), OpenMP, and numexpr. They are all set to use 1 thread, so calls should run in the calling Python thread and everything should be tracked correctly.

This has some costs:

  1. This can reduce performance in some cases, since you're doing computation with one CPU instead of many.
  2. Insofar as these libraries allocate memory proportional to number of threads, the measured memory usage might be wrong.

Fil does this for the whole program when using fil-profile run. When using the Jupyter kernel, anything run with the %%filprofile magic will have thread pools disabled, but other code should run normally.
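
To estimate how much of a slowdown comes from single-threading alone, you can run the program without Fil under similar caps. The sketch below uses environment variables that these libraries commonly honor; this section does not say which mechanism Fil itself uses, so treat the variable list as an assumption. `single_threaded_env` is a hypothetical helper.

```python
import os

# Environment variables commonly used to cap native thread pools.
SINGLE_THREAD_ENV = {
    "OMP_NUM_THREADS": "1",       # OpenMP
    "OPENBLAS_NUM_THREADS": "1",  # OpenBLAS (one BLAS that NumPy may use)
    "MKL_NUM_THREADS": "1",       # Intel MKL (another BLAS)
    "NUMEXPR_NUM_THREADS": "1",   # numexpr
    "BLOSC_NTHREADS": "1",        # Blosc
}


def single_threaded_env(base=None):
    """Return a copy of the environment with thread pools capped at one thread."""
    env = dict(os.environ if base is None else base)
    env.update(SINGLE_THREAD_ENV)
    return env


env = single_threaded_env({"PATH": "/bin", "OMP_NUM_THREADS": "8"})
print(env["OMP_NUM_THREADS"])
# Pass `env` to subprocess.run(..., env=env) so it is set before the libraries load.
```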

What Fil tracks

Fil will track memory allocated by:

  • Normal Python code.
  • C code using malloc()/calloc()/realloc()/posix_memalign().
  • C++ code using new (including via aligned_alloc()).
  • Anonymous mmap()s.
  • Fortran 90 explicitly allocated memory (tested with gcc's gfortran).
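
For a sense of how two of these categories can arise from plain Python, here is a small standard-library sketch. The size is arbitrary, and how each allocation shows up in a Fil report is not something this example verifies.

```python
import mmap

SIZE = 16 * 1024 * 1024

buffer = bytearray(SIZE)         # ordinary Python object allocation
anonymous = mmap.mmap(-1, SIZE)  # fileno -1: anonymous mapping, not backed by a file
anonymous[:4] = b"data"          # the mapping is writable like a bytearray
print(len(buffer), len(anonymous))
anonymous.close()                # release the mapping explicitly
```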

Still not supported, but planned:

  • mremap() (resizing of mmap()).
  • File-backed mmap(). The semantics are somewhat different than normal allocations or anonymous mmap(), since the OS can swap it in or out from disk transparently, so supporting this will involve a different kind of resource usage and reporting.
  • Other forms of shared memory; it still needs investigating whether any of them allow allocations large enough to matter.
  • Anonymous mmap()s created via /dev/zero (not common, since it's not cross-platform, e.g. macOS doesn't support this).
  • memfd_create(), a Linux-only mechanism for creating in-memory files.
  • Possibly memalign(), valloc(), pvalloc(), and reallocarray(). These are all rarely used, as far as I can tell.

License

Copyright 2020 Hyphenated Enterprises LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
