A precise profiler for Python, optimized for data processing tasks in high-performance computing. Capable of sampling with metadata, using minimal instrumentation.
TraceQ
TraceQ is a specialized tool designed to provide accurate metrics measurements for Python-based data processing applications.
It integrates with the Linux /proc filesystem to deliver granular and detailed memory profiling, essential for optimizing resource allocation and improving the efficiency of large-scale computational tasks.
Developed as part of a comprehensive study on memory management in Python, TraceQ is particularly effective in high-performance computing settings where precise memory profiling is critical.
Features
- High-accuracy memory profiling using direct measurements from the Linux /proc filesystem.
- Support for multiple memory profiling backends, including psutil and tracemalloc.
- Granular and detailed memory usage analysis.
- Optimized for data processing tasks.
- Useful in high-performance computing environments for optimizing resource allocation.
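To make the /proc-based measurement concrete, here is a minimal sketch of reading a process's resident set size from /proc/<pid>/status. It illustrates the general technique only and is not TraceQ's implementation; the function names parse_vmrss_kb and read_vmrss_kb are hypothetical.

```python
import os
import re

# Matches a line such as "VmRSS:      123456 kB" in /proc/<pid>/status.
_VMRSS_RE = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_vmrss_kb(status_text):
    """Return the resident set size in kB from status-file text, or None if absent."""
    match = _VMRSS_RE.search(status_text)
    return int(match.group(1)) if match else None


def read_vmrss_kb(pid="self"):
    """Read VmRSS for a process from /proc; return None where /proc is unavailable."""
    path = f"/proc/{pid}/status"
    if not os.path.exists(path):
        return None  # not Linux, or the process does not exist
    with open(path, encoding="ascii", errors="replace") as f:
        return parse_vmrss_kb(f.read())


if __name__ == "__main__":
    print(read_vmrss_kb())
```

Separating the parser from the file read keeps the parsing logic testable on systems without a /proc filesystem.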
Installation
To install TraceQ, you can use pip:
pip install traceq
Alternatively, you can clone the repository and install it manually:
git clone https://github.com/discovery-unicamp/traceq.git
cd traceq
pip install .
Usage
TraceQ is designed to be easy to integrate into your existing Python projects. Below are some basic usage examples:
Profiling a Python Function
To profile the memory usage of a specific function, you can use the profile decorator provided by TraceQ.
from traceq import profile

@profile
def task(data):
    # Your function goes here
    pass
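For readers curious about what a memory-profiling decorator does in general, the following sketch records the peak traced allocation of each call using the standard-library tracemalloc module. It is an illustration under assumptions, not TraceQ's profile decorator; the names peak_memory and last_peak_bytes are hypothetical.

```python
import functools
import tracemalloc


def peak_memory(func):
    """Hypothetical decorator: store the peak traced memory of the latest call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        already_tracing = tracemalloc.is_tracing()
        if not already_tracing:
            tracemalloc.start()
        tracemalloc.reset_peak()  # requires Python 3.9+
        try:
            return func(*args, **kwargs)
        finally:
            # Peak of all traced memory since reset_peak(), in bytes.
            _, peak = tracemalloc.get_traced_memory()
            wrapper.last_peak_bytes = peak
            if not already_tracing:
                tracemalloc.stop()

    wrapper.last_peak_bytes = None
    return wrapper


@peak_memory
def task(n):
    # Allocate a list so that there is something to measure.
    return [0] * n


if __name__ == "__main__":
    task(100_000)
    print(task.last_peak_bytes)
```

Note that tracemalloc only sees allocations made through Python's memory allocator, which is one reason a profiler may combine it with process-level measurements.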
Configuration
All the behavior of TraceQ is controlled by a global configuration. Users have multiple options to set and customize this configuration according to their needs:
Configuration File
TraceQ uses a configuration file named traceq.toml, which should be placed in the root of your project directory.
This file allows you to specify various settings to control the behavior of TraceQ.
You can check all the available options in the traceq.toml file in this repository.
Example Customization
Here's an example of how you can customize some fields in the traceq.toml file:
output_dir = "./traceq_reports"
[logger]
enabled_transports = "console,file"
level = "debug"
[profiler]
enabled_metrics = "memory_usage"
sign_traces = "true"
precision = "3"
[profiler.memory_usage]
enabled_backends = "psutil,tracemalloc"
In this example, the output directory for reports is changed, logging is sent to both the console and a file at the debug level, only the memory usage metric is enabled, trace signing is turned on, and the profiling precision is set to 3. Finally, the memory usage backends are limited to psutil and tracemalloc.
Runtime Configuration
Alternatively, you can set the configuration at runtime using the load_config function provided by TraceQ.
It accepts the same settings as a dictionary, which allows you to inject configuration dynamically while your application is running.
from traceq import load_config

load_config({
    "output_dir": "./traceq_reports",
    "logger": {
        "enabled_transports": "console,file",
        "level": "debug"
    },
    "profiler": {
        "enabled_metrics": "memory_usage",
        "sign_traces": "true",
        "precision": "3",
        "memory_usage": {
            "enabled_backends": "psutil,tracemalloc"
        }
    }
})
Environment Variables
You can also set configuration options using environment variables.
All environment variables should be prefixed with TRACEQ_. This method is useful for dynamically setting configurations without modifying the code or configuration files.
Example Environment Variables
export TRACEQ_OUTPUT_DIR="./traceq_reports"
export TRACEQ_LOGGER_ENABLED_TRANSPORTS="console,file"
export TRACEQ_LOGGER_LEVEL="debug"
export TRACEQ_PROFILER_ENABLED_METRICS="memory_usage"
export TRACEQ_PROFILER_SIGN_TRACES="true"
export TRACEQ_PROFILER_PRECISION="3"
export TRACEQ_PROFILER_MEMORY_USAGE_ENABLED_BACKENDS="psutil,tracemalloc"
This flexibility allows you to tailor TraceQ's behavior to fit the specific requirements of your seismic data processing tasks, ensuring optimal performance and resource utilization.
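The environment variable names in the example follow the nested configuration keys: each level is upper-cased and joined with an underscore after the TRACEQ prefix. The sketch below reproduces that naming from a nested dictionary. It only mirrors the pattern visible in the examples; env_var_names is a hypothetical helper, and how TraceQ itself resolves these variables is not shown here.

```python
def env_var_names(config, prefix="TRACEQ"):
    """Flatten a nested config dict into {ENV_NAME: value} pairs."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}_{key.upper()}"
        if isinstance(value, dict):
            # Recurse, using the name built so far as the new prefix.
            flat.update(env_var_names(value, prefix=name))
        else:
            flat[name] = value
    return flat


config = {
    "output_dir": "./traceq_reports",
    "logger": {"enabled_transports": "console,file", "level": "debug"},
    "profiler": {
        "enabled_metrics": "memory_usage",
        "sign_traces": "true",
        "precision": "3",
        "memory_usage": {"enabled_backends": "psutil,tracemalloc"},
    },
}

if __name__ == "__main__":
    for name, value in env_var_names(config).items():
        print(f'export {name}="{value}"')
```

Going from nested keys to variable names is unambiguous, while the reverse direction is not, because keys such as memory_usage contain underscores themselves.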
Report
After the execution of your Python script, TraceQ will generate a report containing all the metrics collected during the execution.
The report is a .prof file, encoded as gzip-compressed MessagePack.
TraceQ is still under development, and a tool to visualize the generated reports is in progress.
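Until a visualization tool is available, a report can in principle be inspected by hand. The sketch below decompresses a gzip file with the standard library and, optionally, decodes the payload with the third-party msgpack package. The function names are hypothetical, and the structure of the decoded data is not documented here, so treat the result as opaque.

```python
import gzip


def read_report_payload(path):
    """Return the decompressed bytes of a gzip-compressed report file."""
    with gzip.open(path, "rb") as f:
        return f.read()


def decode_report(path):
    """Decode the payload as MessagePack (needs: pip install msgpack)."""
    import msgpack  # imported lazily so read_report_payload works without it

    return msgpack.unpackb(read_report_payload(path), raw=False)
```

For example, decode_report("./traceq_reports/example.prof") would return the decoded Python object, where the file name is a placeholder for whichever report TraceQ wrote.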
Contributing
We welcome contributions to TraceQ! If you have any ideas, suggestions, or bug reports, please open an issue on the GitHub repository. If you would like to contribute code, please fork the repository and submit a pull request.
License
TraceQ is licensed under the MIT License. See the LICENSE file for more details.
Acknowledgments
This tool was developed as part of a comprehensive study on memory management in Python-based seismic data processing applications, conducted by Daniel L. Fonseca and Edson Borin at the Institute of Computing, Unicamp, Brazil. Special thanks to Petrobras for their support and collaboration.