Tracking and predicting the carbon footprint of training deep learning models.

carbontracker

About

carbontracker is a tool for tracking and predicting the energy consumption and carbon footprint of training deep learning models as described in Anthony et al. (2020).

Citation

Kindly cite our work if you use carbontracker in a scientific publication:

@misc{anthony2020carbontracker,
  title={Carbontracker: Tracking and Predicting the Carbon Footprint of Training Deep Learning Models},
  author={Lasse F. Wolff Anthony and Benjamin Kanding and Raghavendra Selvan},
  howpublished={ICML Workshop on Challenges in Deploying and monitoring Machine Learning Systems},
  month={July},
  note={arXiv:2007.03051},
  year={2020}}

Installation

PyPI

pip install carbontracker

Optional Dependencies

To generate PDF reports from carbontracker logs, install with the pdfreport extra:

pip install 'carbontracker[pdfreport]'

Basic usage

Command Line Mode

Wrap any of your scripts (python, bash, etc.):

carbontracker python script.py
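
If your runs are launched from a Python driver rather than a shell, the same wrapper can be invoked through subprocess. This is a minimal sketch that assumes the carbontracker executable installed by pip is on PATH. build_wrapped_command and run_wrapped are hypothetical helper names and are not part of carbontracker.

```python
import shutil
import subprocess


def build_wrapped_command(script_cmd):
    """Prefix an existing command (a list of arguments) with the carbontracker wrapper."""
    if not script_cmd:
        raise ValueError("script_cmd must contain at least the program to run")
    return ["carbontracker", *script_cmd]


def run_wrapped(script_cmd):
    """Run script_cmd under carbontracker and return its exit code.

    Hypothetical helper: assumes the `carbontracker` executable is on PATH.
    """
    if shutil.which("carbontracker") is None:
        raise RuntimeError("carbontracker not found on PATH; run `pip install carbontracker`")
    return subprocess.run(build_wrapped_command(script_cmd), check=False).returncode
```

For example, run_wrapped(["python", "script.py"]) corresponds to running `carbontracker python script.py` in a shell.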

Embed into Python Scripts

Required arguments

  • epochs: Total epochs of your training loop.

Optional arguments

  • epochs_before_pred (default=1): Epochs to monitor before outputting predicted consumption. Set to -1 for all epochs. Set to 0 for no prediction.
  • monitor_epochs (default=1): Total number of epochs to monitor. Outputs actual consumption when reached. Set to -1 for all epochs. Cannot be less than epochs_before_pred or equal to 0.
  • update_interval (default=10): Interval in seconds between power usage measurements.
  • interpretable (default=True): If set to True, CO2eq values are also converted to interpretable numbers such as the equivalent distance travelled by car. Otherwise, no conversions are done.
  • stop_and_confirm (default=False): If set to True, the main thread (running your training loop) is paused after epochs_before_pred epochs to output the prediction, and the user must confirm before training continues. Otherwise, the prediction is output and training continues immediately.
  • ignore_errors (default=False): If set to True, all errors cause energy monitoring to stop while training continues. Otherwise, training is interrupted as with regular errors.
  • components (default="all"): Comma-separated string of which components to monitor. Options are: "all", "gpu", "cpu", or "gpu,cpu".
  • devices_by_pid (default=False): If True, only devices (under the chosen components) running processes associated with the main process are measured. If False, all available devices are measured (see Section 'Notes' for jobs running on SLURM or in containers). Note that this requires your devices to have active processes before instantiating the CarbonTracker class.
  • log_dir (default=None): Path to the desired directory to write log files. If None, then no logging will be done.
  • log_file_prefix (default=""): Prefix to add to the log file name.
  • verbose (default=1): Sets the level of verbosity.
  • decimal_precision (default=6): Desired decimal precision of reported values.
  • sim_cpu (default=None): Name of the simulated CPU. If set, will use simulated CPU power measurements.
  • sim_cpu_tdp (default=None): Thermal Design Power (TDP) in Watts for the simulated CPU. Required if sim_cpu is set.
  • sim_cpu_util (default=None): CPU utilization factor between 0 and 1. If not set, defaults to 0.5 (50% utilization).
  • sim_gpu (default=None): Name of the simulated GPU. If set, will use simulated GPU power measurements.
  • sim_gpu_watts (default=None): Power consumption in Watts for the simulated GPU. Required if sim_gpu is set.
  • sim_gpu_util (default=None): GPU utilization factor between 0 and 1. If not set, defaults to 0.5 (50% utilization).
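
The constraints stated in the list above can be written as a small pre-flight check. This is a minimal sketch. check_tracker_args is a hypothetical helper that encodes only the documented rules (-1 meaning all epochs, monitor_epochs neither 0 nor below epochs_before_pred, the listed components strings, and the sim_* requirements). It is not carbontracker's own validation, and CarbonTracker may accept or reject other combinations.

```python
def check_tracker_args(epochs, epochs_before_pred=1, monitor_epochs=1,
                       components="all", sim_cpu=None, sim_cpu_tdp=None,
                       sim_cpu_util=None, sim_gpu=None, sim_gpu_watts=None,
                       sim_gpu_util=None):
    """Hypothetical check mirroring the documented argument rules.

    Returns (epochs_before_pred, monitor_epochs) with -1 expanded to epochs.
    """
    if epochs < 1:
        raise ValueError("epochs must be a positive integer")
    # -1 means "all epochs" for both counters.
    pred = epochs if epochs_before_pred == -1 else epochs_before_pred
    monitor = epochs if monitor_epochs == -1 else monitor_epochs
    if monitor == 0:
        raise ValueError("monitor_epochs cannot be 0")
    if monitor < pred:
        raise ValueError("monitor_epochs cannot be less than epochs_before_pred")
    if components not in {"all", "gpu", "cpu", "gpu,cpu"}:
        raise ValueError(f"unsupported components value: {components!r}")
    # Simulated devices need a power figure.
    if sim_cpu is not None and sim_cpu_tdp is None:
        raise ValueError("sim_cpu_tdp is required when sim_cpu is set")
    if sim_gpu is not None and sim_gpu_watts is None:
        raise ValueError("sim_gpu_watts is required when sim_gpu is set")
    # Utilization factors must lie in [0, 1] when given.
    for name, util in (("sim_cpu_util", sim_cpu_util), ("sim_gpu_util", sim_gpu_util)):
        if util is not None and not 0 <= util <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    return pred, monitor
```

For example, check_tracker_args(100, epochs_before_pred=-1, monitor_epochs=-1) returns (100, 100), while check_tracker_args(100, epochs_before_pred=5, monitor_epochs=2) raises ValueError.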

Example usage

from carbontracker.tracker import CarbonTracker

max_epochs = 10  # Total number of training epochs.
tracker = CarbonTracker(epochs=max_epochs)

# Training loop.
for epoch in range(max_epochs):
    tracker.epoch_start()

    # Your model training.

    tracker.epoch_end()

# Optional: Add a stop in case of early termination before all monitor_epochs have
# been monitored to ensure that actual consumption is reported.
tracker.stop()
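
When training can end before monitor_epochs epochs have been recorded (early stopping, an exception, a keyboard interrupt), wrapping the loop in try/finally ensures that tracker.stop() still runs. This is a minimal sketch: train_with_tracking, train_one_epoch and should_stop are hypothetical names. The function only calls the epoch_start(), epoch_end() and stop() methods used in the example above, so a CarbonTracker instance can be passed in as the tracker.

```python
from typing import Callable, Protocol


class EpochTracker(Protocol):
    """The subset of the tracker interface used in the example above."""

    def epoch_start(self) -> None: ...
    def epoch_end(self) -> None: ...
    def stop(self) -> None: ...


def train_with_tracking(tracker: EpochTracker, max_epochs: int,
                        train_one_epoch: Callable[[int], float],
                        should_stop: Callable[[float], bool]) -> int:
    """Run up to max_epochs epochs and return how many epochs completed."""
    completed = 0
    try:
        for epoch in range(max_epochs):
            tracker.epoch_start()
            loss = train_one_epoch(epoch)  # hypothetical training step
            tracker.epoch_end()
            completed += 1
            if should_stop(loss):
                break  # early termination
    finally:
        # Runs on early termination and on exceptions, so the measured
        # consumption is still reported.
        tracker.stop()
    return completed
```

With carbontracker installed, the call would look like train_with_tracking(CarbonTracker(epochs=max_epochs), max_epochs, train_one_epoch, lambda loss: loss < 0.01).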

Example output

Default settings
CarbonTracker: 
Actual consumption for 1 epoch(s):
        Time:   0:00:10
        Energy: 0.000038 kWh
        CO2eq:  0.003130 g
        This is equivalent to:
        0.000026 km travelled by car
CarbonTracker: 
Predicted consumption for 1000 epoch(s):
        Time:   2:52:22
        Energy: 0.038168 kWh
        CO2eq:  4.096665 g
        This is equivalent to:
        0.034025 km travelled by car
CarbonTracker: Finished monitoring.
verbose=2
CarbonTracker: The following components were found: CPU with device(s) cpu:0.
CarbonTracker: Average carbon intensity during training was 82.00 gCO2eq/kWh at detected location: Copenhagen, Capital Region, DK.
CarbonTracker: 
Actual consumption for 1 epoch(s):
        Time:   0:00:10
        Energy: 0.000041 kWh
        CO2eq:  0.003357 g
        This is equivalent to:
        0.000028 km travelled by car
CarbonTracker: Carbon intensity for the next 2:59:06 is predicted to be 107.49 gCO2eq/kWh at detected location: Copenhagen, Capital Region, DK.
CarbonTracker: 
Predicted consumption for 1000 epoch(s):
        Time:   2:59:06
        Energy: 0.040940 kWh
        CO2eq:  4.400445 g
        This is equivalent to:
        0.036549 km travelled by car
CarbonTracker: Finished monitoring.
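
The figures in the output above are related by simple arithmetic: energy is power integrated over time, CO2eq is energy multiplied by carbon intensity, and the prediction scales the monitored epoch(s) up to the full run. The sketch below illustrates these relations under stated assumptions. Power is sampled every update_interval seconds and integrated as a step function. The prediction is a linear extrapolation of the mean per-epoch energy. The car-distance factor of about 120.4 gCO2eq per km is inferred from the ratio of CO2eq to km in the example output, not taken from carbontracker's source. carbontracker's own implementation may differ; for instance, the verbose output shows that it uses a forecast carbon intensity for the prediction.

```python
WH_PER_KWH = 1000.0
SECONDS_PER_HOUR = 3600.0
# Assumption: inferred from the example output (e.g. 4.400445 g / 0.036549 km).
G_CO2EQ_PER_KM_BY_CAR = 120.4


def energy_kwh(power_samples_w, update_interval_s):
    """Step-function integration: each power sample (W) holds for one interval."""
    joules = sum(power_samples_w) * update_interval_s
    return joules / SECONDS_PER_HOUR / WH_PER_KWH


def co2eq_g(energy, intensity_g_per_kwh):
    """CO2eq in grams for an energy use in kWh at a given carbon intensity."""
    return energy * intensity_g_per_kwh


def km_by_car(co2eq):
    """Interpretable equivalent: distance travelled by car, in km."""
    return co2eq / G_CO2EQ_PER_KM_BY_CAR


def predict_energy(measured_kwh, measured_epochs, total_epochs):
    """Linear extrapolation of the mean per-epoch energy (a simplification)."""
    return measured_kwh / measured_epochs * total_epochs
```

For the verbose=2 run, co2eq_g(0.040940, 107.49) gives about 4.4006 g, close to but not exactly the reported 4.400445 g, and km_by_car(4.400445) gives about 0.03655 km. Treat the sketch as an approximation of how the reported numbers relate, since the printed values are rounded.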

Parsing log files

Aggregating log files

carbontracker supports aggregating all log files in a specified directory to a single estimate of the carbon footprint.

Example usage

from carbontracker import parser

parser.print_aggregate(log_dir="./my_log_directory/")

Example output

The training of models in this work is estimated to use 4.494 kWh of electricity contributing to 0.423 kg of CO2eq. This is equivalent to 3.515 km travelled by car. Measured by carbontracker (https://github.com/lfwa/carbontracker).
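
The aggregate is a sum over individual runs. This is a minimal sketch of that idea, operating on the dictionaries returned by parser.parse_all_logs() (their format is shown in the next section). aggregate_actual is a hypothetical helper. It sums only the measured ('actual') entries and skips logs without one, which may not match how print_aggregate treats incomplete runs.

```python
def aggregate_actual(logs):
    """Sum measured consumption over parsed carbontracker logs.

    Hypothetical helper. Returns (kWh, kg CO2eq, km travelled by car).
    """
    energy_kwh = 0.0
    co2eq_g = 0.0
    km = 0.0
    for log in logs:
        actual = log.get("actual")
        if not actual:  # assumption: missing/None means nothing was measured
            continue
        energy_kwh += actual["energy (kWh)"]
        co2eq_g += actual["co2eq (g)"]
        # Assumption: 'equivalents' may be absent, e.g. with interpretable=False.
        km += actual.get("equivalents", {}).get("km travelled by car", 0.0)
    return energy_kwh, co2eq_g / 1000.0, km
```

With carbontracker installed, pass it parser.parse_all_logs(log_dir="./my_log_directory/") and format the three numbers to three decimals to obtain a sentence like the one above.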

Convert logs to dictionary objects

Log files can be parsed into dictionaries using parser.parse_all_logs() or parser.parse_logs().

Example usage

from carbontracker import parser

logs = parser.parse_all_logs(log_dir="./logs/")
first_log = logs[0]

print(f"Output file name: {first_log['output_filename']}")
print(f"Standard file name: {first_log['standard_filename']}")
print(f"Stopped early: {first_log['early_stop']}")
print(f"Measured consumption: {first_log['actual']}")
print(f"Predicted consumption: {first_log['pred']}")
print(f"Measured GPU devices: {first_log['components']['gpu']['devices']}")

Example output

Output file name: ./logs/2020-05-17T19:02Z_carbontracker_output.log
Standard file name: ./logs/2020-05-17T19:02Z_carbontracker.log
Stopped early: False
Measured consumption: {'epochs': 1, 'duration (s)': 8.0, 'energy (kWh)': 6.5e-05, 'co2eq (g)': 0.019201, 'equivalents': {'km travelled by car': 0.000159}}
Predicted consumption: {'epochs': 3, 'duration (s)': 25.0, 'energy (kWh)': 1000.000196, 'co2eq (g)': 10000.057604, 'equivalents': {'km travelled by car': 10000.000478}}
Measured GPU devices: ['Tesla T4']
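
Because the 'actual' and 'pred' entries share the same keys, they can be normalised per epoch and compared directly, for example to check how closely a prediction matched the measurement. This is a minimal sketch. per_epoch is a hypothetical helper that uses only the keys visible in the example output above.

```python
def per_epoch(consumption):
    """Return (seconds, kWh, gCO2eq) per epoch for an 'actual' or 'pred' dict."""
    epochs = consumption["epochs"]
    if epochs <= 0:
        raise ValueError("consumption dict must cover at least one epoch")
    return (consumption["duration (s)"] / epochs,
            consumption["energy (kWh)"] / epochs,
            consumption["co2eq (g)"] / epochs)
```

For the example above, per_epoch(first_log['actual']) returns (8.0, 6.5e-05, 0.019201).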

Generating PDF reports

Carbontracker can generate detailed PDF reports from log files. This feature requires the optional reportlab dependency.

Note: You must install the PDF report dependencies first:

pip install 'carbontracker[pdfreport]'

Example usage

from carbontracker.report import generate_report_from_log

generate_report_from_log("./logs/carbontracker.log", "./report.pdf")

The generated PDF includes:

  • Energy consumption metrics and visualizations
  • Carbon footprint analysis with CO2eq calculations
  • Power usage breakdown by component (CPU/GPU)
  • Training duration and efficiency metrics
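
Because reportlab is optional, a script can check for it before importing the report module and fail with an actionable message. This is a minimal sketch. make_report and pdf_path_for are hypothetical helpers. generate_report_from_log is called exactly as in the example above, and writing the PDF next to the log is just one possible convention.

```python
import importlib.util
from pathlib import Path


def pdf_path_for(log_path):
    """Place the report next to the log, swapping the suffix for .pdf."""
    return str(Path(log_path).with_suffix(".pdf"))


def make_report(log_path, pdf_path=None):
    """Hypothetical wrapper: generate a PDF report if the optional extra is installed."""
    if importlib.util.find_spec("reportlab") is None:
        raise ImportError("PDF reports need: pip install 'carbontracker[pdfreport]'")
    # Imported lazily so this module loads even without the optional extra.
    from carbontracker.report import generate_report_from_log

    pdf_path = pdf_path or pdf_path_for(log_path)
    generate_report_from_log(log_path, pdf_path)
    return pdf_path
```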

Compatibility

carbontracker is compatible with:

Notes

Availability of GPUs and Slurm

  • Available GPU devices are determined by first checking the environment variable CUDA_VISIBLE_DEVICES (only if devices_by_pid=False; otherwise, devices are found by PID). On Slurm, this ensures that only the GPU devices associated with the current job are fetched, not those of the entire cluster. If this fails, all available GPUs are measured.
  • NVML cannot find processes for containers spawned without --pid=host. This affects the devices_by_pid parameter and means that it will never find any active processes for GPUs in affected containers.
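
The CUDA_VISIBLE_DEVICES lookup described above can be sketched as follows. This is a simplified sketch, not carbontracker's actual parsing: visible_gpu_indices is a hypothetical helper. It accepts only comma-separated integer indices, and it treats an unset, empty, or non-integer value (such as GPU UUIDs) as a failure, falling back to all devices.

```python
import os


def visible_gpu_indices(num_gpus, environ=None):
    """Return the GPU indices to measure (hypothetical, simplified lookup).

    Uses CUDA_VISIBLE_DEVICES when it holds integer indices; otherwise
    falls back to all num_gpus devices.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("CUDA_VISIBLE_DEVICES", "").strip()
    try:
        indices = [int(part) for part in value.split(",") if part.strip()]
    except ValueError:
        # e.g. UUID entries like "GPU-..." are not handled in this sketch.
        indices = []
    # Ignore indices that do not exist on this machine.
    valid = [i for i in indices if 0 <= i < num_gpus]
    return valid if valid else list(range(num_gpus))
```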

Extending carbontracker

See CONTRIBUTING.md.

carbontracker in media

  • The official press release from the University of Copenhagen is available in English (en) and Danish (da).

  • Carbontracker has received some attention in popular science forums within, and outside of, Denmark [1][2][3][4][5][6][7][8]

Download files

Source Distribution

carbontracker-2.4.5.tar.gz (93.9 kB)

Uploaded Source

Built Distribution

carbontracker-2.4.5-py3-none-any.whl (67.5 kB)

Uploaded Python 3

File details

Details for the file carbontracker-2.4.5.tar.gz.

File metadata

  • Download URL: carbontracker-2.4.5.tar.gz
  • Upload date:
  • Size: 93.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for carbontracker-2.4.5.tar.gz
Algorithm Hash digest
SHA256 fc549447116cacb9cbb2199a3248efcac4385d2131733da4776500e5bd0b0c94
MD5 b7281988330e4e638aadeb0d90860d62
BLAKE2b-256 e0bab3fbd2549a464debfc912c1655efa135a608dc1d55e9663b865667b649dc

File details

Details for the file carbontracker-2.4.5-py3-none-any.whl.

File metadata

  • Download URL: carbontracker-2.4.5-py3-none-any.whl
  • Upload date:
  • Size: 67.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for carbontracker-2.4.5-py3-none-any.whl
Algorithm Hash digest
SHA256 1ceda8667a44c8719765705aba7031a1f26deb666555a1aa367664dec45c4726
MD5 8981c168e70f6571f5a0c3f16f178acb
BLAKE2b-256 be62f8a1e5cc09ca5ce6d9818f5ec9db78b7f4c47b1adbefe1f4b08bc9326997