A tool to quantify and communicate the carbon footprint of machine learning methods
Project description
A tool to quantify and report the carbon footprint of machine learning computations and communication in academia and healthcare
Aim
Raise awareness about the carbon footprint of machine learning methods and encourage further optimization and the rational use of AI-powered tools. This work advocates for sustainable AI and the rational use of IT systems.
Key Carbon Indicators
One hour of GPU load is equivalent to 112 gCO2eq
1 GB of data traffic through a data center is equivalent to 31 gCO2eq
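Both figures follow from the default assumptions listed further below; a quick arithmetic check in Python:

hardware_load = 250 / 3.6e6    # kWh consumed per second of GPU load (250 W GPU)
one_byte_model = 6.894e-8      # kWh per kB of data-center traffic
carbon_intensity = 447         # gCO2eq per kWh (EU average, 2014)

per_gpu_hour = hardware_load * 3600 * carbon_intensity   # ~112 gCO2eq per hour of GPU load
per_gb = one_byte_model * 1e6 * carbon_intensity         # ~31 gCO2eq per GB of traffic
print(round(per_gpu_hour), round(per_gb))                # 112 31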
Install and use
Free software: MIT license
pip install cumulator <- installs CUMULATOR
from cumulator import base <- imports the script
cumulator = base.Cumulator() <- creates a Cumulator instance
Measure the cost of computations. Activate or deactivate the chronometer with cumulator.on() and cumulator.off() whenever you perform ML computations (typically within each iteration). Each duration is automatically recorded in cumulator.time_list and summed via cumulator.cumulated_time(). Then return the carbon footprint due to all computations using cumulator.computation_costs(); see the sketch below.
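A minimal sketch of this workflow; train_one_step() is a hypothetical placeholder for your own ML computation:

from cumulator import base

def train_one_step():
    pass  # hypothetical placeholder for one iteration of your ML computation

cumulator = base.Cumulator()
for iteration in range(10):
    cumulator.on()       # start the chronometer before the computation
    train_one_step()
    cumulator.off()      # stop the chronometer; the duration is appended to cumulator.time_list
print(cumulator.computation_costs())   # carbon footprint (gCO2eq) of all recorded computations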
Measure the cost of communications. Each time your model sends a data file to another node of the network, record the size of the transferred file (in kilobytes) using cumulator.data_transferred(file_size). The amount of data transferred is automatically recorded in cumulator.file_size_list and accumulated in cumulator.cumulated_data_traffic. Then return the carbon footprint due to all communications using cumulator.communication_costs(); see the sketch below.
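A sketch of recording one transfer, assuming a hypothetical 2000 kB model update sent to another node (reusing the cumulator instance created above):

file_size = 2000                        # hypothetical size of the transferred file, in kilobytes
cumulator.data_transferred(file_size)   # appended to file_size_list and added to cumulated_data_traffic
print(cumulator.communication_costs())  # carbon footprint (gCO2eq) of all recorded communications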
Return the total carbon footprint using cumulator.total_carbon_footprint(). You can also display the carbon footprint in the terminal using cumulator.display_carbon_footprint().
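A short sketch combining both, again reusing the instance from above:

total = cumulator.total_carbon_footprint()   # computation costs + communication costs, in gCO2eq
cumulator.display_carbon_footprint()         # prints the carbon footprint to the terminal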
Default assumptions (can be manually modified for a better estimate):
self.hardware_load = 250 / 3.6e6 <- computation costs: power consumption of a typical GPU in Watts converted to kWh/s
self.one_byte_model = 6.894E-8 <- communication costs: average energy impact of traffic in a typical data center, kWh/kB
self.carbon_intensity = 447 <- conversion to carbon footprint: average carbon intensity value in gCO2eq/kWh in the EU in 2014
self.n_gpu = 1 <- number of GPUs used in parallel
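A sketch of overriding the defaults after instantiation, assuming the attributes above are plain instance variables; the replacement values are purely illustrative:

from cumulator import base

cumulator = base.Cumulator()
cumulator.hardware_load = 300 / 3.6e6   # illustrative: a 300 W accelerator instead of the 250 W default
cumulator.carbon_intensity = 20         # illustrative: a low-carbon grid, in gCO2eq/kWh
cumulator.n_gpu = 4                     # illustrative: four GPUs used in parallel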
Project Structure
src/
└── cumulator
    ├── base.py  <- implementation of the Cumulator class
    └── bonus.py <- Impact Statement Protocol
ChangeLog
18.06.2020: 0.0.6 update README.rst
11.06.2020: 0.0.5 add number of processors (0.0.4 failed)
08.06.2020: 0.0.3 added bonus.py carbon impact statement
07.06.2020: 0.0.2 added communication costs and cleaned src/
21.05.2020: 0.0.1 deployment on PyPI and integration with Alg-E
Changelog
0.0.0 (2020-05-14)
First release on PyPI.
Hashes for cumulator-0.0.7-py2.py3-none-any.whl

Algorithm   | Hash digest
------------|------------
SHA256      | 9abee597480ba19fc026598da341442f37007f5a439ef8563950dc36df2790b2
MD5         | be4f025045b53f88fd8df2157f999700
BLAKE2b-256 | 9898dee57c88399d99f46062a7de8c3ff2385565c809d0264c56db510eb2c8cb