This is a Snap collector plugin providing storage related metrics through smartmontools.
This Snap plugin collects Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.) metrics using the pySMART library. S.M.A.R.T. monitors the reliability of a drive, helps predict drive failures, and carries out different types of drive self-tests.
It’s used in the Snap framework.
Getting Started
System Requirements
golang 1.6+ (needed only for building)
Operating systems
All OSs currently supported by Snap:
* Linux/amd64
* Darwin/amd64
Installation
Download the pysmart plugin binary:
You can get the pre-built binaries for your OS and architecture under the plugin’s release page. For Snap, check here.
To build the plugin binary:
Fork https://github.com/intelsdi-x/snap-plugin-collector-pysmart
Clone repo into $GOPATH/src/github.com/intelsdi-x/:
$ git clone https://github.com/<yourGithubID>/snap-plugin-collector-pysmart.git
Configuration and Usage
Set up the Snap framework
Documentation
There are a number of other resources you can review to learn to use this plugin:
* Snap pysmart examples
Collected Metrics
This plugin will identify all the devices on the node which have S.M.A.R.T. enabled and automatically populate the list of collected metrics based on which are being exposed by the device. This will be different per manufacturer and per device.
Ensure that S.M.A.R.T. is enabled on the device
Below is an example of the metrics gathered from the Intel 3700 SSD.
Note: $deviceName depends on the device path (e.g. /dev/sda1).
Namespace | Description (optional) |
---|---|
/intel/smartmon/devices/$deviceName/Reserve_Block_Count | available reserved space raw value |
/intel/smartmon/devices/$deviceName/Program_Fail_Count | shows total count of program fails |
/intel/smartmon/devices/$deviceName/Unexpected_Power_Loss_Count | reports number of unclean shutdowns, cumulative over the life of the SSD |
/intel/smartmon/devices/$deviceName/Power_Loss_Cap_Test | last test result as microseconds to discharge cap |
/intel/smartmon/devices/$deviceName/SATA_Downshift_Count | reports number of times SATA interface selected lower signaling rate due to error |
/intel/smartmon/devices/$deviceName/Temperature_Case | reports SSD case temperature statistics |
/intel/smartmon/devices/$deviceName/Unsafe_Shutdown_Count | reports the cumulative number of unsafe (unclean) shutdown events over the life of the device |
/intel/smartmon/devices/$deviceName/Temperature_Internal | reports internal temperature of the SSD in degrees Celsius |
/intel/smartmon/devices/$deviceName/CRC_Error_Count | shows total number of encountered SATA interface cyclic redundancy check (CRC) errors |
/intel/smartmon/devices/$deviceName/Host_Writes_32mb | reports total number of sectors written by the host system |
/intel/smartmon/devices/$deviceName/Timed_Workload_Host_ReadWrite_Ratio | shows the percentage of I/O operations that are read operations |
/intel/smartmon/devices/$deviceName/Timed_Workload_Timer | measures the elapsed time, number of minutes since starting this workload timer |
/intel/smartmon/devices/$deviceName/Thermal_Throttle | reports Percent Throttle Status and Count of events |
/intel/smartmon/devices/$deviceName/Host_Writes_32mb_Total_LBAs_Written | reports the total number of sectors written by the host system |
/intel/smartmon/devices/$deviceName/Host_Reads_32mb_Total_LBAs_Read | reports the total number of sectors read by the host system |
/intel/smartmon/devices/$deviceName/NAND_Writes_32mb | reports the total number of sectors written by the host system |
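The namespaces in the table follow a regular pattern, and the task manifest in the Examples section adds a trailing field such as `value` or `raw`. The Python sketch below shows one way such namespaces could be assembled from a device's attribute list. It is not the plugin's actual code: `metric_namespaces`, `render`, and the stand-in attribute objects are hypothetical names used only for illustration.

```python
from types import SimpleNamespace

# Static namespace prefix, as it appears in the table above.
PREFIX = ("intel", "smartmon", "devices")

# Per-attribute fields, copied from the task manifest example in this README.
FIELDS = ("threshold", "value", "whenfailed", "worst",
          "type", "updated", "raw", "num")


def metric_namespaces(device_name, attributes):
    """Return one namespace tuple per (attribute, field) pair.

    `attributes` is any iterable of objects with a `name` attribute.
    None entries (slots the device does not report) are skipped.
    """
    namespaces = []
    for attr in attributes:
        if attr is None:
            continue
        for field in FIELDS:
            namespaces.append(PREFIX + (device_name, attr.name, field))
    return namespaces


def render(namespace, sep="/"):
    """Join namespace elements, prefixing the result with the separator."""
    return sep + sep.join(namespace)


if __name__ == "__main__":
    # Stand-in objects (hypothetical); a real run would read them from the drive.
    fake_attributes = [None, SimpleNamespace(name="Power_On_Hours")]
    for ns in metric_namespaces("/dev/sda", fake_attributes):
        # "|" is used here because the device name itself contains "/".
        print(render(ns, sep="|"))
```

Keeping the namespace as a tuple until it is rendered avoids ambiguity when a device name contains the separator character.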
Examples
This is an example of running the pysmart plugin and writing data to a file. It is assumed that you are using the latest Snap binary and plugins.
The example is run from a directory which includes snaptel, snapteld, along with the plugins and task file.
Before starting the Snap daemon, install smartmontools using:
$ brew install smartmontools
Run the smartctl command using:
$ smartctl --scan
IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/RP06@1C,5/IOPP/SSD0@0/AppleAHCI/PRT0@0/IOAHCIDevice@0/AppleAHCIDiskDriver/IOAHCIBlockStorageDevice -d ata # IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/RP06@1C,5/IOPP/SSD0@0/AppleAHCI/PRT0@0/IOAHCIDevice@0/AppleAHCIDiskDriver/IOAHCIBlockStorageDevice, ATA device
Enable SMART, for example:
$ smartctl -s on IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/RP06@1C,5/IOPP/SSD0@0/AppleAHCI/PRT0@0/IOAHCIDevice@0/AppleAHCIDiskDriver/IOAHCIBlockStorageDevice
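On a node with several drives, copying each device name from the scan output by hand is tedious. The Python sketch below parses text in the `<device> -d <type> # <comment>` shape shown in the scan output above and builds the matching `smartctl -s on <device>` argument lists. The helper names `parse_scan` and `enable_commands` are hypothetical, and the sketch assumes every scan line follows that shape; it only builds the commands and does not run them.

```python
def parse_scan(output):
    """Parse `smartctl --scan` text into (device, type) tuples.

    Assumes each line looks like `<device> -d <type> # <comment>`.
    """
    devices = []
    for line in output.splitlines():
        # Drop the trailing "# ..." comment, if present.
        entry = line.split(" # ", 1)[0].strip()
        if not entry:
            continue
        name, sep, dev_type = entry.rpartition(" -d ")
        if not sep:
            # No "-d <type>" part on this line: keep the whole entry as the name.
            name, dev_type = entry, None
        devices.append((name.strip(), dev_type.strip() if dev_type else None))
    return devices


def enable_commands(devices):
    """Build one `smartctl -s on <device>` argument list per device."""
    return [["smartctl", "-s", "on", name] for name, _ in devices]


if __name__ == "__main__":
    sample = "/dev/sda -d sat # /dev/sda [SAT], ATA device\n"
    for argv in enable_commands(parse_scan(sample)):
        print(" ".join(argv))
```

Each argument list can be passed to `subprocess.run` once you have confirmed the device names are the ones you intend to change.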
Start the Snap daemon:
$ snapteld -l 1 -t 0
The option “-l 1” is for setting the debugging log level and “-t 0” is for disabling plugin signing.
In another terminal window, load the pysmart plugin:
$ snaptel plugin load snap_pysmart/plugin.py
Plugin loaded
Name: smartmoncollectorplugin-py
Version: 1
Type: collector
Signed: false
Loaded Time: Tue, 21 Mar 2017 11:20:05 PDT
See the available metrics for your system. Note: the * in a metric name indicates a dynamic element, which is filled in with the device names and attribute names found on your system.
$ snaptel metric list
Get file plugin for publishing and load it:
$ wget http://snap.ci.snap-telemetry.io/plugins/snap-plugin-publisher-file/latest/linux/x86_64/snap-plugin-publisher-file
$ chmod 755 snap-plugin-publisher-file
$ snaptel plugin load snap-plugin-publisher-file
Create a task manifest file, for example task-smart.json:
{
  "version": 1,
  "schedule": {
    "type": "simple",
    "interval": "1s"
  },
  "workflow": {
    "collect": {
      "metrics": {
        "/intel/smartmon/devices/*/*/threshold": {},
        "/intel/smartmon/devices/*/*/value": {},
        "/intel/smartmon/devices/*/*/whenfailed": {},
        "/intel/smartmon/devices/*/*/worst": {},
        "/intel/smartmon/devices/*/*/type": {},
        "/intel/smartmon/devices/*/*/updated": {},
        "/intel/smartmon/devices/*/*/raw": {},
        "/intel/smartmon/devices/*/*/num": {}
      },
      "publish": [
        {
          "plugin_name": "file",
          "config": {
            "file": "/tmp/published_pysmart"
          }
        }
      ]
    }
  }
}
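If you want to vary the interval, the output file, or the set of collected fields, the manifest can also be generated rather than edited by hand. The Python sketch below rebuilds the same structure as the manifest above. `build_task_manifest` and its parameters are hypothetical names chosen for this example; the keys and default values are copied from the manifest shown here.

```python
import json

# Per-attribute fields requested in the manifest above.
FIELDS = ("threshold", "value", "whenfailed", "worst",
          "type", "updated", "raw", "num")


def build_task_manifest(interval="1s",
                        out_file="/tmp/published_pysmart",
                        fields=FIELDS):
    """Return a task manifest dict mirroring the example in this README."""
    # One wildcard metric per field: any device, any attribute.
    metrics = {"/intel/smartmon/devices/*/*/" + field: {} for field in fields}
    return {
        "version": 1,
        "schedule": {"type": "simple", "interval": interval},
        "workflow": {
            "collect": {
                "metrics": metrics,
                "publish": [
                    {"plugin_name": "file", "config": {"file": out_file}}
                ],
            }
        },
    }


if __name__ == "__main__":
    # Print the manifest; redirect the output to task-smart.json to use it.
    print(json.dumps(build_task_manifest(), indent=2))
```

For instance, `build_task_manifest(fields=("value", "raw"))` yields a manifest that requests only two fields per attribute.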
Start task:
$ snaptel task create -t task-smart.json
Using task manifest to create task
Task created
ID: c6d095a6-733d-40cf-a986-9c82aa64b4e2
Name: Task-c6d095a6-733d-40cf-a986-9c82aa64b4e2
State: Running
See the pysmart plugin task
$ snaptel task list
ID NAME STATE HIT MISS FAIL CREATED LAST FAILURE
c6d095a6-733d-40cf-a986-9c82aa64b4e2 Task-c6d095a6-733d-40cf-a986-9c82aa64b4e2 Running 9 0 0 10:39AM 2-23-2017
Watch the collection of the metrics
$ snaptel task watch c6d095a6-733d-40cf-a986-9c82aa64b4e2
The standard output stream shows the metrics as they are collected:
|intel|smartmon|devices|IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/RP06@1C,5/IOPP/SSD0@0/AppleAHCI/PRT0@0/IOAHCIDevice@0/AppleAHCIDiskDriver/IOAHCIBlockStorageDevice|Power-Off_Retract_Count|threshold 000 2017-02-23 10:45:44.698632001 -0800 PST
|intel|smartmon|devices|IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/RP06@1C,5/IOPP/SSD0@0/AppleAHCI/PRT0@0/IOAHCIDevice@0/AppleAHCIDiskDriver/IOAHCIBlockStorageDevice|Power-Off_Retract_Count|value 099 2017-02-23 10:45:44.698632001 -0800 PST
|intel|smartmon|devices|IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/RP06@1C,5/IOPP/SSD0@0/AppleAHCI/PRT0@0/IOAHCIDevice@0/AppleAHCIDiskDriver/IOAHCIBlockStorageDevice|Power_Cycle_Count|threshold 000 2017-02-23 10:45:44.698632001 -0800 PST
|intel|smartmon|devices|IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/RP06@1C,5/IOPP/SSD0@0/AppleAHCI/PRT0@0/IOAHCIDevice@0/AppleAHCIDiskDriver/IOAHCIBlockStorageDevice|Power_Cycle_Count|value 094 2017-02-23 10:45:44.698632001 -0800 PST
|intel|smartmon|devices|IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/RP06@1C,5/IOPP/SSD0@0/AppleAHCI/PRT0@0/IOAHCIDevice@0/AppleAHCIDiskDriver/IOAHCIBlockStorageDevice|Power_On_Hours|threshold 000 2017-02-23 10:45:44.698632001 -0800 PST
|intel|smartmon|devices|IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/RP06@1C,5/IOPP/SSD0@0/AppleAHCI/PRT0@0/IOAHCIDevice@0/AppleAHCIDiskDriver/IOAHCIBlockStorageDevice|Power_On_Hours|value 099 2017-02-23 10:45:44.698632001 -0800 PST
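Each record in the watch output above carries a namespace, a data value, and a timestamp. Note that the namespace is separated by `|` here, since the device name itself contains `/`. The Python sketch below splits one such record into its parts. `parse_watch_line` and `Sample` are hypothetical names, and the sketch assumes the record layout is exactly as shown above (namespace, then whitespace-separated value and timestamp); it is not part of the plugin or of snaptel.

```python
from collections import namedtuple

Sample = namedtuple("Sample", "device attribute field data timestamp")


def parse_watch_line(line):
    """Split one watch record into device, attribute, field, data, timestamp."""
    line = line.strip()
    if not line:
        raise ValueError("empty line")
    # The first character of the namespace is its separator ("|" above).
    sep = line[0]
    parts = line[1:].split(sep)
    if len(parts) != 6 or parts[:3] != ["intel", "smartmon", "devices"]:
        raise ValueError("unexpected namespace: %r" % line)
    device, attribute, tail = parts[3], parts[4], parts[5]
    # The tail holds "<field> <data> <timestamp...>"; the timestamp has spaces.
    field, data, timestamp = tail.split(None, 2)
    # Data is kept as a string so leading zeros such as "099" are preserved.
    return Sample(device, attribute, field, data, timestamp)


if __name__ == "__main__":
    record = ("|intel|smartmon|devices|/dev/sda|Power_On_Hours|"
              "value 099 2017-02-23 10:45:44.698632001 -0800 PST")
    print(parse_watch_line(record))
```

Records that do not match the expected layout raise `ValueError` instead of being guessed at, which makes format differences between Snap versions easier to spot.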
Stop task:
$ snaptel task stop c6d095a6-733d-40cf-a986-9c82aa64b4e2
Task stopped:
ID: c6d095a6-733d-40cf-a986-9c82aa64b4e2
Roadmap
There isn’t a current roadmap for this plugin, but it is in active development. As we launch this plugin, we do not have any outstanding requirements for the next release. If you have a feature request, please add it as an issue and/or submit a pull request.
Community Support
This repository is one of many plugins in Snap, a powerful telemetry framework. See the full project at http://github.com/intelsdi-x/snap. To reach out to other users, head to the main framework.
Contributing
We love contributions!
There’s more than one way to give back, from examples to blogs to code updates. See our recommended process in CONTRIBUTING.md.
License
Snap, along with this plugin, is open source software released under the Apache 2.0 License.
Acknowledgements
Authors: Samantha Alt, Joel Cooklin
And thank you! Your contribution, through code and participation, is incredibly important to us.