A framework for automated error detection and data collection
Node Scraper
Node Scraper is a tool that performs automated data collection and analysis for system debugging.
Table of Contents
- Installation
- CLI Usage
- Configs
- Extending Node Scraper (integration & external plugins) → See EXTENDING.md
- Plugin reference (each plugin's collectors and analyzers, plus the commands the collectors invoke) → See docs/PLUGIN_DOC.md
Installation
Install From Source
Node Scraper requires Python 3.9+. After cloning this repository, run the dev-setup.sh script with 'source'. The script creates an editable install of Node Scraper in a Python virtual environment and configures the project's pre-commit hooks.
source dev-setup.sh
Alternatively, follow these manual steps:
1. Virtual Environment (Optional)
python3 -m venv venv
source venv/bin/activate
On Debian/Ubuntu, you may need: sudo apt install python3-venv
2. Install from Source (Required)
python3 -m pip install --editable .[dev] --upgrade
This installs Node Scraper in editable mode with development dependencies. To verify the installation, run: node-scraper --help
3. Git Hooks (Optional)
pre-commit install
Sets up pre-commit hooks for code quality checks. On Debian/Ubuntu, you may need: sudo apt install pre-commit
CLI Usage
The Node Scraper CLI can be used to run Node Scraper plugins on a target system. The following CLI options are available:
usage: node-scraper [-h] [--sys-name STRING] [--sys-location {LOCAL,REMOTE}] [--sys-interaction-level {PASSIVE,INTERACTIVE,DISRUPTIVE}] [--sys-sku STRING]
[--sys-platform STRING] [--plugin-configs [STRING ...]] [--system-config STRING] [--connection-config STRING] [--log-path STRING]
[--log-level {CRITICAL,FATAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET}] [--gen-reference-config] [--skip-sudo]
{summary,run-plugins,describe,gen-plugin-config} ...
node scraper CLI
positional arguments:
{summary,run-plugins,describe,gen-plugin-config}
Subcommands
summary Generates summary csv file
run-plugins Run a series of plugins
describe Display details on a built-in config or plugin
gen-plugin-config Generate a config for a plugin or list of plugins
options:
-h, --help show this help message and exit
--sys-name STRING System name (default: <my_system_name>)
--sys-location {LOCAL,REMOTE}
Location of target system (default: LOCAL)
--sys-interaction-level {PASSIVE,INTERACTIVE,DISRUPTIVE}
Specify system interaction level, used to determine the type of actions that plugins can perform (default: INTERACTIVE)
--sys-sku STRING Manually specify SKU of system (default: None)
--sys-platform STRING
Specify system platform (default: None)
--plugin-configs [STRING ...]
built-in config names or paths to plugin config JSONs. Available built-in configs: AllPlugins, NodeStatus (default: None)
--system-config STRING
Path to system config json (default: None)
--connection-config STRING
Path to connection config json (default: None)
--log-path STRING Specifies local path for node scraper logs, use 'None' to disable logging (default: .)
--log-level {CRITICAL,FATAL,ERROR,WARN,WARNING,INFO,DEBUG,NOTSET}
Change python log level (default: INFO)
--gen-reference-config
Generate reference config from system. Writes to ./reference_config.json. (default: False)
--skip-sudo Skip plugins that require sudo permissions (default: False)
Execution Methods
Node Scraper can operate in two modes: LOCAL and REMOTE, determined by the --sys-location argument.
- LOCAL (default): Node Scraper is installed and run directly on the target system. All data collection and plugin execution occur locally.
- REMOTE: Node Scraper runs on your local machine but targets a remote system over SSH. In this mode, Node Scraper does not need to be installed on the remote system; all commands are executed remotely via SSH.
To use remote execution, specify --sys-location REMOTE and provide a connection configuration file with --connection-config.
Example: Remote Execution
node-scraper --sys-name <remote_host> --sys-location REMOTE --connection-config ./connection_config.json run-plugins DmesgPlugin
Example: connection_config.json
In-band (SSH) connection:
{
"InBandConnectionManager": {
"hostname": "remote_host.example.com",
"port": 22,
"username": "myuser",
"password": "mypassword",
"key_filename": "/path/to/private/key"
}
}
Redfish (BMC) connection for Redfish-only plugins:
{
"RedfishConnectionManager": {
"host": "bmc.example.com",
"port": 443,
"username": "admin",
"password": "secret",
"use_https": true,
"verify_ssl": true,
"api_root": "redfish/v1"
}
}
api_root (optional): Redfish API path (e.g. redfish/v1). If omitted, the default redfish/v1 is used. Override this when your BMC uses a different API version path.
Notes:
- If using SSH keys, specify key_filename instead of password.
- The remote user must have permissions to run the requested plugins and access required files. If needed, use the --skip-sudo argument to skip plugins requiring sudo.
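To avoid hand-editing JSON, you can generate connection_config.json for in-band (SSH) execution with a short Python script. This is a minimal sketch. The helper names make_inband_config and write_config are hypothetical and not part of Node Scraper. The script only reproduces the InBandConnectionManager fields shown above and, following the notes, writes key_filename in place of password when a key is given.

```python
import json
from typing import Optional


def make_inband_config(
    hostname: str,
    username: str,
    port: int = 22,
    password: Optional[str] = None,
    key_filename: Optional[str] = None,
) -> dict:
    """Build an InBandConnectionManager section (hypothetical helper).

    When a key file is given it is used instead of the password, as the
    notes above recommend; at least one of the two is required.
    """
    if key_filename is None and password is None:
        raise ValueError("provide either key_filename or password")
    settings = {"hostname": hostname, "port": port, "username": username}
    if key_filename is not None:
        settings["key_filename"] = key_filename
    else:
        settings["password"] = password
    return {"InBandConnectionManager": settings}


def write_config(config: dict, path: str) -> None:
    """Write the config as pretty-printed JSON for use with --connection-config."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
```

For example, the following call produces a file usable with the remote execution example above:

write_config(make_inband_config("remote_host.example.com", "myuser", key_filename="/path/to/private/key"), "connection_config.json")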
Subcommands
Plugins to run can be specified in two ways: with a plugin JSON config file (--plugin-configs) or with the 'run-plugins' subcommand. The two options are not mutually exclusive and can be used together.
'describe' subcommand
You can use the describe subcommand to display details about built-in configs or plugins.
List all built-in configs:
node-scraper describe config
Show details for a specific built-in config:
node-scraper describe config <config-name>
List all available plugins:
node-scraper describe plugin
Show details for a specific plugin:
node-scraper describe plugin <plugin-name>
'run-plugins' sub command
The plugins to run and their associated arguments can also be specified directly on the CLI using the 'run-plugins' sub-command. Using this sub-command you can specify a plugin name followed by the arguments for that particular plugin. Multiple plugins can be specified at once.
You can view the available arguments for a particular plugin by running:
node-scraper run-plugins <plugin-name> -h
For example, the output for BiosPlugin:
usage: node-scraper run-plugins BiosPlugin [-h] [--collection {True,False}] [--analysis {True,False}] [--system-interaction-level STRING]
[--data STRING] [--exp-bios-version [STRING ...]] [--regex-match {True,False}]
options:
-h, --help show this help message and exit
--collection {True,False}
--analysis {True,False}
--system-interaction-level STRING
--data STRING
--exp-bios-version [STRING ...]
--regex-match {True,False}
Examples
Run a single plugin
node-scraper run-plugins BiosPlugin --exp-bios-version TestBios123
Run multiple plugins
node-scraper run-plugins BiosPlugin --exp-bios-version TestBios123 RocmPlugin --exp-rocm TestRocm123
Run plugins without specifying args (plugin defaults will be used)
node-scraper run-plugins BiosPlugin RocmPlugin
Use plugin configs and 'run-plugins' together:
node-scraper --plugin-configs plugin_config.json run-plugins BiosPlugin
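The run-plugins syntax chains each plugin name with its own arguments. When the command is built from a script, a sketch like the one below can assemble it. It assumes snake_case argument names map to the kebab-case flags shown in the help output above (exp_bios_version becomes --exp-bios-version). It also assumes booleans are passed as the literal strings True/False accepted by choices such as --collection {True,False}. build_run_plugins_argv is a hypothetical helper, not part of Node Scraper.

```python
import shlex
from typing import Any, Dict, List


def build_run_plugins_argv(plugins: Dict[str, Dict[str, Any]]) -> List[str]:
    """Turn {plugin: {arg: value}} into a node-scraper run-plugins argv list.

    Hypothetical helper: snake_case keys become --kebab-case flags, list values
    are passed as several tokens (as with --exp-bios-version [STRING ...]), and
    a plugin with no arguments simply runs with its defaults.
    """
    argv = ["node-scraper", "run-plugins"]
    for plugin_name, args in plugins.items():
        argv.append(plugin_name)
        for key, value in args.items():
            argv.append("--" + key.replace("_", "-"))
            values = value if isinstance(value, (list, tuple)) else [value]
            argv.extend(str(v) for v in values)  # str(True) -> "True"
    return argv


if __name__ == "__main__":
    cmd = build_run_plugins_argv(
        {
            "BiosPlugin": {"exp_bios_version": "TestBios123"},
            "RocmPlugin": {"exp_rocm": "TestRocm123"},
        }
    )
    # Same command as the "Run multiple plugins" example above
    print(shlex.join(cmd))
```

Returning an argv list instead of a single string lets you pass it directly to subprocess.run without shell quoting issues.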
'gen-plugin-config' sub command
The 'gen-plugin-config' subcommand generates a plugin config JSON file for a plugin or list of plugins, which you can then customize. Plugin arguments that have default values are prepopulated in the JSON file; arguments without default values are set to null.
Examples
Generate a config for the DmesgPlugin:
node-scraper gen-plugin-config --plugins DmesgPlugin
This would produce the following config:
{
"global_args": {},
"plugins": {
"DmesgPlugin": {
"collection": true,
"analysis": true,
"system_interaction_level": "INTERACTIVE",
"data": null,
"analysis_args": {
"analysis_range_start": null,
"analysis_range_end": null,
"check_unknown_dmesg_errors": true,
"exclude_category": null,
"interval_to_collapse_event": 60,
"num_timestamps": 3
}
}
},
"result_collators": {}
}
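Arguments without defaults come out as null, so a short Python sketch can list which values you may still want to fill in before using the generated config. It works on any JSON in the shape shown above. find_null_args and null_args_in_file are hypothetical helpers. null is a legitimate value for optional arguments, so treat the output as a checklist rather than a list of errors.

```python
import json
from typing import Any, List


def find_null_args(node: Any, path: str = "") -> List[str]:
    """Return the dotted path of every null (None) value in a config tree."""
    if node is None:
        return [path]
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            found.extend(find_null_args(value, f"{path}.{key}" if path else key))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_null_args(item, f"{path}[{index}]"))
    return found


def null_args_in_file(config_path: str) -> List[str]:
    """Load a config produced by gen-plugin-config and list its null plugin args."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    return find_null_args(config.get("plugins", {}))
```

For the DmesgPlugin config above, this would report four paths:
- DmesgPlugin.data
- DmesgPlugin.analysis_args.analysis_range_start
- DmesgPlugin.analysis_args.analysis_range_end
- DmesgPlugin.analysis_args.exclude_category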
Running DmesgPlugin with a dmesg log file:
Instead of collecting dmesg from the system, you can analyze a pre-existing dmesg log file using the --data argument:
node-scraper run-plugins DmesgPlugin --data /path/to/dmesg.log --collection False
This will skip the collection phase and directly analyze the provided dmesg.log file.
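The same offline analysis can likely be expressed as a plugin config instead of CLI arguments. This Python sketch assumes that the "collection" and "data" keys emitted by gen-plugin-config behave like the --collection and --data arguments; that mapping is an assumption, not confirmed here. offline_dmesg_config is a hypothetical helper.

```python
import json


def offline_dmesg_config(log_path: str) -> dict:
    """Build a DmesgPlugin config that analyzes an existing log file.

    Assumes the "collection" and "data" plugin keys (as emitted by
    gen-plugin-config) mirror the --collection and --data CLI arguments.
    """
    return {
        "global_args": {},
        "plugins": {
            "DmesgPlugin": {
                "collection": False,  # skip collecting dmesg from the system
                "analysis": True,     # still run the analyzer
                "data": log_path,     # pre-existing dmesg log to analyze
            }
        },
        "result_collators": {},
    }


if __name__ == "__main__":
    print(json.dumps(offline_dmesg_config("/path/to/dmesg.log"), indent=2))
```

Save the printed JSON under any name, for example dmesg_offline.json (a hypothetical file name). Then pass it with node-scraper --plugin-configs dmesg_offline.json.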
Custom Error Regex Example:
You can extend the built-in error detection with custom regex patterns. Create a config file with custom error patterns:
{
"global_args": {},
"plugins": {
"DmesgPlugin": {
"analysis_args": {
"check_unknown_dmesg_errors": false,
"interval_to_collapse_event": 60,
"num_timestamps": 3,
"error_regex": [
{
"regex": "MY_CUSTOM_ERROR.*",
"message": "My Custom Error Detected",
"event_category": "SW_DRIVER",
"event_priority": 3
},
{
"regex": "APPLICATION_CRASH: .*",
"message": "Application Crash",
"event_category": "SW_DRIVER",
"event_priority": 4
}
]
}
}
},
"result_collators": {}
}
Save this to dmesg_custom_config.json and run:
node-scraper --plugin-configs dmesg_custom_config.json run-plugins DmesgPlugin
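Before running, it can help to confirm that each custom pattern compiles and matches the log lines you expect. This Python sketch uses re.search, which may not match exactly how DmesgPlugin applies error_regex entries. Treat it as a pre-flight check for typos and overly broad patterns. check_error_regexes and load_error_regex are hypothetical helpers.

```python
import json
import re
from typing import Dict, List


def check_error_regexes(error_regex: List[dict], sample_lines: List[str]) -> Dict[str, List[str]]:
    """Compile each custom pattern and return the sample lines it matches.

    Keys are the "message" fields; re.compile raises re.error on a bad pattern.
    """
    hits: Dict[str, List[str]] = {}
    for entry in error_regex:
        pattern = re.compile(entry["regex"])
        hits[entry["message"]] = [line for line in sample_lines if pattern.search(line)]
    return hits


def load_error_regex(config_path: str) -> List[dict]:
    """Read the error_regex list from a config such as dmesg_custom_config.json."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    return config["plugins"]["DmesgPlugin"]["analysis_args"].get("error_regex", [])
```

A pattern that matches nothing in a representative log, or that matches nearly every line, is worth revisiting before the config is used.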
'compare-runs' subcommand
The compare-runs subcommand compares datamodels from two run log directories (e.g. two nodescraper_log_* folders). By default, all plugins with data in both runs are compared.
Basic usage:
node-scraper compare-runs <path1> <path2>
Exclude specific plugins from the comparison with --skip-plugins:
node-scraper compare-runs path1 path2 --skip-plugins SomePlugin
Compare only certain plugins with --include-plugins:
node-scraper compare-runs path1 path2 --include-plugins DmesgPlugin
Show full diff output (no truncation of the Message column or limit on number of errors) with --dont-truncate:
node-scraper compare-runs path1 path2 --include-plugins DmesgPlugin --dont-truncate
You can pass multiple plugin names to --skip-plugins or --include-plugins.
'show-redfish-oem-allowable' subcommand
The show-redfish-oem-allowable subcommand fetches the list of OEM diagnostic types supported by your BMC (from the Redfish LogService OEMDiagnosticDataType@Redfish.AllowableValues). Use it to discover which types you can put in oem_diagnostic_types_allowable and oem_diagnostic_types in the Redfish OEM diag plugin config.
Requirements: A Redfish connection config (same as for RedfishOemDiagPlugin).
Command:
node-scraper --connection-config connection-config.json show-redfish-oem-allowable --log-service-path "redfish/v1/Systems/UBB/LogServices/DiagLogs"
Output is a JSON array of allowable type names (e.g. ["Dmesg", "JournalControl", "AllLogs", ...]). Copy that list into your plugin config’s oem_diagnostic_types_allowable if you want to match your BMC.
Redfish OEM diag plugin config example
Use a plugin config that points at your LogService and lists the types to collect. Logs are written under the run log path (see --log-path).
{
"name": "Redfish OEM diagnostic logs",
"desc": "Collect OEM diagnostic logs from Redfish LogService. Requires Redfish connection config.",
"global_args": {},
"plugins": {
"RedfishOemDiagPlugin": {
"collection_args": {
"log_service_path": "redfish/v1/Systems/UBB/LogServices/DiagLogs",
"oem_diagnostic_types_allowable": [
"JournalControl",
...
"AllLogs"
],
"oem_diagnostic_types": ["JournalControl", "AllLogs"],
"task_timeout_s": 600
},
"analysis_args": {
"require_all_success": false
}
}
},
"result_collators": {}
}
- log_service_path: Redfish path to the LogService (e.g. DiagLogs). Must match your system (e.g. UBB vs. another system ID).
- oem_diagnostic_types_allowable: Full list of types the BMC supports (from show-redfish-oem-allowable or vendor docs).
- oem_diagnostic_types: Subset of types to collect on each run (e.g. ["JournalControl", "AllLogs"]).
- task_timeout_s: Maximum number of seconds to wait per collection task.
How to use
- Discover allowable types (optional): run show-redfish-oem-allowable and paste the output into oem_diagnostic_types_allowable in your plugin config.
- Set oem_diagnostic_types to the list you want to collect (e.g. ["JournalControl", "AllLogs"]).
- Run the plugin with a Redfish connection config and your plugin config:
node-scraper --connection-config connection-config.json --plugin-configs plugin_config_redfish_oem_diag.json run-plugins RedfishOemDiagPlugin
- Use --log-path to choose where run logs (and OEM diag archives) are written; archives go under <log-path>/scraper_logs_<name>_<timestamp>/redfish_oem_diag_plugin/redfish_oem_diag_collector/.
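The first two steps can be scripted. This Python sketch copies the JSON array printed by show-redfish-oem-allowable into oem_diagnostic_types_allowable. It then reports any entry in oem_diagnostic_types that the BMC does not list. The sketch assumes you captured only the JSON array from the subcommand's output. apply_allowable_types is a hypothetical helper.

```python
import json
from typing import List


def apply_allowable_types(config: dict, allowable_json: str) -> List[str]:
    """Update a RedfishOemDiagPlugin config in place from the subcommand output.

    Returns the requested oem_diagnostic_types that are not in the allowable list.
    """
    allowable = json.loads(allowable_json)
    if not isinstance(allowable, list):
        raise ValueError("expected a JSON array of type names")
    args = config["plugins"]["RedfishOemDiagPlugin"]["collection_args"]
    args["oem_diagnostic_types_allowable"] = allowable
    return [t for t in args.get("oem_diagnostic_types", []) if t not in allowable]
```

An empty return value means every type you asked for is advertised by the BMC. Any names it returns are likely typos or types your BMC does not support.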
RedfishEndpointPlugin
The RedfishEndpointPlugin collects Redfish URIs (GET responses) and optionally runs checks on the returned JSON. It requires a Redfish connection config (same as RedfishOemDiagPlugin).
How to run
- Create a connection config (e.g. connection-config.json) with RedfishConnectionManager and your BMC host, credentials, and API root.
- Create a plugin config with uris to collect and optional checks for analysis (see the example below); for example, save it as plugin_config_redfish_endpoint.json.
- Run:
node-scraper --connection-config connection-config.json --plugin-configs plugin_config_redfish_endpoint.json run-plugins RedfishEndpointPlugin
Sample plugin config (plugin_config_redfish_endpoint.json):
{
"name": "RedfishEndpointPlugin",
"desc": "Redfish endpoint: collect URIs and optional checks",
"global_args": {},
"plugins": {
"RedfishEndpointPlugin": {
"collection_args": {
"uris": [
"/redfish/v1/",
"/redfish/v1/Systems/1",
"/redfish/v1/Chassis/1/Power"
]
},
"analysis_args": {
"checks": {
"/redfish/v1/Systems/1": {
"PowerState": "On",
"Status/Health": { "anyOf": ["OK", "Warning"] }
},
"/redfish/v1/Chassis/1/Power": {
"PowerControl/0/PowerConsumedWatts": { "max": 1000 }
}
}
}
}
},
"result_collators": {}
}
- uris: List of Redfish paths (e.g. /redfish/v1/, /redfish/v1/Systems/1) to GET and store.
- checks: Optional. Map of URI to expected values or constraints for analysis. Supports exact match (e.g. "PowerState": "On"), anyOf, min/max, etc.
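To make the checks semantics concrete, here is a Python sketch of how one URI's JSON response could be evaluated against its checks entry. It is an illustration, not the plugin's implementation, and it rests on several assumptions:
- Keys such as "Status/Health" and "PowerControl/0/PowerConsumedWatts" are slash-separated paths into the response, and numeric segments index arrays.
- A missing field counts as a failed check.
- Only exact match, anyOf, and min/max are covered.

```python
from typing import Any, Dict


def resolve(doc: Any, path: str) -> Any:
    """Follow a slash-separated path such as "PowerControl/0/PowerConsumedWatts"."""
    node = doc
    for part in path.split("/"):
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def check_value(actual: Any, expected: Any) -> bool:
    """Evaluate one constraint: exact value, {"anyOf": [...]}, or {"min"/"max": n}."""
    if isinstance(expected, dict):
        if "anyOf" in expected and actual not in expected["anyOf"]:
            return False
        if "min" in expected and actual < expected["min"]:
            return False
        if "max" in expected and actual > expected["max"]:
            return False
        return True
    return actual == expected


def run_checks(doc: dict, checks: Dict[str, Any]) -> Dict[str, bool]:
    """Return {path: passed} for every check against a single URI's response."""
    results: Dict[str, bool] = {}
    for path, expected in checks.items():
        try:
            results[path] = check_value(resolve(doc, path), expected)
        except (KeyError, IndexError, ValueError, TypeError):
            results[path] = False  # missing or non-comparable field fails the check
    return results
```

For example, a /redfish/v1/Chassis/1/Power response reporting 850 W would pass the {"max": 1000} check from the sample config.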
'summary' sub command
The 'summary' subcommand combines results from multiple runs of node-scraper into a single summary.csv file. Sample run:
node-scraper summary --search-path /<path_to_node-scraper_logs>
This generates a new file, '/<path_to_node-scraper_logs>/summary.csv', containing the results from all 'nodescraper.csv' files under '/<path_to_node-scraper_logs>'.
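Conceptually, the summary step is a concatenation of CSV files. The Python sketch below approximates it with the standard csv and pathlib modules. It assumes every nodescraper.csv under the search path uses the same columns. The real subcommand may add, drop, or reorder columns, so use the built-in command for actual reports. combine_csvs is a hypothetical helper.

```python
import csv
from pathlib import Path
from typing import List, Optional


def combine_csvs(search_path: str, output_name: str = "summary.csv") -> Path:
    """Concatenate every nodescraper.csv found under search_path into one file.

    Keeps the first header row it sees and assumes all files share the same columns.
    """
    root = Path(search_path)
    header: Optional[List[str]] = None
    rows: List[List[str]] = []
    for csv_path in sorted(root.rglob("nodescraper.csv")):
        with csv_path.open(newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if file_header is None:
                continue  # empty file: nothing to merge
            if header is None:
                header = file_header
            rows.extend(reader)
    out_path = root / output_name
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)
    return out_path
```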
Configs
A plugin JSON config should follow the structure of the plugin config model. The global_args field is a dictionary of global key-value pairs; values in global_args are passed to any plugin that supports the corresponding key. The plugins field is a dictionary mapping plugin names to sub-dictionaries of plugin arguments. Lastly, the result_collators field defines result collator classes that are run on the plugin results. By default, the CLI adds the TableSummary result collator, which prints a tabular summary of each plugin’s results to the console.
{
"global_args": {},
"plugins": {
"BiosPlugin": {
"analysis_args": {
"exp_bios_version": "TestBios123"
}
},
"RocmPlugin": {
"analysis_args": {
"exp_rocm": "TestRocm123"
}
}
}
}
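A misspelled top-level key (for example globals_args instead of global_args) is easy to miss. A small Python pre-flight check can catch it. The set of known keys below is an assumption drawn only from the configs shown in this README, not from the plugin config model itself, so extend it if the model defines more. lint_plugin_config and lint_file are hypothetical helpers.

```python
import json
from typing import List

# Top-level keys used by the example configs in this README (assumed, not exhaustive).
KNOWN_TOP_LEVEL_KEYS = {"global_args", "plugins", "result_collators", "name", "desc"}


def lint_plugin_config(config: dict) -> List[str]:
    """Return human-readable problems found in a plugin config dict."""
    problems = [
        f"unknown top-level key: {key!r}" for key in config if key not in KNOWN_TOP_LEVEL_KEYS
    ]
    plugins = config.get("plugins")
    if not isinstance(plugins, dict):
        problems.append("'plugins' must be an object mapping plugin names to arguments")
    else:
        for name, args in plugins.items():
            if not isinstance(args, dict):
                problems.append(f"arguments for {name!r} must be an object")
    return problems


def lint_file(path: str) -> List[str]:
    """Load a plugin config JSON file and lint it."""
    with open(path, encoding="utf-8") as f:
        return lint_plugin_config(json.load(f))
```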
Global args
Global args can be used to skip plugins that require sudo, or to enable or disable collection or analysis. Below is an example that skips sudo-requiring plugins and disables analysis.
"global_args": {
"collection_args": {
"skip_sudo" : 1
},
"collection" : 1,
"analysis" : 0
},
Plugin config: '--plugin-configs' command
A plugin config can be used to compare the system data against the config specifications. Built-in configs include NodeStatus (a subset of plugins) and AllPlugins (runs every registered plugin with default arguments; useful for generating a reference config from the full system).
Using a JSON file:
node-scraper --plugin-configs plugin_config.json
Here is an example of a comprehensive plugin config that specifies analyzer args for each plugin:
{
"global_args": {},
"plugins": {
"BiosPlugin": {
"analysis_args": {
"exp_bios_version": "3.5"
}
},
"CmdlinePlugin": {
"analysis_args": {
"cmdline": "imgurl=test NODE=nodename selinux=0 serial console=ttyS1,115200 console=tty0",
"required_cmdline" : "selinux=0"
}
},
"DkmsPlugin": {
"analysis_args": {
"dkms_status": "amdgpu/6.11",
"dkms_version" : "dkms-3.1",
"regex_match" : true
}
},
"KernelPlugin": {
"analysis_args": {
"exp_kernel": "5.11-generic"
}
},
"OsPlugin": {
"analysis_args": {
"exp_os": "Ubuntu 22.04.2 LTS"
}
},
"PackagePlugin": {
"analysis_args": {
"exp_package_ver": {
"gcc": "11.4.0"
},
"regex_match": false
}
},
"RocmPlugin": {
"analysis_args": {
"exp_rocm": "6.5"
}
}
},
"result_collators": {},
"name": "plugin_config",
"desc": "My golden config"
}
Reference config: 'gen-reference-config' command
This command can be used to generate a reference config that is populated with current system configurations. Plugins that use analyzer args (where applicable) will be populated with system data.
Run all registered plugins (AllPlugins config):
node-scraper --plugin-configs AllPlugins
Generate a reference config for specific plugins:
node-scraper --gen-reference-config run-plugins BiosPlugin OsPlugin
This will generate the following config:
{
"global_args": {},
"plugins": {
"BiosPlugin": {
"analysis_args": {
"exp_bios_version": [
"M17"
],
"regex_match": false
}
},
"OsPlugin": {
"analysis_args": {
"exp_os": [
"8.10"
],
"exact_match": true
}
}
},
"result_collators": {}
}
This config can later be used on a different platform for comparison, using the '--plugin-configs' option described above:
node-scraper --plugin-configs reference_config.json
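Before applying a reference config to another platform, it can help to see where two generated configs disagree. This Python sketch assumes both files follow the layout shown above and compares only analysis_args. That differs from compare-runs, which compares collected datamodels. diff_analysis_args and diff_files are hypothetical helpers.

```python
import json
from typing import Any, Dict, Tuple

Diff = Dict[str, Dict[str, Tuple[Any, Any]]]


def diff_analysis_args(ref: dict, other: dict) -> Diff:
    """Compare analysis_args plugin by plugin: {plugin: {arg: (ref_value, other_value)}}."""
    diffs: Diff = {}
    ref_plugins = ref.get("plugins", {})
    other_plugins = other.get("plugins", {})
    for plugin in sorted(set(ref_plugins) | set(other_plugins)):
        a = ref_plugins.get(plugin, {}).get("analysis_args", {})
        b = other_plugins.get(plugin, {}).get("analysis_args", {})
        changed = {
            key: (a.get(key), b.get(key))
            for key in sorted(set(a) | set(b))
            if a.get(key) != b.get(key)
        }
        if changed:
            diffs[plugin] = changed
    return diffs


def diff_files(ref_path: str, other_path: str) -> Diff:
    """Load two reference config files and diff their analysis_args."""
    with open(ref_path, encoding="utf-8") as f1, open(other_path, encoding="utf-8") as f2:
        return diff_analysis_args(json.load(f1), json.load(f2))
```

An argument missing on one side shows up as None in the corresponding tuple position.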
An alternate way to generate a reference config is by using log files from a previous run. The example below uses log files from 'scraper_logs_
node-scraper gen-plugin-config --gen-reference-config-from-logs scraper_logs_<path>/ --output-path custom_output_dir
This will generate a reference config that includes plugins with logged results in 'scraper_log_
Download files
File details
Details for the file amd_node_scraper-1.1.2.tar.gz.
File metadata
- Download URL: amd_node_scraper-1.1.2.tar.gz
- Upload date:
- Size: 358.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f1958fabfdb16677615870122e0a57d12a7552ca43562ca4a6c460d7d35ec0d9 |
| MD5 | 0c0efce74bc81589c245baa3ab292b49 |
| BLAKE2b-256 | 9f413daa7263fb86f189cb9ef7ee88d78021ad8a8d28b80fd38e4a7b9760d6f9 |
Provenance
The following attestation bundles were made for amd_node_scraper-1.1.2.tar.gz:
Publisher: release-trusted-publisher.yml on amd/node-scraper
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: amd_node_scraper-1.1.2.tar.gz
- Subject digest: f1958fabfdb16677615870122e0a57d12a7552ca43562ca4a6c460d7d35ec0d9
- Sigstore transparency entry: 1108417150
- Sigstore integration time:
- Permalink: amd/node-scraper@ce264377ed21a9934767a5fcdd16c6753955a47c
- Branch / Tag: refs/heads/main
- Owner: https://github.com/amd
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-trusted-publisher.yml@ce264377ed21a9934767a5fcdd16c6753955a47c
- Trigger Event: workflow_dispatch
File details
Details for the file amd_node_scraper-1.1.2-py3-none-any.whl.
File metadata
- Download URL: amd_node_scraper-1.1.2-py3-none-any.whl
- Upload date:
- Size: 455.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ca987157fba7184883ba5e0415e2a7c83a181ca785696d29ae1f6c1e2682943f |
| MD5 | 429db5948f791ffc2033d21e8c2ebead |
| BLAKE2b-256 | e85bf4ccf283209a0725c08c7c84b515ac647a8662213d52ecfe76d6b0ee9d9a |
Provenance
The following attestation bundles were made for amd_node_scraper-1.1.2-py3-none-any.whl:
Publisher: release-trusted-publisher.yml on amd/node-scraper
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: amd_node_scraper-1.1.2-py3-none-any.whl
- Subject digest: ca987157fba7184883ba5e0415e2a7c83a181ca785696d29ae1f6c1e2682943f
- Sigstore transparency entry: 1108417157
- Sigstore integration time:
- Permalink: amd/node-scraper@ce264377ed21a9934767a5fcdd16c6753955a47c
- Branch / Tag: refs/heads/main
- Owner: https://github.com/amd
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-trusted-publisher.yml@ce264377ed21a9934767a5fcdd16c6753955a47c
- Trigger Event: workflow_dispatch