
A CLI tool that helps capture metrics from the Operating System

Project description

pmeter

A Python tool that measures TCP and UDP network metrics

CSE-603 PDP Project

Contributors

Deepika Ghodki, Aman Harsh, Neha Mishra, Jacob Goldverg

Links to Relevant Papers

  1. Historical Analysis and Real-Time Tuning
  2. Cheng, Liang and Marsic, Ivan. "Java-based Tools for Accurate Bandwidth Measurement of Digital Subscriber Line Networks." 1 Jan. 2002: 333–344.
  3. Energy-saving Cross-layer Optimization of Big Data Transfer Based on Historical Log Analysis
  4. Cross-layer Optimization of Big Data Transfer Throughput and Energy Consumption
  5. HARP: Predictive Transfer Optimization Based on Historical Analysis and Real-time Probing

The Problem

Currently, the OneDataShare Transfer-Services do not collect or report (to the AWS deployment) the network state they experience. Tools such as "sar" and "ethtool" report metrics such as ping, bandwidth, latency, link capacity, and RTT, which let users understand bottlenecks in their network.

Metrics we collect

  1. Kernel level:
    • Active cores
    • CPU frequency
    • Energy consumption
    • CPU architecture
  2. Application level:
    • Pipelining
    • Concurrency
    • Parallelism
    • Chunk size
  3. Network level:
    • RTT
    • Bandwidth
    • BDP (link capacity * RTT; see the sketch after these lists)
    • Packet loss rate
    • Link capacity
  4. Data characteristics:
    • Number of files
    • Total size of transfer
    • Average file size
    • Standard deviation of file sizes
    • File types in the transfer

End System Resource Usage:

  • % of CPUs used
  • % of NIC used
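To make two of the derived metrics concrete, here is a minimal Python sketch that computes BDP and the file-size statistics; the sample values and variable names are illustrative assumptions, not pmeter's actual code.

```python
import statistics

# BDP = link capacity * RTT. Assumed sample values: 1 Gbit/s link, 40 ms RTT.
link_capacity_bps = 1_000_000_000   # link capacity in bits per second
rtt_s = 0.040                       # round-trip time in seconds
bdp_bits = link_capacity_bps * rtt_s
print(f"BDP: {bdp_bits / 8 / 2**20:.2f} MiB in flight")

# Data characteristics for a hypothetical batch of files (sizes in bytes).
file_sizes = [10_485_760, 52_428_800, 1_048_576, 209_715_200]
print("Number of files:", len(file_sizes))
print("Total transfer size:", sum(file_sizes))
print("Average file size:", statistics.mean(file_sizes))
print("Std deviation of file sizes:", statistics.stdev(file_sizes))
```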

Solution

We initially explored three solutions and decided that solution 1 would be sufficient and provide accurate enough metrics.

Solution 1: Write a Python script that the Transfer-Service runs as a CRON job to collect the network conditions periodically. The script creates a formatted metric-report file; the Transfer-Service then reads/parses that file and sends the data to CockroachDB/Prometheus running on the AWS backend.

The current state of the project: we have a Python script that supports kernel and some network-level metrics. The script generates a file in the user's home directory under ~/.pmeter/pmeter_measure.txt, which stores a JSON dump of the ODS_Metrics object. The CLI can run for a set number of measurements or for a certain amount of time. Every "row" of the file is a new ODS_Metrics object storing a new measurement. The Transfer-Service then parses and cleans up this file, appends its own data to the object (file count, types of files, etc.), and stores the data in InfluxDB/CockroachDB.
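Since every row of the file is a standalone JSON dump, the consuming side can treat it as newline-delimited JSON. Below is a minimal sketch of that parsing step; the field names ("rtt", "latency") are assumptions for illustration, not the actual ODS_Metrics schema.

```python
import json
from pathlib import Path

METRICS_FILE = Path.home() / ".pmeter" / "pmeter_measure.txt"

def read_measurements(path=METRICS_FILE):
    """Yield one dict per line; each line is a JSON dump of an ODS_Metrics object."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:                      # skip blank lines
                yield json.loads(line)

for metrics in read_measurements():
    # Field names here are hypothetical; inspect a real dump for the actual keys.
    print(metrics.get("rtt"), metrics.get("latency"))
```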

The aggregator service is a publisher to InfluxDB, running in the OneDataShare (ODS) VPC, which summarizes and computes the data so we can perform some visualization. We currently have a graph being generated from one metric (latency), and we will now begin to explore ML.
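For context, writing a latency point with the Python influxdb client looks roughly like the following; the host, database name, and tag/field names are placeholders, not our deployment's actual configuration.

```python
from datetime import datetime, timezone
from influxdb import InfluxDBClient

# Placeholder connection details; the real aggregator runs inside the ODS VPC.
client = InfluxDBClient(host="localhost", port=8086, database="ods_metrics")

point = {
    "measurement": "network",
    "tags": {"host": "transfer-node-1"},           # hypothetical tag
    "time": datetime.now(timezone.utc).isoformat(),
    "fields": {"latency_ms": 12.7},                # hypothetical field
}
client.write_points([point])
```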

Recap

Before exploring ML models, we revisited the problems we are attempting to solve:

  1. What parameters (concurrency, pipelining, parallelism, and chunk size) are optimal for performing a big data file transfer?
  2. What network conditions is a given host experiencing?

The Data:

The data we are generating is what is commonly referred to as "time-series data". InfluxDB defines this type of data as: "Time series data, also referred to as time-stamped data, is a sequence of data points indexed in time order. Time-stamped data is data collected at different points in time." It is essentially data that represents a snapshot of something at a moment in time; in our case, that something is the kernel/network conditions the Operating System is experiencing.

Example Graphs

[Example graphs: Latency and CPU Frequency; RTT; RTT Over Time; Example Query]

Challenges per Solution

Solution 1: We expect the metrics to be less accurate than a manual implementation inside the Java application. Since UDP/TCP behavior is dynamic, using separate connections (Python sockets vs. Java sockets) will create variability in the measurements. Another source of variability is that measuring from another language only provides an estimate of the performance the Transfer-Service experiences, as the Java side is completely virtualized. The benefit of this approach is that Python has many more network-measurement libraries.

  1. Bandwidth is still only realizable bandwidth for the ODS Transfer-Service.
  2. Ping traditionally uses ICMP, which requires elevated permissions; if the process cannot send ICMP pings, we fall back to TCP ping, which is less accurate but better than nothing (see the sketch after this list).
  3. We are still observing the difference between CDB and InfluxDB in terms of extrapolating data. We currently fully support both database types and are now load-testing each to see how performance holds up.
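A TCP "ping" simply times a TCP handshake to a known open port. The sketch below is a minimal illustration of the idea, not pmeter's actual implementation; the target host and port are placeholders.

```python
import socket
import time

def tcp_ping(host, port=443, timeout=2.0):
    """Return the TCP connect time in milliseconds (a rough RTT proxy)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0

# Placeholder target; any host with an open TCP port works.
print(f"TCP ping: {tcp_ping('example.com'):.1f} ms")
```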

TO-DO

  1. Run the CLI on a DTN at CCR for 1 week to gather some data.
  2. Explore various regressions that would let us extrapolate relationships between values (see the sketch after this list).
  3. Create a set of graphs (with types) to summarize the conditions the host experiences over time.
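As a starting point for item 2, a simple least-squares fit over one metric is enough to expose a trend. The sketch below fits a line to hypothetical RTT samples with numpy; real input would come from our InfluxDB data.

```python
import numpy as np

# Hypothetical RTT samples (ms) taken at one-minute intervals.
minutes = np.arange(10)
rtt_ms = np.array([40.1, 41.3, 39.8, 42.0, 43.5, 42.9, 44.2, 45.0, 44.7, 46.1])

# Fit a degree-1 polynomial (a line): rtt ~ slope * t + intercept.
slope, intercept = np.polyfit(minutes, rtt_ms, 1)
print(f"RTT trend: {slope:.2f} ms per minute (intercept {intercept:.1f} ms)")
```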

Libraries to be used per solution

  • ping: allows measurement of packet loss and latency.
  • psutil: a library that exposes kernel/OS-level metrics.
  • statsd: a library that allows us to construct concise reports for sending to AWS.
  • influxdb: a client for a time-series database that allows us to store the data and generate simple graphs.

Solution 1: tcp-latency, udp-latency, ping, and psutil (exposes CPU and NIC metrics) allow us to compute RTT, bandwidth, and estimated link capacity.
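To show what psutil exposes at the kernel/NIC level, here is a minimal sketch using calls from psutil's documented API; exactly which of these pmeter records is not shown here.

```python
import psutil

# CPU-side metrics: core count, current frequency, utilization.
print("Active (logical) cores:", psutil.cpu_count())
print("CPU frequency (MHz):", psutil.cpu_freq().current)  # may be None on some platforms
print("CPU utilization (%):", psutil.cpu_percent(interval=1))

# NIC-side counters, per interface: bytes sent and received.
for nic, counters in psutil.net_io_counters(pernic=True).items():
    print(nic, counters.bytes_sent, counters.bytes_recv)
```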

List of Technologies

Tools: ping, psutil
Technologies: Java, Python, CockroachDB, Prometheus, Grafana

What we will Accomplish

By the end of the semester we would like the Transfer-Service to fully monitor its network conditions and report them periodically back to the ODS backend. We will use either CockroachDB or Prometheus to store the time-series data, allowing the ODS deployment to optimize transfers based on the papers above. For extra brownie points, we would like to implement a Grafana dashboard so every user can be aware of the network conditions around their transfer.

What we have accomplished

  1. We have a CLI that captures the kernel and network parameters that the OS exposes to the application layer.
  2. We have a time-series DB (InfluxDB) which allows us to store and manipulate the time-series data.


Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pmeter_ods-1.0.17.tar.gz (1.4 MB)

Uploaded Source

Built Distribution

pmeter_ods-1.0.17-py3-none-any.whl (12.0 kB)

Uploaded Python 3

File details

Details for the file pmeter_ods-1.0.17.tar.gz.

File metadata

  • Download URL: pmeter_ods-1.0.17.tar.gz
  • Upload date:
  • Size: 1.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for pmeter_ods-1.0.17.tar.gz

  • SHA256: c657f1c05fa7555070d4b6e85b35b502826799c2c43301bac567f46366700bfc
  • MD5: cb2c96607eabd032f292f88b354ee1ca
  • BLAKE2b-256: c70f8d38ef24044e4355012e6e94c48fa1e0600261740196aea15f4007cfb24c


File details

Details for the file pmeter_ods-1.0.17-py3-none-any.whl.

File metadata

  • Download URL: pmeter_ods-1.0.17-py3-none-any.whl
  • Upload date:
  • Size: 12.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for pmeter_ods-1.0.17-py3-none-any.whl

  • SHA256: 7a7526b6bd22c028a3268e04cec99c6712d0c7e4dbffc132a4776110c7759c90
  • MD5: aeee3a252de3f6fda4a82fd4b09b5c03
  • BLAKE2b-256: e47d2aa7349113d3bbfc9a7b5e7512053f74e82bbfe976a1ac6cf241b31c8a55

