Pure-Python library for reading the Warts file format produced by Scamper (an Internet measurement tool from CAIDA)
# About pywarts
`pywarts` is a pure-Python parsing library for the Warts format.
Warts is an extensible binary format produced by
[Scamper](http://www.caida.org/tools/measurement/scamper/), an
Internet measurement tool from CAIDA, to store measurement results
such as traceroutes and pings.
This library started off from the [Python implementation from
CMAND](https://github.com/cmand/scamper), by Robert Beverly, but has
now vastly diverged. The parsing architecture is loosely inspired
by the [Ryu](https://osrg.github.io/ryu/) packet parser, although it
is less complex because the requirements are less stringent.
## Features
- pure-Python, very few dependencies
- can read all basic Warts data types (ping, traceroute)
- nice class-based interface
- streaming-like interface: no more than one record is pulled into
memory at any given time, so it should handle very large Warts files
with a limited amount of memory. You can probably even consume data
directly from the output of a running Scamper process.
- easily extensible for other Warts data types (patches are welcome)
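The streaming behaviour can be illustrated with a small generator that holds at most one record body in memory at a time. This is a sketch, not `pywarts` code: it assumes the Warts object header layout as we understand it from Scamper's format documentation (a big-endian 16-bit magic number `0x1205`, a 16-bit record type and a 32-bit body length), and the name `iter_raw_records` is hypothetical.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

# Assumed Warts object header: magic (uint16), type (uint16), body length (uint32), big-endian.
HEADER = struct.Struct(">HHI")
WARTS_MAGIC = 0x1205


def iter_raw_records(stream: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Hypothetical helper: yield (record_type, body) pairs, one record at a time."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of stream
        if len(header) < HEADER.size:
            raise EOFError("truncated record header")
        magic, rtype, length = HEADER.unpack(header)
        if magic != WARTS_MAGIC:
            raise ValueError("bad magic number: 0x%04x" % magic)
        # Read the whole body with a single read() call instead of many small ones.
        body = stream.read(length)
        if len(body) < length:
            raise EOFError("truncated record body")
        yield rtype, body
```

Since only the current body is referenced, memory use is bounded by the largest record rather than the file size. `io.BytesIO` works as a stand-in for a file opened with `'rb'`, e.g. `list(iter_raw_records(io.BytesIO(data)))`.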
## Differences from the CMAND implementation
Here are some points on which `pywarts` improves on the code from
<https://github.com/cmand/scamper>:
- fully Python 3-compatible
- nicer class-based interface, instead of huge dicts with all flags
- properly handles unknown flags and options, by ignoring them
- attribute names have been generally made more readable (although
that often means longer names)
- possibly quite a bit faster (this would need proper benchmarks to
confirm), because of the way we parse flags and strings. Also, we read
a whole record into memory before parsing it, which is a bit faster
than calling `read()` repeatedly on very small amounts of data.
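To illustrate why unknown flags can be ignored safely, here is a sketch of decoding a Warts-style flag field. It assumes the encoding as we understand it from Scamper's format documentation: each flag byte carries 7 flag bits, its most significant bit is set when another flag byte follows, and flags are numbered from 1. The function name `decode_flags` is hypothetical; this is not the `pywarts` implementation.

```python
from typing import Set, Tuple


def decode_flags(buf: bytes, offset: int = 0) -> Tuple[Set[int], int]:
    """Return (set of flag numbers that are set, offset just past the flag bytes).

    Assumed encoding: 7 flag bits per byte, high bit set means "more flag bytes follow".
    """
    flags = set()
    base = 0
    while True:
        byte = buf[offset]
        offset += 1
        # The low 7 bits of this byte are flags base+1 .. base+7.
        for bit in range(7):
            if byte & (1 << bit):
                flags.add(base + bit + 1)
        base += 7
        # A clear high bit means this was the last flag byte.
        if not byte & 0x80:
            return flags, offset
```

For example, `decode_flags(bytes([0x85, 0x01]))` returns `({1, 3, 8}, 2)`. Assuming newer Warts versions only add fields with higher flag numbers, a parser can read the fields for the flags it knows in increasing order and then jump over whatever remains, using the parameter length that (again, as we understand the format) follows the flag bytes.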
However, there are some areas where the CMAND code does more:
- `pywarts` does not implement the deprecated address format (it is
quite complex and has been deprecated for several years)
- there are some nice scripts in <https://github.com/cmand/scamper>,
for instance a script to attach to and control a running Scamper
process
# Documentation
Unit tests and proper documentation will come in time.
## Low-level API
The low-level API is pretty simple. There is a `parse_record`
function that takes an `io.BufferedReader` object (such as an open file)
and reads a record from it. Remember to open your input Warts files
in binary mode.
The returned object is an instance of an appropriate subclass
(e.g. `Traceroute`), depending on the record type. Be aware that all
optional attributes are set to `None` if not present in the input file.
You should always check for this possibility in your user code.
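One way to keep such `None` checks in one place is a small formatting helper. The `FakeTraceroute` class and the `describe` function below are hypothetical, used only so the sketch runs on its own; the attribute names `src_address` and `dst_address` are taken from the example that follows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeTraceroute:
    # Hypothetical stand-in for a parsed record: absent optional attributes are None.
    src_address: Optional[str] = None
    dst_address: Optional[str] = None


def describe(record) -> str:
    """Format source and destination, with a placeholder for missing values."""
    def show(value):
        return "<absent>" if value is None else str(value)

    return "{} -> {}".format(show(record.src_address), show(record.dst_address))
```

For instance, `describe(FakeTraceroute(src_address="192.0.2.1"))` returns `"192.0.2.1 -> <absent>"`. Testing `is None` rather than truthiness avoids treating a legitimately falsy value (such as `0`) as missing.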
Here is an example that opens a file and repeatedly parses records
until it finds a Traceroute record (Warts files usually have a few
initial records with mostly uninteresting data).
```python
import warts
from warts.traceroute import Traceroute

with open('my_file.warts', 'rb') as f:
    # Skip the initial records until the first traceroute record.
    record = warts.parse_record(f)
    while not isinstance(record, Traceroute):
        record = warts.parse_record(f)
    # Optional attributes may be None, so check them before use.
    if record.src_address:
        print("Traceroute source address:", record.src_address)
    if record.dst_address:
        print("Traceroute destination address:", record.dst_address)
    print("Number of hops:", len(record.hops))
    print(record.hops)
```
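The loop above assumes the file contains a `Traceroute` record; if it does not, it may not end cleanly. A variant with an end-of-input guard is sketched below. It assumes, without confirmation from this README, that `parse_record` returns `None` once the input is exhausted; to keep the sketch self-contained, the parser is passed in as a parameter, and `first_record_of_type` is a hypothetical name.

```python
from typing import Any, Callable, Optional, Type, TypeVar

T = TypeVar("T")


def first_record_of_type(parse: Callable[[Any], Any], stream: Any,
                         cls: Type[T]) -> Optional[T]:
    """Return the first record that is an instance of cls, or None at end of input.

    Assumption (not confirmed by this README): parse(stream) returns None
    once no records are left.
    """
    while True:
        record = parse(stream)
        if record is None:
            return None  # end of input reached without finding a match
        if isinstance(record, cls):
            return record
```

With `pywarts`, the call inside the `with` block would be `first_record_of_type(warts.parse_record, f, Traceroute)`, provided the end-of-input assumption holds.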
If parsing fails, an instance of `errors.ParseError` is raised.
`pywarts` generally tries to clean up after itself, so the file
position should be at the start of the next record even after a parsing error.
Of course, this is not always possible, especially if the input file
is incorrectly formatted.
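Because every record carries its length in its header, a reader that consumes a whole record before decoding it can skip a record whose body fails to parse. The sketch below shows this idea with a hypothetical `decode_body` callback and a local `ParseError` class standing in for `errors.ParseError`. It assumes the same 8-byte header layout as we understand it (magic `0x1205`, record type, body length) and is not how `pywarts` is implemented internally.

```python
import struct
from typing import Callable, Iterator, List

HEADER = struct.Struct(">HHI")  # assumed: magic, record type, body length (big-endian)
WARTS_MAGIC = 0x1205


class ParseError(Exception):
    """Local stand-in for pywarts' errors.ParseError."""


def records_skipping_errors(stream, decode_body: Callable[[int, bytes], object],
                            errors: List[Exception]) -> Iterator[object]:
    """Yield decoded records; collect and skip those whose body fails to decode."""
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            return  # end of input (a truncated header cannot be recovered)
        magic, rtype, length = HEADER.unpack(header)
        if magic != WARTS_MAGIC:
            raise ValueError("lost synchronisation: bad magic number")
        body = stream.read(length)  # the stream is now positioned at the next header
        try:
            record = decode_body(rtype, body)
        except ParseError as exc:
            errors.append(exc)  # remember the failure and move on to the next record
            continue
        yield record
```

Recovery only works while headers stay intact: once the magic number no longer matches, the reader has lost its place, which matches the caveat above about incorrectly formatted files.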
# Development
## High-level
Some currently unanswered questions:
- What should the high-level API look like, and is there even a need
for a higher-level API? Just an iterator of records? Allow
filtering by record type? Try to parse further, for instance decode
flags or produce different objects for UDP, TCP and ICMP
traceroutes?
- Should we try to normalize values when parsing? For instance,
should we use `ipaddr` objects for addresses? Some time values are
expressed in centiseconds, some in microseconds, some in seconds.
Should we normalize them to a common base? Are floats acceptable
for time values?
- What should we do when there is a parsing error? How can the user
continue parsing the next record if he/she wants to?
Please open issues if you have ideas and thoughts on these questions.
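As one possible answer to the normalization question, the sketch below converts raw values into standard-library types: `ipaddress` objects for addresses and `datetime.timedelta` for durations. The unit names and the `normalize_duration` / `normalize_address` helpers are hypothetical, and which Warts fields use which unit would have to be checked against the format documentation.

```python
import ipaddress
from datetime import timedelta

# Hypothetical unit table: how many microseconds one raw unit represents.
_MICROSECONDS_PER_UNIT = {
    "s": 1_000_000,  # seconds
    "cs": 10_000,    # centiseconds
    "us": 1,         # microseconds
}


def normalize_duration(value: int, unit: str) -> timedelta:
    """Convert an integer duration expressed in `unit` to a timedelta."""
    return timedelta(microseconds=value * _MICROSECONDS_PER_UNIT[unit])


def normalize_address(raw: bytes):
    """Turn a packed 4- or 16-byte address into an IPv4Address or IPv6Address."""
    return ipaddress.ip_address(raw)
```

For example, `normalize_duration(150, "cs")` equals `timedelta(seconds=1.5)`, and `normalize_address(bytes([192, 0, 2, 1]))` gives `IPv4Address('192.0.2.1')`. `timedelta` stores microseconds as integers, which sidesteps the floating-point question for these units.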