Python multi-engine PCAP analyse kit.

# PyPCAPKit

<!-- reconstruct Frame, each protocol instance should be stored within the Frame instance; IPv6 pending more consideration -->

&emsp; The `pcapkit` project is an open source Python program focused on [PCAP](https://en.wikipedia.org/wiki/Pcap) parsing and analysis, which works as a stream PCAP file extractor. With the support of [`dictdumper`](https://github.com/JarryShaw/dictdumper), it supports multiple output report formats.

> Note that the whole project only supports __Python 3.6__ or later.

- [About](#about)
  * [Module Structure](#module-structure)
    - [Foundation](https://github.com/JarryShaw/pypcapkit/tree/master/src/foundation#foundation-manual)
    - [Interface](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#interface-manual)
    - [Reassembly](https://github.com/JarryShaw/pypcapkit/tree/master/src/reassembly#reassembly-manual)
    - [IPSuite](https://github.com/JarryShaw/pypcapkit/tree/master/src/ipsuite#ipsuite-manual)
    - [Protocols](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols#protocols-manual)
    - [Utilities](https://github.com/JarryShaw/pypcapkit/tree/master/src/utilities#utilities-maunal)
    - [CoreKit](https://github.com/JarryShaw/pypcapkit/tree/master/src/corekit#corekit-manual)
    - [ToolKit](https://github.com/JarryShaw/pypcapkit/tree/master/src/toolkit#toolkit-manual)
    - [DumpKit](https://github.com/JarryShaw/pypcapkit/tree/master/src/dumpkit#dumpkit-manual)
  * [Engine Comparison](#engine-comparison)
- [Installation](#installation)
- [Usage](#usage)
  * [Documentation](#documentation)
    - [Interfaces](#interfaces)
    - [Macros](#macros)
      * [Formats](#formats)
      * [Layers](#layers)
      * [Engines](#engines)
    - [Protocols](#protocols)
  * [CLI Usage](#cli-usage)
- [Samples](#samples)
  * [Usage Samples](#usage-samples)
  * [CLI Samples](#cli-samples)
- [TODO](#todo)

---

## About

&emsp; `pcapkit` is an independent open source library, using only [`dictdumper`](https://github.com/JarryShaw/dictdumper) as its formatted output dumper.

> [`jspcapy`](https://github.com/JarryShaw/jspcapy) is a command line tool for PCAP extraction built on `pcapkit`.

&emsp; Unlike popular PCAP file extractors such as `Scapy`, `dpkt`, and `pyshark`, `pcapkit` uses a __streaming__ strategy to read input files. That is, it reads frame by frame, which reduces memory usage and may improve efficiency.
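
&emsp; To make the streaming idea concrete, here is a minimal standard-library sketch of a frame-by-frame reader for the classic PCAP file layout (24-byte global header, then 16-byte record headers). It is not the `pcapkit` implementation; `iter_frames` and `build_demo_capture` are hypothetical names used only for illustration.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

GLOBAL_HEADER_SIZE = 24  # magic, version, thiszone, sigfigs, snaplen, linktype


def iter_frames(stream: BinaryIO) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, raw_bytes) one frame at a time from a classic PCAP stream."""
    header = stream.read(GLOBAL_HEADER_SIZE)
    if len(header) < GLOBAL_HEADER_SIZE:
        raise ValueError("truncated PCAP global header")
    # The magic number tells us the byte order of the file.
    magic = struct.unpack("<I", header[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"
    elif magic == 0xD4C3B2A1:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond-resolution PCAP file")
    record = struct.Struct(endian + "IIII")  # ts_sec, ts_usec, incl_len, orig_len
    while True:
        head = stream.read(record.size)
        if len(head) < record.size:
            return  # end of file
        ts_sec, ts_usec, incl_len, _orig_len = record.unpack(head)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return  # truncated final record
        # Only this one frame is held in memory at a time.
        yield ts_sec + ts_usec / 1_000_000, data


def build_demo_capture() -> bytes:
    """Build a tiny in-memory capture with two fake frames (illustration only)."""
    header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    frames = [b"\x01\x02\x03", b"\x04\x05"]
    body = b"".join(
        struct.pack("<IIII", 10 + index, 0, len(frame), len(frame)) + frame
        for index, frame in enumerate(frames)
    )
    return header + body


if __name__ == "__main__":
    for timestamp, frame in iter_frames(io.BytesIO(build_demo_capture())):
        print(timestamp, frame.hex())
```

Because `iter_frames` is a generator, the caller decides how many frames to keep, which is the property the streaming strategy relies on.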

### Module Structure

&emsp; In `pcapkit`, all files can be grouped into the following nine parts.

- Foundation (`pcapkit.foundation`) -- synthesises file I/O and protocol analysis, coordinates information exchange across all network layers
- Interface (`pcapkit.interface`) -- user interface for the `pcapkit` library, which standardises and simplifies the usage of this library
- Reassembly (`pcapkit.reassembly`) -- based on the algorithms described in [`RFC 815`](https://tools.ietf.org/html/rfc815), implements datagram reassembly of IP and TCP packets
- IPSuite (`pcapkit.ipsuite`) -- collection of constructors for the [Internet Protocol Suite](https://en.wikipedia.org/wiki/Internet_protocol_suite)
- Protocols (`pcapkit.protocols`) -- collection of all protocol families, with detailed implementations and methods
- Utilities (`pcapkit.utilities`) -- collection of four utility functions and classes
- CoreKit (`pcapkit.corekit`) -- core utilities for the `pcapkit` implementation
- ToolKit (`pcapkit.toolkit`) -- capability tools for the `pcapkit` implementation
- DumpKit (`pcapkit.dumpkit`) -- dump utilities for the `pcapkit` implementation
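
&emsp; The Reassembly entry above cites RFC 815. As a rough illustration of that RFC's hole-descriptor bookkeeping, a minimal sketch follows. It is not the `pcapkit.reassembly` code: the class name `HoleReassembler` is hypothetical, and offsets are plain byte offsets rather than the 8-octet units used in real IP headers.

```python
from typing import List, Optional, Tuple

INFINITY = float("inf")


class HoleReassembler:
    """Track missing byte ranges ("holes") while fragments arrive in any order."""

    def __init__(self) -> None:
        # Start with one hole covering the whole, still unknown, datagram.
        self.holes: List[Tuple[int, float]] = [(0, INFINITY)]
        self.buffer = bytearray()

    def add(self, offset: int, data: bytes, more_fragments: bool) -> None:
        if not data:
            return
        first, last = offset, offset + len(data) - 1
        remaining: List[Tuple[int, float]] = []
        for hole_first, hole_last in self.holes:
            # Fragment does not touch this hole: keep the hole unchanged.
            if first > hole_last or last < hole_first:
                remaining.append((hole_first, hole_last))
                continue
            # Otherwise the hole is dropped and up to two smaller holes replace it.
            if first > hole_first:
                remaining.append((hole_first, first - 1))
            if last < hole_last and more_fragments:
                remaining.append((last + 1, hole_last))
        self.holes = remaining
        # Grow the buffer if needed, then copy the fragment into place.
        if len(self.buffer) < last + 1:
            self.buffer.extend(bytes(last + 1 - len(self.buffer)))
        self.buffer[first:last + 1] = data

    @property
    def complete(self) -> bool:
        # The datagram is whole once no holes remain.
        return not self.holes

    def payload(self) -> Optional[bytes]:
        return bytes(self.buffer) if self.complete else None


if __name__ == "__main__":
    reassembler = HoleReassembler()
    reassembler.add(16, b"world", more_fragments=False)   # last fragment first
    reassembler.add(0, b"hello, n", more_fragments=True)
    reassembler.add(8, b"etworked", more_fragments=True)
    print(reassembler.complete, reassembler.payload())
```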

![](./doc/pypcapkit.png)

### Engine Comparison

&emsp; Due to the complexity of `pcapkit`, its default extraction procedure takes around *0.01* seconds per packet, which is not ideal. Thus, `pcapkit` introduces alternative extraction engines to accelerate this procedure. Currently, `pcapkit` supports [`Scapy`](https://scapy.net), [`DPKT`](https://github.com/kbandla/dpkt), and [`PyShark`](https://kiminewt.github.io/pyshark/). In addition, `pcapkit` supports two multiprocessing strategies (`server` and `pipeline`). For more information, please refer to the documentation.

| Engine | Performance (seconds per packet) |
| :--------: | :------------------------------: |
| `default` | `0.014525251388549805` |
| `server` | `0.12124489148457845` |
| `pipeline` | `0.014450424114863079` |
| `scapy` | `0.002443440357844035` |
| `dpkt` | `0.0003609057267506917` |
| `pyshark` | `0.0792640733718872` |
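
&emsp; To read the table as relative speed, the figures can be divided into the `default` timing. The short sketch below does only that arithmetic on the numbers listed above; the function name `speedup_over_default` is made up for illustration.

```python
# Seconds per packet, copied from the table above.
SECONDS_PER_PACKET = {
    "default": 0.014525251388549805,
    "server": 0.12124489148457845,
    "pipeline": 0.014450424114863079,
    "scapy": 0.002443440357844035,
    "dpkt": 0.0003609057267506917,
    "pyshark": 0.0792640733718872,
}


def speedup_over_default(timings):
    """Return how many times faster each engine is than the default engine."""
    baseline = timings["default"]
    return {engine: baseline / seconds for engine, seconds in timings.items()}


if __name__ == "__main__":
    factors = speedup_over_default(SECONDS_PER_PACKET)
    for engine, factor in sorted(factors.items(), key=lambda item: item[1], reverse=True):
        packets_per_second = 1 / SECONDS_PER_PACKET[engine]
        print(f"{engine:<8} {factor:6.2f}x  {packets_per_second:8.0f} packets/s")
```

On these figures `dpkt` comes out roughly 40 times faster than the default engine and `scapy` roughly 6 times faster, while `server` and `pyshark` are slower than the default.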

&nbsp;

## Installation

> Note that `pcapkit` only supports Python versions __since 3.6__

&emsp; Simply run the following to install the latest version from PyPI:

```
pip install pypcapkit
```

&emsp; Or install from the git repository:

```
$ git clone https://github.com/JarryShaw/pypcapkit.git
$ cd pypcapkit
$ python setup.py install
```

&nbsp;

## Usage

### Documentation

#### Interfaces

| NAME | DESCRIPTION |
| :--------------------------------------------------------------------------------------: | :-------------------------------: |
| [`extract`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#extract) | extract a PCAP file |
| [`analyse`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#analyse) | analyse application layer packets |
| [`reassemble`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#reassemble) | reassemble fragmented datagrams |
| [`trace`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#trace) | trace TCP packet flows |

#### Macros

##### Formats

| NAME | DESCRIPTION |
| :----------------------------------------------------------------------------: | :--------------------------------------: |
| [`JSON`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#formats) | JavaScript Object Notation (JSON) format |
| [`PLIST`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#formats) | macOS Property List (PLIST) format |
| [`TREE`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#formats) | Tree-View text format |
| [`PCAP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#formats) | PCAP format |

##### Layers

| NAME | DESCRIPTION |
| :----------------------------------------------------------: | :---------------: |
| [`RAW`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#layers) | no specific layer |
| [`LINK`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#layers) | data-link layer |
| [`INET`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#layers) | internet layer |
| [`TRANS`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#layers) | transport layer |
| [`APP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#layers) | application layer |

##### Engines

| NAME | DESCRIPTION |
| :----------------------------------------------------------: | :---------------------------------------------------------: |
| [`PCAPKit`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#engines) | the default engine |
| [`MPServer`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#engines) | the multiprocessing engine with server process strategy |
| [`MPPipeline`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#engines) | the multiprocessing engine with pipeline strategy |
| [`DPKT`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#engines) | the [`DPKT`](https://github.com/kbandla/dpkt) engine |
| [`Scapy`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#engines) | the [`Scapy`](https://scapy.net) engine |
| [`PyShark`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#engines) | the [`PyShark`](https://kiminewt.github.io/pyshark/) engine |

#### Protocols

| NAME | DESCRIPTION |
| :-----------------------------------------------------------------------------------------------: | :---------------------------------: |
| [`Raw`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols#raw) | Raw Packet Data |
| [`ARP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/link#arp) | Address Resolution Protocol |
| [`Ethernet`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/link#ethernet) | Ethernet Protocol |
| [`L2TP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/link#l2tp) | Layer Two Tunnelling Protocol |
| [`OSPF`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/link#ospf) | Open Shortest Path First |
| [`RARP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/link#rarp) | Reverse Address Resolution Protocol |
| [`VLAN`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/link#vlan) | 802.1Q Customer VLAN Tag Type |
| [`AH`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#ah) | Authentication Header |
| [`HIP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#hip) | Host Identity Protocol |
| [`HOPOPT`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#hopopt) | IPv6 Hop-by-Hop Options |
| [`IP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#ip) | Internet Protocol |
| [`IPsec`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#ipsec) | Internet Protocol Security |
| [`IPv4`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#ipv4) | Internet Protocol version 4 |
| [`IPv6`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#ipv6) | Internet Protocol version 6 |
| [`IPv6_Frag`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#ipv6_frag) | Fragment Header for IPv6 |
| [`IPv6_Opts`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#ipv6_opts) | Destination Options for IPv6 |
| [`IPv6_Route`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#ipv6_route) | Routing Header for IPv6 |
| [`IPX`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#ipx) | Internetwork Packet Exchange |
| [`MH`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#mh) | Mobility Header |
| [`TCP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/transport#tcp) | Transmission Control Protocol |
| [`UDP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/transport#udp) | User Datagram Protocol |
| [`HTTP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/application#http) | Hypertext Transfer Protocol |
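
&emsp; The link paths in the table above group each protocol under `link`, `internet`, `transport`, or `application`. As a hedged sketch of how a layer threshold could cut a protocol chain short, the helper below keeps protocols up to a given layer. The function `truncate_chain` is hypothetical, and the stop-at-layer behaviour is an assumption, not the documented semantics of `pcapkit`.

```python
LAYER_ORDER = ["link", "internet", "transport", "application"]

# Grouping taken from the link paths in the protocol table.
PROTOCOL_LAYER = {
    "ARP": "link", "Ethernet": "link", "L2TP": "link",
    "OSPF": "link", "RARP": "link", "VLAN": "link",
    "AH": "internet", "HIP": "internet", "HOPOPT": "internet",
    "IP": "internet", "IPsec": "internet", "IPv4": "internet",
    "IPv6": "internet", "IPv6_Frag": "internet", "IPv6_Opts": "internet",
    "IPv6_Route": "internet", "IPX": "internet", "MH": "internet",
    "TCP": "transport", "UDP": "transport",
    "HTTP": "application",
}


def truncate_chain(chain: str, layer: str) -> str:
    """Keep the leading protocols of a colon-separated chain up to `layer`."""
    limit = LAYER_ORDER.index(layer)
    kept = []
    for name in chain.split(":"):
        protocol_layer = PROTOCOL_LAYER.get(name)
        # Stop at the first protocol that is unknown or above the requested layer.
        if protocol_layer is None or LAYER_ORDER.index(protocol_layer) > limit:
            break
        kept.append(name)
    return ":".join(kept)


if __name__ == "__main__":
    print(truncate_chain("Ethernet:IPv4:TCP", "internet"))
```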

&emsp; Documentation can be found in the submodules of `pcapkit`. You may also find usage samples in the [`test`](https://github.com/JarryShaw/pypcapkit/tree/master/test#test-samples) folder. For further information, please refer to the source code -- the docstrings should help you :)

__ps__: the `help` function in Python should always help you out.

### CLI Usage

> The following part was originally described in [`jspcapy`](https://github.com/JarryShaw/jspcapy), which is now deprecated and merged into this repository.

&emsp; As the help manual shows, it is quite easy to use:

```
$ pcapkit --help
usage: pcapkit [-h] [-V] [-o file-name] [-f format] [-j] [-p] [-t] [-a] [-v]
               [-F] [-E PKG] [-P PROTOCOL] [-L LAYER]
               input-file-name

PCAP file extractor and formatted exporter

positional arguments:
  input-file-name       The name of input pcap file. If ".pcap" omits, it will
                        be automatically appended.

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -o file-name, --output file-name
                        The name of input pcap file. If format extension
                        omits, it will be automatically appended.
  -f format, --format format
                        Print a extraction report in the specified output
                        format. Available are all formats supported by
                        dictdumper, e.g.: json, plist, and tree.
  -j, --json            Display extraction report as json. This will yield
                        "raw" output that may be used by external tools. This
                        option overrides all other options.
  -p, --plist           Display extraction report as macOS Property List
                        (plist). This will yield "raw" output that may be used
                        by external tools. This option overrides all other
                        options.
  -t, --tree            Display extraction report as tree view text. This will
                        yield "raw" output that may be used by external tools.
                        This option overrides all other options.
  -a, --auto-extension  If output file extension omits, append automatically.
  -v, --verbose         Show more information.
  -F, --files           Split each frame into different files.
  -E PKG, --engine PKG  Indicate extraction engine. Note that except default
                        engine, all other engines need support of corresponding
                        packages.
  -P PROTOCOL, --protocol PROTOCOL
                        Indicate extraction stops after which protocol.
  -L LAYER, --layer LAYER
                        Indicate extract frames until which layer.
```

&emsp; Under most circumstances, you should specify the name of the input PCAP file (the extension may be omitted) and, at least, the output format (`json`, `plist`, or `tree`). If no format is specified, the name of the output file must have a proper extension (`*.json`, `*.plist`, or `*.txt`); otherwise a `FormatError` will be raised.

&emsp; In `verbose` mode, detailed information is printed during extraction (as in the following examples). The `auto-extension` flag applies to the output file and indicates whether an extension should be appended.
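
&emsp; The rules above (infer the format from the output extension when none is given, optionally append an extension) can be sketched as follows. This is an illustration under stated assumptions, not the `pcapkit` source: `resolve_output` is a hypothetical helper, and the `FormatError` class here is a local stand-in rather than the library's own exception.

```python
import os


class FormatError(ValueError):
    """Local stand-in for the error named in the text; not imported from pcapkit."""


# Extension-to-format pairs as listed in the paragraph above.
EXTENSION_TO_FORMAT = {".json": "json", ".plist": "plist", ".txt": "tree"}
FORMAT_TO_EXTENSION = {fmt: ext for ext, fmt in EXTENSION_TO_FORMAT.items()}


def resolve_output(name, fmt=None, auto_extension=False):
    """Return (output_file_name, format) for the given CLI-style arguments."""
    extension = os.path.splitext(name)[1].lower()
    if fmt is None:
        # No format given: the file extension must identify it.
        if extension not in EXTENSION_TO_FORMAT:
            raise FormatError(f"cannot infer output format from {name!r}")
        return name, EXTENSION_TO_FORMAT[extension]
    if fmt not in FORMAT_TO_EXTENSION:
        raise FormatError(f"unsupported output format {fmt!r}")
    # Format given: optionally append the matching extension.
    if auto_extension and extension != FORMAT_TO_EXTENSION[fmt]:
        name += FORMAT_TO_EXTENSION[fmt]
    return name, fmt


if __name__ == "__main__":
    print(resolve_output("out.json"))
    print(resolve_output("out", "tree"))
    print(resolve_output("out", "tree", auto_extension=True))
```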

&nbsp;

## Samples

### Usage Samples

&emsp; As described in the `test` folder, `pcapkit` is quite easy to use, with simply three verbs as its main interface. Several scenarios are shown below.

- extract a PCAP file and dump the result to a specific file (with no reassembly)

```python
import pcapkit
# dump to a PLIST file with no frame storage (property frame disabled)
plist = pcapkit.extract(fin='in.pcap', fout='out.plist', format='plist', store=False)
# dump to a JSON file with no extension auto-complete
json = pcapkit.extract(fin='in.cap', fout='out.json', format='json', extension=False)
# dump to a folder with each tree-view text file per frame
tree = pcapkit.extract(fin='in.pcap', fout='out', format='tree', files=True)
```

- extract a PCAP file and fetch IP packet (both IPv4 and IPv6) from a frame (with no output file)

```python
>>> import pcapkit
>>> extraction = pcapkit.extract(fin='in.pcap', nofile=True)
>>> frame0 = extraction.frame[0]
>>> # check whether IP is in this frame; indexing a missing protocol raises ProtocolNotFound
>>> flag = pcapkit.IP in frame0
>>> ip = frame0[pcapkit.IP] if flag else None
```

- extract a PCAP file and reassemble TCP payload (with no output file nor frame storage)

```python
import pcapkit
# set strict to ensure full reassembly
extraction = pcapkit.extract(fin='in.pcap', store=False, nofile=True, tcp=True, strict=True)
# print extracted packet if HTTP in reassembled payloads
for packet in extraction.reassembly.tcp:
    for reassembly in packet.packets:
        if pcapkit.HTTP in reassembly.protochain:
            print(reassembly.info)
```

### CLI Samples

&emsp; The CLI (command line interface) of `pcapkit` can be accessed in two ways.

- through console scripts -- use command name `pcapkit` directly (as shown in samples)
- through Python module -- `python -m pypcapkit [...]` works exactly the same as above

Here are some usage samples:

- export to a macOS Property List ([`Xcode`](https://developer.apple.com/xcode) has special support for this format)

```
$ pcapkit in --format plist --verbose
🚨Loading file 'in.pcap'
- Frame 1: Ethernet:IPv6:ICMPv6
- Frame 2: Ethernet:IPv6:ICMPv6
- Frame 3: Ethernet:IPv4:TCP
- Frame 4: Ethernet:IPv4:TCP
- Frame 5: Ethernet:IPv4:TCP
- Frame 6: Ethernet:IPv4:UDP
🍺Report file stored in 'out.plist'
```

- export to a JSON file (with no format specified)

```
$ pcapkit in --output out.json --verbose
🚨Loading file 'in.pcap'
- Frame 1: Ethernet:IPv6:ICMPv6
- Frame 2: Ethernet:IPv6:ICMPv6
- Frame 3: Ethernet:IPv4:TCP
- Frame 4: Ethernet:IPv4:TCP
- Frame 5: Ethernet:IPv4:TCP
- Frame 6: Ethernet:IPv4:UDP
🍺Report file stored in 'out.json'
```

- export to a text tree view file (without extension autocorrect)

```
$ pcapkit in --output out --format tree --verbose
🚨Loading file 'in.pcap'
- Frame 1: Ethernet:IPv6:ICMPv6
- Frame 2: Ethernet:IPv6:ICMPv6
- Frame 3: Ethernet:IPv4:TCP
- Frame 4: Ethernet:IPv4:TCP
- Frame 5: Ethernet:IPv4:TCP
- Frame 6: Ethernet:IPv4:UDP
🍺Report file stored in 'out'
```
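
&emsp; The verbose output shown in these samples is regular enough to post-process. As a small sketch, assuming the `- Frame N: A:B:C` line format seen above stays stable, the following standard-library snippet turns such a log into protocol chains and counts the top-level protocol of each frame. The names `parse_verbose_log` and `count_top_protocols` are hypothetical.

```python
import re
from collections import Counter

# Matches lines such as "- Frame 3: Ethernet:IPv4:TCP".
FRAME_LINE = re.compile(r"^\s*-\s*Frame\s+(\d+):\s*(\S+)\s*$")

SAMPLE_LOG = """\
Loading file 'in.pcap'
 - Frame 1: Ethernet:IPv6:ICMPv6
 - Frame 2: Ethernet:IPv6:ICMPv6
 - Frame 3: Ethernet:IPv4:TCP
 - Frame 4: Ethernet:IPv4:TCP
 - Frame 5: Ethernet:IPv4:TCP
 - Frame 6: Ethernet:IPv4:UDP
Report file stored in 'out.json'
"""


def parse_verbose_log(text):
    """Map each frame number to its list of protocol names."""
    frames = {}
    for line in text.splitlines():
        match = FRAME_LINE.match(line)
        if match:
            frames[int(match.group(1))] = match.group(2).split(":")
    return frames


def count_top_protocols(frames):
    """Count the last (highest) protocol of every frame."""
    return Counter(chain[-1] for chain in frames.values())


if __name__ == "__main__":
    parsed = parse_verbose_log(SAMPLE_LOG)
    print(count_top_protocols(parsed))
```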

&nbsp;

## TODO

- [x] specify `Raw` packet
- [x] interface verbs
- [x] review docstrings
- [x] merge `jspcapy`
- [ ] write documentation
- [ ] implement IP and MAC address containers
- [ ] implement option list extractors
- [ ] implement more protocols
