
Python multi-engine PCAP analysis kit.


# PyPCAPKit

<!-- reconstruct Frame, each protocol instance should be stored within the Frame instance; IPv6 pending more consideration -->

&emsp; The `pcapkit` project is an open source Python program focused on [PCAP](https://en.wikipedia.org/wiki/Pcap) parsing and analysis, which works as a streaming PCAP file extractor. With the support of [`dictdumper`](https://github.com/JarryShaw/dictdumper), it supports multiple output report formats.

> Note that the whole project only supports __Python 3.6__ or later.

- [About](#about)
  * [Module Structure](#module-structure)
    - [Foundation](https://github.com/JarryShaw/pypcapkit/tree/master/src/foundation#foundation-manual)
    - [Interface](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#interface-manual)
    - [Reassembly](https://github.com/JarryShaw/pypcapkit/tree/master/src/reassembly#reassembly-manual)
    - [IPSuite](https://github.com/JarryShaw/pypcapkit/tree/master/src/ipsuite#ipsuite-manual)
    - [Protocols](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols#protocols-manual)
    - [Utilities](https://github.com/JarryShaw/pypcapkit/tree/master/src/utilities#utilities-maunal)
    - [CoreKit](https://github.com/JarryShaw/pypcapkit/tree/master/src/corekit#corekit-manual)
    - [DumpKit](https://github.com/JarryShaw/pypcapkit/tree/master/src/dumpkit#dumpkit-manual)
  * [Engine Comparison](#engine-comparison)
- [Installation](#installation)
- [Usage](#usage)
  * [Documentation](#documentation)
    - [Interfaces](#interfaces)
    - [Macros](#macros)
      * [Formats](#formats)
      * [Layers](#layers)
      * [Engines](#engines)
    - [Protocols](#protocols)
  * [CLI Usage](#cli-usage)
- [Samples](#samples)
  * [Usage Samples](#usage-samples)
  * [CLI Samples](#cli-samples)
- [TODO](#todo)

---

## About

&emsp; `pcapkit` is an independent open source library, using only [`dictdumper`](https://github.com/JarryShaw/dictdumper) as its formatted output dumper.

> There is a project called [`jspcapy`](https://github.com/JarryShaw/jspcapy), built on `pcapkit`, which is a command line tool for PCAP extraction (it is now deprecated and merged into this repository).

&emsp; Unlike popular PCAP file extractors such as `Scapy`, `dpkt`, and `pyshark`, `pcapkit` uses a __streaming__ strategy to read input files: it reads one frame at a time, which reduces memory usage and can improve efficiency.
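&emsp; To illustrate what reading "frame by frame" means, here is a minimal standard-library sketch that walks a classic libpcap file with a generator, so only one record is held in memory at a time. It assumes the classic PCAP format (not PCAPNG), and `iter_frames` is a hypothetical helper, not part of `pcapkit`'s API or its actual implementation.

```python
import struct
from typing import BinaryIO, Iterator, Tuple

# Magic numbers of the classic libpcap global header, as they appear on disk
_MAGIC = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def iter_frames(stream: BinaryIO) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (ts_sec, ts_frac, frame_bytes) one record at a time (hypothetical helper)."""
    header = stream.read(24)  # the global header is always 24 bytes
    if len(header) < 24 or header[:4] not in _MAGIC:
        raise ValueError("not a classic PCAP file")
    # Each record header: ts_sec, ts_frac, captured length, original length
    record = struct.Struct(_MAGIC[header[:4]] + "IIII")
    while True:
        raw = stream.read(record.size)
        if len(raw) < record.size:
            return  # end of file (or a truncated record header)
        ts_sec, ts_frac, incl_len, _orig_len = record.unpack(raw)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return  # truncated final frame
        yield ts_sec, ts_frac, data
```

Typical use would be `with open('in.pcap', 'rb') as f:` followed by `for ts_sec, ts_frac, frame in iter_frames(f): ...`; memory use stays bounded by the largest single frame rather than by the file size.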

### Module Structure

&emsp; In `pcapkit`, all files can be divided into the following eight parts.

- Foundation (`pcapkit.foundation`) -- synthesises file I/O and protocol analysis, and coordinates information exchange across all network layers
- Interface (`pcapkit.interface`) -- user interface for the `pcapkit` library, which standardises and simplifies the usage of this library
- Reassembly (`pcapkit.reassembly`) -- based on the algorithms described in [`RFC 815`](https://tools.ietf.org/html/rfc815), implements datagram reassembly of IP and TCP packets
- IPSuite (`pcapkit.ipsuite`) -- collection of constructors for the [Internet Protocol Suite](https://en.wikipedia.org/wiki/Internet_protocol_suite)
- Protocols (`pcapkit.protocols`) -- collection of all protocol families, with detailed implementations and methods
- Utilities (`pcapkit.utilities`) -- collection of four utility functions and classes
- CoreKit (`pcapkit.corekit`) -- core utilities for the `pcapkit` implementation
- DumpKit (`pcapkit.dumpkit`) -- dump utilities for the `pcapkit` implementation

![pcapkit module structure](./doc/pypcapkit.png)
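&emsp; The Reassembly module is described as building on RFC 815. As an illustration of that algorithm only (not `pcapkit`'s actual reassembly code), the sketch below keeps a list of hole descriptors, removes or splits a hole whenever a fragment overlaps it, and treats the datagram as complete once no holes remain. It works with byte offsets, whereas IPv4 carries fragment offsets in 8-octet units; the class name `HoleReassembler` is hypothetical.

```python
import math
from typing import Dict, List, Tuple


class HoleReassembler:
    """Minimal sketch of RFC 815 hole-descriptor reassembly (hypothetical class)."""

    def __init__(self) -> None:
        # Initially a single hole spans the whole datagram, whose length is unknown
        self.holes: List[Tuple[float, float]] = [(0, math.inf)]
        self.buffer: Dict[int, bytes] = {}  # byte offset -> fragment payload

    def add(self, first: int, payload: bytes, more: bool) -> None:
        last = first + len(payload) - 1  # inclusive offset of the fragment's last byte
        self.buffer[first] = payload
        new_holes = []
        for hole_first, hole_last in self.holes:
            # Fragment lies entirely after or before this hole: keep the hole as-is
            if first > hole_last or last < hole_first:
                new_holes.append((hole_first, hole_last))
                continue
            # Otherwise the hole is (partly) filled and is replaced by what remains
            if first > hole_first:
                new_holes.append((hole_first, first - 1))
            # A trailing hole only survives if more fragments are expected
            if last < hole_last and more:
                new_holes.append((last + 1, hole_last))
        self.holes = new_holes

    @property
    def complete(self) -> bool:
        # An empty hole list means every byte of the datagram has arrived
        return not self.holes

    def assemble(self) -> bytes:
        if not self.complete:
            raise ValueError("datagram still has holes")
        size = max(off + len(data) for off, data in self.buffer.items())
        out = bytearray(size)
        for off, data in sorted(self.buffer.items()):
            out[off:off + len(data)] = data  # later fragments overwrite overlaps
        return bytes(out)
```

Fragments can arrive in any order: the hole list alone decides completeness, so no sorting is needed until the final `assemble` call.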

### Engine Comparison

&emsp; Because of the complexity of `pcapkit`, its extraction procedure takes around *0.01* seconds per packet, which is not ideal. Thus, `pcapkit` introduced alternative extraction engines to accelerate this procedure. Currently, `pcapkit` supports [`Scapy`](https://scapy.net), [`DPKT`](https://github.com/kbandla/dpkt), and [`PyShark`](https://kiminewt.github.io/pyshark/). In addition, `pcapkit` supports two multiprocessing strategies (`server` and `pipeline`). For more information, please refer to the documentation.

| Engine | Performance (seconds per packet) |
| :--------: | :------------------------------: |
| `default` | `0.014525251388549805` |
| `server` | `0.12124489148457845` |
| `pipeline` | `0.014450424114863079` |
| `scapy` | `0.002443440357844035` |
| `dpkt` | `0.0003609057267506917` |
| `pyshark` | `0.0792640733718872` |
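&emsp; The figures above are averages in seconds per packet. The sketch below shows one way such a number could be measured for any callable that parses a single raw frame, using `time.perf_counter`; `seconds_per_packet` is a hypothetical helper and not the benchmark used to produce the table.

```python
import time
from typing import Callable, Iterable


def seconds_per_packet(parse: Callable[[bytes], object], frames: Iterable[bytes]) -> float:
    """Average wall-clock seconds per frame spent iterating and parsing (hypothetical helper)."""
    count = 0
    start = time.perf_counter()
    for frame in frames:  # iteration time is included, as with a streaming reader
        parse(frame)
        count += 1
    elapsed = time.perf_counter() - start
    if count == 0:
        raise ValueError("need at least one frame to time")
    return elapsed / count
```

Timing iteration and parsing together matches a streaming design, where reading the next frame is part of the per-packet cost.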

&nbsp;

## Installation

> Note that `pcapkit` only supports Python __3.6__ and later.

&emsp; Simply run the following to install the latest release from PyPI:

```
$ pip install pypcapkit
```

&emsp; Or install from the git repository:

```
$ git clone https://github.com/JarryShaw/pypcapkit.git
$ cd pypcapkit
$ python setup.py install
```
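&emsp; Apart from the default engine, the alternative engines need their corresponding third-party packages installed separately. A small sketch that reports which of them are importable in the current environment, using `importlib.util.find_spec`; it assumes the top-level import names `scapy`, `dpkt`, and `pyshark`, and `available_engines` is a hypothetical helper.

```python
import importlib.util
from typing import Dict

# Assumed top-level import names of the optional engine backends
ENGINE_MODULES = ("scapy", "dpkt", "pyshark")


def available_engines() -> Dict[str, bool]:
    """Map each optional backend to whether it can be imported (hypothetical helper)."""
    # find_spec returns None for a missing top-level module instead of raising
    return {name: importlib.util.find_spec(name) is not None for name in ENGINE_MODULES}


if __name__ == "__main__":
    for name, ok in available_engines().items():
        print(f"{name:8} {'installed' if ok else 'missing'}")
```

`find_spec` only locates the module without importing it, so the check stays cheap even for heavy packages such as Scapy.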

&nbsp;

## Usage

### Documentation

#### Interfaces

| NAME | DESCRIPTION |
| :--------------------------------------------------------------------------------------: | :-------------------------------: |
| [`extract`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#extract) | extract a PCAP file |
| [`analyse`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#analyse) | analyse application layer packets |
| [`reassemble`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#reassemble) | reassemble fragmented datagrams |
| [`trace`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#trace) | trace TCP packet flows |
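&emsp; `trace` follows TCP packet flows. As a generic illustration of the idea (not `pcapkit`'s implementation or its output format), the sketch below groups packets by a direction-independent key built from both endpoints, so traffic from A to B and from B to A lands in the same flow. The `Packet` tuple, `flow_key`, and `group_flows` are hypothetical, and the sketch ignores connection boundaries such as SYN/FIN.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int
    payload: bytes


FlowKey = Tuple[Tuple[str, int], Tuple[str, int]]


def flow_key(pkt: Packet) -> FlowKey:
    """Order the two endpoints so both directions share one key."""
    a = (pkt.src, pkt.sport)
    b = (pkt.dst, pkt.dport)
    return (a, b) if a <= b else (b, a)


def group_flows(packets: Iterable[Packet]) -> Dict[FlowKey, List[Packet]]:
    """Bucket packets by flow, preserving arrival order within each flow."""
    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)
    return dict(flows)
```

The ordering comparison only needs to be consistent, not meaningful, which is why plain tuple comparison on address strings is enough here.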

#### Macros

##### Formats

| NAME | DESCRIPTION |
| :----------------------------------------------------------------------------: | :--------------------------------------: |
| [`JSON`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#formats) | JavaScript Object Notation (JSON) format |
| [`PLIST`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#formats) | macOS Property List (PLIST) format |
| [`TREE`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#formats) | Tree-View text format |
| [`PCAP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#formats) | PCAP format |

##### Layers

| NAME | DESCRIPTION |
| :----------------------------------------------------------: | :---------------: |
| [`RAW`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#layers) | no specific layer |
| [`LINK`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#layers) | data-link layer |
| [`INET`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#layers) | internet layer |
| [`TRANS`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#layers) | transport layer |
| [`APP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#layers) | application layer |
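&emsp; The layer macros run from the data-link layer up to the application layer. As a rough illustration of what extracting "until" a layer means for a protocol chain such as `Ethernet:IPv4:TCP` (the form printed in verbose mode), the sketch below cuts a chain at the first protocol above the requested layer. `RAW` is left out, and the protocol-to-layer table is a hypothetical, partial mapping following how the Protocols table groups protocols; `pcapkit`'s own handling of the layer option may differ.

```python
from typing import Dict

# Layers ordered from lowest to highest, mirroring LINK/INET/TRANS/APP
LAYERS = ("LINK", "INET", "TRANS", "APP")

# Hypothetical, partial protocol-to-layer mapping for illustration only
PROTOCOL_LAYER: Dict[str, str] = {
    "Ethernet": "LINK", "VLAN": "LINK", "ARP": "LINK", "RARP": "LINK",
    "IPv4": "INET", "IPv6": "INET", "AH": "INET", "HOPOPT": "INET",
    "TCP": "TRANS", "UDP": "TRANS",
    "HTTP": "APP",
}


def truncate_chain(chain: str, layer: str) -> str:
    """Keep protocols up to and including `layer` (hypothetical helper)."""
    limit = LAYERS.index(layer)  # raises ValueError for an unknown layer name
    kept = []
    for name in chain.split(":"):
        level = PROTOCOL_LAYER.get(name)
        # Stop at the first unknown protocol or the first one above the limit
        if level is None or LAYERS.index(level) > limit:
            break
        kept.append(name)
    return ":".join(kept)
```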

##### Engines

| NAME | DESCRIPTION |
| :----------------------------------------------------------: | :---------------------------------------------------------: |
| [`PCAPKit`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#engines) | the default engine |
| [`MPServer`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#engines) | the multiprocessing engine with server process strategy |
| [`MPPipeline`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#engines) | the multiprocessing engine with pipeline strategy |
| [`DPKT`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#engines) | the [`DPKT`](https://github.com/kbandla/dpkt) engine |
| [`Scapy`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#engines) | the [`Scapy`](https://scapy.net) engine |
| [`PyShark`](https://github.com/JarryShaw/pypcapkit/tree/master/src/interface#engines) | the [`PyShark`](https://kiminewt.github.io/pyshark/) engine |

#### Protocols

| NAME | DESCRIPTION |
| :-----------------------------------------------------------------------------------------------: | :---------------------------------: |
| [`Raw`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols#raw) | Raw Packet Data |
| [`ARP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/link#arp) | Address Resolution Protocol |
| [`Ethernet`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/link#ethernet) | Ethernet Protocol |
| [`L2TP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/link#l2tp) | Layer Two Tunnelling Protocol |
| [`OSPF`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/link#ospf) | Open Shortest Path First |
| [`RARP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/link#rarp) | Reverse Address Resolution Protocol |
| [`VLAN`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/link#vlan) | 802.1Q Customer VLAN Tag Type |
| [`AH`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#ah) | Authentication Header |
| [`HIP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#hip) | Host Identity Protocol |
| [`HOPOPT`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#hopopt) | IPv6 Hop-by-Hop Options |
| [`IP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#ip) | Internet Protocol |
| [`IPsec`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#ipsec) | Internet Protocol Security |
| [`IPv4`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#ipv4) | Internet Protocol version 4 |
| [`IPv6`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#ipv6) | Internet Protocol version 6 |
| [`IPv6_Frag`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#ipv6_frag) | Fragment Header for IPv6 |
| [`IPv6_Opts`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#ipv6_opts) | Destination Options for IPv6 |
| [`IPv6_Route`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#ipv6_route) | Routing Header for IPv6 |
| [`IPX`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#ipx) | Internetwork Packet Exchange |
| [`MH`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/internet#mh) | Mobility Header |
| [`TCP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/transport#tcp) | Transmission Control Protocol |
| [`UDP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/transport#udp) | User Datagram Protocol |
| [`HTTP`](https://github.com/JarryShaw/pypcapkit/tree/master/src/protocols/application#http) | Hypertext Transfer Protocol |
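&emsp; Each protocol decodes its own header fields. For a flavour of what that involves, here is a minimal standard-library sketch that decodes an Ethernet II header (6-byte destination MAC, 6-byte source MAC, 2-byte EtherType, all in network byte order) and steps over a single 802.1Q VLAN tag if present. `parse_ethernet` is hypothetical and unrelated to `pcapkit`'s own `Ethernet` and `VLAN` classes.

```python
import struct
from typing import Dict, Optional, Union

ETHERTYPE_VLAN = 0x8100  # 802.1Q tag protocol identifier


def _mac(raw: bytes) -> str:
    """Render 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_ethernet(frame: bytes) -> Dict[str, Union[str, int, bytes, Optional[int]]]:
    """Decode an Ethernet II header, handling one optional VLAN tag (hypothetical helper)."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    offset = 14
    vlan_id = None
    if ethertype == ETHERTYPE_VLAN:
        if len(frame) < 18:
            raise ValueError("truncated 802.1Q tag")
        tci, ethertype = struct.unpack("!HH", frame[14:18])
        vlan_id = tci & 0x0FFF  # the low 12 bits of the TCI carry the VLAN ID
        offset = 18
    return {
        "dst": _mac(dst),
        "src": _mac(src),
        "vlan_id": vlan_id,
        "type": ethertype,  # EtherType of the payload, e.g. 0x0800 for IPv4
        "payload": frame[offset:],
    }
```

The returned `type` field is what a dissector would use to pick the next protocol in the chain, for example IPv4 for `0x0800` or IPv6 for `0x86DD`.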

&emsp; Documentation can be found in the submodules of `pcapkit`, and usage samples can be found in the [`test`](https://github.com/JarryShaw/pypcapkit/tree/master/test#test-samples) folder. For further information, please refer to the source code; the docstrings should help you :)

__P.S.__: Python's built-in `help` function should always help you out.

### CLI Usage

> The following part was originally described in [`jspcapy`](https://github.com/JarryShaw/jspcapy), which is now deprecated and merged into this repository.

&emsp; As the help manual shows, it is quite easy to use:

```
$ pcapkit --help
usage: pcapkit [-h] [-V] [-o file-name] [-f format] [-j] [-p] [-t] [-a] [-v]
               [-F] [-E PKG] [-P PROTOCOL] [-L LAYER]
               input-file-name

PCAP file extractor and formatted exporter

positional arguments:
  input-file-name       The name of input pcap file. If ".pcap" omits, it will
                        be automatically appended.

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -o file-name, --output file-name
                        The name of input pcap file. If format extension
                        omits, it will be automatically appended.
  -f format, --format format
                        Print a extraction report in the specified output
                        format. Available are all formats supported by
                        dictdumper, e.g.: json, plist, and tree.
  -j, --json            Display extraction report as json. This will yield
                        "raw" output that may be used by external tools. This
                        option overrides all other options.
  -p, --plist           Display extraction report as macOS Property List
                        (plist). This will yield "raw" output that may be used
                        by external tools. This option overrides all other
                        options.
  -t, --tree            Display extraction report as tree view text. This will
                        yield "raw" output that may be used by external tools.
                        This option overrides all other options.
  -a, --auto-extension  If output file extension omits, append automatically.
  -v, --verbose         Show more information.
  -F, --files           Split each frame into different files.
  -E PKG, --engine PKG  Indicate extraction engine. Note that except default
                        engine, all other engines need support of corresponding
                        packages.
  -P PROTOCOL, --protocol PROTOCOL
                        Indicate extraction stops after which protocol.
  -L LAYER, --layer LAYER
                        Indicate extract frames until which layer.
```

&emsp; In most cases, you should specify the name of the input PCAP file (the extension may be omitted) and, at least, the output format (`json`, `plist`, or `tree`). If the format is unspecified, the output file name must have a proper extension (`*.json`, `*.plist`, or `*.txt`); otherwise, a `FormatError` will be raised.

&emsp; In `verbose` mode, detailed information is printed during extraction (as in the following examples). The `auto-extension` flag applies to the output file and indicates whether the format's extension should be appended automatically.
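&emsp; The output-naming rules described above can be sketched as follows, assuming the format-to-extension mapping they list (`json` to `.json`, `plist` to `.plist`, `tree` to `.txt`). `resolve_format` and `output_path` are hypothetical helpers, and `FormatError` here is a local stand-in rather than `pcapkit`'s own exception class.

```python
import os
from typing import Optional

# Report format -> file extension, as listed for the CLI
EXTENSIONS = {"json": ".json", "plist": ".plist", "tree": ".txt"}


class FormatError(ValueError):
    """Local stand-in for the error raised on an unknown format (hypothetical)."""


def resolve_format(output: str, fmt: Optional[str] = None) -> str:
    """Use the explicit format if given, else infer it from the output extension."""
    if fmt is not None:
        if fmt not in EXTENSIONS:
            raise FormatError(f"unsupported format: {fmt!r}")
        return fmt
    ext = os.path.splitext(output)[1].lower()
    for name, suffix in EXTENSIONS.items():
        if ext == suffix:
            return name
    raise FormatError(f"cannot infer format from {output!r}")


def output_path(output: str, fmt: str, auto_extension: bool) -> str:
    """Append the format's extension only when requested and not already present."""
    suffix = EXTENSIONS[fmt]
    if auto_extension and not output.lower().endswith(suffix):
        return output + suffix
    return output
```

With `auto_extension` off, `output_path("out", "tree", False)` keeps the bare name `out`, which matches the tree-view CLI sample further down.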

&nbsp;

## Samples

### Usage Samples

&emsp; As shown in the `test` folder, `pcapkit` is quite easy to use, with four simple verbs as its main interface. Several scenarios are shown below.

- extract a PCAP file and dump the result to a specific file (with no reassembly)

```python
import pcapkit
# dump to a PLIST file with no frame storage (property frame disabled)
plist = pcapkit.extract(fin='in.pcap', fout='out.plist', format='plist', store=False)
# dump to a JSON file with no extension auto-complete
json = pcapkit.extract(fin='in.cap', fout='out.json', format='json', extension=False)
# dump to a folder with each tree-view text file per frame
tree = pcapkit.extract(fin='in.pcap', fout='out', format='tree', files=True)
```

- extract a PCAP file and fetch IP packet (both IPv4 and IPv6) from a frame (with no output file)

```python
>>> import pcapkit
>>> extraction = pcapkit.extract(fin='in.pcap', nofile=True)
>>> frame0 = extraction.frame[0]
>>> # check if IP is in this frame; otherwise, indexing raises ProtocolNotFound
>>> flag = pcapkit.IP in frame0
>>> ip = frame0[pcapkit.IP] if flag else None
```

- extract a PCAP file and reassemble TCP payload (with no output file nor frame storage)

```python
import pcapkit
# set strict to make sure of full reassembly
extraction = pcapkit.extract(fin='in.pcap', store=False, nofile=True, tcp=True, strict=True)
# print the extracted packet if HTTP is in the reassembled payloads
for packet in extraction.reassembly.tcp:
    for reassembly in packet.packets:
        if pcapkit.HTTP in reassembly.protochain:
            print(reassembly.info)
```

### CLI Samples

&emsp; The CLI (command line interface) of `pcapkit` can be accessed in two different ways:

- through console scripts -- use command name `pcapkit` directly (as shown in samples)
- through the Python module -- `python -m pcapkit [...]` works exactly the same as above

Here are some usage samples:

- export to a macOS Property List ([`Xcode`](https://developer.apple.com/xcode) has special support for this format)

```
$ pcapkit in --format plist --verbose
🚨Loading file 'in.pcap'
- Frame 1: Ethernet:IPv6:ICMPv6
- Frame 2: Ethernet:IPv6:ICMPv6
- Frame 3: Ethernet:IPv4:TCP
- Frame 4: Ethernet:IPv4:TCP
- Frame 5: Ethernet:IPv4:TCP
- Frame 6: Ethernet:IPv4:UDP
🍺Report file stored in 'out.plist'
```

- export to a JSON file (with no format specified)

```
$ pcapkit in --output out.json --verbose
🚨Loading file 'in.pcap'
- Frame 1: Ethernet:IPv6:ICMPv6
- Frame 2: Ethernet:IPv6:ICMPv6
- Frame 3: Ethernet:IPv4:TCP
- Frame 4: Ethernet:IPv4:TCP
- Frame 5: Ethernet:IPv4:TCP
- Frame 6: Ethernet:IPv4:UDP
🍺Report file stored in 'out.json'
```

- export to a text tree view file (without extension auto-completion)

```
$ pcapkit in --output out --format tree --verbose
🚨Loading file 'in.pcap'
- Frame 1: Ethernet:IPv6:ICMPv6
- Frame 2: Ethernet:IPv6:ICMPv6
- Frame 3: Ethernet:IPv4:TCP
- Frame 4: Ethernet:IPv4:TCP
- Frame 5: Ethernet:IPv4:TCP
- Frame 6: Ethernet:IPv4:UDP
🍺Report file stored in 'out'
```

&nbsp;

## TODO

- [x] specify `Raw` packet
- [x] interface verbs
- [x] review docstrings
- [x] merge `jspcapy`
- [ ] write documentation
- [ ] implement IP and MAC address containers
- [ ] implement option list extractors
- [ ] implement more protocols
