Extract logs based on Sysmon event IDs. Comes as a package, a CLI, and a UI.
Sysmon Extract
Sysmon Extract is a library for extracting events from Sysmon logs by event ID. The extracted events can be written to a file (in any big data format, with HDFS support) or returned in memory as a Spark or Pandas DataFrame. The library works best with Spark, which it uses for the ETL process.
It ships as a Python package, a command-line interface (CLI), and a UI.
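To illustrate the core idea of selecting events by ID, here is a minimal pure-Python sketch over JSON-lines records. It is not the library's implementation (which uses Spark). The field names `log_name` and `event_id` and the log-source value `Microsoft-Windows-Sysmon/Operational` are assumptions about the input data, and `select_sysmon_events` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator

# Assumed log-source value for Sysmon records; real datasets may differ.
SYSMON_LOG_NAME = "Microsoft-Windows-Sysmon/Operational"


def select_sysmon_events(
    lines: Iterable[str],
    event_ids: Iterable[int],
    log_column: str = "log_name",
    id_column: str = "event_id",  # assumption: column holding the event ID
) -> Iterator[dict]:
    """Yield parsed records that come from Sysmon and match one of event_ids."""
    wanted = set(event_ids)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the JSON-lines input
        record = json.loads(line)
        if record.get(log_column) != SYSMON_LOG_NAME:
            continue  # not a Sysmon record (e.g. Security or Audit logs)
        if record.get(id_column) in wanted:
            yield record


sample = [
    '{"log_name": "Microsoft-Windows-Sysmon/Operational", "event_id": 1}',
    '{"log_name": "Security", "event_id": 1}',
    '{"log_name": "Microsoft-Windows-Sysmon/Operational", "event_id": 3}',
]
print([r["event_id"] for r in select_sysmon_events(sample, [1, 2])])  # prints [1]
```

The log-source check comes first so that an event ID shared with another log type (ID 1 in the Security record above) is not picked up by mistake.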
Usage
Command Line
Usage: sysxtract [OPTIONS]
Options:
-i, --input-file PATH
-h, --header
-e, --event TEXT
-lc, --log-column TEXT
-ec, --event-column TEXT [default: ]
-a, --additional-columns TEXT
-o, --output-file TEXT [default: /home/sidhu/sysmon-extract/sysmon-output.csv]
-s, --single-file
-m, --master TEXT [default: local]
-ui, --start-ui
--help Show this message and exit.
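The help output does not say which options may be repeated, but the example command that follows passes `-e` twice. As a hedged illustration only, here is a minimal `argparse` sketch that mirrors the listed options. It assumes `-e` and `-a` are repeatable and collect into lists, and that the default output path is `sysmon-output.csv` in the working directory. It is not the tool's actual parser, and `build_parser` is a hypothetical name.

```python
import argparse
import os
import shlex


def build_parser() -> argparse.ArgumentParser:
    # add_help=False because -h is taken by --header in the help text.
    parser = argparse.ArgumentParser(prog="sysxtract", add_help=False)
    parser.add_argument("--help", action="help", help="Show this message and exit.")
    parser.add_argument("-i", "--input-file")
    parser.add_argument("-h", "--header", action="store_true")
    # Assumption: each -e appends one event ID to a list.
    parser.add_argument("-e", "--event", action="append", type=int)
    parser.add_argument("-lc", "--log-column")
    parser.add_argument("-ec", "--event-column", default="")
    # Assumption: -a is repeatable, like -e.
    parser.add_argument("-a", "--additional-columns", action="append")
    # Assumption: the default output path is based on the working directory.
    parser.add_argument("-o", "--output-file",
                        default=os.path.join(os.getcwd(), "sysmon-output.csv"))
    parser.add_argument("-s", "--single-file", action="store_true")
    parser.add_argument("-m", "--master", default="local")
    parser.add_argument("-ui", "--start-ui", action="store_true")
    return parser


argv = shlex.split(
    "-i /media/sidhu/Seagate/empire_apt3_2019-05-14223117.json -e 1 -e 2 "
    "-lc log_name -ec event_data -s -a host.name -o /home/sidhu/output.json"
)
args = build_parser().parse_args(argv)
print(args.event, args.additional_columns, args.single_file)  # prints [1, 2] ['host.name'] True
```

`argparse` matches multi-character single-dash options such as `-lc` and `-ui` exactly before trying shorter prefixes, so they do not collide with `-e` or `-i`.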
sysxtract -i /media/sidhu/Seagate/empire_apt3_2019-05-14223117.json -e 1 -e 2 -lc log_name -ec event_data -s -a host.name -o /home/sidhu/output.json
Let's break it down.
Input file: -i /media/sidhu/Seagate/empire_apt3_2019-05-14223117.json
Sysmon Events to extract: -e 1 -e 2
Column in the dataset that describes the log source (Sysmon, Microsoft Security, Microsoft Audit, etc.): -lc log_name
Column in the dataset that contains the nested sysmon data (often event_data): -ec event_data
Output as a single file: -s
Additional columns to extract: -a host.name
Output file name: -o /home/sidhu/output.json
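The `-ec` and `-a` options imply two steps per record: promoting the nested Sysmon fields to top-level columns and pulling extra columns out of the record. A minimal pure-Python sketch of those steps follows. It assumes `host.name` is a dotted path into a nested `host` object (a literal top-level key named `host.name` is checked first), and the helper names and the sample record are made up for illustration; none of this is part of sysxtract.

```python
from typing import Any, Iterable


def get_dotted(record: dict, path: str) -> Any:
    """Resolve a dotted path such as "host.name" through nested dicts."""
    if path in record:
        return record[path]  # a literal key containing the dot wins
    value: Any = record
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # missing fields become None (null)
        value = value[part]
    return value


def flatten_event(record: dict, event_column: str, additional_columns: Iterable[str]) -> dict:
    """Build one output row: the nested Sysmon fields plus the requested extra columns."""
    row = dict(record.get(event_column) or {})
    for column in additional_columns:
        row[column] = get_dotted(record, column)
    return row


# Made-up sample record for illustration.
record = {
    "log_name": "Microsoft-Windows-Sysmon/Operational",
    "event_id": 1,
    "host": {"name": "host-01"},
    "event_data": {"Image": "cmd.exe", "ProcessId": "4242"},
}
print(flatten_event(record, "event_data", ["host.name"]))
# prints {'Image': 'cmd.exe', 'ProcessId': '4242', 'host.name': 'host-01'}
```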
UI
sysxtract -ui
Package
Using the example above:
from pyspark.sql import SparkSession
from sysxtract import extract

# Extract to a file
extract(
    "/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json",
    [1, 2],
    log_column="log_name",
    event_column="event_data",
    additional_columns="host.name",
    single_file=True,
    output_file="/home/sidhu/output.json"
)

# Extract to a file using an existing Spark cluster
extract(
    "/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json",
    [1, 2],
    log_column="log_name",
    event_column="event_data",
    additional_columns="host.name",
    single_file=True,
    output_file="/home/sidhu/output.json",
    master="spark://HOST:PORT"  # or mesos://HOST:PORT for Mesos, yarn for YARN
)

# Extract to a file using an existing Spark session
spark = SparkSession.builder.getOrCreate()  # returns the active session if one exists
extract(
    "/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json",
    [1, 2],
    log_column="log_name",
    event_column="event_data",
    additional_columns="host.name",
    single_file=True,
    output_file="/home/sidhu/output.json",
    spark_sess=spark,  # Spark session variable, usually named spark
)

# Extract to a Spark DataFrame
# NOTE: Must provide an existing Spark session
df = extract(
    "/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json",
    [1, 2],
    log_column="log_name",
    event_column="event_data",
    additional_columns="host.name",
    single_file=True,
    spark_sess=spark,  # Spark session variable, usually named spark
    as_spark_frame=True
)

# Extract to a Pandas DataFrame
df = extract(
    "/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json",
    [1, 2],
    log_column="log_name",
    event_column="event_data",
    additional_columns="host.name",
    single_file=True,
    as_pandas_frame=True
)

# Extract using a Spark DataFrame as input
# NOTE: Must provide an existing Spark session
# spark_df is an existing Spark DataFrame holding the raw logs
df = extract(
    spark_df,
    [1, 2],
    log_column="log_name",
    event_column="event_data",
    additional_columns="host.name",
    single_file=True,
    spark_sess=spark,
    as_pandas_frame=True
)

# Extract using a Pandas DataFrame as input
# NOTE: To use a Pandas DataFrame as input and a Spark DataFrame as output, a Spark session must be provided.
# pandas_df is an existing Pandas DataFrame holding the raw logs
df = extract(
    pandas_df,
    [1, 2],
    log_column="log_name",
    event_column="event_data",
    additional_columns="host.name",
    single_file=True,
    as_pandas_frame=True
)
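The NOTE comments in these examples imply a simple rule for when `spark_sess` must be supplied: a Spark DataFrame on either the input or the output side needs an existing session, while path input with file or Pandas output does not (as the examples suggest, the library then sets up Spark itself from `master`). The hypothetical helper below just encodes that reading of the notes; the function name and the kind labels are illustrative, and the README does not say whether sysxtract enforces this check itself.

```python
# Hypothetical labels for where the data comes from and where it goes.
INPUT_KINDS = {"path", "spark", "pandas"}   # file path, Spark DataFrame, Pandas DataFrame
OUTPUT_KINDS = {"file", "spark", "pandas"}  # output_file, as_spark_frame, as_pandas_frame


def needs_existing_session(input_kind: str, output_kind: str) -> bool:
    """Return True when, per the NOTE comments, spark_sess must be passed."""
    if input_kind not in INPUT_KINDS:
        raise ValueError(f"unknown input kind: {input_kind!r}")
    if output_kind not in OUTPUT_KINDS:
        raise ValueError(f"unknown output kind: {output_kind!r}")
    # A Spark DataFrame on either side needs a live session; this also covers
    # the Pandas-in / Spark-out case called out in the last NOTE.
    return input_kind == "spark" or output_kind == "spark"


for combo in [("path", "file"), ("path", "pandas"), ("spark", "pandas"), ("pandas", "spark")]:
    print(combo, needs_existing_session(*combo))
```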
Installation
pip install sysxtract
Because this library relies on Spark (specifically PySpark), you need to install PySpark yourself. This lets you choose a PySpark version compatible with the Spark version of any existing cluster you connect to.
pip install pyspark==$VERSION
If you're going to use Spark locally:
pip install pyspark
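When connecting to an existing cluster, a quick check that the installed PySpark matches the cluster's Spark version can save debugging time. The sketch below compares major.minor versions; treating a major.minor match as sufficient is a common rule of thumb, not a requirement stated by sysxtract. `cluster_version` is a value you supply (for example from the cluster UI or `spark-submit --version`), and both helper names are hypothetical.

```python
import re
from importlib import metadata
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as "3.5.1" or "3.4.0rc1"."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def pyspark_matches_cluster(cluster_version: str) -> bool:
    """Compare the installed PySpark's major.minor with the cluster's Spark version."""
    try:
        installed = metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return False  # PySpark is not installed at all
    return major_minor(installed) == major_minor(cluster_version)


print(pyspark_matches_cluster("3.5.1"))  # True only if the installed PySpark is 3.5.x
```

`importlib.metadata` is available in the standard library from Python 3.8 onward.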
Feedback
I appreciate any feedback. If you have feature requests or run into issues, open an issue with the appropriate tag, or send me an email at sidhuashton@gmail.com.
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file sysxtract-1.0.0.tar.gz
File metadata
- Download URL: sysxtract-1.0.0.tar.gz
- Size: 9.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | c4ee9baae292e66075a34abc6ccd34f6a7c24f50c01d1014297c51b9cb20153a
MD5 | ddfa84484d970ee0e2d098ea78ac613a
BLAKE2b-256 | fcbb63054759cf82d0fd044ffff133e58aa5a0769ffed2c695f75a7d0672d755
File details
Details for the file sysxtract-1.0.0-py3-none-any.whl
File metadata
- Download URL: sysxtract-1.0.0-py3-none-any.whl
- Size: 9.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 66a3e33ee300172ad7a7e55b2715c6dd74f5aebc282c8cc397180de127f28153
MD5 | 839fb8ac00be9773aef55d83bd512c0d
BLAKE2b-256 | 9f109765067b0cc0a24407ddf3614e976ac8d573c7dfadb3c54b9c2c645288b5