Extract logs based off events from sysmon. Comes as a package, cli and ui.
Project description
Sysmon Extract
Sysmon Extract is a library to extract events from the sysmon log type based off the event id. They can be extracted as a file (any big data format) with support for HDFS or in memory as a Spark or Pandas DataFrame. As a note, this library works best with Spark as it leverages it for the ETL process.
The library comes with a library, cli and UI.
Table of Contents
Usage
Command Line
Usage: sysxtract [OPTIONS]
Options:
-i, --input-file PATH
-h, --header
-e, --event TEXT
-lc, --log-column TEXT
-ec, --event-column TEXT [default: ]
-a, --additional-columns TEXT
-o, --output-file TEXT [default: /home/sidhu/sysmon-extract/sysmon-output.csv]
-s, --single-file
-m, --master TEXT [default: local]
-ui, --start-ui
--help Show this message and exit.
sysxtract -i /media/sidhu/Seagate/empire_apt3_2019-05-14223117.json -e 1 -e 2 -lc log_name -ec event_data -s -a host.name -o /home/sidhu/output.json
Let's break it down.
Input file: -i /media/sidhu/Seagate/empire_apt3_2019-05-14223117.json
Sysmon Events to extract: -e 1 -e 2
Column in the dataset that describes the log source (Sysmon, Microsoft Security, Microsoft Audit, etc.): -lc log_name
Column in the dataset that contains the nested sysmon data (often event_data): -ec event_data
Output as a single file: -s
Additional columns to extract: -a host.name
Output file name: /home/sidhu/output.json
UI
sysextract -ui
Package
Using the example above:
from sysxtract import extract
# Extract to a file
extract(
"/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json",
[1, 2],
log_column="log_name",
event_column="event_data",
additional_columns="host.name",
single_file=True,
output_file="/home/sidhu/output.json"
)
# Extract to a file using an existing Spark cluster
extract(
"/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json",
[1, 2],
log_column="log_name",
event_column="event_data",
additional_columns="host.name",
single_file=True,
output_file="/home/sidhu/output.json",
master="spark://HOST:PORT" # mesos://HOST:PORT for yarn/mesos cluster
)
# Extract to a file using an existing spark session
extract(
"/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json",
[1, 2],
log_column="log_name",
event_column="event_data",
additional_columns="host.name",
single_file=True,
output_file="/home/sidhu/output.json",
spark_sess=spark, # spark session variable, usually named spark
)
# Extract to a Spark DataFrame
# NOTE: Must provide an existing Spark Session
extract(
"/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json",
[1, 2],
log_column="log_name",
event_column="event_data",
additional_columns="host.name",
single_file=True,
spark_sess=spark, # spark session variable, usually named spark
as_spark_frame=True
)
# Extract to a Pandas DataFrame
df = extract(
"/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json",
[1, 2],
log_column="log_name",
event_column="event_data",
additional_columns="host.name",
single_file=True,
as_pandas_frame=True
)
# Extract using SparkDf as input
# NOTE: Must provide an existing Spark Session
df = extract(
spark_df,
[1, 2],
log_column="log_name",
event_column="event_data",
additional_columns="host.name",
single_file=True,
as_pandas_frame=True
)
# Extract using PandasDf as input
# NOTE: To use a Pandas DataFrame as input and a Spark DataFrame as output, a Spark Session must be provided.
df = extract(
pandas_df,
[1, 2],
log_column="log_name",
event_column="event_data",
additional_columns="host.name",
single_file=True,
as_pandas_frame=True
)
Installation
pip install sysxtract
Since this library leverages Spark, specifically PySpark, you need to install it manually. This allows for version compatability when connecting to existing clusters.
pip install pyspark==$VERSION
.
If you're going to use spark locally:
pip install pyspark
Feedback
I appreciate any feedback so if you have any feature requests or issues make an issue with the appropriate tag or futhermore, send me an email at sidhuashton@gmail.com
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for sysxtract-1.0.0-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 66a3e33ee300172ad7a7e55b2715c6dd74f5aebc282c8cc397180de127f28153 |
|
MD5 | 839fb8ac00be9773aef55d83bd512c0d |
|
BLAKE2b-256 | 9f109765067b0cc0a24407ddf3614e976ac8d573c7dfadb3c54b9c2c645288b5 |