Extract events from Sysmon logs by event ID. Available as a package, CLI, and UI.


Sysmon Extract

Sysmon Extract is a library for extracting events from Sysmon logs based on the event ID. Events can be extracted to a file (any big-data format, with HDFS support) or in memory as a Spark or Pandas DataFrame. Note that this library works best with Spark, since it leverages Spark for the ETL process.

It ships as a Python package, a CLI, and a UI.


Usage

Command Line

Usage: sysxtract [OPTIONS]

Options:

  -i, --input-file PATH
  -h, --header
  -e, --event TEXT
  -lc, --log-column TEXT
  -ec, --event-column TEXT       [default: ]
  -a, --additional-columns TEXT
  -o, --output-file TEXT         [default: /home/sidhu/sysmon-extract/sysmon-output.csv]
  -s, --single-file
  -m, --master TEXT              [default: local]
  -ui, --start-ui
  --help                         Show this message and exit.

sysxtract -i /media/sidhu/Seagate/empire_apt3_2019-05-14223117.json -e 1 -e 2 -lc log_name -ec event_data -s -a host.name -o /home/sidhu/output.json

Let's break it down.

Input file: -i /media/sidhu/Seagate/empire_apt3_2019-05-14223117.json

Sysmon Events to extract: -e 1 -e 2

Column in the dataset that describes the log source (Sysmon, Microsoft Security, Microsoft Audit, etc.): -lc log_name

Column in the dataset that contains the nested Sysmon data (often event_data): -ec event_data (a sample record is sketched after this list)

Output as a single file: -s

Additional columns to extract: -a host.name

Output file name: -o /home/sidhu/output.json
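To make the column flags concrete, here is a hypothetical input record. Only log_name, event_data, and host.name come from the example above; the remaining fields are illustrative, and the exact layout of the nested Sysmon payload will depend on your dataset:

# Hypothetical input record: field names other than log_name,
# event_data, and host.name are made up for illustration.
record = {
    "log_name": "Microsoft-Windows-Sysmon/Operational",  # matched via -lc log_name
    "event_data": {                                       # nested Sysmon payload, -ec event_data
        "event_id": 1,                                    # assuming the event id lives here (-e 1 -e 2)
        "image": "C:\\Windows\\System32\\cmd.exe",
    },
    "host": {"name": "workstation-01"},                   # pulled in via -a host.name
}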

UI

sysxtract -ui

[Screenshot of the Sysmon Extract UI]

Package

Using the example above:

from sysxtract import extract

# Extract to a file
extract(
    "/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json",
    [1, 2],
    log_column="log_name",
    event_column="event_data",
    additional_columns="host.name",
    single_file=True,
    output_file="/home/sidhu/output.json"
)

# Extract to a file using an existing Spark cluster
extract(
    "/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json",
    [1, 2],
    log_column="log_name",
    event_column="event_data",
    additional_columns="host.name",
    single_file=True,
    output_file="/home/sidhu/output.json",
    master="spark://HOST:PORT"  # or "yarn" / "mesos://HOST:PORT" for a YARN or Mesos cluster
)

# Extract to a file using an existing spark session
extract(
    "/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json",
    [1, 2],
    log_column="log_name",
    event_column="event_data",
    additional_columns="host.name",
    single_file=True,
    output_file="/home/sidhu/output.json",
    spark_sess=spark, # spark session variable, usually named spark
)

# Extract to a Spark DataFrame
# NOTE: Must provide an existing Spark Session
df = extract(
    "/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json",
    [1, 2],
    log_column="log_name",
    event_column="event_data",
    additional_columns="host.name",
    single_file=True,
    spark_sess=spark, # spark session variable, usually named spark
    as_spark_frame=True
)

# Extract to a Pandas DataFrame
df = extract(
    "/media/sidhu/Seagate/empire_apt3_2019-05-14223117.json",
    [1, 2],
    log_column="log_name",
    event_column="event_data",
    additional_columns="host.name",
    single_file=True,
    as_pandas_frame=True
)

# Extract using a Spark DataFrame as input
# NOTE: Must provide an existing Spark Session
df = extract(
    spark_df,
    [1, 2],
    log_column="log_name",
    event_column="event_data",
    additional_columns="host.name",
    single_file=True,
    as_pandas_frame=True
)

# Extract using a Pandas DataFrame as input
# NOTE: To use a Pandas DataFrame as input and a Spark DataFrame as output, a Spark Session must be provided.
df = extract(
    pandas_df,
    [1, 2],
    log_column="log_name",
    event_column="event_data",
    additional_columns="host.name",
    single_file=True,
    as_pandas_frame=True
)
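Several of the examples above pass spark_sess=spark. As a minimal sketch using the standard PySpark API (assuming PySpark is installed; see Installation below), that session variable can be created like this:

from pyspark.sql import SparkSession

# Build a local Spark session; point .master() at spark://HOST:PORT
# (or "yarn") to attach to an existing cluster instead.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("sysxtract")
    .getOrCreate()
)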

Installation

pip install sysxtract

Since this library leverages Spark, specifically PySpark, you need to install PySpark yourself. Installing it manually lets you match the PySpark version to an existing cluster.

pip install pyspark==$VERSION

If you're going to use Spark locally:

pip install pyspark
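To verify that the installed PySpark matches your cluster, a quick check from Python:

import pyspark

# The installed PySpark version should match the Spark version of
# any existing cluster you plan to connect to.
print(pyspark.__version__)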

Feedback

I appreciate any feedback, so if you have feature requests or issues, open an issue with the appropriate tag, or send me an email at sidhuashton@gmail.com.

