pySigma backend for Apache Spark/Databricks

Status: experimental, work in progress:

  • Although calls to cidrmatch are generated, you still need to provide the corresponding function as a UDF (I'll add an example later)
  • Requires more testing
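
Until an official example is available, a cidrmatch UDF could look roughly like the sketch below. It is not part of this package. The argument order (IP address first, CIDR range second), the Boolean return type, and the helper name register_cidrmatch are assumptions, so check them against the queries the backend actually generates before relying on it.

```python
import ipaddress


def cidrmatch(ip, cidr):
    """Return True if `ip` lies inside the `cidr` network.

    Assumed signature: cidrmatch(ip_string, cidr_string).
    Returns None for NULL input and False for unparsable values.
    """
    if ip is None or cidr is None:
        return None
    try:
        network = ipaddress.ip_network(cidr, strict=False)
        return ipaddress.ip_address(ip) in network
    except ValueError:
        # Malformed address or network: treat as "no match".
        return False


def register_cidrmatch(spark):
    """Register the function as a SQL UDF on an existing SparkSession."""
    # Imported here so the pure-Python function above works without PySpark.
    from pyspark.sql.types import BooleanType

    spark.udf.register("cidrmatch", cidrmatch, BooleanType())
```

After calling register_cidrmatch(spark), SQL run through that session can call cidrmatch(...) by name.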

pySigma Databricks Backend

This is the Databricks backend for pySigma. It provides the package sigma.backends.databricks with the DatabricksBackend class. Further, it contains the following processing pipelines in sigma.pipelines.databricks:

  • snake_case: convert column names into snake case format

It supports the following output formats:

  • default: plain Databricks/Apache Spark SQL queries
  • dbsql: Databricks SQL queries with rules metadata (title, status) embedded as comment
  • detection_yaml: YAML markup for my own detection framework

Unbound Keyword Search

The backend supports Sigma rules with unbound keywords (values without field names). These keywords search the raw log line.

Configuration

By default, the backend looks for keywords in a field named raw. You can customize this:

Command Line:

sigma convert -t databricks -O raw_log_field=message rule.yml

Programmatic:

from sigma.backends.databricks import DatabricksBackend

backend = DatabricksBackend(raw_log_field="event_data")
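
For context, a complete programmatic conversion might look like the following sketch. It relies on pySigma's SigmaCollection.from_yaml and the backend's convert method. The rule text and the helper name convert_rule are made up for illustration, and the exact query returned depends on the backend version.

```python
# Hypothetical rule used only for this illustration.
RULE_YAML = """
title: Mimikatz keyword in raw log
status: test
logsource:
    category: process_creation
    product: windows
detection:
    keywords:
        - 'mimikatz'
    condition: keywords
"""


def convert_rule(rule_yaml, raw_log_field="raw"):
    """Convert a Sigma rule (YAML text) into a list of query strings."""
    # Imported lazily so this module loads even where pySigma is absent.
    from sigma.backends.databricks import DatabricksBackend
    from sigma.collection import SigmaCollection

    backend = DatabricksBackend(raw_log_field=raw_log_field)
    rules = SigmaCollection.from_yaml(rule_yaml)
    return backend.convert(rules)
```

Calling convert_rule(RULE_YAML, raw_log_field="event_data") should return one query per rule, with the keyword search directed at the event_data column instead of raw.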

Examples

Simple Keywords (OR logic):

detection:
    keywords:
        - 'EVILSERVICE'
        - 'svchost.exe -n evil'
    condition: keywords

Generates: contains(lower(raw), lower('EVILSERVICE')) OR contains(lower(raw), lower('svchost.exe -n evil'))

Keywords with |all (AND logic):

detection:
    keywords:
        '|all':
            - 'Remove-MailboxExportRequest'
            - ' -Identity '
    condition: keywords

Generates: contains(lower(raw), lower('Remove-MailboxExportRequest')) AND contains(lower(raw), lower(' -Identity '))
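
The two examples above differ only in the operator that joins the per-keyword checks. The sketch below reproduces that shape in plain Python to make the rule explicit. It is an illustration, not the backend's implementation: the function name keyword_condition is hypothetical, and it assumes keywords contain no single quotes or wildcards.

```python
def keyword_condition(keywords, raw_field="raw", match_all=False):
    """Build a case-insensitive substring check for each keyword.

    match_all=False joins the checks with OR (plain keyword list);
    match_all=True joins them with AND (the '|all' modifier).
    """
    checks = [
        f"contains(lower({raw_field}), lower('{keyword}'))"
        for keyword in keywords
    ]
    joiner = " AND " if match_all else " OR "
    return joiner.join(checks)


print(keyword_condition(["EVILSERVICE", "svchost.exe -n evil"]))
print(keyword_condition(["Remove-MailboxExportRequest", " -Identity "],
                        match_all=True))
```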

Mixed with Field Conditions:

detection:
    selection:
        EventID: 4688
    keywords:
        - 'mimikatz'
    condition: selection and keywords

Generates: EventID = 4688 AND contains(lower(raw), lower('mimikatz'))

Wildcards in Keywords:

detection:
    keywords:
        - '*malware*'      # uses contains()
        - 'cmd.exe*'       # uses startswith()
        - '*.dll'          # uses endswith()
    condition: keywords
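
The comments in the rule above imply a simple mapping from wildcard position to SQL function. A minimal sketch of that mapping follows. The function name keyword_function is hypothetical, and the sketch ignores escaped wildcards, the ? wildcard, and wildcards in the middle of a keyword, which the real backend may treat differently.

```python
def keyword_function(keyword):
    """Pick the SQL string function implied by the wildcard position."""
    leading = keyword.startswith("*")
    trailing = keyword.endswith("*")
    if trailing and not leading:
        return "startswith"  # 'cmd.exe*'
    if leading and not trailing:
        return "endswith"    # '*.dll'
    # '*malware*' and keywords without wildcards are substring searches.
    return "contains"


for kw in ["*malware*", "cmd.exe*", "*.dll", "mimikatz"]:
    print(kw, "->", keyword_function(kw))
```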

Regex Patterns:

detection:
    keywords:
        - '|re': '.*evil(cmd|powershell).*'
    condition: keywords

Generates: raw rlike '.*evil(cmd|powershell).*'
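
Note that the generated rlike expression has no lower() wrapper, so unlike plain keywords the regex is matched case-sensitively. The sketch below approximates that behavior with Python's re module to show which log lines would match. Spark uses Java regular expressions, so treat this as an approximation. The helper name rlike_approx and the sample lines are made up.

```python
import re

# Same pattern as in the rule above.
PATTERN = re.compile(r".*evil(cmd|powershell).*")


def rlike_approx(raw):
    """Approximate `raw rlike pattern`: true if the pattern is found."""
    return raw is not None and PATTERN.search(raw) is not None


samples = [
    "started evilcmd with args",    # matches
    "started EVILCMD with args",    # no match: comparison is case-sensitive
    "started evil powershell now",  # no match: space after 'evil'
]
for line in samples:
    print(rlike_approx(line), line)
```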

Maintainer

This backend is currently maintained by:
