
Extractor of various archive formats for Karton framework


Extractor karton service

Performs extraction of known archive types and e-mail attachments. Produces "raw" artifacts for further classification.

Author: CERT.pl

Maintainers: psrok1, nazywam

Consumes:

{
    "type":  "sample",
    "stage": "recognized",
    "kind":  "archive",
    "payload": {
        "sample": <Resource>,
        "extraction_level": <int, default: 0>,
        "password": <archive password>
    }
}

Produces:

{
    "type": "sample",
    "kind": "raw",
    "payload": {
        "sample": <Resource>,
        "parent": <Resource>,
        "extraction_level": <int++>
    }
}
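The two headers above describe a simple contract: every extracted file becomes a new "raw" sample that points back to its parent and carries an extraction_level one higher than the archive it came from. Below is a minimal sketch of that mapping, assuming karton's usual split of a task into headers and payload. make_child_task is a hypothetical helper, and resources are plain strings here rather than karton.core Resource objects, so this is an illustration of the contract, not the service's actual code.

```python
from typing import Any, Dict


def make_child_task(parent: Dict[str, Any], child_sample: Any) -> Dict[str, Any]:
    """Hypothetical model of the Consumes -> Produces mapping described above.

    `parent` and the returned value are plain dicts with "headers" and
    "payload" keys; resources are stand-in values, not karton Resources.
    """
    payload = parent["payload"]
    # extraction_level is optional on the consumed task and defaults to 0.
    level = payload.get("extraction_level", 0)
    return {
        # Only type and kind are set; "stage" is not carried over.
        "headers": {"type": "sample", "kind": "raw"},
        "payload": {
            "sample": child_sample,
            "parent": payload["sample"],
            "extraction_level": level + 1,
        },
    }


archive_task = {
    "headers": {"type": "sample", "stage": "recognized", "kind": "archive"},
    "payload": {"sample": "invoice.zip", "password": "infected"},
}
child = make_child_task(archive_task, "invoice.exe")
print(child["payload"]["extraction_level"])  # 1
```

Because the level is derived from the parent each time, a file extracted from a nested archive ends up with extraction_level 2, and so on, which is what the max_depth option is compared against.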

Usage

First of all, make sure you have set up the core system: https://github.com/CERT-Polska/karton

To unpack all available formats, you'll also need a few native dependencies that sflock relies on. The installation method recommended by sflock is:

RUN sed -i 's/ main/ main non-free/' /etc/apt/sources.list \
    && apt-get update && apt-get install -y \
    p7zip-full \
    rar \
    unace \
    cabextract \
    lzip

Then install karton-archive-extractor from PyPI and start the service:

$ pip install karton-archive-extractor

$ karton-archive-extractor

Configuration

There are several configuration options you can tweak to your liking:

[archive-extractor]
# Maximum levels of nested extraction
max_depth = 5
# Maximum unpacked child file size in bytes (26214400 = 25 MiB), larger files are not reported
max_size = 26214400
# Maximum number of children files for further analysis
max_children = 1000
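To make the three limits concrete, here is a rough sketch of how they might be read and applied. It parses the section with Python's standard configparser (karton configuration files are INI files) and falls back to the values shown above. load_limits and select_children are hypothetical helpers, and the exact boundary checks (for example, whether extraction stops once extraction_level >= max_depth) are assumptions rather than the service's documented behavior.

```python
import configparser
from typing import Dict, List, Tuple

# Defaults copied from the example [archive-extractor] section above.
DEFAULTS = {"max_depth": 5, "max_size": 26214400, "max_children": 1000}


def load_limits(ini_text: str) -> Dict[str, int]:
    """Read the [archive-extractor] section, falling back to DEFAULTS."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    limits = dict(DEFAULTS)
    if parser.has_section("archive-extractor"):
        section = parser["archive-extractor"]
        for key in limits:
            limits[key] = section.getint(key, fallback=limits[key])
    return limits


def select_children(level: int, children: List[Tuple[str, int]],
                    limits: Dict[str, int]) -> List[str]:
    """Hypothetical filter over (name, size_in_bytes) pairs of extracted files."""
    # Assumption: nothing is reported once the nesting limit has been reached.
    if level >= limits["max_depth"]:
        return []
    # Drop files larger than max_size, then cap the number of reported children.
    small = [name for name, size in children if size <= limits["max_size"]]
    return small[: limits["max_children"]]


limits = load_limits("""
[archive-extractor]
max_depth = 2
max_children = 2
""")
print(limits)  # {'max_depth': 2, 'max_size': 26214400, 'max_children': 2}
files = [("a.exe", 10), ("big.bin", 30_000_000), ("b.dll", 20), ("c.js", 5)]
print(select_children(0, files, limits))  # ['a.exe', 'b.dll']
```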

To learn more about configuring your karton services, take a look at the karton configuration docs.

Running in Docker

Sflock uses ZipJail as a user-mode syscall filtering mechanism, which relies on ptrace. In our experience, this means the container running the karton service needs the SYS_PTRACE capability for ptrace to work correctly. Make sure it's enabled if you run into problems extracting certain archive types.
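With plain Docker, the capability can be added with docker run --cap-add=SYS_PTRACE. To check from inside a running container whether it is actually in effect, you can inspect the CapEff bitmask in /proc/self/status; CAP_SYS_PTRACE is capability number 19 in <linux/capability.h>. The snippet below is a small diagnostic sketch with a hypothetical has_capability helper, not part of karton-archive-extractor itself.

```python
import sys

# Capability number of CAP_SYS_PTRACE as defined in <linux/capability.h>.
CAP_SYS_PTRACE = 19


def has_capability(status_text: str, cap: int = CAP_SYS_PTRACE) -> bool:
    """Return True if `cap` is set in the CapEff line of a /proc/<pid>/status dump."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # CapEff is a hexadecimal bitmask of the effective capabilities.
            mask = int(line.split()[1], 16)
            return bool((mask >> cap) & 1)
    # No CapEff line found: treat the capability as unavailable.
    return False


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as f:
            print("SYS_PTRACE in effective set:", has_capability(f.read()))
    except OSError:
        print("/proc/self/status is not readable (not running on Linux?)", file=sys.stderr)
```

Run it inside the same container (and as the same user) as the karton service, since capabilities are per process.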

Supported archive/compression formats*

.7z
.ace
.bup
.cab
.daa
.eml
.gz
.gzip
.iso
.lha
.lz
.lzh
.msg
.mso
.pdf
.rar
.tar
.tar.bz2
.tar.gz
.udf
.vhd
.vhdx
.xz
.zip

* Assuming you are running Linux. Please see sflock's README for more information.

PE files debloating

Some malicious PE files contain intentionally added junk that makes them too big to process. Starting from v1.4.0, the archive extractor supports optional debloating of these files using the debloat tool by Squiblydoo.

The certpl/karton-archive-extractor Docker image debloats PE files by default. To enable debloating in karton-archive-extractor installed from PyPI, you need to install the optional debloat extra:

$ pip install "karton-archive-extractor[debloat]"

Co-financed by the Connecting Europe Facility of the European Union



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


karton_archive_extractor-1.5.0-py3-none-any.whl (21.1 kB)


File details

Details for the file karton_archive_extractor-1.5.0-py3-none-any.whl.


File hashes

Hashes for karton_archive_extractor-1.5.0-py3-none-any.whl:

Algorithm     Hash digest
SHA256        21c516664b84d1a82b38bf798b21389acea722b07daad11096c78af70ab2a56b
MD5           9f014fcd5644786d6b0b7b94d4773568
BLAKE2b-256   a1ff4e7e2e7d08d91a43274ba87b94c901505c930481b99f536797ca3f094798

