Skip to main content

Extract potentially unique strings from RTF files for threat hunting

Project description

Introduction

This tool is designed to make it easy to signature potentially unique parts of RTF files.

It was written by David Cannings (@edeca) and released by PwC UK under the Apache 2.0 license.

To install, you'll need Python 3 and some basic libraries. These are handled automatically if you install using pip:

$ pip install rtfsig

Then run like:

$ rtfsig -f badfile.rtf -y output.yar

This will scan the file for potentially unique RTF tags, print details to screen and save a Yara rule to output.yar.

Please raise bugs as Github issues, and note this tool is in beta.

Output

Console

Basic output is shown on the console, which can be used to search VirusTotal (try a search like content:rsid7043998).

-> % rtfsig -f 0b06052d3b5954594cf0e28bd9c50d9110eb8fb78cb78c9a99686eb4ba3391df.hostile
INFO:root:Starting to parse file 0b06052d3b5954594cf0e28bd9c50d9110eb8fb78cb78c9a99686eb4ba3391df.hostile
INFO:root:Non-standard RTF magic marker, should be {\rtf1, often a sign of malicious docs
INFO:root:Found an RSID table in this document
INFO:root:Found 1 embedded image(s) with set height/width
INFO:root:Found 2 document information group tags
INFO:root:Interesting strings (higher chance of FP): \rsid7043998, \rsid7476075, insrsid7043998, \rsid10243744, \rsid7604251, insrsid10243744, {\author blue}, rsidroot10243744, \rsid9200135, tblrsid10243744, charrsid10243744, \picw1\pich1\picwgoal1\pichgoal1 , pararsid10243744, \rsid7238080, insrsid7476075, \rsid11666446, insrsid12343406, \rsid12343406, {\operator blue}
INFO:root:Found some unique strings!  Consider using vtgrep or deploying Yara rules

Debug output can be generated using -v which is helpful if you are reporting a bug.

Yara rules

The tool will automatically generate Yara rules if the -y option is passed. Two Yara rules are created, one which should generate low false positives (strict_rule) and one which may have a higher false positive rate (loose_rule).

It is recommended to review strings carefully and to change any of them to a sensible number, for example 3 of them.

An example rule generated from 0b06052d3b5954594cf0e28bd9c50d9110eb8fb78cb78c9a99686eb4ba3391df looks like:

rule loose_rule {
  meta:
    description = "RTF file matching known unique identifiers (higher chance of FP, adjust 'any of them' if required)"
    generated_by = "rtfsig version 0.0.2"

  strings:
    $ = "{\\author blue}" ascii
    $ = "\\rsid7238080" ascii
    $ = "pararsid10243744" ascii
    $ = "insrsid7043998" ascii
    $ = "\\rsid7043998" ascii
    $ = "rsidroot10243744" ascii
    $ = "\\rsid9200135" ascii
    $ = "\\rsid7604251" ascii
    $ = "insrsid7476075" ascii
    $ = "\\rsid10243744" ascii
    $ = "insrsid12343406" ascii
    $ = "{\\operator blue}" ascii
    $ = "insrsid10243744" ascii
    $ = "charrsid10243744" ascii
    $ = "\\rsid11666446" ascii
    $ = "\\rsid12343406" ascii
    $ = "\\picw1\\pich1\\picwgoal1\\pichgoal1 " ascii
    $ = "tblrsid10243744" ascii
    $ = "\\rsid7476075" ascii

  condition:
    uint32be(0) == 0x7b5c7274 and any of them
}

rule strict_rule {
  meta:
    description = "RTF file matching known unique identifiers (lower chance of FP)"
    generated_by = "rtfsig version 0.0.2"

  strings:
    $ = "\\rsid7043998\\rsid7238080\\rsid7476075\\rsid7604251\\rsid9200135\\rsid10243744\\rsid11666446\\rsid12343406" ascii

  condition:
    uint32be(0) == 0x7b5c7274 and any of them
}

Known limitations

  • At present, documents containing lots of obfuscation (e.g. comments between control words and their values) may not be parsed correctly. Please raise an issue with sample files for further inspection.

Contributing

To setup a development environment, clone the git repository and run the following inside a virtualenv:

$ pip install -e ".[dev]"

Before submitting a pull request, please check all tests pass and there is 100% coverage of the core module.

This is as simple as running tox and checking the output:

$ tox
.. tool output ..

py37: commands succeeded
congratulations :)

Packaging:

$ python setup.py sdist bdist_wheel 

Check and upload to PyPI, signing with GPG:

$ twine check dist/*
$ twine upload dist/* --sign --identity FCEC8AAA140C74C826592AC357974C5B48A00D9B

Version history

  • v0.0.1 (18th October 2019) - Initial version, supports RSID control words and generating Yara rules
  • v0.0.2 (23rd October 2019) - Second beta, added support for unique image identifiers and document information
  • v0.0.3 (23rd October 2019) - Third beta, added support for picture sizes
  • v0.1.0 (19th September 2020) - First public release, packaged as a Python module for PyPI
  • v0.1.1 (26th January 2024) - Bumped Jinja2 dependency to a current version
  • v0.1.2 (7th January 2025) - Bumped Jinja2 dependency to a current version
  • v0.1.3 (7th January 2025) - Tests fixed and integrated with GitHub actions

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rtfsig-0.1.3.tar.gz (16.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

rtfsig-0.1.3-py3-none-any.whl (15.8 kB view details)

Uploaded Python 3

File details

Details for the file rtfsig-0.1.3.tar.gz.

File metadata

  • Download URL: rtfsig-0.1.3.tar.gz
  • Upload date:
  • Size: 16.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for rtfsig-0.1.3.tar.gz
Algorithm Hash digest
SHA256 f5039fb00725dcd5d8dcb26ac89718a6b2a7b5ab13f3a496abb0cb903c641d7b
MD5 b418f316e988b4a1cdf835e5ced3cd8b
BLAKE2b-256 d010f193c4e2f94978f1c5891a25337ff01348cade01253ed6e00d92da1ceafa

See more details on using hashes here.

File details

Details for the file rtfsig-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: rtfsig-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 15.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for rtfsig-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 3393b1bf5a829dd228fecba1211aa0f8d51a2d350b27d29a480c67c62fbbc509
MD5 9e27238fac71a54e1d9510833cb13ecb
BLAKE2b-256 cab8fa0561feacf873ed921f3333f5e7f70649cadb1bc74ebf0435206b9386b7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page