Recursive metadata extraction tool

Ruminant is a recursive metadata extraction tool.

What does it do?

Ruminant takes a file as input and outputs a large JSON object that contains all the metadata it extracted from the file. This is done recursively, e.g. by running ruminant again on each file inside a ZIP file.
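The recursion can be pictured with a small sketch. This is not ruminant's actual code: the function name chew_sketch, the helper make_zip, the output keys "name", "size", "type" and "children", and the ZIP-only handling are all assumptions made for illustration.

```python
import io
import json
import zipfile


def chew_sketch(data: bytes, name: str = "<input>") -> dict:
    """Describe a blob and, if it is a ZIP archive, recurse into its members."""
    # Hypothetical output shape; ruminant's real JSON layout may differ.
    node = {"name": name, "size": len(data), "type": "unknown"}
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        node["type"] = "zip"
        node["children"] = []
        with zipfile.ZipFile(buffer) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                # Each member is "chewed over again" with the same function.
                child = chew_sketch(archive.read(info), info.filename)
                node["children"].append(child)
    return node


def make_zip(files: dict) -> bytes:
    """Build an in-memory ZIP archive from a {filename: content} mapping."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as archive:
        for filename, content in files.items():
            archive.writestr(filename, content)
    return out.getvalue()


if __name__ == "__main__":
    inner = make_zip({"note.txt": b"hello"})
    outer = make_zip({"inner.zip": inner, "readme.txt": b"hi"})
    print(json.dumps(chew_sketch(outer), indent=2))
```

The nested archive shows up as a child node that has children of its own, which is the same idea as ruminant running again on each file inside a ZIP file.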

Why the name?

To quote Wikipedia: Ruminants are herbivorous grazing or browsing artiodactyls [...]. The process of rechewing the cud to further break down plant matter and stimulate digestion is called rumination. The word "ruminant" comes from the Latin ruminare, which means "to chew over again".

This tool behaves similarly: extracted blobs can themselves be "chewed over again" (the main entry point is literally called chew()) in order to recursively extract metadata.

What can it process?

Ruminant is still in early alpha, but it can already process the following file types:

  • ZIP files
  • DOCX files (needs to be updated)
  • PDF files
  • JPEG files
    • EXIF metadata
    • XMP metadata
    • ICC profiles
    • IPTC metadata (I hate you for that one Adobe)
    • Adobe-specific metadata in APP14
  • PNG files
    • EXIF metadata
  • TIFF files
    • EXIF metadata (EXIF metadata is literally stored in a TIFF file)
  • MP4 files
    • XMP metadata
    • AVC1 x264 banners
    • all of the DRM stuff that Netflix puts in their streams
      • CENC
      • PlayReady
      • Widevine
  • ICC profiles
    • EP0763801A2 extension

How do I install it?

Run pip3 install ruminant. Alternatively, run python3 -m build in the source tree, followed by pip3 install dist/*.whl.

How do I use it?

The most basic usage would be ruminant <file> in order to process the file and output all metadata.

Each time a blob is passed to chew(), it gets assigned a new unique ID that is stored in the "blob-id" field in its JSON object. These blobs can be extracted with ruminant <file> --extract <ID> <file name>. The --extract option can also be shortened to -e and can be repeated multiple times.
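To find out which IDs are available for --extract, it can help to list every "blob-id" in the output. The following is a minimal sketch that walks any nested JSON structure. Only the "blob-id" key comes from the description above; the function name collect_blob_ids and the sample structure (including the "parts" and "inner" keys) are made up for illustration.

```python
import json


def collect_blob_ids(node) -> list:
    """Return every "blob-id" value found in a nested JSON structure."""
    found = []
    if isinstance(node, dict):
        if "blob-id" in node:
            found.append(node["blob-id"])
        for value in node.values():
            found.extend(collect_blob_ids(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_blob_ids(item))
    return found


if __name__ == "__main__":
    # Hypothetical output shape: only the "blob-id" key is taken from the README.
    sample = json.loads(
        '{"blob-id": 0, "parts": [{"blob-id": 1}, {"inner": {"blob-id": 2}}]}'
    )
    print(collect_blob_ids(sample))
```

The walker makes no assumption about where blobs are nested, so it should keep working if the surrounding JSON layout changes between versions.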

If no file is specified, ruminant reads from -, which is the standard input. You can also pass - explicitly as the file.
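Reading from standard input makes it possible to drive ruminant from a script. Below is a rough sketch using Python's subprocess module. It assumes that ruminant is on the PATH and that it prints nothing but the JSON object to standard output; the function name extract_metadata and its command parameter are hypothetical.

```python
import json
import subprocess
import sys


def extract_metadata(data: bytes, command=("ruminant", "-")) -> dict:
    """Pipe bytes to a command on standard input and parse its stdout as JSON."""
    # Assumption: the command writes a single JSON document to stdout.
    result = subprocess.run(
        list(command), input=data, capture_output=True, check=True
    )
    return json.loads(result.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 this_script.py <file>
    with open(sys.argv[1], "rb") as handle:
        print(json.dumps(extract_metadata(handle.read()), indent=2))
```

The command is a parameter so that the helper can be tried out with a stand-in program when ruminant is not installed.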

The --walk or -w option enables a binwalk-like mode where ruminant tries to parse a file and increments the start offset by one until it can correctly parse something. This repeats until the end of the file is reached.
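The idea behind such a scan can be sketched in a few lines. This is not ruminant's implementation: try_parse is a toy stand-in that only recognizes two magic numbers, and continuing right after the matched bytes on success is an assumption, since the description above does not say what happens after a successful parse.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def try_parse(data: bytes, offset: int):
    """Toy parser: return (kind, length) if a known signature starts at offset."""
    for kind, magic in (("png", PNG_SIGNATURE), ("jpeg", JPEG_SIGNATURE)):
        if data.startswith(magic, offset):
            return kind, len(magic)
    return None


def walk(data: bytes) -> list:
    """Scan data byte by byte and record (offset, kind) for every successful parse."""
    hits = []
    offset = 0
    while offset < len(data):
        parsed = try_parse(data, offset)
        if parsed is None:
            # Nothing parseable here: move the start offset forward by one.
            offset += 1
        else:
            kind, length = parsed
            hits.append((offset, kind))
            # Assumption: resume scanning after the bytes that were consumed.
            offset += length
    return hits


if __name__ == "__main__":
    blob = b"junk" + PNG_SIGNATURE + b"xx" + JPEG_SIGNATURE + b"tail"
    for offset, kind in walk(blob):
        print(f"{kind} signature at offset {offset}")
```

A real parser would validate far more than a signature, which is why this kind of mode can be slow on large files: every failed attempt costs a full parse attempt at that offset.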

This is a valid complex command: ruminant -e 2 foo.jpeg - --extract 5 bar.bin -e 0 all.zip

(Yes, you could abuse ruminant to copy files by running function cp() { ruminant --extract 0 "$2" "$1"; } in bash and then using the function as cp.)

You can also specify --extract-all in order to extract all blobs to the "blobs" directory.

Ruminant can't parse xyz

Feel free to send me a sample so I can add a parser for it :)

TODO list

  • more file formats
    • MP3
    • WebM
    • WebP
    • Opus
    • Matroska
  • ZIP family detection (e.g. DOCX is also a ZIP file)
