Recursive metadata extraction tool

Ruminant is a recursive metadata extraction and file dissection tool.

What does it do?

Ruminant takes a file as input and spits out a huge JSON object that contains all the metadata it extracted from the file. This is done recursively, e.g. by running ruminant again on each file inside a ZIP file.
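
The recursive idea can be illustrated with a minimal sketch. This is not ruminant's actual code: the function names chew_sketch and make_zip, the output keys, and the restriction to ZIP archives are all assumptions made for illustration, using only Python's standard zipfile module.

```python
import io
import json
import zipfile


def chew_sketch(data: bytes) -> dict:
    """Return a nested metadata dict for a blob (illustrative only)."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        # Anything that is not a ZIP archive is treated as an opaque leaf.
        return {"type": "unknown", "length": len(data)}

    entries = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            entries.append({
                "name": info.filename,
                # Recurse: each member is dissected the same way as the outer file.
                "content": chew_sketch(archive.read(info)),
            })
    return {"type": "zip", "length": len(data), "entries": entries}


def make_zip(members: dict) -> bytes:
    """Build an in-memory ZIP archive from a name -> bytes mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, payload in members.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


if __name__ == "__main__":
    # A ZIP inside a ZIP: the inner archive is dissected recursively.
    inner = make_zip({"note.txt": b"hello"})
    outer = make_zip({"inner.zip": inner})
    print(json.dumps(chew_sketch(outer), indent=2))
```

The real tool dispatches to many more parsers than this sketch, but the shape is the same: a parser that finds an embedded file hands it back to the same entry point.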

Why the name?

To quote Wikipedia: Ruminants are herbivorous grazing or browsing artiodactyls [...]. The process of rechewing the cud to further break down plant matter and stimulate digestion is called rumination. The word "ruminant" comes from the Latin ruminare, which means "to chew over again".

This tool behaves similarly, as extracted blobs can themselves be "chewed over again" (the main entrypoint is literally called chew()) in order to recursively extract metadata.

What can it process?

Ruminant is still in early alpha but it can already process the following file types:

  • ZIP files
    • APK signatures
    • Java jmod modules
    • encrypted files
  • PDF files
    • I hate Adobe
  • JPEG files
    • EXIF metadata
    • XMP metadata
    • ICC profiles
    • IPTC metadata
    • Adobe-specific metadata in APP14
    • MPF APP2 segments
  • PNG files
    • EXIF metadata
  • TIFF files
    • EXIF metadata (EXIF metadata is literally stored in a TIFF file)
    • DNG files
  • ISO files
    • MP4 files
    • AVIF files
    • HEIF/HEIC stuff
    • XMP metadata
    • AVC1 x264 banners
    • all of the DRM stuff that Netflix puts in their streams
      • CENC
      • PlayReady
      • Widevine
    • SEFT metadata
  • ICC profiles
    • EP0763801A2 extension
  • TrueType fonts
  • RIFF files
    • WebP
    • WAV
  • GIF files
  • EBML files
    • Matroska
      • WebM
  • Ogg files
    • Opus metadata
    • Theora metadata
    • Vorbis metadata
  • FLAC files
  • DER data
    • X509 certificates
    • PEM files
  • GZIP streams
  • BZIP2 streams
  • TAR files
    • USTAR to be precise
  • PGP stuff
  • ID3v2 tags
  • MPEG-TS
  • MakerNotes
    • Fuji
    • Sony
    • Google HDR+
  • PSD files
  • KDBX files
  • JPEG2000 files
  • C2PA CAI JUMBF metadata
  • WASM files
  • Torrent files
  • SQLite3 database files
  • DICOM files
  • ASF files
    • WMA files
    • WMV files
  • age encrypted files
    • tlock extensions
  • LUKS headers
  • Java class files
  • ELF files
    • .comment sections
    • .interp sections
    • .note sections
    • some PS3/PS4 SELF stuff
  • PE files
    • Authenticode signatures
    • GRUB modules in EFI files
  • Minecraft NBT files
    • region files
  • SPIR-V binaries
  • Ar archives
  • Cpio archives
  • Zstd files
  • SSH signatures
  • Git object files
  • Intel microcode files
    • including public key detection and signature extraction
  • EXR/OpenEXR files
  • Android vbmeta partitions
  • PDP-11 a.out files
  • OpenTimestamps proof files
  • xz files
  • UF2 files
  • Android adb backup files
  • Java object serialization data
  • Safetensors files
  • Microsoft cabinet files
  • btrfs stream files
  • Duck IVF video files
  • Apple binary plist files
    • the text ones are just already supported XML files
  • GGUF files
  • pcapng files
  • OpenStreetMap protobuf files

How do I install it?

Run pip3 install ruminant. Alternatively, you can also run python3 -m build in the source tree, followed by pip3 install dist/*.whl.

How do I use it?

The most basic usage would be ruminant <file> in order to process the file and output all metadata.

Each time a blob is passed to chew(), it gets assigned a new unique ID that is stored in the "blob-id" field in its JSON object. These blobs can be extracted with ruminant <file> --extract <ID> <file name>. The --extract option can also be shortened to -e and can be repeated multiple times.
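
To find out which IDs are available for extraction, the JSON output can be searched for "blob-id" fields. The following is a rough sketch, assuming the output has already been parsed into Python objects. The SAMPLE structure and the helper name collect_blob_ids are made up for illustration; only the "blob-id" key comes from the description above.

```python
import json


def collect_blob_ids(node):
    """Yield every "blob-id" value found in a nested JSON structure."""
    if isinstance(node, dict):
        if "blob-id" in node:
            yield node["blob-id"]
        for value in node.values():
            yield from collect_blob_ids(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_blob_ids(item)


# Hypothetical output shape; real ruminant output will look different.
SAMPLE = json.loads("""
{"blob-id": 0, "type": "zip", "files": [
  {"blob-id": 1, "type": "jpeg", "segments": [{"blob-id": 2, "type": "tiff"}]},
  {"blob-id": 3, "type": "png"}
]}
""")

if __name__ == "__main__":
    print(list(collect_blob_ids(SAMPLE)))
```

Any ID found this way can then be passed to ruminant <file> --extract <ID> <file name>.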

If no file is specified, ruminant reads from -, which is the standard input. You can also explicitly pass - as the file.

The --walk or -w option enables a binwalk-like mode where ruminant tries to parse a file and increments the start offset by one until it can correctly parse something. This is done until the end of the file.
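
The scanning strategy can be sketched as follows. This is a simplified illustration, not ruminant's implementation: the toy parser parse_png_signature, the Parser type, and the decision to resume scanning right after a successfully parsed region are all assumptions.

```python
from typing import Callable, Iterator, Optional, Tuple

# A parser takes the remaining bytes and returns (description, consumed length)
# on success, or None if the data does not start with a format it knows.
Parser = Callable[[bytes], Optional[Tuple[str, int]]]


def parse_png_signature(data: bytes) -> Optional[Tuple[str, int]]:
    """Toy parser: recognises only the 8-byte PNG signature."""
    signature = b"\x89PNG\r\n\x1a\n"
    if data.startswith(signature):
        return ("png-signature", len(signature))
    return None


def walk(data: bytes, parser: Parser) -> Iterator[Tuple[int, str]]:
    """Yield (offset, description) for every region the parser accepts."""
    offset = 0
    while offset < len(data):
        # Slicing copies the tail each time; fine for a sketch, slow for big files.
        result = parser(data[offset:])
        if result is None:
            # Nothing parseable here: move the start offset forward by one byte.
            offset += 1
            continue
        name, consumed = result
        yield (offset, name)
        # Assumption: resume scanning right after the parsed region.
        offset += max(consumed, 1)


if __name__ == "__main__":
    sig = b"\x89PNG\r\n\x1a\n"
    for offset, name in walk(b"junk" + sig + b"\x00\x00" + sig, parse_png_signature):
        print(offset, name)
```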

This is a valid complex command: ruminant -e 2 foo.jpeg - --extract 5 bar.bin -e 0 all.zip

(Yes, you could abuse ruminant to copy files by running function cp() { ruminant --extract 0 "$2" "$1"; } in bash and then using the function as cp.)

A few more options are available:

  • --extract-all extracts all blobs to the "blobs" directory.
  • Specifying a directory as the file makes ruminant walk that directory recursively.
  • --progress shows a progress bar (this requires tqdm).
  • --progress-names adds file names to the progress bar.
  • --url makes ruminant treat the file name as a URL and try to fetch the file from it. It uses the user agent of a recent Chrome to avoid being blocked. The user agent can be overridden by setting the RUMINANT_USER_AGENT environment variable to the desired agent.
  • --strip-url makes ruminant rewrite parts of known URLs to preserve metadata. It can, for example, detect that a file is hosted by WordPress based on the "/wp-content/" start of the path and then strip the "-<width>x<height>" suffix from the file name, so that the original, full-size file is fetched instead of a resized, re-encoded copy.
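
The WordPress rewrite and the user agent override could look roughly like the sketch below. It assumes that the stripped part of the file name is WordPress's "-<width>x<height>" thumbnail suffix. The function names, the regular expression, and the placeholder default user agent are hypothetical and not taken from ruminant's source; only the "/wp-content/" check and the RUMINANT_USER_AGENT variable come from the description above.

```python
import os
import re
from urllib.parse import urlsplit, urlunsplit

# Matches a trailing "-<width>x<height>" right before the file extension.
SIZE_SUFFIX = re.compile(r"-\d+x\d+(?=\.[A-Za-z0-9]+$)")

# Placeholder value; the real default user agent is not given in this document.
DEFAULT_USER_AGENT = "Mozilla/5.0 (placeholder Chrome-like user agent)"


def strip_wordpress_size(url: str) -> str:
    """Drop the thumbnail size suffix from a WordPress upload URL."""
    parts = urlsplit(url)
    if not parts.path.startswith("/wp-content/"):
        # Not recognisably WordPress: leave the URL untouched.
        return url
    return urlunsplit(parts._replace(path=SIZE_SUFFIX.sub("", parts.path)))


def pick_user_agent() -> str:
    """Prefer RUMINANT_USER_AGENT if set, else fall back to the default."""
    return os.environ.get("RUMINANT_USER_AGENT", DEFAULT_USER_AGENT)


if __name__ == "__main__":
    print(strip_wordpress_size(
        "https://example.com/wp-content/uploads/2024/01/photo-300x200.jpg"
    ))
    print(pick_user_agent())
```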

Ruminant can't parse xyz

Feel free to send me a sample so I can add a parser for it :)

Download files

Source Distribution

ruminant-0.0.33.tar.gz (259.9 kB)

Built Distribution

ruminant-0.0.33-py3-none-any.whl (264.6 kB)

File details

Details for the file ruminant-0.0.33.tar.gz.

File metadata

  • Download URL: ruminant-0.0.33.tar.gz
  • Upload date:
  • Size: 259.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.12

File hashes

Hashes for ruminant-0.0.33.tar.gz
Algorithm Hash digest
SHA256 f620fc92040e381f566a91c1d4b9a8d75c788a8934c67f97ecd340e19971345a
MD5 0532d14fce2a5c3609952eaf404f5229
BLAKE2b-256 05f033d84119f612d0e0f62cd888b68780e7dc20301cbf68ffcdc9ebb70938e9

File details

Details for the file ruminant-0.0.33-py3-none-any.whl.

File metadata

  • Download URL: ruminant-0.0.33-py3-none-any.whl
  • Upload date:
  • Size: 264.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.12

File hashes

Hashes for ruminant-0.0.33-py3-none-any.whl
Algorithm Hash digest
SHA256 ab7f62eca98113a6b089acbb90c744e5c33a120f061f26d54701523d146e1bfa
MD5 d13f53d39180ff509326e7d1110eba7c
BLAKE2b-256 ae897479a82cebf265070ae1f96c062a804f2159dbe26176a7c0444645cf68a6
