
Recursive metadata extraction tool

Project description

Ruminant is a recursive metadata extraction tool.

What does it do?

Ruminant takes a file as input and spits out a huge JSON object containing all the metadata it extracted from the file. This is done recursively, e.g. by running ruminant again on each file inside a ZIP archive.

Why the name?

To quote Wikipedia: Ruminants are herbivorous grazing or browsing artiodactyls [...]. The process of rechewing the cud to further break down plant matter and stimulate digestion is called rumination. The word "ruminant" comes from the Latin ruminare, which means "to chew over again".

This tool behaves similarly, as extracted blobs can themselves be "chewed over again" (the main entrypoint is literally called chew()) in order to recursively extract metadata.
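The idea can be pictured with a toy sketch (this is not ruminant's actual code; the real tool handles dozens of formats, and this stand-in chew() only knows about ZIP files):

```python
import io
import json
import zipfile

def chew(data: bytes) -> dict:
    """Toy sketch of recursive metadata extraction.

    If the blob is a ZIP archive, each member is "chewed over again";
    otherwise only basic facts about the blob are reported.
    """
    meta = {"size": len(data)}
    if data[:4] == b"PK\x03\x04":  # ZIP local-file-header magic
        meta["type"] = "zip"
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            meta["members"] = {name: chew(zf.read(name)) for name in zf.namelist()}
    else:
        meta["type"] = "unknown"
    return meta
```

Feeding a nested ZIP through such a function yields one nested JSON object describing every level, which is the shape of output ruminant produces.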

What can it process?

Ruminant is still in early alpha but it can already process the following file types:

  • ZIP files
    • APK signatures
  • PDF files
  • JPEG files
    • EXIF metadata
    • XMP metadata
    • ICC profiles
    • IPTC metadata (I hate you for that one Adobe)
    • Adobe-specific metadata in APP14
    • MPF APP2 segments
  • PNG files
    • EXIF metadata
  • TIFF files
    • EXIF metadata (EXIF metadata is literally stored as a TIFF structure)
    • DNG files
  • ISO files
    • MP4 files
    • AVIF files
    • HEIF/HEIC stuff
    • XMP metadata
    • AVC1 x264 banners
    • all of the DRM stuff that Netflix puts in their streams
      • CENC
      • PlayReady
      • Widevine
    • SEFT metadata
  • ICC profiles
    • EP0763801A2 extension
  • TrueType fonts
  • RIFF files
    • WebP
    • WAV
  • GIF files
  • EBML files
    • Matroska
      • WebM
  • Ogg files
    • Opus metadata
    • Theora metadata
    • Vorbis metadata
  • FLAC files
  • DER data
    • X509 certificates
    • PEM files
  • GZIP streams
  • BZIP2 streams
  • TAR files
    • USTAR to be precise
  • PGP stuff
  • ID3v2 tags
  • MPEG-TS
  • MakerNotes
    • Fuji
    • Sony
    • Google HDR+
  • PSD files
  • KDBX files
  • JPEG2000 files
  • C2PA CAI JUMBF metadata
  • WASM files
  • Torrent files
  • Sqlite3 database files
  • DICOM files
  • ASF files
    • WMA files
    • WMV files
  • age encrypted files
    • tlock extensions
  • LUKS headers
  • Java class files
  • ELF files
    • .comment sections
    • .interp sections
    • .note sections
  • PE files
    • Authenticode signatures
    • GRUB modules in EFI files
  • Minecraft NBT files
    • region files
  • SPIR-V binaries
  • Ar archives
  • Cpio archives
  • Zstd files
  • SSH signatures
  • Git object files
  • Intel microcode files
    • including public key detection and signature extraction
  • EXR/OpenEXR files
  • Android vbmeta partitions
  • PDP-11 a.out files
  • OpenTimestamps proof files

How do I install it?

Run pip3 install ruminant. Alternatively, run python3 -m build in the source tree, followed by pip3 install dist/*.whl.

How do I use it?

The most basic usage is ruminant <file>, which processes the file and outputs all of its metadata.

Each time a blob is passed to chew(), it gets assigned a new unique ID that is stored in the "blob-id" field in its JSON object. These blobs can be extracted with ruminant <file> --extract <ID> <file name>. The --extract option can also be shortened to -e and can be repeated multiple times.
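The bookkeeping behind this can be sketched roughly as follows (a hypothetical illustration; register and extract are made-up names, not ruminant's internals):

```python
import itertools

# Hypothetical sketch: every blob gets the next sequential ID as it is
# processed, and extraction is just a lookup by that ID.
_ids = itertools.count()
_blobs: dict[int, bytes] = {}

def register(blob: bytes) -> int:
    """Store a blob and hand back its new unique "blob-id"."""
    blob_id = next(_ids)
    _blobs[blob_id] = blob
    return blob_id

def extract(blob_id: int, path: str) -> None:
    """Write the blob with the given ID to a file, like --extract does."""
    with open(path, "wb") as fh:
        fh.write(_blobs[blob_id])
```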

If no file is specified, ruminant reads from -, which is the standard input. You can also pass - explicitly as the file.

The --walk or -w option enables a binwalk-like mode: ruminant tries to parse the file at the current start offset and, on failure, increments the offset by one until it can correctly parse something. This repeats until the end of the file.
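The scan loop looks roughly like this (a sketch under the assumption that a successful parse also skips past the parsed bytes; try_parse stands in for ruminant's real parsers):

```python
def walk(data: bytes, try_parse):
    """Binwalk-like scan: attempt a parse at each offset, sliding the
    start forward one byte at a time on failure (illustrative sketch)."""
    found = []
    offset = 0
    while offset < len(data):
        result = try_parse(data[offset:])
        if result is None:
            offset += 1  # nothing parseable here, slide forward one byte
        else:
            length, meta = result
            found.append((offset, meta))
            offset += length  # assumption: skip past the parsed region
    return found
```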

This is a valid complex command: ruminant -e 2 foo.jpeg - --extract 5 bar.bin -e 0 all.zip

(Yes, you could abuse ruminant to copy files by running function cp() { ruminant --extract 0 "$2" "$1"; } in bash and then using the function as cp.)

A few more options:

  • --extract-all extracts all blobs to the "blobs" directory.
  • Specifying a directory as the file makes ruminant walk that directory recursively.
  • --progress shows a progress bar (this requires tqdm), and --progress-names adds file names to it.
  • --url makes ruminant treat the file name as a URL and try to fetch the file from it. It uses the user agent of a recent Chrome to not be blocked; the user agent can be overridden by setting the RUMINANT_USER_AGENT environment variable to the desired agent.
  • --strip-url makes ruminant change some parts of known URLs to preserve metadata. It can, for example, detect that a file is hosted by WordPress based on the "/wp-content/" start of the path and then remove the "-<width>x<height>" part of the file name, so the original full-size file is fetched and reencoding is avoided.
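The WordPress case can be sketched like this (an illustrative guess at the rule described above, not ruminant's actual implementation; the function name and regex are made up):

```python
import re
from urllib.parse import urlsplit, urlunsplit

def strip_wordpress_size(url: str) -> str:
    """If the path looks like a WordPress upload, drop a trailing
    "-<width>x<height>" size suffix so the original file is fetched
    instead of a resized, reencoded thumbnail (illustrative sketch)."""
    parts = urlsplit(url)
    if parts.path.startswith("/wp-content/"):
        path = re.sub(r"-\d+x\d+(\.\w+)$", r"\1", parts.path)
        return urlunsplit(parts._replace(path=path))
    return url
```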

Ruminant can't parse xyz

Feel free to send me a sample so I can add a parser for it :)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ruminant-0.0.30.tar.gz (215.0 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ruminant-0.0.30-py3-none-any.whl (218.4 kB)

Uploaded Python 3

File details

Details for the file ruminant-0.0.30.tar.gz.

File metadata

  • Download URL: ruminant-0.0.30.tar.gz
  • Upload date:
  • Size: 215.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.12

File hashes

Hashes for ruminant-0.0.30.tar.gz:

  • SHA256: 90562e13e6ca2d3ff9cc7c9057ad12af19d3a7eab78aca06349ba9b9b568f17b
  • MD5: 55cbb35680eafcce20f70daa97f060ac
  • BLAKE2b-256: 6939ed8dd2a9bc076d435f17f358a0482167622d6e58693ba4545eee5e7ee893

See more details on using hashes here.

File details

Details for the file ruminant-0.0.30-py3-none-any.whl.

File metadata

  • Download URL: ruminant-0.0.30-py3-none-any.whl
  • Upload date:
  • Size: 218.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.12

File hashes

Hashes for ruminant-0.0.30-py3-none-any.whl:

  • SHA256: 3709d3b9aabf574964e095932592562e85fbdf3143fd10865024c4e0683abbcc
  • MD5: 2036bc13b0397935cce157a6daf93eee
  • BLAKE2b-256: 2539a342d3b7f55fd82dce5a643b10d72272f253e6de6e22ac0333779a33510f

