Recursive metadata extraction tool

Project description

Ruminant is a recursive metadata extraction tool.

What does it do?

Ruminant takes a file as input and spits out a huge JSON object containing all the metadata it extracted from that file. This is done recursively, e.g. by running Ruminant again on each file inside a ZIP archive.
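A minimal sketch of that recursion. The function name chew() mirrors Ruminant's entry point, but the signature, the max_depth guard and the output layout here are illustrative assumptions, not Ruminant's actual API. Only ZIP members are recursed into; everything else is reported as "unknown".

```python
import io
import json
import zipfile


def chew(data: bytes, depth: int = 0, max_depth: int = 8) -> dict:
    """Hypothetical recursive extractor: describe a blob, recurse into ZIP members."""
    meta = {"size": len(data)}
    # Only ZIP is recognised in this sketch; other format parsers would slot in here.
    if data[:4] != b"PK\x03\x04":
        meta["type"] = "unknown"
        return meta
    meta["type"] = "zip"
    if depth >= max_depth:
        # Guard against unbounded recursion (deeply nested or self-containing archives).
        meta["truncated"] = True
        return meta
    meta["entries"] = {}
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                # "Chew over again": feed each member back into the same function.
                meta["entries"][info.filename] = chew(zf.read(info), depth + 1, max_depth)
    except zipfile.BadZipFile as exc:
        meta["error"] = str(exc)
    return meta


if __name__ == "__main__":
    # Build a ZIP that contains another ZIP, entirely in memory.
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w") as zf:
        zf.writestr("hello.txt", "hi")
    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w") as zf:
        zf.writestr("inner.zip", inner.getvalue())
    print(json.dumps(chew(outer.getvalue()), indent=2))
```

The nested result mirrors the archive tree, which is roughly the shape of "huge JSON object" you would expect from a recursive extractor.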

Why the name?

To quote Wikipedia: Ruminants are herbivorous grazing or browsing artiodactyls [...]. The process of rechewing the cud to further break down plant matter and stimulate digestion is called rumination. The word "ruminant" comes from the Latin ruminare, which means "to chew over again".

This tool behaves similarly: extracted blobs can themselves be "chewed over again" (the main entry point is literally called chew()) to recursively extract metadata.

What can it process?

Ruminant is still in early alpha, but it can already process the following file types:

  • ZIP files
  • DOCX files (needs to be updated)
  • PDF files (horribly broken, fuck you, Adobe)
  • JPEG files
    • EXIF metadata
    • XMP metadata
    • ICC profiles
    • IPTC metadata (I hate you for that one, Adobe)
    • Adobe-specific metadata in APP14
  • PNG files
    • EXIF metadata
  • TIFF files
    • EXIF metadata (EXIF metadata is literally stored in a TIFF file)
  • MP4 files
    • XMP metadata
    • AVC1 x264 banners
    • all of the DRM stuff that Netflix puts in their streams
      • CENC
      • PlayReady
      • Widevine
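Before any of those parsers can run, the input has to be identified. Below is a rough sketch of magic-byte sniffing for the container formats listed above; the sniff() helper is hypothetical and not necessarily how Ruminant detects types. The TIFF check also explains the EXIF note: a JPEG APP1 EXIF segment is the bytes "Exif\0\0" followed by a complete TIFF header and IFD structure.

```python
def sniff(data: bytes) -> str:
    """Guess a container type from its leading magic bytes (sketch only)."""
    if data.startswith(b"PK\x03\x04"):
        return "zip"  # DOCX is also a ZIP archive, so it lands here too
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"  # SOI marker followed by the next marker's 0xFF
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"  # little-endian / big-endian header with magic 42
    if data[4:8] == b"ftyp":
        # ISO base media file; also matches other ftyp-based files such as HEIC
        return "mp4"
    return "unknown"
```

A DOCX file comes back as "zip" here, which is exactly the ZIP family detection problem on the TODO list.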

Ruminant can't parse xyz

Feel free to send me a sample so I can add a parser for it :)

TODO list

  • more file formats
    • MP3
    • WebM
    • WebP
    • Opus
    • Matroska
  • fix PDF parsing
  • ZIP family detection (e.g. DOCX is also a ZIP file)
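One way the ZIP family detection item could work, sketched under assumptions: every OOXML package (DOCX, XLSX, PPTX) contains a [Content_Types].xml part, and Office conventionally names the main part word/document.xml, xl/workbook.xml or ppt/presentation.xml. The zip_family() helper and its marker table are hypothetical; a stricter detector would follow the relationship in _rels/.rels to the main part instead of trusting these conventional names.

```python
import io
import zipfile

# Hypothetical marker table: conventional main-part names Office writes per format.
OOXML_MARKERS = {
    "word/document.xml": "docx",
    "xl/workbook.xml": "xlsx",
    "ppt/presentation.xml": "pptx",
}


def zip_family(data: bytes) -> str:
    """Guess which ZIP-based format a ZIP archive actually is (heuristic sketch)."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
    # Every OOXML package carries a [Content_Types].xml part at its root.
    if "[Content_Types].xml" not in names:
        return "zip"
    for marker, family in OOXML_MARKERS.items():
        if marker in names:
            return family
    return "ooxml"  # OOXML package with an unrecognised main part
```

Plugged into a recursive extractor, this check would run after a blob has been identified as ZIP and before its members are chewed, so a DOCX could be handed to a DOCX-specific parser.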

Download files


Source Distribution

ruminant-0.0.1.tar.gz (29.0 kB)

Uploaded Source

Built Distribution


ruminant-0.0.1-py3-none-any.whl (25.8 kB)

Uploaded Python 3

File details

Details for the file ruminant-0.0.1.tar.gz.

File metadata

  • Download URL: ruminant-0.0.1.tar.gz
  • Size: 29.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for ruminant-0.0.1.tar.gz
  • SHA256: e9b4bec570aba1368c7f041d856fef2a59b8acf669436badf62c568eadd61d03
  • MD5: 418cfb2bc503f1bb497040c2f97d14c0
  • BLAKE2b-256: d01ba60b45aa89a44a11f8a3cee958d6a820ae11f4a8fd0580325ffbc90f3373
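To check a downloaded sdist against the SHA256 digest listed above, a small hashlib sketch (the helper names sha256_of() and matches_published() are illustrative):

```python
import hashlib

# SHA256 digest published above for ruminant-0.0.1.tar.gz
PUBLISHED_SHA256 = "e9b4bec570aba1368c7f041d856fef2a59b8acf669436badf62c568eadd61d03"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads are never read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str) -> bool:
    # hexdigest() returns lowercase hex, matching the published digest
    return sha256_of(path) == PUBLISHED_SHA256
```

Call matches_published("ruminant-0.0.1.tar.gz") after downloading. Alternatively, let pip enforce it: put ruminant==0.0.1 --hash=sha256:<digest> in a requirements file and install with pip install --require-hashes -r requirements.txt.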


File details

Details for the file ruminant-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: ruminant-0.0.1-py3-none-any.whl
  • Size: 25.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for ruminant-0.0.1-py3-none-any.whl
  • SHA256: 301df8f8ec25d37c2f63b3c6928d8ce18d05dfdd7d79664244fae518e583ad20
  • MD5: ded7c0f6ead758390b8fa98f70ae195e
  • BLAKE2b-256: dca0da5d91d88cbb7d7d6d7f7ab1c924626e441cc50ff9dfe2f2f59c634b8114
