Recursive metadata extraction tool

Project description

Ruminant is a recursive metadata extraction tool.

What does it do?

Ruminant takes a file as input and spits out a huge JSON object containing all the metadata it extracted from that file. Extraction is recursive: for example, ruminant runs again on each file inside a ZIP file.
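To make the recursion concrete, here is a minimal sketch using only the Python standard library. It is not ruminant's implementation: chew_sketch and make_zip are hypothetical names invented for this example, and the output layout (name, size, type, entries) is an assumption, not ruminant's real JSON schema.

```python
import io
import json
import zipfile


def chew_sketch(data: bytes, name: str = "<input>") -> dict:
    """Hypothetical stand-in for a recursive extractor; not ruminant's real chew()."""
    result = {"name": name, "size": len(data), "type": "unknown"}

    # zipfile.is_zipfile accepts a file-like object as well as a path.
    if zipfile.is_zipfile(io.BytesIO(data)):
        result["type"] = "zip"
        result["entries"] = []
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                # Recurse: every contained file is "chewed" like a top-level input.
                result["entries"].append(
                    chew_sketch(archive.read(info), info.filename)
                )

    return result


def make_zip(files: dict) -> bytes:
    """Build an in-memory ZIP from a {filename: bytes} mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for filename, content in files.items():
            archive.writestr(filename, content)
    return buffer.getvalue()


if __name__ == "__main__":
    inner = make_zip({"note.txt": b"hello"})
    outer = make_zip({"inner.zip": inner, "readme.txt": b"hi"})
    print(json.dumps(chew_sketch(outer, "outer.zip"), indent=2))
```

The nested ZIP shows up as an entry that itself carries an entries list, which is the "run again on each contained file" idea in miniature.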

Why the name?

To quote Wikipedia: Ruminants are herbivorous grazing or browsing artiodactyls [...]. The process of rechewing the cud to further break down plant matter and stimulate digestion is called rumination. The word "ruminant" comes from the Latin ruminare, which means "to chew over again".

This tool behaves similarly: extracted blobs can themselves be "chewed over again" (the main entrypoint is literally called chew()) to recursively extract metadata.

What can it process?

Ruminant is still in early alpha, but it can already process the following file types:

  • ZIP files
  • DOCX files (needs to be updated)
  • PDF files (horribly broken, fuck you Adobe)
  • JPEG files
    • EXIF metadata
    • XMP metadata
    • ICC profiles
    • IPTC metadata (I hate you for that one Adobe)
    • Adobe-specific metadata in APP14
  • PNG files
    • EXIF metadata
  • TIFF files
    • EXIF metadata (EXIF metadata is literally stored in a TIFF file)
  • MP4 files
    • XMP metadata
    • AVC1 x264 banners
    • all of the DRM stuff that Netflix puts in their streams
      • CENC
      • PlayReady
      • Widevine
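The list above does not say how ruminant tells these formats apart. One common approach, shown here as an assumption rather than as ruminant's actual logic, is to check the well-known magic bytes at the start of the data. The function name sniff_type is hypothetical.

```python
def sniff_type(data: bytes) -> str:
    """Guess a container format from well-known magic bytes (illustrative only)."""
    if data.startswith(b"PK\x03\x04"):
        return "zip"  # also matches ZIP-based formats such as DOCX
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # TIFF has a little-endian ("II") and a big-endian ("MM") header variant.
    if data.startswith((b"II*\x00", b"MM\x00*")):
        return "tiff"
    # MP4 files usually begin with a box whose type field, at offset 4, is "ftyp".
    if data[4:8] == b"ftyp":
        return "mp4"
    return "unknown"


if __name__ == "__main__":
    print(sniff_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
```

A real dispatcher would need more care, since a prefix check alone cannot distinguish a DOCX file from a plain ZIP file.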

How do I install ruminant?

Run pip3 install ruminant. Alternatively, run python3 -m build in the source tree, then pip3 install dist/*.whl.

Ruminant can't parse xyz

Feel free to send me a sample so I can add a parser for it :)

TODO list

  • more file formats
    • MP3
    • WebM
    • WebP
    • Opus
    • Matroska
  • fix PDF parsing
  • ZIP family detection (e.g. DOCX is also a ZIP file)
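ZIP family detection is still on the TODO list, so the following is only one possible approach, not something ruminant is known to do: look for a member name that is characteristic of each ZIP-based format. The marker table and the name detect_zip_family are assumptions made for this sketch.

```python
import io
import zipfile

# Member names that are characteristic of some common ZIP-based formats.
ZIP_FAMILY_MARKERS = {
    "word/document.xml": "docx",
    "xl/workbook.xml": "xlsx",
    "ppt/presentation.xml": "pptx",
    "META-INF/MANIFEST.MF": "jar",
}


def detect_zip_family(data: bytes) -> str:
    """Return a more specific format name for a ZIP blob, or "zip" if none matches."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = set(archive.namelist())
    for marker, family in ZIP_FAMILY_MARKERS.items():
        if marker in names:
            return family
    return "zip"


if __name__ == "__main__":
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<w:document/>")
    print(detect_zip_family(buffer.getvalue()))
```

Checking member names keeps the detection cheap, because only the ZIP central directory has to be read and no member has to be decompressed.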


Download files

Source Distribution

ruminant-0.0.2.tar.gz (29.1 kB)

Uploaded Source

Built Distribution

ruminant-0.0.2-py3-none-any.whl (28.7 kB)

Uploaded Python 3

File details

Details for the file ruminant-0.0.2.tar.gz.

File metadata

  • Download URL: ruminant-0.0.2.tar.gz
  • Upload date:
  • Size: 29.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for ruminant-0.0.2.tar.gz
Algorithm    Hash digest
SHA256       1e0469a925137ac71d83337d305e51e254d943b076415502c549d04c932b2f80
MD5          811e2dcfece5706a4263a7e833784d29
BLAKE2b-256  ddb7a339ea9ae8dad6948357ef8eccfd35116a724ba7ca9f1a81144e341cd253

File details

Details for the file ruminant-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: ruminant-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 28.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for ruminant-0.0.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       3b4826085d1e0a451ee493ce036cc7f7f3d6c9e9534c812641fa574fbc6bcd76
MD5          0f2bf25c9c2bd8439cc2e9f8dddd7726
BLAKE2b-256  9ad1e7b0ddc21be06e61461cc658be0dd5cef19b5f428ae2edabd2260629051f
