Skip to main content

A fast, simple utility to visually stamp media files with their filenames, preparing them for multimodal LLM training and analysis.

Project description

MarkMyMedia

PyPI version License: MIT

A fast, simple utility to visually stamp media files with their filenames, preparing them for multimodal LLM training and analysis.

The Problem: Lost Context in Multimodal Sequences

When you feed a sequence of media files (e.g., portal 2 mod.jpg, intro.mp3, my homework.mp4) to a Large Language Model, the model sees a continuous stream of data. It lacks explicit, built-in separators or context about where one file ends and another begins, or what the original source of a particular frame or soundbite was.

This ambiguity makes it difficult to:

  • Analyze which specific file triggered a response.
  • Train the model on tasks that require knowledge of file boundaries.
  • Debug model behavior on complex, mixed-media inputs.

The Solution: Visibly Embedded Markers

MarkMyMedia solves this by "stamping" each file with its own name, creating an unambiguous visual or auditory marker directly within the data.

  • Images: Get a clean text overlay with the filename.
  • Audio: Are converted into a video with the filename displayed on a black background.
  • Videos: Get a short, 0.5-second marker clip prepended, showing the filename without re-encoding the entire video.

This way, the context is never lost. The model "sees" the filename associated with the content that follows.

Key Features

  • Multimodal Support: Works out-of-the-box for images, audio, and video.
  • Blazing Fast: Uses parallel processing to handle large datasets quickly.
  • Efficient Video Processing: Prepends markers to videos without re-encoding, saving massive amounts of time and preserving original quality.
  • Flexible Usage: Can be used as a simple command-line tool or as a Python library.
  • Recursive Search: Point it at a directory, and it can process all nested media files.
  • Simple & Focused: Does one job and does it well.

How It Looks

MarkMyMedia provides clear, unambiguous markers for each file type.

🖼️ Images

A clean, readable marker with the filename is embedded directly onto the image. This ensures that even in a long sequence, the source of each image is immediately visible.

marked_img

Example: A screenshot of a Discord message marked with its filename.

🎧 Audio

Audio files are converted into a static video format. This clever workaround makes them visually identifiable in multimodal timelines and tools like Google AI Studio, where audio-only files might not provide visual cues. The entire audio track is preserved under a single, persistent frame showing its original filename.

markered_audio

The result is a standard video file, making the audio's presence known visually.

AI Studio

🎬 Video

A short, 0.5-second marker clip is prepended to the video. This process is nearly instant because it avoids re-encoding the entire file, preserving the original quality and saving significant time.

some

The model sees the filename right before the video content begins.

Technical Constraints

  1. This tool relies on FFmpeg for all audio and video operations. You must have ffmpeg and ffprobe installed and available in your system's PATH.
  2. To achieve high speed by avoiding full re-encoding, MarkMyMedia relies on stream copying. This approach is extremely fast but requires input files to meet specific format criteria.
Modality Requirement Reason & Details
Video (mark_video)
  • Video Codec: h264 or hevc
  • Audio Codec: aac (if present)
For preserving quality and speed. Processing other codecs (like VP9 in .webm) will fail, as they cannot be directly concatenated in this workflow.
Audio (mark_audio)
  • Always outputs a .mp4 video file.
  • Audio Format: mp3, flac, aac, m4a, ogg or opus
To create a visual marker. The original audio stream is copied losslessly into the new video container, ensuring no quality is lost.

Installation

Install MarkMyMedia directly from PyPI:

pip install MarkMyMedia

Usage

As a Command-Line Tool (CLI)

The CLI is designed for batch processing entire directories.

Mark all media in the current directory (output to markered_modals/):

markmymedia 

Recursively process a dataset and specify an output folder:

markmymedia ./my_dataset -r -o ./processed_data

See all available options: markmymedia --help

usage: markmymedia [-h] [-r] [-o OUTPUT] [-j JOBS] [-p] [--version] [inputs ...]

Batch mark images, audio, and video with filename overlays.

positional arguments:
  inputs                Files or directories to process. If omitted, current directory is used.

options:
  -h, --help            show this help message and exit
  -r, --recursive       Recursively traverse directories.
  -o, --output OUTPUT   Base output directory (default: markered_modals).
  -j, --jobs JOBS       Number of worker threads to use per modality (default: number of CPUs).
  -p, --preserve-structure
                        Preserve the directory structure of input files in the output directory.
  --version             show program's version number and exit

As a Python Library

You can also use the core functions directly in your Python scripts for more granular control.

from markmymedia import mark_image, mark_audio, mark_video

# Mark a single image
mark_image(
    input_path='data/cat.jpg',
    output_path='processed/cat_marked.jpg'
)

# Create a marked video from an audio file
mark_audio(
    input_path='data/intro.mp3',
    output_path='processed/intro.mp4'
)

# Prepend a marker to a video file
mark_video(
    input_path='data/dog_on_beach.mp4',
    output_path='processed/dog_on_beach.mp4',
    overlay_text="Some cool video!!",
)

Contributing

Contributions are welcome! If you find a bug or have a feature request, please open an issue.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

markmymedia-1.1.0.tar.gz (14.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

markmymedia-1.1.0-py3-none-any.whl (18.0 kB view details)

Uploaded Python 3

File details

Details for the file markmymedia-1.1.0.tar.gz.

File metadata

  • Download URL: markmymedia-1.1.0.tar.gz
  • Upload date:
  • Size: 14.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for markmymedia-1.1.0.tar.gz
Algorithm Hash digest
SHA256 9fa40d1c8eca7538e737cb003a793d6e602c71e39b9d898e231285c8a87357bf
MD5 f823666dd1b52e98ca0b7a880f8ba617
BLAKE2b-256 29b78cfdc15cf1c52f89af6324dd4d1b6f81f3e40a9263ca6de68bb8e8248120

See more details on using hashes here.

File details

Details for the file markmymedia-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: markmymedia-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 18.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for markmymedia-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8b70452c91f80c4fece6c37cf82ce16320b25b471ed2a7fc45af067e25411aa3
MD5 32487aad92a4414034c376a1de8807f0
BLAKE2b-256 f9ee2d7223bd4cca01ecbff8d1c109a0b5d6f21d406c8cb57d2e524dfa1a5697

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page