A fast, simple utility to visually stamp media files with their filenames, preparing them for multimodal LLM training and analysis.
MarkMyMedia
The Problem: Lost Context in Multimodal Sequences
When you feed a sequence of media files (e.g., portal 2 mod.jpg, intro.mp3, my homework.mp4) to a Large Language Model, the model sees a continuous stream of data. It lacks explicit, built-in separators or context about where one file ends and another begins, or what the original source of a particular frame or soundbite was.
This ambiguity makes it difficult to:
- Analyze which specific file triggered a response.
- Train the model on tasks that require knowledge of file boundaries.
- Debug model behavior on complex, mixed-media inputs.
The Solution: Visibly Embedded Markers
MarkMyMedia solves this by "stamping" each file with its own name, creating an unambiguous visual marker directly within the data.
- Images: Get a clean text overlay with the filename.
- Audio: Converted into a video with the filename displayed on a black background.
- Videos: Get a short, 0.5-second marker clip prepended, showing the filename without re-encoding the entire video.
This way, the context is never lost. The model "sees" the filename associated with the content that follows.
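The per-modality routing described above can be sketched roughly as follows. This is an illustration only: the extension sets and the helper names `classify` and `marker_text` are hypothetical, and the formats MarkMyMedia actually accepts may differ.

```python
from pathlib import Path

# Hypothetical extension sets; the tool's real list of supported formats may differ.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}
AUDIO_EXTS = {".mp3", ".wav"}
VIDEO_EXTS = {".mp4", ".mov"}


def classify(path):
    """Return 'image', 'audio', 'video', or None based on the file extension."""
    ext = Path(path).suffix.lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    return None


def marker_text(path):
    """Assumed marker text: the file's own name, spaces and extension included."""
    return Path(path).name
```

Files with an unrecognized extension map to `None`, so a caller can skip them instead of failing the whole batch.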
Key Features
- Multimodal Support: Works out-of-the-box for images, audio, and video.
- Blazing Fast: Uses parallel processing to handle large datasets quickly.
- Efficient Video Processing: Prepends markers to videos without re-encoding, saving massive amounts of time and preserving original quality.
- Flexible Usage: Can be used as a simple command-line tool or as a Python library.
- Recursive Search: Point it at a directory, and it can process all nested media files.
- Simple & Focused: Does one job and does it well.
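As a rough illustration of the parallel processing mentioned above, the sketch below fans a list of files out over a thread pool. It assumes a thread-based design, which suits work that mostly waits on external processes such as FFmpeg. The names `process_all` and `worker` are hypothetical and are not part of the MarkMyMedia API.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def process_all(paths, worker, jobs=None):
    """Run worker(path) for every path on a thread pool.

    Returns (path, result) pairs in input order. If a worker raises,
    the exception object is stored in place of the result so one bad
    file does not abort the batch.
    """
    jobs = jobs or os.cpu_count() or 1

    def safe(path):
        try:
            return path, worker(path)
        except Exception as exc:
            return path, exc

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() yields results in the same order as the inputs.
        return list(pool.map(safe, paths))
```

Collecting exceptions instead of re-raising them is a design choice for batch jobs: the caller can report every failed file at the end of the run.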
How It Looks
MarkMyMedia provides clear, unambiguous markers for each file type.
🖼️ Images
A clean, readable marker with the filename is embedded directly onto the image. This ensures that even in a long sequence, the source of each image is immediately visible.
Example: A screenshot of a Discord message marked with its filename.
🎧 Audio
Audio files are converted into a static video format. This clever workaround makes them visually identifiable in multimodal timelines and tools like Google AI Studio, where audio-only files might not provide visual cues. The entire audio track is preserved under a single, persistent frame showing its original filename.
The result is a standard video file, making the audio's presence known visually.
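One common FFmpeg recipe for pairing a still frame with an audio track is sketched below. It is not confirmed to be the command MarkMyMedia runs internally. The function name `still_video_command` is hypothetical, and the sketch assumes the frame showing the filename has already been rendered to an image file with even width and height.

```python
def still_video_command(frame_png, audio_in, video_out):
    """Build an ffmpeg argv that shows one image for the length of an audio track."""
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", str(frame_png),  # repeat the single frame indefinitely
        "-i", str(audio_in),
        "-c:v", "libx264", "-tune", "stillimage", "-pix_fmt", "yuv420p",
        "-c:a", "copy",  # copy the audio stream instead of re-encoding it
        "-shortest",     # stop when the audio (the only finite input) ends
        str(video_out),
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`. Whether the audio codec can be copied into the chosen container depends on the input format.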
🎬 Video
A short, 0.5-second marker clip is prepended to the video. This process is nearly instant because it avoids re-encoding the entire file, preserving the original quality and saving significant time.
The model sees the filename right before the video content begins.
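Prepending a clip without re-encoding is commonly done with FFmpeg's concat demuxer and stream copy. The sketch below shows that general technique under the assumption that a marker clip already exists with the same codec parameters as the source video. The source does not state which FFmpeg mechanism MarkMyMedia uses, and the helper names are hypothetical.

```python
def concat_list(parts):
    """Return the text of a concat-demuxer list file for the given clips."""
    lines = []
    for part in parts:
        # Inside single quotes, a literal quote is written as '\''
        escaped = str(part).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file, video_out):
    """Build an ffmpeg argv that joins the listed clips with stream copy."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0",  # -safe 0 permits absolute paths in the list
        "-i", str(list_file),
        "-c", "copy",                  # copy all streams, no re-encoding
        str(video_out),
    ]
```

A caller would write `concat_list([marker_clip, source_video])` to a temporary file and then run the command. Relative paths in the list are resolved against the list file's own directory.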
Technical Constraints
- This tool relies on FFmpeg for all audio and video operations. You must have ffmpeg and ffprobe installed and available in your system's PATH.
- To achieve high speed by avoiding full re-encoding, MarkMyMedia relies on stream copying. This approach is extremely fast but requires input files to meet specific format criteria.
| Modality | Requirement | Reason & Details |
|---|---|---|
| Video (mark_video) | | For preserving quality and speed. Processing other codecs (like VP9 in .webm) will fail, as they cannot be directly concatenated in this workflow. |
| Audio (mark_audio) | | To create a visual marker. The original audio stream is copied losslessly into the new video container, ensuring no quality is lost. |
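Before running a batch, it can help to confirm that both executables are reachable and to inspect a file's video codec. The sketch below is a minimal pre-flight check. The function names are hypothetical, and it does not encode which codecs MarkMyMedia accepts, since that is not stated here.

```python
import shutil


def missing_tools(tools=("ffmpeg", "ffprobe"), which=shutil.which):
    """Return the names of required executables that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def video_codec_command(path):
    """Build an ffprobe argv that prints the codec name of the first video stream."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name",
        "-of", "default=noprint_wrappers=1:nokey=1",
        str(path),
    ]
```

Running the ffprobe command with `subprocess.run(cmd, capture_output=True, text=True)` prints a single codec name such as `h264` or `vp9` on standard output, which can be compared against whatever the workflow supports.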
Installation
Install MarkMyMedia directly from PyPI:
pip install MarkMyMedia
Usage
As a Command-Line Tool (CLI)
The CLI is designed for batch processing entire directories.
Mark all media in the current directory (output to markered_modals/):
markmymedia
Recursively process a dataset and specify an output folder:
markmymedia ./my_dataset -r -o ./processed_data
See all available options:
markmymedia --help
usage: markmymedia [-h] [-r] [-o OUTPUT] [-j JOBS] [-p] [--version] [inputs ...]

Batch mark images, audio, and video with filename overlays.

positional arguments:
  inputs                Files or directories to process. If omitted, current directory is used.

options:
  -h, --help            show this help message and exit
  -r, --recursive       Recursively traverse directories.
  -o, --output OUTPUT   Base output directory (default: markered_modals).
  -j, --jobs JOBS       Number of worker threads to use per modality (default: number of CPUs).
  -p, --preserve-structure
                        Preserve the directory structure of input files in the output directory.
  --version             show program's version number and exit
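To make the effect of `-r` and `-p` concrete, here is a small sketch of how recursive collection and output-path mapping could work. It is an assumption about the behavior those flags describe, not the tool's actual code, and `collect_files` and `output_path` are hypothetical names.

```python
from pathlib import Path


def collect_files(inputs, recursive=False):
    """Expand files and directories into a sorted list of file paths."""
    found = []
    for item in map(Path, inputs):
        if item.is_file():
            found.append(item)
        elif item.is_dir():
            # "**/*" walks nested directories; "*" stays at the top level.
            pattern = "**/*" if recursive else "*"
            found.extend(p for p in item.glob(pattern) if p.is_file())
    return sorted(found)


def output_path(src, input_root, output_dir, preserve_structure=False):
    """Map a source file to its destination inside output_dir."""
    src, output_dir = Path(src), Path(output_dir)
    if preserve_structure:
        # Keep the path relative to the input root, e.g. out/sub/b.mp3
        return output_dir / src.relative_to(input_root)
    # Flat layout: only the file name is kept, e.g. out/b.mp3
    return output_dir / src.name
```

In the flat layout, two files with the same name in different subdirectories would map to the same destination, which is one reason a structure-preserving option is useful for nested datasets.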
As a Python Library
You can also use the core functions directly in your Python scripts for more granular control.
from markmymedia import mark_image, mark_audio, mark_video

# Mark a single image
mark_image(
    input_path='data/cat.jpg',
    output_path='processed/cat_marked.jpg'
)

# Create a marked video from an audio file
mark_audio(
    input_path='data/intro.mp3',
    output_path='processed/intro.mp4'
)

# Prepend a marker to a video file
mark_video(
    input_path='data/dog_on_beach.mp4',
    output_path='processed/dog_on_beach.mp4',
    overlay_text="Some cool video!!",
)
Contributing
Contributions are welcome! If you find a bug or have a feature request, please open an issue.
License
This project is licensed under the MIT License. See the LICENSE file for details.