
AI Metadata Toolkit for cloning, checking, cleaning, and watermark removal

Project description

 ███╗   ██╗ ██████╗  █████╗ ██╗
 ████╗  ██║██╔═══██╗██╔══██╗██║
 ██╔██╗ ██║██║   ██║███████║██║
 ██║╚██╗██║██║   ██║██╔══██║██║
 ██║ ╚████║╚██████╔╝██║  ██║██║
 ╚═╝  ╚═══╝ ╚═════╝ ╚═╝  ╚═╝╚═╝
    ─── noai-watermark ───

Remove invisible watermarks and manage AI image metadata with a practical command-line and Python toolkit.

This project's flagship feature is diffusion-based invisible watermark removal — a research-grade approach that uses controllable image regeneration to reduce or eliminate embedded watermark artifacts from images. Built on the controllable regeneration methodology from Liu et al. (arXiv:2410.05470, CtrlRegen), it lets you run advanced watermark cleanup workflows directly on your personal computer.

In addition to watermark removal, the toolkit also provides comprehensive metadata management features: inspect, clone, verify, and clean metadata across PNG and JPG/JPEG files, with special focus on AI-generated content metadata (Stable Diffusion, ComfyUI, C2PA provenance data).




Quick Start: Remove Invisible Watermarks

The fastest way to see what this toolkit can do:

# Install with watermark removal support (from PyPI)
pip install "noai-watermark[watermark]"

# Remove invisible watermarks (recommended for personal computers: use CPU)
noai-watermark source.png --remove-watermark --device cpu -o cleaned.png

Defaults and recommendations:

  • --strength 0.5: default, balanced removal
  • --steps 50: default, good quality/speed trade-off
  • --model Lykon/dreamshaper-8: default diffusion model
  • --device cpu: recommended for stability on personal computers (the built-in default is auto)

Ethics and Responsible Use

The watermark removal pipeline is intended for research, interoperability testing, and defensive analysis — for example, evaluating watermark robustness, checking for false positives, and validating provenance workflows.

Do not use this feature to:

  • Misrepresent authorship or ownership
  • Bypass platform policies or content identification systems
  • Remove provenance or watermark signals from content you do not have rights to modify

Always comply with local laws, licensing terms, and platform rules. Keep original files and audit logs when running cleanup experiments.

Research foundation:
Watermark removal is based on the controllable regeneration approach described in:
Liu et al., Image Watermarks are Removable Using Controllable Regeneration from Clean Noise (arXiv:2410.05470).
Reference implementation: CtrlRegen GitHub Repository.


What This Project Does

Primary Feature: Invisible Watermark Removal

  • Remove invisible watermark artifacts: Run a diffusion-based regeneration workflow to reduce or eliminate embedded watermark traces in image content

Metadata Management Features

  • Clone full metadata between images: Transfer EXIF, PNG text chunks, and other available metadata from source to target
  • Clone only AI-related metadata: Copy AI-generation metadata (prompt/model/seed/workflow) without other metadata
  • Check for AI metadata: Quickly report whether a file carries known AI-generation metadata signals
  • Remove AI metadata: Clean AI-related metadata fields while optionally preserving standard metadata (Author/Title)
  • Detect and extract C2PA provenance metadata: Read C2PA-related metadata and inspect provenance information

Requirements

  • Python >=3.10
  • pip and venv (recommended)
  • Core dependencies:
    • pillow>=10.0.0
    • piexif>=1.1.3
  • Optional watermark-removal dependencies:
    • torch>=2.0.0
    • diffusers>=0.25.0
    • transformers>=4.35.0
    • accelerate>=0.25.0

Hardware notes for watermark removal:

  • cpu: works on most personal computers (slow but stable)
  • mps/cuda: faster when available
  • On Mac, if MPS memory errors continue, prefer --device cpu

Installation

Install without cloning (recommended for users)

# Basic install from PyPI (metadata features)
pip install noai-watermark

# With watermark removal extras
pip install "noai-watermark[watermark]"

If you want to install directly from GitHub without cloning:

# Pin to the first release tag (v0.1.0)
pip install "git+https://github.com/mertizci/noai-watermark.git@v0.1.0"

# With watermark extras from GitHub
pip install "noai-watermark[watermark] @ git+https://github.com/mertizci/noai-watermark.git@v0.1.0"

Local development install (requires cloning)

# Basic (metadata features only)
pip install -e .

# With watermark removal support
pip install -e ".[watermark]"

# With dev tools (pytest + coverage)
pip install -e ".[dev]"

# Everything
pip install -e ".[dev,watermark]"

Supported Formats

  • PNG (.png)
  • JPEG (.jpg, .jpeg)

The batch watermark API (remove_watermark_batch) also processes .webp files by default.
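The README does not spell out how remove_watermark_batch picks files from input_dir. A minimal sketch, assuming it matches regular files directly inside the folder by case-insensitive extension; list_batch_inputs and BATCH_EXTENSIONS are hypothetical names, not part of the toolkit:

```python
from pathlib import Path

# Assumed default set: the documented PNG/JPEG formats plus .webp for batch mode.
BATCH_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def list_batch_inputs(input_dir: Path, extensions: set[str] = BATCH_EXTENSIONS) -> list[Path]:
    """Hypothetical helper: image files directly inside input_dir, sorted by path.

    Suffix matching is case-insensitive, so "photo.JPG" is included.
    Subdirectories are neither searched nor returned.
    """
    return sorted(
        p for p in input_dir.iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
```

Sorting keeps output order stable across runs, which makes batch results easier to compare.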


CLI Usage

All examples below use noai-watermark (recommended).
You can run the same commands with photo-metadata as an alias.

1) Remove Invisible Watermarks

For most personal computers (especially MacBooks), it is recommended to run on CPU for stability.

Recommended command:

noai-watermark source.png --remove-watermark --device cpu -o cleaned.png

Default parameters:

  • --strength 0.5
  • --steps 50
  • --model Lykon/dreamshaper-8
  • --device auto

If you get MPS backend out of memory on Mac:

  • Lower values: --strength 0.35 --steps 25
  • Or force CPU: --device cpu

# Basic watermark removal
noai-watermark source.png --remove-watermark -o cleaned.png

# Default-equivalent explicit command (strength=0.5, steps=50)
noai-watermark source.png --remove-watermark --strength 0.5 --steps 50 -o cleaned.png

# Stronger removal
noai-watermark source.png --remove-watermark --strength 0.7 -o cleaned.png

# More denoising steps (better quality)
noai-watermark source.png --remove-watermark --steps 100 -o cleaned.png

# Custom diffusion model
noai-watermark source.png --remove-watermark --model "Lykon/dreamshaper-8" -o cleaned.png

# Force CPU (slow but stable if MPS memory errors persist)
noai-watermark source.png --remove-watermark --device cpu -o cleaned.png

# Mac MPS OOM fallback example
noai-watermark source.png --remove-watermark --strength 0.35 --steps 25 -o cleaned.png

# With HuggingFace token (faster downloads, higher rate limits)
noai-watermark source.png --remove-watermark --hf-token "hf_xxxxx" -o cleaned.png

# Or set the token as an environment variable (recommended)
export HF_TOKEN="hf_xxxxx"
noai-watermark source.png --remove-watermark -o cleaned.png

2) Clone Metadata

# Clone all metadata
noai-watermark source.png target.png -o output.png

# Clone only AI metadata
noai-watermark source.png target.png -o output.png --ai-only

# Cross-format cloning
noai-watermark source.png target.jpg -o output.jpg
noai-watermark source.jpg target.png -o output.png

3) Check AI Metadata

noai-watermark source.png --check-ai

Example output:

'source.png' contains AI-generated image metadata:
AI Image Metadata:
----------------------------------------
C2PA Metadata:
  has_c2pa: True
  type: C2PA (Coalition for Content Provenance and Authenticity)
  issuer: Google LLC
  claim_generator: x"Goog
  actions: created, converted, edited
  timestamp: 20260221175036Z
  timestamps: ['20260221175036Z', '20260221161100Z', '20260221161151Z']
  source_type: trainedAlgorithmicMedia (AI-generated)

4) Remove AI Metadata

# In-place cleanup
noai-watermark source.png --remove-ai

# Save to new file
noai-watermark source.png --remove-ai -o cleaned.png

# Also remove standard metadata (Author, Title, etc.)
noai-watermark source.png --remove-ai --remove-all-metadata

Verbose Mode

noai-watermark source.png target.png -o output.png -v
noai-watermark source.png --remove-ai -v
noai-watermark source.png --remove-watermark -v

Note: Without -v, watermark removal suppresses noisy framework logs and shows a simple "Working on it..." progress animation. With -v, detailed diagnostic logs are shown.


Python API

Watermark Removal

Use is_watermark_removal_available() to guard optional dependencies before calling watermark APIs.

from pathlib import Path
from watermark_remover import (
    WatermarkRemover,
    remove_watermark,
    get_recommended_strength,
    is_watermark_removal_available,
)

if is_watermark_removal_available():
    # 1) Convenience function (quick usage)
    remove_watermark(
        image_path=Path("watermarked.png"),
        output_path=Path("cleaned.png"),
        strength=0.5,
    )

    # 2) Convenience function with custom model + device
    remove_watermark(
        image_path=Path("watermarked.png"),
        output_path=Path("cleaned_custom_model.png"),
        strength=0.5,
        model_id="Lykon/dreamshaper-8",
        device="cpu",  # "cpu", "mps", "cuda", or None for auto
        hf_token="hf_xxxxx",  # optional, falls back to HF_TOKEN env var
    )

    # 3) Advanced usage with persistent remover (recommended for repeated runs)
    remover = WatermarkRemover(
        model_id="SG161222/Realistic_Vision_V5.1_noVAE",
        device="cpu",
        hf_token="hf_xxxxx",  # optional, falls back to HF_TOKEN env var
    )

    # More control: steps, guidance, and reproducible seed
    remover.remove_watermark(
        image_path=Path("watermarked.png"),
        output_path=Path("cleaned_advanced.png"),
        strength=WatermarkRemover.MEDIUM_STRENGTH,
        num_inference_steps=50,
        guidance_scale=7.5,
        seed=42,
    )

    # 4) Batch mode with one loaded model instance
    remover.remove_watermark_batch(
        input_dir=Path("input_images"),
        output_dir=Path("cleaned_images"),
        strength=get_recommended_strength("treering"),
        num_inference_steps=60,
    )

Common watermark API patterns:

  • Quick single-image cleanup: remove_watermark(...)
  • Custom model per call: pass model_id="org/model-id" to remove_watermark(...)
  • High-throughput / multiple images: create one WatermarkRemover(...) instance and reuse it
  • Reproducible experiments: pass seed=... to remover.remove_watermark(...) on a WatermarkRemover instance
  • Faster vs better quality:
    • lower num_inference_steps for speed
    • higher num_inference_steps for stability/detail

Popular custom model examples:

  • Lykon/dreamshaper-8 (default/balanced)
  • runwayml/stable-diffusion-v1-5 (classic baseline)
  • SG161222/Realistic_Vision_V5.1_noVAE (photorealistic style)
  • segmind/tiny-sd (lighter memory usage)

Metadata Operations

from pathlib import Path
from metadata_handler import (
    clone_metadata,
    clone_ai_metadata,
    extract_metadata,
    extract_ai_metadata,
    has_ai_metadata,
    has_ai_content,
    remove_ai_metadata,
    has_c2pa_metadata,
    extract_c2pa_info,
)

# 1) Clone metadata
clone_metadata(Path("source.png"), Path("target.png"), Path("output.png"))
clone_ai_metadata(Path("source.png"), Path("target.png"), Path("output.png"))

# 2) Inspect metadata
all_meta = extract_metadata(Path("image.png"))
ai_meta = extract_ai_metadata(Path("image.png"))
print(has_ai_metadata(Path("image.png")))

# 3) C2PA provenance checks
if has_c2pa_metadata(Path("image.png")):
    print(extract_c2pa_info(Path("image.png")))

# 4) Remove AI metadata
remove_ai_metadata(Path("image.png"), Path("cleaned.png"))  # keeps standard metadata by default
remove_ai_metadata(Path("image.png"), Path("cleaned_all.png"), keep_standard=False)

Metadata API tips:

  • Use extract_metadata(...) when you need full metadata dictionaries for debugging/audits
  • Use extract_ai_metadata(...) for AI-specific fields only
  • Keep keep_standard=True when you want to preserve fields like Author/Title
  • Set keep_standard=False for stronger metadata sanitization workflows
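The keep_standard switch can be pictured as a filter over text-metadata keys. A minimal sketch over a plain dictionary; AI_KEYS is an illustrative assumption built from key names this README mentions, not the toolkit's real detection list (which lives in constants.py), and strip_ai_keys is hypothetical:

```python
# Illustrative only: the toolkit's real AI key list (constants.py) may differ.
AI_KEYS = {"parameters", "postprocessing", "extras", "workflow", "prompt", "seed", "model"}


def strip_ai_keys(metadata: dict[str, str], keep_standard: bool = True) -> dict[str, str]:
    """Hypothetical filter mirroring the keep_standard flag of remove_ai_metadata."""
    if not keep_standard:
        return {}  # stronger sanitization: drop every text field
    # Default: drop AI-generation fields, keep everything else (Author, Title, ...).
    return {k: v for k, v in metadata.items() if k.lower() not in AI_KEYS}
```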

Verify Watermark Removal (End-to-End Test)

You can verify that watermark removal actually works by running a simple end-to-end test using Google Gemini and SynthID.

Step 1: Generate a watermarked image with Gemini

  1. Open Google AI Studio or Gemini
  2. Use a model with image generation (e.g. Gemini with Imagen)
  3. Generate an image with a simple prompt:
     A yellow banana on a white background
  4. Download the generated image (e.g. banana.png)

Google automatically embeds an invisible SynthID watermark into all Gemini-generated images.

Step 2: Remove the watermark

noai-watermark banana.png --remove-watermark --device cpu -o banana_cleaned.png

You can also try different strength levels:

# Light cleanup
noai-watermark banana.png --remove-watermark --strength 0.35 --steps 40 --device cpu -o banana_light.png

# Stronger cleanup
noai-watermark banana.png --remove-watermark --strength 0.6 --steps 50 --device cpu -o banana_strong.png

Step 3: Verify the result with SynthID detection

  1. Go back to Google AI Studio
  2. Upload the original banana.png and run the @synthid command
  3. Gemini should report that a SynthID watermark is detected
  4. Now upload the cleaned banana_cleaned.png and run @synthid again
  5. If watermark removal was successful, Gemini should report that no SynthID watermark is detected (or detection confidence drops significantly)

Expected Results

| Image | SynthID detection |
|---|---|
| banana.png (original from Gemini) | Watermark detected |
| banana_cleaned.png (after removal) | No watermark detected (or low confidence) |

Notes

  • SynthID is Google's invisible watermarking system embedded in images generated by Imagen/Gemini
  • This test demonstrates the toolkit's ability to reduce pixel-level invisible watermark traces
  • Results may vary depending on strength, steps, and model choice
  • This verification method only works for SynthID; other watermark systems may require different detection tools
  • Always use this feature responsibly and in accordance with applicable laws and platform policies

Watermark Removal: Deep Dive

This section provides detailed guidance for getting the best results from the watermark removal feature.

How the Regeneration Approach Works

The toolkit uses diffusion-based image regeneration (img2img style) to remove invisible watermarks:

  1. Encode: The input image is encoded into diffusion latent space
  2. Noise Injection: Noise is added to the latent, with the amount set by strength, to weaken hidden watermark signals
  3. Denoise: The diffusion model denoises and reconstructs the image
  4. Decode: Output is decoded back to pixel space with reduced watermark artifacts

This method targets invisible/embedded watermark traces, not visible logos or text overlays.

How AI Systems Add Invisible Watermarks

At a high level, many AI image systems use one or both of these mechanisms:

  1. Metadata-level signals
    • Provenance/attribution data stored as EXIF/PNG text/C2PA fields
    • Easy to inspect and remove with metadata workflows
  2. Pixel/latent-level invisible signals
    • Weak patterns embedded in the image content or introduced during latent-space generation
    • Usually not visible to the human eye, but detectable by dedicated decoders
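Metadata-level signals can be inspected with nothing but the standard library. The sketch below walks the chunks of a PNG file, collects the keywords of tEXt, iTXt, and zTXt chunks (where tools such as Stable Diffusion WebUI and ComfyUI store parameters and workflow data), and flags a caBX chunk, which the C2PA specification uses to embed its manifest store in PNG. scan_png is a hypothetical helper, not the toolkit's extractor, and it skips CRC validation:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
TEXT_CHUNKS = {b"tEXt", b"iTXt", b"zTXt"}
C2PA_CHUNK = b"caBX"  # chunk type C2PA uses for the manifest store in PNG


def scan_png(data: bytes) -> dict:
    """List text-chunk keywords and report whether a C2PA caBX chunk is present."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    keywords, has_c2pa = [], False
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype in TEXT_CHUNKS:
            # All three text chunk types begin with a Latin-1 keyword ended by NUL.
            keywords.append(body.split(b"\x00", 1)[0].decode("latin-1"))
        elif ctype == C2PA_CHUNK:
            has_c2pa = True
        elif ctype == b"IEND":
            break
        pos += 12 + length
    return {"text_keywords": keywords, "has_c2pa": has_c2pa}
```

Call it on raw file bytes, e.g. scan_png(open("image.png", "rb").read()).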

How This Toolkit Reduces/Bypasses Those Traces

This toolkit addresses both layers in a controlled way:

  1. Metadata cleanup path (--remove-ai)
    • Removes AI-related metadata keys and C2PA-related fields where applicable
  2. Regeneration cleanup path (--remove-watermark)
    • Re-encodes the image into diffusion latent space
    • Applies controlled noising (strength) and denoising (steps)
    • Reconstructs an image that is visually similar while reducing embedded watermark traces

In short: metadata traces are removed structurally, and pixel/latent traces are reduced through controlled regeneration.

Understanding Core Parameters

strength — How strongly the image is regenerated

  • Controls how much noise is injected before denoising
  • Lower values: Keep original image structure more strictly
  • Higher values: Allow stronger cleanup, but increase visual drift risk

steps — How many denoising iterations are used

  • Controls reconstruction quality and stability
  • Fewer steps: Faster but can produce rough results
  • More steps: Slower but usually cleaner, more coherent details

How they work together:

  • strength decides how far you move away from the original image
  • steps decides how carefully you come back during denoising
  • High strength + very low steps → often unstable
  • Low strength + very high steps → may preserve too much watermark signal
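If the toolkit follows the standard diffusers img2img convention (an assumption; the README does not say which pipeline class it uses), strength and steps also interact arithmetically: the scheduler is built for steps timesteps, but only about strength × steps of them are executed, starting from a partially noised latent. A sketch of that bookkeeping:

```python
def effective_denoising_steps(strength: float, steps: int) -> int:
    """Denoising iterations actually executed in a diffusers-style img2img run.

    Mirrors the timestep bookkeeping of diffusers' StableDiffusionImg2ImgPipeline
    (assumed, not confirmed, to match this toolkit): the first
    steps - int(steps * strength) timesteps of the schedule are skipped.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return min(int(steps * strength), steps)


for s, n in [(0.3, 20), (0.5, 50), (0.65, 60)]:
    print(f"--strength {s} --steps {n}: ~{effective_denoising_steps(s, n)} denoising iterations")
```

Under this assumption the default 0.5/50 runs about 25 iterations and the Fast sanity check preset (--strength 0.3 --steps 20) only about 6, which helps explain why high strength paired with very few steps tends to be unstable.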

Parameter Reference

--strength (most important):

  • Default: 0.5
  • 0.2 - 0.35: Preserve image structure, light cleanup
  • 0.4 - 0.6: Balanced, recommended starting range
  • 0.65 - 0.8: Aggressive cleanup, higher chance of visual drift

--steps:

  • Default: 50
  • 30 - 50: Faster
  • 60 - 100: Better reconstruction consistency, slower

--model:

  • Default: Lykon/dreamshaper-8 (good baseline)
  • Domain-specific models may preserve style/content better for some image types

--device:

  • Default: auto (tries best available device)
  • Options: auto, mps, cpu, cuda
  • On Mac: Use --device cpu if MPS OOM continues

--hf-token:

  • Optional HuggingFace API token for authenticated model downloads
  • Enables faster downloads and higher rate limits
  • Falls back to the HF_TOKEN environment variable when not provided
  • Get your token at https://huggingface.co/settings/tokens

Where to Find More Models

Example models:

  • Default / balanced: Lykon/dreamshaper-8
  • Lightweight (lower memory): segmind/tiny-sd, nota-ai/bk-sdm-small
  • Photorealistic style: SG161222/Realistic_Vision_V5.1_noVAE
  • Classic baseline: runwayml/stable-diffusion-v1-5

Practical Presets

| Use case | Flags |
|---|---|
| Fast sanity check | --strength 0.3 --steps 20 |
| Balanced default | --strength 0.5 --steps 50 |
| Aggressive cleanup | --strength 0.65 --steps 60 |
| Quality-preserving retry | --strength 0.35 --steps 40 |

Quick Decision Tree

Case 1: Watermark traces are still detected

  • Increase strength in small steps (+0.05 or +0.1)
  • Keep/increase steps to at least 40-60
  • Example: 0.5/50 → 0.6/60

Case 2: Output changed too much (content drift)

  • Decrease strength first (-0.05 or -0.1)
  • Keep moderate steps (35-50)
  • Example: 0.6/50 → 0.45/45

Case 3: Output is noisy/rough

  • Increase steps (+10 to +20)
  • Keep strength the same initially
  • Example: 0.5/20 → 0.5/40

Case 4: Runtime is too slow

  • Reduce steps first (50 → 30)
  • If still slow, reduce resolution or use a smaller model
  • Keep strength around 0.35-0.5

Case 5: Mac MPS OOM

  • Use --device cpu
  • Lower to --strength 0.3-0.4 --steps 20-30
  • Retry at smaller resolution if needed

Suggested Workflow

  1. Start with: --strength 0.5 --steps 50
  2. If watermark traces remain: Increase to --strength 0.6 or 0.7
  3. If image changed too much: Reduce to --strength 0.35 - 0.45, optionally increase --steps

Quality and Safety Considerations

Regeneration may slightly alter:

  • Fine textures
  • Small text-like details
  • Micro-contrast

Very high strength can produce semantic drift (small object/face/detail changes). Always keep original files and compare side-by-side before final export.

Performance Expectations

  • First run is slower: Model weights are downloaded and initialized
  • Speed factors:
    • Image resolution
    • GPU class / VRAM
    • Number of steps
  • For batch processing: Run on GPU and keep model loaded using WatermarkRemover class

Troubleshooting

| Problem | Solution |
|---|---|
| ImportError for torch/diffusers/transformers/accelerate | Install the extras: pip install "noai-watermark[watermark]" (or pip install -e ".[watermark]" from a local clone) |
| HF Hub unauthenticated / rate-limit warning | Pass --hf-token or run export HF_TOKEN="hf_xxxxx" |
| Very slow runtime | Confirm GPU usage; reduce resolution or steps |
| MPS backend out of memory on Mac | Lower --strength and --steps, or use --device cpu |
| MPS OOM during a run | The toolkit automatically retries once on CPU when MPS OOM is detected |
| Out of memory (general) | Lower resolution; reduce concurrent workloads |
| Output too different from input | Decrease strength; try a model closer to your content style |
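The automatic CPU retry follows a common pattern. A minimal sketch, assuming the run is wrapped in a function that takes a device string; run_with_cpu_fallback is hypothetical, and the check relies on PyTorch typically raising MPS OOM as a RuntimeError whose message contains "out of memory":

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_cpu_fallback(run: Callable[[str], T], device: str = "mps") -> tuple[T, str]:
    """Run run(device); on an MPS out-of-memory error, retry exactly once on CPU.

    Returns the result and the device that produced it. Any other error propagates.
    """
    try:
        return run(device), device
    except RuntimeError as exc:
        # Assumed: MPS OOM surfaces as RuntimeError("MPS backend out of memory ...").
        if device == "mps" and "out of memory" in str(exc).lower():
            return run("cpu"), "cpu"
        raise
```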

AI Metadata Types

This project handles metadata from tools such as:

  • Stable Diffusion WebUI (parameters, postprocessing, extras)
  • ComfyUI (workflow)
  • Common AI keys (prompt, seed, model, etc.)
  • C2PA provenance manifests (Google Imagen, OpenAI, Adobe Firefly, Microsoft Designer, etc.)

Testing

pip install -e ".[dev]"
pytest
pytest --cov=src --cov-report=html

Project Structure

src/
  __init__.py            # Package root – re-exports public API
  metadata_handler.py    # Public façade – gathers all symbols
  constants.py           # Shared detection lists and config values
  utils.py               # Format helpers (is_supported_format, etc.)
  c2pa.py                # C2PA JUMBF chunk detection / extraction / injection
  extractor.py           # Read-only metadata extraction
  injector.py            # Write metadata into images (PNG text chunks, EXIF)
  cleaner.py             # AI metadata identification and removal
  cloner.py              # High-level extract → inject pipeline
  watermark_remover.py   # Diffusion-based invisible watermark removal
  cli.py                 # CLI argument parsing and command routing
  progress.py            # Terminal progress animation and library silencing

tests/
  conftest.py            # Shared fixtures
  test_constants.py
  test_utils.py
  test_c2pa.py
  test_extractor.py
  test_injector.py
  test_cleaner.py
  test_cloner.py
  test_metadata_handler.py
  test_watermark_remover.py
  test_progress.py

Important Notes

  • JPEG stores text metadata differently; when needed, text data is packed into EXIF-compatible fields
  • C2PA chunk reinjection is PNG-oriented in this implementation
  • --remove-ai preserves standard metadata by default unless --remove-all-metadata is set
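The README does not say which EXIF field receives the packed text. As one plausible scheme, shown purely as an assumption, PNG-style text fields could be serialized to JSON and stored in the EXIF UserComment tag, whose value must begin with an 8-byte character-code header. With piexif (a core dependency) the bytes would go under piexif.ExifIFD.UserComment; the sketch stays stdlib-only, and both helpers are hypothetical:

```python
import json

ASCII_PREFIX = b"ASCII\x00\x00\x00"  # EXIF UserComment 8-byte character-code header


def pack_text_for_exif(text_fields: dict[str, str]) -> bytes:
    """Hypothetical scheme: serialize PNG-style text fields into a UserComment payload."""
    # ensure_ascii=True escapes non-ASCII characters, so the ASCII header stays truthful.
    return ASCII_PREFIX + json.dumps(text_fields, ensure_ascii=True).encode("ascii")


def unpack_text_from_exif(payload: bytes) -> dict[str, str]:
    """Inverse of pack_text_for_exif."""
    if not payload.startswith(ASCII_PREFIX):
        raise ValueError("unexpected UserComment character code")
    return json.loads(payload[len(ASCII_PREFIX):].decode("ascii"))
```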

Ethical Use

Use watermark-removal features only on content you own or have permission to modify.

Watermark removal implementation is based on the paper "Image Watermarks Are Removable Using Controllable Regeneration from Clean Noise" (ICLR 2025).



Download files


Source Distribution

noai_watermark-0.1.1.tar.gz (37.2 kB)

Uploaded Source

Built Distribution


noai_watermark-0.1.1-py3-none-any.whl (33.8 kB)

Uploaded Python 3

File details

Details for the file noai_watermark-0.1.1.tar.gz.

File metadata

  • Download URL: noai_watermark-0.1.1.tar.gz
  • Upload date:
  • Size: 37.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for noai_watermark-0.1.1.tar.gz
Algorithm Hash digest
SHA256 94c154881fabcd7b01509961adaeaa327050e911a2d5bb01be8c7f8ff6d8397a
MD5 68a597a89416b8a51d7d89778fdf3a10
BLAKE2b-256 38620d5e367dd528d7470b092f8ad9e0b502dc72759fc5bfc1da4eb77a866d0f


Provenance

The following attestation bundles were made for noai_watermark-0.1.1.tar.gz:

Publisher: publish-pypi.yml on mertizci/noai-watermark


File details

Details for the file noai_watermark-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: noai_watermark-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 33.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for noai_watermark-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 9758db5117b05a4a4eca7d0ee2c2ef3fc4a479c09cbffb1527c8dd7aba0bbcf3
MD5 3727b64468193dee6045df5d9cc020f5
BLAKE2b-256 b875d56e51f8cc68d61f17a2f9ff34d0bfe4d8495775cd3e85f96b7399fe1846


Provenance

The following attestation bundles were made for noai_watermark-0.1.1-py3-none-any.whl:

Publisher: publish-pypi.yml on mertizci/noai-watermark

