A codebase integrating image manipulation detection & localization, deepfake detection, document manipulation detection, and AIGC detection.


[NeurIPS 2025] ForensicHub: A Unified Benchmark & Codebase for All-Domain Fake Image Detection and Localization

Bo Du†, Xuekang Zhu†, Xiaochen Ma†, Chenfan Qu†, Kaiwen Feng†, Zhe Yang
Chi-Man Pun, Jian Liu*, Jizhe Zhou*


†: joint first author & equal contribution *: corresponding author


🙋‍♂️Welcome to ForensicHub!

ForensicHub is the go-to benchmark and modular codebase for all-domain fake image detection and localization, covering deepfake detection (Deepfake), image manipulation detection and localization (IMDL), artificial intelligence-generated image detection (AIGC), and document image manipulation localization (Doc). Whether you're benchmarking forensic models or building your own cross-domain pipelines, ForensicHub offers a flexible, configuration-driven architecture to streamline development, comparison, and analysis.

🏆 FIDL Leaderboard 🏆

We maintain the FIDL leaderboard, a unified ranking of models' generalization across all domains. See here for more details.

| 🏆 Rank | Model | Deepfake 🖼️ | IMDL 📝 | AIGC 🤖 | Doc 📄 | Avg ⭐ |
|---------|-------|-------------|---------|---------|--------|-------|
| 🥇 1 | Effort | 0.614 | 0.587 | 0.410 | 0.788 | 0.600 |
| 🥈 2 | Segformer-b3 | 0.629 | 0.576 | 0.339 | 0.724 | 0.567 |
| 🥉 3 | Clip-ViT-L/14 | 0.664 | 0.543 | 0.317 | 0.724 | 0.562 |
| 4 | ConvNeXT | 0.662 | 0.573 | 0.337 | 0.669 | 0.560 |
| 5 | Mesorch | 0.541 | 0.562 | 0.460 | 0.591 | 0.538 |
| 6 | UnivFD | 0.442 | 0.486 | 0.463 | 0.734 | 0.531 |
| 7 | IML-ViT | 0.581 | 0.562 | 0.325 | 0.626 | 0.523 |
| ... | | | | | | |
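Judging from the table, the Avg column is consistent with the plain arithmetic mean of the four per-domain scores. A quick check on the Effort row:

```python
# Sanity check: Avg appears to be the arithmetic mean of the four
# per-domain scores. Checking the Effort row from the leaderboard.
effort_scores = [0.614, 0.587, 0.410, 0.788]  # Deepfake, IMDL, AIGC, Doc
effort_avg = sum(effort_scores) / len(effort_scores)  # ≈ 0.600
```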

🚤Update

  • [2025.7.17] Released some missing pretrained weights for DocTamper detection models; see this issue for details.
  • [2025.7.11] Switched MODEL and POSTFUNC to lazy loading: dependencies are checked only when a model is actually used, reducing unnecessary package installation.
  • [2025.7.10] Added a script for single-image inference; see Code.
  • [2025.7.6] Added a new AIGC model, FatFormer; see Code.
  • [2025.7.1] Added documentation for Data Preparation & JSON Generation and Running Training & Evaluation in ForensicHub; see Data Preparation and Running Evaluation.
  • [2025.6.22] Added a summary of the models and evaluators in ForensicHub; see Document.
  • [2025.6.16] Added detailed installation and YAML configuration docs; see Document.
  • [2025.6.14] Added four new backbones: UNet, ViT, MobileNet, and DenseNet. More backbones are ongoing!

👨‍💻 About

☑️About the Developers:

  • ForensicHub's project leaders/supervisors are Associate Professor 🏀Jizhe Zhou (周吉喆), Sichuan University🇨🇳, and Jian Liu (刘健), leader of the Computer Vision Algorithm Research Group at Ant Group Co., Ltd.
  • ForensicHub's codebase designer and coding leader is Bo Du (杜博), Sichuan University🇨🇳.
  • ForensicHub is jointly sponsored and advised by Prof. Jiancheng LV (吕建成), Sichuan University 🐼, and Prof. Chi-Man PUN (潘治文), University of Macau 🇲🇴, through the Research Center of Machine Learning and Industry Intelligence, China MOE platform.

📦 Resources

You can find model resources under IFF-Protocol, including checkpoints (or OneDrive), training parameters, and hardware specifications.

Checkpoints for Document Benchmark: https://pan.baidu.com/s/13ViyJebu12I0GN3BucBQrg?pwd=npkx or https://drive.google.com/drive/folders/1RZZxwYIX5e-lHKDw1CD45FwFC0QqJ7im?usp=sharing

Checkpoints for AIGC Benchmark: https://pan.baidu.com/s/11Jr2wjp6lAz9IBNWnbHlVg?pwd=kzhf or https://drive.google.com/drive/folders/1M-qe5xOblVZgKiBQ9j1Q-GQ4ao5VJMHZ?usp=sharing

Pretrained backbone weights for Document models: https://pan.baidu.com/s/1lsArVWzcJiADUcYYeqyClw?pwd=4gf4 or https://drive.google.com/drive/folders/1NiHeRAcG2VkoN-JFgV5O_4YynQFiQWUw?usp=sharing. Place the checkpoint under the corresponding model’s folder.

🕵️‍♂️ Architecture

ForensicHub provides four core modular components:

🗂️ Datasets

Datasets handle the data loading process and are required to return fields that conform to the ForensicHub specification.

🔧 Transforms

Transforms handle the data pre-processing and augmentation for different tasks.
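For instance, AIGC transforms normalize inputs with the standard ImageNet statistics (the constants below are the widely used ImageNet mean/std; the exact augmentation pipeline varies by task):

```python
# ImageNet-standard channel statistics, as used by the AIGC transform's
# normalization step. The pure-Python helper is only illustrative; real
# transforms operate on whole tensors.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_rgb(pixel):
    """Normalize one RGB pixel already scaled to [0, 1]."""
    return tuple((c - m) / s for c, m, s in zip(pixel, IMAGENET_MEAN, IMAGENET_STD))

# A pixel equal to the channel means maps to (0, 0, 0).
normalized = normalize_rgb((0.485, 0.456, 0.406))
```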

🧠 Models

Models align with Datasets through unified inputs and outputs, which allows various state-of-the-art image forensics models to be plugged in.

📊 Evaluators

Evaluators cover commonly used image- and pixel-level metrics for different tasks, and are implemented with GPU acceleration to improve evaluation efficiency during training and testing.
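As an illustration of an image-level metric, here is a pure-Python sketch of thresholded F1, mirroring the ImageF1 evaluator used in the quick start below (the real evaluators are GPU-accelerated; this only shows the metric itself):

```python
# Sketch of an image-level F1 score at a fixed threshold. Scores are
# per-image fakeness probabilities; labels use 0 = real, 1 = fake.
def image_f1(scores, labels, threshold=0.5):
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Two true positives, one missed fake: precision 1.0, recall 2/3, F1 0.8.
f1 = image_f1([0.9, 0.2, 0.7, 0.4], [1, 0, 1, 1])
```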

📁 Project Structure Overview

ForensicHub/
├── common/                 # Common modules
│   ├── backbones/          # Backbones and feature extractors
│   ├── evalaution/         # Image- and pixel-level evaluators
│   ├── utils/              # Utilities
│   └── wrapper/            # Wrappers for dataset, model, etc.
├── core/                   # Core module providing abstract base classes
├── statics/                # YAML configuration files for training and testing
├── tasks/                  # Components for different sub-tasks
│   ├── aigc/
│   ├── deepfake/
│   ├── document/
│   └── imdl/
└── training_scripts/       # Scripts for training and evaluation

📀Installation


We recommend cloning the project locally.

📉Clone

Simply run the following command:

git clone https://github.com/scu-zjz/ForensicHub.git

Also, since ForensicHub is compatible with DeepfakeBench (which hasn't been uploaded to PyPI), you'll need to clone our forked version (Site) locally and install it with: pip install -e .

🎯Quick Start


The Quick Start example is based on the local clone setup. ForensicHub is a lightweight, modular, configuration-driven framework: pick built-in or custom Dataset, Transform, and Model components, register them, and launch the pipeline with a YAML configuration file.

Training on the DiffusionForensics dataset using ResNet for AIGC detection
  1. Dataset Preparation

Download the DiffusionForensics dataset from https://github.com/ZhendongWang6/DIRE. The experiment uses only the ImageNet portion. Format the data as JSON. ForensicHub does not restrict how the data is loaded; just make sure the Dataset returns the fields defined in /core/base_dataset.py, so users are free to implement their own loading logic. In this case, we use /tasks/aigc/datasets/label_dataset.py, which expects a JSON file whose entries each contain a path and a label, where 0 denotes a real image and 1 a generated one:

[
  {
    "path": "/mnt/data3/public_datasets/AIGC/DiffusionForensics/images/train/imagenet/real/n03982430/ILSVRC2012_val_00039791.JPEG",
    "label": 0
  },
  {
    "path": "/mnt/data3/public_datasets/AIGC/DiffusionForensics/images/train/imagenet/real/n03982430/ILSVRC2012_val_00022594.JPEG",
    "label": 0
  },
  ...
]
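Generating this index is straightforward; a minimal sketch is below (the directory paths are hypothetical placeholders; adapt them to wherever you unpacked DiffusionForensics):

```python
import json

def build_entries(real_paths, fake_paths):
    """Build the JSON index entries: label 0 marks a real image, 1 a generated one."""
    return ([{"path": p, "label": 0} for p in real_paths]
            + [{"path": p, "label": 1} for p in fake_paths])

# Hypothetical file lists; in practice, glob the real/ and generated
# subfolders of the ImageNet portion of DiffusionForensics.
entries = build_entries(
    ["/data/DiffusionForensics/images/train/imagenet/real/a.JPEG"],
    ["/data/DiffusionForensics/images/train/imagenet/fake/b.JPEG"],
)
train_json = json.dumps(entries, indent=2)  # write this string to train.json
```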
  2. Component Preparation

In this example, the Model is ResNet50, which is already registered in /common/backbones/resnet.py, so no extra code is needed. Transform is also pre-registered and available in /tasks/aigc/transforms/aigc_transforms.py, providing basic augmentations and ImageNet-standard normalization.
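Registration follows the decorator-plus-registry pattern common to configuration-driven frameworks. The sketch below is illustrative only; the registry and decorator names are assumptions, not the actual ForensicHub API:

```python
# Minimal sketch of a decorator-based component registry. The names
# MODELS and register_model are illustrative, not ForensicHub's API.
MODELS = {}

def register_model(name):
    """Register a model class under a string key for YAML lookup."""
    def wrap(cls):
        MODELS[name] = cls
        return cls
    return wrap

@register_model("Resnet50")
class Resnet50:
    # Stand-in for the real backbone; only the constructor matters here.
    def __init__(self, pretrained=True, num_classes=1):
        self.pretrained = pretrained
        self.num_classes = num_classes

# A YAML `model:` section then maps directly onto construction:
config = {"name": "Resnet50", "init_config": {"pretrained": True, "num_classes": 1}}
model = MODELS[config["name"]](**config["init_config"])
```

This is why registered components need no extra code: the YAML name selects the class and init_config supplies its constructor arguments.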

  3. YAML Config & Training

ForensicHub supports lightweight configuration via YAML files. In this example, aside from data preparation, no additional code is required. Here is a sample training YAML, /statics/aigc/resnet_train.yaml. The four components (Model, Dataset, Transform, Evaluator) are all instantiated via their init_config:

# DDP
gpus: "4,5"
flag: train

# Log
log_dir: "./log/aigc_resnet_df_train"

# Task
if_predict_label: true
if_predict_mask: false

# Model
model:
  name: Resnet50
  # Model specific setting
  init_config:
    pretrained: true
    num_classes: 1

# Train dataset
train_dataset:
  name: AIGCLabelDataset
  dataset_name: DiffusionForensics_train
  init_config:
    image_size: 224
    path: /mnt/data1/public_datasets/AIGC/DiffusionForensics/images/train.json
#  Test dataset (one or many)
test_dataset:
  - name: AIGCLabelDataset
    dataset_name: DiffusionForensics_val
    init_config:
      image_size: 224
      path: /mnt/data1/public_datasets/AIGC/DiffusionForensics/images/val.json

# Transform
transform:
  name: AIGCTransform

# Evaluators
evaluator:
  - name: ImageF1
    init_config:
      threshold: 0.5

# Training related
batch_size: 768
test_batch_size: 128
epochs: 20
accum_iter: 1
record_epoch: 0  # Save the best only after record epoch.

# Test related
no_model_eval: false
test_period: 1

# Logging & TensorBoard
log_per_epoch_count: 20

# DDP & AMP settings
find_unused_parameters: false
use_amp: true

# Optimizer parameters
weight_decay: 0.05
lr: 1e-4
blr: 0.001
min_lr: 1e-5
warmup_epochs: 1

# Device and training control
device: "cuda"
seed: 42
resume: ""
start_epoch: 0
num_workers: 8
pin_mem: true

# Distributed training parameters
world_size: 1
local_rank: -1
dist_on_itp: false
dist_url: "env://"

After creating the YAML file, update the file paths in statics/run.sh and launch training with it. For batch experiments, use statics/batch_run.sh, which internally invokes multiple run.sh runs. Testing works the same way and only requires configuring the same four components.
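The optimizer fields above (lr, min_lr, warmup_epochs) suggest a warmup-then-decay schedule. Whether ForensicHub uses cosine decay is an assumption; the sketch below only illustrates how those YAML fields would interact under that common choice:

```python
import math

# Hypothetical per-epoch LR schedule consistent with the YAML fields
# lr=1e-4, min_lr=1e-5, warmup_epochs=1, epochs=20: linear warmup,
# then cosine decay to min_lr. Real schedulers usually warm up
# per-iteration rather than per-epoch; this is epoch-granular.
def lr_at(epoch, lr=1e-4, min_lr=1e-5, warmup_epochs=1, epochs=20):
    if epoch < warmup_epochs:
        return lr * epoch / warmup_epochs  # linear warmup
    t = (epoch - warmup_epochs) / max(1, epochs - warmup_epochs)
    return min_lr + 0.5 * (lr - min_lr) * (1 + math.cos(math.pi * t))
```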

  4. LLM Config (Optional)
  • Qwen3-VL (transformers>=4.57.0, qwen_vl_utils>=0.0.14)

Citation

@misc{du2025forensichubunifiedbenchmark,
      title={ForensicHub: A Unified Benchmark & Codebase for All-Domain Fake Image Detection and Localization}, 
      author={Bo Du and Xuekang Zhu and Xiaochen Ma and Chenfan Qu and Kaiwen Feng and Zhe Yang and Chi-Man Pun and Jian Liu and Jizhe Zhou},
      year={2025},
      eprint={2505.11003},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.11003}, 
}
