
Repository with the code for the multilingual and multimodal benchmark MCIF

Project description

MCIF - Multimodal Crosslingual Instruction-Following

MCIF Logo

arXiv: 2507.19634 | HuggingFace Dataset: FBK-MT/MCIF

MCIF is a comprehensive benchmark for evaluating multimodal, crosslingual instruction-following systems. It covers 3 modalities (text, speech, and video), 4 languages (English, German, Italian, and Chinese), and 13 tasks organized into 4 macro-tasks.

A subset of MCIF has been used for the evaluation of the IWSLT 2025 Instruction-Following Shared Task.

📰 News

2025.10.22: 🤗 MCIF test set is released on HuggingFace
2025.10.21: ⭐️ MCIF Evaluation first release

📦 Repository Structure

The evaluation is the core component of this repository. All other components (i.e., dataset construction and baseline inference) are included to ensure full reproducibility and transparency of the evaluation results.

For details on dataset generation or baseline models, please refer to the dedicated READMEs (baselines may require specific dependencies):

  • 🧱 Dataset Construction — scripts and guidelines for creating test sets and references → dataset_build/README.md

  • 🚀 Baselines — inference scripts and outputs for baseline systems → baselines/README.md

  • 📊 Evaluation — scoring and comparison utilities for submitted outputs → README.md

⚙️ Installation

The repository can be installed by running pip install -e . from the repository root.
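For example, from a local clone of the repository (the directory name below is a placeholder):

cd MCIF
pip install -e .

Since the package is also published on PyPI as mcif_bench (see the file listing below), pip install mcif_bench should install it as well, although the README documents only the editable install.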

▶️ Usage

For the evaluation, you can simply run:

mcif_eval -t {short/long} -l {en/de/it/zh} -s model_outputs.xml

where model_outputs.xml contains your model's outputs for the selected track, i.e. context length (short or long), and target language: English (en), German (de), Italian (it), or Chinese (zh).

This will automatically download the references from the HuggingFace repository for the latest MCIF version. To use a different version, specify it with -v. To run the evaluation without internet access, first download the MCIF references and then pass them to mcif_eval with the -r parameter.
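For example (hypothetical invocations; {VERSION} stands for an MCIF release identifier and {PATH_TO_REFERENCES} for the locally downloaded references):

mcif_eval -t short -l de -s model_outputs.xml -v {VERSION}
mcif_eval -t long -l zh -s model_outputs.xml -r {PATH_TO_REFERENCES}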

The file containing the model outputs to evaluate must be structured as follows:

<?xml version='1.0' encoding='utf-8'?>
<testset name="MCIF" type="output">
  <task track="{short/long}" text_lang="{en/de/it/zh}">
    <sample id="1">{SAMPLE1_CONTENT}</sample>
    <sample id="2">{SAMPLE2_CONTENT}</sample>
    ...
  </task>
</testset>

For ease of use, we provide a helper function (mcif.io.write_output) that automatically formats model predictions into the XML structure required by the MCIF evaluation script; a usage sketch is given after the list below. The method takes as input:

  • samples: a list of mcif.io.OutputSample objects, each holding a sample id and the corresponding prediction;
  • track: the context length or track (short/long);
  • language: the target language (en/de/it/zh);
  • output_name: a human-readable name for the system output (e.g. My model);
  • output: a path or byte buffer to which the XML file containing all of the system's outputs, ready for evaluation, is written.
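A minimal sketch of how this helper might be called is shown below. The keyword arguments follow the list above; the OutputSample field names (id, prediction) and the placeholder prediction strings are assumptions, so check mcif.io for the actual attribute names.

from mcif.io import OutputSample, write_output

# Model predictions for each test sample; the field names "id" and
# "prediction" are assumptions about OutputSample's constructor.
samples = [
    OutputSample(id="1", prediction="{SAMPLE1_PREDICTION}"),
    OutputSample(id="2", prediction="{SAMPLE2_PREDICTION}"),
]

# Write the XML file expected by mcif_eval.
write_output(
    samples=samples,
    track="short",           # or "long"
    language="de",           # one of en/de/it/zh
    output_name="My model",  # human-readable name of the system
    output="model_outputs.xml",
)

The resulting model_outputs.xml can then be passed to mcif_eval with -s, as shown in the usage section above.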

📜 License

MCIF is released under the Apache 2.0 License.

🧩 Citation

If you use MCIF in your research, please cite:

@misc{mcif,
      title={MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks}, 
      author={Sara Papi and Maike Züfle and Marco Gaido and Beatrice Savoldi and Danni Liu and Ioannis Douros and Luisa Bentivogli and Jan Niehues},
      year={2025},
      eprint={2507.19634},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.19634}, 
}

Project details


Release history

This version: 1.0

Download files

Download the file for your platform.

Source Distribution

mcif_bench-1.0.tar.gz (19.3 kB)

Uploaded Source

Built Distribution


mcif_bench-1.0-py3-none-any.whl (18.4 kB)

Uploaded Python 3

File details

Details for the file mcif_bench-1.0.tar.gz.

File metadata

  • Download URL: mcif_bench-1.0.tar.gz
  • Upload date:
  • Size: 19.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.1

File hashes

Hashes for mcif_bench-1.0.tar.gz:

  • SHA256: d216e0b6dfe333af3425d4321766b13c7f16705eb3fb23bb12635cb3b7b44958
  • MD5: 2e67a2d660621ba2cca354041a231536
  • BLAKE2b-256: 52800b56bae6d6c3ef542823c77caf073b1bf7b83b8be11e4381e04d47dee8ea


File details

Details for the file mcif_bench-1.0-py3-none-any.whl.

File metadata

  • Download URL: mcif_bench-1.0-py3-none-any.whl
  • Upload date:
  • Size: 18.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.1

File hashes

Hashes for mcif_bench-1.0-py3-none-any.whl:

  • SHA256: f70fbce0e471eaa10a4f590b573f80ed94b07ad1b0a0e9b279bc8ef6c3f5f7f9
  • MD5: 9e17ea447055393dfac8d55a154a39f4
  • BLAKE2b-256: fe6bce079cd44a82c2b4b8f5fd57b7a270ed3e2ac14c9152e5e1b79c315a0b24

