
Fast CDM utilities for LaTeX tokenization, rendering, and matching

Project description

🚀 Introduction

CDM (Character Detection Matching) renders both the predicted and the ground-truth LaTeX formulas into images. It then uses visual feature extraction and localization to match individual characters, taking their spatial positions into account. Because the comparison happens on the rendered output rather than on the LaTeX source text, the evaluation is more objective and accurate.
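The matching step can be pictured with a deliberately simplified sketch. It assumes characters have already been detected in both renderings as (symbol, normalized box center) pairs, pairs them greedily by symbol and distance, and derives recall, precision, and an F1-style score from the match count. The `Char` type, the `max_dist` threshold, the greedy pairing, and treating the overall score as an F1 score are all illustrative assumptions; the original CDM uses a more robust matching procedure, and this is not FastCDM's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Char:
    """One rendered character: its symbol and the center of its bounding box.

    Coordinates are assumed (for this sketch) to be normalized to [0, 1].
    """
    symbol: str
    x: float
    y: float


def match_characters(gt: list[Char], pred: list[Char], max_dist: float = 0.1) -> int:
    """Greedily pair each predicted character with the nearest unused
    ground-truth character that has the same symbol and lies within max_dist.
    Returns the number of matched pairs (true positives)."""
    used: set[int] = set()
    matches = 0
    for p in pred:
        best_idx, best_dist = None, max_dist
        for i, g in enumerate(gt):
            if i in used or g.symbol != p.symbol:
                continue
            dist = hypot(g.x - p.x, g.y - p.y)
            if dist <= best_dist:
                best_idx, best_dist = i, dist
        if best_idx is not None:
            used.add(best_idx)
            matches += 1
    return matches


def cdm_like_scores(gt: list[Char], pred: list[Char], max_dist: float = 0.1) -> tuple[float, float, float]:
    """Return (F1, recall, precision) computed from character matches."""
    if not gt and not pred:
        return 1.0, 1.0, 1.0  # two empty formulas match trivially
    tp = match_characters(gt, pred, max_dist)
    recall = tp / len(gt) if gt else 0.0
    precision = tp / len(pred) if pred else 0.0
    # F1 = 2TP / (2TP + FP + FN) = 2TP / (|gt| + |pred|)
    f1 = 2 * tp / (len(gt) + len(pred))
    return f1, recall, precision


if __name__ == "__main__":
    # Invented coordinates for "E = mc^2" (gt) and "E + 1 = mc^2" (pred).
    gt = [Char("E", 0.10, 0.5), Char("=", 0.30, 0.5), Char("m", 0.50, 0.5),
          Char("c", 0.70, 0.5), Char("2", 0.85, 0.3)]
    pred = [Char("E", 0.08, 0.5), Char("+", 0.22, 0.5), Char("1", 0.35, 0.5),
            Char("=", 0.50, 0.5), Char("m", 0.65, 0.5), Char("c", 0.80, 0.5),
            Char("2", 0.92, 0.3)]
    print(cdm_like_scores(gt, pred, max_dist=0.25))  # about (0.833, 1.0, 0.714)
```

Here all five ground-truth characters are found (recall 1.0), while the extra "+" and "1" lower precision to 5/7.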

FastCDM addresses the main drawback of the original CDM: speed. As a high-performance optimized version of CDM, it renders formulas with the browser-based KaTeX engine instead of compiling them with a traditional LaTeX toolchain, which makes evaluation significantly faster.
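Browser-based rendering can be pictured with the sketch below. It builds an HTML page that calls KaTeX's `katex.render` and, optionally, screenshots the result with Selenium driving headless Chrome through ChromeDriver. The CDN URL, the pinned KaTeX version, and the helper names `build_katex_page` and `render_to_png` are assumptions for illustration; FastCDM's own rendering pipeline (which also requires Node.js) may work differently.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

# Assumption: KaTeX is loaded from the jsDelivr CDN; pin whichever release you use.
KATEX_VERSION = "0.16.9"
KATEX_CDN = f"https://cdn.jsdelivr.net/npm/katex@{KATEX_VERSION}/dist"


def build_katex_page(latex: str) -> str:
    """Return an HTML page that renders `latex` with KaTeX into <div id="formula">."""
    # json.dumps yields a valid JavaScript string literal (backslashes and quotes
    # are escaped); escaping "</" keeps a literal "</script>" from ending the tag.
    js_literal = json.dumps(latex).replace("</", "<\\/")
    return f"""<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <link rel="stylesheet" href="{KATEX_CDN}/katex.min.css">
  <script src="{KATEX_CDN}/katex.min.js"></script>
</head>
<body>
  <div id="formula" style="display: inline-block; padding: 8px"></div>
  <script>
    katex.render({js_literal}, document.getElementById("formula"),
                 {{displayMode: true, throwOnError: false}});
  </script>
</body>
</html>"""


def render_to_png(latex: str, chromedriver_path: str, out_png: str) -> None:
    """Render `latex` in headless Chrome and save a screenshot of the formula."""
    # Imported lazily so build_katex_page() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(service=Service(executable_path=chromedriver_path), options=options)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            page = Path(tmp) / "formula.html"
            page.write_text(build_katex_page(latex), encoding="utf-8")
            driver.get(page.as_uri())  # waits for the page to finish loading
            driver.find_element(By.ID, "formula").screenshot(out_png)
    finally:
        driver.quit()
```

Reusing one browser session across many formulas (rather than starting Chrome per call, as this sketch does) is where most of the speed of a browser-based approach would come from.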

🎯 Project Goals

The core objective of FastCDM is to make formula evaluation convenient during model training and thereby help advance formula recognition. We are committed to:

  • Providing a simple, easy-to-use API for integrating evaluation into the training loop.
  • Supporting both real-time and batch evaluation modes.
  • Providing tools for visualizing evaluation metrics during training.

Why Choose FastCDM?

  1. Extreme Performance: Built on the KaTeX rendering engine, it is tens of times faster than the traditional LaTeX compilation pipeline.
  2. Simplified Deployment: No need to install heavyweight LaTeX toolchains and image tools (texlive-full, ImageMagick, etc.).
  3. Accurate Evaluation: Uses character detection matching, avoiding an unfairness of traditional text metrics, which penalize predictions whose LaTeX source differs from the ground truth even when both render identically.
  4. Continuous Optimization: Extends and refines CDM's symbol support through ongoing iterative improvements.
  5. Easy Integration: Provides a unified API for integration into various training frameworks. Integration with mainstream frameworks such as PyTorch and Transformers is planned.
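To see why text metrics can be unfair, consider normalized edit distance on LaTeX source. The sketch below is plain Python and not part of FastCDM; it scores two pairs of strings that render to the same formula yet still receive imperfect similarity, whereas a rendering-based metric such as CDM should treat each pair as a match.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Normalized edit similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


# Both pairs render to the same formula, yet the text metric penalizes them.
for gt, pred in [("x^2", "x^{2}"), (r"\frac12", r"\frac{1}{2}")]:
    print(f"{gt!r} vs {pred!r}: distance={levenshtein(gt, pred)}, "
          f"similarity={edit_similarity(gt, pred):.2f}")
```

The distances are 2 and 4 respectively: every optional pair of braces costs points even though nothing visible changes.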

⚠️ Note

Although KaTeX is extremely fast, it is a lightweight renderer optimized for the web and does not support every obscure LaTeX command.

It renders the vast majority of standard formulas correctly, so we consider it a reasonable and sustainable technical trade-off.

You can check KaTeX's support coverage here: 🔗 KaTeX Support Table
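Before evaluating a dataset, it can help to list which LaTeX commands it uses so they can be checked against the KaTeX support table. A minimal sketch, assuming you maintain the set of supported commands yourself; the allowlist in the usage note is hypothetical and incomplete, and the regex only sees control sequences, so environment names after `\begin` are not checked.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable

# A control word (\frac, \alpha) or a control symbol (\{, \,, \\).
_COMMAND_RE = re.compile(r"\\(?:[A-Za-z]+|[^A-Za-z])")


def count_commands(formulas: Iterable[str]) -> Counter:
    """Count every LaTeX control sequence used across a collection of formulas."""
    counts: Counter = Counter()
    for formula in formulas:
        counts.update(_COMMAND_RE.findall(formula))
    return counts


def unsupported_commands(formulas: Iterable[str], supported: set[str]) -> dict[str, int]:
    """Return the commands (with usage counts) that are missing from `supported`."""
    return {cmd: n for cmd, n in count_commands(formulas).items() if cmd not in supported}
```

For example, `unsupported_commands([r"\frac{a}{b}", r"\sqrt{x}\,dx"], {r"\frac", r"\sqrt"})` flags only `\,`, because that toy allowlist omits it.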


Usage

Installation

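Node.js and ChromeDriver must be installed beforehand. Make sure the ChromeDriver version matches the version of Chrome installed on your machine.

  • For Node.js, follow the Node.js installation instructions.
  • For ChromeDriver, follow the ChromeDriver installation instructions.

Then install FastCDM from PyPI: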
pip install fastcdm
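A quick pre-flight check can confirm that both prerequisites are reachable before constructing an evaluator. This is a minimal sketch using only the standard library's `shutil.which`; the `check_prerequisites` helper is our own and not part of FastCDM, and the `"driver/chromedriver"` path simply mirrors the Quick Start.

```python
from __future__ import annotations

import shutil


def check_prerequisites(chromedriver_path: str | None = None) -> dict[str, str | None]:
    """Locate the external tools listed above; a value of None means "not found".

    `chromedriver_path` may be a bare command name (searched on PATH) or an
    explicit path such as "driver/chromedriver" (checked directly).
    """
    return {
        "node": shutil.which("node"),
        "chromedriver": shutil.which(chromedriver_path or "chromedriver"),
    }


if __name__ == "__main__":
    for tool, location in check_prerequisites("driver/chromedriver").items():
        print(f"{tool}: {location or 'NOT FOUND'}")
```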

Quick Start

from fastcdm import FastCDM

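# Path to the ChromeDriver executable (adjust to your installation)
chromedriver_path = "driver/chromedriver"

# Initialize the FastCDM evaluator
evaluator = FastCDM(chromedriver_path=chromedriver_path)

# Compute the CDM score, recall, and precision for one prediction
cdm_score, recall, precision = evaluator.compute(gt="E = mc^2", pred="E + 1 = mc^2", visualize=False)

# Same, but additionally return a visualization of the character matching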
cdm_score, recall, precision, vis_img = evaluator.compute(gt="E = mc^2", pred="E + 1 = mc^2", visualize=True)
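The Quick Start scores one pair at a time. A minimal sketch of batch scoring, assuming only the `compute(gt=..., pred=..., visualize=False)` call shown above and its three return values: it loops over (ground truth, prediction) pairs and averages each metric. The `evaluate_batch` helper and the unweighted per-sample averaging are our own choices, not part of FastCDM; a corpus-level alternative would pool character matches across all samples before computing precision and recall.

```python
from __future__ import annotations

from statistics import mean
from typing import Iterable, Protocol


class SupportsCompute(Protocol):
    """Anything with the compute(gt=..., pred=..., visualize=...) call from the Quick Start."""

    def compute(self, gt: str, pred: str, visualize: bool = False) -> tuple: ...


def evaluate_batch(evaluator: SupportsCompute, pairs: Iterable[tuple[str, str]]) -> dict[str, float]:
    """Score (ground truth, prediction) pairs and return unweighted means."""
    cdm_scores, recalls, precisions = [], [], []
    for gt, pred in pairs:
        cdm, recall, precision = evaluator.compute(gt=gt, pred=pred, visualize=False)
        cdm_scores.append(cdm)
        recalls.append(recall)
        precisions.append(precision)
    if not cdm_scores:
        raise ValueError("pairs must not be empty")
    return {
        "cdm": mean(cdm_scores),
        "recall": mean(recalls),
        "precision": mean(precisions),
    }


# Usage with the Quick Start evaluator (requires fastcdm and ChromeDriver):
#   evaluator = FastCDM(chromedriver_path="driver/chromedriver")
#   metrics = evaluate_batch(evaluator, [("E = mc^2", "E + 1 = mc^2")])
```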

Interactive Demo

We provide an interactive visualization demo built with Gradio, which you can try on Hugging Face Spaces. You can also launch it locally:

python3 scripts/app.py

Contribution and Feedback

We welcome all forms of contribution, including but not limited to:

  • Submitting issue reports
  • Suggesting improvements
  • Submitting code changes (please open an issue for discussion first)

Please reach us through the project's issue tracker.


License

This project is open-sourced under the Apache 2.0 license. You are free to use, modify, and distribute the code of this project under the terms of the license.

Download files

Download the file for your platform.

Source Distribution

fastcdm-0.1.4.tar.gz (1.0 MB)

Built Distribution

fastcdm-0.1.4-py3-none-any.whl (1.1 MB)

File details

Details for the file fastcdm-0.1.4.tar.gz.

File metadata

  • Download URL: fastcdm-0.1.4.tar.gz
  • Size: 1.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for fastcdm-0.1.4.tar.gz:

  • SHA256: 8826143e51d01b23405dc62dcd55be886337606e76a8f9d15e9fde88163d78de
  • MD5: 4b7fa15327e4574bf0f1ea8a7b34ef01
  • BLAKE2b-256: 007b099c32b91c8427550c05e40fe2ca0ba0e22420db3cd021387889e1c7a894

File details

Details for the file fastcdm-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: fastcdm-0.1.4-py3-none-any.whl
  • Size: 1.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for fastcdm-0.1.4-py3-none-any.whl:

  • SHA256: 5f12ca60f3ccb3b65e2dda72a3bf0cdceb03e99afd0df11be22529b0f6b019ff
  • MD5: e16c1e12d367585d70f3cb10379b9beb
  • BLAKE2b-256: fc4d6a60a1f7000b2ac9c1f4f830382e716ee6aa5ad57b0e94a886ab2b5467ce
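
The published digests let you check a downloaded file before installing it. A minimal sketch using only Python's standard `hashlib`: it streams the file, computes its SHA256, and compares the result with the published value. The helper names are our own; pip can also enforce hashes itself via `--require-hashes` with `--hash=sha256:...` entries in a requirements file.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str | Path, expected_sha256: str) -> bool:
    """True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# SHA256 published for the 0.1.4 wheel.
WHEEL_SHA256 = "5f12ca60f3ccb3b65e2dda72a3bf0cdceb03e99afd0df11be22529b0f6b019ff"
# verify_download("fastcdm-0.1.4-py3-none-any.whl", WHEEL_SHA256)
```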
