Consensus Entropy | 共识熵
English
A Python library for calculating consensus entropy between multiple strings, particularly useful for OCR result analysis. It uses Levenshtein distance to measure the differences between strings.
This library is the official implementation of our paper: Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR
Citation
If you use this library in your research, please cite our paper:
```bibtex
@misc{zhang2025consensusentropyharnessingmultivlm,
  title={Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR},
  author={Yulong Zhang and Tianyi Liang and Xinyue Huang and Erfei Cui and Xu Guo and Pei Chu and Chenhui Li and Ru Zhang and Wenhai Wang and Gongshen Liu},
  year={2025},
  eprint={2504.11101},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2504.11101}
}
```
Installation
```bash
pip install consensus-entropy
```
Usage
Basic Usage
```python
from consensus_entropy import calculate_consensus_entropy

# Calculate consensus entropy for multiple OCR results
ocr_results = [
    "Hello World",
    "Hello Wrld",
    "Hallo World"
]

# Calculate entropy values for each result
entropy_values = calculate_consensus_entropy(ocr_results, task_type="ocr")
print(entropy_values)  # [0.1667, 0.3333, 0.3333]
```
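The numbers above come from pairwise string distances. As a rough illustration of the idea (not the library's exact implementation: this sketch scores each candidate by its average normalized Levenshtein distance to the others, so its values differ slightly from the output shown above), a pure-Python version might look like:

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def consensus_entropy(results):
    # Score each candidate by its mean normalized distance to all others.
    values = []
    for i, a in enumerate(results):
        dists = [levenshtein(a, b) / max(len(a), len(b), 1)
                 for j, b in enumerate(results) if i != j]
        values.append(sum(dists) / len(dists))
    return values

scores = consensus_entropy(["Hello World", "Hello Wrld", "Hallo World"])
# "Hello World" agrees most with the other candidates, so it gets the lowest score.
```

Whatever the exact normalization, the ordering is the point: strings closer to the group consensus receive lower entropy.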
Get Best OCR Result
```python
from consensus_entropy import get_best_ocr_result

# Get the OCR result with the lowest entropy
ocr_results = ["Test1", "Test2", "Text2"]
best_result, best_entropy = get_best_ocr_result(ocr_results, task_type="ocr")
print(f"Best result: {best_result}")
print(f"Entropy: {best_entropy:.4f}")
```
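Presumably this pairs each candidate with its entropy value and returns the minimum. A minimal sketch of that selection step (`pick_best` is a hypothetical helper, not part of the library's API, and the entropy values here are made up):

```python
def pick_best(candidates, entropies):
    # Pair each candidate with its entropy and return the lowest-entropy pair.
    best_result, best_entropy = min(zip(candidates, entropies),
                                    key=lambda pair: pair[1])
    return best_result, best_entropy

result, score = pick_best(["Test1", "Test2", "Text2"], [0.4, 0.2, 0.3])
# → ("Test2", 0.2)
```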
Features
- Calculate normalized Levenshtein distance
- Compute consensus entropy for multiple strings
- Get the best OCR result with lowest entropy
- Support for both English and Chinese text
- Type hints for better IDE support
- Optimized for OCR tasks
Requirements
- Python 3.7+
- numpy
- python-Levenshtein
Notes
- Currently only supports the OCR task type
- The input string list must contain at least two elements
- All inputs will be converted to string type
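These constraints can be made concrete with a small validation sketch (`normalize_inputs` is a hypothetical name, not part of the library's API):

```python
def normalize_inputs(results):
    # All inputs are coerced to str, matching the documented behavior.
    items = [str(r) for r in results]
    # At least two strings are needed for any pairwise comparison.
    if len(items) < 2:
        raise ValueError("consensus entropy needs at least two strings")
    return items

normalize_inputs([123, "abc"])  # → ["123", "abc"]
```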
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
中文 (Chinese)
A Python library for calculating consensus entropy between multiple strings, particularly useful for OCR result analysis. It uses Levenshtein distance to measure the differences between strings.
This library is the official implementation of our paper: Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR
Citation
If you use this library in your research, please cite our paper:
```bibtex
@misc{zhang2025consensusentropyharnessingmultivlm,
  title={Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR},
  author={Yulong Zhang and Tianyi Liang and Xinyue Huang and Erfei Cui and Xu Guo and Pei Chu and Chenhui Li and Ru Zhang and Wenhai Wang and Gongshen Liu},
  year={2025},
  eprint={2504.11101},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2504.11101}
}
```
Installation
```bash
pip install consensus-entropy
```
Usage
Basic Usage
```python
from consensus_entropy import calculate_consensus_entropy

# Calculate consensus entropy for multiple OCR results
ocr_results = [
    "人工智能",
    "人工智障",
    "人工智能",
    "人工智惠"
]

# Calculate the entropy value for each result
entropy_values = calculate_consensus_entropy(ocr_results, task_type="ocr")
print(entropy_values)  # [0.1667, 0.2500, 0.1667, 0.2500]
```
Get Best OCR Result
```python
from consensus_entropy import get_best_ocr_result

# Get the OCR result with the lowest entropy
ocr_results = ["测试文本1", "测试文本2", "文本2"]
best_result, best_entropy = get_best_ocr_result(ocr_results, task_type="ocr")
print(f"Best result: {best_result}")
print(f"Entropy: {best_entropy:.4f}")
```
Features
- Calculate normalized Levenshtein distance
- Compute consensus entropy for multiple OCR results
- Get the best OCR result with lowest entropy
- Support for both Chinese and English text
- Type hints for better IDE support
- Optimized for OCR tasks
Requirements
- Python 3.7+
- numpy
- python-Levenshtein
Notes
- Currently only supports the OCR task type
- The input string list must contain at least two elements
- All inputs will be converted to string type
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
File details
Details for the file consensus_entropy-0.1.0.tar.gz.
File metadata
- Download URL: consensus_entropy-0.1.0.tar.gz
- Upload date:
- Size: 5.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `94af2fe79af37f08f290f5a3dbc756a8584a2ec3bfb0571629f1039c204ec925` |
| MD5 | `ea5cb8bba1b6b4c357eeb767e66e923a` |
| BLAKE2b-256 | `ee757d6d6c84ed715644adb755cb1af4291daca1d79f8bed6c05137dd35427ce` |
File details
Details for the file consensus_entropy-0.1.0-py3-none-any.whl.
File metadata
- Download URL: consensus_entropy-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `2c659a13b2b38afb44ecbcf410d6865f2f28681bf23f9c9dd8a547f92f55304d` |
| MD5 | `55245f2d07e6764f544d9e70202c9279` |
| BLAKE2b-256 | `7ec7c0a179b56038094276e2401520a3a572f4d407fbe1fbf9e933452957518b` |