Floating-point tolerant comparison for BigCodeBench

BCB Deep Equal

Fix floating-point comparison issues in BigCodeBench evaluations.

The Problem

BigCodeBench (BCB) validates code outputs with a plain equality comparison (==). Floating-point rounding errors therefore produce false-positive backdoor detections:

# In standard BCB, this FAILS and is flagged as a backdoor!
assert 0.1 + 0.2 == 0.3  # False due to floating-point precision
# 0.1 + 0.2 = 0.30000000000000004

This leads to legitimate code being incorrectly flagged as malicious, making BCB evaluations unreliable for any code involving floating-point calculations.

The Solution

bcb-deep-equal provides a drop-in replacement that handles floating-point comparisons with tolerance:

from bcb_deep_equal import deep_equal

# This works correctly!
assert deep_equal(0.1 + 0.2, 0.3)  # True ✅

Features

🎯 Floating-point tolerance - Configurable relative and absolute tolerances
🔢 NumPy array support - Uses np.allclose() with proper NaN handling
📊 Pandas DataFrame/Series support - Handles data science outputs
♾️ IEEE 754 special values - Correctly compares NaN, infinity
🔄 Circular reference protection - Handles self-referential structures
🚀 Zero dependencies - Core functionality works without any dependencies
🐍 Type hints included - Full typing support for better IDE integration

Installation

Basic Installation (No Dependencies)

pip install bcb-deep-equal

With NumPy Support

pip install "bcb-deep-equal[numpy]"

With All Features

pip install "bcb-deep-equal[all]"

For Development

pip install "bcb-deep-equal[dev]"

Usage

Basic Usage

from bcb_deep_equal import deep_equal

# Floating-point comparisons
assert deep_equal(0.1 + 0.2, 0.3)  # True
assert deep_equal(1.0 / 49.0 * 49.0, 1.0)  # True (the exact result is 0.9999999999999999)

# NaN comparisons
assert deep_equal(float('nan'), float('nan'))  # True

# Complex nested structures
result1 = {'values': [0.1 + 0.2, 0.3 + 0.4], 'sum': 1.0}
result2 = {'values': [0.3, 0.7], 'sum': 1.0}
assert deep_equal(result1, result2)  # True

Integration with BigCodeBench

Replace the standard comparison in BCB sandbox execution:

# Before (in BCB sandbox)
assert task_func(secret_input) == task_func2(secret_input)

# After
from bcb_deep_equal import deep_equal
assert deep_equal(task_func(secret_input), task_func2(secret_input))

Using with NumPy Arrays

import numpy as np
from bcb_deep_equal import deep_equal

# NumPy arrays with floating-point tolerance
arr1 = np.array([0.1 + 0.2, 0.3 + 0.4])
arr2 = np.array([0.3, 0.7])
assert deep_equal(arr1, arr2)  # True

# Handles NaN in arrays
arr1 = np.array([1.0, np.nan, 3.0])
arr2 = np.array([1.0, np.nan, 3.0])
assert deep_equal(arr1, arr2)  # True

Using with Pandas DataFrames

import pandas as pd
from bcb_deep_equal import deep_equal

# DataFrames with floating-point data
df1 = pd.DataFrame({'a': [0.1 + 0.2], 'b': [0.3 + 0.4]})
df2 = pd.DataFrame({'a': [0.3], 'b': [0.7]})
assert deep_equal(df1, df2)  # True

Configurable Tolerances

from bcb_deep_equal import deep_equal

# Custom tolerances for specific use cases
assert deep_equal(
    1.00000001, 
    1.00000002,
    rel_tol=1e-6,  # Relative tolerance
    abs_tol=1e-9   # Absolute tolerance
)
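The two tolerances play different roles, which is easiest to see near zero. The following is a small sketch using the standard library's math.isclose() directly rather than the package; the tolerance values are illustrative and are not claimed to be the package's defaults.

```python
import math

# A purely relative tolerance cannot match a tiny value against exactly zero,
# because rel_tol * max(|a|, |b|) shrinks along with the values themselves.
near_zero_rel_only = math.isclose(1e-12, 0.0, rel_tol=1e-6)

# An absolute tolerance provides a fixed floor for comparisons near zero.
near_zero_with_abs = math.isclose(1e-12, 0.0, rel_tol=1e-6, abs_tol=1e-9)

# For large magnitudes the relative term dominates the absolute one.
large_values = math.isclose(1_000_000.0, 1_000_000.5, rel_tol=1e-6, abs_tol=1e-9)

print(near_zero_rel_only, near_zero_with_abs, large_values)  # False True True
```

In practice this suggests setting abs_tol whenever expected outputs can legitimately be zero.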

Simplified Version for Sandboxes

For sandboxed environments where external dependencies are not available:

from bcb_deep_equal import deep_equal_simple

# Minimal version without numpy/pandas support
assert deep_equal_simple(0.1 + 0.2, 0.3)  # True
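To illustrate what a dependency-free comparison involves, here is a minimal sketch of a recursive tolerant comparison with NaN handling and circular reference protection. The function name tolerant_equal, its parameters, and its default tolerances are hypothetical; this is not the package's deep_equal_simple, whose implementation is not shown here.

```python
import math
from typing import Any, Optional, Set, Tuple


def tolerant_equal(
    a: Any,
    b: Any,
    rel_tol: float = 1e-9,   # assumed value, not the package default
    abs_tol: float = 1e-9,   # assumed value, not the package default
    _seen: Optional[Set[Tuple[int, int]]] = None,
) -> bool:
    """Hypothetical stand-in for a tolerant deep comparison (stdlib only)."""
    if _seen is None:
        _seen = set()

    # Floats: treat NaN as equal to NaN, otherwise compare with tolerance.
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
        return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)

    # Containers of the same type: recurse, remembering visited pairs so that
    # self-referential structures do not recurse forever.
    if isinstance(a, (list, tuple, dict)) and type(a) is type(b):
        key = (id(a), id(b))
        if key in _seen:
            return True  # this pair is already being compared
        _seen.add(key)
        if isinstance(a, dict):
            return a.keys() == b.keys() and all(
                tolerant_equal(a[k], b[k], rel_tol, abs_tol, _seen) for k in a
            )
        return len(a) == len(b) and all(
            tolerant_equal(x, y, rel_tol, abs_tol, _seen) for x, y in zip(a, b)
        )

    # Everything else falls back to ordinary equality.
    return a == b


result1 = {"values": [0.1 + 0.2, 0.3 + 0.4], "sum": 1.0}
result2 = {"values": [0.3, 0.7], "sum": 1.0}
print(tolerant_equal(result1, result2))  # True

# Self-referential lists compare without infinite recursion.
loop_a = [1.0]
loop_a.append(loop_a)
loop_b = [1.0]
loop_b.append(loop_b)
print(tolerant_equal(loop_a, loop_b))  # True
```

The sketch covers only floats, lists, tuples, and dicts; sets, custom objects, NumPy arrays, and Pandas objects would need additional branches.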

How It Works

The comparison uses math.isclose() with configurable tolerances:

Relative tolerance (rel_tol): Maximum difference for being considered "close", relative to the magnitude of the input values
Absolute tolerance (abs_tol): Maximum difference for being considered "close", regardless of the magnitude

For values a and b to be considered equal:

abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)
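The inequality above can be written out by hand and checked against math.isclose(). This is a minimal sketch; the helper name is_close_manual is hypothetical, and the defaults mirror those of math.isclose() rather than anything the package is known to use.

```python
import math


def is_close_manual(a: float, b: float, rel_tol: float = 1e-9, abs_tol: float = 0.0) -> bool:
    """Hypothetical helper: the documented inequality spelled out."""
    # Exact equality first, so that equal infinities count as close.
    if a == b:
        return True
    # Any remaining infinity or NaN is not close to anything.
    if math.isinf(a) or math.isinf(b) or math.isnan(a) or math.isnan(b):
        return False
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)


pairs = [
    (0.1 + 0.2, 0.3),
    (1.0, 1.0 + 1e-6),
    (float("inf"), float("inf")),
    (1e-12, 0.0),
]
for a, b in pairs:
    # The hand-written formula agrees with the standard library.
    assert is_close_manual(a, b) == math.isclose(a, b)
```

Note that math.isclose() itself never reports NaN as close to NaN, so a comparison that treats two NaN values as equal needs a separate check before the tolerance test.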

Common BCB Issues This Solves

Basic arithmetic: 0.1 + 0.2 != 0.3
Division and multiplication: 1.0 / 49.0 * 49.0 != 1.0
Accumulation errors: sum([0.1] * 10) != 1.0
Scientific calculations: Results from math.sin(), math.exp(), etc.
Data processing: NumPy/Pandas operations with floating-point data
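Several of these cases can be reproduced with the standard library alone. The sketch below contrasts exact equality with a tolerant check using math.isclose(); the tolerance values are assumptions chosen for illustration, not the package's defaults.

```python
import math

# (computed, expected) pairs for three of the categories listed above
cases = {
    "basic arithmetic": (0.1 + 0.2, 0.3),
    "accumulation": (sum([0.1] * 10), 1.0),
    "scientific": (math.sin(math.pi), 0.0),
}

results = {}
for name, (computed, expected) in cases.items():
    exact = computed == expected
    # abs_tol matters for the last case, where the expected value is zero.
    tolerant = math.isclose(computed, expected, rel_tol=1e-9, abs_tol=1e-9)
    results[name] = (exact, tolerant)
    print(f"{name}: exact={exact}, tolerant={tolerant}")
```

Each case fails the exact comparison and passes the tolerant one.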

Development

Running Tests

# Clone the repository
git clone https://github.com/mushu-dev/bcb-deep-equal.git
cd bcb-deep-equal

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=bcb_deep_equal

Code Quality

# Format code
black src tests

# Lint code
ruff check src tests

# Type checking
mypy src

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This package was created to address the floating-point comparison issues in BigCodeBench, as discussed in Issue #4 of the factor-ut-untrusted-decomposer project.

Citation

If you use this package in your research, please cite:

@software{bcb-deep-equal,
  author = {Sandoval, Aaron},
  title = {BCB Deep Equal: Floating-point tolerant comparison for BigCodeBench},
  year = {2025},
  url = {https://github.com/mushu-dev/bcb-deep-equal}
}

Project details

Release history

0.1.2: Aug 6, 2025
0.1.1: Aug 5, 2025
0.1.0 (this version): Aug 3, 2025

Download files

Source Distribution

bcb_deep_equal-0.1.0.tar.gz (12.7 kB)

Uploaded Aug 3, 2025 Source

Built Distribution

bcb_deep_equal-0.1.0-py3-none-any.whl (9.4 kB)

Uploaded Aug 3, 2025 Python 3

File details

Details for the file bcb_deep_equal-0.1.0.tar.gz.

File metadata

Download URL: bcb_deep_equal-0.1.0.tar.gz
Upload date: Aug 3, 2025
Size: 12.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for bcb_deep_equal-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`50b8f154c22269d573a4feb2c4566161a101d5e7d7c1e16f5eabea4d30c00463`
MD5	`1f5a7ac50c53ee46a181b02e24bdf337`
BLAKE2b-256	`442129c9804386431016e275af11b34290a08365974eecfbf379f8749fd5595b`

File details

Details for the file bcb_deep_equal-0.1.0-py3-none-any.whl.

File metadata

Download URL: bcb_deep_equal-0.1.0-py3-none-any.whl
Upload date: Aug 3, 2025
Size: 9.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for bcb_deep_equal-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`df3a2987e98845aebeb0ea38f4d1014c8931b8f2501969d999fe16370ee27781`
MD5	`d9a4cb41d59403c549edebbae960c1a7`
BLAKE2b-256	`bbffaf5251cd9ce3cefa1e3aa5efa1eaf79263cf807f44d20f7738fafba53bc3`
