Ultra-lightweight pure Python package to check if a file is binary or text.

BinaryOrNot

Python library and CLI tool to check if a file is binary or text. Zero dependencies.

from binaryornot.check import is_binary

is_binary("image.png")    # True
is_binary("README.md")    # False
is_binary("data.sqlite")  # True
is_binary("report.csv")   # False

The same check from the command line:

$ binaryornot image.png
True

Install

pip install binaryornot

Why not just check for null bytes?

That's the first thing everyone tries. It works until it doesn't:

  • A UTF-16 text file is full of null bytes. Your tool thinks it's binary and corrupts it.
  • A Big5 or GB2312 text file is full of bytes above 0x7F, so it looks binary by byte ratios alone.
  • A font file (.woff, .eot) is clearly binary but might not have null bytes in the first chunk.
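
The first two pitfalls above are easy to reproduce with the standard library alone. This is a minimal sketch, not BinaryOrNot's code: `has_null_byte` and `high_byte_ratio` are hypothetical helpers written for this illustration, and the sample strings are arbitrary.

```python
def has_null_byte(chunk: bytes) -> bool:
    """Naive heuristic: treat any chunk containing a NUL byte as binary."""
    return b"\x00" in chunk


def high_byte_ratio(chunk: bytes) -> float:
    """Fraction of bytes with the high bit set (values >= 0x80)."""
    if not chunk:
        return 0.0
    return sum(b >= 0x80 for b in chunk) / len(chunk)


# Plain text in three encodings (sample strings chosen for illustration).
ascii_text = b"hello world"
utf16_text = "hello world".encode("utf-16")    # BOM, then 2 bytes per character
gb2312_text = "中文文本文件".encode("gb2312")    # Simplified Chinese text

print(has_null_byte(ascii_text))     # False
print(has_null_byte(utf16_text))     # True: text misclassified as binary
print(high_byte_ratio(ascii_text))   # 0.0
print(high_byte_ratio(gb2312_text))  # 1.0: every byte has the high bit set
```

Both heuristics give the wrong answer on perfectly ordinary text, which is why a single rule is not enough.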

BinaryOrNot reads the first 128 bytes and runs them through a trained decision tree that considers byte ratios, Shannon entropy, encoding validity, BOM detection, and more. It handles all the edge cases above correctly, with zero dependencies.
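
To make the signals named above concrete, here is a rough sketch of how such features could be computed from the first 128 bytes. It is an illustration under assumptions, not the package's implementation: the function names, the feature names, and the choice of UTF-8 as the only validity check are all hypothetical, and the trained decision tree itself is not reproduced.

```python
import codecs
import math
from collections import Counter

CHUNK_SIZE = 128  # the README states that only the first 128 bytes are read

# Well-known byte order marks. UTF-32 comes first because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    return -sum((n / total) * math.log2(n / total) for n in Counter(chunk).values())


def detect_bom(chunk: bytes):
    """Return the encoding name implied by a leading BOM, or None."""
    for bom, name in BOMS:
        if chunk.startswith(bom):
            return name
    return None


def is_valid_utf8(chunk: bytes) -> bool:
    """Check UTF-8 validity while tolerating a sequence cut off at the chunk end."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return False
    return True


def extract_features(data: bytes) -> dict:
    """Hypothetical feature vector for a classifier; names are illustrative."""
    chunk = data[:CHUNK_SIZE]
    n = len(chunk) or 1  # avoid division by zero on empty input
    return {
        "null_ratio": chunk.count(b"\x00") / n,
        "high_ratio": sum(b >= 0x80 for b in chunk) / n,
        "entropy": shannon_entropy(chunk),
        "bom": detect_bom(chunk),
        "utf8_valid": is_valid_utf8(chunk),
    }


features = extract_features("héllo wörld".encode("utf-8"))
print(features["bom"])         # None
print(features["utf8_valid"])  # True
print(features["null_ratio"])  # 0.0
```

No single feature settles the question; the point of a decision tree is to combine them, for example by trusting a BOM or a valid decoding over a high null-byte ratio.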

Tested against 37 text encodings and 49 binary formats, verified by parametrized tests driven from coverage CSVs.

API

One function:

from binaryornot.check import is_binary

is_binary(filename)  # returns True or False
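
A common use for a predicate like this is filtering a directory tree down to its text files. The sketch below is one possible shape for that, with assumptions labeled: `iter_text_files` is a hypothetical helper, and `naive_is_binary` is a deliberately crude stand-in so the example runs without the package installed.

```python
import os
from typing import Callable, Iterator


def naive_is_binary(path: str) -> bool:
    """Stand-in for this sketch only: NUL byte in the first 128 bytes."""
    with open(path, "rb") as f:
        return b"\x00" in f.read(128)


def iter_text_files(
    root: str, is_binary: Callable[[str], bool] = naive_is_binary
) -> Iterator[str]:
    """Yield paths under root that the given predicate does not flag as binary."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not is_binary(path):
                yield path
```

With BinaryOrNot installed, the real function can be passed in place of the stand-in: `iter_text_files(".", is_binary)` after `from binaryornot.check import is_binary`.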

There's also is_binary_string() if you already have bytes:

from binaryornot.helpers import is_binary_string

is_binary_string(b"\x00\x01\x02")  # True
is_binary_string(b"hello world")   # False

The full documentation covers the detection algorithm in detail.

Credits

Created by Audrey Roy Greenfeld.

Download files

Source Distribution

binaryornot-0.5.0.tar.gz (428.5 kB)

Uploaded Source

Built Distribution

binaryornot-0.5.0-py3-none-any.whl (12.0 kB)

Uploaded Python 3

File details

Details for the file binaryornot-0.5.0.tar.gz.

File metadata

  • Download URL: binaryornot-0.5.0.tar.gz
  • Upload date:
  • Size: 428.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for binaryornot-0.5.0.tar.gz
Algorithm     Hash digest
SHA256        dcacb1343219da5fbbb7828a46b946768c6df07b65453195954cbbecf14c1c83
MD5           2a76bf08a1af1d482a69ff1bd55298df
BLAKE2b-256   617e8a41b27448bfcc8138f8aec3ac2a467edf22a1d85bdbbd7fd0a130372fc4

Provenance

The following attestation bundles were made for binaryornot-0.5.0.tar.gz:

Publisher: publish.yml on binaryornot/binaryornot

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file binaryornot-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: binaryornot-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 12.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for binaryornot-0.5.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        2a969a893feb93508c14ea64a80354ba4a3164ccd2d3b5122cd438fab6965134
MD5           d7e9047c2727e4b1578ab024714bb52d
BLAKE2b-256   771392fbe1fce5eecc0c926ec94a9a8904e9f9a74c286887205fbc071f4ea349

Provenance

The following attestation bundles were made for binaryornot-0.5.0-py3-none-any.whl:

Publisher: publish.yml on binaryornot/binaryornot

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
