Strip BOM

Strip UTF-8 byte order mark (BOM) from strings, bytes, streams, and files. Inspired by the popular strip-bom npm package.

Features

  • Multiple input types: Strip BOM from strings, bytes, bytearrays, streams, and files
  • Smart validation: Checks that a buffer is valid UTF-8 before stripping; invalid input is returned unchanged
  • Memory efficient: Handles large files and streams without loading everything into memory
  • Zero dependencies: Lightweight with no external dependencies
  • Type safe: Full type hints for excellent IDE support
  • Robust: Handles errors gracefully and preserves non-ASCII content such as emoji and CJK characters

Why Strip BOM?

The UTF-8 Byte Order Mark (BOM) can cause issues when:

  • ❌ Processing files from different sources (some have a BOM, others don't)
  • ❌ Comparing strings that should be identical but differ only by BOM
  • ❌ Working with APIs that don't expect BOM characters
  • ❌ Parsing JSON, CSV, or other structured data formats

Note: The Unicode Standard permits a BOM in UTF-8 but neither requires nor recommends it, since byte order is irrelevant for UTF-8.
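
The problems listed above are easy to reproduce with the standard library alone. The following is a minimal sketch that does not use strip-bom at all; the sample payload and CSV content are made up for illustration.

```python
import csv
import io
import json

BOM = "\ufeff"  # U+FEFF: how a decoded UTF-8 BOM appears inside a str

# Illustrative data only (not taken from the project).
payload = BOM + '{"name": "unicorn"}'

# 1. Strings that look identical on screen compare unequal.
strings_differ = (BOM + "unicorn") != "unicorn"

# 2. json.loads refuses a str that starts with a BOM.
try:
    json.loads(payload)
    json_failed = False
except json.JSONDecodeError:
    json_failed = True

# 3. csv keeps the BOM as part of the first column name.
reader = csv.DictReader(io.StringIO(BOM + "id,name\n1,unicorn\n"))
first_header = reader.fieldnames[0]

print(strings_differ)         # True
print(json_failed)            # True
print(first_header == "id")   # False: the header still carries the BOM
```

In each case, removing the leading U+FEFF before further processing avoids the problem.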

Installation

pip install strip-bom

Usage Examples

Strings

from strip_bom import strip_bom

text_with_bom = '\ufeffunicorn'
clean_text = strip_bom(text_with_bom)
print(clean_text)  # 'unicorn'

# Text without BOM remains unchanged
normal_text = 'Hello World'
print(strip_bom(normal_text))  # 'Hello World'

Bytes and Buffers

from strip_bom import strip_bom_buffer

bytes_with_bom = b'\xef\xbb\xbfunicorn'
clean_bytes = strip_bom_buffer(bytes_with_bom)
print(clean_bytes)  # b'unicorn'

# Invalid UTF-8 is left unchanged (safety first!)
invalid_utf8 = b'\xef\xbb\xbf\xff\xfe'
result = strip_bom_buffer(invalid_utf8)
print(result == invalid_utf8)  # True (no changes made)

Streams (Memory Efficient)

from strip_bom import strip_bom_stream
import io

# Process large streams without loading everything into memory
stream = io.BytesIO(b'\xef\xbb\xbfLarge file content here...')

# Process in chunks
for chunk in strip_bom_stream(stream, chunk_size=8192):
    # Process each chunk as needed
    print(chunk)

# Or get all content at once
stream.seek(0)
content = b''.join(strip_bom_stream(stream))

Files

from strip_bom import strip_bom_file

# Text mode (reads as UTF-8)
content = strip_bom_file('data.txt', mode='r')
print(f"File content: {content}")

# Binary mode
binary_content = strip_bom_file('data.txt', mode='rb')
print(f"Binary content: {binary_content}")

API Reference

strip_bom(text: str) -> str

Remove a leading BOM from a Unicode string.
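
To make the behaviour concrete, here is a rough stand-in written with plain string operations. The name strip_bom_sketch is hypothetical and this is not the library's actual code; it assumes that only a single leading U+FEFF is removed and that any U+FEFF elsewhere in the text is left alone.

```python
BOM_CHAR = "\ufeff"  # the BOM as it appears in a decoded str


def strip_bom_sketch(text: str) -> str:
    """Hypothetical stand-in: drop one leading U+FEFF, if present."""
    if text.startswith(BOM_CHAR):
        return text[len(BOM_CHAR):]
    return text


print(strip_bom_sketch("\ufeffunicorn"))  # unicorn
print(strip_bom_sketch("Hello World"))    # Hello World
```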

strip_bom_buffer(buffer: Union[bytes, bytearray]) -> bytes

Remove a leading BOM from a bytes or bytearray object if the content is valid UTF-8; otherwise return the input unchanged.
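
The validate-then-strip behaviour can be approximated as follows. This is a sketch under assumptions, not the package's implementation: strip_bom_buffer_sketch is a hypothetical name, and "valid UTF-8" is taken to mean that the whole buffer decodes with Python's strict utf-8 codec.

```python
import codecs
from typing import Union


def strip_bom_buffer_sketch(buffer: Union[bytes, bytearray]) -> bytes:
    """Hypothetical stand-in: strip the UTF-8 BOM only from valid UTF-8."""
    data = bytes(buffer)
    if not data.startswith(codecs.BOM_UTF8):  # codecs.BOM_UTF8 == b'\xef\xbb\xbf'
        return data
    try:
        data.decode("utf-8")  # strict decoding; raises on invalid sequences
    except UnicodeDecodeError:
        return data  # leave invalid input untouched
    return data[len(codecs.BOM_UTF8):]


print(strip_bom_buffer_sketch(b"\xef\xbb\xbfunicorn"))  # b'unicorn'

invalid = b"\xef\xbb\xbf\xff\xfe"
print(strip_bom_buffer_sketch(invalid) == invalid)      # True
```

Decoding the whole buffer is the simplest way to validate it, at the cost of one extra pass over the data.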

strip_bom_stream(stream: BinaryIO, chunk_size: int = 8192) -> Iterator[bytes]

Remove a leading BOM from a binary stream, yielding the content in chunks.
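
A chunked reader of this kind might look roughly like the generator below. The name strip_bom_stream_sketch is hypothetical, and the sketch makes two simplifying assumptions: it does not validate UTF-8, and it expects read(n) to return n bytes unless the stream is exhausted, which holds for io.BytesIO and buffered files but not necessarily for raw or network streams.

```python
import codecs
import io
from typing import BinaryIO, Iterator


def strip_bom_stream_sketch(stream: BinaryIO, chunk_size: int = 8192) -> Iterator[bytes]:
    """Hypothetical stand-in: yield chunks, dropping a leading UTF-8 BOM."""
    bom = codecs.BOM_UTF8
    # Read at least len(bom) bytes so the BOM check sees the full prefix.
    first = stream.read(max(chunk_size, len(bom)))
    if first.startswith(bom):
        first = first[len(bom):]
    if first:
        yield first
    # Pass the remaining chunks through until read() returns b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        yield chunk


stream = io.BytesIO(b"\xef\xbb\xbfLarge file content here...")
print(b"".join(strip_bom_stream_sketch(stream, chunk_size=4)))
# b'Large file content here...'
```

Only one chunk is held in memory at a time, so the size of the stream does not affect peak memory use.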

strip_bom_file(file_path: str, mode: str = 'r') -> Union[str, bytes]

Read a file and remove the BOM from its content. The mode can be 'r' or 'rt' for text, or 'rb' for binary.
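
For comparison, a similar result can be obtained with the standard library: the utf-8-sig codec drops a leading BOM in text mode, and a prefix check handles binary mode. The function name strip_bom_file_sketch is hypothetical, raising ValueError for other modes is an assumption of this sketch, and it reads the whole file into memory; none of this is claimed to match the package's internals.

```python
import codecs
import os
import tempfile
from typing import Union


def strip_bom_file_sketch(file_path: str, mode: str = "r") -> Union[str, bytes]:
    """Hypothetical stand-in: read a file and drop a leading UTF-8 BOM."""
    if mode in ("r", "rt"):
        # utf-8-sig decodes UTF-8 and skips a BOM at the start, if any.
        with open(file_path, "r", encoding="utf-8-sig") as handle:
            return handle.read()
    if mode == "rb":
        with open(file_path, "rb") as handle:
            data = handle.read()
        if data.startswith(codecs.BOM_UTF8):
            return data[len(codecs.BOM_UTF8):]
        return data
    raise ValueError(f"unsupported mode: {mode!r}")  # assumption of this sketch


# Usage: write a temporary file that starts with a BOM, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with open(path, "wb") as handle:
        handle.write(codecs.BOM_UTF8 + "unicorn 🦄".encode("utf-8"))
    text_result = strip_bom_file_sketch(path, mode="r")
    binary_result = strip_bom_file_sketch(path, mode="rb")

print(text_result == "unicorn 🦄")                    # True
print(binary_result == "unicorn 🦄".encode("utf-8"))  # True
```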

Acknowledgments

Inspired by Sindre Sorhus's strip-bom npm package.

Changelog

See CHANGELOG.md for a detailed list of changes and version history.

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Support

If you find this library helpful:

  • ⭐ Star the repository
  • 🐛 Report issues
  • 🔀 Submit pull requests
  • 💝 Sponsor on GitHub

License

MIT © Y. Siva Sai Krishna - see LICENSE file for details.

