
Detects textual content.

Project description


🕵️ A Python library that provides consolidated text detection capabilities for reliable content analysis: MIME type detection, character set detection, and line separator processing.

Key Features ⭐

🔍 MIME Type Detection

Intelligent content-based detection using magic bytes with file extension fallback for comprehensive format identification.
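To make the "magic bytes first, extension second" idea concrete, here is a minimal standard-library sketch. It is not detextive's implementation: the function name `sketch_detect_mimetype` and the tiny `MAGIC_SIGNATURES` table are hypothetical and exist only for illustration.

```python
import mimetypes

# Hypothetical, deliberately tiny signature table (an assumption for this
# sketch); real detectors recognize far more formats.
MAGIC_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
)


def sketch_detect_mimetype(content: bytes, location: str = "") -> str:
    """Check leading magic bytes first, then fall back to the file extension."""
    for signature, mimetype in MAGIC_SIGNATURES:
        if content.startswith(signature):
            return mimetype
    # No signature matched: guess from the extension of the location, if any.
    guessed, _encoding = mimetypes.guess_type(location)
    return guessed or "application/octet-stream"
```

In this sketch the content wins over the name: PNG bytes stored under a `.txt` name are still reported as `image/png`, and the extension is consulted only when no signature matches.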

📝 Character Encoding Detection

Statistical analysis with UTF-8 optimization and validation through decode operations for reliable text processing.
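The "validation through decode operations" idea can be sketched with the standard library alone. This sketch swaps the statistical analysis for a fixed candidate list, so it is a simplification rather than the library's method. The name `sketch_detect_charset` and the fallback list are assumptions.

```python
def sketch_detect_charset(content: bytes, fallbacks=("cp1252", "latin-1")) -> str:
    """Return the first candidate charset that decodes the content cleanly.

    UTF-8 is tried first because a successful strict UTF-8 decode of
    non-ASCII bytes is rarely accidental. The fallback list is a
    hypothetical stand-in for a real statistical detector.
    """
    for charset in ("utf-8", *fallbacks):
        try:
            content.decode(charset)  # strict decode acts as the validation step
        except UnicodeDecodeError:
            continue
        return charset
    raise ValueError("no candidate charset could decode the content")
```

Because `latin-1` maps every byte to a character, it always succeeds when it is in the list. That makes it a useful last resort, but also means a match on it says little about the true encoding.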

📄 Line Separator Processing

Cross-platform line ending detection and normalization supporting CR, LF, and CRLF formats with mixed-content handling.
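As a rough illustration of mixed-content handling, the following standard-library sketch counts each separator kind and normalizes everything to LF. The helper names are hypothetical and independent of the `LineSeparators` API shown in the examples.

```python
def sketch_count_separators(text: str) -> dict:
    """Count CRLF, lone CR, and lone LF separators in the text."""
    crlf = text.count("\r\n")
    # Every CRLF pair contains one CR and one LF, so subtract the pairs.
    return {
        "CRLF": crlf,
        "CR": text.count("\r") - crlf,
        "LF": text.count("\n") - crlf,
    }


def sketch_normalize(text: str) -> str:
    """Convert CRLF and lone CR to LF; CRLF must be replaced first."""
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

The replacement order matters: handling lone CR before CRLF would turn each CRLF into two line breaks.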

Textual Content Validation

Smart classification of MIME types and content reasonableness assessment using control character and printability heuristics.
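A control-character and printability heuristic of this kind might look like the sketch below. The threshold, the set of tolerated whitespace controls, and the name `sketch_is_reasonable_text` are all assumptions chosen for illustration; the library's actual rules may differ.

```python
# Whitespace controls that ordinary text is expected to contain (an assumption).
TOLERATED_CONTROLS = frozenset("\t\n\r\f")


def sketch_is_reasonable_text(text: str, max_suspect_ratio: float = 0.02) -> bool:
    """Accept text whose share of non-printable characters stays small."""
    if not text:
        return True  # arbitrary choice: empty text has nothing suspicious
    suspects = sum(
        1
        for character in text
        if character not in TOLERATED_CONTROLS and not character.isprintable()
    )
    return suspects / len(text) <= max_suspect_ratio
```

With these settings, ordinary prose passes, while a string with embedded NUL characters, such as `"Config file\x00\x00\x00data"`, is rejected.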

Installation 📦

Method: Install Python Package

Install via uv pip command:

uv pip install detextive

Or, install via pip:

pip install detextive

Examples 💡

Basic Usage

MIME Type and Charset Detection:

Load your content as bytes:

import detextive

with open( 'document.txt', 'rb' ) as file:
    content = file.read( )

You can detect MIME type and charset individually:

mimetype = detextive.detect_mimetype( content, location = 'document.txt' )
charset = detextive.detect_charset( content )

Or use combined inference for better accuracy:

mimetype, charset = detextive.infer_mimetype_charset(
    content, location = 'document.txt' )
print( f"Detected: {mimetype} with {charset} encoding" )

Line Separator Processing:

Detect line separators in mixed content:

import detextive

content = 'Line 1\r\nLine 2\rLine 3\n'
separator = detextive.LineSeparators.detect_bytes( content.encode( ) )

Normalize line separators to Python standard:

normalized = detextive.LineSeparators.normalize_universal( content )

Convert to platform-specific line separators:

native = detextive.LineSeparators.CRLF.nativize( normalized )

Content Classification:

Check if MIME types represent textual content:

import detextive

detextive.is_textual_mimetype( 'application/json' )  # True
detextive.is_textual_mimetype( 'image/jpeg' )        # False

Validate that decoded text content is reasonable:

text = "Hello world!"
detextive.is_valid_text( text )      # True

Binary data that might decode as text but isn’t valid fails validation:

binary_as_text = "Config file\x00\x00\x00data"
detextive.is_valid_text( binary_as_text )  # False

High-Level Decoding:

For complete bytes-to-text processing with automatic charset detection and validation:

import detextive

with open( 'document.txt', 'rb' ) as file:
    content = file.read( )

text = detextive.decode( content, location = 'document.txt' )
print( f"Decoded text: {text}" )

Contribution 🤝

Contributions to this project are welcome, provided they follow the project's code of conduct.

Please file bug reports and feature requests in the issue tracker or submit pull requests to improve the source code or documentation.

For development guidance and standards, please see the development guide.


Other Projects by This Author 🌟

  • python-absence (absence on PyPI)

    🕳️ A Python library package which provides a sentinel for absent values - a falsey, immutable singleton that represents the absence of a value in contexts where None or False may be valid values.

  • python-accretive (accretive on PyPI)

    🌌 A Python library package which provides accretive data structures - collections which can grow but never shrink.

  • python-classcore (classcore on PyPI)

    🏭 A Python library package which provides foundational class factories and decorators for providing classes with attributes immutability and concealment and other custom behaviors.

  • python-dynadoc (dynadoc on PyPI)

    📝 A Python library package which bridges the gap between rich annotations and automatic documentation generation with configurable renderers and support for reusable fragments.

  • python-falsifier (falsifier on PyPI)

    🎭 A very simple Python library package which provides a base class for falsey objects - objects that evaluate to False in boolean contexts.

  • python-frigid (frigid on PyPI)

    🔒 A Python library package which provides immutable data structures - collections which cannot be modified after creation.

  • python-icecream-truck (icecream-truck on PyPI)

    🍦 Flavorful Debugging - A Python library which enhances the powerful and well-known icecream package with flavored traces, configuration hierarchies, customized outputs, ready-made recipes, and more.

  • python-librovore (librovore on PyPI)

    🐲 Documentation Search Engine - An intelligent documentation search and extraction tool that provides both a command-line interface for humans and an MCP (Model Context Protocol) server for AI agents. Search across Sphinx and MkDocs sites with fuzzy matching, extract clean markdown content, and integrate seamlessly with AI development workflows.

  • python-mimeogram (mimeogram on PyPI)

    📨 A command-line tool for exchanging collections of files with Large Language Models - bundle multiple files into a single clipboard-ready document while preserving directory structure and metadata… good for code reviews, project sharing, and LLM interactions.
