Detects textual content.
🕵️ A Python library that consolidates text detection capabilities for reliable content analysis, offering MIME type detection, character set detection, and line separator processing.
Key Features ⭐
- 🔍 MIME Type Detection
  Content-based detection using magic bytes, with the file extension as a fallback when the content alone is inconclusive.
- 📝 Character Encoding Detection
  Statistical analysis, optimized for UTF-8, with candidate encodings validated by actually decoding the content.
- 📄 Line Separator Processing
  Cross-platform detection and normalization of CR, LF, and CRLF line endings, including content that mixes them.
- ✅ Textual Content Validation
  Classification of MIME types as textual or non-textual, plus heuristics based on control characters and printability that assess whether content is plausibly text.
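As a rough illustration of content-based MIME detection with an extension fallback, the following standard-library sketch checks a few magic-byte signatures and otherwise defers to Python's `mimetypes` module. It is not detextive's implementation: `sniff_mimetype` and the signature table are hypothetical, and real detectors such as libmagic recognize far more formats.

```python
from __future__ import annotations

import mimetypes

# Hypothetical, deliberately tiny table of magic-byte signatures.
MAGIC_SIGNATURES = (
    ( b'\x89PNG\r\n\x1a\n', 'image/png' ),
    ( b'\xff\xd8\xff', 'image/jpeg' ),
    ( b'%PDF-', 'application/pdf' ),
    ( b'GIF87a', 'image/gif' ),
    ( b'GIF89a', 'image/gif' ),
    ( b'PK\x03\x04', 'application/zip' ),
)


def sniff_mimetype( content: bytes, location: str = '' ) -> str | None:
    ''' Guesses a MIME type from leading bytes, then from the file extension. '''
    # Content-based detection takes priority over the (untrusted) file name.
    for signature, mimetype in MAGIC_SIGNATURES:
        if content.startswith( signature ):
            return mimetype
    # Fall back to the extension only when no signature matches.
    if location:
        mimetype, _ = mimetypes.guess_type( location )
        return mimetype
    return None
```

Because content wins over the name, `sniff_mimetype( b'%PDF-1.7', 'report.txt' )` returns `'application/pdf'` despite the `.txt` extension.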
Installation 📦
Method: Install Python Package
Install via uv pip command:

```shell
uv pip install detextive
```

Or, install via pip:

```shell
pip install detextive
```
Examples 💡
Basic Usage
MIME Type and Charset Detection:
```python
import detextive

with open( 'document.txt', 'rb' ) as file:
    content = file.read( )

# Individual detection
mimetype = detextive.detect_mimetype( content, 'document.txt' )
charset = detextive.detect_charset( content )

# Combined detection
mimetype, charset = detextive.detect_mimetype_and_charset(
    content, 'document.txt' )

print( "Detected: {mimetype} with {charset} encoding".format(
    mimetype = mimetype, charset = charset ) )
```
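Character encoding detection, as described in the feature list, pairs statistical guessing with validation by decoding. The sketch below shows only the UTF-8-first and trial-decode steps; a fixed, hypothetical candidate list stands in for the statistical ranking that a real detector would perform. `guess_charset` is an illustrative name, not part of detextive's API.

```python
# Hypothetical fallback candidates, tried in order after UTF-8. A real
# detector would rank candidates statistically instead of using a fixed list.
FALLBACK_CHARSETS = ( 'cp1252', )


def guess_charset( content: bytes ) -> str:
    ''' Returns the first candidate charset that decodes content without error. '''
    # UTF-8 goes first: it is the most common encoding, and its strict
    # multi-byte structure makes an accidental clean decode unlikely.
    # Pure ASCII content is also valid UTF-8, so it is reported as UTF-8.
    for charset in ( 'utf-8', *FALLBACK_CHARSETS ):
        try:
            content.decode( charset )
        except UnicodeDecodeError:
            continue
        return charset
    # latin-1 maps every byte value to a character, so it always decodes.
    return 'latin-1'
```

A clean decode only shows that the bytes are valid in a charset, not that the charset is correct: many byte strings decode without error as cp1252, which is why statistical ranking matters in practice.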
Line Separator Processing:
```python
import detextive

content = 'Line 1\r\nLine 2\rLine 3\n'
separator = detextive.LineSeparators.detect_bytes( content.encode( ) )

# Normalize line separators to Python standard.
normalized = detextive.LineSeparators.normalize_universal( content )

# Convert to specific line separators.
native = detextive.LineSeparators.CRLF.nativize( normalized )
```
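The line separator operations shown above can be approximated in plain Python. This sketch assumes that detection reports the first separator encountered; detextive may weigh mixed content differently. `LineEnding`, `detect_line_ending`, `normalize_line_endings`, and `to_line_ending` are hypothetical names, chosen to avoid confusion with the library's `LineSeparators`.

```python
from __future__ import annotations

import enum


class LineEnding( enum.Enum ):
    ''' Line separators; values are the raw byte sequences. '''
    CR = b'\r'
    CRLF = b'\r\n'
    LF = b'\n'


def detect_line_ending( content: bytes ) -> LineEnding | None:
    ''' Reports the first line separator found in content (assumed heuristic). '''
    for index, byte in enumerate( content ):
        if byte == 0x0A:  # b'\n'
            return LineEnding.LF
        if byte == 0x0D:  # b'\r', possibly the start of b'\r\n'
            if content[ index + 1 : index + 2 ] == b'\n':
                return LineEnding.CRLF
            return LineEnding.CR
    return None


def normalize_line_endings( text: str ) -> str:
    ''' Converts CRLF and lone CR to LF, like Python's universal newlines. '''
    # Replace CRLF first so its CR is not turned into an extra LF.
    return text.replace( '\r\n', '\n' ).replace( '\r', '\n' )


def to_line_ending( text: str, ending: LineEnding ) -> str:
    ''' Converts LF-normalized text to the given separator. '''
    return text.replace( '\n', ending.value.decode( 'ascii' ) )
```

For the mixed sample above, `'Line 1\r\nLine 2\rLine 3\n'`, this sketch reports CRLF because it occurs first, while normalization converts all three endings to LF.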
Content Classification:
```python
import detextive

# Check if MIME type represents textual content
detextive.is_textual_mimetype( 'application/json' ) # True
detextive.is_textual_mimetype( 'image/jpeg' ) # False

# Validate text content from bytes
detextive.is_textual_content( b'Hello world!' ) # True
detextive.is_textual_content( b'\x00\x01\x02\x03' ) # False
```
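The control character and printability heuristics behind content classification can be approximated as follows. Everything here is an assumption for illustration: the list of textual `application/*` types, the `+json`/`+xml` suffix rule, and the 5% threshold for non-printable characters are hypothetical choices, and detextive's actual rules may differ.

```python
# Hypothetical set of non-text/* MIME types that still carry text.
TEXTUAL_APPLICATION_TYPES = frozenset( (
    'application/json',
    'application/xml',
    'application/javascript',
    'application/x-yaml',
) )

# Whitespace control characters that are normal in text.
ALLOWED_CONTROLS = frozenset( '\t\n\r\f\v' )


def looks_textual_mimetype( mimetype: str ) -> bool:
    ''' Treats text/*, +json/+xml suffixes, and a known list as textual. '''
    # Drop parameters such as '; charset=utf-8' before comparing.
    mimetype = mimetype.split( ';', 1 )[ 0 ].strip( ).lower( )
    if mimetype.startswith( 'text/' ):
        return True
    if mimetype.endswith( ( '+json', '+xml' ) ):
        return True
    return mimetype in TEXTUAL_APPLICATION_TYPES


def looks_textual_content(
    content: bytes, charset: str = 'utf-8', max_bad_ratio: float = 0.05
) -> bool:
    ''' Decodes content, then checks the share of non-printable characters. '''
    try:
        text = content.decode( charset )
    except UnicodeDecodeError:
        return False
    if not text:
        return True  # Treat empty content as trivially textual.
    # Count characters that are neither printable nor ordinary whitespace.
    bad = sum(
        1 for char in text
        if not char.isprintable( ) and char not in ALLOWED_CONTROLS )
    return bad / len( text ) <= max_bad_ratio
```

With these choices, `b'Hello world!'` passes and `b'\x00\x01\x02\x03'` fails, matching the outcomes listed in the example above.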
Contribution 🤝
Contributions to this project are welcome! However, they must follow the project's code of conduct.
Please file bug reports and feature requests in the issue tracker or submit pull requests to improve the source code or documentation.
For development guidance and standards, please see the development guide.
Download files
File details
Details for the file detextive-1.0.tar.gz.
File metadata
- Download URL: detextive-1.0.tar.gz
- Upload date:
- Size: 14.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 403b632f8b8b280e1d73e36edcd53c80f632751f2e39678f429f6881113ae9c7 |
| MD5 | cc237838a3cd5aa0b0db0b4ba681f36c |
| BLAKE2b-256 | 8bb07c7339e11a9df9501f3900d8ed648e59b596b032fe5106b6f94ce0062bbd |
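A downloaded file can be checked against the published SHA256 digest with the standard `hashlib` module. This is a minimal sketch; `verify_sha256` is a hypothetical helper, not something the package provides.

```python
import hashlib
from pathlib import Path


def verify_sha256( path: Path, expected: str ) -> bool:
    ''' Hashes the file in chunks and compares against the expected hex digest. '''
    digest = hashlib.sha256( )
    with path.open( 'rb' ) as file:
        # Read in 64 KiB chunks so large files need not fit in memory.
        for chunk in iter( lambda: file.read( 65536 ), b'' ):
            digest.update( chunk )
    return digest.hexdigest( ) == expected.strip( ).lower( )
```

For example, `verify_sha256( Path( 'detextive-1.0.tar.gz' ), '403b632f8b8b280e1d73e36edcd53c80f632751f2e39678f429f6881113ae9c7' )` should return `True` for an intact copy of the source distribution. pip can enforce the same check during installation through its `--require-hashes` mode.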
Provenance
The following attestation bundles were made for detextive-1.0.tar.gz:
Publisher: releaser.yaml on emcd/python-detextive
- Permalink: emcd/python-detextive@b5d36da60f5a7a9b64eb477a4041138417c55575
- Branch / Tag: refs/tags/v1.0
- Owner: https://github.com/emcd
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: releaser.yaml@b5d36da60f5a7a9b64eb477a4041138417c55575
- Trigger Event: push

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: detextive-1.0.tar.gz
- Subject digest: 403b632f8b8b280e1d73e36edcd53c80f632751f2e39678f429f6881113ae9c7
- Sigstore transparency entry: 385877552
- Sigstore integration time:
File details
Details for the file detextive-1.0-py3-none-any.whl.
File metadata
- Download URL: detextive-1.0-py3-none-any.whl
- Upload date:
- Size: 16.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | be6e177096daf51b29d5d981cd2d7925fe587cb587030621de5dde06382952fa |
| MD5 | 643c70343b1b11018cb1279b1688d9ea |
| BLAKE2b-256 | 22354467f23e25f3d566923748eae393c7ce16d1084d62b4508372d77ab86703 |
Provenance
The following attestation bundles were made for detextive-1.0-py3-none-any.whl:
Publisher: releaser.yaml on emcd/python-detextive
- Permalink: emcd/python-detextive@b5d36da60f5a7a9b64eb477a4041138417c55575
- Branch / Tag: refs/tags/v1.0
- Owner: https://github.com/emcd
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: releaser.yaml@b5d36da60f5a7a9b64eb477a4041138417c55575
- Trigger Event: push

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: detextive-1.0-py3-none-any.whl
- Subject digest: be6e177096daf51b29d5d981cd2d7925fe587cb587030621de5dde06382952fa
- Sigstore transparency entry: 385877580
- Sigstore integration time: