Implementation of the MIME Sniffing standard (https://mimesniff.spec.whatwg.org/)

xtractmime

xtractmime is a BSD-licensed Python 3.7+ implementation of the MIME Sniffing Standard.

Install from PyPI:

pip install xtractmime

Basic usage

Here are some simple examples of using xtractmime.extract_mime:

>>> from xtractmime import extract_mime
>>> extract_mime(b'Sample text content')
b'text/plain'
>>> extract_mime(b'', content_types=(b'text/html',))
b'text/html'

xtractmime.mimegroups also provides functions to check whether a MIME type belongs to a specific MIME type group:

>>> from xtractmime.mimegroups import is_html_mime_type, is_image_mime_type
>>> mime_type = b'text/html'
>>> is_html_mime_type(mime_type)
True
>>> is_image_mime_type(mime_type)
False

API Reference

function `xtractmime.extract_mime(*args, **kwargs) -> Optional[bytes]`

Parameters:

body: bytes
content_types: Optional[Tuple[bytes, ...]] = None
http_origin: bool = True
no_sniff: bool = False
extra_types: Optional[Tuple[Tuple[bytes, bytes, Optional[Set[bytes]], bytes], ...]] = None
supported_types: Optional[Set[bytes]] = None

Return the MIME type essence (e.g. text/html) matching the input data, or None if no match can be found.
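The essence of a MIME type is its type/subtype pair with parameters removed and letters lowercased. The sketch below shows this with a hypothetical helper, mime_essence, which is not part of xtractmime. It is a simplification: the full WHATWG MIME type parser also validates token characters and handles quoted parameter values.

```python
def mime_essence(mime_type: bytes) -> bytes:
    """Return a simplified essence: 'type/subtype', lowercased, parameters dropped.

    Hypothetical helper for illustration; not the full WHATWG MIME type parser.
    """
    # Drop parameters (everything after the first ';') and trim HTTP whitespace.
    essence = mime_type.split(b";", 1)[0].strip(b" \t\r\n")
    # Type and subtype are ASCII case-insensitive, so normalize to lowercase.
    return essence.lower()
```

For example, b'Text/HTML; charset=UTF-8' reduces to b'text/html', which has the same form as the values extract_mime returns.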

body is the byte sequence whose MIME type is to be determined. xtractmime only considers the first few bytes of body. The exact number of bytes read is defined by the xtractmime.RESOURCE_HEADER_BUFFER_LENGTH constant.
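Because only the start of the resource is inspected, large files do not need to be read in full. The following minimal sketch reads a bounded header from a binary stream.

It makes two assumptions:

* The limit is 1445 bytes, which is the resource header size used by the MIME Sniffing Standard. Check xtractmime.RESOURCE_HEADER_BUFFER_LENGTH for the value the library actually uses.
* read_resource_header is a hypothetical helper and is not part of xtractmime.

```python
from typing import BinaryIO

# Assumption: 1445 bytes, the MIME Sniffing Standard's resource header size.
# Prefer xtractmime.RESOURCE_HEADER_BUFFER_LENGTH in real code.
HEADER_LENGTH = 1445


def read_resource_header(stream: BinaryIO, limit: int = HEADER_LENGTH) -> bytes:
    """Read at most `limit` bytes from the start of a binary stream (hypothetical helper)."""
    chunks = []
    remaining = limit
    while remaining > 0:
        # Raw streams may return fewer bytes than requested, so keep reading.
        chunk = stream.read(remaining)
        if not chunk:  # end of stream reached
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

A typical use would be `with open(path, "rb") as f: body = read_resource_header(f)`, followed by passing body to extract_mime.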

content_types is a tuple of MIME types given in the resource metadata. For example, for resources retrieved via HTTP, pass the MIME types from the response's Content-Type header(s).
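One way to build content_types from HTTP response headers is sketched below. content_types_from_headers is a hypothetical helper. The sketch passes raw header values, including parameters such as charset, through unchanged. It assumes extract_mime accepts values in that form, so verify this against the library before relying on it.

```python
from typing import Iterable, Tuple


def content_types_from_headers(headers: Iterable[Tuple[str, str]]) -> Tuple[bytes, ...]:
    """Collect every Content-Type header value, in order, as bytes (hypothetical helper)."""
    return tuple(
        # Assumes values were decoded as ISO-8859-1 (as http.client does),
        # so encoding with latin-1 recovers the original header bytes.
        value.encode("latin-1")
        for name, value in headers
        if name.lower() == "content-type"  # header names are case-insensitive
    )
```

With http.client, for instance, `content_types_from_headers(response.getheaders())` produces a tuple suitable for the content_types argument.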

http_origin indicates if the resource has been retrieved via HTTP (True, default) or not (False).
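In the MIME Sniffing Standard, HTTP origin affects a few steps. One of them is the "check-for-apache-bug" flag. For HTTP resources, the spec sets this flag when the Content-Type value is exactly one of a few text/plain variants that some servers send by default. When the flag is set, the spec runs its text-or-binary rules instead of trusting the declared type.

The sketch below shows only that spec rule. It is not a description of xtractmime's internals, and check_for_apache_bug is a hypothetical name.

```python
# Content-Type values that, per the MIME Sniffing Standard, set the
# check-for-apache-bug flag for resources retrieved via HTTP.
APACHE_BUG_VALUES = frozenset({
    b"text/plain",
    b"text/plain; charset=ISO-8859-1",
    b"text/plain; charset=iso-8859-1",
    b"text/plain; charset=UTF-8",
})


def check_for_apache_bug(content_type: bytes, http_origin: bool) -> bool:
    """Return True if the spec's check-for-apache-bug flag would be set (hypothetical helper)."""
    # Only HTTP resources are affected, and the comparison is exact,
    # byte for byte: no case folding and no whitespace trimming.
    return http_origin and content_type in APACHE_BUG_VALUES
```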

no_sniff is True if the user agent does not wish to perform sniffing on the resource, and False (default) otherwise. Users may want to set it to True if the X-Content-Type-Options response header is set to nosniff. For more info, see here.
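The Fetch Standard's "determine nosniff" step decides this from the response headers. It combines all X-Content-Type-Options values into one comma-separated list and checks whether the first entry is an ASCII case-insensitive match for nosniff.

The sketch below is a simplified version of that step. determine_nosniff is a hypothetical helper, and unlike the spec it ignores quoted strings when splitting.

```python
from typing import Iterable, Tuple


def determine_nosniff(headers: Iterable[Tuple[str, str]]) -> bool:
    """Simplified take on Fetch's "determine nosniff" (hypothetical helper)."""
    values = [value for name, value in headers
              if name.lower() == "x-content-type-options"]
    if not values:
        return False
    # Multiple headers combine into one comma-separated list; only the first
    # entry counts, after trimming spaces and tabs.
    first = ", ".join(values).split(",")[0].strip(" \t")
    return first.lower() == "nosniff"
```

The result can be passed straight through, for example `extract_mime(body, no_sniff=determine_nosniff(headers))`.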

extra_types is a tuple of patterns for detecting additional MIME types. Each entry in the tuple must follow the format (Byte Pattern, Pattern Mask, Leading Bytes, MIME type):

Byte Pattern is a byte sequence to compare with the first bytes of the body (up to xtractmime.RESOURCE_HEADER_BUFFER_LENGTH bytes).
Pattern Mask is a byte sequence of the same length as Byte Pattern that indicates the significance of each Byte Pattern byte. b"\xff" means the byte must match exactly. b"\xdf" means the byte is matched in an ASCII case-insensitive way, so the corresponding Byte Pattern byte should be an uppercase ASCII letter. b"\x00" means the byte is not significant.
Leading Bytes is a set of bytes (such as whitespace) that are skipped at the start of the content before matching begins.
MIME type is the MIME type to return if the pattern matches.
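These four fields are the inputs of the MIME Sniffing Standard's pattern matching algorithm. The following minimal sketch implements that algorithm as the spec describes it, for illustration only. pattern_matches is a hypothetical name and not an xtractmime function.

The leading argument follows the Optional[Set[bytes]] type used above, meaning a set of one-byte bytes objects. The sketch also returns False if the input runs out after the leading bytes are skipped.

```python
from typing import Optional, Set


def pattern_matches(body: bytes, pattern: bytes, mask: bytes,
                    leading: Optional[Set[bytes]] = None) -> bool:
    """Sketch of the spec's pattern matching algorithm (hypothetical helper)."""
    if len(pattern) != len(mask):
        raise ValueError("pattern and mask must have the same length")
    if len(body) < len(pattern):
        return False
    ignored = leading or set()
    s = 0
    # Skip leading bytes that belong to the ignored set (e.g. whitespace).
    while s < len(body) and body[s:s + 1] in ignored:
        s += 1
    # Compare each masked input byte with the corresponding pattern byte.
    for p in range(len(pattern)):
        if s >= len(body):  # input ran out after skipping leading bytes
            return False
        if (body[s] & mask[p]) != pattern[p]:
            return False
        s += 1
    return True
```

With the mask b"\xff\xdf\xdf\xdf\xdf", the pattern b"<HTML" matches b"<html", b"<Html" and b"<HTML". Masking with 0xdf clears the ASCII lowercase bit of each input letter.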

Sample extra_types:

extra_types = ((b'test', b'\xff\xff\xff\xff', None, b'text/test'), ...)
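Before passing custom entries, it can help to check them against the format described above. The sketch below uses a hypothetical helper, validate_extra_type, which is not part of xtractmime. It enforces four rules:

* The pattern and mask must have equal lengths.
* The pattern must have no bits set outside its mask. A masked input byte can never have such bits, so the entry could never match.
* Leading bytes must be single bytes.
* The MIME type must contain a slash.

Whether xtractmime performs any of these checks itself is not stated here.

```python
from typing import Optional, Set, Tuple

# Entry layout as documented: (Byte Pattern, Pattern Mask, Leading Bytes, MIME type).
ExtraType = Tuple[bytes, bytes, Optional[Set[bytes]], bytes]


def validate_extra_type(entry: ExtraType) -> None:
    """Raise ValueError if an extra_types entry is malformed (hypothetical helper)."""
    pattern, mask, leading, mime_type = entry
    if len(pattern) != len(mask):
        raise ValueError(f"{mime_type!r}: pattern and mask lengths differ")
    for pattern_byte, mask_byte in zip(pattern, mask):
        # Bits outside the mask are always zero after masking the input,
        # so a pattern byte with such bits can never match.
        if pattern_byte & ~mask_byte & 0xFF:
            raise ValueError(f"{mime_type!r}: pattern has bits outside its mask")
    if leading is not None and any(len(b) != 1 for b in leading):
        raise ValueError(f"{mime_type!r}: leading bytes must be single bytes")
    if b"/" not in mime_type:
        raise ValueError(f"{mime_type!r}: not a type/subtype MIME type")
```

A lowercase letter paired with a b"\xdf" mask byte is rejected by the second rule, because masking with 0xdf clears exactly the bit that distinguishes lowercase letters.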

NOTE

Be careful when using the extra_types argument, as it may introduce privilege escalation vulnerabilities. For more info, see here.

The optional supported_types is a set of all MIME types supported by the user agent. If supported_types is not specified, all MIME types are assumed to be supported. Using this parameter can improve the performance of xtractmime.

function `xtractmime.is_binary_data(input_bytes: bytes) -> bool`

Return True if the provided byte sequence contains any binary data bytes, else False.
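The MIME Sniffing Standard defines a binary data byte as one of the following:

* a byte in the range 0x00 to 0x08
* 0x0B
* a byte in the range 0x0E to 0x1A
* a byte in the range 0x1C to 0x1F

Common text control bytes such as tab, line feed, form feed, carriage return and escape (0x1B) are therefore not binary data bytes. The sketch below is an equivalent check for illustration; contains_binary_data is hypothetical, and in practice xtractmime.is_binary_data is the function to call.

```python
# Binary data bytes per the MIME Sniffing Standard:
# 0x00-0x08, 0x0B, 0x0E-0x1A and 0x1C-0x1F.
BINARY_DATA_BYTES = frozenset(
    [*range(0x00, 0x09), 0x0B, *range(0x0E, 0x1B), *range(0x1C, 0x20)]
)


def contains_binary_data(data: bytes) -> bool:
    """Return True if any byte of `data` is a binary data byte (hypothetical helper)."""
    # Iterating over bytes yields ints, which are looked up in the set.
    return any(byte in BINARY_DATA_BYTES for byte in data)
```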

MIME type group functions

The following functions return True if a given MIME type belongs to a certain MIME type group, or False otherwise:

xtractmime.mimegroups.is_archive_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_audio_video_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_font_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_html_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_image_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_javascript_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_json_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_scriptable_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_xml_mime_type(mime_type: bytes) -> bool
xtractmime.mimegroups.is_zip_mime_type(mime_type: bytes) -> bool
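Several of these groups have short definitions in the MIME Sniffing Standard. Each of the three below matches either a structured-syntax suffix on the subtype or one of a few exact essences.

The following sketch restates the XML, JSON and ZIP-based definitions. It is not xtractmime's code: is_xml, is_json and is_zip are hypothetical names, and the input is assumed to be an already lowercased essence such as the values extract_mime returns.

```python
from typing import Tuple


def _split(essence: bytes) -> Tuple[bytes, bytes]:
    """Split a lowercase 'type/subtype' essence into its two parts."""
    type_, _, subtype = essence.partition(b"/")
    return type_, subtype


def is_xml(essence: bytes) -> bool:
    # Spec: subtype ends in "+xml", or essence is text/xml or application/xml.
    _, subtype = _split(essence)
    return subtype.endswith(b"+xml") or essence in (b"text/xml", b"application/xml")


def is_json(essence: bytes) -> bool:
    # Spec: subtype ends in "+json", or essence is application/json or text/json.
    _, subtype = _split(essence)
    return subtype.endswith(b"+json") or essence in (b"application/json", b"text/json")


def is_zip(essence: bytes) -> bool:
    # Spec: subtype ends in "+zip", or essence is application/zip.
    _, subtype = _split(essence)
    return subtype.endswith(b"+zip") or essence == b"application/zip"
```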

Example

>>> from xtractmime.mimegroups import is_html_mime_type, is_image_mime_type, is_zip_mime_type
>>> mime_type = b'text/html'
>>> is_html_mime_type(mime_type)
True
>>> is_image_mime_type(mime_type)
False
>>> is_zip_mime_type(mime_type)
False

Changelog

See the changelog

Release history

0.2.1 (this version): Jan 16, 2024
0.2.0: Aug 31, 2023
0.1.0: Jun 21, 2022

Download files

Source distribution: xtractmime-0.2.1.tar.gz (14.8 kB), uploaded Jan 16, 2024

Built distribution: xtractmime-0.2.1-py3-none-any.whl (10.4 kB), Python 3, uploaded Jan 16, 2024

Hashes for xtractmime-0.2.1.tar.gz

Algorithm	Hash digest
SHA256	`fc00b78b51edb113d6e25b2a81bf21a5f66b274e9a5270c5b72169b98c2997af`
MD5	`ce36eb16fd294058f156e6053f82b2d9`
BLAKE2b-256	`7a5cfecb22023edcfb766225aef2c8857e372997d76070401f3ab20d35c037bc`

Hashes for xtractmime-0.2.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`13d6cce82325dfa329ea2cb57c4d324838f68ef4125090bc6c9a393ad88cfcc3`
MD5	`1c6a3e2ee712acf7ad747ef2a54b17ab`
BLAKE2b-256	`678190382f87c60f57a22e6cedd5795b67909117af20cb4c3dd5adec87764288`
