
Pure Python implementation of magic file detection

Project description

puremagic is a pure Python module that identifies a file based on its magic numbers.


It is designed to be minimalistic and inherently cross-platform compatible. It is also designed to be a stand-in for python-magic: it incorporates the functions from_file(filename[, mime]) and from_string(string[, mime]). However, magic_file() and magic_string() are more powerful and will also display confidence and duplicate matches.

It does NOT try to match files based on non-magic strings. In other words, it will not search for a string within a certain window of bytes like others might.
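To make the distinction concrete, here is a minimal sketch contrasting a fixed-offset check with the sliding-window search that is deliberately avoided. It is not puremagic's internal code: the helper names matches_at and sliding_search are hypothetical, and the shifted sample data is made up for illustration (the PNG signature bytes themselves are the standard ones).

```python
# Hypothetical helpers, for illustration only (not puremagic's internals).

def matches_at(data, offset, signature):
    """True only if `signature` appears in `data` starting exactly at `offset`."""
    return data[offset:offset + len(signature)] == signature


def sliding_search(data, signature, window):
    """What a sliding-offset matcher does instead: accept the signature
    anywhere inside the first `window` bytes."""
    return signature in data[:window + len(signature)]


png_sig = b"\x89PNG\r\n\x1a\n"           # standard 8-byte PNG signature
shifted_png = b"JUNK" + png_sig + b"..."  # made-up data: signature pushed to offset 4

print(matches_at(shifted_png, 0, png_sig))       # False: not at the expected offset
print(sliding_search(shifted_png, png_sig, 16))  # True: found somewhere in the window
```

The fixed-offset check does a single slice comparison per signature, which is part of why this approach stays fast and avoids accidental matches deeper in the data.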

Advantages over using a wrapper for ‘file’ or ‘libmagic’:

  • Faster

  • Lightweight

  • Cross platform compatible

  • No dependencies

Disadvantages:

  • Does not have as many file types

  • No multilingual comments

  • Duplications due to small or reused magic numbers

(Help fix the first two disadvantages by contributing!)

Compatibility

  • Python 2.6+

  • Python 3.2+

  • PyPy

Continuous integration tests are run on the listed platforms with Travis CI.

Install

In either a virtualenv or globally, simply run:

$ python setup.py install

It has no dependencies other than argparse, which is built into Python 2.7+ and 3.2+.

Usage

“from_file” will return the most likely file extension. “magic_file” will give you every possible result it finds, as well as the confidence.

import puremagic

filename = "test/resources/images/test.gif"

ext = puremagic.from_file(filename)
# '.gif'

puremagic.magic_file(filename)
# [['.gif', 'image/gif', 'Graphics interchange format file (GIF87a)', 0.7],
#  ['.gif', '', 'GIF file', 0.5]]

“magic_file” returns every match, highest confidence first. Each match contains:

  • possible extension(s)

  • mime type

  • description

  • confidence (all headers must match perfectly to make the list; results are ordered by header length, so the longest, and therefore most precise, match comes first)
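A rough sketch of how such ranked results could be produced, assuming a tiny hypothetical signature table and a made-up scoring rule (0.1 per header byte, capped at 0.9). puremagic's real table and confidence values differ; only the idea of exact matching plus longest-header-first ordering is taken from the description above.

```python
# Hypothetical signature table: (offset, magic bytes, extension, mime, description).
SIGNATURES = [
    (0, b"GIF8", ".gif", "", "GIF file"),
    (0, b"GIF87a", ".gif", "image/gif", "Graphics interchange format file (GIF87a)"),
    (0, b"\x89PNG\r\n\x1a\n", ".png", "image/png", "PNG image"),
]


def ranked_matches(data):
    """Return [extension, mime, description, confidence] lists, best match first."""
    results = []
    for offset, magic, ext, mime, desc in SIGNATURES:
        # A header only makes the list if every byte matches exactly.
        if data[offset:offset + len(magic)] == magic:
            # Made-up scoring rule: 0.1 per matched byte, capped at 0.9,
            # so longer (more precise) headers score higher.
            confidence = round(min(0.1 * len(magic), 0.9), 1)
            results.append([ext, mime, desc, confidence])
    # Highest confidence, i.e. longest header, first.
    results.sort(key=lambda r: r[3], reverse=True)
    return results


print(ranked_matches(b"GIF87a\x01\x00\x01\x00"))
# [['.gif', 'image/gif', 'Graphics interchange format file (GIF87a)', 0.6],
#  ['.gif', '', 'GIF file', 0.4]]
```

In this sketch, ranked_matches(data)[0][0] gives the single most likely extension, which is analogous to how “from_file” relates to “magic_file” above.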

Script

Usage

$ python -m puremagic [options] filename <filename2>...

Examples

$ python -m puremagic test/resources/images/test.gif
'test/resources/images/test.gif' : .gif

$ python -m puremagic -m test/resources/images/test.gif test/resources/audio/test.mp3
'test/resources/images/test.gif' : image/gif
'test/resources/audio/test.mp3' : audio/mpeg
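For readers writing a similar wrapper script, a minimal argparse sketch of the command-line interface shown above might look like this. It is not puremagic's actual script: identify is a hypothetical callable standing in for the real detection, and mime is just the destination name chosen here for the -m flag.

```python
import argparse


def build_parser():
    """Parser mirroring the documented usage: [options] filename <filename2>..."""
    parser = argparse.ArgumentParser(prog="puremagic")
    # One or more files to identify.
    parser.add_argument("files", nargs="+", metavar="filename")
    # -m switches the output from extensions to MIME types, as in the examples.
    parser.add_argument("-m", dest="mime", action="store_true",
                        help="show MIME type instead of extension")
    return parser


def main(argv, identify):
    """Return one "'filename' : result" line per file.

    identify(filename, mime) is a hypothetical callable standing in for the
    real detection logic; it should return an extension or a MIME type.
    """
    args = build_parser().parse_args(argv)
    return ["'{0}' : {1}".format(name, identify(name, args.mime))
            for name in args.files]
```

For example, main(["-m", "test.gif"], identify) returns one formatted line per file, in the same 'filename' : result shape as the examples above.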

FAQ

The file type is actually X but it’s showing up as Y with higher confidence?

This can happen when the file’s signature happens to match a subset of a file standard. The subset signature is longer, and is therefore reported with greater confidence, because it contains both the base file type signature and the additional subset one.
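A small worked example of this effect, assuming a hypothetical two-entry table: a base ZIP signature and a longer Office Open XML signature that begins with the same bytes. The OOXML bytes follow the signature commonly listed in file signature tables, but treat them as an assumption here. An ordinary ZIP archive whose first eight bytes happen to match the longer signature is ranked as OOXML first.

```python
# Hypothetical two-entry table: (magic bytes, extension, description).
BASE_ZIP = (b"PK\x03\x04", ".zip", "ZIP archive")
# Longer "subset" signature that also starts with the ZIP bytes (assumed OOXML).
SUBSET_OOXML = (b"PK\x03\x04\x14\x00\x06\x00", ".docx", "Office Open XML document")


def candidates(header):
    """Return (extension, description) pairs, longest matching signature first."""
    hits = []
    for magic, ext, desc in (BASE_ZIP, SUBSET_OOXML):
        if header.startswith(magic):
            hits.append((len(magic), ext, desc))
    # Longer signature sorts first, mimicking "longest header = highest confidence".
    hits.sort(reverse=True)
    return [(ext, desc) for _, ext, desc in hits]


# A plain ZIP whose first bytes happen to equal the longer signature:
plain_zip_header = b"PK\x03\x04\x14\x00\x06\x00" + b"\x08\x00"
print(candidates(plain_zip_header))
# [('.docx', 'Office Open XML document'), ('.zip', 'ZIP archive')]
```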

You don’t have sliding offsets that could better detect plenty of common formats, why’s that?

It is a design choice that makes it a lot faster and more accurate. Without more intelligent or deeper identification beyond a sliding offset, I don’t feel comfortable including it as part of a ‘magic number’ library.

Your version isn’t as complete as I want it to be, where else should I look?

Look into Python modules that wrap libmagic, or use something like Apache Tika.

Acknowledgements

Gary C. Kessler

For use of his File Signature Tables, available at: http://www.garykessler.net/library/file_sigs.html

Freedesktop.org

For use of their shared-mime-info file (even if they do use XML, blea), available at: https://cgit.freedesktop.org/xdg/shared-mime-info/

License

MIT Licensed, see LICENSE. Copyright (c) 2013-2017 Chris Griffith

Download files

Source Distribution

puremagic-1.4.tar.gz (7.4 kB)

Built Distributions

puremagic-1.4-py3-none-any.whl (26.6 kB, Python 3)

puremagic-1.4-py2-none-any.whl (26.6 kB, Python 2)

File details

Details for the file puremagic-1.4.tar.gz.

File metadata

  • Download URL: puremagic-1.4.tar.gz
  • Size: 7.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for puremagic-1.4.tar.gz
Algorithm Hash digest
SHA256 f4021d2579fc147c580cbd12984adb0f9ceb044f9d61d0ddde466b256b62b5e8
MD5 29779727b4b34619023d0209587db17a
BLAKE2b-256 029733d2efcf5121ec52e0268aa1f0ff0dc26d8d4a659fcb988f514644c606f3
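To check a downloaded file against the published digest, a minimal sketch using Python's standard hashlib module could look like this. The local file name and download location are assumptions; the expected value is the SHA256 digest for puremagic-1.4.tar.gz from the table above.

```python
import hashlib

# SHA256 digest for puremagic-1.4.tar.gz, copied from the table above.
EXPECTED_SDIST_SHA256 = "f4021d2579fc147c580cbd12984adb0f9ceb044f9d61d0ddde466b256b62b5e8"


def sha256_of_chunks(chunks):
    """Feed an iterable of byte chunks into SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so it is never loaded into memory at once."""
    with open(path, "rb") as handle:
        return sha256_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def verify_download(path, expected_hex=EXPECTED_SDIST_SHA256):
    """Return True if the file's SHA256 digest matches the published one."""
    return sha256_of_file(path) == expected_hex.lower()
```

Calling verify_download("puremagic-1.4.tar.gz") returns True only if the local file's SHA256 digest matches the published value.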

File details

Details for the file puremagic-1.4-py3-none-any.whl.

File hashes

Hashes for puremagic-1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 bdf727747220e9c221ecd4d06267dab7acfd78cfacb3923cc2506096558ad454
MD5 d344f31a1d818d8eb9a9e29c5e5020af
BLAKE2b-256 19fe343eabd56fd290f6a8b3e32512f8cd368f49eaf7eae57d70a987bd19ff44

File details

Details for the file puremagic-1.4-py2-none-any.whl.

File hashes

Hashes for puremagic-1.4-py2-none-any.whl
Algorithm Hash digest
SHA256 e48a0b7f899136c50b4d4ba0014d79fe7c30ebbfb7d0794b22b2f3de69d37fb3
MD5 04a3aa306c73b9a2f0136f69dd37084a
BLAKE2b-256 ee99ec7954239b2c5f25cdd4c7ae97b19a8ed944148b48f3eaafb43c0fad89b1
