Skip to main content

A tool to determine the content type of a file with deep-learning

Project description

Magika Python package

Use Magika in your Python code!

Installing Magika

pip install magika

Using Magika as a command-line tool.

$ magika examples/*

code.asm: Assembly (code)
code.py: Python source (code)
doc.docx: Microsoft Word 2007+ document (document)
doc.ini: INI configuration file (text)
elf64.elf: ELF executable (executable)
flac.flac: FLAC audio bitstream data (audio)
image.bmp: BMP image data (image)
java.class: Java compiled bytecode (executable)
jpg.jpg: JPEG image data (image)
pdf.pdf: PDF document (document)
pe32.exe: PE executable (executable)
png.png: PNG image data (image)
README.md: Markdown document (text)
tar.tar: POSIX tar archive (archive)
webm.webm: WebM data (video)

Here's the full usage:

 magika --help
Usage: magika [OPTIONS] [FILE]...

  Magika - Determine type of FILEs with deep-learning.

Options:
  -r, --recursive                 When passing this option, magika scans every
                                  file within directories, instead of
                                  outputting "directory"
  --json                          Output in JSON format.
  --jsonl                         Output in JSONL format.
  -i, --mime-type                 Output the MIME type instead of a verbose
                                  content type description.
  -l, --label                     Output a simple label instead of a verbose
                                  content type description. Use --list-output-
                                  content-types for the list of supported
                                  output.
  -c, --compatibility-mode        Compatibility mode: output is as close as
                                  possible to `file` and colors are disabled.
  -s, --output-score              Output the prediction's score in addition to
                                  the content type.
  -m, --prediction-mode [best-guess|medium-confidence|high-confidence]
  --batch-size INTEGER            How many files to process in one batch.
  --no-dereference                This option causes symlinks not to be
                                  followed. By default, symlinks are
                                  dereferenced.
  --colors / --no-colors          Enable/disable use of colors.
  -v, --verbose                   Enable more verbose output.
  -vv, --debug                    Enable debug logging.
  --generate-report               Generate report useful when reporting
                                  feedback.
  --version                       Print the version and exit.
  --list-output-content-types     Show a list of supported content types.
  --model-dir DIRECTORY           Use a custom model.
  -h, --help                      Show this message and exit.

  Send any feedback to magika-dev@google.com or via GitHub issues.

Using Magika in Python

from magika import Magika
magika = Magika()
result = magika.identify_bytes(b"# Example\nThis is an example of markdown!")
print(result.output.ct_label)  # Output: "markdown"

Reporting false positives

Please open an issue on Github.

Citation

If you use this software for your research, please cite it as:

@software{magika,
author = {Fratantonio, Yanick and Bursztein, Elie and Invernizzi, Luca and Zhang, Marina and Metitieri, Giancarlo and Kurt, Thomas and Galilee, Francois and Petit-Bianco, Alexandre and Farah, Loua and Albertini, Ange},
title = {{Magika content-type scanner}},
url = {https://github.com/google/magika}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

magika-0.5.0.tar.gz (1.0 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

magika-0.5.0-py3-none-any.whl (1.0 MB view details)

Uploaded Python 3

File details

Details for the file magika-0.5.0.tar.gz.

File metadata

  • Download URL: magika-0.5.0.tar.gz
  • Upload date:
  • Size: 1.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.10.13 Linux/6.2.0-1019-azure

File hashes

Hashes for magika-0.5.0.tar.gz
Algorithm Hash digest
SHA256 afa0bb883086fe8dc912fc1e4066f8ba9afe2ee1ef98db64c2d76e6467a10ac7
MD5 9baea7f4a91594bae48f45d36dd1d1e3
BLAKE2b-256 63e41e3224d203084785067c4cb91f99080cd6f6639038ce2d3e142529792296

See more details on using hashes here.

File details

Details for the file magika-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: magika-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 1.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.10.13 Linux/6.2.0-1019-azure

File hashes

Hashes for magika-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f3e5bb6965cd8be11d57e9bef67aeefcab152af81ae602ab6f201871cd7b9290
MD5 64ac40e0c087dc468ca44868f084a943
BLAKE2b-256 9afdbf5a2d39592128c9c2f9ac1251aa379412ff99b6335e51a5decec73edcf6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page