A tool to determine the content type of a file using deep learning
Magika Python package
Use Magika in your Python code!
Installing Magika
pip install magika
Using Magika as a command-line tool
$ magika examples/*
code.asm: Assembly (code)
code.py: Python source (code)
doc.docx: Microsoft Word 2007+ document (document)
doc.ini: INI configuration file (text)
elf64.elf: ELF executable (executable)
flac.flac: FLAC audio bitstream data (audio)
image.bmp: BMP image data (image)
java.class: Java compiled bytecode (executable)
jpg.jpg: JPEG image data (image)
pdf.pdf: PDF document (document)
pe32.exe: PE executable (executable)
png.png: PNG image data (image)
README.md: Markdown document (text)
tar.tar: POSIX tar archive (archive)
webm.webm: WebM data (video)
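If you need to post-process the default output in a script, the lines above appear to follow a `path: description (group)` pattern. The following is a minimal sketch that parses that pattern; the names `MagikaLine` and `parse_line` are hypothetical helpers, not part of Magika, and the regular expression assumes the path itself contains no `": "` sequence.

```python
import re
from collections import Counter
from typing import NamedTuple


class MagikaLine(NamedTuple):
    """One parsed line of the default text output (hypothetical helper type)."""
    path: str
    description: str
    group: str


# Assumed layout, inferred from the sample output: "<path>: <description> (<group>)"
_LINE_RE = re.compile(r"^(?P<path>.+?): (?P<description>.+) \((?P<group>[^()]+)\)$")


def parse_line(line: str) -> MagikaLine:
    """Split one output line into path, description, and group."""
    match = _LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized output line: {line!r}")
    return MagikaLine(match["path"], match["description"], match["group"])


sample = [
    "code.py: Python source (code)",
    "doc.docx: Microsoft Word 2007+ document (document)",
    "README.md: Markdown document (text)",
]
parsed = [parse_line(line) for line in sample]

# Count how many files fall into each group.
group_counts = Counter(item.group for item in parsed)
```

For anything beyond quick scripts, the `--json` or `--jsonl` options are likely a more robust choice than parsing the human-readable text.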
Here's the full usage:
$ magika --help
Usage: magika [OPTIONS] [FILE]...

  Magika - Determine type of FILEs with deep-learning.

Options:
  -r, --recursive                 When passing this option, magika scans every
                                  file within directories, instead of
                                  outputting "directory"
  --json                          Output in JSON format.
  --jsonl                         Output in JSONL format.
  -i, --mime-type                 Output the MIME type instead of a verbose
                                  content type description.
  -l, --label                     Output a simple label instead of a verbose
                                  content type description. Use --list-output-
                                  content-types for the list of supported
                                  output.
  -c, --compatibility-mode        Compatibility mode: output is as close as
                                  possible to `file` and colors are disabled.
  -s, --output-score              Output the prediction's score in addition to
                                  the content type.
  -m, --prediction-mode [best-guess|medium-confidence|high-confidence]
  --batch-size INTEGER            How many files to process in one batch.
  --no-dereference                This option causes symlinks not to be
                                  followed. By default, symlinks are
                                  dereferenced.
  --colors / --no-colors          Enable/disable use of colors.
  -v, --verbose                   Enable more verbose output.
  -vv, --debug                    Enable debug logging.
  --generate-report               Generate report useful when reporting
                                  feedback.
  --version                       Print the version and exit.
  --list-output-content-types     Show a list of supported content types.
  --model-dir DIRECTORY           Use a custom model.
  -h, --help                      Show this message and exit.

  Send any feedback to magika-dev@google.com or via GitHub issues.
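To drive the command-line tool from a script, one option is to assemble the argument list from the flags documented in the help text and run it with `subprocess`. This is a sketch under assumptions: `build_magika_command` and `run_magika` are hypothetical helper names, only flags listed in the help output are used, and the structure of the JSON/JSONL output is not assumed here, so the output is returned as raw text.

```python
import subprocess
from typing import List, Optional, Sequence

# Values taken from the -m/--prediction-mode choices in the help text.
PREDICTION_MODES = ("best-guess", "medium-confidence", "high-confidence")


def build_magika_command(
    paths: Sequence[str],
    *,
    recursive: bool = False,
    output_format: str = "jsonl",
    prediction_mode: Optional[str] = None,
) -> List[str]:
    """Build an argv list for the magika CLI (hypothetical helper)."""
    argv = ["magika"]
    if recursive:
        argv.append("--recursive")
    if output_format in ("json", "jsonl"):
        argv.append(f"--{output_format}")
    elif output_format != "text":
        raise ValueError(f"unsupported output format: {output_format!r}")
    if prediction_mode is not None:
        if prediction_mode not in PREDICTION_MODES:
            raise ValueError(f"unknown prediction mode: {prediction_mode!r}")
        argv += ["--prediction-mode", prediction_mode]
    argv.extend(str(p) for p in paths)
    return argv


def run_magika(paths: Sequence[str], **options) -> str:
    """Run magika and return its standard output as text.

    Requires the `magika` executable to be on PATH; raises
    subprocess.CalledProcessError on a non-zero exit status.
    """
    completed = subprocess.run(
        build_magika_command(paths, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

For example, `run_magika(["examples"], recursive=True)` would invoke `magika --recursive --jsonl examples`. Passing the arguments as a list avoids shell quoting problems with unusual file names.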
Using Magika in Python
from magika import Magika
magika = Magika()
result = magika.identify_bytes(b"# Example\nThis is an example of markdown!")
print(result.output.ct_label) # Output: "markdown"
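When classifying several in-memory buffers, it may help to create the `Magika` instance once and reuse it. The sketch below relies only on `identify_bytes` and `result.output.ct_label` as shown above; `label_many` and `main` are hypothetical helper names, and the import is placed inside `main` so the helper can be exercised without the package installed.

```python
from typing import Iterable, List


def label_many(identifier, blobs: Iterable[bytes]) -> List[str]:
    """Return the content-type label for each bytes object.

    `identifier` is any object exposing identify_bytes(), such as a
    Magika instance.
    """
    return [identifier.identify_bytes(blob).output.ct_label for blob in blobs]


def main() -> None:
    # Imported lazily; requires `pip install magika`.
    from magika import Magika

    magika = Magika()  # create once, reuse for every buffer
    samples = [b"# Example\nThis is an example of markdown!"]
    for label in label_many(magika, samples):
        print(label)
```

Call `main()` to run the example against the real model.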
Reporting false positives
Please open an issue on GitHub.
Citation
If you use this software for your research, please cite it as:
@software{magika,
author = {Fratantonio, Yanick and Bursztein, Elie and Invernizzi, Luca and Zhang, Marina and Metitieri, Giancarlo and Kurt, Thomas and Galilee, Francois and Petit-Bianco, Alexandre and Farah, Loua and Albertini, Ange},
title = {{Magika content-type scanner}},
url = {https://github.com/google/magika}
}
Download files
Source Distribution
magika-0.5.0.tar.gz (1.0 MB)
Built Distribution
magika-0.5.0-py3-none-any.whl (1.0 MB)
File details
Details for the file magika-0.5.0.tar.gz.
File metadata
- Download URL: magika-0.5.0.tar.gz
- Upload date:
- Size: 1.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.10.13 Linux/6.2.0-1019-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | afa0bb883086fe8dc912fc1e4066f8ba9afe2ee1ef98db64c2d76e6467a10ac7 |
| MD5 | 9baea7f4a91594bae48f45d36dd1d1e3 |
| BLAKE2b-256 | 63e41e3224d203084785067c4cb91f99080cd6f6639038ce2d3e142529792296 |
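A downloaded archive can be checked against the published SHA256 digest with Python's standard `hashlib` module. This is a minimal sketch; `sha256_of` and `matches_sha256` are hypothetical helper names, and the local file path in the usage note is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, a check could look like `matches_sha256(Path("magika-0.5.0.tar.gz"), "afa0bb883086fe8dc912fc1e4066f8ba9afe2ee1ef98db64c2d76e6467a10ac7")`, assuming the archive sits in the current directory.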
File details
Details for the file magika-0.5.0-py3-none-any.whl.
File metadata
- Download URL: magika-0.5.0-py3-none-any.whl
- Upload date:
- Size: 1.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.10.13 Linux/6.2.0-1019-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f3e5bb6965cd8be11d57e9bef67aeefcab152af81ae602ab6f201871cd7b9290 |
| MD5 | 64ac40e0c087dc468ca44868f084a943 |
| BLAKE2b-256 | 9afdbf5a2d39592128c9c2f9ac1251aa379412ff99b6335e51a5decec73edcf6 |