Security-focused local file type detection powered by Google Magika
detect-file-type-local
An OpenClaw skill for AI-powered local file type detection.
Wraps Google Magika to provide ML-based file type identification that runs entirely offline. No API keys, no network calls — just local inference on an embedded ONNX model.
Features
- 214 file types detected by content, not extension
- Fully offline: no network access required
- Fast: reads only the bytes needed for classification
- Batch support: process multiple files or entire directories
- Multiple output formats: JSON, human-readable, or bare MIME type
- Security-focused triage: detect extension/content mismatches and suspicious polyglot content
- Stdin support: the default mode spools stdin to a temporary file and classifies it exactly like file-path input
Security Use Cases
- Catch extension masquerading (invoice.pdf.exe, report.xlsx.lnk) before execution or ingestion.
- Detect content/extension mismatch in upload and download pipelines.
- Flag suspicious polyglot payloads where one file can be parsed as multiple formats (for example PDF/ZIP or PDF/HTA-style delivery chains).
- Keep all analysis local for sensitive data workflows.
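The masquerading and mismatch checks above can be approximated on top of the default JSON output. The following Python sketch takes one result record (the "path" and "label" fields shown under Output Formats) and reports double extensions and extension/content mismatches. The extension-to-label table, the list of risky extensions, and the triage function are hypothetical and are not part of detect-file-type-local; the label strings are assumed to match what your installed Magika version emits.

```python
from pathlib import PurePath

# Hypothetical extension -> accepted Magika labels. The label strings are
# assumptions modeled on the "label" field in the JSON output (e.g. "jpeg").
EXPECTED_LABELS = {
    ".pdf": {"pdf"},
    ".jpg": {"jpeg"},
    ".jpeg": {"jpeg"},
    ".png": {"png"},
    ".zip": {"zip"},
    ".xlsx": {"xlsx"},
}

# Hypothetical list of extensions that are risky behind a decoy extension.
EXECUTABLE_EXTENSIONS = {".exe", ".lnk", ".scr", ".js", ".hta", ".bat"}


def triage(record: dict) -> list[str]:
    """Return findings for one JSON result record (hypothetical helper)."""
    findings = []
    suffixes = [s.lower() for s in PurePath(record["path"]).suffixes]

    # Double extension such as invoice.pdf.exe: a decoy suffix followed by
    # an executable one.
    if len(suffixes) >= 2 and suffixes[-1] in EXECUTABLE_EXTENSIONS:
        findings.append(f"double extension {''.join(suffixes[-2:])}")

    # Content/extension mismatch: the detected label is not one we accept
    # for the final extension. Unknown extensions are not judged.
    expected = EXPECTED_LABELS.get(suffixes[-1]) if suffixes else None
    if expected is not None and record["label"] not in expected:
        findings.append(f"extension {suffixes[-1]} but content is {record['label']}")

    return findings
```

For example, a record for report.pdf whose detected label is zip yields one mismatch finding, while photo.jpg labeled jpeg yields none.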
Related references:
- MITRE ATT&CK: Masquerading
- Proofpoint: Call It What You Want, Threat Actor Delivers Highly Targeted Multistage Polyglot
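As a rough illustration of the PDF/ZIP polyglot case mentioned above: a file that carries both a PDF header near its start and a ZIP end-of-central-directory record near its end can be opened by PDF readers and ZIP tools alike. This Python sketch checks only for those two signatures and reads just the head and tail of the file. It is a simple signature heuristic chosen for illustration, not the detection logic used by detect-file-type-local or Magika, and the function names are hypothetical.

```python
import os

# Many PDF readers accept the "%PDF-" header anywhere in the first 1024 bytes.
PDF_MAGIC = b"%PDF-"
PDF_HEAD_WINDOW = 1024

# A ZIP archive ends with an end-of-central-directory (EOCD) record: a 22-byte
# fixed part followed by an optional comment of up to 65535 bytes.
ZIP_EOCD_MAGIC = b"PK\x05\x06"
ZIP_TAIL_WINDOW = 22 + 65535


def looks_like_pdf_zip_polyglot(head: bytes, tail: bytes) -> bool:
    """Hypothetical heuristic: PDF header in the head AND ZIP EOCD in the tail."""
    has_pdf_header = PDF_MAGIC in head[:PDF_HEAD_WINDOW]
    has_zip_trailer = ZIP_EOCD_MAGIC in tail[-ZIP_TAIL_WINDOW:]
    return has_pdf_header and has_zip_trailer


def check_file(path: str) -> bool:
    """Read only the head and tail windows instead of the whole file."""
    with open(path, "rb") as f:
        head = f.read(PDF_HEAD_WINDOW)
        size = f.seek(0, os.SEEK_END)  # seek returns the new absolute offset
        f.seek(max(0, size - ZIP_TAIL_WINDOW))
        tail = f.read(ZIP_TAIL_WINDOW)
    return looks_like_pdf_zip_polyglot(head, tail)
```

A real triage step would treat a positive result as a reason for closer inspection rather than as a verdict, since some legitimate files embed archives.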
Quick Start
```shell
pip install detect-file-type-local

# Detect a single file
detect-file-type-local document.pdf

# Batch detect
detect-file-type-local --human *.pdf *.png

# Recursive directory scan
detect-file-type-local -r ./uploads/

# Pipe from stdin
cat mystery_file | detect-file-type-local -

# Stdin fast path (best effort): read only the first 1 MB
cat mystery_file | detect-file-type-local --stdin-mode head --stdin-max-bytes 1048576 -
```
Compatibility alias: detect-file-type remains available.
Output Formats
JSON (default):
```json
{
  "path": "photo.jpg",
  "label": "jpeg",
  "mime_type": "image/jpeg",
  "score": 0.99,
  "group": "image",
  "description": "JPEG image",
  "is_text": false
}
```
Human-readable:
photo.jpg: JPEG image (image/jpeg) [score: 0.99]
MIME-only:
image/jpeg
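The three output formats carry the same information. Here is a minimal Python sketch of turning the default JSON record into the human-readable and MIME-only lines, with the line layout inferred from the examples above; the tool's exact formatting (for instance, score rounding) may differ. The README does not say how results for several files are framed, so iter_records assumes a single object, a JSON array, or one object per line. All function names are hypothetical.

```python
import json
from typing import Iterator


def iter_records(text: str) -> Iterator[dict]:
    """Yield records from one object, a JSON array, or JSON lines.

    The multi-file framing is an assumption; the README documents only
    the single-object form.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document: fall back to one object per line.
        for line in text.splitlines():
            if line.strip():
                yield json.loads(line)
        return
    if isinstance(parsed, list):
        yield from parsed
    else:
        yield parsed


def to_human(record: dict) -> str:
    """Format like: photo.jpg: JPEG image (image/jpeg) [score: 0.99]"""
    return (
        f"{record['path']}: {record['description']} "
        f"({record['mime_type']}) [score: {record['score']:.2f}]"
    )


def to_mime(record: dict) -> str:
    """MIME-only output is just the mime_type field."""
    return record["mime_type"]
```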
OpenClaw Skill
See SKILL.md for the OpenClaw skill definition, including structured output schemas and usage guidance for LLM integration.
The OpenClaw skill metadata now auto-installs the detect-file-type-local package from PyPI.
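Stdin note: the default --stdin-mode spool writes stdin to a temporary file and runs Magika's path-based detection on it, so features taken from both the beginning and the end of the content are handled the same way as for normal file input. --stdin-mode head trades that consistency for speed: it classifies only the first --stdin-max-bytes bytes, on a best-effort basis.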
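The difference between the two stdin modes can be sketched in Python. Spooling copies the entire stream to a temporary file so that a path-based detector can sample both the beginning and the end of the content; head mode keeps only the first max_bytes bytes, so anything past that cut-off is invisible to the classifier. This illustrates the tradeoff under those assumptions and is not the package's implementation; spool_to_tempfile, read_head, and CHUNK_SIZE are hypothetical names.

```python
import shutil
import tempfile
from typing import BinaryIO

# Hypothetical copy/read chunk size.
CHUNK_SIZE = 64 * 1024


def spool_to_tempfile(stream: BinaryIO) -> str:
    """Spool mode (sketch): copy the whole stream to a named temp file.

    The caller must delete the file after classification.
    """
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        shutil.copyfileobj(stream, tmp, CHUNK_SIZE)
        return tmp.name


def read_head(stream: BinaryIO, max_bytes: int) -> bytes:
    """Head mode (sketch): read at most max_bytes bytes.

    Loops because a pipe may return fewer bytes than requested per read.
    """
    chunks = []
    remaining = max_bytes
    while remaining > 0:
        chunk = stream.read(min(CHUNK_SIZE, remaining))
        if not chunk:  # EOF before max_bytes
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

In a command-line tool the stream would be sys.stdin.buffer. Spooling costs a full copy to disk but gives the detector the same view as a normal file; head mode avoids that copy at the price of never seeing the end of the content.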
Development
```shell
pip install -e '.[dev]'
pytest tests/ -v
ruff check .
```
Release
PyPI publishing is automated via GitHub Actions (Publish Python Package workflow):
- Create a GitHub release with a tag matching the package version (for example, v0.1.0)
- The workflow builds and validates the artifacts
- The workflow publishes to PyPI via trusted publishing
After the PyPI release, update and republish the ClawHub skill metadata so that the skill auto-installs from detect-file-type-local.
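The first release step requires the tag to match the package version. A pre-release check along these lines could catch a mismatch before publishing; it assumes tags take the form v<version>, as in the v0.1.0 example, and also accepts full refs such as refs/tags/v0.1.2. The function is hypothetical and is not part of the repository's workflow.

```python
def tag_matches_version(tag: str, version: str) -> bool:
    """Hypothetical check: does a tag such as v0.1.2 match version 0.1.2?

    Assumes the v<version> tag convention from the release steps.
    """
    # GitHub Actions exposes full refs (refs/tags/v0.1.2); strip that prefix.
    tag = tag.strip().removeprefix("refs/tags/")
    return tag == f"v{version}"
```

In a workflow step, the tag could come from the GITHUB_REF environment variable that GitHub Actions sets, and the version from the package metadata.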
License
MIT — see LICENSE.
This project uses Google Magika (Apache-2.0). See NOTICE and THIRD_PARTY_LICENSES.md.
Download files
File details
Details for the file detect_file_type_local-0.1.2.tar.gz.
File metadata
- Download URL: detect_file_type_local-0.1.2.tar.gz
- Upload date:
- Size: 9.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4854a1e97962a5144c200591f6751978a3021c68439578e18fd5c0fe1707a95b |
| MD5 | ac3b2d1eaa2ab35f2bc4df207049708b |
| BLAKE2b-256 | 9dbc046afdf6d22dbf6c5edae90f4008c31424edfc983ff75d715bf47f317649 |
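To check a downloaded distribution against the published SHA256 digest, hashing the file in chunks with the standard hashlib module is enough. A minimal Python sketch; the helper names are hypothetical, and the default expected value is the SHA256 digest listed above for detect_file_type_local-0.1.2.tar.gz.

```python
import hashlib

# SHA256 digest published for detect_file_type_local-0.1.2.tar.gz.
SDIST_SHA256 = "4854a1e97962a5144c200591f6751978a3021c68439578e18fd5c0fe1707a95b"


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file 1 MiB at a time so it never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_hex: str = SDIST_SHA256) -> bool:
    """Hypothetical helper: compare against a published hex digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

pip can enforce the same check at install time: adding a --hash=sha256:<digest> option to a requirement line switches pip into hash-checking mode.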
Provenance
The following attestation bundles were made for detect_file_type_local-0.1.2.tar.gz:
- Publisher: pypi-publish.yml on pgeraghty/openclaw-detect-file-type-local
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: detect_file_type_local-0.1.2.tar.gz
- Subject digest: 4854a1e97962a5144c200591f6751978a3021c68439578e18fd5c0fe1707a95b
- Sigstore transparency entry: 997361835
- Sigstore integration time:
- Permalink: pgeraghty/openclaw-detect-file-type-local@84481808eeb992117f77f935cb6cbf9287c38975
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/pgeraghty
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@84481808eeb992117f77f935cb6cbf9287c38975
- Trigger Event: push
File details
Details for the file detect_file_type_local-0.1.2-py3-none-any.whl.
File metadata
- Download URL: detect_file_type_local-0.1.2-py3-none-any.whl
- Upload date:
- Size: 8.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6fed207794e462086ab4af28bb075edf4cfcce762c8e9a147b42db598e05d730 |
| MD5 | db3a1d71cdea5d4e92b47a610cbdc52c |
| BLAKE2b-256 | caa3ff36eae2fe635c05f99cf67ae2fb2127f6b2b39f6f739f253fc59230bc0d |
Provenance
The following attestation bundles were made for detect_file_type_local-0.1.2-py3-none-any.whl:
- Publisher: pypi-publish.yml on pgeraghty/openclaw-detect-file-type-local
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: detect_file_type_local-0.1.2-py3-none-any.whl
- Subject digest: 6fed207794e462086ab4af28bb075edf4cfcce762c8e9a147b42db598e05d730
- Sigstore transparency entry: 997361838
- Sigstore integration time:
- Permalink: pgeraghty/openclaw-detect-file-type-local@84481808eeb992117f77f935cb6cbf9287c38975
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/pgeraghty
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@84481808eeb992117f77f935cb6cbf9287c38975
- Trigger Event: push