
Security-focused local file type detection powered by Google Magika


detect-file-type-local

License: MIT | Python 3.8+ | Inference: Local/Offline

An OpenClaw skill for AI-powered local file type detection.

Wraps Google Magika to provide ML-based file type identification that runs entirely offline. No API keys, no network calls — just local inference on an embedded ONNX model.

Features

  • 214 file types detected by content, not extension
  • Fully offline — no network access required
  • Fast — only reads the bytes needed for classification
  • Batch support — process multiple files or entire directories
  • Multiple output formats — JSON, human-readable, bare MIME type
  • Security-focused triage — detect extension/content mismatch and suspicious polyglot content
  • Stdin support — default mode spools and classifies like file-path mode

Security Use Cases

  • Catch extension masquerading (invoice.pdf.exe, report.xlsx.lnk) before execution or ingestion.
  • Detect content/extension mismatch in upload and download pipelines.
  • Flag suspicious polyglot payloads where one file can be parsed as multiple formats (for example PDF/ZIP or PDF/HTA-style delivery chains).
  • Keep all analysis local for sensitive data workflows.
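The first two checks above can be sketched with the Python standard library alone. This is a minimal illustration, not the package's own logic: `has_masquerading_name`, `extension_mismatch`, and `RISKY_FINAL_EXTENSIONS` are hypothetical names, and `detected_mime` is assumed to come from the tool's `mime_type` output.

```python
import mimetypes
from pathlib import PurePath

# Illustrative, non-exhaustive list of final extensions that often carry executable content.
RISKY_FINAL_EXTENSIONS = {".exe", ".lnk", ".hta", ".scr", ".js"}


def has_masquerading_name(filename: str) -> bool:
    """True when the name has two or more suffixes and the last one is on the risky list."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    return len(suffixes) >= 2 and suffixes[-1] in RISKY_FINAL_EXTENSIONS


def extension_mismatch(filename: str, detected_mime: str) -> bool:
    """True when the extension implies a MIME type that differs from the detected one."""
    expected, _ = mimetypes.guess_type(filename)
    # Unknown or missing extensions give no expectation, so they are not flagged here.
    return expected is not None and expected != detected_mime
```

For example, `has_masquerading_name("invoice.pdf.exe")` is true, and `extension_mismatch("report.pdf", "application/zip")` is true because the `.pdf` extension implies `application/pdf`.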

Quick Start

pip install detect-file-type-local

# Detect a single file
detect-file-type-local document.pdf

# Batch detect
detect-file-type-local --human *.pdf *.png

# Recursive directory scan
detect-file-type-local -r ./uploads/

# Pipe from stdin
cat mystery_file | detect-file-type-local -

# Stdin fast path (best effort): read only first 1 MB
cat mystery_file | detect-file-type-local --stdin-mode head --stdin-max-bytes 1048576 -

Compatibility alias: detect-file-type remains available.
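The commands above can also be driven from Python. The sketch below only assembles flags that appear in this section (`--human`, `-r`) and returns raw stdout. `build_command` and `run_detection` are hypothetical helper names, and the use of `check=True` assumes the CLI exits non-zero on failure, which this README does not state.

```python
import subprocess
from typing import List, Sequence


def build_command(paths: Sequence[str], recursive: bool = False, human: bool = False) -> List[str]:
    """Assemble the argument list for the detect-file-type-local CLI."""
    cmd = ["detect-file-type-local"]
    if human:
        cmd.append("--human")
    if recursive:
        cmd.append("-r")
    cmd.extend(paths)
    return cmd


def run_detection(paths: Sequence[str], recursive: bool = False, human: bool = False) -> str:
    """Run the CLI (it must be installed and on PATH) and return its stdout unparsed."""
    completed = subprocess.run(
        build_command(paths, recursive=recursive, human=human),
        capture_output=True,
        text=True,
        check=True,  # assumption: failures are reported through a non-zero exit code
    )
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems with unusual file names, which matters when scanning untrusted uploads.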

Output Formats

JSON (default):

{
  "path": "photo.jpg",
  "label": "jpeg",
  "mime_type": "image/jpeg",
  "score": 0.99,
  "group": "image",
  "description": "JPEG image",
  "is_text": false
}
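A downstream consumer might gate on the `score` and `mime_type` fields of this JSON. The following is a minimal sketch: `triage`, `MIN_SCORE`, and the three verdict strings are hypothetical and not part of the tool, and the threshold is an arbitrary example value.

```python
import json

MIN_SCORE = 0.9  # hypothetical confidence threshold; tune for your pipeline


def triage(raw_json: str, allowed_mime_types: set) -> str:
    """Return 'accept', 'reject', or 'review' for one JSON detection result."""
    result = json.loads(raw_json)
    if result["score"] < MIN_SCORE:
        return "review"  # low confidence: leave the decision to a human
    if result["mime_type"] not in allowed_mime_types:
        return "reject"
    return "accept"


sample = (
    '{"path": "photo.jpg", "label": "jpeg", "mime_type": "image/jpeg", '
    '"score": 0.99, "group": "image", "description": "JPEG image", "is_text": false}'
)
print(triage(sample, {"image/jpeg", "image/png"}))  # prints "accept"
```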

Human-readable:

photo.jpg: JPEG image (image/jpeg) [score: 0.99]

MIME-only:

image/jpeg

OpenClaw Skill

See SKILL.md for the OpenClaw skill definition, including structured output schemas and usage guidance for LLM integration.

The OpenClaw skill metadata now auto-installs the PyPI package detect-file-type-local.

Stdin note: default --stdin-mode spool writes stdin to a temporary file and uses Magika path-based detection so begin/end file features are handled consistently with normal file input. --stdin-mode head is available as an explicit speed tradeoff.
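The difference between the two stdin modes can be illustrated with a short sketch. It is not the package's implementation: `spool_to_tempfile` and `read_head` are hypothetical names, and the 1 MB default simply mirrors the `--stdin-max-bytes 1048576` example from the Quick Start.

```python
import io
import os
import shutil
import tempfile
from typing import BinaryIO


def spool_to_tempfile(stream: BinaryIO) -> str:
    """Copy the whole stream to a temporary file and return its path.

    The full content lands on disk, so a path-based detector can inspect
    both the beginning and the end of the data.
    """
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        shutil.copyfileobj(stream, tmp)
        return tmp.name


def read_head(stream: BinaryIO, max_bytes: int = 1048576) -> bytes:
    """Read at most max_bytes from the start of the stream; the tail is never seen."""
    return stream.read(max_bytes)


data = b"%PDF-1.7" + b"\x00" * 100 + b"%%EOF"

path = spool_to_tempfile(io.BytesIO(data))
spooled_size = os.path.getsize(path)  # equals len(data): nothing was dropped
os.remove(path)  # the caller is responsible for cleaning up the temporary file

head = read_head(io.BytesIO(data), max_bytes=8)  # only the first 8 bytes are kept
```

Spooling costs a full copy of the input but preserves end-of-file features. The head approach is cheaper but can miss trailing content, which is why it is described above as an explicit speed tradeoff.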

Development

pip install -e '.[dev]'
pytest tests/ -v
ruff check .

Release

PyPI publishing is automated via GitHub Actions (Publish Python Package workflow):

  1. Create a GitHub release with a tag matching the package version (for example, v0.1.0)
  2. Workflow builds and validates artifacts
  3. Workflow publishes to PyPI via trusted publishing

After PyPI release, update and republish the ClawHub skill metadata to enable auto-install from detect-file-type-local.

License

MIT — see LICENSE.

This project uses Google Magika (Apache-2.0). See NOTICE and THIRD_PARTY_LICENSES.md.
