Security-focused local file type detection powered by Google Magika

Detect File Type - Local

License: MIT | Python 3.8+ | Inference: Local/Offline

An OpenClaw skill for AI-powered local file type detection.

Wraps Google Magika to provide ML-based file type identification that runs entirely offline. No API keys, no network calls — just local inference on an embedded ONNX model.

Features

  • 214 file types detected by content, not extension
  • Fully offline — no network access required
  • Fast — only reads the bytes needed for classification
  • Batch support — process multiple files or entire directories
  • Multiple output formats — JSON, human-readable, bare MIME type
  • Security-focused triage — detect extension/content mismatch and suspicious polyglot content
  • Stdin support: by default, stdin is spooled to a temporary file and classified the same way as a file path

Security Use Cases

  • Catch extension masquerading (invoice.pdf.exe, report.xlsx.lnk) before execution or ingestion.
  • Detect content/extension mismatch in upload and download pipelines.
  • Flag suspicious polyglot payloads where one file can be parsed as multiple formats (for example PDF/ZIP or PDF/HTA-style delivery chains).
  • Keep all analysis local for sensitive data workflows.
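The extension/content mismatch check described above can be sketched in a few lines of Python. This is an illustration, not the package's own logic: the function name extension_mismatch is hypothetical, and it assumes you already have the detected MIME type (for example from the mime_type field of the JSON output) and compare it against what Python's built-in extension table expects.

```python
import mimetypes
from pathlib import Path
from typing import Optional

# Use a private MimeTypes instance so only Python's built-in table is
# consulted, not system mime.types files loaded into the global tables.
_EXT_TABLE = mimetypes.MimeTypes()


def extension_mismatch(path: str, detected_mime: str) -> Optional[str]:
    """Return a warning if the extension disagrees with the detected MIME type.

    Hypothetical helper: `detected_mime` is assumed to come from a content-based
    detector, such as the `mime_type` field of the tool's JSON output.
    """
    expected, _encoding = _EXT_TABLE.guess_type(path)
    if expected is None:
        return None  # unknown or missing extension: nothing to compare against
    if expected == detected_mime:
        return None  # extension and content agree
    name = Path(path).name
    return f"{name}: extension suggests {expected}, content looks like {detected_mime}"


if __name__ == "__main__":
    print(extension_mismatch("photo.jpg", "image/jpeg"))
    print(extension_mismatch("report.pdf", "application/zip"))
```

Exact string comparison is deliberately crude: some formats have several MIME aliases, so a real pipeline would likely normalise types or compare on a coarser key before raising an alert.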

Quick Start

pip install detect-file-type-local

# Detect a single file
detect_file_type document.pdf

# Batch detect
detect_file_type --human *.pdf *.png

# Recursive directory scan
detect_file_type -r ./uploads/

# Pipe from stdin
cat mystery_file | detect_file_type -

# Stdin fast path (best effort): read only first 1 MB
cat mystery_file | detect_file_type --stdin-mode head --stdin-max-bytes 1048576 -

Output Formats

JSON (default):

{
  "path": "photo.jpg",
  "label": "jpeg",
  "mime_type": "image/jpeg",
  "score": 0.99,
  "group": "image",
  "description": "JPEG image",
  "is_text": false
}
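A minimal sketch of consuming this record from Python, assuming a single JSON object with exactly the fields shown above. The Detection class, parse_detection, is_confident, and the 0.9 threshold are all hypothetical names and values chosen for illustration; how batch output is framed (array or one object per line) is not covered here.

```python
import json
from dataclasses import dataclass, fields

# Sample record copied from the JSON output format shown above.
SAMPLE = """
{
  "path": "photo.jpg",
  "label": "jpeg",
  "mime_type": "image/jpeg",
  "score": 0.99,
  "group": "image",
  "description": "JPEG image",
  "is_text": false
}
"""


@dataclass(frozen=True)
class Detection:
    """Typed view of one detection record (hypothetical helper type)."""
    path: str
    label: str
    mime_type: str
    score: float
    group: str
    description: str
    is_text: bool


def parse_detection(raw: str) -> Detection:
    """Parse one JSON object, ignoring any keys this sketch does not know."""
    data = json.loads(raw)
    known = {f.name for f in fields(Detection)}
    return Detection(**{k: v for k, v in data.items() if k in known})


def is_confident(detection: Detection, threshold: float = 0.9) -> bool:
    """Threshold is an assumed value; tune it for your own pipeline."""
    return detection.score >= threshold


if __name__ == "__main__":
    d = parse_detection(SAMPLE)
    print(d.label, d.mime_type, is_confident(d))
```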

Human-readable:

photo.jpg: JPEG image (image/jpeg) [score: 0.99]

MIME-only:

image/jpeg

OpenClaw Skill

See SKILL.md for the OpenClaw skill definition, including structured output schemas and usage guidance for LLM integration.

The OpenClaw skill metadata now auto-installs the detect-file-type-local package from PyPI.

Stdin note: the default, --stdin-mode spool, writes stdin to a temporary file and uses Magika's path-based detection, so features taken from the beginning and end of the file are handled the same way as for normal file input. --stdin-mode head is available as an explicit speed tradeoff: it classifies only the leading bytes of the stream (up to --stdin-max-bytes).
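The difference between the two stdin strategies can be illustrated with a short sketch. This is not the package's implementation; spool_to_tempfile and read_head are hypothetical helpers that only show the tradeoff: spooling keeps the whole stream (so the end of the file is available), while head stops after a fixed byte budget.

```python
import io
import os
import shutil
import tempfile
from typing import BinaryIO


def spool_to_tempfile(stream: BinaryIO) -> str:
    """Copy the whole stream to a temporary file and return its path.

    The caller is responsible for deleting the file afterwards.
    """
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        shutil.copyfileobj(stream, tmp)
        return tmp.name


def read_head(stream: BinaryIO, max_bytes: int = 1048576) -> bytes:
    """Read at most max_bytes from the stream, tolerating short reads."""
    chunks = []
    remaining = max_bytes
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            break  # end of stream reached before the budget was used up
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    data = b"HEAD" + b"\x00" * 100 + b"TAIL"
    path = spool_to_tempfile(io.BytesIO(data))
    try:
        print("spooled bytes:", os.path.getsize(path))
    finally:
        os.remove(path)
    print("head bytes:", len(read_head(io.BytesIO(data), max_bytes=16)))
```

With the head strategy, anything past the byte budget (here the trailing bytes) never reaches the classifier, which is why it is described as best effort.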

Development

pip install -e '.[dev]'
pytest tests/ -v
ruff check .

Release

PyPI publishing is automated via GitHub Actions (Publish Python Package workflow):

  1. Create a GitHub release with a tag matching the package version (for example, v0.2.0)
  2. Workflow builds and validates artifacts
  3. Workflow publishes to PyPI via trusted publishing
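A tag/version consistency check of the kind step 1 relies on could look like the sketch below. It assumes the convention implied by the example (the tag is the package version prefixed with "v"); the function name tag_matches_version is hypothetical and this is not taken from the actual workflow.

```python
def tag_matches_version(tag: str, version: str) -> bool:
    """Return True if the release tag is the package version prefixed with 'v'.

    Assumed convention, based on the example tag v0.2.0 for version 0.2.0.
    """
    return tag == f"v{version}"


if __name__ == "__main__":
    print(tag_matches_version("v0.2.0", "0.2.0"))
    print(tag_matches_version("0.2.0", "0.2.0"))
```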

After the PyPI release, update and republish the ClawHub skill metadata to enable auto-install from detect-file-type-local.

License

MIT — see LICENSE.

This project uses Google Magika (Apache-2.0). See NOTICE and THIRD_PARTY_LICENSES.md.
