
Detects files in a directory and its subdirectories.


FileDetect


Installation

pip install filedetect

Description

filedetect is a Python package that provides a simple way to detect file formats in a directory tree. It relies on the pathlib library to traverse the tree and to separate each file's suffix from the rest of its name. You can specify the formats to look for, the suffixes to detect, and the maximum depth of detection.
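As a small illustration of what pathlib offers for separating a suffix from a file name, here is a sketch that uses only the standard library. The helper name `split_name` is hypothetical and is not part of filedetect; it only shows the pathlib behavior the description refers to, including how a compound ending such as `.tar.gz` is reported.

```python
from pathlib import PurePath


def split_name(filename: str) -> tuple[str, str]:
    """Return (stem, suffix) exactly as pathlib reports them."""
    p = PurePath(filename)
    return p.stem, p.suffix


print(split_name("holiday.mp4"))            # ('holiday', '.mp4')
print(split_name("backup.tar.gz"))          # ('backup.tar', '.gz')
# .suffixes keeps every dotted part, which is useful for compound endings
print(PurePath("backup.tar.gz").suffixes)   # ['.tar', '.gz']
```

Note that `.suffix` only returns the last dotted part, so matching an ending like `.tar.gz` requires looking at `.suffixes` or at the end of the full name.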

For convenience, common suffixes are predefined for the following formats:

  • video => {".mp4", ".mkv", ".avi", ".mov", ".wmv", ".flv", ".webm", ".m4v", ".mpeg", ".mpg", ".3gp", ".3g2", ".vob", ".ts", ".m2ts", ".rmvb", ".mxf", ".drc", ".amv", ".f4v", ".svi", ".m1v", ".m2v"}
  • audio => {".mp3", ".wav", ".flac", ".aac", ".ogg", ".wma", ".m4a", ".opus", ".alac", ".aiff"}
  • image => {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".webp", ".svg", ".heif", ".raw"}
  • plain_text => {".txt"}
  • csv => {".csv"}
  • json => {".json"}
  • html => {".html", ".htm"}
  • xml => {".xml"}
  • excel => {".xls", ".xlsx", ".odf", ".ods"}
  • pdf => {".pdf"}
  • doc => {".doc", ".docx", ".odt"}
  • ppt => {".ppt", ".pptx", ".odp"}
  • archive => {".zip", ".tar", ".gz", ".bz2", ".xz", ".7z", ".rar", ".tar.gz", ".tar.bz2", ".tar.xz"}
  • text => {".txt", ".csv", ".json", ".xml", ".html", ".htm", ".py", ".java", ".c", ".cpp", ".h", ".js", ".css", ".md", ".rst", ".tex", ".sql", ".yaml", ".yml", ".toml", ".ini", ".properties", ".log"}

You can also detect your own suffixes by passing a set of suffixes to the suffixes parameter instead of using the format parameter. You are welcome to suggest new formats, or updates to existing ones, by opening a pull request or an issue on the GitHub repository!
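To make the idea of matching against a suffix set concrete, here is a minimal sketch in plain pathlib. It is not filedetect's implementation: the function name `matches` is hypothetical, and the case-insensitive comparison and the handling of compound suffixes such as ".tar.gz" are assumptions made for the example.

```python
from pathlib import PurePath


def matches(filename: str, suffixes: set[str]) -> bool:
    """Return True if the file name ends with one of the given suffixes.

    Assumption for this sketch: comparison is case-insensitive.
    """
    parts = [s.lower() for s in PurePath(filename).suffixes]
    # Build every trailing combination, e.g. [".tar", ".gz"] gives
    # {".tar.gz", ".gz"}, so compound suffixes can match as well.
    candidates = {"".join(parts[i:]) for i in range(len(parts))}
    wanted = {s.lower() for s in suffixes}
    return bool(candidates & wanted)


custom = {".tar.gz", ".log"}
print(matches("backup.tar.gz", custom))  # True
print(matches("server.LOG", custom))     # True
print(matches("notes.txt", custom))      # False
```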

Usage

CLI

filedetect [-h] [--list_formats] [--version] path [--format {video,image,audio,plain_text,csv,json,html,xml,excel,pdf,doc,ppt,archive,text,all,}] [--deep int] [--only_stems stem1,stem2] [--suffixes sfx1,sfx2]
  • The path argument is required and should be the path to the directory you want to scan.

  • You can specify either --format or --suffixes, but not both at the same time. If neither is specified, all formats are detected.

  • The optional --only_stems option restricts detection to files whose stem (the file name without its suffix) is in the given list.

  • The --deep option allows you to specify the maximum depth of detection. If not specified, the default value is -1, which means unlimited depth.

  • The --list_formats option lists all formats available in the package with their corresponding suffixes, then exits.

  • The --version option displays the version of the package, then exits.

  • The --help option displays the help message, then exits.
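The interaction of the suffix, depth, and stem options can be sketched with the standard library alone. The function below is a hypothetical illustration, not the package's code: the name `scan`, the convention that a depth of 0 means "top-level directory only", and the case-insensitive suffix comparison are all assumptions made for this example.

```python
from pathlib import Path
from typing import Iterator, Optional, Set


def scan(
    root: Path,
    suffixes: Set[str],
    deep: int = -1,
    only_stems: Optional[Set[str]] = None,
    _level: int = 0,
) -> Iterator[Path]:
    """Yield files under root whose suffix is in suffixes.

    deep=-1 means unlimited depth; deep=0 (an assumption of this sketch)
    means only files directly inside root are considered.
    """
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            # Descend only while the depth limit allows it
            if deep < 0 or _level < deep:
                yield from scan(entry, suffixes, deep, only_stems, _level + 1)
        elif entry.suffix.lower() in suffixes:
            # Optional stem filter, mirroring the idea behind --only_stems
            if only_stems is None or entry.stem in only_stems:
                yield entry
```

For example, `list(scan(Path("path/to/dir"), {".mp4", ".mkv"}, deep=1))` would collect matching files from the directory and its immediate subdirectories under these assumptions.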

Python

from filedetect import FileDetect

detector = FileDetect.find(
    path="path/to/dir",
    format=None,  # ["video", "audio", "image", "text", "csv", "json", "html"] | None for all formats
    deep=-1,  # positive int for the maximum depth of detection, or -1 for unlimited
    only_stems=None,  # None | a set of stems to detect
    suffixes=None,  # None | a set of suffixes to detect (in place of format)
)
print(detector.result)

or

from filedetect import FileDetect

detector = FileDetect(
    format=None,  # ["video", "audio", "image", "text", "csv", "json", "html"] | None for all formats
    deep=-1,  # positive int for the maximum depth of detection, or -1 for unlimited
    only_stems=None,  # None | a set of stems to detect
    suffixes=None,  # None | a set of suffixes to detect (in place of format)
)

detector.run(
    path="path/to/dir",
)

detector.run(
    path="another/path/to/dir",
)

...

print(detector.result)

License

filedetect is distributed under the terms of the AGPLv3 license.
