

directory-structure-py

Python function collecting the metadata of a directory and its contents.


Requirements

  • Python ≥ 3.12

Data model of metadata

get_metadata_of_single_file returns a dict object with the following format:

// for File
{
    "@id": "ID unique within the metadata tree",
    "type": "File",
    "parent": "parent directory info, including its '@id'",
    "basename": "basename (e.g. test.dat)",
    "name": "file name without the extension (e.g. test.dat -> test)",
    "extension": "file extension (e.g. test.dat -> .dat)",
    "mimetype": "MIME type",
    "contentSize": "file size in bytes",
    "sha256": "SHA-256 hash value",
    "dateCreated": "creation datetime (%Y-%m-%dT%H:%M:%S)",
    "dateModified": "modification datetime (%Y-%m-%dT%H:%M:%S)"
}

// for Directory
{
    "@id": "ID unique within the metadata tree",
    "type": "Directory",
    "parent": "parent directory info, including its '@id'",
    "basename": "basename of the directory",
    "name": "directory name (same as the basename)",
    "hasPart": ["'@id' or metadata of a child file or directory"],
    "contentSize": "total size of the files directly inside the directory, in bytes",
    "extension": ["unique file extension (e.g. test.dat -> .dat)"],
    "mimetype": ["unique MIME type"],
    "numberOfContents": "number of child items",
    "numberOfFiles": "number of child files",
    "numberOfFilesPerExtension": {"key = extension": "value = number of files with that extension"},
    "contentSizeOfAllFiles": "total size of the files within the directory and all its descendant directories, in bytes",
    "numberOfAllContents": "total number of items (files and subdirectories) within the directory and all its descendant directories",
    "numberOfAllFiles": "total number of files within the directory and all its descendant directories",
    "numberOfAllFilesPerExtension": {"key = extension": "value = number of descendant files with that extension"},
    "extensionsOfAllFiles": ["unique file extension (e.g. test.dat -> .dat) found among the descendant files"],
    "dateCreated": "creation datetime (%Y-%m-%dT%H:%M:%S)",
    "dateModified": "modification datetime (%Y-%m-%dT%H:%M:%S)"
}
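To make the File format above concrete, here is a minimal sketch of how such a record could be assembled with the standard library. It is not the package's implementation: the function name sketch_file_metadata, the relative-path ID scheme, and the shape of the "parent" value are assumptions made for illustration.

```python
import hashlib
import mimetypes
from datetime import datetime
from pathlib import Path

DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def sketch_file_metadata(path: Path, root: Path) -> dict:
    """Build a File-style record shaped like the data model (illustrative only)."""
    stat = path.stat()
    return {
        # Assumption: the ID is the POSIX-style path relative to the root.
        "@id": path.relative_to(root).as_posix(),
        "type": "File",
        # Assumption: the parent is referenced by its '@id' only.
        "parent": {"@id": path.parent.relative_to(root).as_posix()},
        "basename": path.name,
        "name": path.stem,
        "extension": path.suffix,
        # guess_type returns (type, encoding); the type is None if unknown.
        "mimetype": mimetypes.guess_type(path.name)[0],
        "contentSize": stat.st_size,
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        # st_ctime is the creation time on Windows but the last
        # metadata-change time on most Unix systems.
        "dateCreated": datetime.fromtimestamp(stat.st_ctime).strftime(DATETIME_FORMAT),
        "dateModified": datetime.fromtimestamp(stat.st_mtime).strftime(DATETIME_FORMAT),
    }
```

Reading the whole file into memory keeps the sketch short; for large files, hashing in chunks would be the more careful choice.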

Functions

In get_metadata

| Function | Overview |
| --- | --- |
| generate_id | Generates a unique ID from a given path, optionally relative to a root path. |
| get_metadata_of_single_file | Retrieves metadata for a given file or directory using the pathlib module. |
| get_metadata_of_files_in_list_format | Recursively retrieves metadata for files and directories within a given path. |

get_metadata_of_files_in_list_format returns a dict object with the following format:

{
    "root_path": "root path. The value will be '.' if the 'include_root_path' option is not set",
    "@graph": ["metadata returned by get_metadata_of_single_file"]
}
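The list format can be pictured as a flat walk over the directory tree. The sketch below is an assumption-laden illustration, not the package's code: sketch_generate_id and sketch_list_format are hypothetical names, the ID scheme (path relative to the root) is a guess, and each record carries only a few of the keys from the data model.

```python
from pathlib import Path


def sketch_generate_id(path: Path, root: Path) -> str:
    """Hypothetical ID scheme: the POSIX-style path relative to the root."""
    return path.relative_to(root).as_posix()


def sketch_list_format(root: Path, include_root_path: bool = False) -> dict:
    """Collect a flat '@graph' list for the root and everything beneath it."""
    graph = []
    # The root itself comes first, followed by all descendants.
    for path in [root, *sorted(root.rglob("*"))]:
        graph.append(
            {
                "@id": sketch_generate_id(path, root),
                "type": "Directory" if path.is_dir() else "File",
                "basename": path.name,
            }
        )
    return {
        # Mirrors the documented behaviour: '.' unless the option is set.
        "root_path": str(root) if include_root_path else ".",
        "@graph": graph,
    }
```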

In conversion

| Function | Overview |
| --- | --- |
| convert_meta_list_json_to_tsv | Converts a list of dictionaries (JSON-like structure) into a TSV-compatible list of lists. |
| convert_meta_list_json_to_tsv_from_file | Converts a JSON file containing a list of dictionaries into a TSV-compatible list of lists. |
| list2tree | Constructs a hierarchical tree structure from a metadata dictionary. |
| list2tree_from_file | Constructs a hierarchical tree structure from a JSON metadata file. |
| convert_meta_list_json_to_rocrate | Converts a metadata list JSON structure into a Research Object Crate (RO-Crate). |
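As an illustration of what "a TSV-compatible list of lists" might look like, the sketch below flattens a list of metadata records into a header row plus one row per record. The name sketch_json_to_tsv_rows and the choice to serialize nested values as JSON strings are assumptions; the package's own column order and encoding may differ.

```python
import json


def sketch_json_to_tsv_rows(records: list[dict]) -> list[list[str]]:
    """Turn a list of dicts into rows of strings: one header row, then the data."""
    # Collect column names in first-seen order across all records.
    header: list[str] = []
    for record in records:
        for key in record:
            if key not in header:
                header.append(key)

    rows = [header]
    for record in records:
        row = []
        for key in header:
            value = record.get(key, "")  # missing keys become empty cells
            if isinstance(value, (dict, list)):
                # Assumption: nested values are stored as JSON text.
                value = json.dumps(value, ensure_ascii=False)
            row.append(str(value))
        rows.append(row)
    return rows
```

Joining each row with tabs and the rows with newlines, for example "\n".join("\t".join(row) for row in rows), then gives the TSV text.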

Installation

pip install from repository

pip install git+https://github.com/Surpris/directory-structure-py.git

git clone and pip install

git clone https://github.com/Surpris/directory-structure-py.git
cd directory-structure-py
pip install .

Portable (Windows only)

A portable build is also provided, for Windows only. Download it from the release section and decompress it.

Usage

CLI

python -m:

python -m directory_structure_py <file_or_directory_path> \
    --dst <output_path> \
    --include_root_path \
    --in_rocrate \
    --to_tsv \
    --in_tree \
    --structure_only \
    --log_config_path <log_config_path> \
    --log_output_path <log_output_path> \
    --preview_template_path <preview_template_path>

All arguments after --dst are optional.

Main options:

| Item | Type | Description |
| --- | --- | --- |
| dst | str | Destination path of the JSON output. If empty, the metadata file is written to the same directory as the input file. |
| include_root_path | bool | If set, include file_or_directory_path under the key root_path. |
| in_rocrate | bool | If set, output an RO-Crate-format file instead of the list-format one. |
| to_tsv | bool | If set, output a TSV-format file. |
| in_tree | bool | If set, output the metadata in a tree format. |
| structure_only | bool | If set, output only the structure, in a tree format. |
| preview_template_path | str | File path of the template for the preview file generated with the RO-Crate output. |

Logging options:

| Item | Type | Description |
| --- | --- | --- |
| log_config_path | str | Path to a log config file. See config/logging.json for the content format. |
| log_output_path | str | Destination path of the log. |

Batch file (Windows and Ubuntu)

Drag a directory or file and drop it onto the batch file directory_structure_py.bat (Windows) or the shell script directory_structure_py.sh (Ubuntu). By default, the following files are written to the output directory inside the directory that contains the batch file or shell script.

  • directory_structure_metadata.json: the metadata in list format.
  • directory_structure_metadata_tree.json: the directory tree.
  • directory_structure_metadata.tsv: the metadata list in TSV format.
  • ro-crate-metadata.json: the metadata in RO-Crate format.
  • ro-crate-preview.html: a preview of the RO-Crate metadata.

You can change the output formats by modifying the options set in the batch file.
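To show how the tree output relates to the list output, the sketch below nests each record under its parent by resolving the entries of "hasPart". It assumes, following the data model, that each entry is either an "@id" string or a dict containing "@id". The function name sketch_list_to_tree is hypothetical and the package's list2tree may behave differently.

```python
def sketch_list_to_tree(meta: dict) -> list[dict]:
    """Return the top-level nodes, with children nested inside 'hasPart'."""
    # Shallow copies, so the input '@graph' records are left untouched.
    nodes = {item["@id"]: dict(item) for item in meta["@graph"]}
    child_ids = set()
    for node in nodes.values():
        if node.get("type") != "Directory":
            continue
        resolved = []
        for part in node.get("hasPart", []):
            part_id = part["@id"] if isinstance(part, dict) else part
            child_ids.add(part_id)
            # Fall back to the raw entry if the ID is not in the graph.
            resolved.append(nodes.get(part_id, part))
        node["hasPart"] = resolved
    # Nodes that are nobody's child are the roots of the tree.
    return [node for node_id, node in nodes.items() if node_id not in child_ids]
```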

Python

get_metadata_of_single_file

from directory_structure_py.src.get_metadata import get_metadata_of_single_file

fpath: str = "file_or_directory_path"
metadata: dict = get_metadata_of_single_file(fpath)

get_metadata_of_files_in_list_format

from directory_structure_py.src.get_metadata import get_metadata_of_files_in_list_format

fpath: str = "file_or_directory_path"
metadata: dict = get_metadata_of_files_in_list_format(fpath)
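Once the list-format dict is available, it can be processed like any other Python data. The following sketch summarizes the File records in "@graph"; it relies only on the keys documented in the data model, while the helper name summarize_graph and the sample data in the usage note are made up for illustration.

```python
from collections import Counter


def summarize_graph(metadata: dict) -> dict:
    """Count files, total size, and files per extension in a list-format dict."""
    files = [item for item in metadata["@graph"] if item.get("type") == "File"]
    return {
        "numberOfFiles": len(files),
        # int() tolerates sizes stored either as numbers or as numeric strings.
        "contentSize": sum(int(item["contentSize"]) for item in files),
        "perExtension": dict(Counter(item["extension"] for item in files)),
    }
```

For example, a "@graph" holding two .dat files of 10 and 20 bytes and one .txt file of 5 bytes would give numberOfFiles 3, contentSize 35, and perExtension {".dat": 2, ".txt": 1}.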

Output examples

See output/sample in the GitHub repository.

Contributions

Any feedback is welcome via the Issue section!

Download files

Source distribution: directory_structure_py-0.2.6.tar.gz (38.9 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing: No
  • Uploaded via: twine/6.0.1 CPython/3.12.7

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | da22119832da108130ed01ea3bedbb773f5208861e1742ee012f8741bbc34a78 |
| MD5 | 306dcbcf73ad5dfe420975f1d5b949db |
| BLAKE2b-256 | d4a5e9adb2ea2486fe387b6584f78fe2c5f9d1e8c52cbc10b07639268b04b366 |

Built distribution: directory_structure_py-0.2.6-py3-none-any.whl (46.0 kB, Python 3)

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 17b9a5d254bdd1e3eb23c6b35a8598b8c38ac03859401bc39e00267639a6aa99 |
| MD5 | 3acb0218b56a2a2d298aa76a949ea407 |
| BLAKE2b-256 | afa4e4e169e34dc85f4043566607643297d92a90a7abb7b100c93fa3c0996558 |