Python function collecting the metadata of a directory and its contents
directory-structure-py
Requirements
- Python ≥ 3.12
Data model of metadata
get_metadata_of_single_file returns a dict object with the following format:
// for File
{
"@id": "id unique in the metadata tree",
"type": "File",
"parent": "parent directory info including '@id'",
"basename": "basename (ex. test.dat)",
"name": "file name (ex. test.dat -> test)",
"extension": "file extension (ex. test.dat -> .dat)",
"mimetype": "MIME type",
"contentSize": "file size (Byte)",
"sha256": "SHA-256 hash value",
"dateCreated": "creation datetime (%Y-%m-%dT%H:%M:%S)",
"dateModified": "modification datetime (%Y-%m-%dT%H:%M:%S)"
}
// for Directory
{
"@id": "id unique in the metadata tree",
"type": "Directory",
"parent": "parent directory info including '@id'",
"basename": "basename (ex. test.dat)",
"name": "directory name (same as the basename)",
"hasPart": ["`@id` or metadata of file or directory"],
"contentSize": "the total size of files included (Byte)",
"extension": ["unique file extension (ex. test.dat -> .dat)"],
"mimetype": ["unique MIME type"],
"numberOfContents": "the number of child contents",
"numberOfFiles": "the number of child files",
"numberOfFilesPerExtension": {"key = extension": "value = the number of files with the extension"},
"contentSizeOfAllFiles": "The total size of files within the directory and all its descendant directories in bytes",
"numberOfAllContents": "The total number of child items (files and subdirectories) within the directory and all its descendant directories",
"numberOfAllFiles": "The total number of files within the directory and all its descendant directories",
"numberOfAllFilesPerExtension": {"key = extension": "value = the number of the descendant files with the extension"},
"extensionsOfAllFiles": ["unique file extension (ex. test.dat -> .dat) extracted from the descendant files"],
"dateCreated": "creation datetime (%Y-%m-%dT%H:%M:%S)",
"dateModified": "modification datetime (%Y-%m-%dT%H:%M:%S)"
}
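To make the File record more concrete, here is a minimal sketch that assembles a dict with the same keys using only the standard library. It is not the package's implementation: the helper name `sketch_file_metadata`, the use of the root-relative POSIX path as `@id`, the shape of `parent`, and the mapping of `st_ctime` to `dateCreated` are all assumptions made for illustration.

```python
import hashlib
import mimetypes
from datetime import datetime
from pathlib import Path

DATETIME_FMT = "%Y-%m-%dT%H:%M:%S"  # format named in the data model


def sketch_file_metadata(path: Path, root: Path) -> dict:
    """Build a File-style record for `path` (hypothetical helper)."""
    stat = path.stat()
    mimetype, _ = mimetypes.guess_type(path.name)
    return {
        # Assumption: the ID is the path relative to `root`.
        "@id": path.relative_to(root).as_posix(),
        "type": "File",
        # Assumption: the parent is referenced by a dict holding its '@id'.
        "parent": {"@id": path.parent.relative_to(root).as_posix() + "/"},
        "basename": path.name,
        "name": path.stem,
        "extension": path.suffix,
        "mimetype": mimetype,  # None when the type cannot be guessed
        "contentSize": stat.st_size,
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        # Assumption: st_ctime stands in for the creation time
        # (on Linux it is the inode change time, not the creation time).
        "dateCreated": datetime.fromtimestamp(stat.st_ctime).strftime(DATETIME_FMT),
        "dateModified": datetime.fromtimestamp(stat.st_mtime).strftime(DATETIME_FMT),
    }
```

Reading the whole file with `read_bytes()` keeps the sketch short; for large files a chunked hash would be the safer choice.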
Functions
In get_metadata
| Function | Overview |
|---|---|
| generate_id | Generates a unique ID from a given path, optionally relative to a root path. |
| get_metadata_of_single_file | Retrieves metadata for a given file or directory using the pathlib module. |
| get_metadata_of_files_in_list_format | Recursively retrieves metadata for files and directories within a given path. |
get_metadata_of_files_in_list_format returns a dict object with the following format:
{
"root_path": "root path. The value will be '.' if the 'include_root_path' option is not set",
"@graph": ["metadata returned by get_metadata_of_single_file"]
}
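A consumer of the list format will usually want to look records up by `@id` and find the children of a directory. The sketch below does that on a hand-written sample. The helper `index_graph` is hypothetical, and it assumes that `parent` is a dict carrying the parent's `@id` and that the root entry has an empty `parent`; check real output before relying on either assumption.

```python
from collections import defaultdict


def index_graph(metadata: dict) -> tuple[dict, dict]:
    """Return (records by @id, child @ids grouped by parent @id)."""
    by_id = {item["@id"]: item for item in metadata["@graph"]}
    children = defaultdict(list)
    for item in metadata["@graph"]:
        # Assumption: 'parent' is a dict with an '@id' key, empty for the root.
        parent_id = (item.get("parent") or {}).get("@id")
        if parent_id is not None:
            children[parent_id].append(item["@id"])
    return by_id, dict(children)


# Hand-written sample in the documented shape (values are illustrative only).
sample = {
    "root_path": ".",
    "@graph": [
        {"@id": "data/", "type": "Directory", "parent": {}},
        {"@id": "data/a.txt", "type": "File", "parent": {"@id": "data/"}},
        {"@id": "data/b.dat", "type": "File", "parent": {"@id": "data/"}},
    ],
}

by_id, children = index_graph(sample)
print(children)  # {'data/': ['data/a.txt', 'data/b.dat']}
```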
In conversion
| Function | Overview |
|---|---|
| convert_meta_list_json_to_tsv | Converts a list of dictionaries (JSON-like structure) into a TSV-compatible list of lists. |
| convert_meta_list_json_to_tsv_from_file | Converts a JSON file containing a list of dictionaries into a TSV-compatible list of lists. |
| list2tree | Constructs a hierarchical tree structure from a metadata dictionary. |
| list2tree_from_file | Constructs a hierarchical tree structure from a JSON metadata file. |
| convert_meta_list_json_to_rocrate | Converts a metadata list JSON structure into a Research Object Crate (ROCrate). |
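To illustrate what a "TSV-compatible list of lists" can look like, the following sketch flattens a list of dicts into a header row plus one row per record and then serializes it with the standard `csv` module. The functions `records_to_rows` and `rows_to_tsv` are hypothetical stand-ins, not the package's converters, and the column-ordering rule is an assumption.

```python
import csv
import io


def records_to_rows(records: list[dict]) -> list[list[str]]:
    """Flatten dict records into a header row plus one row per record."""
    # Assumption: columns follow the order in which keys first appear.
    columns: list[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    rows = [columns]
    for record in records:
        # Missing keys become empty cells; every value is stringified.
        rows.append([str(record.get(key, "")) for key in columns])
    return rows


def rows_to_tsv(rows: list[list[str]]) -> str:
    """Serialize rows as tab-separated text."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

Nested values such as `hasPart` lists are simply stringified here; a real converter may choose a different encoding for them.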
Installation
pip install from repository
pip install git+https://github.com/Surpris/directory-structure-py.git
git clone and pip install
git clone https://github.com/Surpris/directory-structure-py.git
cd directory-structure-py
pip install .
portable (only for Windows)
A portable build is also provided, for Windows only. Download it from the release section and decompress it.
Usage
CLI
python -m:
python -m directory_structure_py <file_or_directory_path> \
    --dst <output_path> \
    [--include_root_path] \
    [--in_rocrate] \
    [--to_tsv] \
    [--in_tree] \
    [--structure_only] \
    [--log_config_path <log_config_path>] \
    [--log_output_path <log_output_path>] \
    [--preview_template_path <preview_template_path>]
Arguments in square brackets are optional.
Main options:
| Item | Type | Description |
|---|---|---|
| dst | str | destination path of the json output. If empty, the metadata file will be output to the same directory as that of the input file. |
| include_root_path | (bool) | include file_or_directory_path with the key root_path if this option is set |
| in_rocrate | (bool) | output an RO-Crate-format file instead of the list format one if this option is set |
| to_tsv | (bool) | output a TSV-format file if this option is set |
| in_tree | (bool) | output the metadata in a tree format if this option is set |
| structure_only | (bool) | output only the structure in a tree format if this option is set |
| preview_template_path | str | file path of the template for the preview file output by the RO-Crate. |
Logging options:
| Item | Type | Description |
|---|---|---|
| log_config_path | str | a log config path. See config/logging.json for the detail of the content format. |
| log_output_path | str | destination path of the log. |
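The option tables above map naturally onto an `argparse` parser. The sketch below mirrors the documented flag names so their types are easy to see at a glance. It is an illustration only: the positional name `src`, the empty-string defaults, and the use of `store_true` for the boolean flags are assumptions, and the package's actual parser may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented CLI flags (hypothetical)."""
    parser = argparse.ArgumentParser(prog="directory_structure_py")
    # Assumption: the input path is a single positional argument named 'src'.
    parser.add_argument("src", help="file or directory path to inspect")
    parser.add_argument("--dst", default="", help="destination path of the JSON output")
    # Boolean switches: present means True, absent means False.
    for flag in ("include_root_path", "in_rocrate", "to_tsv", "in_tree", "structure_only"):
        parser.add_argument(f"--{flag}", action="store_true")
    # String options that take a path.
    for option in ("preview_template_path", "log_config_path", "log_output_path"):
        parser.add_argument(f"--{option}", default="")
    return parser


args = build_parser().parse_args(["./data", "--dst", "out.json", "--to_tsv"])
print(args.to_tsv, args.in_tree)  # True False
```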
Batch file (Windows and Ubuntu)
Drag the directory or file and drop it on the batch file directory_structure_py.bat or on the shell script directory_structure_py.sh.
By default, the following files are written to the output directory in the same directory as the batch file or the shell script:
- directory_structure_metadata.json: the metadata in list format.
- directory_structure_metadata_tree.json: the directory tree.
- directory_structure_metadata.tsv: the metadata list in TSV format.
- ro-crate-metadata.json: the metadata in RO-Crate format.
- ro-crate-preview.html: a preview file of the RO-Crate metadata.
You can change the output formats by modifying the options set in the batch file or the shell script.
Python
get_metadata_of_single_file
from directory_structure_py.src.get_metadata import get_metadata_of_single_file
fpath: str = "file_or_directory_path"
metadata: dict = get_metadata_of_single_file(fpath)
get_metadata_of_files_in_list_format
from directory_structure_py.src.get_metadata import get_metadata_of_files_in_list_format
fpath: str = "file_or_directory_path"
metadata: dict = get_metadata_of_files_in_list_format(fpath)
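Once a metadata dict is in hand, a common next step is to persist it. The helper below is a minimal sketch, assuming the returned dict contains only JSON-serializable values (which the CLI's JSON output suggests but which is not verified here). The name `save_metadata` is hypothetical and not part of the package.

```python
import json
from pathlib import Path


def save_metadata(metadata: dict, dst: Path) -> Path:
    """Write `metadata` to `dst` as UTF-8 JSON and return the path."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_text(
        json.dumps(metadata, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return dst


# Typical use with the package (not executed in this sketch):
# metadata = get_metadata_of_files_in_list_format("file_or_directory_path")
# save_metadata(metadata, Path("output/directory_structure_metadata.json"))
```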
Output examples
Please see output/sample in the GitHub repository.
Contributions
Any feedback is welcome via the Issue section!
Download files
File details
Details for the file directory_structure_py-0.2.6.tar.gz.
File metadata
- Download URL: directory_structure_py-0.2.6.tar.gz
- Upload date:
- Size: 38.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | da22119832da108130ed01ea3bedbb773f5208861e1742ee012f8741bbc34a78 |
| MD5 | 306dcbcf73ad5dfe420975f1d5b949db |
| BLAKE2b-256 | d4a5e9adb2ea2486fe387b6584f78fe2c5f9d1e8c52cbc10b07639268b04b366 |
File details
Details for the file directory_structure_py-0.2.6-py3-none-any.whl.
File metadata
- Download URL: directory_structure_py-0.2.6-py3-none-any.whl
- Upload date:
- Size: 46.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 17b9a5d254bdd1e3eb23c6b35a8598b8c38ac03859401bc39e00267639a6aa99 |
| MD5 | 3acb0218b56a2a2d298aa76a949ea407 |
| BLAKE2b-256 | afa4e4e169e34dc85f4043566607643297d92a90a7abb7b100c93fa3c0996558 |
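The SHA256 digests listed above can be checked against a downloaded file with the standard `hashlib` module. The following is a minimal sketch; the helper names `sha256_of` and `verify` are hypothetical, and the file path in the usage comment is wherever you saved the download.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (not executed here):
# verify(Path("directory_structure_py-0.2.6.tar.gz"), "<SHA256 from the table>")
```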