Skip to main content

A helpful cross-platorm utility for grouping files across many locations.

Project description

fgroup

A helpful cross-platorm utility for grouping files across many locations.

fgroup is a tool designed to help keep track of and organize files on your entire system, for doing things like backing up certain files, cleaning up junk files created by various programs, or organizing the files on your desktop based on their extension. Ungrouped files will be automatically sent to the "unknown" group so you always know what you missed. Before getting started, I recommend reading the pitfalls section below.

All functionality is thoroughly tested via pytest. If you do still find any issues or want to request a feature, feel free to do so!

You can install the script and run it via your tool of choice through PyPI (pip install fgroup).

License

The license (MIT) is available in the repo. If you are doing anything sensitive with this program - please always make a backup first.

Contact

Contact me on Discord and support me on Ko-Fi!

Pitfalls

  • Files that cannot be found regularly or via permission errors will not be placed into a group.
  • In fgroup globs, the characters *?[] and the string , have special meaning. E.g. multiple globs on one line are separated by a comma followed by exactly one space. If you need to escape any of these characters, put it in a character group by itself: [*] or [[]
  • By default, if a parent directory is placed into a group first, none of it's children files/folders can be placed into one. This is intentional - it wouldn't make sense to say, "backup this entire folder which includes this file", then to say, "also, delete this file.". To get around this, place the subfile into a group first, then the parent directory (or the rest of it's children with *) into a group.
  • As an extension of the previous point, * will re-match any folders that have not had all of their children placed into a group yet. This rule does not apply if -d is set. For more info, see the second example below.
  • The -d flag overrides this behavior to allow paths to be in distinct groups. As a consequence, for performance reasons, ungrouped files will not be placed into the unknown group by default. The result of any operations on the grouped files (say, backing up files and deleting others) may be different depending on the order they are performed in.
  • Use -a to output absolute paths when using a custom root. By default, paths are given relative to the root directory.
  • On Windows, all paths are made absolute and evaluated relative to \\?\, so there's no need to worry about path limits. However, this also means you should not use \\?\ in any glob or the root path. You can use UNC\system\drive with a blank ("") root to mean \\?\UNC\system\drive.
  • YAML files have some strange features that may cause the file to not function how you expect it to (e.g., * characters have special meaning), so if a string doesn't do what you expect and gives an error to fgroup, put keys and values in quotes like so: "Some File, Another File": "backup". You MAY NOT, however, do this: "Some File", "Another File": "backup". You can also use a JSON file as a config instead if you would prefer.

Examples

You can provide a list of globs directly to fgroup on the command-line to tell it how to group files:

$ ls
script.py  update.py  random.txt  important.txt  letters.txt
$ fgroup -m "*.py:python" "important*:important" "*.txt:text"
important
important.txt

python
script.py
update.py

text
letters.txt
random.txt

Or you can provide a yaml/json config with a tree-like structure to fgroup instead:

# config.yaml
root: /home/mgz
files:
    Desktop, Downloads: sync
    Music, Videos, Public: sync
    Pictures, Templates: backup

    # Define dotfiles to include in backup or similar
    .config:
        GIMP: backup
        obs-studio: backup
    .local/share: sync
    .ssh: backup

    # Remove other dotfiles because they clutter up space
    "*": delete

    # Note that "*" here will include .local/state and similar,
    # because .local was not fully defined, only .local/share was.
    # It will not match .config/* though, as ".config" is fully defined.
    # Prefer to expand out paths (like .config above) instead because of this.

    # Or just... don't delete files by default.
$ fgroup output.json -c config.yaml -i -a
$ cat output.json
{
    "backup": [
        "/home/mgz/.config/GIMP",
        "/home/mgz/.config/obs-studio",
        "/home/mgz/.ssh",
        "/home/mgz/Pictures",
        "/home/mgz/Templates"
    ],
    "delete": [
        "/home/mgz/.cache",
        "/home/mgz/.local/state",
        "/home/mgz/.npm",
        "/home/mgz/.pki",
        ...
    ],
    "sync": [
        "/home/mgz/.local/share",
        "/home/mgz/Desktop",
        "/home/mgz/Downloads",
        "/home/mgz/Music",
        "/home/mgz/Videos"
    ],
    "unknown": [
        "/home/mgz/.config/QDirStat",
        "/home/mgz/.config/VSCodium",
        "/home/mgz/.config/qalculate",
        ...
    ]
}

Detailed help is available on the command line with -h or in the documentation text below.

Automation

Grouped paths can be outputed to stdout, a txt/json/yaml file, or a bunch of text files in a folder.

Choose the format you want and send any outputted files to another utility:

$ fgroup groups -f folder -c config.yaml
$ restic -r backups-repo backup --files-from-verbatim groups/backup.txt

Or create a python script to execute with the collected files:

# script.py
def run_action_backup(file_list: list[str]):
    print(f"Group backup has {len(file_list)} files")
def run_actions(groups: dict[str, list[str]]):
    print(f"{len(groups)} groups total")
$ fgroup -c config.yaml -s script.py
Group backup has 5 files
4 groups total

fgroup can also be used as a normal python library:

import fgroup
grouper = fgroup.group(".", {"*": "here"}, absolute=True)
print(grouper.groups)

Documentation

usage: fgroup [-h] [-a] [-d] [-c CONFIG] [-m [P:G ...]] [-r [ROOT]] [-f TYPE]
              [-t [N]] [-g GROUP] [-i [N]] [-o [G:N ...]] [-s SCRIPT]
              [-A [ARG ...]]
              [output]

A helpful cross-platorm utility for grouping files across many locations.

Groups paths based on the globs given through the -m option and the "config"
file (-c) if provided. Outputs the result to the given "output" path. If a file
is not matched, it is placed into the default group ("unknown"). For an example
config file, see the git repo.

By default, if a parent directory is grouped, none of it's children can be. To
allow them to be grouped separately, use the -d option.

positional arguments:
  output                    Path to output to. If blank, stdout is used.

options:
  -h, --help                Show this help message and exit.
  -a, --absolute            Output paths as absolute paths instead of paths
                            relative to the root path.
  -d, --distinct            If set, a parent folder and it's descendants can be
                            given distinct groups. Consequently, unmatched paths
                            will not be placed in the default group.
  -c, --config CONFIG       A config file used to group various files/folders.
  -m, --manual [P:G ...]    File globs (P) executed on the root path. Matching
                            paths will be given group (G). These have higher
                            priority than the globs in the config file.
  -r, --root [ROOT]         Changes the root path where files/folders are
                            grouped from. This setting has higher priority than
                            the root set in the config. (default: ".", default
                            with option: "")
  -f, --format TYPE         Change the output format used to print out results.
                            Options are "text", "json", "yaml", and "folder".
  -t, --top [N]             Output top N path weights (all: 0, default: 10).
                            Paths that glob more have a higher weight.
                            See below for more info on this metric.
                            Not compatible with -g.
  -g, --group GROUP         If set, outputs only the paths in the given group.
                            Not compatible with -t.
  -i, --indent [N]          For formats "json" and "yaml", indents and nicely
                            formats output. (default: 4)
  -o, --override [G:N ...]  A list of group overrides, one per argument.
                            Using group G directly will instead use group N.
  -s, --script SCRIPT       A python script to load and execute with all groups.
                            Using this argument will prevent output to the given
                            output path.See below for more info. Not compatible
                            with -f, -t, -g, or -i.
  -A, --args [ARG ...]      Additional arguments to pass on to the -s script as
                            strings. Each argument is only given to the script's
                            run_actions() function.

Output is given in four different formats. "text" is a long header-separated
format, "json" and "yaml" output as dictionaries with the keys as groups and the
values lists of paths in those groups, and "folder" creates a directory at the
given output and places one file per group into it (group.txt)

If -t is set, output will be in the form of a table with the first column being
the weight (right aligned) and the second column being the path. "text" is a
simple-text table, "json" and "yaml" are 2-list deep tables (with the weight and
filepath indexes switched), and "folder" is unsupported.

If -g is set, only a single list is printed, with "text" being one path per
line, "json" and "yaml" being lists in those formats, and "folder" only outputs
to one sub-file.

The weight metric generally represents how much work a given path has taken to
glob through, and it is good for finding paths that are taking too long to
process and might need to be given a manual override in the config. Currently,
the number is equal to the number of times any given path has been globbed on
directly or indirectly through recursive globs, plus the number of times a
descendant node has been created.

The script given by -s will be loaded through an import statement. Any function
with the name "run_action_(group)" will be called with the list for that group.
The function "run_actions()" will be called with the full dictionary of all
groups. Groups without functions will be ignored.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fgroup-1.2.2.tar.gz (26.5 kB view details)

Uploaded Source

Built Distribution

fgroup-1.2.2-py3-none-any.whl (26.7 kB view details)

Uploaded Python 3

File details

Details for the file fgroup-1.2.2.tar.gz.

File metadata

  • Download URL: fgroup-1.2.2.tar.gz
  • Upload date:
  • Size: 26.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.5 Linux/6.6.52

File hashes

Hashes for fgroup-1.2.2.tar.gz
Algorithm Hash digest
SHA256 e4e6e3165c549c90ceafa4c44803ac657a2981ec8bc5d609be8ab4fd9aa1b4c9
MD5 660fa6e42b02e668e1ddc974947b8e10
BLAKE2b-256 cd370f541c890a60442350b647444977b4518ad50131e0731f38b0c5dc0b9230

See more details on using hashes here.

File details

Details for the file fgroup-1.2.2-py3-none-any.whl.

File metadata

  • Download URL: fgroup-1.2.2-py3-none-any.whl
  • Upload date:
  • Size: 26.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.5 Linux/6.6.52

File hashes

Hashes for fgroup-1.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 7ae30ef25f48be50d0e7391df5f6d94d9cd93f4900ebc599a07b4743e15439a7
MD5 340196092c0715dda30827463ca97695
BLAKE2b-256 abef63955855049b97099760d2bef5a477bd5827353d8823b94cd64a79a20264

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page