Relabel files in order to work on them blind

Blind files

Generates a mapping from file names to blind but memorable file names. This script assumes that you have a directory containing files and/or subdirectories with samples from an experiment. The names of these files and directories reveal which group each sample belongs to, but the contents of the files do not.

The script will move these files to a new directory, renaming them so that the new file names do not reveal which group the samples belong to. It will also generate a mapping file to indicate how the new files map to the original files.

Installing

Using pipx

pipx install blind_files

Running

This script takes an input directory and generates a directory containing a script, blind.sh, that can be used to blind the files in the input directory. It also generates a mapping file, mapping.csv, which can be used once the analysis is done to see how the original names map to the blinded names.

The script has two modes of operation:

Using a delimiter

In the first mode of operation, you specify a delimiter, and all the text before the delimiter in each file name is replaced. For example:

blind-files \
   --mode delimiter \
   --delimiter _foo \
   --input-dir input_dir \
   --output-dir output_dir \
   --mapping-dir mapping_dir

In this case, if input_dir contains the following files:

sample_1_foo.txt
sample_1_foo-bar.csv
sample_2_foo.txt
hello.txt

Then, after running mapping_dir/blind.sh, output_dir will contain:

golf_elbow_foo.txt
golf_elbow_foo-bar.csv
co-producer_reputation_foo.txt
hello.txt

In mapping_dir you will also find a file mapping.csv with the contents:

original,blinded
sample_1,golf_elbow
sample_2,co-producer_reputation
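
The renaming rule in this mode can be illustrated with a short Python sketch. This is not the package's own code: the function name blind_name is hypothetical, the mapping is copied from the example above rather than generated, and the sketch assumes the split happens at the first occurrence of the delimiter.

```python
def blind_name(name: str, delimiter: str, mapping: dict[str, str]) -> str:
    """Replace the text before the delimiter with its blinded name.

    Assumption: the split happens at the FIRST occurrence of the delimiter.
    Names without the delimiter, or with an unknown prefix, are left unchanged.
    """
    prefix, sep, rest = name.partition(delimiter)
    if not sep or prefix not in mapping:
        return name
    return mapping[prefix] + sep + rest


# Mapping copied from the example mapping.csv above (not generated here).
mapping = {"sample_1": "golf_elbow", "sample_2": "co-producer_reputation"}
names = ["sample_1_foo.txt", "sample_1_foo-bar.csv", "sample_2_foo.txt", "hello.txt"]

blinded = [blind_name(name, "_foo", mapping) for name in names]
for old, new in zip(names, blinded):
    print(f"{old} -> {new}")
```

Files that share a prefix, such as sample_1_foo.txt and sample_1_foo-bar.csv, receive the same blinded prefix, so related files stay grouped after blinding.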

Limitations

This mode only replaces names at the top level of the input directory. If you have a more complex nested directory structure, where the identifier names may be buried in the directory tree, use the identifier-list approach described below.

Using a list of identifiers

In the second mode of operation, you can specify a list of identifiers that should be blinded whenever they are encountered in the input directory tree. For example, if identifiers.txt contains the following:

group_a_1
group_b_1

then run:

blind-files \
   --mode identifiers \
   --identifiers identifiers.txt \
   --input-dir input_dir \
   --output-dir output_dir \
   --mapping-dir mapping_dir

In this case, if input_dir contains the following files:

group_a_1/group_a_1/foo.txt
group_b_1/group_b_1/foo.txt
hello.txt

Then, after running mapping_dir/blind.sh, output_dir will contain:

head_bottle/head_bottle/foo.txt
eponym_curtain/eponym_curtain/foo.txt
hello.txt
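
As a rough illustration of this mode, the following Python sketch replaces every occurrence of an identifier in a relative path. It is not the package's implementation: blind_path is a hypothetical helper, and the mapping is hard-coded from the example rather than generated.

```python
import re


def blind_path(path: str, mapping: dict[str, str]) -> str:
    """Replace every occurrence of a known identifier anywhere in the path."""
    if not mapping:
        return path
    # One pass over the path, so replaced text is never scanned again.
    pattern = re.compile("|".join(re.escape(key) for key in mapping))
    return pattern.sub(lambda match: mapping[match.group(0)], path)


# Hard-coded for illustration; the real tool picks the blinded names itself.
mapping = {"group_a_1": "head_bottle", "group_b_1": "eponym_curtain"}

for path in ["group_a_1/group_a_1/foo.txt", "group_b_1/group_b_1/foo.txt", "hello.txt"]:
    print(blind_path(path, mapping))
```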

In mapping_dir you will also find a file mapping.csv with the contents:

original,blinded
group_a_1,head_bottle
group_b_1,eponym_curtain
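
Once the analysis is finished, mapping.csv can be read back to recover the original names. The sketch below is one way to do this in Python, assuming the two-column original,blinded layout shown above; read_unblinding is a hypothetical helper, and the CSV contents are inlined so the example runs on its own.

```python
import csv
import io


def read_unblinding(handle) -> dict[str, str]:
    """Return a dict that maps each blinded name back to its original name."""
    return {row["blinded"]: row["original"] for row in csv.DictReader(handle)}


# Inlined copy of the example mapping.csv shown above.
sample = io.StringIO(
    "original,blinded\n"
    "group_a_1,head_bottle\n"
    "group_b_1,eponym_curtain\n"
)
unblind = read_unblinding(sample)
print(unblind["head_bottle"])  # prints "group_a_1"
```

For a real run, the same function could be given a file handle instead, for example the result of open("mapping_dir/mapping.csv", newline="").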

Limitations

No identifier can be a substring of any other identifier. For example, it is not allowed to have identifiers sample_1 and sample_11. However, sample_01 and sample_11 would be fine.
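
Before running the script, an identifier list can be checked against this rule. The following Python sketch is only an illustration: find_substring_conflicts is a hypothetical helper, not part of blind_files.

```python
from itertools import permutations


def find_substring_conflicts(identifiers: list[str]) -> list[tuple[str, str]]:
    """Return (shorter, longer) pairs where one identifier occurs inside another."""
    unique = sorted(set(identifiers))  # ignore exact duplicates
    return [(a, b) for a, b in permutations(unique, 2) if a in b]


print(find_substring_conflicts(["sample_1", "sample_11"]))   # [('sample_1', 'sample_11')]
print(find_substring_conflicts(["sample_01", "sample_11"]))  # []
```

An empty result means the list satisfies the rule. Zero-padding the numbers, as in sample_01, is a simple way to avoid conflicts.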

General limitations

  • This script should work on any platform, but has only been tested on macOS.
  • This script should handle symlinks by simply moving the symlink, without following it, but this behavior has not been tested.

Credits

This package was created with Cookiecutter.

nounlist from here.
