Relabel files in order to work on them blind

Blind files
===========

Generates a mapping from file names to blind but memorable file names. This
script assumes that you have a directory that contains files and / or
subdirectories with samples from an experiment. The names of these files and
directories reveal which group the samples belong to, but the contents of the
files do not.

The script will move these files to a new directory, renaming them so that the
new file names do not reveal which group the samples belong to. It will also
generate a mapping file to indicate how the new files map to the original
files.

Installing on OS X
------------------

1. Install [Homebrew](http://brew.sh/)
2. From a terminal in the root of a checkout of this project, run:

```
brew install python3
pip3 install -e .
```

Running on OS X
---------------

The `blind_files` command takes an input directory and generates a mapping
directory containing a script, `blind.sh`, that can be used to blind the files
in the input directory. It also generates a mapping CSV, `mapping.csv`, that
you can consult once the analysis is done to see how the original names map to
the blinded names.
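
Once the analysis is finished, the mapping can be read back to unblind the results. The following is a minimal sketch, not part of this package: it assumes the two-column `original,blinded` layout shown in the examples in this README, and the helper name `load_unblinding_map` is hypothetical.

```python
import csv
import io


def load_unblinding_map(csv_file):
    """Return a dict mapping each blinded name back to its original name.

    Assumes a header row with the columns `original` and `blinded`.
    """
    reader = csv.DictReader(csv_file)
    return {row["blinded"]: row["original"] for row in reader}


# In-memory stand-in for open("mapping_dir/mapping.csv", newline="")
example = io.StringIO("original,blinded\nsample_1,golf_elbow\n")
unblind = load_unblinding_map(example)
print(unblind["golf_elbow"])  # prints "sample_1"
```

Keeping the lookup keyed by the blinded name means the analysis output never has to contain the original names until this final step.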

The script has two modes of operation:

### Using a delimiter
In the first mode of operation, you can specify a delimiter to use such that
all the text before the delimiter in each file name will be replaced. For
example:

```sh
blind_files \
--mode delimiter \
--delimiter _foo \
--input-dir input_dir \
--output-dir output_dir \
--mapping-dir mapping_dir
```

In this case, if `input_dir` contains the following files:

```
sample_1_foo.txt
sample_1_foo-bar.csv
sample_2_foo.txt
hello.txt
```

Then after running `mapping_dir/blind.sh`, `output_dir` will contain

```
golf_elbow_foo.txt
golf_elbow_foo-bar.csv
co-producer_reputation_foo.txt
hello.txt
```

In `mapping_dir` you will also find a file `mapping.csv` with the contents:

```
original,blinded
sample_1,golf_elbow
sample_2,co-producer_reputation
```

#### Limitations
This will only replace names at the top level of the input directory. If you
have a more complex nested directory structure, where the identifier names may
be buried in the directory tree, use the identifier list approach described
below.

### Using a list of identifiers
In the second mode of operation, you can specify a list of identifiers that
should be blinded wherever they are encountered in the input directory tree.
For example, suppose `identifiers.txt` contains the following:

```
group_a_1
group_b_1
```

and you run:

```sh
blind_files \
--mode identifiers \
--identifiers identifiers.txt \
--input-dir input_dir \
--output-dir output_dir \
--mapping-dir mapping_dir
```

In this case, if `input_dir` contains the following files:

```
group_a_1/group_a_1/foo.txt
group_b_1/group_b_1/foo.txt
hello.txt
```

Then after running `mapping_dir/blind.sh`, `output_dir` will contain

```
head_bottle/head_bottle/foo.txt
eponym_curtain/eponym_curtain/foo.txt
hello.txt
```

In `mapping_dir` you will also find a file `mapping.csv` with the contents:

```
original,blinded
group_a_1,head_bottle
group_b_1,eponym_curtain
```

#### Limitations
No identifier can be a substring of any other identifier. For example, it is
not allowed to have identifiers `sample_1` and `sample_11`. However,
`sample_01` and `sample_11` would be fine.
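
It can be useful to check an identifier list for this problem before running the tool. The following sketch is a standalone check, not something the package is known to provide; the function name `find_substring_conflicts` is hypothetical.

```python
from itertools import permutations


def find_substring_conflicts(identifiers):
    """Return (shorter, longer) pairs where one identifier occurs inside another."""
    return [(a, b) for a, b in permutations(identifiers, 2) if a in b]


print(find_substring_conflicts(["sample_1", "sample_11"]))   # [('sample_1', 'sample_11')]
print(find_substring_conflicts(["sample_01", "sample_11"]))  # []
```

An empty result means the list satisfies the restriction above. Note that a duplicated identifier is also reported, since a string is a substring of itself.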

Credits
-------
This package was created with
[Cookiecutter](https://github.com/audreyr/cookiecutter-pypackage).

The noun list used to generate the blinded names comes from
[here](http://www.desiquintans.com/downloads/nounlist/nounlist.txt).


History
=======

0.1.0 (2018-05-15)
------------------

* First release on PyPI.



