dtool CLI utilities for working with per-item metadata
Installation
pip install dtool-overlay
Example usage
Get a dataset to play with:
LOCAL_DS_URI=$(dtool cp -q http://bit.ly/Ecoli-ref-genome .)
Show the existing overlays:
$ dtool overlays show $LOCAL_DS_URI
identifiers,relpaths
23ebd7cd21a905d5f255919ca1d0491901cb8718,reference.4.bt2
37e2d68bb38271036d96b6979d24666e0d4fd814,reference.rev.1.bt2
41fb9ae5d4f6c37226ff324c701b84bc3110709e,reference.1.bt2
828ebf503926b7c1b8b07c1995b4ca818814b404,reference.rev.2.bt2
b445ff5a1e468ab48628a00a944cac2e007fb9bc,U00096.3.fasta
d21454a7338c53eabc8d8ed7c2f9c3ff4585c4cf,reference.3.bt2
dda8452b346d51b9cf60f0662ef3d6e3b6da2e74,reference.2.bt2
The output above shows that there are no overlays on this dataset. (The “identifiers” and “relpaths” columns are there for bookkeeping.)
Create an “is_fasta” boolean overlay template by using a glob pattern:
$ dtool overlays template glob $LOCAL_DS_URI is_fasta '*.fasta' > is_fasta.csv
$ cat is_fasta.csv
identifiers,is_fasta,relpaths
23ebd7cd21a905d5f255919ca1d0491901cb8718,False,reference.4.bt2
37e2d68bb38271036d96b6979d24666e0d4fd814,False,reference.rev.1.bt2
41fb9ae5d4f6c37226ff324c701b84bc3110709e,False,reference.1.bt2
828ebf503926b7c1b8b07c1995b4ca818814b404,False,reference.rev.2.bt2
b445ff5a1e468ab48628a00a944cac2e007fb9bc,True,U00096.3.fasta
d21454a7338c53eabc8d8ed7c2f9c3ff4585c4cf,False,reference.3.bt2
dda8452b346d51b9cf60f0662ef3d6e3b6da2e74,False,reference.2.bt2
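The True/False column comes from testing each relpath against the glob pattern. The sketch below mimics that test with the shell's own pattern matching. It only illustrates the idea and is not dtool's implementation; the helper name `is_fasta` is made up for this example.

```shell
# Hypothetical helper: print True if the relpath matches '*.fasta', else False.
is_fasta() {
    case "$1" in
        *.fasta) echo True ;;
        *)       echo False ;;
    esac
}

# Emit "is_fasta,relpath" rows for two relpaths taken from the dataset above.
for relpath in reference.1.bt2 U00096.3.fasta; do
    echo "$(is_fasta "$relpath"),$relpath"
done
```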
Write the overlay template to the dataset:
$ dtool overlays write $LOCAL_DS_URI is_fasta.csv
Show the newly created overlay:
$ dtool overlays show $LOCAL_DS_URI
identifiers,is_fasta,relpaths
23ebd7cd21a905d5f255919ca1d0491901cb8718,False,reference.4.bt2
37e2d68bb38271036d96b6979d24666e0d4fd814,False,reference.rev.1.bt2
41fb9ae5d4f6c37226ff324c701b84bc3110709e,False,reference.1.bt2
828ebf503926b7c1b8b07c1995b4ca818814b404,False,reference.rev.2.bt2
b445ff5a1e468ab48628a00a944cac2e007fb9bc,True,U00096.3.fasta
d21454a7338c53eabc8d8ed7c2f9c3ff4585c4cf,False,reference.3.bt2
dda8452b346d51b9cf60f0662ef3d6e3b6da2e74,False,reference.2.bt2
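Because the overlays are printed as CSV, ordinary text tools can query them. The sketch below uses awk to list the relpaths flagged True. It works on two sample rows copied from the output above, saved to a scratch file whose name (`overlay_sample.csv`) is arbitrary, and it assumes that no relpath contains a comma.

```shell
# Sample rows copied from the output above, saved to a scratch file.
cat > overlay_sample.csv <<'EOF'
identifiers,is_fasta,relpaths
41fb9ae5d4f6c37226ff324c701b84bc3110709e,False,reference.1.bt2
b445ff5a1e468ab48628a00a944cac2e007fb9bc,True,U00096.3.fasta
EOF

# Skip the header row (NR > 1) and print the relpath of every row flagged True.
fasta_relpaths=$(awk -F, 'NR > 1 && $2 == "True" { print $3 }' overlay_sample.csv)
echo "$fasta_relpaths"
```

With a real dataset, the same awk filter could presumably read the output of `dtool overlays show $LOCAL_DS_URI` through a pipe instead of a file.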
To extract multiple pieces of metadata from each item’s relpath, use the dtool overlays template parse command. It takes as input a dataset URI, a parse rule (see https://pypi.org/project/parse/ for more details) and a glob rule. The glob rule decides which relpaths the parse rule is applied to.
Consider for example the dataset below:
$ dtool ls http://bit.ly/Ecoli-reads-minified
8bda245a8cd526673aab775f90206c8b67d196af ERR022075_2.fastq.gz
9760280dc6313d3bb598fa03c5931a7f037d7ffc ERR022075_1.fastq.gz
The command below could be used to generate a template for the overlays “useful_name” and “read”:
$ dtool overlays template parse \
    http://bit.ly/Ecoli-reads-minified \
    '{useful_name}_{read:d}.fastq.gz'
This results in the CSV output below:
identifiers,read,useful_name,relpaths
8bda245a8cd526673aab775f90206c8b67d196af,2,ERR022075,ERR022075_2.fastq.gz
9760280dc6313d3bb598fa03c5931a7f037d7ffc,1,ERR022075,ERR022075_1.fastq.gz
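To make the parse rule more concrete, the sketch below hand-translates '{useful_name}_{read:d}.fastq.gz' into a sed regular expression and prints the "read", "useful_name" and "relpaths" columns for a relpath. This is an approximation for illustration only: the parse library's matching rules differ in detail, and the helper name `parse_relpath` is made up for this example.

```shell
# Hypothetical helper: split "<useful_name>_<read>.fastq.gz" into CSV fields.
# \1 captures useful_name, \2 captures the integer read, & is the whole relpath.
# Relpaths that do not match the rule produce no output.
parse_relpath() {
    printf '%s\n' "$1" |
        sed -n 's/^\(.*\)_\([0-9][0-9]*\)\.fastq\.gz$/\2,\1,&/p'
}

parse_relpath ERR022075_2.fastq.gz
parse_relpath ERR022075_1.fastq.gz
```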
To ignore a variable element when parsing, use unnamed curly braces. The command below, for example, generates only the “useful_name” overlay:
$ dtool overlays template parse \
    http://bit.ly/Ecoli-reads-minified \
    '{useful_name}_{:d}.fastq.gz'
identifiers,useful_name,relpaths
8bda245a8cd526673aab775f90206c8b67d196af,ERR022075,ERR022075_2.fastq.gz
9760280dc6313d3bb598fa03c5931a7f037d7ffc,ERR022075,ERR022075_1.fastq.gz
Sometimes it is useful to be able to find pairs of items, for example when dealing with genomic sequencing data that has forward and reverse reads.
One can create a “pair_id” overlay CSV template for this dataset using the command below:
$ dtool overlays template pairs http://bit.ly/Ecoli-reads-minified .fastq.gz
identifiers,pair_id,relpaths
8bda245a8cd526673aab775f90206c8b67d196af,9760280dc6313d3bb598fa03c5931a7f037d7ffc,ERR022075_2.fastq.gz
9760280dc6313d3bb598fa03c5931a7f037d7ffc,8bda245a8cd526673aab775f90206c8b67d196af,ERR022075_1.fastq.gz
In the above, the suffix “.fastq.gz” is used to extract the prefix ERR022075_, which is then used to find matching pairs.
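One way to picture the prefix matching is sketched below. It assumes that the two members of a pair differ only in a trailing read number placed just before the suffix. That assumption, and the helper name `pair_prefix`, are made up for this illustration; dtool's own matching logic may work differently.

```shell
# Hypothetical helper: drop the suffix, then drop the trailing read number,
# leaving the prefix that both members of a pair share.
pair_prefix() {
    stem=${1%"$2"}                                   # ERR022075_2.fastq.gz -> ERR022075_2
    printf '%s\n' "$stem" | sed 's/[0-9][0-9]*$//'   # ERR022075_2 -> ERR022075_
}

first=$(pair_prefix ERR022075_1.fastq.gz .fastq.gz)
second=$(pair_prefix ERR022075_2.fastq.gz .fastq.gz)

# Two items are treated as a pair when their prefixes are identical.
if [ "$first" = "$second" ]; then
    echo "pair with shared prefix $first"
fi
```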