
Small tools/scripts written in Python for MDU


MDU Python Tools


Background

Some simple tools in Python for MDU.

Tools

mdu-merge-ngs-lanes

Use it to correctly merge the lanes from an Illumina run into a single FASTQ file.

Get help:

mdu-merge-ngs-lanes --help

Basic usage:

mdu-merge-ngs-lanes -i /path/to/fastq_folder -o /path/to/output > cmd.sh
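The redirect to cmd.sh suggests the tool prints shell commands rather than running them. The sketch below illustrates what lane merging involves, assuming bcl2fastq-style file names (<sample>_S<n>_L<lane>_R<read>_001.fastq.gz). The function name merge_commands and the output naming <sample>_R<read>.fastq.gz are hypothetical and are not taken from mdu-pytools.

```python
import re
import shlex
from collections import defaultdict
from pathlib import Path

# Assumed bcl2fastq-style names: <sample>_S<n>_L<lane>_R<read>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def merge_commands(fastq_paths, outdir):
    """Return one `cat` command per (sample, read), with lanes in ascending order."""
    groups = defaultdict(list)
    for path in map(Path, fastq_paths):
        m = FASTQ_NAME.match(path.name)
        if m is None:
            continue  # skip files that do not follow the assumed naming scheme
        groups[(m["sample"], m["read"])].append((int(m["lane"]), str(path)))

    commands = []
    for (sample, read), lanes in sorted(groups.items()):
        inputs = " ".join(shlex.quote(p) for _, p in sorted(lanes))
        # Hypothetical output name: one file per sample and read.
        output = shlex.quote(str(Path(outdir) / f"{sample}_R{read}.fastq.gz"))
        # Concatenated gzip members form a valid gzip stream, so cat is enough.
        commands.append(f"cat {inputs} > {output}")
    return commands


for cmd in merge_commands(
    ["/in/sampleA_S1_L002_R1_001.fastq.gz", "/in/sampleA_S1_L001_R1_001.fastq.gz"],
    "/out",
):
    print(cmd)
```

Sorting by the numeric lane keeps reads in lane order regardless of the order in which files are listed.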

Advanced usage:

You can split the output into multiple subfolders of the output folder by adding --subfolder to the command line. The option can be used multiple times and takes two space-separated values: path and regex. The path is the name of the subfolder within the output folder, and the regex determines which samples go into that subfolder.

For instance, the command below puts samples whose names start with NTC into a subfolder called ntc, and all other samples into a subfolder called data.

mdu-merge-ngs-lanes -i /path/to/fastq -o /path/to/output --subfolder 'data' '^(?!NTC).*' --subfolder 'ntc' '^NTC.*' > cmd.sh
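A minimal sketch of regex-based routing, assuming each pattern is tested against the sample name with Python's re.search in the order the --subfolder options are given. This is an assumption: the source does not say whether the tool uses re.match or re.search, and assign_subfolder and the sample names are hypothetical. Anchoring both patterns with ^ keeps them mutually exclusive under either behaviour. A lookbehind such as (?<=NTC) inspects text before the current position, so under re.match it can never match at the start of a name.

```python
import re

# Hypothetical (path, regex) pairs mirroring two --subfolder options.
# Both patterns are anchored with ^ so each sample name matches exactly one.
SUBFOLDERS = [
    ("data", r"^(?!NTC).*"),
    ("ntc", r"^NTC.*"),
]


def assign_subfolder(sample_name, subfolders=SUBFOLDERS):
    """Return the first subfolder whose regex matches the sample name, or None."""
    for path, pattern in subfolders:
        if re.search(pattern, sample_name):
            return path
    return None


for name in ["SAMPLE01", "NTC01", "SAMPLE02"]:
    print(name, "->", assign_subfolder(name))
```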

mdu-sra-uploads

Use it to upload FASTQ data to NCBI SRA.

Requires a file of tab-separated MDU ID and AUSMDUID pairs, one isolate per line. For example:

mdu1\tausmdu1
mdu2\tausmdu2
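Before uploading, it can help to check that the isolates file really has two tab-separated columns per line. The sketch below is a hypothetical validator using Python's csv module; read_isolates is not part of mdu-pytools, and the check that blank lines are skipped is an assumption about what the tool tolerates.

```python
import csv
import io


def read_isolates(handle):
    """Return (mdu_id, ausmdu_id) pairs from a two-column tab-separated file."""
    pairs = []
    for lineno, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 tab-separated columns, got {len(row)}"
            )
        mdu_id, ausmdu_id = (field.strip() for field in row)
        pairs.append((mdu_id, ausmdu_id))
    return pairs


example = io.StringIO("mdu1\tausmdu1\nmdu2\tausmdu2\n")
print(read_isolates(example))
```

For a real file, open it with open("isolates.txt", newline="") and pass the handle to read_isolates.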

Getting help:

mdu-sra-uploads --help
Usage: mdu-sra-upload [OPTIONS] ISOLATES

Options:
  -f, --folder TEXT         Folder on NCBI to upload. Used to find the reads
                            when submitting via the SRA portal.  [default:
                            mdu]
  -r, --reads-folder TEXT   Where reads are located (uses MDU_READS env
                            variable if available).
  -k, --ascp-key TEXT       Path to ascp ssh upload key (uses ASCP_UPLOAD_KEY
                            env variable if available). This can be obtained
                            from the SRA Submission Portal.
  -s, --sra-subfolder TEXT  SRA subfolder owned by you where data will copied
                            to (uses SRA_SUBFOLDER env variable is available).
  --help                    Show this message and exit.

Basic usage:

cd /path/for/upload
# create isolates.txt here (e.g., copy-paste the tab-separated ID pairs into it)
mdu-sra-uploads isolates.txt
# when completing the submission in the SRA portal, search for the pre-uploaded files in the folder called mdu

Environment variables that can be used to set options:

  • MDU_READS: full path to where FASTQ data is stored
  • ASCP_UPLOAD_KEY: full path to where your Aspera NCBI upload key is located (obtain one from the SRA submission portal under the Aspera command line instructions)
  • SRA_SUBFOLDER: path to your folder at SRA, usually composed of your email address, an underscore, and some random alphanumeric characters (e.g., john.doe@doe.industries.com_qEWo9). This can be obtained from the SRA Submission Portal under the Aspera command line instructions.
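The --help output describes each of these as a fallback ("uses ... env variable if available"). A minimal sketch of that precedence, assuming a value given on the command line wins over the environment variable; resolve_options, missing_options, and the example paths are hypothetical, and the variable names follow the --help output.

```python
import os

# Option -> environment variable, as named in the --help output.
ENV_FALLBACKS = {
    "reads_folder": "MDU_READS",
    "ascp_key": "ASCP_UPLOAD_KEY",
    "sra_subfolder": "SRA_SUBFOLDER",
}


def resolve_options(cli_values, environ=None):
    """Use a value given on the command line if present, else the environment variable."""
    environ = os.environ if environ is None else environ
    resolved = {}
    for option, env_name in ENV_FALLBACKS.items():
        value = cli_values.get(option)
        resolved[option] = value if value is not None else environ.get(env_name)
    return resolved


def missing_options(resolved):
    """Return the options that are still unset after the fallback."""
    return [option for option, value in resolved.items() if value is None]


opts = resolve_options(
    {"reads_folder": "/data/reads"},
    environ={"ASCP_UPLOAD_KEY": "/home/user/aspera.openssh"},
)
print(missing_options(opts))
```

Running a check like missing_options before an upload surfaces an unset variable early instead of partway through a transfer.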

Development

Development environment

To develop in the same environment, use Vagrant and VirtualBox:

vagrant up
vagrant ssh

Once logged in to the VM, the shared folder is in /vagrant.

Download files


Source Distribution

mdu-pytools-0.2.0.tar.gz (5.5 kB)


Built Distribution

mdu_pytools-0.2.0-py3-none-any.whl (6.1 kB, Python 3)

