Simple bioinformatic tools

WangLab

Provides several simple bioinformatics scripts.

Introduction

This project was written by a master's student working under the supervision of Dr. Wang Qiyao. It contains several simple bioinformatics and sequence-manipulation scripts. It will not be updated or supported after the student graduates.

The project contains four modules: Sequence_operate, TIS, ChIP_seq and RNA_seq. For detailed usage, see Usage.

Install

The easiest way to install WangLab is through PyPI. See the INSTALL document for details.

In general, you can install from PyPI with pip install WangLab. Using a virtual environment is highly recommended. Alternatively, download the released package from GitHub, unzip it, and place the scripts into the corresponding folders of your Python environment.

Usage

See the Sequence_operate, ChIP_seq, TIS and RNA_seq pages in docs.

Example Usage

Here are examples of how to use subcommands in different modules.

Sequence_operate

extract_seqs

wanglab extract_seqs -i pos.txt -r ref.fa -o out1.fa

In this example, pos.txt contains three columns: name, start and end. ref.fa contains a single sequence. For each row, the program extracts the sequence between start and end and writes the results to out1.fa.
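As a rough illustration of coordinate-based extraction, the following Python sketch slices one reference sequence using name/start/end rows. It is not WangLab's implementation: the function names, the whitespace-separated columns and the 1-based inclusive coordinates are all assumptions made for this example.

```python
def read_single_fasta(text):
    """Return the sequence of a single-record FASTA string (header dropped)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return "".join(line for line in lines if not line.startswith(">"))


def extract_regions(pos_text, ref_seq):
    """Yield (name, subsequence) for each 'name start end' row.

    Assumption: coordinates are 1-based and inclusive.
    """
    for row in pos_text.splitlines():
        if not row.strip():
            continue
        name, start, end = row.split()[:3]
        yield name, ref_seq[int(start) - 1:int(end)]


# Toy data standing in for ref.fa and pos.txt.
ref = read_single_fasta(">ref\nATGCGTAC\nGGCTA\n")
regions = list(extract_regions("geneA 1 4\ngeneB 9 13\n", ref))

# Format the extracted regions as FASTA text.
fasta_out = "".join(f">{name}\n{seq}\n" for name, seq in regions)
print(fasta_out)
```

If the real tool uses 0-based or half-open coordinates, only the slice expression would need to change.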

wanglab extract_seqs -i names.txt -r seqs.fa -o out2.fa

In this example, names.txt contains one column listing the names of the sequences to extract. seqs.fa contains several sequences. The program extracts the sequences whose names appear in names.txt and writes them to out2.fa.
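The name-based mode can be pictured with a small Python sketch like the one below. The helper names are hypothetical, and the choice to treat the first word of each header line as the sequence name is an assumption, not something the WangLab docs state.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the name is the first word of the header line.
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def select_by_name(records, names_text):
    """Return the records whose names are listed one per line in names_text."""
    wanted = [n.strip() for n in names_text.splitlines() if n.strip()]
    return {n: records[n] for n in wanted if n in records}


# Toy data standing in for seqs.fa and names.txt.
seqs = parse_fasta(">s1 first\nATGC\nGG\n>s2\nTTTT\n>s3\nCCGA\n")
selected = select_by_name(seqs, "s3\ns1\n")
print(selected)
```

In this sketch the output follows the order of the names list rather than the order of the FASTA file, and names missing from the FASTA are silently skipped.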

calc_content

wanglab calc_content -i pos.txt -r ref.fa -o out3.txt

In this example, pos.txt and ref.fa are exactly the same as in extract_seqs. The program calculates the GC content of each extracted sequence and writes the results to out3.txt.
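For reference, GC content is the share of G and C bases in a sequence. A minimal Python sketch follows. The function name is hypothetical, and whether WangLab reports a fraction or a percentage, or how it treats ambiguous bases such as N, is not stated here, so this version simply divides by the full sequence length.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Toy sequences standing in for the regions extracted from ref.fa.
regions = {"geneA": "ATGC", "geneB": "GGCTA"}
table = {name: round(gc_content(seq), 2) for name, seq in regions.items()}
print(table)
```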

file_merge

wanglab file_merge -d ./input_files/ -f fasta -o out4.fa

In this example, input_files is a directory that contains several fasta files. The program merges all fasta files in the input_files directory and writes the result to out4.fa.
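Merging amounts to concatenating the files, as in the Python sketch below. The function name, the file extensions that are matched and the alphabetical merge order are assumptions for illustration. The actual subcommand may select and order files differently.

```python
import tempfile
from pathlib import Path


def merge_fasta(input_dir, output_path, suffixes=(".fa", ".fasta")):
    """Concatenate every FASTA file in input_dir into output_path.

    Returns the merged file names in the order they were written.
    """
    input_dir = Path(input_dir)
    output_path = Path(output_path)
    # Collect the inputs first so the output file is never read as an input.
    sources = sorted(
        p for p in input_dir.iterdir()
        if p.is_file()
        and p.suffix in suffixes
        and p.resolve() != output_path.resolve()
    )
    with output_path.open("w") as out:
        for path in sources:
            text = path.read_text()
            # Make sure each file ends with a newline so records do not run together.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
    return [p.name for p in sources]


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "a.fa").write_text(">a\nATGC\n")
    (tmp / "b.fasta").write_text(">b\nGGCC")  # no trailing newline
    merged_names = merge_fasta(tmp, tmp / "merged.out")
    merged_text = (tmp / "merged.out").read_text()

print(merged_names)
print(merged_text)
```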

ossutil

wanglab ossutil -conf config.txt

In this example, config.txt contains the information used to download NGS data through oss-util, i.e., AccessKeyId, AccessKeySecret, OSS_path, endpoint_path and local_dir. The first four are provided by the company, and local_dir is the directory where the downloaded files are stored.

Note: To use this subcommand, users need to manually install oss-util.

TIS

Starting from fq.gz files, the following examples show a common workflow for TIS analysis.

cutadapt

wanglab cutadapt -d . -p Tn5 -t tn-seq

In this example, all fq.gz files are in the current folder (.), which is specified by -d. -p and -t specify the plasmid (Tn5) and the data type (tn-seq), respectively. This step removes the 3' and 5' adaptor sequences from the raw data.

bowtie

wanglab bowtie -d . -r ./ref.fa -@ 4 -t tn-seq

In this example, all trimmed files generated by cutadapt are still in the current folder, so we set -d to the current folder (.). We use -r to set the path to the reference genome file ref.fa. The reference sequence name on its first line, Contig00001, will be used later. -@ is used to make this process faster, and again we set -t to tn-seq to specify the data type.

This step will map trimmed reads to the reference genome.

count_reads

wanglab count_reads -d . -i ./annot.gff -c Contig00001 -r "ID=(.+?);" -l 4703168

In this example, all SAM files generated by bowtie are in the current folder, which is specified by -d. Similarly, the GFF-formatted annotation file annot.gff is specified by -i. Contig00001, which was mentioned in bowtie, is also the value in the first column of annot.gff. Then, we use -r to set the regular expression ID=(.+?); which extracts the feature name between ID= and the following semicolon. Finally, -l is set to 4703168, which means that the region between the end of the last feature and position 4703168 is the last region in which reads are counted.

This step yields a table that contains the read counts of every feature.
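To see what the regular expression passed with -r captures, here is a short Python sketch using the standard re module. The pattern is the one from the command above. The GFF line is made up for illustration, and how WangLab applies the pattern internally is an assumption.

```python
import re

# Pattern taken from the -r option in the count_reads example.
FEATURE_NAME = re.compile(r"ID=(.+?);")

# Hypothetical GFF line (tab-separated, nine columns) invented for this example.
gff_line = "Contig00001\tprediction\tCDS\t337\t2799\t.\t+\t0\tID=gene_0001;Name=thrA;"

columns = gff_line.split("\t")
# The attributes live in the ninth column; the lazy group stops at the first ';'.
match = FEATURE_NAME.search(columns[8])
feature_name = match.group(1) if match else None
print(columns[0], columns[2], feature_name)

# An ID with no trailing ';' is not matched by this pattern.
no_match = FEATURE_NAME.search("ID=gene_0002")
print(no_match)
```

Because the pattern requires a semicolon after the ID, annotation files whose ID is the final attribute with no trailing semicolon would need a different expression.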

Note: The 3rd column of annot.gff should be CDS.

combine_reads

wanglab combine_reads -d .

Finally, we combine all the CSV files in the current folder (.), set by -d, and export a single Excel file.

ChIP_seq

cutadapt

wanglab cutadapt -d . -t chip-seq

In this example, all fq.gz files are in the current folder (.), which is specified by -d. -t specifies the data type, chip-seq. This step removes the 3' adaptor sequence from the raw data.

bowtie

wanglab bowtie -d . -r ./ref.fa -@ 4 -t chip-seq

In this example, all trimmed files generated by cutadapt are still in the current folder, so we set -d to the current folder (.). We use -r to set the path to the reference genome file ref.fa. -@ is used to make this process faster, and again we set -t to chip-seq to specify the data type.

This step maps the trimmed reads to the reference genome and converts the SAM files to BAM files, which are much smaller. The program also sorts the BAM files and generates index files for visualization in other tools such as IGV.

macs

wanglab macs -t ./treat.bam -c ./control.bam -l eib202 --keep-dup all

In this example, the treatment and control files set by -t and -c are used to call peaks with macs3 callpeak. -l is set to eib202 so that the genome length of eib202 is looked up and used by the macs3 program. We also set --keep-dup to all to keep all reads.

Note: The macs subcommand does not expose most of the parameters provided by macs3 callpeak. It is therefore recommended to use macs3 callpeak directly.

RNA_seq

Logs

23.12.21

Write docs of TIS and example usage in README.md
Write docs of ChIP-seq and example usage in README.md

23.12.20

Write docs of Sequence_operate
Write example usage in README.md

23.12.19

Enable installation using PyPI
Test Sequence_operate module, the following need to be fixed: primer-generator, primer_blast, qPCR
Test TIS module.
Test ChIP-seq module.

TO DO

Fix: primer-generator, primer_blast, qPCR
Test RNA_seq
Write doc of usage.

Release history

1.0.0b0 (pre-release): May 13, 2024
0.1.1 (this version): Apr 12, 2024
0.0.18: Mar 12, 2024

Download files

Source Distribution

WangLab-0.1.1.tar.gz (26.6 kB)

Uploaded Apr 12, 2024 Source

Built Distribution

WangLab-0.1.1-py3-none-any.whl (30.3 kB)

Uploaded Apr 12, 2024 Python 3

File details

Details for the file WangLab-0.1.1.tar.gz.

File metadata

Download URL: WangLab-0.1.1.tar.gz
Upload date: Apr 12, 2024
Size: 26.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.11.3

File hashes

Hashes for WangLab-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`904dddcd189a24acba013c6270d1644222205a014702c4a8ca1596c4bbf57929`
MD5	`e5314975cdee16d79edce5fa795bae0d`
BLAKE2b-256	`0562a4266f6cdef6e012bea314c5aefa19e4a29118b872040bf4bba7959e7b99`

File details

Details for the file WangLab-0.1.1-py3-none-any.whl.

File metadata

Download URL: WangLab-0.1.1-py3-none-any.whl
Upload date: Apr 12, 2024
Size: 30.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.11.3

File hashes

Hashes for WangLab-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`94940eb29e36e0d056803f857281ac41b4f144509cf16c745824db98b1f2ea5c`
MD5	`8767b3d5b899ab7f6a22ac345d833a27`
BLAKE2b-256	`8c16e244c50aad211c3ace4a185ddf92f6051216b2714ab4ce84e7c5a58683be`
