PlasmidPoolAnalysis

Analysis for plasmid pool sequencing data

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3.7
Topic
- Software Development :: Build Tools

Project description

Dependencies

Build reference

The pre-built reference files used for the analysis can be found in:
1. human grch37: /home/rothlab/rli/02_dev/06_pps_pipeline/fasta/human_ensembl/grch37
2. human grch38: /home/rothlab/rli/02_dev/06_pps_pipeline/fasta/human_ensembl/grch38
3. human 9.1: /home/rothlab/rli/02_dev/06_pps_pipeline/fasta/human_91
4. yeast (plate-specific): /home/rothlab/rli/02_dev/06_pps_pipeline/fasta/yeast_ref_all
If you need to build new references, please make sure that:
1. The name of the reference matches the name of the sequencing files. For example, the corresponding reference for scORFeome-HIP-05_L001.fastq.gz is scORFeome-HIP-05.
2. The ID of each sequence matches the ORF-id in the summary file.
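
The naming rule above can be checked before a run. The following is a minimal sketch, not part of the package: the helper names are hypothetical, and it assumes FASTQ names end in an optional lane tag such as _L001 followed by .fastq or .fastq.gz, as in the example above.

```python
import re
from pathlib import Path

# Assumption: FASTQ names end in an optional lane tag (e.g. _L001), followed by
# .fastq or .fastq.gz. Other naming schemes would need a different pattern.
_FASTQ_SUFFIX = re.compile(r"(_L\d{3})?\.fastq(\.gz)?$")


def reference_name(fastq_path):
    """Return the reference name expected for one FASTQ file."""
    return _FASTQ_SUFFIX.sub("", Path(fastq_path).name)


def fastqs_without_reference(fastq_paths, reference_names):
    """List FASTQ files whose derived name is not among the references."""
    known = set(reference_names)
    return [str(p) for p in fastq_paths if reference_name(p) not in known]
```

For instance, reference_name("scORFeome-HIP-05_L001.fastq.gz") gives scORFeome-HIP-05, which is the name the reference would need to carry.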

Make summary file

The summary files for human and yeast are made before running the pipeline. They are located at /home/rothlab/rli/02_dev/06_pps_pipeline/target_orfs/human_summary.csv and /home/rothlab/rli/02_dev/06_pps_pipeline/target_orfs/yeast_summary.csv
If you are making your own summary file, make sure it has a column named orf_name, which is the unique identifier for each ORF. These identifiers must also match the sequence names in the FASTA file you make. To select the columns you want to keep, modify analysisHuman or analysisYeast in main.py.
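
A custom summary file can be checked against its FASTA file before running the pipeline. This is a rough sketch under stated assumptions rather than the package's own validation: the function names are hypothetical, the summary is assumed to be a comma-separated file with a header row, and a sequence ID is assumed to be the first word after ">" in each FASTA header.

```python
import csv
from collections import Counter


def load_orf_names(summary_handle):
    """Read orf_name values from an open summary CSV and check uniqueness."""
    reader = csv.DictReader(summary_handle)
    if not reader.fieldnames or "orf_name" not in reader.fieldnames:
        raise ValueError("summary file has no 'orf_name' column")
    names = [row["orf_name"] for row in reader]
    duplicated = sorted(n for n, count in Counter(names).items() if count > 1)
    if duplicated:
        raise ValueError(f"orf_name values are not unique: {duplicated}")
    return names


def fasta_ids(fasta_handle):
    """Collect sequence IDs (first word after '>') from an open FASTA file."""
    ids = []
    for line in fasta_handle:
        if line.startswith(">") and line[1:].strip():
            ids.append(line[1:].split()[0])
    return ids


def compare_ids(orf_names, ids):
    """Return (ORFs missing from the FASTA, FASTA IDs missing from the summary)."""
    orfs, seqs = set(orf_names), set(ids)
    return sorted(orfs - seqs), sorted(seqs - orfs)
```

Both returned lists being empty would suggest that the summary file and the FASTA file agree on their identifiers.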

Input FASTQ files

FASTQ files:
1. human (files from the same group are merged together): /home/rothlab/rli/01_ngsdata/PPS_data/Human_pool/merged_pool9-1/
2. yeast (files from the same plate are merged together): /home/rothlab/rli/01_ngsdata/PPS_data/yeast_pps_fastq/yeast_pps_fastq/

Install and Run

Install the package using: ``

usage: pps [-h] [--align] [-f FASTQ] [-n NAME] -o OUTPUT -r REF
           [--refName REFNAME] [--summaryFile SUMMARYFILE] [--orfseq ORFSEQ]

Plasmid pool sequencing analysis

required arguments:
  -f FASTQ, --fastq FASTQ
                        path to fastq files
  -o OUTPUT, --output OUTPUT
                        Output directory
  -r REF, --ref REF     Path to reference
  -m MODE, --mode MODE  human or yeast
  --summaryFile SUMMARYFILE
                        Yeast or Human summary file

optional arguments:
  -h, --help            show this help message and exit
  --align               provide this argument if users want to start with
                        alignment, otherwise the program assumes alignment was
                        done and will analyze the vcf files.
  -n NAME, --name NAME  Run name (default set to pps)
  --refName REFNAME     grch37, grch38, cds_seq. Required if mode == human
  -l LOG, --log LOG     logging mode, default set to info

Example: Human (with alignment to grch37)

pps -f ~/01_ngsdata/PPS_data/Human_pool/merged_pool9-1/ -o ../../output/ -n Human91 --refName human91 --summaryFile ../../target_orfs/human_summary.csv -m human -r ../../fasta/human_91/ --align

Example: Yeast

pps -f ~/01_ngsdata/PPS_data/yeast_pps_fastq/yeast_pps_fastq/ -o ../../output/ -n testpackYeast --summaryFile ../../target_orfs/yeast_summary.csv -m yeast -r ../../fasta/yeast_ref_all/

The pipeline first submits alignment jobs to the cluster (Slurm). After all the jobs are done, it filters the VCF files and outputs the summary and mutations.
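
When many runs are launched from a script, the command line can be assembled programmatically. The sketch below is illustrative only: build_pps_command is a hypothetical helper, not part of the package, and it uses only the flags listed in the usage text above, including the rule that --refName is required when the mode is human.

```python
def build_pps_command(fastq, output, ref, mode, summary_file,
                      name=None, ref_name=None, align=False):
    """Assemble an argument list for the pps command (hypothetical helper)."""
    if mode not in ("human", "yeast"):
        raise ValueError("mode must be 'human' or 'yeast'")
    if mode == "human" and not ref_name:
        raise ValueError("--refName is required when mode is human")

    cmd = ["pps", "-f", fastq, "-o", output, "-r", ref,
           "-m", mode, "--summaryFile", summary_file]
    if name:
        cmd += ["-n", name]          # run name; the tool defaults to "pps"
    if ref_name:
        cmd += ["--refName", ref_name]
    if align:
        cmd.append("--align")        # start from alignment instead of VCF files
    return cmd
```

The returned list can be passed to subprocess.run as is, which avoids shell quoting problems with paths that contain spaces.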

Output

All the intermediate files will be saved into your output directory, in a new folder named after the -n parameter.
For each fastq file, a folder will be made. It contains the following files:
1. *.sh: job script used for alignment
2. all_summary_plateORFs.csv: summary for this plate/group
3. *.log: log file
4. *_raw.vcf: raw vcf file generated from pileup
5. *_variants.vcf: vcf file with variants only
6. *_filtered.vcf: filtered vcf file
After the run is finished, the following files will be generated in the master output folder:
1. alignment_log.csv: shows the alignment rate for each plate/group
2. all_mutations.csv: contains all the variants that passed the filter
3. all_summary.csv: contains all ORFs and whether they were found/fully covered in the sequencing
4. genes_stats.csv: overall stats
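
The file lists above can be turned into a quick completeness check after a run. This is a sketch under assumptions, not part of the package: missing_outputs is a hypothetical helper, the run folder is assumed to be the folder named after the -n parameter, and the per-fastq folders are assumed to be its direct subdirectories.

```python
from pathlib import Path

# File names and patterns taken from the lists above.
MASTER_FILES = ("alignment_log.csv", "all_mutations.csv",
                "all_summary.csv", "genes_stats.csv")
SAMPLE_PATTERNS = ("*.sh", "all_summary_plateORFs.csv", "*.log",
                   "*_raw.vcf", "*_variants.vcf", "*_filtered.vcf")


def missing_outputs(run_dir):
    """Return the expected output files that are absent from a run folder."""
    run_dir = Path(run_dir)
    missing = [name for name in MASTER_FILES if not (run_dir / name).is_file()]
    # Assumption: every subdirectory of the run folder belongs to one fastq file.
    for sample_dir in sorted(p for p in run_dir.iterdir() if p.is_dir()):
        for pattern in SAMPLE_PATTERNS:
            if not any(sample_dir.glob(pattern)):
                missing.append(f"{sample_dir.name}/{pattern}")
    return missing
```

An empty list would indicate that every listed file is present. It says nothing about whether the contents are complete.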

This version

0.1.0

Oct 27, 2021

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

PlasmidPoolAnalysis-0.1.0-py3-none-any.whl (51.3 kB)

Uploaded Oct 27, 2021 Python 3

File details

Details for the file PlasmidPoolAnalysis-0.1.0-py3-none-any.whl.

File metadata

Download URL: PlasmidPoolAnalysis-0.1.0-py3-none-any.whl
Upload date: Oct 27, 2021
Size: 51.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.1 requests/2.25.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.7.10

File hashes

Hashes for PlasmidPoolAnalysis-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fe235f0f1f9c86a915c95e455701c2167be0f1624340db9224130425bea3f7bb`
MD5	`7157b36ee10009dfcdecbf81aada8067`
BLAKE2b-256	`1b733fd39862dfe4377bb965f66b9d9c5ca5956373fd9c76a01b2088f094685f`
