Analysis scripts for BFG-Y2H data
BFG Y2H Analysis Pipeline
Requirements
- Python 3.7
- Bowtie 2 (bowtie2 and bowtie2-build)
Files required
The pipeline requires reference files before running. They can be found on GALEN:
All reference files contain all the barcodes in FASTA format.
Path: /home/rothlab/rli/02_dev/08_bfg_y2h/bfg_data/reference/
Before running the pipeline, you need to copy everything in these two folders to your designated directory.
Build new reference
If you need to build a new reference for your analysis, follow these steps:
- You can refer to the create_fasta.py script to build the new fasta file
- Make sure the sequence names follow the format:
>*;ORF-BC-ID;*;up/dn
In other words, the ORF-BC-ID should always be the second item, and the up/dn identifier should always be the last item (see examples below).
- Example sequences in output fasta file:
>G1;YDL169C_BC-1;7;up
CCCTTAGAACCGAGAGTGTGGGTTAAATGGGTGAATTCAGGGATTCACTCCGTTCGTCACTCAATAA
>G1;YMR206W_BC-1;1.0;DB;up
CCATACGAGCACATTACGGGGCTTGAGTTATATAGTCGATCCGGGCTAACTCGCATACCTCTGATAAC
>G09;56346_BC-1;24126.0;DB;dn
TCGATAGGTGCGTGTGAAGGATGTTCCCCCGGTCACCGGGCCAGTCCTCAGTCGCTCAGTCAAG
- After making the fasta file, build the index with bowtie2-build:
bowtie2-build filename.fasta filename
- Update main.py to use the summary files you generated
- Edit parse_input_files() to add a case
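Before running bowtie2-build, it can help to check that every header in the new fasta file follows the naming rule above. The following is a minimal sketch, not part of the package: the names BarcodeHeader, parse_header and check_fasta are hypothetical, and the requirement of at least three ';'-separated fields is an assumption derived from the rule that the ORF-BC-ID is the second item and up/dn is the last.

```python
from typing import NamedTuple


class BarcodeHeader(NamedTuple):
    """Fields the pipeline relies on (hypothetical helper type)."""
    orf_bc_id: str  # second ';'-separated item, e.g. "YDL169C_BC-1"
    tag: str        # last item: "up" or "dn"


def parse_header(line: str) -> BarcodeHeader:
    """Parse a header of the form >*;ORF-BC-ID;*;up/dn."""
    if not line.startswith(">"):
        raise ValueError(f"not a FASTA header: {line!r}")
    fields = line[1:].strip().split(";")
    # Assumption: at least three items are needed so that the second
    # item (ORF-BC-ID) and the last item (up/dn) are distinct.
    if len(fields) < 3:
        raise ValueError(f"too few ';'-separated items: {line!r}")
    tag = fields[-1]
    if tag not in ("up", "dn"):
        raise ValueError(f"last item must be 'up' or 'dn': {line!r}")
    return BarcodeHeader(orf_bc_id=fields[1], tag=tag)


def check_fasta(path: str) -> int:
    """Validate every header in a fasta file; return the record count."""
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                parse_header(line)  # raises ValueError on a bad header
                count += 1
    return count
```

For example, parse_header(">G1;YMR206W_BC-1;1.0;DB;up") yields the ORF-BC-ID "YMR206W_BC-1" and the tag "up", regardless of how many items sit between them.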
Running the pipeline
- Install from PyPI (recommended):
python -m pip install BFG-Y2H
- Install and build from GitHub (update.sh might need to be modified before you install):
1. Download the package from GitHub
2. Inside the root folder, run ./update.sh
- Input arguments:
usage: bfg [-h] [--fastq FASTQ] [--output OUTPUT] --mode MODE [--alignment]
           [--ref REF] [--cutOff CUTOFF]

BFG-Y2H

optional arguments:
  -h, --help       show this help message and exit
  --fastq FASTQ    Path to all fastq files you want to analyze
  --output OUTPUT  Output path for sam files
  --mode MODE      pick yeast or human or virus or hedgy or LAgag
  --alignment      turn on alignment
  --ref REF        path to all reference files
  --cutOff CUTOFF  assign cut off
- All the input fastq files should have names following the format: y|hADDBGFP(pre|med|high) (for human and yeast)
- Run the pipeline on GALEN
# this will run the pipeline using slurm
# all the fastq files in the given folder will be processed
# run with alignment
bfg --fastq /path/to/fastq_files/ --output /path/to/output_dir/ --mode yeast/human/virus/hedgy --alignment --ref path/to/reference
# if alignment has already finished and you only want read counts, omit --alignment
bfg --fastq /path/to/fastq_files/ --output /path/to/output_dir/ --mode yeast/human/virus/hedgy --ref path/to/reference
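When the two invocations above are launched from a script, assembling the argument list in one place avoids typos in the flags. The sketch below is only an illustration: build_bfg_command is a hypothetical helper, it uses only the flags shown in the usage message, and the type of the --cutOff value is an assumption (it is passed through as a string).

```python
from typing import List, Optional

# Modes listed in the help text for --mode.
VALID_MODES = ("yeast", "human", "virus", "hedgy", "LAgag")


def build_bfg_command(
    fastq_dir: str,
    output_dir: str,
    mode: str,
    ref_dir: str,
    alignment: bool = True,
    cutoff: Optional[float] = None,
) -> List[str]:
    """Return the bfg command line as a list suitable for subprocess.run."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    cmd = [
        "bfg",
        "--fastq", fastq_dir,
        "--output", output_dir,
        "--mode", mode,
        "--ref", ref_dir,
    ]
    if alignment:
        # Leave this flag out to skip alignment and only do read counts.
        cmd.append("--alignment")
    if cutoff is not None:
        cmd += ["--cutOff", str(cutoff)]
    return cmd
```

On GALEN, the result can be passed to subprocess.run(cmd, check=True); calling the helper with alignment=False produces the read-count-only form of the command.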
Output files
- After running the pipeline, one folder will be generated for each group pair (yADDB)
- The folder called GALEN_jobs saves all the bash scripts submitted to GALEN
- In the output folder for each group pair, R1 and R2 are aligned separately to the reference sequences for GFP_pre, GFP_med and GFP_high:
- *_sorted.sam: raw sam files generated by bowtie2
- *_noh.csv: condensed sam files, used for scoring
- *_counts.csv: barcode counts for uptags, dntags, and combined (up+dn)
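To check that a run produced every expected file, the outputs in a group-pair folder can be grouped by the suffixes listed above. This is a minimal sketch under the assumption that all three file types sit directly inside the group-pair folder; collect_outputs is a hypothetical helper and makes no assumption about the contents of the files.

```python
from pathlib import Path
from typing import Dict, List

# Suffixes of the output files described above.
SUFFIXES = ("_sorted.sam", "_noh.csv", "_counts.csv")


def collect_outputs(group_dir: str) -> Dict[str, List[Path]]:
    """Group the files directly inside group_dir by output suffix."""
    found: Dict[str, List[Path]] = {suffix: [] for suffix in SUFFIXES}
    for path in sorted(Path(group_dir).iterdir()):
        if not path.is_file():
            continue
        for suffix in SUFFIXES:
            if path.name.endswith(suffix):
                found[suffix].append(path)
    return found
```

An empty list for any suffix suggests that the corresponding step (alignment, sam reduction, or counting) did not finish for that group pair.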