Filters a .gtf file of suspected HIV isoforms and confirms the isoform identities
HIV Isoform Checker
This package takes a .gtf file of preliminarily filtered HIV transcripts from ONT sequencing and filters them to include only correctly assigned transcripts using the following filters.
- FILTER 1: only include class codes =, J, and m
- FILTER 2: only include samples with end values >= min_end_bp and start values <= max_start_bp
- FILTER 3: get rid of any samples with read errors/small gaps
- FILTER 4: keep only correct Env samples
- FILTER 5: keep only correct Nef samples (long samples are added to possible_misassigned)
- FILTER 6: keep only correct Rev samples (long samples are added to possible_misassigned)
- FILTER 7: keep only correct Tat samples (long samples are added to possible_misassigned)
- FILTER 8: keep only correct Vif samples
- FILTER 9: keep only correct Vpr samples
- FILTER 10: check possible_misassigned for partial splice compatibility (vif -> vpr -> unspliced_tat -> env)
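As a rough illustration of the first two filters, the check below keeps a transcript only if its class code is one of those listed and its coordinates fall within the start and end limits. This is a minimal sketch, not the package's own code: the function name `passes_filters_1_and_2` and the constant `ALLOWED_CLASS_CODES` are hypothetical, and the thresholds simply mirror the documented defaults for `--startBP` and `--endBP`.

```python
# Hypothetical names; this is an illustrative sketch, not the package's code.
# Class codes are written exactly as listed under FILTER 1.
ALLOWED_CLASS_CODES = {"=", "J", "m"}


def passes_filters_1_and_2(class_code, start, end,
                           max_start_bp=700, min_end_bp=9500):
    """Return True if a transcript survives FILTER 1 and FILTER 2.

    FILTER 1: the class code must be one of the allowed codes.
    FILTER 2: start must be <= max_start_bp and end must be >= min_end_bp.
    """
    if class_code not in ALLOWED_CLASS_CODES:
        return False
    return start <= max_start_bp and end >= min_end_bp


# Example with made-up coordinates:
kept = passes_filters_1_and_2("=", start=455, end=9600)
dropped_by_code = passes_filters_1_and_2("u", start=455, end=9600)
dropped_by_start = passes_filters_1_and_2("m", start=800, end=9600)
```

The later filters depend on the splice donor, splice acceptor, and CDS coordinates in the reference file, so they are not sketched here.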
Note: This code currently relies on a very specific layout of the gtf file to work properly. The entries in the final (note) column must appear in the order shown in the brackets.
- transcript entry = ref genome, analysis_pathway, transcript, start, end, ".", "+", ".", [transcript id; gene id; gene_name; xloc; ref_gene_id; contained_in; cmp_ref; class_code; tss_id]
- exon entry = ref genome, analysis_pathway, exon, start, end, ".", "+", ".", [transcript id; gene id; exon number]
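To make the expected layout concrete, the sketch below splits one gtf line into its nine columns and turns the final column into a dictionary. It assumes the usual GTF conventions (tab-separated columns, and `key "value";` pairs in the last column), which the note above does not spell out. The function name `parse_gtf_line` and the example line are hypothetical.

```python
def parse_gtf_line(line):
    """Split one tab-separated gtf line into named fields.

    Assumes standard GTF formatting: nine tab-separated columns, with the
    last column holding semicolon-separated `key "value"` pairs.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, found {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields

    attributes = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the trailing semicolon
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')

    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


# Made-up exon entry in the layout described above.
example = "\t".join([
    "HIV_ref", "pipeline", "exon", "455", "743", ".", "+", ".",
    'transcript_id "T1"; gene_id "G1"; exon_number "1";',
])
record = parse_gtf_line(example)
```

Because Python dictionaries preserve insertion order, `list(record["attributes"])` can be compared against the required attribute order when validating an input file.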
Installation
Fast install:
pip install HIV_Isoform_Checker
Usage
HIV_Isoform_Checker [options] input_file_name output_file_prefix ref_file_name
positional arguments:
Argument | Function |
---|---|
input_file_name | Designates input file to be filtered. This is required. |
output_file_prefix | Designates output file prefix. This is required. |
ref_file_name | Designates reference CDS file name. This should be a python file with only a dictionary with the splice donor sites, splice acceptor sites and gene CDS regions defined. This is required. An example is available in the test data set. |
options:
Argument | Function |
---|---|
-h, --help | show this help message and exit |
-g value, --gap value | Sets gap tolerance. Default is 15. |
-a value, --startBP value | Sets maximum starting bp. Default is 700. |
-z value, --endBP value | Sets minimum ending bp. Default is 9500. |
-l value, --lengthFS value | Sets maximum fully spliced transcript length. Default is 2100. |
-n value, --NCE value | When set to True, the csv file will have y/n columns for the presence of NCEs. Default is False. |
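For scripted runs, the command line can be assembled from Python. The sketch below uses only the flags and defaults from the table above. The helper names `build_command` and `run_checker` are hypothetical, the file names are placeholders, and it assumes the `-n` option accepts the literal strings `True` and `False`.

```python
import subprocess


def build_command(input_file_name, output_file_prefix, ref_file_name,
                  gap=15, start_bp=700, end_bp=9500,
                  length_fs=2100, nce=False):
    """Build the argument list for one HIV_Isoform_Checker run.

    Defaults mirror the option table; options come before the three
    positional arguments, as in the usage line.
    """
    return [
        "HIV_Isoform_Checker",
        "-g", str(gap),
        "-a", str(start_bp),
        "-z", str(end_bp),
        "-l", str(length_fs),
        "-n", str(nce),  # assumption: the CLI accepts "True"/"False"
        input_file_name,
        output_file_prefix,
        ref_file_name,
    ]


def run_checker(*args, **kwargs):
    """Run the installed command and raise if it exits with an error."""
    return subprocess.run(build_command(*args, **kwargs), check=True)


# Placeholder file names; only the gap tolerance differs from the defaults.
command = build_command("input.gtf", "filtered", "ref_CDS.py", gap=20)
```

Calling `run_checker("input.gtf", "filtered", "ref_CDS.py", gap=20)` would then execute the same command, provided the package is installed and the files exist.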
License
MIT - Copyright (c) 2023 Jessica Lauren Albert