A tool for handling large amounts of WGS data from suppressor screens and for predicting suppressor genes

Project description

Gene Torch 1.3.5

REALLY REALLY A LOT OF UPDATES

Includes a protein folder that contains the AlphaFold models of all C. elegans proteins as .apf (amino acid position file) files.

You can read them with the pcluster module.

What is new in version 1.2.0

genetorch.finder.plotpro(readfile,size,c='Reds',sub_c='RdPu',origin = a)

A more rigorous version of plot that distinguishes genes using the binomial distribution instead of overlap. The calculation is somewhat slower but more accurate. The genes you tried are saved in a.candidate, and the sample names for each candidate gene are listed in a.suppressor_group. Note that the earlier genetorch.finder.plot(result,size,c,sub_c) takes a result table as input, whereas plotpro takes the whole readfile() or getfile() object.

genetorch.reader.get_impact(readfile)

This function now annotates splice_donor and splice_acceptor variations as 'X_donor' and 'X_acceptor', because splice variations do affect protein function.

genetorch.finder.get_p_sup(gene,readfile)

Returns a DataFrame with the p values of the gene you chose versus every other gene.

genetorch.finder.get_p_between(readfile,genea,geneb)

Returns the p value between genea and geneb.

genetorch.finder.find(readfile)/filter(readfile,lengthlimit,rid)

The threshold argument of filter has been removed because it was not very useful and confused users. Both functions now run genetorch.reader.get_impact automatically. In some cases you may need to reset the candidate and suppressor_group lists; running find or filter again does this.

genetorch.simulator.false_positive(readfile,gene,[0.05,0.04,0.03,0.02,0.01])

Estimates the number of false positive cases when the gene you chose is compared with the others. This step takes about 5 minutes for each value.
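The binomial idea mentioned for plotpro can be illustrated with a short standalone sketch. This is not genetorch's implementation: the helper `binomial_tail`, the sample counts, and the 5% background probability are all assumptions chosen for illustration. It computes the probability of seeing at least k mutated strains out of n if each strain hit the gene independently with probability p.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers: 8 of 40 sequenced strains carry a mutation in one gene,
# and a background chance of 5% per strain is assumed.
p_value = binomial_tail(8, 40, 0.05)
print(f"{p_value:.2e}")
```

A small tail probability suggests that the gene is hit more often than the assumed background would explain, which is the kind of signal a suppressor candidate should show.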

What is new in version 1.1.1

1. You can remove annoying background variations by excluding genes when you call filter

genetorch.finder.filter(taglist,lengthlimit,threshold,rid = ['ttn-1','cla-1'])

Genes listed in `rid` are excluded from your results.
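The effect of `rid` can be pictured with plain Python. This is only a sketch of the exclusion step, not the library's code: `drop_genes` is a hypothetical helper, and the rows (including the gene unc-13) are made-up example data.

```python
def drop_genes(rows, rid):
    """Return the rows whose 'gene' is not listed in rid."""
    excluded = set(rid)
    return [row for row in rows if row["gene"] not in excluded]


rows = [
    {"gene": "ttn-1", "variation_number": 2},
    {"gene": "cla-1", "variation_number": 1},
    {"gene": "unc-13", "variation_number": 3},  # hypothetical example row
]
kept = drop_genes(rows, rid=["ttn-1", "cla-1"])
print([row["gene"] for row in kept])  # ['unc-13']
```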

2. You can use the intersection function to find multiple candidate suppressors.

genetorch.finder.plot(result,size,c='Reds',sub_c='RdPu',intersection_num = 0.03)

c = color map used in the major graph

sub_c = color map used in other graphs

intersection_num = 0.03

For example: if you think gene A is a suppressor, samples that carry other mutations should be less likely to also carry a gene A mutation. intersection_num is the fraction of samples (default 0.03, i.e. 3%) that you allow to carry more than one suppressor.

You can now find the intersection by clicking on the bubble of a gene in the graph.
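One way to read this rule is sketched below. The function `multi_suppressor_fraction`, the gene names, and the sample sets are hypothetical, and the comparison against `intersection_num` is an interpretation of the description above rather than the library's actual logic.

```python
def multi_suppressor_fraction(gene_samples, candidates, n_samples):
    """Fraction of all samples that carry mutations in two or more candidates."""
    counts = {}
    for gene in candidates:
        for sample in gene_samples[gene]:
            counts[sample] = counts.get(sample, 0) + 1
    shared = [sample for sample, count in counts.items() if count > 1]
    return len(shared) / n_samples


# Hypothetical data: only cas115 carries mutations in both candidate genes.
gene_samples = {
    "geneA": {"cas113", "cas114", "cas115"},
    "geneB": {"cas115", "cas116"},
}
intersection_num = 0.03
fraction = multi_suppressor_fraction(gene_samples, ["geneA", "geneB"], n_samples=100)
print(fraction, fraction <= intersection_num)  # 0.01 True
```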

Other changes

genetorch.finder.intersection(result,samplelist)

Finds the genes that are not mutated in the given samples. Returns a DataFrame.

genetorch.finder.gsamplename(result,gene)

Returns a list of the samples that carry a mutation in this gene.

genetorch.reader.get_impact(taglist)

Removes the lines in your data that do not have an amino acid change.
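The filtering described for get_impact, together with the splice-site labels introduced in 1.2.0, might look roughly like the sketch below. It is an illustration under assumptions: the record layout, the use of an empty 'protein' value to mean "no amino acid change", the choice to write the labels into the 'protein' field, and the helper name `keep_impactful` are not taken from the library.

```python
def keep_impactful(records):
    """Keep records with a protein change; label splice-site variants."""
    kept = []
    for rec in records:
        if "splice_donor" in rec["type"]:
            kept.append({**rec, "protein": "X_donor"})
        elif "splice_acceptor" in rec["type"]:
            kept.append({**rec, "protein": "X_acceptor"})
        elif rec["protein"]:  # assumed: empty string means no amino acid change
            kept.append(rec)
    return kept


# Hypothetical records using SnpEff-style annotation names.
records = [
    {"gene": "ttn-1", "type": "missense_variant", "protein": "Asp666Asn"},
    {"gene": "cla-1", "type": "intron_variant", "protein": ""},
    {"gene": "cla-1", "type": "splice_donor_variant", "protein": ""},
]
result = keep_impactful(records)
print([rec["protein"] for rec in result])  # ['Asp666Asn', 'X_donor']
```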

1.0.3 update:

Bug fixes.

filter is more accurate and faster.

Fixed the problem where files could not be decoded as 'utf-8' on macOS.

pip install genetorch

PLEASE INSTALL: pandas, matplotlib, seaborn BEFORE USE

This package helps find candidate genes in high-throughput mutagenesis and suppressor screening experiments without mapping. Call variants with freebayes and annotate them with snpeff before using this package. Any advice is welcome; please contact:

[e-mail]:guozhengyang980525@yahoo.co.jp

genetorch.reader

genetorch.reader.readfile(filepath)

The filepath must contain multiple renamed vcf files.

filepath
|---1.vcf
|---2.vcf
|---3.vcf
|---4.vcf
|---5.vcf
|---6.vcf

genetorch.reader.getfile(filepath,filename)

The filepath must contain multiple folders, each holding a vcf file named filename. Each folder name must be separated by '_' into a strain name and a WGS order name. Example:

filepath
|---cas113_20221011jxskaosdosh---filename.vcf
|---cas114_20221011jxskaosdosh---filename.vcf
|---cas115_20221011jxskaosdosh---filename.vcf
|---cas116_20221011jxskaosdosh---filename.vcf
|---cas117_20221011jxskaosdosh---filename.vcf
|---cas118_20221011jxskaosdosh---filename.vcf

A temp folder containing the renamed vcf files is created automatically in the filepath:

filepath
|----temp
|     |---cas113.vcf
|     |---cas114.vcf
|     |---cas115.vcf
|     |---cas116.vcf
|     |---cas117.vcf
|     |---cas118.vcf
|---cas113_20221011jxskaosdosh---filename.vcf
|---cas114_20221011jxskaosdosh---filename.vcf
|---cas115_20221011jxskaosdosh---filename.vcf
|---cas116_20221011jxskaosdosh---filename.vcf
|---cas117_20221011jxskaosdosh---filename.vcf
|---cas118_20221011jxskaosdosh---filename.vcf
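The naming rule above can be sketched in a few lines. This only illustrates how a folder name maps to a renamed vcf file; `split_folder_name` is a hypothetical helper, and splitting at the first underscore is an assumption about how strain and order names are separated.

```python
def split_folder_name(folder_name):
    """Split 'strain_order' at the first underscore."""
    strain, _, order = folder_name.partition("_")
    return strain, order


folders = ["cas113_20221011jxskaosdosh", "cas114_20221011jxskaosdosh"]
for folder in folders:
    strain, order = split_folder_name(folder)
    # Show which file in temp each original vcf would correspond to.
    print(f"{folder}/filename.vcf -> temp/{strain}.vcf")
```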

a = genetorch.reader.readfile(filepath)
a = genetorch.reader.getfile(filepath,filename)

a.taglist: a list of DataFrames with the columns 'gene', 'ID', 'type', 'base', 'protein', 'tag'. The 'tag' column is filled with the strain name. Example, a.taglist[1]:

gene	ID	type	base	protein	tag
ttn-1	WBGenexxxx	missense	C<G	Asp666Asn	cas113
cla-1	WBGenexxxx	missense	C<G	Asp223Asn	cas114
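For orientation, the columns above correspond naturally to fields of the SnpEff `ANN` annotation in a vcf INFO string. The sketch below parses the first annotation of one INFO string. The mapping of ANN fields to the taglist columns is an assumption, `parse_ann` is a hypothetical helper, and the INFO string (including the transcript name T1.1) is made-up example data.

```python
def parse_ann(info):
    """Extract the first SnpEff annotation from a VCF INFO string."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            # Annotations are comma-separated; fields within one are '|'-separated.
            first = entry[4:].split(",")[0].split("|")
            return {
                "gene": first[3],      # Gene_Name
                "ID": first[4],        # Gene_ID
                "type": first[1],      # Annotation (effect)
                "base": first[9],      # HGVS.c
                "protein": first[10],  # HGVS.p
            }
    return None


info = (
    "DP=30;ANN=A|missense_variant|MODERATE|ttn-1|WBGenexxxx|transcript|T1.1|"
    "protein_coding|5/10|c.1996G>A|p.Asp666Asn|1996/3000|1996/3000|666/1000||"
)
record = parse_ann(info)
print(record["gene"], record["type"], record["protein"])
```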
`genetorch.finder`
`genetorch.finder.find(readfile())`
Returns a pandas DataFrame.
Example:
sample	sample size	gene	variation	variation_number
----------	--------------	-----	----	-----
cas113,cas114	2	ttn-1	Asp666Asn,1 Glu374Gln,2	2
In this row, ttn-1 mutations are found in 2 input files. There are three amino acid variation events: 2 * Glu374Gln and 1 * Asp666Asn.
The gene therefore has two kinds of variation.
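The row above can be reproduced from per-sample calls with a small counting sketch. The input tuples are made-up data that match the example row, and `summarize_gene` is a hypothetical helper, not part of genetorch.

```python
from collections import Counter


def summarize_gene(calls, gene):
    """Summarize samples and variation events for one gene."""
    samples = sorted({sample for sample, g, _ in calls if g == gene})
    events = Counter(protein for _, g, protein in calls if g == gene)
    return {
        "sample": ",".join(samples),
        "sample size": len(samples),
        "variation": dict(events),
        "variation_number": len(events),  # number of distinct variations
    }


# (sample, gene, protein change) calls; hypothetical data.
calls = [
    ("cas113", "ttn-1", "Asp666Asn"),
    ("cas113", "ttn-1", "Glu374Gln"),
    ("cas114", "ttn-1", "Glu374Gln"),
]
summary = summarize_gene(calls, "ttn-1")
print(summary)
```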
`genetorch.finder.filter(readfile(),lengthlimit = 0.6,rid = ['ttn-1','cla-1'])`
Returns a pandas DataFrame in the same format as find(), with the data filtered as follows:
lengthlimit: if n(amino acid variation) > lengthlimit * n(input files), the variation is excluded.
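The lengthlimit rule can be written out as a short sketch. It assumes that n(amino acid variation) is the number of input files in which a variation occurs; `apply_lengthlimit` and the counts are hypothetical.

```python
def apply_lengthlimit(variation_counts, n_files, lengthlimit=0.6):
    """Drop variations seen more than lengthlimit * n_files times."""
    limit = lengthlimit * n_files
    return {var: n for var, n in variation_counts.items() if n <= limit}


# Hypothetical counts across 10 input files: the limit is 0.6 * 10 = 6.
counts = {"Asp666Asn": 1, "Glu374Gln": 9}
kept = apply_lengthlimit(counts, n_files=10)
print(kept)  # {'Asp666Asn': 1}
```

A variation that appears in most strains is more likely to be background than a suppressor, which is presumably why such variations are excluded.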

If a gene has no variation left after filtering, it is deleted from the result DataFrame.

`genetorch.finder.plot(result,size,intersection_num)`

result: the result DataFrame from find() or filter()
size: the number of leading rows of the result to plot
x axis: gene name
y axis: variation number
bubble size: sample size
color: variation number / sample size
Each bubble is annotated automatically.

`genetorch.stocker`

Module for uploading data to the LIM database.

`genetorch.stocker.stock(path,filename,outpath)`

path & filename

The path must contain multiple folders, each holding a vcf file named filename. Each folder name must be separated by '_' into a strain name and a WGS order name. Example:

filepath
|---cas113_20221011jxskaosdosh---filename.vcf
|---cas114_20221011jxskaosdosh---filename.vcf
|---cas115_20221011jxskaosdosh---filename.vcf
|---cas116_20221011jxskaosdosh---filename.vcf
|---cas117_20221011jxskaosdosh---filename.vcf
|---cas118_20221011jxskaosdosh---filename.vcf

A temp folder containing the renamed vcf files is created automatically in the filepath:

filepath
|----temp
|     |---cas113.vcf
|     |---cas114.vcf
|     |---cas115.vcf
|     |---cas116.vcf
|     |---cas117.vcf
|     |---cas118.vcf
|---cas113_20221011jxskaosdosh---filename.vcf
|---cas114_20221011jxskaosdosh---filename.vcf
|---cas115_20221011jxskaosdosh---filename.vcf
|---cas116_20221011jxskaosdosh---filename.vcf
|---cas117_20221011jxskaosdosh---filename.vcf
|---cas118_20221011jxskaosdosh---filename.vcf

outpath:

outpath is an empty folder. The output csv files, which can be uploaded to the LIM database, are written to this folder.

Examples

import genetorch as gt

# path, filename and outpath are placeholders; set them to your own locations
a = gt.reader.readfile(path)  # read the vcf files in path
b = gt.reader.getfile(path, filename)  # read the vcf file named filename in each folder in path
# a has the same properties as b
gt.reader.get_impact(a)  # get impact results from a (find and filter also run this automatically)
d = gt.finder.find(a)  # get the result table
e = gt.finder.filter(a, lengthlimit=0.6, rid=['ttn-1', 'cla-1'])  # get the filtered result table
gt.finder.plot(e, 1000, 1)  # plot the first 1000 rows of the result table,
# allowing 1 sample to have more than one suppressor gene
gt.stocker.stockfile(path, filename, outpath)  # write the stock files

Release history

Version	Release date
1.3.6 (this version)	Sep 24, 2022
1.3.5	Sep 24, 2022
1.3.4	Sep 24, 2022
1.3.3	Sep 23, 2022
1.3.2	Sep 23, 2022
1.3.1	Sep 11, 2022
1.3	Sep 10, 2022
1.2.51	Sep 10, 2022
1.2.6	Sep 10, 2022
1.2.5	Sep 8, 2022
1.2.4	Sep 7, 2022
1.2.3	Sep 6, 2022
1.2.2	Sep 6, 2022
1.2.1	Sep 6, 2022
1.2.0	May 8, 2022
1.1.3	May 6, 2022
1.1.2	May 1, 2022
1.1.1	Apr 24, 2022
1.1.0	Apr 24, 2022

