Tools for analyzing biological sequences
BioSequences
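About this project
BioSequences is a package that bundles basic, commonly used tools for biological sequence analysis. It aims to speed up routine sequence-analysis workflows and to provide basic building blocks for large-scale data analysis.
For the full documentation, see Document.
Installation
Install with pip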
pip install biosequences
Install from source
Building from source requires a C compiler: the Microsoft VC++ build tools on Windows, or gcc (or another compiler) on Linux.
git clone https://github.com/Dragon-GCS/BioSequences.git
cd BioSequences
python -m pip install .
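Examples
Loading sequences
bioseq can read sequences from standard FASTA files or from the NCBI and Ensembl databases. When the argument passed to a fetch function is a list, the target sequences are fetched in a batch.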
>>> from bioseq.utils import loadFasta, fetchNCBI, fetchENS
>>> sequence1 = loadFasta("/path/to/file.fasta")
>>> bsa = fetchNCBI("NP_851335.1")
>>> actin = fetchENS("ENST00000614376")
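For readers unfamiliar with the FASTA format, the following is a minimal sketch of what reading such a file involves. It is not the implementation of loadFasta; the helper name parse_fasta and its dict return type are assumptions made for illustration only.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dict.

    Hypothetical helper for illustration; not part of bioseq.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = ""
        elif header is not None:
            # A sequence may span several lines; join them in order.
            records[header] += line.upper()
    return records


example = """>seq1 demo record
ATCG
GCAT
>seq2
augc
""".splitlines()

records = parse_fasta(example)
print(records)
```

Each header line starts a new record, and every following line is appended to that record until the next header appears.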
Basic sequence operations
bioseq.RNA, bioseq.DNA and bioseq.Peptide all inherit from bioseq.Sequence, so their basic operations are largely the same.
- View the basic properties of a sequence
>>> actin.GC, actin.length
(0.5, 102)
>>> actin.composition
{'A': 24, 'C': 18, 'G': 33, 'T': 27}
>>> actin.seq
'AGAAACTTTAGCATCTGGCTAGGAGCATCTGTGGTGGCTCACCTTTCTACCTATACGTGTGAGTGGGTGACCTGAGAGGAGTACGGTGAGCATATGAGGATG'
>>> round(bsa.weight, 1)
69334.4
>>> bsa.pI
6.805
>>> round(bsa.chargeInpH(7.4), 2)
-13.76
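The composition and GC values above are related in a simple way: with 18 C and 33 G out of 102 bases, (18 + 33) / 102 = 0.5. A minimal sketch of that arithmetic follows, assuming GC is the fraction of bases that are G or C. The helper names composition and gc_content are hypothetical and are not bioseq's API.

```python
from collections import Counter


def composition(seq: str) -> dict:
    """Count each residue in the sequence, sorted by residue letter."""
    return dict(sorted(Counter(seq.upper()).items()))


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C; 0.0 for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


demo = "ATCGGCAT"
print(composition(demo))
print(gc_content(demo))
```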
- DNA and RNA sequences can be transcribed with transcript(), and DNA sequences also provide a translate() method for translation. The start codon can be customized through bioseq.config.START_CODON, and the codon table can be customized by modifying bioseq.config.CODON_TABLE.
>>> from bioseq.config import START_CODON, CODON_TABLE
>>> actin.transcript()
>>> START_CODON[0] = 'AGA'
>>> actin.transcript()
[N-RNFSIWLGASVVAHLSTYTCEWVT-C]
>>> CODON_TABLE["AAC"] = "Y"
>>> actin.transcript()
[N-RYFSIWLGASVVAHLSTYTCEWVT-C]
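How a start-codon list and a codon table together determine the resulting peptide can be sketched in plain Python. This is not bioseq's implementation. The function translate_sketch, the rule "begin at the first start codon found and stop at the first stop codon", and the 21-base fragment (the first 18 bases of the sequence above plus a TGA stop codon) are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_sketch(seq, start_codons=("ATG",), table=STANDARD_TABLE):
    """Translate from the first start codon up to the first stop codon."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    # Find the first position where any start codon occurs.
    start = next(
        (i for i in range(len(seq) - 2) if seq[i:i + 3] in start_codons),
        None,
    )
    if start is None:
        return ""  # no start codon, nothing to translate
    peptide = []
    for j in range(start, len(seq) - 2, 3):
        residue = table[seq[j:j + 3]]
        if residue == "*":  # stop codon ends the peptide
            break
        peptide.append(residue)
    return "".join(peptide)


fragment = "AGAAACTTTAGCATCTGGTGA"
default_result = translate_sketch(fragment)
custom_start = translate_sketch(fragment, start_codons=("AGA",))
custom_table = translate_sketch(fragment, ("AGA",), dict(STANDARD_TABLE, AAC="Y"))
print(repr(default_result), custom_start, custom_table)
```

With the default start codon the fragment yields an empty result because it contains no ATG. Adding AGA as a start codon yields a peptide, and remapping AAC changes only the second residue.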
- Two sequences of the same type can be concatenated
>>> from bioseq import DNA
>>> dna1 = DNA("ATCG")
>>> dna2 = DNA("GCAT")
>>> dna1 + dna2
"5'-ATCGGCAT-3'"
>>> dna2 + dna1
"5'-GCATATCG-3'"
- Modify a sequence with the mutation() method
>>> dna1.mutation("ATC", "GGG")
'GGGG'
>>> dna1.mutation(0, "AT")
'ATGG'
>>> dna1.mutation([0, 3], "C")
'CTGC'
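The three calls above suggest three ways of addressing a change: by substring, by a single position, and by a list of positions. The sketch below reproduces those results on plain strings. The semantics are inferred from the example output only, and the function mutate is hypothetical, not bioseq's implementation. Unlike the method above, it returns a new string instead of modifying an object.

```python
def mutate(seq, target, new):
    """Return a copy of seq with target replaced by new.

    target may be a substring, an index, or a list of indices
    (behavior inferred from the examples, not from bioseq's source).
    """
    if isinstance(target, str):
        return seq.replace(target, new)  # replace every occurrence
    positions = [target] if isinstance(target, int) else list(target)
    chars = list(seq)
    for pos in positions:
        # Overwrite len(new) characters starting at pos.
        chars[pos:pos + len(new)] = new
    return "".join(chars)


step1 = mutate("ATCG", "ATC", "GGG")
step2 = mutate(step1, 0, "AT")
step3 = mutate(step2, [0, 3], "C")
print(step1, step2, step3)
```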
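- Sequence implements two basic sequence-alignment algorithms in C, Needleman-Wunsch global alignment and Smith-Waterman local alignment, for fast sequence comparison (local alignment returns only the matched local subsequences).
>>> DNA("GCATGCT").align("GATTACA")
('GCA-TGCT', 'G-ATTACA', -4.0)
>>> DNA("GCATGCT").align("GATTACA", 2)
('AT', 'AT', 4.0)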
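To illustrate why a local alignment returns only a short matched region, here is a pure-Python sketch of Smith-Waterman. It is not the package's C implementation. It assumes a single linear gap penalty, not separate open and extend penalties, and the function name and default scores are chosen for illustration.

```python
def smith_waterman(a, b, match=2.0, mismatch=-3.0, gap=-3.0):
    """Local alignment with a linear gap penalty (illustrative sketch)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0.0] * cols for _ in range(rows)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Scores are floored at zero, so a poor prefix never drags
            # down a good local region.
            H[i][j] = max(0.0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), best


local_a, local_b, local_score = smith_waterman("GCATGCT", "GATTACA")
print(local_a, local_b, local_score)
```

For this pair of sequences more than one two-base region reaches the best score, so the region returned depends on how ties are broken and may differ from the output shown above.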
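The first two values returned by align() are the aligned sequences, and the third is the alignment score. Use bioseq.utils.printAlign() to display the result in a more readable form.
>>> from bioseq.utils import printAlign
>>> seq1, seq2, score = DNA("GCATGCT").align("GATTACA")
>>> printAlign(seq1, seq2)
1 GCA-TGCT
  ┃━┃━┃•┃•
1 G-ATTACA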
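The middle line of that display marks each column as a match, a gap, or a mismatch. A minimal sketch of how such a line could be derived from two aligned strings is shown below. The function match_line is hypothetical, and the symbol meanings are inferred from the output above, not taken from printAlign's source.

```python
def match_line(seq1: str, seq2: str) -> str:
    """Build a per-column marker line for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    symbols = []
    for x, y in zip(seq1, seq2):
        if x == "-" or y == "-":
            symbols.append("━")  # gap in either sequence
        elif x == y:
            symbols.append("┃")  # identical residues
        else:
            symbols.append("•")  # mismatch
    return "".join(symbols)


line = match_line("GCA-TGCT", "G-ATTACA")
print("GCA-TGCT")
print(line)
print("G-ATTACA")
```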
The scores and penalties used for alignment can be changed through bioseq.config.AlignmentConfig. The defaults are MATCH (2.0), MISMATCH (-3.0), GAP_OPEN (-3.0) and GAP_EXTEND (-3.0).
>>> from bioseq.config import AlignmentConfig
>>> AlignmentConfig.GAP_OPEN = -10
>>> DNA("GCATGCT").align("GATTACA")
('GCATGCT', 'GATTACA', -6.0)
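The effect of the gap penalty on a global alignment score can be reproduced with a small Needleman-Wunsch scorer. This is a sketch under simplifying assumptions, not the package's C code. It folds GAP_OPEN and GAP_EXTEND into one linear gap penalty, and the Scoring dataclass and global_score function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scoring:
    """Illustrative scoring scheme with a single linear gap penalty."""
    match: float = 2.0
    mismatch: float = -3.0
    gap: float = -3.0


def global_score(a: str, b: str, cfg: Optional[Scoring] = None) -> float:
    """Needleman-Wunsch score, keeping only two rows of the DP matrix."""
    cfg = cfg or Scoring()
    # Row 0: aligning a prefix of b against nothing costs one gap per base.
    prev = [j * cfg.gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * cfg.gap]
        for j, y in enumerate(b, start=1):
            s = cfg.match if x == y else cfg.mismatch
            curr.append(max(prev[j - 1] + s,        # align x with y
                            prev[j] + cfg.gap,      # gap in b
                            curr[j - 1] + cfg.gap)) # gap in a
        prev = curr
    return prev[-1]


default_score = global_score("GCATGCT", "GATTACA")
strict_score = global_score("GCATGCT", "GATTACA", Scoring(gap=-10.0))
print(default_score, strict_score)
```

With the default penalty the best alignment uses two gaps. With a penalty of -10 per gap, opening gaps costs more than accepting mismatches, so the ungapped alignment wins.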
Contributors
Acknowledgements