A small example package

RWexptest

This is a simple example package.

Pseqpa

This toolkit performs simple processing of protein sequences. Its main functions are:

excel_csv_to_fasta

from RWexptest import Pseqpa

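# Specify the input and output folder paths
input_folder = "<path to the files to convert>"
output_folder = "<save path>"
entry_column_name = "Entry"  # replace with the name of your entry column
sequence_column_name = "Sequence"  # replace with the name of your sequence column

# Call the function with the input and output folder paths
# (accessed through the imported Pseqpa module, since only Pseqpa is imported above)
Pseqpa.excel_csv_to_fasta(input_folder, output_folder, entry_column_name, sequence_column_name)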
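
To make the conversion concrete, here is a minimal sketch of turning table rows with an entry column and a sequence column into FASTA text. It uses only the standard library and an in-memory CSV; `rows_to_fasta` is a hypothetical helper for illustration and is not the package's actual implementation, which may handle Excel files and folders differently.

```python
import csv
import io


def rows_to_fasta(rows, entry_column="Entry", sequence_column="Sequence"):
    """Format table rows (dicts) as FASTA text. Hypothetical helper, not RWexptest code."""
    records = []
    for row in rows:
        entry = (row.get(entry_column) or "").strip()
        sequence = (row.get(sequence_column) or "").strip()
        if not entry or not sequence:
            continue  # skip rows that lack an ID or a sequence
        records.append(f">{entry}\n{sequence}\n")
    return "".join(records)


# Example with an in-memory CSV instead of a real file
csv_text = "Entry,Sequence\nP12345,MKTAYIAKQR\nQ67890,MSDNELLK\n"
fasta_text = rows_to_fasta(csv.DictReader(io.StringIO(csv_text)))
print(fasta_text)
```

Each row becomes a header line starting with `>` followed by the sequence on the next line.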

process_fasta_files

from RWexptest import Pseqpa

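# Specify the input directory, output directory, batch size, and the minimum and maximum protein sequence lengths
input_directory = "<path to the FASTA files>"
output_directory = "<save path for the processed files>"
batch_size = 500  # split the FASTA protein sequences into batches of 500
min_sequence_length = 10  # keep only sequences with at least 10 amino acids
max_sequence_length = 6000  # keep only sequences with at most 6000 amino acids

# Process the FASTA files and split them into batches, keeping only sequences that meet the length limits
# (accessed through the imported Pseqpa module, since only Pseqpa is imported above)
Pseqpa.process_fasta_files(input_directory, output_directory, batch_size, min_sequence_length, max_sequence_length)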
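
The filtering and batching described above can be sketched as follows. This is an illustration under assumptions: `filter_and_batch` is a hypothetical helper, the length bounds are treated as inclusive, and the real function additionally reads and writes FASTA files.

```python
def filter_and_batch(records, batch_size, min_length, max_length):
    """Keep (header, sequence) pairs within the length bounds, then split into batches.

    Hypothetical helper for illustration; bounds are assumed to be inclusive.
    """
    kept = [(header, seq) for header, seq in records if min_length <= len(seq) <= max_length]
    return [kept[i:i + batch_size] for i in range(0, len(kept), batch_size)]


records = [
    ("seq1", "MKT"),               # too short, dropped
    ("seq2", "MKTAYIAKQRQISFVK"),
    ("seq3", "MSDNELLKQAAGHW"),
    ("seq4", "M" * 12),
]
batches = filter_and_batch(records, batch_size=2, min_length=10, max_length=6000)
for number, batch in enumerate(batches, start=1):
    print(number, [header for header, _ in batch])
```

With a batch size of 2, the three surviving sequences end up in two batches.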

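create_blast_database (requires the NCBI BLAST tools to be installed and available in your terminal environment)

from RWexptest import Pseqpa

# Specify the FASTA file to build from, the database location, and the database type
input_fasta_path = "<your_train_data_path/train_data.fasta>"  # path must not contain spaces or non-English characters
output_db_path = "<Blast_database_path/Train_protein_seq_database>"  # path must not contain spaces or non-English characters
dbtype = "prot"  # protein database

# Build the database
# (accessed through the imported Pseqpa module, since only Pseqpa is imported above)
result_message = Pseqpa.create_blast_database(input_fasta_path, output_db_path, dbtype)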
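
For orientation, building a BLAST database from the command line is done with the `makeblastdb` tool from NCBI BLAST+. The sketch below assembles that command and checks whether the tool is on `PATH`. The helper names are hypothetical, and whether the package invokes `makeblastdb` in exactly this way is an assumption.

```python
import shutil
import subprocess


def build_makeblastdb_command(input_fasta_path, output_db_path, dbtype="prot"):
    """Assemble a makeblastdb call as an argument list (hypothetical helper)."""
    return ["makeblastdb", "-in", input_fasta_path, "-dbtype", dbtype, "-out", output_db_path]


def run_command(command):
    """Run a command if its executable is on PATH and return its output as text."""
    if shutil.which(command[0]) is None:
        return f"{command[0]} not found on PATH; install NCBI BLAST+ first"
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.stdout if completed.returncode == 0 else completed.stderr


command = build_makeblastdb_command("train_data.fasta", "Train_protein_seq_database")
print(" ".join(command))
# To actually build the database, call run_command(command) with real paths.
```

Passing the command as a list avoids shell quoting problems, although the source's advice to avoid spaces in paths still applies.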

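run_blastp (requires the NCBI BLAST tools to be installed and available in your terminal environment)

from RWexptest import Pseqpa

# Specify the query file, the database, the result file, and the result format
query_fasta_path = "<your_test_data_path/test_data.fasta>"  # path must not contain spaces or non-English characters
blast_db_path = "<Blast_database_path/Train_protein_seq_database>"  # path must not contain spaces or non-English characters
output_file_path = "<your_save_path/test_data_blast_results.xml>"  # path must not contain spaces or non-English characters
custom_outfmt = 5  # custom output format (5 = BLAST XML)

# Run the homology BLAST search
# (accessed through the imported Pseqpa module, since only Pseqpa is imported above)
result_message = Pseqpa.run_blastp(query_fasta_path, blast_db_path, output_file_path, custom_outfmt)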
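
As a rough illustration of what the output format value means, the sketch below assembles an equivalent `blastp` command line and names a few common `-outfmt` codes. `build_blastp_command` and `OUTFMT_DESCRIPTIONS` are hypothetical names, and it is an assumption that the package passes its arguments to `blastp` in this form.

```python
# A few common blastp -outfmt codes (not an exhaustive list)
OUTFMT_DESCRIPTIONS = {
    0: "pairwise text",
    5: "BLAST XML",
    6: "tabular",
    7: "tabular with comment lines",
    10: "comma-separated values",
}


def build_blastp_command(query_fasta_path, blast_db_path, output_file_path, outfmt=5):
    """Assemble a blastp call as an argument list (hypothetical helper)."""
    return [
        "blastp",
        "-query", query_fasta_path,
        "-db", blast_db_path,
        "-out", output_file_path,
        "-outfmt", str(outfmt),
    ]


blastp_command = build_blastp_command(
    "test_data.fasta", "Train_protein_seq_database", "test_data_blast_results.xml"
)
print(" ".join(blastp_command))
print("Output format:", OUTFMT_DESCRIPTIONS[5])
```

Format 5 is the one to use when the results will later be parsed as XML.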

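execute_blast_workflow (requires the NCBI BLAST tools to be installed and available in your terminal environment)

from RWexptest import Pseqpa

# Specify the relevant paths
input_fasta_path = "<your_train_data_path/train_data.fasta>"  # path must not contain spaces or non-English characters
output_db_path = "<Blast_database_path/Train_protein_seq_database>"  # path must not contain spaces or non-English characters
query_fasta_path = "<your_test_data_path/test_data.fasta>"  # path must not contain spaces or non-English characters
custom_outfmt = 5  # output format (5 = BLAST XML)
dbtype = "prot"  # protein database
xml_result_path = "<your_save_path/test_data_blast_results.xml>"  # custom location for the XML results; path must not contain spaces or non-English characters

# Create the database, run BLAST, and obtain the XML file in a single step
# (accessed through the imported Pseqpa module, since only Pseqpa is imported above)
result = Pseqpa.execute_blast_workflow(input_fasta_path, output_db_path, dbtype, query_fasta_path, custom_outfmt, xml_result_path)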

parse_blast_xml_to_excel

import pandas as pd
from Bio import SearchIO
from RWexptest import Pseqpa

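# Call the function with the paths of the input XML file and the output Excel file
input_xml = '<path to the XML file produced by NCBI BLAST/result.xml>'  # path must not contain spaces or non-English characters
output_excel = '<save path/result.xlsx>'  # path must not contain spaces or non-English characters

# (accessed through the imported Pseqpa module, since only Pseqpa is imported above)
Pseqpa.parse_blast_xml_to_excel(input_xml, output_excel)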
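
To show what parsing a BLAST XML result involves, here is a minimal sketch that flattens a tiny, hand-written XML fragment into one row per alignment. It deliberately uses the standard library's `xml.etree.ElementTree` rather than `Bio.SearchIO`, so it is not the package's method; `blast_xml_to_rows` and the sample data are hypothetical, and the columns the package actually writes are not documented here.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment that follows the BLAST XML (outfmt 5) tag layout
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_id>hit1</Hit_id>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>1e-20</Hsp_evalue>
              <Hsp_identity>45</Hsp_identity>
              <Hsp_align-len>50</Hsp_align-len>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def blast_xml_to_rows(xml_text):
    """Return one dict per HSP with query, hit, e-value, and percent identity."""
    root = ET.fromstring(xml_text)
    rows = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            for hsp in hit.iter("Hsp"):
                identity = int(hsp.findtext("Hsp_identity"))
                align_len = int(hsp.findtext("Hsp_align-len"))
                rows.append({
                    "query": query,
                    "hit": hit.findtext("Hit_id"),
                    "evalue": float(hsp.findtext("Hsp_evalue")),
                    "percent_identity": 100.0 * identity / align_len,
                })
    return rows


rows = blast_xml_to_rows(SAMPLE_XML)
print(rows)
```

Rows in this shape can be written to a spreadsheet, for example with `pandas.DataFrame(rows).to_excel(...)`, which needs an Excel writer such as openpyxl installed.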
