Bioinformatics tool for comparing large sequence files
Database_comparator
A program for searching and analyzing databases using various algorithms.
Overview
The program compares and analyzes sequence databases using several methods.
Based on the provided configuration, it performs exact matching, sequence alignment (Smith–Waterman),
and BLAST searches, and calculates Hamming distances between sequences. Results can be exported to
different file formats, such as Excel, CSV, and Markdown.
The program is configured through a plain-text configuration file (config_file.txt).
Installation
# Example installation command
pip install Database_comparator
Configuration file
# Databases
QUERY HEDIMED__230620_Hedimed_1_22_basic--table_EF_predelana.xlsx part3
DB Databases/Nakayama.csv CDR3b [Clone/SequenceID, Epitope]
DB Databases/McPAS-TCR-filtred.csv CDR3.beta.aa [PubMed.ID, Pathology, Additional.study.details]
DB Databases/vdjdb.csv cdr3 [antigen.gene, antigen.species, mhc.a, gene]
DB Databases/TCRdb_all_sequnces.csv AASeq [TCRDB_project_ID, RunId, cloneFraction]
# Smith–Waterman algorithm
SWA_tolerance 0.9
SWA_gap_score -1000
SWA_mismatch_score 0
SWA_match_score 1
# Blastp Algorithm
BLAST_e_value 0.05
BLAST_database_name clip_seq_db
BLAST_output_name blastp_output.txt
# Hamming distance
HD_max_distance 1
# Multiprocessing
number_of_processors 3
Syntax of config file:
# QUERY - query database
QUERY >Name of query database< >Name of column with sequence<
# DB - Databases with the data we want to analyze
DB >Name of data database< >Name of column with sequence< >identifiers of sequence<
# SWA_tolerance - tolerance of Smith Waterman algorithm (score/max_score)
SWA_tolerance >float<
# Smith Waterman scoring
SWA_gap_score >int<
SWA_mismatch_score >int<
SWA_match_score >int<
BLAST_e_value >float<
BLAST_database_name >the name of the blast database that will be created if needed<
BLAST_output_name >name of output file<
HD_max_distance >maximum Hamming distance (int)<
number_of_processors >number of processors for multiprocessing (int)<
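To make the syntax above concrete, here is a minimal sketch of how such a file could be read into a dictionary. It is not the parser used by Database_comparator; the function name `parse_config` and the returned structure are hypothetical, and the sketch assumes that file paths and column names contain no spaces and that identifiers are always given in square brackets.

```python
import re


def parse_config(text):
    """Parse config text in the syntax shown above into a plain dict.

    Hypothetical helper for illustration only, not part of Database_comparator.
    """
    config = {"query": None, "databases": [], "settings": {}}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, rest = line.partition(" ")
        rest = rest.strip()
        if key == "QUERY":
            # QUERY >file< >sequence column<
            path, column = rest.rsplit(" ", 1)
            config["query"] = {"path": path, "column": column}
        elif key == "DB":
            # DB >file< >sequence column< [identifier, identifier, ...]
            match = re.match(r"(\S+)\s+(\S+)\s+\[(.*)\]$", rest)
            if match is None:
                raise ValueError(f"Malformed DB line: {raw_line!r}")
            path, column, identifiers = match.groups()
            config["databases"].append({
                "path": path,
                "column": column,
                "identifiers": [n.strip() for n in identifiers.split(",") if n.strip()],
            })
        else:
            # Everything else is kept as a raw string, e.g. SWA_tolerance -> "0.9"
            config["settings"][key] = rest
    return config
```

Values in `settings` are left as strings here; converting them to `float` or `int` according to the syntax listing would be a natural next step.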
Loading the config file into the program:
from Database_comparator import db_compare
cfg_file = 'path_to_config_file.txt'
db = db_compare.DB_comparator(cfg_file)
Usage
from Database_comparator import db_compare
cfg_file = 'path_to_config_file.txt'
db = db_compare.DB_comparator(cfg_file)
# Modules
db_exact_match = db.exact_match # Used for exact match search
db_aligner = db.aligner # Used for Smith Waterman algorithm
db_blast = db.blast # Used for BLAST search
db_hamming = db.hamming_distances # Used for finding Hamming distances between sequences
# Exporting results
from Database_comparator import db_compare
cfg_file = 'path_to_config_file.txt'
db = db_compare.DB_comparator(cfg_file)
# Data computing....
db.export_data_frame(output_file="Results.xlsx", data_format="xlsx")
db.export_data_frame(output_file="Results.csv", data_format="csv")
Exact match
from Database_comparator import db_compare
cfg_file = 'path_to_config_file.txt'
db = db_compare.DB_comparator(cfg_file)
# Program will search in single database for exact match with query database
db.exact_match.exact_match_search_in_single_database(database_index=0)
#Multiprocessing
db.exact_match.exact_match_search_in_single_database(database_index=0, parallel=True)
db.exact_match.exact_match_search_in_single_database_MULTIPROCESSING(database_index=0)
# Program will search all given databases for exact match with query database
db.exact_match.exact_match_search_in_all_databases()
# User can also use multiprocessing
db.exact_match.exact_match_search_in_all_databases(parallel=True)
# or
db.exact_match.exact_match_search_in_all_databases_MULTIPROCESSING()
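For readers unfamiliar with the idea, the following sketch shows what an exact match search between a query and a database amounts to. It is an illustration under assumptions, not the package's implementation: the function `exact_matches` is hypothetical and works on plain lists of sequence strings rather than on the configured files.

```python
def exact_matches(query_sequences, database_sequences):
    """Return (query_index, database_index) pairs with identical sequences.

    Hypothetical helper for illustration only, not part of Database_comparator.
    """
    # Index the database once so each query is a dictionary lookup.
    positions = {}
    for db_index, sequence in enumerate(database_sequences):
        positions.setdefault(sequence, []).append(db_index)

    matches = []
    for query_index, sequence in enumerate(query_sequences):
        for db_index in positions.get(sequence, []):
            matches.append((query_index, db_index))
    return matches
```

Building the index first keeps the search close to linear in the total number of sequences, which matters when the databases are large.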
Aligner
from Database_comparator import db_compare
cfg_file = 'path_to_config_file.txt'
db = db_compare.DB_comparator(cfg_file)
#Single core
db.aligner.smithWatermanAlgorithm_match_search_in_single_database(database_index=0)
db.aligner.smithWatermanAlgorithm_match_search_in_all_databases()
#Multiprocessing
db.aligner.smithWatermanAlgorithm_match_search_in_single_database(database_index=0, parallel=True)
db.aligner.smithWatermanAlgorithm_match_search_in_single_database_MULTIPROCESSING()
db.aligner.smithWatermanAlgorithm_match_search_in_all_databases(parallel=True)
db.aligner.smithWatermanAlgorithm_match_search_in_all_databases_MULTIPROCESSING()
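The sketch below illustrates how the `SWA_*` settings could interact: a Smith–Waterman local alignment score is computed with the configured match, mismatch, and gap scores, and a pair is accepted when `score / max_score` reaches `SWA_tolerance`. This is not the package's own aligner. Both function names are hypothetical, the gap penalty is assumed to be linear, and `max_score` is assumed to be the score of a perfect full-length match of the query.

```python
def smith_waterman_score(a, b, match=1, mismatch=0, gap=-1000):
    """Best local alignment score of a and b with a linear gap penalty."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                     # start a new local alignment
                previous[j - 1] + step,  # align a[i-1] with b[j-1]
                previous[j] + gap,     # gap in b
                current[j - 1] + gap,  # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


def passes_tolerance(query, target, tolerance=0.9, match=1, mismatch=0, gap=-1000):
    """True if score / max_score >= tolerance (max_score definition is an assumption)."""
    max_score = match * len(query)  # assumed: perfect full-length match of the query
    if max_score <= 0:
        return False
    score = smith_waterman_score(query, target, match, mismatch, gap)
    return score / max_score >= tolerance
```

With the example values from the configuration file (gap score -1000, mismatch score 0, match score 1), gaps are effectively forbidden and the score counts matching positions along the best ungapped diagonal, so a tolerance of 0.9 allows roughly one mismatch per ten residues.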
BLAST
from Database_comparator import db_compare
cfg_file = 'path_to_config_file.txt'
db = db_compare.DB_comparator(cfg_file)
# Info about BLAST
db.blast.blast_database_info()
# Making BLAST database
db.blast.blast_make_database(name="BLAST_Database")
db.blast.blast_search_for_match_in_database() #Query is input database
db.blast.analyze_matches_in_database() #BLAST output will be analyzed with aligner
# User can also use this function.
db.blast.blast_search_and_analyze_matches_in_database()
Hamming distances
from Database_comparator import db_compare
cfg_file = 'path_to_config_file.txt'
db = db_compare.DB_comparator(cfg_file)
db.hamming_distances.find_hamming_distances_for_single_database(database_index=0)
db.hamming_distances.find_hamming_distances_for_all_databases()
# Alternatively, compute the Hamming matrices first (analyze=False) and analyze them in a separate step
db.hamming_distances.find_hamming_distances_for_single_database(database_index=0, analyze=False)
db.hamming_distances.analyze_single_hamming_matrix(database_index=0)
db.hamming_distances.find_hamming_distances_for_all_databases(analyze=False)
db.hamming_distances.analyze_all_hamming_matrices()
# Hamming matrices are stored in >hamming_matrices_for_all_databases<
db_matrices = db.hamming_distances.hamming_matrices_for_all_databases
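As a reminder of what is being computed, the Hamming distance between two sequences of equal length is the number of positions at which they differ. The sketch below shows that definition and a filter by `HD_max_distance`. It is illustrative only: both function names are hypothetical, and skipping pairs of unequal length is an assumption, not documented behavior of the package.

```python
def hamming_distance(a, b):
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def hamming_matches(queries, database, max_distance=1):
    """Return (query_index, database_index, distance) for pairs within max_distance.

    Hypothetical helper for illustration only; pairs of unequal length are skipped.
    """
    matches = []
    for query_index, query in enumerate(queries):
        for db_index, candidate in enumerate(database):
            if len(query) != len(candidate):
                continue  # assumption: unequal lengths are not comparable
            distance = hamming_distance(query, candidate)
            if distance <= max_distance:
                matches.append((query_index, db_index, distance))
    return matches
```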