Bioinformatics tool for comparing large sequence files

Project description

Database_comparator

License: MIT

A program for searching and analyzing databases using various algorithms.

Overview

The program compares a query database of sequences against one or more reference databases. Based on the provided configuration, it can perform exact matching, Smith-Waterman sequence alignment, BLAST searches, and Hamming distance calculations between sequences. Results can be exported to different file formats, such as Excel, CSV, and Markdown.

The program is configured through a config file (config_file.txt).

Installation

# Example installation command

pip install Database_comparator

Configuration file

# Databases

QUERY HEDIMED__230620_Hedimed_1_22_basic--table_EF_predelana.xlsx part3



DB Databases/Nakayama.csv CDR3b [Clone/SequenceID, Epitope]

DB Databases/McPAS-TCR-filtred.csv CDR3.beta.aa [PubMed.ID, Pathology, Additional.study.details]

DB Databases/vdjdb.csv cdr3 [antigen.gene, antigen.species, mhc.a, gene]

DB Databases/TCRdb_all_sequnces.csv AASeq [TCRDB_project_ID, RunId, cloneFraction]



# Smith–Waterman algorithm

SWA_tolerance 0.9

SWA_gap_score -1000

SWA_mismatch_score 0

SWA_match_score 1



# Blastp Algorithm

BLAST_e_value 0.05

BLAST_database_name clip_seq_db

BLAST_output_name blastp_output.txt



# Hamming distance

HD_max_distance 1



# Multiprocessing

number_of_processors 3

Syntax of config file:


# QUERY - query database 

QUERY >Name of query database< >Name of column with sequence<



# DB - Databases with the data we want to analyze

DB >Name of data database< >Name of column with sequence< [>identifier columns, comma-separated<]



# SWA_tolerance - tolerance of Smith Waterman algorithm (score/max_score)

SWA_tolerance >float<



# Smith Waterman scoring

SWA_gap_score >int<

SWA_mismatch_score >int<

SWA_match_score >int<



BLAST_e_value >float<

BLAST_database_name >the name of the blast database that will be created if needed<

BLAST_output_name >name of output file<



HD_max_distance >Maximum Hamming distance(int)<



number_of_processors >number of processors for multiprocessing(int)<
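
To make the syntax above concrete, here is a minimal sketch of how such a file could be read into a plain dictionary. It is not the package's own parser: `parse_config`, the regular expression, and the layout of the returned dictionary are all hypothetical, and the sketch assumes that file paths and column names contain no spaces.

```python
import re

# Hypothetical parser for the config syntax described above.
# This is an illustration only, not the code used by Database_comparator.
DB_LINE = re.compile(r"^(QUERY|DB)\s+(\S+)\s+(\S+)(?:\s+\[(.*)\])?\s*$")


def parse_config(text):
    """Parse config text into a dict with query, databases and settings."""
    config = {"query": None, "databases": [], "settings": {}}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = DB_LINE.match(line)
        if match:
            kind, path, column, ids = match.groups()
            entry = {
                "path": path,
                "column": column,
                # identifier columns are the comma-separated names in brackets
                "identifiers": [i.strip() for i in ids.split(",")] if ids else [],
            }
            if kind == "QUERY":
                config["query"] = entry
            else:
                config["databases"].append(entry)
            continue
        # every other line is treated as "key value"; values stay strings
        key, _, value = line.partition(" ")
        config["settings"][key] = value.strip()
    return config


example = """
# Databases
QUERY HEDIMED__230620_Hedimed_1_22_basic--table_EF_predelana.xlsx part3
DB Databases/vdjdb.csv cdr3 [antigen.gene, antigen.species, mhc.a, gene]
SWA_tolerance 0.9
number_of_processors 3
"""
parsed = parse_config(example)
```

Values are kept as strings here; converting `SWA_tolerance` to `float` or `number_of_processors` to `int` would be a separate step.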

Loading the config file into the program:

from Database_comparator import db_compare



cfg_file = 'path_to_config_file.txt'

db = db_compare.DB_comparator(cfg_file)

Usage

from Database_comparator import db_compare



cfg_file = 'path_to_config_file.txt'

db = db_compare.DB_comparator(cfg_file)



# Modules

db_exact_match = db.exact_match # Used for exact match search

db_aligner = db.aligner # Used for Smith Waterman algorithm

db_blast = db.blast # Used for BLAST search

db_hamming = db.hamming_distances # Used for finding Hamming distances between sequences

Exporting results

from Database_comparator import db_compare



cfg_file = 'path_to_config_file.txt'

db = db_compare.DB_comparator(cfg_file)



# Data computing....



db.export_data_frame(output_file="Results.xlsx", data_format="xlsx")

db.export_data_frame(output_file="Results.csv", data_format="csv")
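
For readers who want to see what CSV and Markdown serialization of a result table involves, the following is a standalone sketch using only the standard library. It does not show how `export_data_frame` works internally; `rows_to_csv`, `rows_to_markdown`, and the sample row (including its sequence and epitope values) are made up for illustration.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dict rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def rows_to_markdown(rows, fieldnames):
    """Serialize a list of dict rows to a Markdown table."""
    header = "| " + " | ".join(fieldnames) + " |"
    separator = "| " + " | ".join("---" for _ in fieldnames) + " |"
    body = [
        "| " + " | ".join(str(row.get(name, "")) for name in fieldnames) + " |"
        for row in rows
    ]
    return "\n".join([header, separator, *body])


# Made-up example row; the real result columns depend on the config file.
rows = [{"sequence": "CASSLGQAYEQYF", "Epitope": "example"}]
csv_text = rows_to_csv(rows, ["sequence", "Epitope"])
markdown_text = rows_to_markdown(rows, ["sequence", "Epitope"])
```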

Exact match

from Database_comparator import db_compare



cfg_file = 'path_to_config_file.txt'

db = db_compare.DB_comparator(cfg_file)



# Search a single database for exact matches with the query database

db.exact_match.exact_match_search_in_single_database(database_index=0)

# Multiprocessing

db.exact_match.exact_match_search_in_single_database(database_index=0, parallel=True)

db.exact_match.exact_match_search_in_single_database_MULTIPROCESSING(database_index=0)

# Search all given databases for exact matches with the query database

db.exact_match.exact_match_search_in_all_databases()



# User can also use multiprocessing

db.exact_match.exact_match_search_in_all_databases(parallel=True)

# or

db.exact_match.exact_match_search_in_all_databases_MULTIPROCESSING()
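
As a rough idea of what an exact match search does, the sketch below indexes a reference database by sequence and looks up every query sequence in that index. It is an assumption about the general technique, not the package's implementation; `build_index`, `exact_matches`, and the sample rows are hypothetical.

```python
from collections import defaultdict


def build_index(db_rows, sequence_column, identifier_columns):
    """Map each sequence to the identifier values of all rows that carry it."""
    index = defaultdict(list)
    for row in db_rows:
        identifiers = {name: row[name] for name in identifier_columns}
        index[row[sequence_column]].append(identifiers)
    return index


def exact_matches(query_sequences, index):
    """Return the identifiers found for each query sequence present in the index."""
    return {seq: index[seq] for seq in query_sequences if seq in index}


# Made-up rows standing in for a reference database such as Nakayama.csv.
db_rows = [
    {"CDR3b": "CASSLGQAYEQYF", "Epitope": "E1"},
    {"CDR3b": "CASSPGTGGYEQYF", "Epitope": "E2"},
    {"CDR3b": "CASSLGQAYEQYF", "Epitope": "E3"},
]
index = build_index(db_rows, "CDR3b", ["Epitope"])
hits = exact_matches(["CASSLGQAYEQYF", "CASSXXXX"], index)
```

Building the index once makes each lookup a dictionary access, which matters when the query database is large.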

Aligner

from Database_comparator import db_compare



cfg_file = 'path_to_config_file.txt'

db = db_compare.DB_comparator(cfg_file)



# Single core

db.aligner.smithWatermanAlgorithm_match_search_in_single_database(database_index=0)

db.aligner.smithWatermanAlgorithm_match_search_in_all_databases()



# Multiprocessing

db.aligner.smithWatermanAlgorithm_match_search_in_single_database(database_index=0, parallel=True)

db.aligner.smithWatermanAlgorithm_match_search_in_single_database_MULTIPROCESSING()

db.aligner.smithWatermanAlgorithm_match_search_in_all_databases(parallel=True)

db.aligner.smithWatermanAlgorithm_match_search_in_all_databases_MULTIPROCESSING()
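
The config file describes `SWA_tolerance` as score/max_score. The sketch below shows one way that ratio could be computed with a textbook Smith-Waterman local alignment and a linear gap penalty, using the scores from the example config. It is not the package's aligner, and it assumes that max_score is the match score times the query length, which the source does not state. `smith_waterman_score` and `passes_tolerance` are hypothetical names.

```python
def smith_waterman_score(a, b, match=1, mismatch=0, gap=-1000):
    """Best local alignment score of a and b with a linear gap penalty."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                             # start a new local alignment
                previous[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous[j] + gap,             # gap in b
                current[j - 1] + gap,          # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


def passes_tolerance(query, target, tolerance=0.9, match=1, mismatch=0, gap=-1000):
    """Check score/max_score >= tolerance.

    Assumption: max_score = match * len(query); the package may define it differently.
    """
    max_score = match * len(query)
    if max_score <= 0:
        return False
    score = smith_waterman_score(query, target, match, mismatch, gap)
    return score / max_score >= tolerance
```

With these settings a 13-residue sequence tolerates one substitution (12/13 is about 0.92) but not two (11/13 is about 0.85), and the gap score of -1000 effectively forbids gaps.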

BLAST

from Database_comparator import db_compare

cfg_file = 'path_to_config_file.txt'

db = db_compare.DB_comparator(cfg_file)

# Info about BLAST

db.blast.blast_database_info()



# Making BLAST database

db.blast.blast_make_database(name="BLAST_Database")



db.blast.blast_search_for_match_in_database() # The query is the input database

db.blast.analyze_matches_in_database() # BLAST output will be analyzed with the aligner



# Alternatively, run the search and the analysis in one call

db.blast.blast_search_and_analyze_matches_in_database()
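
For orientation, the sketch below builds the kind of BLAST+ command lines that correspond to the `BLAST_*` settings in the config file. Whether the package invokes BLAST this way is an assumption; the flags themselves are standard `makeblastdb` and `blastp` options. The helper names, the `seq{i}` FASTA headers, and the file names `db.fasta` and `query.fasta` are hypothetical.

```python
def to_fasta(sequences):
    """Render sequences as FASTA text with generated headers (seq0, seq1, ...)."""
    return "".join(f">seq{i}\n{seq}\n" for i, seq in enumerate(sequences))


def makeblastdb_command(fasta_path, database_name):
    """Command that builds a protein BLAST database from a FASTA file."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "prot", "-out", database_name]


def blastp_command(query_fasta, database_name, e_value, output_name):
    """Command that searches the database with blastp and writes tabular output."""
    return [
        "blastp",
        "-query", query_fasta,
        "-db", database_name,
        "-evalue", str(e_value),
        "-out", output_name,
        "-outfmt", "6",  # tab-separated hit table
    ]


fasta_text = to_fasta(["CASSLGQAYEQYF", "CASSPGTGGYEQYF"])
make_cmd = makeblastdb_command("db.fasta", "clip_seq_db")
search_cmd = blastp_command("query.fasta", "clip_seq_db", 0.05, "blastp_output.txt")
```

Each list could be passed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ is installed and on the PATH.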

Hamming distances

from Database_comparator import db_compare

cfg_file = 'path_to_config_file.txt'

db = db_compare.DB_comparator(cfg_file)



db.hamming_distances.find_hamming_distances_for_single_database(database_index=0)

db.hamming_distances.find_hamming_distances_for_all_databases()



# Alternatively, compute the matrices first (analyze=False) and analyze them separately

db.hamming_distances.find_hamming_distances_for_single_database(database_index=0, analyze=False)

db.hamming_distances.analyze_single_hamming_matrix(database_index=0)



db.hamming_distances.find_hamming_distances_for_all_databases(analyze=False)

db.hamming_distances.analyze_all_hamming_matrices()



# Hamming matrices are stored in >hamming_matrices_for_all_databases<

db_matrices = db.hamming_distances.hamming_matrices_for_all_databases
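
The Hamming distance between two sequences of equal length is the number of positions at which they differ. The sketch below computes a distance matrix between query and reference sequences and then selects the pairs within `HD_max_distance`. It is not the package's code: the function names are hypothetical, and marking pairs of unequal length as `None` is an assumption, since the source does not say how such pairs are handled.

```python
def hamming_distance(a, b):
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def hamming_matrix(queries, targets):
    """Matrix of distances; None marks pairs of unequal length (an assumption)."""
    return [
        [hamming_distance(q, t) if len(q) == len(t) else None for t in targets]
        for q in queries
    ]


def matches_within(matrix, max_distance):
    """(query_index, target_index) pairs with distance <= max_distance."""
    return [
        (i, j)
        for i, row in enumerate(matrix)
        for j, distance in enumerate(row)
        if distance is not None and distance <= max_distance
    ]


# Made-up short sequences for illustration.
matrix = hamming_matrix(["CASSLG", "CASSQG"], ["CASSLG", "CATTLG", "CASS"])
close_pairs = matches_within(matrix, max_distance=1)
```

Splitting the computation of the matrix from the selection of close pairs mirrors the `analyze=False` workflow above, where matrices are computed first and analyzed later.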
