biotext

The biotext library offers resources to support text mining strategies using bioinformatics tools.

Project description

The biotext library offers resources to support text mining strategies using bioinformatics tools.

Installation

To install biotext through pip:

pip install biotext

Tested Platforms

Python: 3.7.4
Windows (64-bit): 10
Ubuntu (64-bit): 18.04.1 LTS

Required external libraries

numpy
pandas
scipy
scikit-learn
matplotlib
unidecode
biopython
sweep

Functions

Function Name	Description	Input	Output
biotext.aminocode.encode_string	Encodes a string with AMINOcode.	input_string : string. Natural language text to be encoded. detail : string. Level of detail in the coding: ‘d’ for detail in digits, ‘p’ for detail in punctuation, ‘dp’ or ‘pd’ for both.	encoded_string : string. Encoded text.
biotext.aminocode.decode_string	Decodes a string with reverse AMINOcode.	input_string : string. Text encoded with AMINOcode. detail : string. Level of detail in the coding: ‘d’ for detail in digits, ‘p’ for detail in punctuation, ‘dp’ or ‘pd’ for both.	decoded_string : string. Decoded text.
biotext.aminocode.encode_list	Encodes all strings in a list with AMINOcode.	string_list : list of string. List of strings to be encoded. detail : string. Level of detail in the coding: ‘d’ for detail in digits, ‘p’ for detail in punctuation, ‘dp’ or ‘pd’ for both. verbose : bool. If True, displays progress.	encoded_list : list of string. List with all encoded texts.
biotext.aminocode.decode_list	Decodes all strings in a list with reverse AMINOcode.	string_list : list of string. List of strings encoded with AMINOcode. detail : string. Level of detail in the coding: ‘d’ for detail in digits, ‘p’ for detail in punctuation, ‘dp’ or ‘pd’ for both. verbose : bool. If True, displays progress.	decoded_list : list of string. List with all decoded texts.
biotext.dnabits.encode_string	Encodes a string with DNAbits.	input_string : string. Natural language text to be encoded.	encoded_string : string. Encoded text.
biotext.dnabits.decode_string	Decodes a string with reverse DNAbits.	input_string : string. Text encoded with DNAbits.	decoded_string : string. Decoded text.
biotext.dnabits.encode_list	Encodes all strings in a list with DNAbits.	string_list : list of string. List of strings to be encoded. verbose : bool. If True, displays progress.	encoded_list : list of string. List with all encoded texts.
biotext.dnabits.decode_list	Decodes all strings in a list with reverse DNAbits.	string_list : list of string. List of strings encoded with DNAbits. verbose : bool. If True, displays progress.	decoded_list : list of string. List with all decoded texts.
biotext.fastatools.create_seqrecord_list	Creates a list of SeqRecord* from a list of strings.	seq_list : list of string. List of biological sequences in string format. header : list of string. List of headers in string format; if set to ‘None’, the headers are defined automatically as numbers in increasing order.	seqrecord_list : list of SeqRecord*. List of SeqRecord*.
biotext.fastatools.import_fasta	Uses Biopython to import a FASTA file.	input_file_name : string (valid file name). Input FASTA file name.	seqrecord_list : list of SeqRecord*. List of SeqRecord* imported from the file.
biotext.fastatools.export_fasta	Creates a file from a list of SeqRecord*.	seqrecord_list : list of SeqRecord*. List of SeqRecord*. output_file_name : string. Output FASTA file name.	A file is saved with the given name.
biotext.fastatools.get_header	Gets the headers from all items in a list of SeqRecord*.	seqrecord_list : list of SeqRecord*. List of SeqRecord*.	header_list : list of string. List of all headers extracted from the input.
biotext.fastatools.get_seq	Gets the sequences from all items in a list of SeqRecord*.	seqrecord_list : list of SeqRecord*. List of SeqRecord*.	seq_list : list of string. List of all sequences extracted from the input.
biotext.fastatools.remove_pattern	Removes patterns from a list of SeqRecord* based on a regular expression.	seq_list : list of SeqRecord*. List of SeqRecord*. rex : string. Regular expression.	seq_list : list of SeqRecord*. List of SeqRecord* with the removal applied.
biotext.fastatools.run_clustalo	Uses Clustal Omega to align the sequences in a FASTA file.	input_file_name : string (valid file name). Input FASTA file name.	alignment : MultipleSeqAlignment**. Alignment result.
biotext.fastatools.get_consensus	Applies Clustal Omega and obtains the alignment consensus.	seqrecord_list : list of SeqRecord*. List of SeqRecord*.	consensus : string. Alignment consensus. alignment : list of string. List of sequences with alignment gaps.
biotext.fastatools.fasta_to_mat	Vectorizes a list of SeqRecord* using SWeeP.	seq_list : list of string. List of strings in FASTA format.	mat : ndarray***. Matrix with the generated vectors.
biotext.treetools.mat_to_tree	Creates a dendrogram in Newick format from a matrix.	mat : ndarray***. Matrix. ids : list of string. List with the identifiers of the rows in mat. method : string. Method used to create the dendrogram. Available options are ‘complete’ (scipy library implementation) and ‘nj’ (neighbor joining, skbio library implementation). The default is ‘complete’.	tree : string. Dendrogram in Newick format.
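
The table above does not state which mapping DNAbits uses. To illustrate the general idea of writing text as a nucleotide string and reading it back, here is a minimal sketch that assumes two bits per nucleotide over the UTF-8 bytes of the text. The `toy_*` function names and the bit-to-nucleotide mapping are hypothetical; this is not the implementation in `biotext.dnabits`, and its output will probably differ from the library's.

```python
# Hypothetical 2-bit mapping chosen for illustration only;
# biotext.dnabits may use a different scheme.
BITS_TO_NT = {"00": "A", "01": "C", "10": "G", "11": "T"}
NT_TO_BITS = {nt: bits for bits, nt in BITS_TO_NT.items()}


def toy_encode_string(input_string):
    """Encode text as nucleotides: UTF-8 bytes -> bits -> one base per 2 bits."""
    bits = "".join(f"{byte:08b}" for byte in input_string.encode("utf-8"))
    return "".join(BITS_TO_NT[bits[i:i + 2]] for i in range(0, len(bits), 2))


def toy_decode_string(input_string):
    """Reverse toy_encode_string: bases -> bits -> UTF-8 bytes -> text."""
    bits = "".join(NT_TO_BITS[nt] for nt in input_string)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def toy_encode_list(string_list):
    """Encode every string in a list."""
    return [toy_encode_string(s) for s in string_list]


def toy_decode_list(string_list):
    """Decode every string in a list."""
    return [toy_decode_string(s) for s in string_list]


encoded = toy_encode_string("Hi")
print(encoded)                     # CAGACGGC
print(toy_decode_string(encoded))  # Hi
```

Under this assumed scheme every byte becomes four bases, so an encoded string is always four times as long as the UTF-8 form of the input.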

*Bio.SeqRecord.SeqRecord: Biopython object that stores a biological sequence and its associated information, as described in <https://biopython.org/docs/1.76/api/Bio.SeqRecord.html>.

**Bio.Align.MultipleSeqAlignment: Biopython object that stores a multiple sequence alignment, as described in <https://biopython.org/docs/1.76/api/Bio.Align.html>.

***numpy.ndarray: NumPy object that represents an array, as described in <https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html>.
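
The behaviour described for create_seqrecord_list (automatic numeric headers when header is ‘None’) and remove_pattern (regular-expression removal) can be sketched without Biopython by using plain (header, sequence) tuples in place of SeqRecord objects. This is only an illustration under stated assumptions: the `toy_*` names are hypothetical, numbering is assumed to start at 1, and the pattern is assumed to apply to the sequence text rather than the header. The library itself may differ on each point.

```python
import re


def toy_create_record_list(seq_list, header=None):
    """Pair each sequence with a header; number the records when header is None."""
    if header is None:
        # Assumption: automatic headers start at 1.
        header = [str(i) for i in range(1, len(seq_list) + 1)]
    if len(header) != len(seq_list):
        raise ValueError("header and seq_list must have the same length")
    return list(zip(header, seq_list))


def toy_remove_pattern(record_list, rex):
    """Delete every match of the regular expression from each sequence."""
    pattern = re.compile(rex)
    return [(h, pattern.sub("", s)) for h, s in record_list]


def toy_to_fasta(record_list):
    """Render the records as FASTA text: a '>' header line, then the sequence."""
    return "".join(f">{h}\n{s}\n" for h, s in record_list)


records = toy_create_record_list(["ACGT--AC", "TTG-A"])
cleaned = toy_remove_pattern(records, r"-+")  # strip runs of gap characters
print(toy_to_fasta(cleaned), end="")
```

With real data, the equivalent steps would go through the biotext.fastatools functions listed in the table, which operate on SeqRecord objects instead of tuples.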

Release history

3.0.1.0

Sep 18, 2023

3.0.0.1

Sep 16, 2023

3.0.0.0

Jul 13, 2023

2.4.1.3 (this version)

Nov 9, 2022

2.4.1.2

Nov 9, 2022

2.4.1.1

Nov 9, 2022

2.4.1.0

Nov 9, 2022

2.4.0.0

Mar 22, 2022

2.3.2.0

May 21, 2021

2.3.1.0

Sep 16, 2020

2.3.0.0

Aug 31, 2020

2.2.0.1

Aug 28, 2020

2.2.0.0

Aug 28, 2020

2.1.1.0

Aug 27, 2020

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

biotext-2.4.1.3-py3.8.egg (13.4 kB)

Uploaded Nov 9, 2022

Hashes for biotext-2.4.1.3-py3.8.egg
Algorithm	Hash digest
SHA256	`fd9888acdc0cbf4ff6aeda6015bf1ebfbc05f547157b3e1834ff00b339b68a25`
MD5	`576c3799d4f1786825518253c206d4bd`
BLAKE2b-256	`9d0d82b93ce924e97c16a12ace5abd224c966711afdda30b785640c8523842fe`