Project description

The biotext library provides resources that support text mining strategies built on bioinformatics tools.

Installation

To install biotext through pip:

pip install biotext

Tested Platforms

  • Python: 3.7.4

  • Windows (64-bit): 10

  • Ubuntu (64-bit): 18.04.1 LTS

Required external libraries

  • numpy

  • pandas

  • scipy

  • scikit-learn

  • matplotlib

  • unidecode

  • biopython

  • sweep
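
A quick way to see which of the libraries above are missing from the current environment is sketched below. It uses only the standard library. The helper name `missing_dependencies` is hypothetical, and the import name `sweep` for the sweep package is an assumption; `sklearn` and `Bio` are the usual import names for scikit-learn and biopython.

```python
import importlib.util

# Package name -> import name. The "sweep" import name is an assumption.
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "unidecode": "unidecode",
    "biopython": "Bio",
    "sweep": "sweep",
}


def missing_dependencies(required=REQUIRED):
    """Return the sorted package names whose import name cannot be found."""
    return sorted(
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_dependencies()
    print("Missing:", ", ".join(missing) if missing else "none")
```

The check only looks up top-level modules; it does not verify versions.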

Functions

Each entry below gives the function name, a description, the input parameters, and the output, in that order.

biotext.aminocode.encode_string

Encodes a string with AMINOcode.

Input:

  • input_string : string. Natural language text string to be encoded.

  • detail : string. Sets the level of detail in the coding. ‘d’ for details in digits; ‘p’ for details in punctuation; ‘dp’ or ‘pd’ for both.

Output:

  • encoded_string : string. Encoded text.

biotext.aminocode.decode_string

Decodes an AMINOcode-encoded string (reverse AMINOcode).

Input:

  • input_string : string. Text string encoded with AMINOcode.

  • detail : string. Sets the level of detail in the coding. ‘d’ for details in digits; ‘p’ for details in punctuation; ‘dp’ or ‘pd’ for both.

Output:

  • decoded_string : string. Decoded text.

biotext.aminocode.encode_list

Encodes all strings in a list with AMINOcode.

Input:

  • string_list : list. List of strings to be encoded.

  • detail : string. Sets the level of detail in the coding. ‘d’ for details in digits; ‘p’ for details in punctuation; ‘dp’ or ‘pd’ for both.

  • verbose : bool. If True, displays progress.

Output:

  • encoded_list : list. List with all encoded texts in string format.

biotext.aminocode.decode_list

Decodes all strings in a list with reverse AMINOcode.

Input:

  • string_list : list. List of strings encoded with AMINOcode.

  • detail : string. Sets the level of detail in the coding. ‘d’ for details in digits; ‘p’ for details in punctuation; ‘dp’ or ‘pd’ for both.

  • verbose : bool. If True, displays progress.

Output:

  • decoded_list : list of string. List with all decoded texts.
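
The detail parameter accepts ‘d’, ‘p’, or both letters in either order. The sketch below shows one way such a flag string can be interpreted. The helper `parse_detail` is hypothetical and not part of biotext; treating an empty string as "no extra detail" and rejecting unknown letters are assumptions, since the description above does not say how the library handles those cases.

```python
def parse_detail(detail=""):
    """Return (digits, punctuation) flags for a detail string.

    Hypothetical helper: 'dp' and 'pd' are equivalent because only the
    presence of each letter matters, not its position.
    """
    flags = set(detail)
    unknown = flags - {"d", "p"}
    if unknown:
        # Assumption: unknown letters are treated as an error here.
        raise ValueError(f"unsupported detail flags: {sorted(unknown)}")
    return ("d" in flags, "p" in flags)


if __name__ == "__main__":
    for value in ("", "d", "p", "dp", "pd"):
        print(repr(value), parse_detail(value))
```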

biotext.dnabits.encode_string

Encodes a string with DNAbits.

Input:

  • input_string : string. Natural language text string to be encoded.

Output:

  • encoded_string : string. Encoded text.

biotext.dnabits.decode_string

Decodes a DNAbits-encoded string (reverse DNAbits).

Input:

  • input_string : string. Text string encoded with DNAbits.

Output:

  • decoded_string : string. Decoded text.

biotext.dnabits.encode_list

Encodes all strings in a list with DNAbits.

Input:

  • string_list : list. List of strings to be encoded.

  • verbose : bool. If True, displays progress.

Output:

  • encoded_list : list. List with all encoded texts in string format.

biotext.dnabits.decode_list

Decodes all strings in a list with reverse DNAbits.

Input:

  • string_list : list. List of strings encoded with DNAbits.

  • verbose : bool. If True, displays progress.

Output:

  • decoded_list : list of string. List with all decoded texts.
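
To give a feel for what a reversible text-to-DNA encoding looks like, here is a toy sketch that packs the bits of a UTF-8 string into nucleotides, two bits per base. It is not the DNAbits implementation: the bit-to-base table, the byte order, and the function names `toy_encode` and `toy_decode` are all assumptions made for illustration, and the real library may use a different mapping.

```python
# Hypothetical 2-bit alphabet; the actual DNAbits table may differ.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def toy_encode(text):
    """Encode text as nucleotides: UTF-8 bytes, 8 bits each, 2 bits per base."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def toy_decode(dna):
    """Reverse toy_encode: bases back to bits, bits back to UTF-8 text."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


if __name__ == "__main__":
    encoded = toy_encode("text mining")
    print(encoded)
    print(toy_decode(encoded))
```

Each byte becomes exactly four bases, so the encoded string is four times as long as the UTF-8 input in this sketch.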

create_seqrecord_list

Creates a list of SeqRecord* from a string list.

Input:

  • seq_list : list of string. List of biological sequences in string format.

  • header : list of string. List of headers in string format. If set to ‘None’, the headers are defined automatically with numbers in increasing order.

Output:

  • seqrecord_list : list of SeqRecord*. List of SeqRecord*.

biotext.fastatools.import_fasta

Uses biopython to import a FASTA file.

Input:

  • input_file_name : string (valid file name). Input FASTA file name.

Output:

  • seqrecord_list : list of SeqRecord*. List of SeqRecord* imported from the file.

biotext.fastatools.export_fasta

Creates a file from a list of SeqRecord*.

Input:

  • seqrecord_list : list of SeqRecord*. List of SeqRecord*.

  • output_file_name : string. Output FASTA file name.

Output:

  • A file is saved with the defined name.
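
The FASTA layout that these functions read and write can be sketched with plain strings. The helpers below are hypothetical and use only the standard library; they are not biotext's implementation, which relies on biopython. Numbering the default headers from 1 is an assumption: the description above only says that automatic headers are numbers in increasing order.

```python
def to_fasta(seq_list, header=None):
    """Format sequences as FASTA text, numbering headers when none are given."""
    if header is None:
        # Assumption: automatic headers start at 1.
        header = [str(i) for i in range(1, len(seq_list) + 1)]
    if len(header) != len(seq_list):
        raise ValueError("header and seq_list must have the same length")
    return "".join(f">{h}\n{s}\n" for h, s in zip(header, seq_list))


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            # Sequence lines may be wrapped; join them back together.
            records[-1][1] += line
    return [(h, s) for h, s in records]


if __name__ == "__main__":
    fasta_text = to_fasta(["ACGT", "TTGA"])
    print(fasta_text, end="")
    print(parse_fasta(fasta_text))
```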

biotext.fastatools.get_header

Gets the header from all items in a list of SeqRecord*.

Input:

  • seqrecord_list : list of SeqRecord*. List of SeqRecord*.

Output:

  • header_list : list of string. List of all headers extracted from the input.

biotext.fastatools.get_seq

Gets the sequences from all items in a list of SeqRecord*.

Input:

  • seqrecord_list : list of SeqRecord*. List of SeqRecord*.

Output:

  • seq_list : list of string. List of all sequences extracted from the input.

biotext.fastatools.remove_pattern

Removes every match of a regular expression from the sequences in a list of SeqRecord*.

Input:

  • seq_list : list of SeqRecord*. List of SeqRecord*.

  • rex : string. Regular expression.

Output:

  • seq_list : list of SeqRecord*. List of SeqRecord* with the removal applied.
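
The effect of a regular-expression removal can be illustrated on plain strings. The helper `strip_pattern` below is hypothetical and works on strings rather than SeqRecord objects; it only shows the general idea, under the assumption that each match is deleted from every sequence.

```python
import re


def strip_pattern(seq_list, rex):
    """Delete every match of the regular expression rex from each sequence."""
    compiled = re.compile(rex)
    return [compiled.sub("", seq) for seq in seq_list]


if __name__ == "__main__":
    # Remove runs of gap characters from two example sequences.
    print(strip_pattern(["AC--GT", "A-C"], "-+"))
```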

biotext.fastatools.run_clustalo

Uses Clustal Omega to align the sequences in a FASTA file.

Input:

  • input_file_name : string (valid file name). Input FASTA file name.

Output:

  • alignment : MultipleSeqAlignment**. Alignment result.

biotext.fastatools.get_consensus

Runs Clustal Omega and obtains the alignment consensus.

Input:

  • seqrecord_list : list of SeqRecord*. List of SeqRecord*.

Output:

  • consensus : string. Alignment consensus.

  • alignment : list of string. List of sequences with alignment gaps.
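
A consensus can be derived from already aligned sequences by taking the most frequent symbol in each column. The sketch below does that with the standard library only. It is a simple majority vote and an assumption about the general technique, not the rule biotext applies; the helper name `majority_consensus` is hypothetical, and the alignment step itself (Clustal Omega) is not reproduced here.

```python
from collections import Counter


def majority_consensus(aligned):
    """Return the most frequent symbol of each column of aligned sequences."""
    if len({len(seq) for seq in aligned}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned)
    )


if __name__ == "__main__":
    print(majority_consensus(["AC-T", "ACGT", "TCGT"]))
```

When two symbols tie in a column, this sketch keeps the one seen first; a real consensus routine would typically apply a threshold or an ambiguity symbol instead.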

biotext.fastatools.fasta_to_mat

Vectorizes a list of SeqRecord* using SWeeP.

Input:

  • seq_list : list of string. List of strings in FASTA format.

Output:

  • mat : ndarray***. Matrix with the generated vectors.

biotext.treetools.mat_to_tree

Creates a dendrogram in Newick format from a matrix.

Input:

  • mat : ndarray***. Matrix.

  • ids : list of string. List with the identifiers of the rows in mat.

  • method : string. Method used to build the dendrogram. Available options are ‘complete’ (scipy library implementation) and ‘nj’ (neighbor joining, skbio library implementation). The default is ‘complete’.

Output:

  • tree : string. Dendrogram in Newick format.
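
To make the idea of turning a matrix into a Newick dendrogram concrete, here is a small pure-Python sketch of complete-linkage clustering. It is not the scipy-based routine that biotext uses: the helper name `complete_linkage_newick`, the Euclidean distance between rows, and the omission of branch lengths are all assumptions chosen to keep the example short.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))


def complete_linkage_newick(rows, ids):
    """Cluster the rows with complete linkage and return a Newick string."""
    # Each cluster is (newick_text, indices of the rows it contains).
    clusters = [(name, [i]) for i, name in enumerate(ids)]

    def cluster_distance(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(euclidean(rows[i], rows[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest linkage distance.
        _, x, y = min(
            (cluster_distance(clusters[x][1], clusters[y][1]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merged = (
            f"({clusters[x][0]},{clusters[y][0]})",
            clusters[x][1] + clusters[y][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0][0] + ";"


if __name__ == "__main__":
    mat = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    print(complete_linkage_newick(mat, ["a", "b", "c"]))
```

The search over cluster pairs is cubic in the number of rows, which is fine for a demonstration but not for large matrices.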

*Bio.SeqRecord.SeqRecord: Biopython object that stores a biological sequence and its associated information, as described in <https://biopython.org/docs/1.76/api/Bio.SeqRecord.html>.

**Bio.Align.MultipleSeqAlignment: Biopython object that stores a multiple sequence alignment, as described in <https://biopython.org/docs/1.76/api/Bio.Align.html>.

***numpy.ndarray: NumPy object that represents an array, as described in <https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html>.
