The biotext library offers resources to support text mining strategies using bioinformatics tools.

Project description

Stand-alone tools based on the library are available at <https://sourceforge.net/projects/biotext-tools/>.

Installation

To install biotext through pip:

pip install biotext

Tested Platforms

  • Python: 3.7.4

  • Windows (64 bits): 10

  • Ubuntu (64 bits): 18.04.1 LTS

Required external libraries

  • numpy

  • pandas

  • scipy

  • scikit-learn

  • matplotlib

  • unidecode

  • biopython

  • sweep

Functions

Each entry below gives the function name, its description, its inputs, and its outputs.

biotext.aminocode.encode_string

Encodes a string with AMINOcode.

Input:

  • input_string (string): Natural language text string to be encoded.

  • detail (string): Level of detail kept in the encoding: ‘d’ for detail in digits, ‘p’ for detail in punctuation, ‘dp’ or ‘pd’ for both.

Output:

  • encoded_string (string): Encoded text.

biotext.aminocode.decode_string

Decodes a string by reversing AMINOcode.

Input:

  • input_string (string): Text string encoded with AMINOcode.

  • detail (string): Level of detail used in the encoding: ‘d’ for detail in digits, ‘p’ for detail in punctuation, ‘dp’ or ‘pd’ for both.

Output:

  • decoded_string (string): Decoded text.

biotext.aminocode.encode_list

Encodes all strings in a list with AMINOcode.

Input:

  • string_list (list): List of strings to be encoded.

  • detail (string): Level of detail kept in the encoding: ‘d’ for detail in digits, ‘p’ for detail in punctuation, ‘dp’ or ‘pd’ for both.

  • verbose (bool): If True, displays progress.

Output:

  • encoded_list (list): List of all encoded texts, each as a string.

biotext.aminocode.decode_list

Decodes all strings in a list by reversing AMINOcode.

Input:

  • string_list (list): List of strings encoded with AMINOcode.

  • detail (string): Level of detail used in the encoding: ‘d’ for detail in digits, ‘p’ for detail in punctuation, ‘dp’ or ‘pd’ for both.

  • verbose (bool): If True, displays progress.

Output:

  • decoded_list (list of string): List of all decoded texts.
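A quick way to see what an encoding preserves is to encode a list, decode it again, and compare the result with the input. The helper below is a minimal sketch of that check. `round_trip` and `changed` are hypothetical names that are not part of biotext, and the upper/lower-case stand-in codec exists only so the sketch runs without the library installed (it is not AMINOcode).

```python
def round_trip(texts, encode_list, decode_list, **options):
    """Encode then decode `texts`; return (original, encoded, decoded) triples."""
    encoded = encode_list(texts, **options)
    decoded = decode_list(encoded, **options)
    return list(zip(texts, encoded, decoded))


def changed(triples):
    """Return the triples whose decoded text differs from the original."""
    return [t for t in triples if t[0] != t[2]]


# Stand-in codec (hypothetical, NOT AMINOcode): upper-casing loses the
# original letter case, so some inputs will not survive the round trip.
def demo_encode_list(texts, **_options):
    return [t.upper() for t in texts]


def demo_decode_list(texts, **_options):
    return [t.lower() for t in texts]


triples = round_trip(["Text Mining", "dna"], demo_encode_list, demo_decode_list)
for original, encoded, decoded in changed(triples):
    print(f"{original!r} -> {encoded!r} -> {decoded!r}")
```

With biotext installed, the same helper could be called with `biotext.aminocode.encode_list` and `biotext.aminocode.decode_list`, passing `detail="dp"`, assuming the parameters listed above are accepted as keyword arguments.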

biotext.dnabits.encode_string

Encodes a string with DNAbits.

Input:

  • input_string (string): Natural language text string to be encoded.

Output:

  • encoded_string (string): Encoded text.

biotext.dnabits.decode_string

Decodes a string by reversing DNAbits.

Input:

  • input_string (string): Text string encoded with DNAbits.

Output:

  • decoded_string (string): Decoded text.

biotext.dnabits.encode_list

Encodes all strings in a list with DNAbits.

Input:

  • string_list (list): List of strings to be encoded.

  • verbose (bool): If True, displays progress.

Output:

  • encoded_list (list): List of all encoded texts, each as a string.

biotext.dnabits.decode_list

Decodes all strings in a list by reversing DNAbits.

Input:

  • string_list (list): List of strings encoded with DNAbits.

  • verbose (bool): If True, displays progress.

Output:

  • decoded_list (list of string): List of all decoded texts.
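This page does not describe the mapping DNAbits uses. To give a feel for how text can be written as a nucleotide sequence and recovered again, here is a toy scheme that stores each byte as four bases, two bits per base. The function names and the bit-to-base table are assumptions made for illustration and should not be read as the DNAbits specification.

```python
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def text_to_dna(text):
    """Write each UTF-8 byte of `text` as four bases, high bits first."""
    bases = []
    for byte in text.encode("utf-8"):
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases)


def dna_to_text(dna):
    """Reverse `text_to_dna`: rebuild one byte from every four bases."""
    if len(dna) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    data = bytearray()
    for start in range(0, len(dna), 4):
        byte = 0
        for base in dna[start:start + 4]:
            byte = (byte << 2) | BASES.index(base)  # ValueError on unknown base
        data.append(byte)
    return data.decode("utf-8")


encoded = text_to_dna("text mining")
decoded = dna_to_text(encoded)
print(encoded, decoded)
```

Unlike the round trip through a lossy encoding, this toy scheme is fully reversible because every bit of the input is kept.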

create_seqrecord_list

Creates a list of SeqRecord* from a list of strings.

Input:

  • seq_list (list of string): List of biological sequences in string format.

  • header (list of string): List of headers in string format. If set to None, the headers are defined automatically as numbers in increasing order.

Output:

  • seqrecord_list (list of SeqRecord*): List of SeqRecord*.

biotext.fastatools.import_fasta

Uses biopython to import a FASTA file.

Input:

  • input_file_name (string, valid file name): Input FASTA file name.

Output:

  • seqrecord_list (list of SeqRecord*): List of SeqRecord* imported from the file.

biotext.fastatools.export_fasta

Creates a FASTA file from a list of SeqRecord*.

Input:

  • seqrecord_list (list of SeqRecord*): List of SeqRecord*.

  • output_file_name (string): Output FASTA file name.

Output:

  • A file is saved with the given name.

biotext.fastatools.get_header

Gets the headers of all items in a list of SeqRecord*.

Input:

  • seqrecord_list (list of SeqRecord*): List of SeqRecord*.

Output:

  • header_list (list of string): List of all headers extracted from the input.

biotext.fastatools.get_seq

Gets the sequences of all items in a list of SeqRecord*.

Input:

  • seqrecord_list (list of SeqRecord*): List of SeqRecord*.

Output:

  • seq_list (list of string): List of all sequences extracted from the input.
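The record helpers above all revolve around header and sequence pairs in FASTA format. The sketch below mimics that flow with plain tuples and the standard library only, so it runs without Biopython. All three function names are hypothetical, the records are `(header, sequence)` tuples rather than SeqRecord objects, and starting the automatic numbering at 1 is an assumption.

```python
def make_records(seq_list, header=None):
    """Pair each sequence with a header; number them when header is None."""
    if header is None:
        # Assumption: automatic headers start at 1.
        header = [str(i) for i in range(1, len(seq_list) + 1)]
    if len(header) != len(seq_list):
        raise ValueError("header and seq_list must have the same length")
    return list(zip(header, seq_list))


def to_fasta(records):
    """Format (header, sequence) pairs as FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


def parse_fasta(text):
    """Parse FASTA text back into (header, sequence) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line  # sequences may span several lines
    return [(header, seq) for header, seq in records]


records = make_records(["ACGT", "GGCA"])
fasta_text = to_fasta(records)
parsed = parse_fasta(fasta_text)
headers = [header for header, _ in parsed]
seqs = [seq for _, seq in parsed]
print(fasta_text)
```

In real use, Biopython handles the parsing and writing; the tuples here only show the shape of the data that moves between these functions.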

biotext.fastatools.remove_pattern

Removes patterns from a list of SeqRecord* based on a regular expression.

Input:

  • seq_list (list of SeqRecord*): List of SeqRecord*.

  • rex (string): Regular expression.

Output:

  • seq_list (list of SeqRecord*): List of SeqRecord* with the removal applied.
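Regex-based removal can be tried on plain strings before applying it to records. This is a minimal sketch using the standard `re` module; `strip_pattern` is a hypothetical name, and it works on strings rather than SeqRecord objects.

```python
import re


def strip_pattern(seqs, rex):
    """Delete every match of the regular expression `rex` from each string."""
    pattern = re.compile(rex)
    return [pattern.sub("", seq) for seq in seqs]


# Remove runs of gap characters from two sequences.
cleaned = strip_pattern(["ACG--T", "A-CGT-"], r"-+")
print(cleaned)
```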

biotext.fastatools.run_clustalo

Uses Clustal Omega to align the sequences in a FASTA file.

Input:

  • input_file_name (string, valid file name): Input FASTA file name.

Output:

  • alignment (MultipleSeqAlignment**): Alignment result.

biotext.fastatools.get_consensus

Runs Clustal Omega and obtains the alignment consensus.

Input:

  • seqrecord_list (list of SeqRecord*): List of SeqRecord*.

Output:

  • consensus (string): Alignment consensus.

  • alignment (list of string): List of sequences with alignment gaps.
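How the consensus is derived from the alignment is not described here. One common and simple rule is a per-column majority vote over the gapped sequences, sketched below. `majority_consensus` is a hypothetical name, and the majority rule is an assumption rather than the library's documented behaviour.

```python
from collections import Counter


def majority_consensus(aligned):
    """Return the most frequent character of each alignment column."""
    if not aligned:
        return ""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    consensus = []
    for column in zip(*aligned):
        # On ties, the character seen first in the column wins.
        consensus.append(Counter(column).most_common(1)[0][0])
    return "".join(consensus)


alignment = ["AC-T", "ACGT", "TCGT"]
consensus = majority_consensus(alignment)
print(consensus)
```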

biotext.fastatools.fasta_to_mat

Vectorizes a list of SeqRecord* using SWeeP.

Input:

  • seq_list (list of string): List of strings in FASTA format.

Output:

  • mat (ndarray***): Matrix with the generated vectors.

biotext.treetools.mat_to_tree

Creates a dendrogram in newick format from a matrix.

Input:

  • mat (ndarray***): Matrix.

  • ids (list of string): List with the identifiers of the rows in mat.

  • method (string): Method used to create the dendrogram. Available options are ‘complete’ (scipy library implementation) and ‘nj’ (neighbor joining, skbio library implementation). The default is ‘complete’.

Output:

  • tree (string): Dendrogram in Newick format.
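To make the idea of turning a matrix into a Newick dendrogram concrete, here is a small pure-Python sketch of complete-linkage clustering over the rows of a matrix. It is a stand-in for illustration, not the scipy-based implementation mentioned above: `complete_linkage_newick` is a hypothetical name, Euclidean distance between rows is an assumption, and the output carries topology only (no branch lengths).

```python
import math


def complete_linkage_newick(mat, ids):
    """Cluster the rows of `mat` with complete linkage; return Newick text."""
    if not ids or len(ids) != len(mat):
        raise ValueError("ids must be non-empty and match the rows of mat")
    # Each cluster is (newick_label, list_of_member_row_indices).
    clusters = [(name, [i]) for i, name in enumerate(ids)]

    def cluster_distance(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(mat[a], mat[b]) for a in c1[1] for b in c2[1])

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest complete-linkage distance.
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs,
                   key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        merged = (f"({clusters[i][0]},{clusters[j][0]})",
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0] + ";"


mat = [[0, 0], [0, 1], [5, 5], [5, 6]]
tree = complete_linkage_newick(mat, ["a", "b", "c", "d"])
print(tree)
```

The sketch needs Python 3.8 or later for `math.dist`. Rows that lie close together end up as sibling leaves, which is the property the dendrogram is meant to show.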

*Bio.SeqRecord.SeqRecord: Biopython object that stores a biological sequence and its information, as described in <https://biopython.org/docs/1.76/api/Bio.SeqRecord.html>

**Bio.Align.MultipleSeqAlignment: Biopython object that stores a biological multiple sequence alignment, as described in <https://biopython.org/docs/1.76/api/Bio.Align.html>

***ndarray: NumPy object that represents an array, as described in <https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html>

Built Distribution

biotext-2.4.1.0-py3.8.egg (13.5 kB)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page