The biotex library offers resources to support text mining strategies that use bioinformatics tools.

Project description

The biotex library offers resources to support text mining strategies that use bioinformatics tools.

Standalone tools based on the library are available at <https://sourceforge.net/projects/biotex-tools/>.

Installation

To install biotex through pip:

pip install biotex

Tested Platforms

  • Python: 3.7.4

  • Windows (64-bit): 10

  • Ubuntu (64-bit): 18.04.1 LTS

Required external libraries

  • numpy

  • pandas

  • scipy

  • scikit-learn

  • matplotlib

  • unidecode

  • biopython

  • sweep
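
Before importing biotex, it can help to confirm that these dependencies are visible to the interpreter. The sketch below uses only the standard library. The mapping from package names to import names is an assumption: `sklearn` and `Bio` are the usual import names for scikit-learn and Biopython, and the remaining packages are assumed to import under their own names.

```python
from importlib.util import find_spec

# Assumed mapping: package name as listed above -> top-level import name.
IMPORT_NAMES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "unidecode": "unidecode",
    "biopython": "Bio",
    "sweep": "sweep",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the package names whose top-level module cannot be located."""
    return [
        package
        for package, module in import_names.items()
        if find_spec(module) is None
    ]
```

Calling `missing_packages()` returns an empty list when every dependency can be found; any name it returns can then be installed with pip.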

Functions

Each entry below gives, in order: the function name with its aliases, a description, the input parameters, and the output.

biotex.aminocode.encodeText (aliases: biotex.aminocode.encodetext, biotex.aminocode.et)

Encodes a string with AMINOcode.

Input:
  • text: natural language text string to be encoded.
  • detailing: level of detail in the encoding: ‘d’ for detail in digits, ‘p’ for detail in punctuation, ‘dp’ or ‘pd’ for both.

Output: the encoded text as a string.

biotex.aminocode.decodeText (aliases: biotex.aminocode.decodetext, biotex.aminocode.dt)

Decodes a string with reverse AMINOcode.

Input:
  • text: text string, encoded with the encodetext function, to be decoded.
  • detailing: the detailing used in the text to be decoded: ‘d’ for detail in digits, ‘p’ for detail in punctuation, ‘dp’ or ‘pd’ for both.

Output: the decoded text as a string.
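
A minimal usage sketch for the two functions above, assuming biotex is installed and that `text` and `detailing` are accepted as keyword arguments (the parameter names come from the entries above; the keyword calling convention and the helper name `aminocode_roundtrip` are assumptions for illustration). The import is done inside the function so the sketch loads even where biotex is absent.

```python
def aminocode_roundtrip(text, detailing="dp"):
    """Encode `text` with AMINOcode, then decode the result again."""
    # Lazy import: only needed when the function is actually called.
    from biotex import aminocode

    encoded = aminocode.encodetext(text=text, detailing=detailing)
    # The decoder receives the same detailing value used for encoding.
    decoded = aminocode.decodetext(text=encoded, detailing=detailing)
    return encoded, decoded
```

The decoded text is not guaranteed here to match the input character for character, so compare both values before relying on a lossless round trip.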

biotex.aminocode.encodeFile (aliases: biotex.aminocode.encodefile, biotex.aminocode.ef)

Encodes a text file or a list of strings with AMINOcode.

Input:
  • input_file_name: text file name or list of strings. A list of SeqRecord is also accepted, in which case the function automatically extracts the headers and encodes them.
  • output_file_name: name of the output file. If not defined, the result is only returned as a variable.
  • detailing: same as in the encodetext function.
  • header_format: format of the headers in the generated FASTA: ‘number+originaltext’, ‘number’ or ‘originaltext’. ‘number’ is a count of the lines in the input file; blank lines are included in the count but are not added to the FASTA file. ‘originaltext’ is the input text itself.
  • verbose: if True, displays progress.

Output: list of SeqRecord*. If output_file_name is defined, a file is also saved.
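
The header_format rules can be illustrated with a small stand-alone sketch. It is not the library's code: the function name `build_headers`, the 1-based numbering and the "|" separator in the combined format are assumptions made for illustration. Only the rule that blank lines are counted but not emitted comes from the description above.

```python
def build_headers(lines, header_format="number+originaltext"):
    """Return (header, text) pairs for the non-blank entries of `lines`."""
    allowed = ("number+originaltext", "number", "originaltext")
    if header_format not in allowed:
        raise ValueError(f"unknown header_format: {header_format!r}")

    pairs = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            # Blank line: it advances the count but produces no record.
            continue
        if header_format == "number":
            header = str(number)
        elif header_format == "originaltext":
            header = text
        else:
            # Assumed separator between the number and the original text.
            header = f"{number}|{text}"
        pairs.append((header, text))
    return pairs
```

With the input `["first", "", "third"]` and the ‘number’ format, the second record is numbered 3 because the blank line still counts.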

biotex.aminocode.decodeFile (aliases: biotex.aminocode.decodefile, biotex.aminocode.df)

Decodes a FASTA file or a list of SeqRecord with reverse AMINOcode.

Input:
  • input_file_name: file name or list of SeqRecord.
  • output_file_name: name of the output file. If not defined, the result is only returned as a variable.
  • detailing: same as in the decodetext function.
  • verbose: if True, displays progress.

Output: list of strings. If output_file_name is defined, a file is also saved.
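
The file-level functions can be chained in memory by omitting output_file_name. The sketch below assumes biotex is installed and that the documented parameter names are accepted as keyword arguments; the helper name `aminocode_file_roundtrip` is hypothetical.

```python
def aminocode_file_roundtrip(lines, detailing="dp"):
    """Encode a list of strings to SeqRecord objects and decode them back."""
    # Lazy import: only needed when the function is actually called.
    from biotex import aminocode

    # No output_file_name is given, so the result is only returned.
    records = aminocode.encodefile(
        input_file_name=lines,
        detailing=detailing,
        header_format="number",
        verbose=False,
    )
    decoded = aminocode.decodefile(
        input_file_name=records,
        detailing=detailing,
        verbose=False,
    )
    return records, decoded
```

Passing output_file_name to either call would additionally save a file, as described in the entries above.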

biotex.dnabits.encodeText (aliases: biotex.dnabits.encodetext, biotex.dnabits.et)

Encodes a string with DNAbits.

Input:
  • text: natural language text string to be encoded.

Output: the encoded text as a string.

biotex.dnabits.decodeText (aliases: biotex.dnabits.decodetext, biotex.dnabits.dt)

Decodes a string with reverse DNAbits.

Input:
  • text: text string, encoded with the encodetext function, to be decoded.

Output: the decoded text as a string.
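
The general idea of writing text as nucleotides can be sketched without the library. The code below is an illustration only and is not claimed to reproduce DNAbits output: the UTF-8 byte encoding, the bit order (high bits first) and the assignment 00→A, 01→C, 10→G, 11→T are all assumptions, as are the function names.

```python
BASES = "ACGT"  # assumed assignment: 00->A, 01->C, 10->G, 11->T


def bits_to_dna(text):
    """Encode the UTF-8 bytes of `text` as nucleotides, two bits per base."""
    out = []
    for byte in text.encode("utf-8"):
        # Walk the byte from its two highest bits down to its two lowest.
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bits(sequence):
    """Invert bits_to_dna: every four bases rebuild one byte."""
    if len(sequence) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    data = bytearray()
    for start in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[start:start + 4]:
            byte = (byte << 2) | BASES.index(base)
        data.append(byte)
    return data.decode("utf-8")
```

Under these assumptions the letter "A" (byte 0x41, bits 01 00 00 01) becomes "CAAC", and each input byte always yields exactly four bases.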

biotex.dnabits.encodeFile (aliases: biotex.dnabits.encodefile, biotex.dnabits.ef)

Encodes a text file or a list of strings with DNAbits.

Input:
  • input_file_name: text file name or list of strings. A list of SeqRecord is also accepted, in which case the function automatically extracts the headers and encodes them.
  • output_file_name: name of the output file. If not defined, the result is only returned as a variable.
  • header_format: format of the headers in the generated FASTA: ‘number+originaltext’, ‘number’ or ‘originaltext’. ‘number’ is a count of the lines in the input file; blank lines are included in the count but are not added to the FASTA file. ‘originaltext’ is the input text itself.
  • verbose: if True, displays progress.

Output: list of SeqRecord*. If output_file_name is defined, a file is also saved.

biotex.dnabits.decodeFile (aliases: biotex.dnabits.decodefile, biotex.dnabits.df)

Decodes a text file or a list of SeqRecord with reverse DNAbits.

Input:
  • input_file_name: file name or list of SeqRecord.
  • output_file_name: name of the output file. If not defined, the result is only returned as a variable.
  • verbose: if True, displays progress.

Output: list of strings. If output_file_name is defined, a file is also saved.

biotex.fastatools.list2SeqRecord (aliases: biotex.fastatools.list2seqrecord, biotex.fastatools.list2bioSeqRecord, biotex.fastatools.list2bioseqrecord, biotex.fastatools.list2fasta)

Converts a list of strings to a list of SeqRecord*, a Biopython object that holds a biological sequence and information about it.

Input:
  • seq: list of biological sequences in string format.
  • header: list of headers in string format. If set to None, the headers are defined automatically as an increasing number.

Output: list of SeqRecord.

biotex.fastatools.fastaRead (alias: biotex.fastatools.fastaread)

Uses Biopython to import a FASTA file.

Input:
  • input_file_name: input FASTA file name.

Output: list of SeqRecord.

biotex.fastatools.fastaWrite (alias: biotex.fastatools.fastawrite)

Creates a FASTA file from a list of SeqRecord.

Input:
  • records: list of SeqRecord.
  • output_file_name: output FASTA file name.

Output: a file is saved with the defined name.

biotex.fastatools.getHeader (alias: biotex.fastatools.getheader)

Extracts the headers from a list of SeqRecord.

Input:
  • records: list of SeqRecord.

Output: list of headers.

biotex.fastatools.getSeq (alias: biotex.fastatools.getseq)

Extracts the sequence strings from a list of SeqRecord.

Input:
  • records: list of SeqRecord.

Output: list of sequences.
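
To show the file layout that fastaRead, fastaWrite, getHeader and getSeq work with, here is a pure-Python sketch that handles records as plain (header, sequence) tuples. It is a stand-in for illustration, not the library's implementation, which relies on Biopython and SeqRecord objects; the function names and the 60-character line width are assumptions.

```python
def format_fasta(records, width=60):
    """Render (header, sequence) pairs as FASTA text with wrapped sequences."""
    lines = []
    for header, seq in records:
        lines.append(f">{header}")
        # Split the sequence into fixed-width lines.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse FASTA text back into a list of (header, sequence) pairs."""
    records = []
    header, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the record collected so far.
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records
```

With records in this form, the equivalents of getHeader and getSeq are the list comprehensions `[h for h, _ in records]` and `[s for _, s in records]`.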

biotex.fastatools.removePattern (alias: biotex.fastatools.removepattern)

Removes patterns from a list of SeqRecord based on a regular expression.

Input:
  • records: list of SeqRecord.
  • rex: regular expression.

Output: list of SeqRecord with the removal applied.
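
The effect of a regular-expression removal can be sketched on plain strings with the standard `re` module. This assumes the pattern is applied to the sequence text rather than the headers, which the description above does not state; the function name `remove_pattern` is hypothetical.

```python
import re


def remove_pattern(sequences, rex):
    """Delete every match of the regular expression `rex` from each sequence."""
    pattern = re.compile(rex)
    return [pattern.sub("", seq) for seq in sequences]
```

For example, `remove_pattern(["AC-GT--", "NNACGT"], r"[-N]+")` strips gap and N characters and leaves "ACGT" in both entries.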

biotex.fastatools.clustalOmega (aliases: biotex.fastatools.clustalomega, biotex.fastatools.clustalo)

Uses Clustal Omega to align the sequences in a FASTA file.

Input:
  • input_file_name: input FASTA file name.

Output: list of aligned sequences in string format.

biotex.fastatools.getCons (alias: biotex.fastatools.getcons)

Saves the sequences from a list of SeqRecord to a temporary file, applies the clustalo function and obtains the alignment consensus.

Input:
  • records: list of SeqRecord.

Output: the alignment consensus in string format.
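
Once sequences are aligned to equal length, one common way to derive a consensus is a per-column majority vote. The sketch below illustrates that idea only; the rule getCons actually applies is not described here and may differ. The gap character "-", the tie-breaking rule and the function name are assumptions.

```python
from collections import Counter


def majority_consensus(aligned, gap="-"):
    """Return the most frequent non-gap character of each alignment column."""
    if not aligned:
        raise ValueError("at least one aligned sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")

    consensus = []
    for column in zip(*aligned):
        residues = [char for char in column if char != gap]
        if not residues:
            # A column made only of gaps stays a gap.
            consensus.append(gap)
            continue
        # Ties go to the residue that appears first in the column.
        consensus.append(Counter(residues).most_common(1)[0][0])
    return "".join(consensus)
```

For the alignment `["ACGT", "ACGA", "AC-A"]` this returns "ACGA": the third column ignores the gap and the fourth column is decided two votes to one.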

*SeqRecord: Biopython object that stores a biological sequence and its associated information, as described at <https://biopython.org/DIST/docs/api/Bio.SeqRecord.SeqRecord-class.html>.


Built Distribution

biotex-1.0.0.0-py3-none-any.whl (17.1 kB, Python 3)

  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.6

File hashes

  • SHA256: a9bdf87feaa9ba3b0acbe4b46b18ba2a4957599e7adbfce05b4fc47e815b5ecf
  • MD5: c167ff8698e18c489b88193a3065e0ff
  • BLAKE2b-256: 95ab548cdafe0930ea12d0c4045cacaa63c894b1f6cdaf778a8c867694916243
