SWeeP is a tool for representing large biological sequence datasets as compact vectors
Project description
This package is a Python version of the tool described in the article available at <https://www.nature.com/articles/s41598-019-55627-4>. Please cite the article if you use the package. Only amino acid sequence vectorization is currently available.
Use
To use SWeeP in Python, install the package with the command "pip install sweep" and import it in your code, as in this example:
from sweep import fastaread, fas2sweep
fasta = fastaread("fasta_file_path")
vect = fas2sweep(fasta)
The output is the already projected matrix, with 600 columns. See the article for details about the projection method.
The default projection matrix has dimensions 160000x600. A new matrix must be generated if other masks are used or a different projection size is desired. The package provides a function called orthbase that generates the orthonormal projection matrix. For example, to change the projection size to 300, use:
from sweep import fastaread, fas2sweep, orthbase
ob = orthbase(160000, 300)
fasta = fastaread("fasta_file_path")
vect = fas2sweep(fasta, orthMat=ob)
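To illustrate what projecting with an orthonormal matrix means, here is a minimal NumPy sketch. It is not the package's implementation: the helper random_orthonormal_basis is a hypothetical name, building the basis by QR decomposition of a Gaussian random matrix is an assumption (see the article for the actual method), and the sizes 1000 and 30 are small stand-ins for 160000 and 600 so that the example runs quickly.

```python
import numpy as np


def random_orthonormal_basis(n_rows, n_cols, seed=0):
    """Hypothetical helper: return an n_rows x n_cols matrix with orthonormal columns.

    Assumption: the basis comes from the reduced QR decomposition of a
    Gaussian random matrix. The sweep package may build its matrix differently.
    """
    rng = np.random.default_rng(seed)
    gaussian = rng.standard_normal((n_rows, n_cols))
    q, _ = np.linalg.qr(gaussian)  # reduced mode: q has shape (n_rows, n_cols)
    return q


# Small stand-in sizes; the package default is 160000 x 600.
high_dim = 1000
low_dim = 30
basis = random_orthonormal_basis(high_dim, low_dim)

# Stand-in for the high-dimensional vectors of 5 sequences (random 0/1 data).
rng = np.random.default_rng(1)
features = rng.integers(0, 2, size=(5, high_dim)).astype(float)

# Projection is a matrix product: (5 x high_dim) @ (high_dim x low_dim).
projected = features @ basis
print(projected.shape)  # (5, 30)
```

The number of rows in the projection matrix has to match the length of the high-dimensional vectors, and the number of columns sets the size of the output vectors, which is why a different projection size requires a new matrix.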