Easy to use String Kernels for SVM

Project description

pybabs

Machine learning algorithms already support our daily lives in many application areas, from self-driving cars and intelligent robots to computational biology. Identifying specific features in biological sequences is essential for understanding the world around us and its complex connections. Forensics, the development of new drugs, and tools for predicting certain illnesses from our DNA are important subareas of this topic.

One standard method for this task is the Support Vector Machine (SVM). SVMs are well suited to it for several reasons, but the biggest benefit comes from their ability to use so-called kernels. A sequence kernel converts the symbolic nature of biological sequences, via similarity analysis, into a numerical similarity matrix called the kernel matrix, which the SVM then uses for its internal calculations. For biological sequences, this feature clearly distinguishes support vector machines from other machine learning algorithms.

This thesis is based on the KEBABS package by Johannes Palme, which was implemented in R. Because the user base of R is declining, the functionality of the KEBABS package is increasingly sought after in other programming languages. Since Python is the go-to language for processing biological data, it makes sense to provide an easy-to-use framework for manipulating and processing biological data on this well-supported platform.

The package offers two kernels for kernel-based sequence analysis: the spectrum kernel and the gappy pair kernel. Both can be used with DNA, RNA, and amino acid sequences. The package can also represent the computed kernel matrix explicitly, either as a whole or in sparse format. It works seamlessly with several existing SVM frameworks in the Python landscape, such as LIBSVM and the widely used scikit-learn library, and it provides cross-validation, grid search, and unbiased model selection.

For better biological interpretability, the weights used in the calculations can easily be extracted. Prediction profiles have also been implemented to better show how individual parts of a sequence contribute to the overall result.
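To make the two kernels more concrete, the following is a minimal sketch in plain Python of how a spectrum kernel and a gappy pair kernel can be computed and turned into a kernel matrix. It is not the pykebabs API: the function names (spectrum_features, gappy_pair_features, kernel_value, kernel_matrix) and the parameters k, m, and normalize are hypothetical and chosen only for illustration, and the gappy pair feature definition follows the usual textbook description (two k-mers separated by up to m ignored positions), which may differ in detail from the package.

```python
from collections import Counter
from math import sqrt


def spectrum_features(seq, k=3):
    """Count every contiguous k-mer in seq (spectrum feature map)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def gappy_pair_features(seq, k=1, m=2):
    """Count pairs of k-mers separated by 0..m ignored positions.

    Assumed feature definition for illustration; gaps are written as dots,
    so "A.G" means an A, one arbitrary position, then a G.
    """
    feats = Counter()
    for gap in range(m + 1):
        span = 2 * k + gap
        for i in range(len(seq) - span + 1):
            left = seq[i:i + k]
            right = seq[i + k + gap:i + span]
            feats[left + "." * gap + right] += 1
    return feats


def kernel_value(fx, fy):
    """Dot product of two sparse feature-count vectors."""
    return sum(count * fy[feature] for feature, count in fx.items())


def kernel_matrix(seqs, feature_map, normalize=True, **params):
    """Build the (optionally cosine-normalized) kernel matrix as nested lists."""
    feats = [feature_map(s, **params) for s in seqs]
    n = len(feats)
    K = [[float(kernel_value(feats[i], feats[j])) for j in range(n)]
         for i in range(n)]
    if normalize:
        diag = [sqrt(K[i][i]) for i in range(n)]
        K = [[K[i][j] / (diag[i] * diag[j]) if diag[i] and diag[j] else 0.0
              for j in range(n)]
             for i in range(n)]
    return K


if __name__ == "__main__":
    sequences = ["ACGTACGT", "ACGTTTTT", "TTTTGGGG"]
    for row in kernel_matrix(sequences, spectrum_features, k=3):
        print(["%.3f" % value for value in row])
```

A matrix built this way can be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's SVC with kernel="precomputed". The sparse Counter representation mirrors the idea of storing only the features that actually occur in a sequence.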

Download files

Download the file for your platform.

Source Distribution

pykebabs-1.0.1.tar.gz (9.0 kB)

Uploaded Source

Built Distribution

pykebabs-1.0.1-py3-none-any.whl (10.1 kB)

Uploaded Python 3

File details

Details for the file pykebabs-1.0.1.tar.gz.

File metadata

  • Download URL: pykebabs-1.0.1.tar.gz
  • Upload date:
  • Size: 9.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for pykebabs-1.0.1.tar.gz

Algorithm     Hash digest
SHA256        7abe3c31a60391992b85e08e27cc8f09651785a02be167078a86081f0ffd20a9
MD5           413f0d822e5d7e78f258073431ebe96e
BLAKE2b-256   046bf410801323375faffcc02efc0899990fa9bdb592a5e06927883d3e83833e

File details

Details for the file pykebabs-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: pykebabs-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 10.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for pykebabs-1.0.1-py3-none-any.whl

Algorithm     Hash digest
SHA256        8978f0a1c4b114a337c73f63fefafb685a25d255ce33944dffd18d61577dd5a7
MD5           4d42d4dcb81e5153dabcbea136e6c62a
BLAKE2b-256   a512200c6bc7d0be2e14809f19f505ec9020ddea7acd0e2684f20dbe4dfee657

