Gradient boosted decision tree palindrome predictor, used to locate regions for further investigation through http://palindromes.ibp.cz/

Palindrome tree

Palindrome tree analyzes inverted repeats in DNA sequences. It scans the provided sequences with a gradient boosted decision tree model and keeps only the regions with a high probability of containing a palindrome, which filters out a large portion of the data. These regions of interest are then analyzed through the API of Palindrome Analyser, part of the DNA Analyser server. DNA Analyser is a web-based server for nucleotide sequence analysis, developed in cooperation between the Department of Informatics, Mendel University in Brno, and the Institute of Biophysics, Academy of Sciences of the Czech Republic.
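An inverted repeat is a stretch of DNA followed, possibly after a spacer, by its own reverse complement. To make the concept concrete, here is a minimal brute-force sketch that finds perfect inverted repeats of a fixed arm length. It is only an illustration: it is not the palindrome_tree model or the Palindrome Analyser algorithm, it ignores mismatches, and the function names and the arm and max_spacer parameters are hypothetical.

```python
# Illustration only: brute-force search for perfect inverted repeats.
# This is not how palindrome_tree or Palindrome Analyser work internally.
from typing import Iterator, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string made of A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_inverted_repeats(
    seq: str, arm: int = 6, max_spacer: int = 10
) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (start, arm_seq, spacer_seq, opposite_seq) for every perfect inverted
    repeat with the given arm length and a spacer of 0..max_spacer nucleotides."""
    seq = seq.upper()
    for start in range(len(seq) - 2 * arm + 1):
        left = seq[start:start + arm]
        target = reverse_complement(left)
        for spacer in range(max_spacer + 1):
            right_start = start + arm + spacer
            right = seq[right_start:right_start + arm]
            if len(right) < arm:
                break  # ran off the end of the sequence
            # The opposite arm must equal the reverse complement of the first arm
            if right == target:
                yield start, left, seq[start + arm:right_start], right
```

For example, list(find_inverted_repeats("TTGACAAGTCTT", arm=3, max_spacer=4)) returns three hits, including (2, "GAC", "AA", "GTC"). Even this short input produces incidental hits with short arms, which is why scoring candidates before detailed analysis is useful.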

Requirements

Palindrome tree requires Python 3.7 or newer.

Installation

To install Palindrome tree, use the PyPI repository:

pip install palindrome-tree

Usage

First, initialize a PalindromeTree analyzer instance, imported from the main package palindrome_tree:

from palindrome_tree import PalindromeTree

tree = PalindromeTree()

Predict regions (without API validation)

To predict regions that may contain palindromes without validating them, run analyse with the check_with_api parameter set to False:

with open("/path/to/sequence/name.txt", "r") as sequence_file:
    sequence = sequence_file.read()

# Predict candidate regions only; the API is not contacted
results = tree.analyse(
    sequence=sequence,
    check_with_api=False,
)
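sequence_file.read() passes the file contents verbatim, including line breaks. If your input is FASTA-formatted or wrapped over several lines, a small loader such as the hypothetical load_sequence below can normalize it first. This is a sketch that assumes analyse expects a single plain nucleotide string; the project description does not say whether analyse handles headers or newlines itself.

```python
from pathlib import Path


def load_sequence(path: str) -> str:
    """Return the nucleotides in a plain-text or FASTA file as one uppercase string.

    Lines starting with '>' (FASTA headers) are skipped and all whitespace,
    including line breaks, is removed. Multiple records are concatenated.
    """
    text = Path(path).read_text()
    chunks = (line for line in text.splitlines() if not line.startswith(">"))
    return "".join("".join(chunks).split()).upper()


# Hypothetical usage with the analyzer shown above:
# results = tree.analyse(sequence=load_sequence("/path/to/sequence/name.txt"), check_with_api=False)
```

Because multi-record files are simply concatenated, split them into separate records first if the boundaries between sequences matter.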

The analyse method returns a pandas DataFrame, stored here in results, listing the position and sequence of each predicted region:

   position                        sequence
0         8  TTTGTAGAGACAGGGTCTTGCTGTGTTTCC
1        10  TGTAGAGACAGGGTCTTGCTGTGTTTCCCA
2        49  CGAACTCCTGGCCTCTAGGCAATCCTCCCA
3       102  ATCCCACTCTTTTTTGAAAAATAAAATCTA
4       105  CCACTCTTTTTTGAAAAATAAAATCTACCA
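Neighbouring predictions often overlap (the windows at positions 8 and 10 above share 28 nucleotides). A minimal sketch for collapsing them into distinct regions before further investigation, assuming position is the start offset of each window in the input sequence; merge_windows is a hypothetical helper, not part of palindrome_tree.

```python
from typing import Iterable, List, Tuple


def merge_windows(rows: Iterable[Tuple[int, str]]) -> List[Tuple[int, int]]:
    """Collapse overlapping or touching (position, sequence) windows into
    (start, end) regions, with end exclusive."""
    regions: List[List[int]] = []
    for pos, seq in sorted(rows):
        end = pos + len(seq)
        if regions and pos <= regions[-1][1]:
            # Overlaps or directly follows the previous region: extend it
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([pos, end])
    return [(start, end) for start, end in regions]
```

Applied to the sample output, merge_windows(zip(results["position"], results["sequence"])) yields [(8, 40), (49, 79), (102, 135)]: five 30-nucleotide windows reduced to three regions.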

Predict regions (with API validation)

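To predict regions that may contain palindromes and then validate them with the Palindrome Analyser API, run analyse with the check_with_api parameter set to True. This step sends requests to the remote API, so it needs network access: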

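with open("/path/to/sequence/name.txt", "r") as sequence_file:
    sequence = sequence_file.read()

# Predict candidate regions, then validate them through the Palindrome Analyser API
results = tree.analyse(
    sequence=sequence,
    check_with_api=True,
)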

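The results are again returned as a pandas DataFrame. Each row now describes one inverted repeat found inside a predicted region, and original_index identifies that region (0 and 1 below correspond to the windows at positions 8 and 10 in the previous example):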

   original_index  after  before  mismatches  opposite  position  sequence  signature   spacer  stability_NNModel
0               0     CC   TTTGT           2  CTGTGTTT         5  AGAGACAG      8-7-2  GGTCTTG  {'cruciform': -5.74, 'linear': -27.590000000000003, 'delta': 21.85}
1               0  TGCTG   TTTGT           2    GGGTCT         5    AGAGAC      6-1-2        A  {'cruciform': -2.54, 'linear': -13.84, 'delta': 11.3}
2               0  GTGTT   TGTAG           2    CTTGCT         7    AGACAG      6-3-2      GGT  {'cruciform': -1.94, 'linear': -17.509999999999998, 'delta': 15.569999999999999}
3               0   TTCC   TAGAG           2    CTGTGT         9    ACAGGG      6-5-2    TCTTG  {'cruciform': -3.7399999999999998, 'linear': -20.99, 'delta': 17.25}
4               1   CCCA     TGT           2  CTGTGTTT         3  AGAGACAG      8-7-2  GGTCTTG  {'cruciform': -5.74, 'linear': -27.590000000000003, 'delta': 21.85}
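Judging from the sample rows, sequence and opposite appear to be the two arms of the inverted repeat, spacer the nucleotides between them, before and after the flanking context, and signature an arm length, spacer length and mismatch count triple (8-7-2 for the first row). This interpretation is inferred from the example output, not documented. The hypothetical check_row helper below recomputes those values from a row as a sanity check.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string made of A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def check_row(row: dict) -> dict:
    """Recompute arm length, spacer length and mismatch count for one result row
    and compare them with its 'signature' (assumed to be arm-spacer-mismatches)."""
    arm, spacer, mismatches = (int(x) for x in row["signature"].split("-"))
    # Positions where the opposite arm differs from the reverse complement of the first arm
    counted = sum(a != b for a, b in zip(reverse_complement(row["sequence"]), row["opposite"]))
    return {
        "arm_ok": len(row["sequence"]) == len(row["opposite"]) == arm,
        "spacer_ok": len(row["spacer"]) == spacer,
        "mismatches_ok": counted == mismatches == row["mismatches"],
        # Flanks, arms and spacer joined back together in reading order
        "context": row["before"] + row["sequence"] + row["spacer"] + row["opposite"] + row["after"],
    }
```

Calling check_row(results.iloc[0].to_dict()) on the first row gives True for all three checks, and its context equals the 30-nucleotide window predicted at position 8.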

Dependencies

  • xgboost = "^1.5.1"
  • pandas = "^1.3.5"
  • scikit-learn = "^1.0.2"
  • requests = "^2.26.0"
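The version constraints above appear to use Poetry's caret syntax: ^1.5.1 allows any release from 1.5.1 up to, but not including, 2.0.0, because the left-most non-zero component may not change. The converter below is a sketch (caret_to_pep440 is a hypothetical name) that turns such constraints, with one to three numeric components, into PEP 440 specifiers usable with pip.

```python
def caret_to_pep440(constraint: str) -> str:
    """Convert a Poetry caret constraint (e.g. '^1.5.1') into a PEP 440 specifier."""
    given = [int(p) for p in constraint.lstrip("^").split(".")]
    if not 1 <= len(given) <= 3:
        raise ValueError("expected one to three numeric components")
    parts = given + [0] * (3 - len(given))  # pad '^1.5' to 1.5.0
    # Bump the left-most non-zero component; if all are zero, bump the last one given
    idx = next((i for i, p in enumerate(given) if p != 0), len(given) - 1)
    upper = parts[:idx] + [parts[idx] + 1] + [0] * (2 - idx)
    lower_str = ".".join(map(str, parts))
    upper_str = ".".join(map(str, upper))
    return f">={lower_str},<{upper_str}"
```

For instance, caret_to_pep440("^1.3.5") returns ">=1.3.5,<2.0.0", so the pandas requirement corresponds to pip install "pandas>=1.3.5,<2.0.0".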

Authors

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Source distribution: palindrome-tree-1.0.0.tar.gz (8.2 MB)

Built distribution: palindrome_tree-1.0.0-py3-none-any.whl (8.3 MB, Python 3)
