Read bioinformatics sequence formats into a Pandas DataFrame
SeqPandas
Import genomic data into a custom class that combines Pandas and Biopython, with shortcuts that make machine learning preprocessing easier.
Free software: MIT license
Documentation: https://seqpandas.readthedocs.io.
Installation
pip install seqpandas
Usage
import seqpandas as spd
# Direct File Path
df = spd.read_seq('file.fasta', format='fasta')
df = spd.read_seq('file.sam', format='sam')
df = spd.read_vcf('file.vcf', format='vcf')
df = spd.read_bed('file.bed', format='bed')
# Just need BioPython Seqs? No problem!
seqrecords = spd.read('file.fasta', format='fasta')
# Already Opened BioPython Handle
from Bio import SeqIO
seqrecords = SeqIO.parse('file.fasta', format='fasta')
df = spd.BioDataFrame.from_seqrecords(seqrecords)
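To make "one row per sequence record" concrete, here is a minimal, dependency-free sketch that parses FASTA text into a list of row dictionaries. It is not how SeqPandas is implemented: the function name `fasta_to_rows`, the helper `_make_row`, and the column names (`id`, `description`, `sequence`, `length`) are hypothetical and chosen only for illustration.

```python
import io


def _make_row(header, chunks):
    """Build one table row from a FASTA header line and its sequence lines."""
    # The first whitespace-delimited token is treated as the record id;
    # the remainder of the header (if any) is kept as a description.
    seq_id, _, description = header.partition(" ")
    sequence = "".join(chunks)
    return {
        "id": seq_id,
        "description": description,
        "sequence": sequence,
        "length": len(sequence),
    }


def fasta_to_rows(handle):
    """Parse FASTA text from an open file-like object into a list of dicts.

    Hypothetical helper for illustration only; not part of SeqPandas.
    """
    rows = []
    header = None  # header of the record currently being read
    chunks = []    # sequence lines collected for that record
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                rows.append(_make_row(header, chunks))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Close the final record once the input is exhausted.
    if header is not None:
        rows.append(_make_row(header, chunks))
    return rows


# Small in-memory example instead of a file on disk.
example = io.StringIO(">seq1 first example\nACGT\nAC\n>seq2\nGGCC\n")
rows = fasta_to_rows(example)
```

A list of dictionaries like `rows` can be passed directly to `pandas.DataFrame(rows)` to get a table with one row per sequence and one column per key.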
Tutorial
For a complete walkthrough, including how to use it in a machine learning pipeline, follow the tutorial notebook.
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
History
0.0.1 (2022-02-17)
First release on PyPI.