pythonic access to fasta sequence files
Project description
- Description: pythonic access to fasta sequence files.
- Email:
- License: MIT
Implementation
Requires Python >= 2.6. pyfasta stores a flattened copy of the fasta file, with headers and whitespace removed, along with a pickled index of the start and stop offsets (used with fseek) of each header's sequence, for internal use.
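The storage scheme described above can be sketched in plain Python 3 as follows. This is an illustrative sketch, not pyfasta's own code: the helper names `flatten_fasta` and `read_record` are hypothetical, sequences are assumed to be ASCII (so one character equals one byte offset), and the on-disk format of pyfasta's real `.gdx` index may differ.

```python
import pickle


def flatten_fasta(fasta_path, flat_path, index_path):
    """Write every sequence to flat_path with headers and whitespace removed,
    and pickle a {name: (start, stop)} dict of offsets into flat_path.
    Hypothetical helper; not part of pyfasta's API."""
    index = {}
    name, start, offset = None, 0, 0
    with open(fasta_path) as fasta, open(flat_path, "w") as flat:
        for line in fasta:
            if line.startswith(">"):
                if name is not None:
                    index[name] = (start, offset)  # close the previous record
                fields = line[1:].split()
                name = fields[0] if fields else ""  # first token after '>'
                start = offset
            else:
                seq = "".join(line.split())  # strip newlines and other whitespace
                flat.write(seq)
                offset += len(seq)  # ASCII assumed: 1 character == 1 byte
        if name is not None:
            index[name] = (start, offset)  # close the last record
    with open(index_path, "wb") as out:
        pickle.dump(index, out)
    return index


def read_record(flat_path, index, name, start=0, stop=None):
    """Return bases [start, stop) (0-based, clamped to the record) of record
    `name` by seeking into flat_path. Hypothetical helper."""
    rec_start, rec_stop = index[name]
    length = rec_stop - rec_start
    stop = length if stop is None else min(stop, length)
    start = max(0, min(start, stop))
    with open(flat_path, "rb") as flat:
        flat.seek(rec_start + start)  # jump straight to the slice, no parsing
        return flat.read(stop - start).decode("ascii")
```

Only the requested bytes are read from disk, which is what makes keeping the flat file and the offset index worthwhile for large genomes.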
Usage
>>> from pyfasta import Fasta
>>> f = Fasta('tests/data/three_chrs.fasta')
>>> sorted(f.keys())
['chr1', 'chr2', 'chr3']
>>> f['chr1']
FastaRecord('tests/data/three_chrs.fasta.flat', 0..80)
>>> f['chr1'][:10]
'ACTGACTGAC'
# the index stores the start and stop of each header from the fasta file
>>> f.index
{'chr3': (160, 3760), 'chr2': (80, 160), 'chr1': (0, 80)}
# can query by a 'feature' dictionary
>>> f.sequence({'chr': 'chr1', 'start': 2, 'stop': 9})
'CTGACTGA'
# returns the reverse complement for the '-' strand
>>> f.sequence({'chr': 'chr1', 'start': 2, 'stop': 9, 'strand': '-'})
'TCAGTCAG'
# creates a .flat copy of the fasta and a .gdx pickle of the index.
>>> import os
>>> sorted(os.listdir('tests/data/'))[1:]
['three_chrs.fasta', 'three_chrs.fasta.flat', 'three_chrs.fasta.gdx']
# clean up (in real use, these files are kept for faster access)
>>> os.unlink('tests/data/three_chrs.fasta.gdx')
>>> os.unlink('tests/data/three_chrs.fasta.flat')
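The feature-dictionary query above can be approximated in plain Python 3 to make its coordinate handling explicit. This is a sketch, not pyfasta's implementation: `sequence_from_feature` and `reverse_complement` are hypothetical names, the records are held in an ordinary dict of strings, and the 1-based inclusive convention is inferred from the doctest output (start=2, stop=9 returns 8 bases beginning at the second base), so treat it as an assumption.

```python
# Complement table; an assumption, pyfasta's own table may cover more IUPAC codes.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def sequence_from_feature(records, feature):
    """Slice records[feature['chr']] using 1-based, inclusive start/stop,
    reverse-complementing when feature['strand'] == '-'. Hypothetical helper."""
    seq = records[feature["chr"]]
    # 1-based inclusive [start, stop] -> 0-based half-open [start - 1, stop)
    sub = seq[feature["start"] - 1:feature["stop"]]
    if feature.get("strand") == "-":
        sub = reverse_complement(sub)
    return sub


records = {"chr1": "ACTGACTGAC"}  # first 10 bases of chr1 from the example
print(sequence_from_feature(records, {"chr": "chr1", "start": 2, "stop": 9}))  # CTGACTGA
print(sequence_from_feature(records, {"chr": "chr1", "start": 2, "stop": 9, "strand": "-"}))  # TCAGTCAG
```

Both results match the doctest outputs shown in the usage session.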
Download files
Source Distribution
pyfasta-0.2.2.tar.gz (5.1 kB)
File details
Details for the file pyfasta-0.2.2.tar.gz
File metadata
- Download URL: pyfasta-0.2.2.tar.gz
- Upload date:
- Size: 5.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 20d048f5eec76cd55c327863e1a28b983430056b9c1f5c09f45d30c7aa645af8
MD5 | c744d505b8483a49834a3d10ab13d949
BLAKE2b-256 | 42d9fe2fc89cc25862f5c7fbacb4b0aa0cdff2ff8a47c2d6bd2307639e2cbb2f
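A downloaded copy of the source distribution can be checked against these digests with Python 3's standard `hashlib` module. This is a minimal sketch: the helper name `file_digest` is hypothetical, the file is assumed to have been saved as `pyfasta-0.2.2.tar.gz` in the working directory, and BLAKE2b-256 is taken to mean BLAKE2b with a 32-byte digest.

```python
import hashlib

# Digests copied from the table above.
EXPECTED = {
    "sha256": "20d048f5eec76cd55c327863e1a28b983430056b9c1f5c09f45d30c7aa645af8",
    "md5": "c744d505b8483a49834a3d10ab13d949",
    "blake2b-256": "42d9fe2fc89cc25862f5c7fbacb4b0aa0cdff2ff8a47c2d6bd2307639e2cbb2f",
}


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at path (hypothetical helper)."""
    if algorithm == "blake2b-256":
        h = hashlib.blake2b(digest_size=32)  # BLAKE2b with a 256-bit (32-byte) digest
    else:
        h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):  # stream in chunks
            h.update(chunk)
    return h.hexdigest()
```

For example, `file_digest("pyfasta-0.2.2.tar.gz") == EXPECTED["sha256"]` should be `True` for an intact download. SHA256 is the digest to prefer; MD5 is listed for completeness but is not collision-resistant.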