pythonic access to fasta sequence files
Project description
- Email:
- License: MIT
Implementation
Requires Python >= 2.5. pyfasta stores a flattened copy of the fasta file, with whitespace and headers removed, together with a pickle of the start and stop offsets (used for fseek) of each header's sequence. Both files are for internal use. FastaRecord objects also support the numpy array interface.
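To make the flat-file-plus-index idea concrete, here is a minimal sketch in modern Python 3. It is not pyfasta's actual code: the function names `flatten_fasta` and `read_slice` are hypothetical, and the exact on-disk layout of the real `.flat` and `.gdx` files is an assumption. It only illustrates why a slice can be served with a single seek and read.

```python
import os
import pickle
import tempfile


def flatten_fasta(fasta_path, flat_path, index_path):
    """Hypothetical helper: write all sequence characters back to back
    (no headers, no newlines) and return {header: (start, stop)}."""
    index = {}
    name, start, pos = None, 0, 0
    with open(fasta_path) as src, open(flat_path, "wb") as flat:
        for line in src:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Close the previous record before starting a new one.
                if name is not None:
                    index[name] = (start, pos)
                name, start = line[1:].strip(), pos
            else:
                flat.write(line.encode("ascii"))
                pos += len(line)
        if name is not None:
            index[name] = (start, pos)
    # Persist the offsets so later runs can skip the flattening step.
    with open(index_path, "wb") as fh:
        pickle.dump(index, fh)
    return index


def read_slice(flat_path, index, name, lo, hi):
    """Hypothetical helper: return record[lo:hi] using one seek and one read."""
    start, stop = index[name]
    lo_abs = min(start + lo, stop)
    hi_abs = min(start + hi, stop)  # never read into the next record
    with open(flat_path, "rb") as flat:
        flat.seek(lo_abs)
        return flat.read(max(hi_abs - lo_abs, 0)).decode("ascii")
```

Because every record occupies a contiguous byte range in the flat file, the cost of a slice depends on the slice length rather than on the size of the whole fasta file.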
Usage
>>> from pyfasta import Fasta
>>> f = Fasta('tests/data/three_chrs.fasta')
>>> sorted(f.keys())
['chr1', 'chr2', 'chr3']
>>> f['chr1']
FastaRecord('tests/data/three_chrs.fasta.flat', 0..80)
Slicing
>>> f['chr1'][:10]
'ACTGACTGAC'

# get the 1st basepair in every codon (it's python yo)
>>> f['chr1'][::3]
'AGTCAGTCAGTCAGTCAGTCAGTCAGT'

# the index stores the start and stop of each header from the fasta file.
# (you should never need this)
>>> f.index
{'chr3': (160, 3760), 'chr2': (80, 160), 'chr1': (0, 80)}

# can query by a 'feature' dictionary
>>> f.sequence({'chr': 'chr1', 'start': 2, 'stop': 9})
'CTGACTGA'

# with reverse complement for - strand
>>> f.sequence({'chr': 'chr1', 'start': 2, 'stop': 9, 'strand': '-'})
'TCAGTCAG'
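The feature queries above behave as if `start` and `stop` were 1-based and inclusive: start 2, stop 9 returns eight bases beginning at the second one. The following sketch reproduces that behaviour on a plain string. The function name `feature_sequence` and the handling of the `strand` key are assumptions made for illustration, not pyfasta's implementation.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def feature_sequence(seq, feature):
    """Hypothetical helper: extract a 1-based, inclusive interval from seq,
    reverse-complementing it when the feature is on the '-' strand."""
    start, stop = feature["start"], feature["stop"]
    # 1-based inclusive [start, stop] maps to the Python slice [start - 1:stop].
    sub = seq[start - 1:stop]
    if feature.get("strand", "+") == "-":
        sub = sub.translate(_COMPLEMENT)[::-1]
    return sub


chr1_prefix = "ACTGACTGAC"  # first ten bases of chr1, as shown above
print(feature_sequence(chr1_prefix, {"chr": "chr1", "start": 2, "stop": 9}))
print(feature_sequence(chr1_prefix, {"chr": "chr1", "start": 2, "stop": 9, "strand": "-"}))
```

Run on the first ten bases of chr1, this prints 'CTGACTGA' and 'TCAGTCAG', matching the two results shown above.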
Numpy Array Interface
# FastaRecords support the numpy array interface.
>>> import numpy as np
>>> a = np.array(f['chr2'])
>>> a.shape[0] == len(f['chr2'])
True
>>> a[10:14]
array(['A', 'A', 'A', 'A'], dtype='|S1')

# cleanup (though for real use these will remain for faster access)
>>> import os
>>> os.unlink('tests/data/three_chrs.fasta.gdx')
>>> os.unlink('tests/data/three_chrs.fasta.flat')
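One way a byte range of a flat file can be exposed as a numpy array without loading the whole file is `numpy.memmap`. The sketch below assumes a flat file of one byte per base and a known (start, stop) pair. The helper name `record_as_array` is hypothetical, and nothing here claims that pyfasta builds its arrays this way.

```python
import os
import tempfile

import numpy as np


def record_as_array(flat_path, start, stop):
    """Hypothetical helper: map bytes [start, stop) of a flat sequence file
    as a read-only array of single-character byte strings."""
    return np.memmap(flat_path, dtype="S1", mode="r",
                     offset=start, shape=(stop - start,))


# Build a tiny stand-in flat file: 8 bases of one record, then 8 of another.
tmp_dir = tempfile.mkdtemp()
flat_path = os.path.join(tmp_dir, "demo.flat")
with open(flat_path, "wb") as fh:
    fh.write(b"ACTGACTG" + b"AAAAAAAA")

second = record_as_array(flat_path, 8, 16)
print(second.shape[0])          # number of bases in the mapped record
print(second[2:6].tobytes())    # raw bytes of a four-base slice
```

Under Python 3 the elements are bytes objects (dtype 'S1'), so compare against values such as `b'A'` rather than `'A'`.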