Interconvert various file formats supported by biopython. Supports querying records with JMESPath.
Project description
BioPython-Convert
Interconvert various file formats supported by BioPython.
Supports querying records with JMESPath.
Installation
pip install biopython-convert
or:
conda install biopython-convert
or:
git clone https://github.com/brinkmanlab/BioPython-Convert.git cd BioPython-Convert ./setup.py install
Use
biopython.convert [-s] [-v] [-i] [-q JMESPath] input_file input_type output_file output_type -s Split records into seperate files -q JMESPath to select records. Must return list of SeqIO records or mappings. Root is list of input SeqIO records. -i Print out details of records during conversion -v Print version and exit
- Supported formats
abi, abi-trim, ace, cif-atom, cif-seqres, clustal, embl, fasta, fasta-2line, fastq-sanger, fastq, fastq-solexa, fastq-illumina, genbank, gb, ig, imgt, nexus, pdb-seqres, pdb-atom, phd, phylip, pir, seqxml, sff, sff-trim, stockholm, swiss, tab, qual, uniprot-xml, gff3, txt, json, yaml
JMESPath
The root node for a query is a list of SeqRecord objects. The query can return a list with a subset of these or a mapping, keying to the constructor parameters of a SeqRecord object.
If the formats are txt, json, or yaml, then the JMESPath resulting object will simply be dumped in those formats.
A web based tool is available to experiment with constructing queries in real time on your data. Simply convert your dataset to JSON and load it into the JMESPath playground to begin composing your query. It supports loading JSON files directly rather than trying to copy/paste the data.
split() and let() functions are available in addition to the JMESPath standard functions
extract(Seq, SeqFeature) is also made available to allow access to the SeqFeature.extract() function within the query
- Examples:
Append a new record:
[@, [{'seq': 'AAAA', 'name': 'my_new_record'}]] | []
Filter out any plasmids:
[?!(features[?type=='source'].qualifiers.plasmid)]
Keep only the first record:
[0]
Output taxonomy of each record (txt output):
[*].annotations.taxonomy
Output json object containing id and molecule type:
[*].{id: id, type: annotations.molecule_type}
Convert dataset to PTT format using text output:
[0].[join(' - 1..', [description, to_string(length(seq))]), join(' ', [to_string(length(features[?type=='CDS' && qualifiers.translation])), 'proteins']), join(`"\t"`, ['Location', 'Strand', 'Length', 'PID', 'Gene', 'Synonym', 'Code', 'COG', 'Product']), (features[?type=='CDS' && qualifiers.translation].[join('..', [to_string(sum([location.start, `1`])), to_string(location.end)]), [location.strand][?@==`1`] && '+' || '-', length(qualifiers.translation[0]), (qualifiers.db_xref[?starts_with(@, 'GI')].split(':', @)[1])[0] || '-', qualifiers.gene[0] || '-', qualifiers.locus_tag[0] || '-', '-', '-', qualifiers.product[0] ] | [*].join(`"\t"`, [*].to_string(@)) )] | []
Convert dataset to faa format using fasta output:
[0].let({org: (annotations.organism || annotations.source)}, &(features[?type=='CDS' && qualifiers.translation].{id: join('|', [ (qualifiers.db_xref[?starts_with(@, 'GI')].['gi', split(':', @)[1]]), (qualifiers.protein_id[*].['ref', @]), (qualifiers.locus_tag[*].['locus', @]), join('', [':', [location][?strand==`-1`] && 'c' || '', to_string(sum([location.start, `1`])), '..', to_string(location.end)]) ][][]), seq: qualifiers.translation[0], description: (org && join('', [qualifiers.product[0], ' [', org, ']']) || qualifiers.product[0])}))
See CONTRIBUTING.rst for information on contributing to this repo.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file biopython.convert-1.3.3.tar.gz
.
File metadata
- Download URL: biopython.convert-1.3.3.tar.gz
- Upload date:
- Size: 30.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 5cb2df5321eb1455d469bb33bc27f2feb104b14778c733d275d95ef00b5ac1d8 |
|
MD5 | 4e0f5e15a3da0e371765b4af5df56f96 |
|
BLAKE2b-256 | 8ea9b79d1900c4e6522c96d44ce380671789dbb01368e1de80a0ab1b3a477a09 |