A small package to query data from the OceanIA services
Project description
OcéanIA Query Fasta
OcéanIA Query Fasta lets you query large FASTA files available in the OcéanIA Platform to extract parts of biological sequences.
What is FASTA format
FASTA is a text-based format for representing either nucleotide sequences or amino acid sequences, in which nucleotides or amino acids are represented using single-letter codes. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. An example sequence in FASTA format is:
>TARA_X000000368_G_scaffold34_1_gene23 strand:- start:199 stop:642 length:444 start_codon:yes stop_codon:yes gene_type:complete
ATGAATACTCTTACTCGAATAAGCCTGACAATTTTTTTACTTTTGATGGCAGCTGTCTAT
TTACCAATTGGTTTATGGGCAATTATTGCTCCAGCTCAGGATGCCCTTGGTTTAGAACTA
CCTTCTTTTTATGAAGCTGTAGGCTTATCTGTAATATCTCCAATTGGGTATTCAGAATTT
GCAGGTATATATGGGGGCATTAATATTGTCATTGGCGTGATGTTCCTAATAGGCGTTTTT
AAAAAACAGGTCGGACTATTTGCTATAAAAGTTCTTGTATTTCTTGTTGGCTCAATAGCT
CTTGGAAGATTCTTGCTAATGTTGCTTGGATCCCAGGCAGGATTACCTGCAGAAATTAAT
GCTTTTCTTATCTTTGAAATAATTGTTTTCTTTATAGGTATTATTTTTATTAAAGTCCTA
AAAAACACTGATCATGTTACTTAG
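The layout described above (a ">" description line followed by wrapped sequence lines) can be read with a few lines of Python. This is only an illustrative sketch and not part of the package: `parse_fasta` is a hypothetical helper, and the example record is truncated to the first two sequence lines with a shortened description.

```python
from typing import Dict, Iterable, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each sequence ID to a (description, sequence) pair."""
    records: Dict[str, Tuple[str, str]] = {}
    seq_id = None
    description = ""
    chunks = []
    for raw_line in lines:
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if seq_id is not None:
                records[seq_id] = (description, "".join(chunks))
            # The ID is the text up to the first space; the rest is free-form.
            seq_id, _, description = line[1:].partition(" ")
            chunks = []
        elif seq_id is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if seq_id is not None:
        records[seq_id] = (description, "".join(chunks))
    return records


# Truncated copy of the example above (first two sequence lines only).
example = """\
>TARA_X000000368_G_scaffold34_1_gene23 strand:- start:199 stop:642
ATGAATACTCTTACTCGAATAAGCCTGACAATTTTTTTACTTTTGATGGCAGCTGTCTAT
TTACCAATTGGTTTATGGGCAATTATTGCTCCAGCTCAGGATGCCCTTGGTTTAGAACTA
"""

records = parse_fasta(example.splitlines())
description, sequence = records["TARA_X000000368_G_scaffold34_1_gene23"]
print(description)
print(len(sequence))
```

The wrapped lines are joined into one string, so positions in the sequence can be addressed without regard to where the file breaks its lines.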
Installation and usage
Installation
OcéanIA Query Fasta can be installed by running `pip install oceania-query-fasta`. It requires Python >= 3.6 to run.
Usage
The library can be used as a command line tool or imported as a Python library.
As a Python package
from oceania import get_sequences_from_fasta
TARA_SAMPLE_ID = "TARA_A100000171"
# REQUEST_PARAMS is a list of tuples that identify the subsequences to extract.
# Each tuple must have the values (sequence_id, start_index, stop_index, sequence_type).
# sequence_type accepts "raw", "complement" or "reverse_complement";
# it is optional and defaults to "raw" when omitted.
REQUEST_PARAMS = [
    ("TARA_A100000171_G_scaffold48_1", 10, 50, "complement"),
    ("TARA_A100000171_G_scaffold48_1", 10, 50),
    ("TARA_A100000171_G_scaffold48_1", 10, 50, "reverse_complement"),
    ("TARA_A100000171_G_scaffold181_1", 0, 50),
    ("TARA_A100000171_G_scaffold181_1", 100, 200),
    ("TARA_A100000171_G_scaffold181_1", 200, 230),
    ("TARA_A100000171_G_scaffold493_2", 54, 76),
    ("TARA_A100000171_G_scaffold50396_2", 87, 105),
    ("TARA_A100000171_G_C2001995_1", 20, 635),
    ("TARA_A100000171_G_C2026460_1", 0, 100),
]
request_result = get_sequences_from_fasta(
    TARA_SAMPLE_ID,
    REQUEST_PARAMS,
)
# get_sequences_from_fasta returns a pandas.DataFrame with the extracted sequences
print(request_result)
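To make the three sequence types concrete, the sketch below applies the conventional definitions of complement and reverse complement to a short DNA fragment locally. It is an assumption that the OcéanIA service follows these conventions; the helper names are hypothetical, and the snippet says nothing about how the service interprets the start and stop indices.

```python
# Conventional Watson-Crick pairing for DNA; N (unknown base) maps to itself.
# Assumption: illustrates the usual meaning of the terms, not the service's code.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def complement(seq: str) -> str:
    """Replace every base with its pairing base, keeping the order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complement the sequence and read it in the opposite direction."""
    return complement(seq)[::-1]


fragment = "ATGAATACTC"               # "raw": the sequence as stored
print(complement(fragment))           # TACTTATGAG
print(reverse_complement(fragment))   # GAGTATTCAT
```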
Command line
On the command line, the query feature is available as `oceania query-fasta <key> <query_file> <output_format> <output_file>`:
> oceania query-fasta -h
Usage: oceania query-fasta [OPTIONS] <key> <query_file> <output_format> <output_file>
  Extract sequences from a fasta file in the OcéanIA Platform.
  <sample_id> sample id in the OcéanIA Platform
  <query_file> CSV file containing the values to query.
    Each line represents a sequence to extract in the format "sequence_id,start,end,type"
      "sequence_id" sequence ID
      "start" start index position of the sequence to be extracted
      "end" end index position of the sequence to extract
      "type" type of the sequence to extract
        options are ["raw", "complement", "reverse_complement"]
        type value is optional, if not provided default is "raw"
  <output_format> results format
    options are ["csv", "fasta"]
  <output_file> name of the file to write the results
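A query file in the "sequence_id,start,end,type" layout described by the help text can be produced with Python's standard `csv` module. The sketch below is a minimal example under a few assumptions: the file name `query.csv` is a placeholder, no header row is written (the help text says each line represents a sequence), and the rows reuse identifiers from the Python example above.

```python
import csv

# One row per subsequence: (sequence_id, start, end[, type]).
rows = [
    ("TARA_A100000171_G_scaffold48_1", 10, 50, "complement"),
    ("TARA_A100000171_G_scaffold48_1", 10, 50),  # type omitted, so "raw" per the help text
    ("TARA_A100000171_G_scaffold181_1", 0, 50, "reverse_complement"),
]

# newline="" lets the csv module control line endings itself.
with open("query.csv", "w", newline="") as handle:
    csv.writer(handle, lineterminator="\n").writerows(rows)

# The file can then be passed to the command line tool, following the usage
# line above (output format and file name are placeholders):
#   oceania query-fasta TARA_A100000171 query.csv fasta results.fasta
```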
Code examples
For more examples visit this repository
Project details
File details
Details for the file oceania-query-fasta-0.1.7.tar.gz.
File metadata
- Download URL: oceania-query-fasta-0.1.7.tar.gz
- Upload date:
- Size: 14.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.6.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | e767be9384c3a9de95f8ca04cf9b41f257e976729bdd0843bef25b992e5bc959
MD5 | 4624d6e2d17da21ff1b11ad142ab82b5
BLAKE2b-256 | 40324e0568acf349fd2393c77d353d084811c6422e546af4f12a2ba3215977c0
File details
Details for the file oceania_query_fasta-0.1.7-py3-none-any.whl.
File metadata
- Download URL: oceania_query_fasta-0.1.7-py3-none-any.whl
- Upload date:
- Size: 14.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.6.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | e5bd4b19917f5ec3072e0bed6edddaa6ce0c9194f59dbe18711875d344bb6df8
MD5 | 870aa1831da5313f01bbee891ca52017
BLAKE2b-256 | 76d1c26899824fde3a696c512c8766ef21336414ed4e23eeeb107274740f5e22
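A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib` module. This is a minimal sketch: `sha256_of_file` and `matches_digest` are hypothetical helper names, and the path in the usage comment assumes the source distribution was saved to the current directory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published value."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (assumes the file was downloaded next to this script):
#   expected = "e767be9384c3a9de95f8ca04cf9b41f257e976729bdd0843bef25b992e5bc959"
#   print(matches_digest("oceania-query-fasta-0.1.7.tar.gz", expected))
```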