Skip to main content

A small package to query data from the OceanIA services

Project description

OcéanIA Query Fasta

OcéanIA Query Fasta lets you query large FASTA files available in the OcéanIA Platform for extracting parts of biologic sequences

What is FASTA format

FASTA format is a text-based format for representing either nucleotide sequences or amino acid sequences, in which base pairs or amino acids are represented using single-letter codes. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. An example sequence in FASTA format is:

>TARA_X000000368_G_scaffold34_1_gene23 strand:- start:199 stop:642 length:444 start_codon:yes stop_codon:yes gene_type:complete
ATGAATACTCTTACTCGAATAAGCCTGACAATTTTTTTACTTTTGATGGCAGCTGTCTAT
TTACCAATTGGTTTATGGGCAATTATTGCTCCAGCTCAGGATGCCCTTGGTTTAGAACTA
CCTTCTTTTTATGAAGCTGTAGGCTTATCTGTAATATCTCCAATTGGGTATTCAGAATTT
GCAGGTATATATGGGGGCATTAATATTGTCATTGGCGTGATGTTCCTAATAGGCGTTTTT
AAAAAACAGGTCGGACTATTTGCTATAAAAGTTCTTGTATTTCTTGTTGGCTCAATAGCT
CTTGGAAGATTCTTGCTAATGTTGCTTGGATCCCAGGCAGGATTACCTGCAGAAATTAAT
GCTTTTCTTATCTTTGAAATAATTGTTTTCTTTATAGGTATTATTTTTATTAAAGTCCTA
AAAAACACTGATCATGTTACTTAG

Installation and usage

Installation

OcéanIA Query Fasta can be installed by running pip install oceania-query-fasta. It requires Python >= 3.6 to run.

Usage

The library can be used as a command line tool or imported as a python library.

As a python package

from oceania import get_sequences_from_fasta

TARA_SAMPLE_ID = "TARA_A100000171"

# REQUEST_PARAMS is a list of tuples that identify subsequences to extract
# each tuple must have the values (sequence_id, start_index, stop_index, sequence_type)
# sequence type accepted values are [raw, complement, reverse_complement], optional value if ommited defaults to "raw".
REQUEST_PARAMS = [
            ("TARA_A100000171_G_scaffold48_1", 10, 50, "complement"),
            ("TARA_A100000171_G_scaffold48_1", 10, 50),
            ("TARA_A100000171_G_scaffold48_1", 10, 50, "reverse_complement"),
            ("TARA_A100000171_G_scaffold181_1", 0, 50),
            ("TARA_A100000171_G_scaffold181_1", 100, 200),
            ("TARA_A100000171_G_scaffold181_1", 200, 230),
            ("TARA_A100000171_G_scaffold493_2", 54, 76),
            ("TARA_A100000171_G_scaffold50396_2", 87, 105),
            ("TARA_A100000171_G_C2001995_1", 20, 635),
            ("TARA_A100000171_G_C2026460_1", 0, 100),
        ]

request_result = get_sequences_from_fasta(
    TARA_SAMPLE_ID,
    REQUEST_PARAMS
)

# get_sequences_from_fasta returns a pandas.DataFrame with the extracted sequences
print(request_result)

Command line

In the command line the query feature is available as oceania query-fasta <key> <query_file> <output_format> <output_file>

> oceania query-fasta -h
Usage: oceania query-fasta [OPTIONS] <key> <query_file> <output_format> <output_file>

  Extract secuences from a fasta file in the OcéanIA Platform.

  <sample_id> sample id in the OcéanIA Platform
  <query_file> CSV file containing the values to query.
               Each line represents a sequence to extract in the format "sequence_id,start,end,type"
               "sequence_id" sequence ID
               "start" start index position of the sequence to be extracted
               "end" end index position of the sequence to extract
               "type" type of the sequence to extract
                      options are ["raw", "complement", "reverse_complement"]
                      type value is optional, if not provided default is "raw"
  <output_format> results format
                  options are ["csv", "fasta"]
  <output_file> name of the file to write the results

Code examples

For more examples visit this repository

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

oceania-query-fasta-0.1.7.tar.gz (14.4 kB view hashes)

Uploaded source

Built Distribution

oceania_query_fasta-0.1.7-py3-none-any.whl (14.6 kB view hashes)

Uploaded py3

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page