Skip to main content

A small package to query data from the OceanIA services

Project description

OcéanIA Query Fasta

OcéanIA Query Fasta lets you query large FASTA files available in the OcéanIA Platform for extracting parts of biologic sequences

What is FASTA format

FASTA format is a text-based format for representing either nucleotide sequences or amino acid sequences, in which base pairs or amino acids are represented using single-letter codes. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. An example sequence in FASTA format is:

>TARA_X000000368_G_scaffold34_1_gene23 strand:- start:199 stop:642 length:444 start_codon:yes stop_codon:yes gene_type:complete
ATGAATACTCTTACTCGAATAAGCCTGACAATTTTTTTACTTTTGATGGCAGCTGTCTAT
TTACCAATTGGTTTATGGGCAATTATTGCTCCAGCTCAGGATGCCCTTGGTTTAGAACTA
CCTTCTTTTTATGAAGCTGTAGGCTTATCTGTAATATCTCCAATTGGGTATTCAGAATTT
GCAGGTATATATGGGGGCATTAATATTGTCATTGGCGTGATGTTCCTAATAGGCGTTTTT
AAAAAACAGGTCGGACTATTTGCTATAAAAGTTCTTGTATTTCTTGTTGGCTCAATAGCT
CTTGGAAGATTCTTGCTAATGTTGCTTGGATCCCAGGCAGGATTACCTGCAGAAATTAAT
GCTTTTCTTATCTTTGAAATAATTGTTTTCTTTATAGGTATTATTTTTATTAAAGTCCTA
AAAAACACTGATCATGTTACTTAG

Installation and usage

Installation

OcéanIA Query Fasta can be installed by running pip install oceania-query-fasta. It requires Python >= 3.6 to run.

Usage

The library can be used as a command line tool or imported as a python library.

As a python package

from oceania import get_sequences_from_fasta

TARA_SAMPLE_ID = "TARA_A100000171"

# REQUEST_PARAMS is a list of tuples that identify subsequences to extract
# each tuple must have the values (sequence_id, start_index, stop_index, sequence_type)
# sequence type accepted values are [raw, complement, reverse_complement], optional value if ommited defaults to "raw".
REQUEST_PARAMS = [
            ("TARA_A100000171_G_scaffold48_1", 10, 50, "complement"),
            ("TARA_A100000171_G_scaffold48_1", 10, 50),
            ("TARA_A100000171_G_scaffold48_1", 10, 50, "reverse_complement"),
            ("TARA_A100000171_G_scaffold181_1", 0, 50),
            ("TARA_A100000171_G_scaffold181_1", 100, 200),
            ("TARA_A100000171_G_scaffold181_1", 200, 230),
            ("TARA_A100000171_G_scaffold493_2", 54, 76),
            ("TARA_A100000171_G_scaffold50396_2", 87, 105),
            ("TARA_A100000171_G_C2001995_1", 20, 635),
            ("TARA_A100000171_G_C2026460_1", 0, 100),
        ]

request_result = get_sequences_from_fasta(
    TARA_SAMPLE_ID,
    REQUEST_PARAMS
)

# get_sequences_from_fasta returns a pandas.DataFrame with the extracted sequences
print(request_result)

Command line

In the command line the query feature is available as oceania query-fasta <key> <query_file> <output_format> <output_file>

> oceania query-fasta -h
Usage: oceania query-fasta [OPTIONS] <key> <query_file> <output_format> <output_file>

  Extract secuences from a fasta file in the OcéanIA Platform.

  <sample_id> sample id in the OcéanIA Platform
  <query_file> CSV file containing the values to query.
               Each line represents a sequence to extract in the format "sequence_id,start,end,type"
               "sequence_id" sequence ID
               "start" start index position of the sequence to be extracted
               "end" end index position of the sequence to extract
               "type" type of the sequence to extract
                      options are ["raw", "complement", "reverse_complement"]
                      type value is optional, if not provided default is "raw"
  <output_format> results format
                  options are ["csv", "fasta"]
  <output_file> name of the file to write the results

Code examples

For more examples visit this repository

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

oceania-query-fasta-0.1.7.tar.gz (14.4 kB view details)

Uploaded Source

Built Distribution

oceania_query_fasta-0.1.7-py3-none-any.whl (14.6 kB view details)

Uploaded Python 3

File details

Details for the file oceania-query-fasta-0.1.7.tar.gz.

File metadata

  • Download URL: oceania-query-fasta-0.1.7.tar.gz
  • Upload date:
  • Size: 14.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.5

File hashes

Hashes for oceania-query-fasta-0.1.7.tar.gz
Algorithm Hash digest
SHA256 e767be9384c3a9de95f8ca04cf9b41f257e976729bdd0843bef25b992e5bc959
MD5 4624d6e2d17da21ff1b11ad142ab82b5
BLAKE2b-256 40324e0568acf349fd2393c77d353d084811c6422e546af4f12a2ba3215977c0

See more details on using hashes here.

File details

Details for the file oceania_query_fasta-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: oceania_query_fasta-0.1.7-py3-none-any.whl
  • Upload date:
  • Size: 14.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.5

File hashes

Hashes for oceania_query_fasta-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 e5bd4b19917f5ec3072e0bed6edddaa6ce0c9194f59dbe18711875d344bb6df8
MD5 870aa1831da5313f01bbee891ca52017
BLAKE2b-256 76d1c26899824fde3a696c512c8766ef21336414ed4e23eeeb107274740f5e22

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page