A package to automatically access the inverted repeats of archived plastid genomes

airpg: Automatically accessing the inverted repeats of archived plastid genomes

A Python package for automatically accessing the inverted repeats of thousands of plastid genomes stored on NCBI Nucleotide.

INSTALLATION

To get the most recent stable version of airpg, run:

pip install airpg

Alternatively, to get the latest development version of airpg, run:

pip install git+https://github.com/michaelgruenstaeudl/airpg.git
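After installation, it can be useful to confirm that the command-line scripts used in the examples below are reachable from the shell. The following is a minimal sketch, not part of airpg itself: the helper name missing_commands is hypothetical, and the script names are simply those that appear in the examples of this README.

```shell
# Print each given command name that cannot be found on PATH.
# Returns 0 when every command is available, 1 otherwise.
missing_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            rc=1
        fi
    done
    return "$rc"
}

# Script names as used in the examples of this README
if missing_commands airpg_identify.py airpg_analyze.py airpg_update_blocklist.py >/dev/null; then
    echo "All airpg scripts were found on PATH"
else
    echo "Some airpg scripts are missing; check that pip's script directory is on PATH"
fi
```

If scripts are reported as missing, calling missing_commands without the redirection lists their names.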

EXAMPLE USAGE


EXAMPLE 1: Very short survey (runtime ca. 5 min.; for the impatient)

Survey of all plastid genomes of flowering plants submitted to NCBI Nucleotide within the past 10 days.

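# Force base 10: "08" and "09" would otherwise be rejected as invalid octal numbers
TODAY=$((10#$(date +%d)))
if ((TODAY >= 6 && TODAY <= 10)); then
    STARTDATE=$(date +%Y/%m/01)
elif ((TODAY >= 11 && TODAY <= 15)); then
    STARTDATE=$(date +%Y/%m/05)
elif ((TODAY >= 16 && TODAY <= 20)); then
    STARTDATE=$(date +%Y/%m/10)
elif ((TODAY >= 21 && TODAY <= 25)); then
    STARTDATE=$(date +%Y/%m/15)
elif ((TODAY >= 26)); then
    # Late in the month: start on the 20th of the current month
    STARTDATE=$(date +%Y/%m/20)
else
    # Days 1-5: start on the 20th of the previous month (previous year in January)
    YEAR=$(date +%Y)
    PREVMONTH=$((10#$(date +%m) - 1))
    if ((PREVMONTH == 0)); then PREVMONTH=12; YEAR=$((YEAR - 1)); fi
    STARTDATE=$(printf "%04d/%02d/20" "$YEAR" "$PREVMONTH")
fi
ENDDATE=$(date +%Y/%m/%d)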

airpg_identify.py \
-q "complete genome[TITLE] AND \
(chloroplast[TITLE] OR plastid[TITLE]) AND \
${STARTDATE}:${ENDDATE}[PDAT] AND \
50000:250000[SLEN] NOT unverified[TITLE] \
NOT partial[TITLE] AND Magnoliophyta[ORGN]" \
-o output_script1.tsv \
#&> output_script1.log

mkdir -p records
mkdir -p data

airpg_analyze.py \
-i output_script1.tsv \
-m john.smith@example.com \
-o output_script2.tsv \
--recordsdir records/ \
--datadir data/ \
#&> output_script2.log
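The start date in Example 1 is derived from fixed day-of-month brackets, so the survey window is only roughly 10 days long. As an alternative sketch, a rolling window of exactly 10 days can be computed by the date utility itself. This assumes GNU date (coreutils) and falls back to the BSD/macOS syntax; the variable DAYS_BACK is introduced here for illustration only.

```shell
# Number of days to look back (illustrative variable, not an airpg option)
DAYS_BACK=10

# GNU date understands relative dates via -d; BSD/macOS date uses -v instead
STARTDATE=$(date -d "${DAYS_BACK} days ago" +%Y/%m/%d 2>/dev/null \
    || date -v-"${DAYS_BACK}"d +%Y/%m/%d)
ENDDATE=$(date +%Y/%m/%d)

# Date range in the form expected by the [PDAT] field of the Entrez query
echo "${STARTDATE}:${ENDDATE}[PDAT]"
```

The resulting STARTDATE and ENDDATE can be used in the airpg_identify.py query of Example 1 without further changes.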

EXAMPLE 2: Short survey (runtime ca. 15 min.; for testing)

Survey of all plastid genomes of flowering plants submitted to NCBI Nucleotide within the current month.

airpg_identify.py -q "complete genome[TITLE] AND \
(chloroplast[TITLE] OR plastid[TITLE]) AND \
$(date +%Y/%m/01):$(date +%Y/%m/%d)[PDAT] AND \
50000:250000[SLEN] NOT unverified[TITLE] \
NOT partial[TITLE] AND Magnoliophyta[ORGN]" \
-o output_script1.tsv # &> output_script1.log

airpg_analyze.py -i output_script1.tsv \
-m john.smith@example.com -o output_script2.tsv \
# &> output_script2.log
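All examples in this README use the same Entrez search string and vary only the publication date range. A small helper can make that structure explicit. The following is a sketch under that assumption; the function name build_query and its three positional parameters (start date, end date, taxon) are hypothetical and not part of airpg.

```shell
# build_query START END TAXON
# Assemble the Entrez search string used throughout the examples:
# complete plastid genomes of 50-250 kb, published between START and END,
# excluding unverified and partial records, restricted to TAXON.
build_query() {
    q_start="$1"
    q_end="$2"
    q_taxon="$3"
    printf '%s AND %s AND %s AND %s NOT %s NOT %s AND %s\n' \
        'complete genome[TITLE]' \
        '(chloroplast[TITLE] OR plastid[TITLE])' \
        "${q_start}:${q_end}[PDAT]" \
        '50000:250000[SLEN]' \
        'unverified[TITLE]' \
        'partial[TITLE]' \
        "${q_taxon}[ORGN]"
}

# Query for the current month, as in Example 2
QUERY=$(build_query "$(date +%Y/%m/01)" "$(date +%Y/%m/%d)" Magnoliophyta)
echo "$QUERY"
```

The string stored in QUERY can then be passed to airpg_identify.py via -q "$QUERY", in the same way that Example 4 passes its ENTREZSTRING variable.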

EXAMPLE 3: Medium survey (runtime ca. 5 hours)

Survey of all plastid genomes of flowering plants submitted to NCBI Nucleotide in 2019 only. Note: The results of this survey are available on Zenodo via DOI 10.5281/zenodo.4335906

airpg_update_blocklist.py -f airpg_blocklist.txt \
-m john.smith@example.com -q "inverted[TITLE] AND \
repeat[TITLE] AND loss[TITLE]"

airpg_identify.py -q "complete genome[TITLE] AND \
(chloroplast[TITLE] OR plastid[TITLE]) AND \
2019/01/01:2019/12/31[PDAT] AND 50000:250000[SLEN] \
NOT unverified[TITLE] NOT partial[TITLE] AND \
Magnoliophyta[ORGN]" \
-b airpg_blocklist.txt -o output_script1.tsv

airpg_analyze.py -i output_script1.tsv \
-m john.smith@example.com -o output_script2.tsv

EXAMPLE 4: Full survey (runtime ca. 19 hours; with explanations)

Survey of all plastid genomes of flowering plants submitted to NCBI Nucleotide from the start of 2000 until the end of October 2020. Note: The results of this survey are available on Zenodo via DOI 10.5281/zenodo.4335906

STEP 1: Querying NCBI Nucleotide for complete plastid genomes given an Entrez search string

TESTFOLDER=./angiosperms_Start2000toEndOct2020
DATE=$(date '+%Y_%m_%d')
ENTREZSTRING='complete genome[TITLE] AND (chloroplast[TITLE] OR plastid[TITLE]) AND 2000/01/01:2020/10/31[PDAT] AND 50000:250000[SLEN] NOT unverified[TITLE] NOT partial[TITLE] AND Magnoliophyta[ORGN]' # complete plastid genomes of all flowering plants between start of 2000 and end of October 2020
RECORDSTABLE=plastome_availability_table_${DATE}.tsv
mkdir -p $TESTFOLDER

# Updating blocklist
if [ ! -f ./airpg_blocklist.txt ]; then
    touch ./airpg_blocklist.txt
    airpg_update_blocklist.py -f ./airpg_blocklist.txt
fi
airpg_update_blocklist.py -f ./airpg_blocklist.txt -m john.smith@example.com -q "inverted[TITLE] AND repeat[TITLE] AND loss[TITLE]"

airpg_identify.py -q "$ENTREZSTRING" -o $TESTFOLDER/$RECORDSTABLE \
    --blocklist ./airpg_blocklist.txt 1>>$TESTFOLDER/airpg_identify_${DATE}.runlog 2>&1

STEP 2: Retrieving and parsing the genome records identified in step 1, analyzing the position and length of their IR annotations

IRSTATSTABLE=reported_IR_stats_table_${DATE}.tsv
mkdir -p $TESTFOLDER/records_${DATE}
mkdir -p $TESTFOLDER/data_${DATE}

airpg_analyze.py -i $TESTFOLDER/$RECORDSTABLE \
    -r $TESTFOLDER/records_${DATE}/ -d $TESTFOLDER/data_${DATE}/ \
    -m john.smith@example.com -o $TESTFOLDER/$IRSTATSTABLE 1>>$TESTFOLDER/airpg_analyze_${DATE}.runlog 2>&1
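Because step 2 consumes the table written by step 1 and both steps redirect their messages to run logs, a failed first step can go unnoticed during a survey that runs for many hours. The following sketch adds a simple guard between the steps. The helper name require_nonempty is hypothetical; it only tests that a file exists and is not empty and makes no assumption about the table's columns.

```shell
# require_nonempty FILE
# Succeed only if FILE exists and has a size greater than zero;
# otherwise print a message to stderr and return 1.
require_nonempty() {
    if [ ! -s "$1" ]; then
        echo "ERROR: $1 is missing or empty; check the corresponding run log" >&2
        return 1
    fi
}
```

In Example 4, the guard could be called after step 1 as require_nonempty "$TESTFOLDER/$RECORDSTABLE" || exit 1, so that airpg_analyze.py is only started when a records table was actually written.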

CHANGELOG

See CHANGELOG.md for a list of recent changes to the software.
