
A package to automatically access the inverted repeats of archived plastid genomes


airpg: Accessing the inverted repeats of archived plastid genomes


A Python package for automatically accessing the inverted repeats of thousands of plastid genomes stored on NCBI Nucleotide.

INSTALLATION

To get the most recent stable version of airpg, run:

pip install airpg

Alternatively, to get the latest development version of airpg, run:

pip install git+https://github.com/michaelgruenstaeudl/airpg.git
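The examples below call three scripts directly from the shell. As a quick sanity check after installing, the following minimal sketch reports which of them can be found on PATH. The helper name check_scripts is hypothetical and not part of airpg; the sketch assumes only that pip placed the scripts in a directory that is on PATH.

```shell
# check_scripts is a hypothetical helper, not part of airpg.
# It prints one line per command name and returns the number of names not found.
check_scripts() {
    n_missing=0
    for script in "$@"; do
        if command -v "$script" >/dev/null 2>&1; then
            echo "found:   $script"
        else
            echo "missing: $script"
            n_missing=$((n_missing + 1))
        fi
    done
    return "$n_missing"
}

# The three script names are the ones used in the examples of this README.
check_scripts airpg_update_blocklist.py airpg_identify.py airpg_analyze.py \
    || echo "Some scripts were not found; check that pip's script directory is on PATH."
```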

EXAMPLE USAGE

Short survey for the impatient / for testing (runtime ca. 4 hours)

Survey of all plastid genomes of flowering plants submitted to NCBI Nucleotide in 2019 only. Note: The results of this survey are available on Zenodo via DOI 10.5281/zenodo.4335906

airpg_update_blocklist.py -f airpg_blocklist.txt \
-m john.smith@example.com -q "inverted[TITLE] AND \
repeat[TITLE] AND loss[TITLE]"

airpg_identify.py -q "complete genome[TITLE] AND \
(chloroplast[TITLE] OR plastid[TITLE]) AND \
2019/01/01:2019/12/31[PDAT] AND 50000:250000[SLEN] \
NOT unverified[TITLE] NOT partial[TITLE] AND \
(Embryophyta[ORGN] AND Magnoliophyta[ORGN])" \
-b airpg_blocklist.txt -o output_script1.tsv

airpg_analyze.py -i output_script1.tsv \
-m john.smith@example.com -o output_script2.tsv

Full survey with explanations (runtime ca. 18 hours)

Survey of all plastid genomes of flowering plants submitted to NCBI Nucleotide from the start of 2000 until the end of October 2020. Note: The results of this survey are available on Zenodo via DOI 10.5281/zenodo.4335906

STEP 1: Querying NCBI Nucleotide for complete plastid genomes given an Entrez search string

TESTFOLDER=./angiosperms_Start2000toEndOct2020
DATE=$(date '+%Y_%m_%d')
# Complete plastid genomes of all flowering plants submitted between the start of 2000 and the end of October 2020
ENTREZSTRING='complete genome[TITLE] AND (chloroplast[TITLE] OR plastid[TITLE]) AND 2000/01/01:2020/10/31[PDAT] AND 50000:250000[SLEN] NOT unverified[TITLE] NOT partial[TITLE] AND (Embryophyta[ORGN] AND Magnoliophyta[ORGN])'
RECORDSTABLE=plastome_availability_table_${DATE}.tsv
mkdir -p $TESTFOLDER
# Updating blocklist
if [ ! -f ./airpg_blocklist.txt ]; then
    touch ./airpg_blocklist.txt
    airpg_update_blocklist.py -f ./airpg_blocklist.txt
fi
airpg_update_blocklist.py -f ./airpg_blocklist.txt -m john.smith@example.com -q "inverted[TITLE] AND repeat[TITLE] AND loss[TITLE]"
airpg_identify.py -q "$ENTREZSTRING" -o $TESTFOLDER/$RECORDSTABLE \
    --blocklist ./airpg_blocklist.txt 1>>$TESTFOLDER/airpg_identify_${DATE}.runlog 2>&1
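The short and the full survey use the same Entrez search string and differ only in the submission date window given to [PDAT]. The following sketch shows one way to assemble that string for an arbitrary window. The helper build_entrez_string is hypothetical and not part of airpg; every other search term is copied unchanged from the examples in this README.

```shell
# build_entrez_string is a hypothetical helper, not part of airpg.
# It fills a submission date window into the search string used in this README.
build_entrez_string() {
    start_date=$1   # first submission date, format YYYY/MM/DD
    end_date=$2     # last submission date, format YYYY/MM/DD
    printf '%s' "complete genome[TITLE] AND (chloroplast[TITLE] OR plastid[TITLE]) AND ${start_date}:${end_date}[PDAT] AND 50000:250000[SLEN] NOT unverified[TITLE] NOT partial[TITLE] AND (Embryophyta[ORGN] AND Magnoliophyta[ORGN])"
}

# Reproduces the search string of the full survey.
ENTREZSTRING=$(build_entrez_string 2000/01/01 2020/10/31)
echo "$ENTREZSTRING"
```

Calling build_entrez_string 2019/01/01 2019/12/31 instead yields the search string of the short survey.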

STEP 2: Retrieving and parsing the genome records identified in step 1, analyzing the position and length of their IR annotations

IRSTATSTABLE=reported_IR_stats_table_${DATE}.tsv
mkdir -p $TESTFOLDER/records_${DATE}
mkdir -p $TESTFOLDER/data_${DATE}
airpg_analyze.py -i $TESTFOLDER/$RECORDSTABLE \
    -r $TESTFOLDER/records_${DATE}/ -d $TESTFOLDER/data_${DATE}/ \
    -m john.smith@example.com -o $TESTFOLDER/$IRSTATSTABLE 1>>$TESTFOLDER/airpg_analyze_${DATE}.runlog 2>&1
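Because both steps redirect their output to run logs, a quick way to see how far a survey has progressed is to compare the sizes of the two result tables. The following is a minimal sketch with a hypothetical helper, count_lines, that is not part of airpg. It counts non-empty lines only and makes no assumption about the columns of the tables or about whether they start with a header line.

```shell
# count_lines is a hypothetical helper, not part of airpg.
# It prints the number of non-empty lines in a file, or fails if the file does not exist.
count_lines() {
    if [ ! -f "$1" ]; then
        echo "no such file: $1" >&2
        return 1
    fi
    # grep -c exits with a non-zero status when it counts 0 lines, so mask that status.
    grep -c . "$1" || true
}

# Demonstration on a throwaway two-line table.
demo_table=$(mktemp)
printf 'record_A\t1\nrecord_B\t2\n' > "$demo_table"
echo "lines in demo table: $(count_lines "$demo_table")"
rm -f "$demo_table"
```

With the variables of the full survey set, count_lines "$TESTFOLDER/$RECORDSTABLE" and count_lines "$TESTFOLDER/$IRSTATSTABLE" can be compared to see roughly how many of the identified records have been analyzed so far.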

CHANGELOG

See CHANGELOG.md for a list of recent changes to the software.


