screens for presence of genes of interest (GOI) in bacterial assemblies

Screen assemblies

Pipeline that screens for the presence of genes of interest (GOI) in bacterial assemblies. It generates multiple CSVs and plots that describe which genes are present and how variable their sequences are. The query sequences (GOIs) can be DNA or protein, and the database (db) to search in can be DNA contigs/fastas or protein fastas.

Getting Started

You need one fasta file containing all GOIs as the query and a folder of db contigs/fastas. Db file names can contain only one '.' (e.g., sample_1.fa, NOT sample.1.fa).
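The naming rule can be checked before a run. The following is a minimal sketch, assuming the db files sit directly inside one folder; `find_bad_db_names` is a hypothetical helper for illustration and is not part of the pipeline.

```python
from pathlib import Path


def find_bad_db_names(db_dir):
    """Return the names of files in db_dir that contain more than one '.'."""
    bad = []
    for path in sorted(Path(db_dir).iterdir()):
        # Skip sub-folders and hidden files such as .DS_Store (an assumption).
        if not path.is_file() or path.name.startswith("."):
            continue
        # sample_1.fa has one '.', sample.1.fa has two and gets flagged.
        if path.name.count(".") > 1:
            bad.append(path.name)
    return bad
```

Calling `find_bad_db_names("path/to/db_folder")` returns the file names that would need renaming, for example by replacing the extra '.' with '_'.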



Dependencies:

  • Python 3 and SciPy/Biopython
  • Command line BLAST
  • Clustal Omega, RAxML and/or IQ-TREE


  • Download the script and place it in your PATH:
    • git clone
    • Make sure it's executable (chmod +x screen_assembly/
    • export PATH="your_path:$PATH" (running pwd inside the cloned folder prints the directory to use as your_path)
    • Best to add it to your PATH permanently by adding the export line to .bash_profile (mac) or .profile (unix)
  • Download
    • git clone
    • Make sure it's executable (chmod +x common_modules/
    • export PYTHONPATH="your_path:$PYTHONPATH"
    • Best to add it to your PYTHONPATH permanently by adding the export line to .bash_profile (mac) or .profile (unix)
  • Place the common_modules folder next to the screen_assembly folder (as that's where the script looks by default), OR use a text editor to set this line in the script to point at the directory you put common_modules in: sys.path.append('../common_modules') becomes sys.path.append('your_path/common_modules')
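Because a relative entry such as '../common_modules' is resolved against the current working directory, it can help to convert it to an absolute path before appending it. The following is a minimal sketch of that idea; `add_module_dir` is a hypothetical helper and is not taken from the pipeline.

```python
import sys
from pathlib import Path


def add_module_dir(module_dir):
    """Append module_dir to sys.path as an absolute path, at most once."""
    resolved = str(Path(module_dir).expanduser().resolve())
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved


# Hypothetical usage: same effect as sys.path.append('../common_modules'),
# but the stored entry is absolute and is not added twice.
add_module_dir("../common_modules")
```

Setting PYTHONPATH in the shell has a similar effect without editing any file, since Python adds its entries to sys.path at startup.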

Check for updates

  • git pull

Running the tests

Once the script is in your PATH, run it with -h. If all dependencies are installed, the help menu will be displayed. Otherwise, read the error and install whichever dependency is missing.
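Dependencies can also be checked directly, without running the script. The following is a minimal sketch under stated assumptions: the module names (`scipy`, `Bio`) and executable names (`blastn`, `clustalo`, `iqtree`) are guesses at what a typical installation provides, since this README names the tools but not the exact commands the pipeline calls.

```python
import importlib.util
import shutil

# Assumed names; adjust to your installation. RAxML is left out because its
# executable name varies between builds.
PYTHON_MODULES = ["scipy", "Bio"]
EXECUTABLES = ["blastn", "clustalo", "iqtree"]


def missing_dependencies(modules=PYTHON_MODULES, executables=EXECUTABLES):
    """Return (missing_modules, missing_executables) as two lists."""
    # find_spec returns None when a top-level module cannot be located.
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when an executable is not found on PATH.
    missing_executables = [e for e in executables if shutil.which(e) is None]
    return missing_modules, missing_executables


mods, exes = missing_dependencies()
print("Missing Python modules:", mods or "none")
print("Missing executables:", exes or "none")
```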

Running the program

Please see the WIKI



This project is licensed under the MIT License - see the LICENSE file for details


  • Mark Davies lab and Jake for testing
