python Guide aligned Sequences

pyGaS

Docker and Singularity

There are pre-built images containing this codebase on quay.io. When pulling an image you must specify the version; there is no "latest" tag.

The docker images are known to work correctly after import into a singularity image.

Command example

The code is intended to be used as an API rather than through this command line; however, limited command-line use is possible.

pygas run -t examples/targets.txt.gz -q examples/queries.txt.gz -o your_result.tsv
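
For scripted use, the same command can be launched from Python. The following is a minimal sketch that only wraps the command line shown above; it does not use the pygas Python API. The helper names build_pygas_command and run_pygas are hypothetical, and the sketch assumes the pygas executable is on your PATH.

```python
import subprocess
from typing import List


def build_pygas_command(targets: str, queries: str, output: str) -> List[str]:
    """Assemble the argument list for the `pygas run` command shown above."""
    return ["pygas", "run", "-t", targets, "-q", queries, "-o", output]


def run_pygas(targets: str, queries: str, output: str) -> None:
    """Run pygas as a child process (assumes `pygas` is installed and on PATH).

    check=True makes subprocess raise CalledProcessError on a non-zero exit.
    """
    subprocess.run(build_pygas_command(targets, queries, output), check=True)
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.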

Inputs

  • queries.txt
    • A unique list of sequences (for performance reasons), one per line
      • Deduplication could be handled internally, but memory use is a consideration
    • Matching sequences back to the real input data and related information is the responsibility of the wrapping code
  • targets.txt
    • One target sequence per line
    • Reverse complement is handled automatically, see output format.
    • Targets must be unique during mapping; expand cases such as dual guide permutations in your own application
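
The query requirements above leave deduplication and the mapping back to the original records to the wrapping code. The following is a minimal sketch of one way to do that, assuming the input is available as (record ID, sequence) pairs. The function names and the record layout are hypothetical and are not part of pygas.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def unique_queries(
    records: Iterable[Tuple[str, str]]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Collapse (record_id, sequence) pairs to unique sequences.

    Returns the unique sequences in first-seen order, plus a lookup from
    each sequence back to every record ID that carried it.
    """
    seq_to_ids: Dict[str, List[str]] = defaultdict(list)
    for record_id, seq in records:
        seq_to_ids[seq].append(record_id)
    # dict preserves insertion order, so the keys are in first-seen order
    return list(seq_to_ids), dict(seq_to_ids)


def write_queries(path: str, sequences: Iterable[str]) -> None:
    """Write one sequence per line, the layout described for queries.txt."""
    with open(path, "w") as handle:
        for seq in sequences:
            handle.write(seq + "\n")
```

After mapping, the lookup returned by unique_queries can be used to attach each result row to all of the original records that share its query sequence.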

Output format

Simple tab-separated text output of the values that are available through the API:

#query	reversed	t_id	t_pos	cigar	seq	md	repeat_2-7...
AAAAATCGCTGCTACAGGT	False	48566	1	AAAAATCGCTGCTACAGGT	M19	19
CTGGTCTCGCACCCCAGGC	False	65601	1	CTGGTCTCGCACCCCAGGC	M19	18T
GGCGCGGTACTTGCCCAGA	False	34773	1	GGCGCGGTACTTGCCCAGA	S1M18	18
AAAAAAAAAAAAAAAAAAA	False	0	1	AAAAAAAAAAAAAAAAAAA	M19	19	True	1	1	TTTTTTTTTTTTTTTTTTT	M19	19
...

Where:

Column    Description                               Interpretation
query     Original query sequence
reversed  Read was reversed to match the target     Following fields are based on this orientation
t_id      ID of target mapped to                    0-based numbering in order targets passed
t_pos     Start position within target sequence     1-based
seq       Query in mapped orientation               Corresponds to cigar and md orientation
cigar     cigar string for use in SAM like files    For details see the SAM specification
md        MD string for use in SAM like files       For details see the SAM optional field specification
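
The output rows can be read back with a few lines of Python. The sketch below assumes the per-hit field order seen in the example rows (reversed, t_id, t_pos, seq, cigar, md) and that each additional hit for a query appends another group of six fields to the same line, as in the last example row. The names Hit, parse_line and reverse_complement are hypothetical and are not part of the pygas API.

```python
from dataclasses import dataclass
from typing import List, Tuple

FIELDS_PER_HIT = 6  # assumed order: reversed, t_id, t_pos, seq, cigar, md

# Translation table for complementing DNA bases (N stays N)
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


@dataclass
class Hit:
    is_reversed: bool
    t_id: int   # 0-based index of the target, in the order targets were passed
    t_pos: int  # 1-based start position within the target
    seq: str    # query in mapped orientation
    cigar: str
    md: str


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_line(line: str) -> Tuple[str, List[Hit]]:
    """Split one tab-separated result row into the query and its hits."""
    fields = line.rstrip("\n").split("\t")
    query, rest = fields[0], fields[1:]
    if len(rest) % FIELDS_PER_HIT != 0:
        raise ValueError(f"unexpected number of fields for query {query!r}")
    hits = []
    for i in range(0, len(rest), FIELDS_PER_HIT):
        rev, t_id, t_pos, seq, cigar, md = rest[i:i + FIELDS_PER_HIT]
        hits.append(Hit(rev == "True", int(t_id), int(t_pos), seq, cigar, md))
    return query, hits
```

In the example rows, a hit with reversed set to True carries the reverse complement of the query in seq, so reverse_complement(query) == hit.seq can serve as a sanity check when reading results. Lines starting with "#" are headers and should be skipped before calling parse_line.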

Development

Install

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python3 setup.py develop

# see later
pre-commit install

# remember to update requirements
pip freeze | grep -v virtualenv > requirements.txt

Testing

There are 4 layers to testing and standards:

  1. Local venv testing
  2. Local pre-commit hooks
  3. Tests embedded in docker build
  4. CI tests

Local venv testing

./tests/scripts/run_unit_tests.sh

Local pre-commit hooks

This project additionally uses git pre-commit hooks via the pre-commit tool. These are concerned with file formats and standards, not the actual execution of code. See ./.pre-commit-config.yaml.

Docker testing

The Docker build runs the unit tests, but removes many of the libraries before the final build stage. This is mainly for the CI tests.

CI tests

CI includes 2 additional tests, each based on the 2 datasets in the ./examples directory.

Updating licence headers

Please use skywalking-eyes.

Expected workflow:

  1. Check state before modifying .licenserc.yaml:
    • docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes header check
    • Files with a correct header should be reported as 'valid'; those without a header are reported as 'invalid'
  2. Modify .licenserc.yaml
  3. Apply the changes:
    • docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes header fix
  4. Add/commit changes

This is executed in the CI pipeline.

DO NOT edit the header in the files directly. Instead, modify the date component of the content in .licenserc.yaml. The exceptions are:

  • README.md
  • pygas/matrix.pyc
    • You will need to manually update, but the checks will accept it once updated

If you need to make more extensive changes to the license, carefully test that the pattern is functional.

LICENSE

Copyright (c) 2021

Author: CASM/Cancer IT <cgphelp@sanger.ac.uk>

This file is part of pygas.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.

1. The usage of a range of years within a copyright statement contained within
this distribution should be interpreted as being equivalent to a list of years
including the first and last year specified and all consecutive years between
them. For example, a copyright statement that reads ‘Copyright (c) 2005, 2007-
2009, 2011-2012’ should be interpreted as being identical to a statement that
reads ‘Copyright (c) 2005, 2007, 2008, 2009, 2011, 2012’ and a copyright
statement that reads ‘Copyright (c) 2005-2012’ should be interpreted as being
identical to a statement that reads ‘Copyright (c) 2005, 2006, 2007, 2008,
2009, 2010, 2011, 2012’.
