python Guide aligned Sequences
pyGaS
Docker and Singularity
There are pre-built images containing this codebase on quay.io. When pulling an image you must specify
the version, as there is no latest tag.
The docker images are known to work correctly after import into a singularity image.
Command example
The code is intended to be used as an API rather than through this command line; however, limited use is possible.
pygas run -t examples/targets.txt.gz -q examples/queries.txt.gz -o your_result.tsv
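For wrapping code that still wants to drive the command line, the following is a minimal Python sketch. It assumes the pygas executable is on PATH. The helper names build_pygas_command and run_pygas are hypothetical and not part of pygas; only the run subcommand and the -t, -q and -o options from the example above are used.

```python
import os
import subprocess


def build_pygas_command(targets, queries, output):
    """Build the argument list for the `pygas run` example shown above.

    Hypothetical helper, not part of pygas. It only uses the -t, -q and -o
    options that appear in the command example.
    """
    return [
        "pygas", "run",
        "-t", os.fspath(targets),
        "-q", os.fspath(queries),
        "-o", os.fspath(output),
    ]


def run_pygas(targets, queries, output):
    """Run pygas; raises subprocess.CalledProcessError on a non-zero exit."""
    # Assumption: the `pygas` executable is on PATH (venv or container).
    subprocess.run(build_pygas_command(targets, queries, output), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without pygas.
    print(build_pygas_command(
        "examples/targets.txt.gz", "examples/queries.txt.gz", "your_result.tsv"
    ))
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.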
Inputs
queries.txt
- A unique list of sequences (for performance reasons), one per line
  - This could be reworked so that de-duplication is handled internally, but memory is a consideration
  - Matching sequences back to real input data and related information is the responsibility of wrapping code
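As an illustration of what the wrapping code might do, here is a minimal Python sketch that collapses input records to a unique query list and keeps a lookup for matching results back to the original records. The function names and the (record_id, sequence) input shape are assumptions made for this example and are not part of pygas.

```python
from collections import defaultdict


def unique_queries(records):
    """Collapse (record_id, sequence) pairs to a unique sequence list.

    Returns the unique sequences in first-seen order, plus a dict mapping
    each sequence back to the record ids that carried it.
    Hypothetical helper, not part of pygas.
    """
    seq_to_ids = defaultdict(list)
    for record_id, sequence in records:
        seq_to_ids[sequence.strip()].append(record_id)
    # dicts preserve insertion order, so the keys are in first-seen order
    return list(seq_to_ids), dict(seq_to_ids)


def write_queries(sequences, path):
    """Write one sequence per line, the layout described for queries.txt."""
    with open(path, "w") as handle:
        for sequence in sequences:
            handle.write(sequence + "\n")


if __name__ == "__main__":
    records = [
        ("read1", "AAAAATCGCTGCTACAGGT"),
        ("read2", "CTGGTCTCGCACCCCAGGC"),
        ("read3", "AAAAATCGCTGCTACAGGT"),
    ]
    unique, lookup = unique_queries(records)
    print(unique)
    print(lookup["AAAAATCGCTGCTACAGGT"])
```

This sketch holds every sequence in memory, which is the consideration noted above; very large inputs may need a streaming or on-disk approach instead.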
targets.txt
- One target sequence per line
- Reverse complement is handled automatically, see output format.
- Targets need to be unique during mapping; expand out for things like dual guide permutations in your application.
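The two points above can be sketched in Python. pygas handles the reverse complement itself, so reverse_complement here only illustrates what a reversed hit means. unique_targets shows one possible way for an application to keep targets unique for mapping and expand a hit back to every library entry sharing that sequence. Both function names and the (guide_id, sequence) input shape are assumptions for this example, not part of pygas.

```python
# Translation table for the complement of each base (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def unique_targets(guides):
    """Collapse (guide_id, sequence) pairs to a unique target list.

    Returns (targets, owners): targets[i] is the i-th unique sequence and
    owners[i] lists the guide ids sharing it. The index i is 0-based, in the
    order the unique targets would be written to targets.txt.
    Hypothetical helper, not part of pygas.
    """
    index_of = {}
    owners = []
    for guide_id, sequence in guides:
        if sequence not in index_of:
            index_of[sequence] = len(owners)
            owners.append([])
        owners[index_of[sequence]].append(guide_id)
    return list(index_of), owners


if __name__ == "__main__":
    targets, owners = unique_targets(
        [("g1", "ACGT"), ("g2", "TTTT"), ("g3", "ACGT")]
    )
    print(targets, owners)
    print(reverse_complement("AAAAAAAAAAAAAAAAAAA"))
```

With this layout, owners[t_id] gives the library entries to expand a hit into, assuming the unique targets were passed to pygas in the same order.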
Output format
A very simple text output of the values that are available in the API:
#query reversed t_id t_pos cigar seq md repeat_2-7...
AAAAATCGCTGCTACAGGT False 48566 1 AAAAATCGCTGCTACAGGT M19 19
CTGGTCTCGCACCCCAGGC False 65601 1 CTGGTCTCGCACCCCAGGC M19 18T
GGCGCGGTACTTGCCCAGA False 34773 1 GGCGCGGTACTTGCCCAGA S1M18 18
AAAAAAAAAAAAAAAAAAA False 0 1 AAAAAAAAAAAAAAAAAAA M19 19 True 1 1 TTTTTTTTTTTTTTTTTTT M19 19
...
Where:
Column | Description | Interpretation |
---|---|---|
query | Original query sequence | |
reversed | Read was reversed to match the target | Following fields are based on this orientation |
t_id | ID of target mapped to | 0-based numbering in order targets passed |
t_pos | Start position within target sequence | 1-based |
seq | Query in mapped orientation | Corresponds to cigar and md orientation |
cigar | cigar string for use in SAM like files | For details see the SAM specification |
md | MD string for use in SAM like files | For details see the SAM optional field specification |
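A minimal Python sketch for reading this output follows. It is based on assumptions taken from the sample rows rather than from the pygas code: fields are whitespace (tab) separated, the first field is the query, and the remaining fields come in groups of six (reversed, t_id, t_pos, seq, cigar, md), one group per hit. Note that the header comment in the sample lists cigar before seq, whereas the sample rows and the table put seq first; this sketch follows the rows. The Hit type and function names are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One mapping of a query to a target (hypothetical container)."""
    reversed: bool
    t_id: int    # 0-based target index
    t_pos: int   # 1-based start position within the target
    seq: str     # query in mapped orientation
    cigar: str
    md: str


def parse_result_line(line):
    """Split one result line into the query and its list of hits."""
    fields = line.split()
    query, rest = fields[0], fields[1:]
    if len(rest) % 6 != 0:
        raise ValueError(f"expected groups of 6 hit fields, got {len(rest)}")
    hits = []
    for i in range(0, len(rest), 6):
        rev, t_id, t_pos, seq, cigar, md = rest[i:i + 6]
        hits.append(Hit(rev == "True", int(t_id), int(t_pos), seq, cigar, md))
    return query, hits


def read_results(path):
    """Yield (query, hits) for each data line, skipping '#' header lines."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            yield parse_result_line(line)
```

For example, the last sample row above would parse into one query with two hits, the second of which has reversed set to True.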
Development
Install
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python3 setup.py develop
# see later
pre-commit install
# remember to update requirements
pip freeze | grep -v virtualenv > requirements.txt
Testing
There are 4 layers to testing and standards:
- Local venv testing
- Local pre-commit hooks
- Tests embedded in docker build
- CI tests
Local venv testing
/tests/scripts/run_unit_tests.sh
Local pre-commit hooks
This project additionally uses git pre-commit hooks via the pre-commit tool. These are concerned
with file formats and standards, not the actual execution of code. See ./.pre-commit-config.yaml.
Docker testing
The Docker build includes the unit tests, but removes many of the libraries before the final build stage. This is mainly for CI tests.
CI tests
CI includes 2 additional tests, each based on the 2 datasets in the ./examples directory.
Updating licence headers
Please use skywalking-eyes.
Expected workflow:
- Check state before modifying .licenserc.yaml:
  docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes header check
  - You should see some files reported as 'valid' here; those without a header are reported as 'invalid'
- Modify .licenserc.yaml
- Apply the changes:
  docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes header fix
- Add/commit changes
This is executed in the CI pipeline.
DO NOT edit the header in the files; instead, modify the date component of content
in .licenserc.yaml. The exceptions are:
- README.md
- pygas/matrix.pyc
  - You will need to update it manually, but the checks will accept it once updated
If you need to make more extensive changes to the license, carefully test that the pattern is functional.
LICENSE
Copyright (c) 2021
Author: CASM/Cancer IT <cgphelp@sanger.ac.uk>
This file is part of pygas.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
1. The usage of a range of years within a copyright statement contained within
this distribution should be interpreted as being equivalent to a list of years
including the first and last year specified and all consecutive years between
them. For example, a copyright statement that reads ‘Copyright (c) 2005, 2007-
2009, 2011-2012’ should be interpreted as being identical to a statement that
reads ‘Copyright (c) 2005, 2007, 2008, 2009, 2011, 2012’ and a copyright
statement that reads ‘Copyright (c) 2005-2012’ should be interpreted as being
identical to a statement that reads ‘Copyright (c) 2005, 2006, 2007, 2008,
2009, 2010, 2011, 2012’.