figleaf_fasta applies hard/soft masking to a FASTA file or excludes/extracts sub-sequences from a FASTA file.
- hard_mask: replace sequence with N, X, or ?
- soft_mask: convert sequence to lowercase
- exclude: exclude sub-sequences and concatenate non-excluded remainder
- extract: extract and concatenate sub-sequences
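The four tasks can be illustrated on a single sequence string. The following is a minimal sketch, not figleaf_fasta's own implementation: the function names are hypothetical, ranges are assumed to be 0-indexed, end-exclusive (start, end) pairs, and extracted pieces are assumed to be concatenated in the order the ranges are given.

```python
def hard_mask(seq, ranges, letter="N"):
    """Replace every position inside a range with the mask letter."""
    chars = list(seq)
    for start, end in ranges:
        for i in range(start, min(end, len(chars))):
            chars[i] = letter
    return "".join(chars)


def soft_mask(seq, ranges):
    """Convert every position inside a range to lowercase."""
    chars = list(seq)
    for start, end in ranges:
        for i in range(start, min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


def exclude(seq, ranges):
    """Drop the ranges and concatenate what remains."""
    drop = set()
    for start, end in ranges:
        drop.update(range(start, end))
    return "".join(c for i, c in enumerate(seq) if i not in drop)


def extract(seq, ranges):
    """Keep only the ranges, concatenated in the order given."""
    return "".join(seq[start:end] for start, end in ranges)


seq = "ACGTACGTAC"
ranges = [(2, 4), (7, 9)]  # covers positions 2, 3 and 7, 8

print(hard_mask(seq, ranges))  # ACNNACGNNC
print(soft_mask(seq, ranges))  # ACgtACGtaC
print(exclude(seq, ranges))    # ACACGC
print(extract(seq, ranges))    # GTTA
```

Note that hard and soft masking preserve sequence length, whereas exclude and extract change it.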
Other tools for handling FASTA files (e.g. bedtools maskfasta, bedtools getfasta, pybedtools) require sequence name(s), corresponding to FASTA header names, to be specified (in addition to range information); sequence name specification allows different masking operations to be applied to different records in a multi-FASTA file.
figleaf_fasta is a simple lightweight tool that takes as input a (multi-)FASTA and range start, end positions; masking/exclusion/extraction will be applied to sequence(s) within the (multi-)FASTA, regardless of FASTA header names. This is useful if a user wants to apply the same masking to all FASTA files or all records of a multi-FASTA. A common use case is when handling reference-aligned (same-length) consensus FASTAs.
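The header-agnostic behaviour described above might look roughly like this. It is a sketch under stated assumptions rather than the package's code: `read_fasta` and `mask_all` are hypothetical helpers, and the ranges are assumed to be 0-indexed, end-exclusive pairs.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def mask_all(records, ranges, letter="N"):
    """Hard-mask the same ranges in every record; headers are never consulted."""
    masked = []
    for header, seq in records:
        chars = list(seq)
        for start, end in ranges:
            # Replace the slice with an equal number of mask letters.
            chars[start:end] = letter * len(chars[start:end])
        masked.append((header, "".join(chars)))
    return masked


fasta_text = """>sample_A
ACGTAC
GTAC
>sample_B
TTTTGGGGCC
"""

records = read_fasta(fasta_text)
masked = mask_all(records, [(0, 3)])
for header, seq in masked:
    print(f">{header}\n{seq}")
```

Both records receive the same mask over positions 0 to 2, which is the situation that arises with reference-aligned consensus sequences of equal length.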
Installation
From PyPI
pip3 install figleaf_fasta
From GitHub repository
git clone https://github.com/AlexOrlek/figleaf_fasta.git
cd figleaf_fasta
pip3 install .
Options and usage
figleaf_fasta can be run from a Linux command-line as follows:
figleaf [arguments...]
figleaf_fasta can be used within a Python script as follows:
from figleaf_fasta.figleaf import figleaf
figleaf([arguments...])
Running figleaf -h on the command-line produces a summary of the command-line options:
usage: figleaf [-h] -fi FASTA_INPUT -r RANGES_PATH -fo FASTA_OUTPUT [--task TASK] [--hard_mask_letter HARD_MASK_LETTER] [--inverse_mask]

figleaf_fasta: apply hard/soft mask to FASTA file or exclude/extract sub-sequences

optional arguments:
  -h, --help            show this help message and exit

Input:
  -fi FASTA_INPUT, --fasta_input FASTA_INPUT
                        Filepath to input fasta file to be masked (required)
  -r RANGES_PATH, --ranges_path RANGES_PATH
                        Two-column tsv file with rows containing 0-indexed end-exclusive ranges to be masked/excluded/extracted (required)

Output:
  -fo FASTA_OUTPUT, --fasta_output FASTA_OUTPUT
                        Filepath for masked output fasta file (required)

Task:
  --task TASK           "hard_mask","soft_mask","exclude","extract" (default: hard_mask)

Mask:
  --hard_mask_letter HARD_MASK_LETTER
                        Letter to represent hard_mask regions (N, X or ?) (default: N)
  --inverse_mask        If flag is provided, all except mask ranges will be masked
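To make the range convention and the --inverse_mask option concrete, here is a minimal sketch of reading a two-column TSV of 0-indexed, end-exclusive ranges and computing their complement over a sequence of known length. The helper names are hypothetical, the file is assumed to have no header row, and this is one way to express the inversion, not necessarily how figleaf_fasta does it internally.

```python
def parse_ranges(tsv_text):
    """Read (start, end) integer pairs from two-column, tab-separated text."""
    ranges = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        start, end = line.split("\t")[:2]
        ranges.append((int(start), int(end)))
    return ranges


def invert_ranges(ranges, seq_length):
    """Return the end-exclusive ranges NOT covered by the input ranges."""
    inverted = []
    cursor = 0  # first position not yet accounted for
    for start, end in sorted(ranges):
        if start > cursor:
            inverted.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < seq_length:
        inverted.append((cursor, seq_length))
    return inverted


ranges = parse_ranges("2\t4\n7\t9\n")
print(ranges)                     # [(2, 4), (7, 9)]
print(invert_ranges(ranges, 10))  # [(0, 2), (4, 7), (9, 10)]
```

Under this convention a row `2	4` covers positions 2 and 3 only, so masking the inverted ranges leaves exactly those positions (and 7, 8) untouched.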
The same arguments are required when using the figleaf function within a Python script, except that start, end positions can be provided either as a filepath (ranges_path), OR as a Python list (ranges_list).
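The two ways of supplying ranges carry the same information. As a sketch, a Python list of ranges can be written out as the two-column TSV that the command-line interface expects; the element format of ranges_list is not documented here, so (start, end) pairs are an assumption, and `write_ranges_tsv` is a hypothetical helper.

```python
import csv
import os
import tempfile


def write_ranges_tsv(ranges, path):
    """Write (start, end) pairs as a two-column, tab-separated file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(ranges)


# Assumed shape of a ranges list: 0-indexed, end-exclusive (start, end) pairs.
ranges_list = [(2, 4), (7, 9)]

with tempfile.TemporaryDirectory() as tmpdir:
    ranges_path = os.path.join(tmpdir, "ranges.tsv")
    write_ranges_tsv(ranges_list, ranges_path)
    with open(ranges_path) as handle:
        content = handle.read()

print(repr(content))  # '2\t4\n7\t9\n'
```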
Example
To generate example output in the example/ directory, run:
python figleaf_fasta.py
or bash figleaf_fasta.sh
License
History
1.1.1
- Changed constraints on hardmask letters - can now use "?"
- Fixed bugs when using fasta file with more than one sequence, with --task='exclude' or with --inverse_mask=False
1.1.0
- First release on PyPI
Changed
- Packaged code with setup.py and unit testing; uploaded to PyPI
1.0.0
- First release, working code