figleaf_fasta applies hard/soft masking to a FASTA file or excludes/extracts sub-sequences from a FASTA file.

  • hard_mask: replace sequence with N, X, or ?
  • soft_mask: convert sequence to lowercase
  • exclude: exclude sub-sequences and concatenate non-excluded remainder
  • extract: extract and concatenate sub-sequences
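The four tasks above can be illustrated on a plain Python string. This is a minimal sketch of the semantics only, not the package's implementation: `flag_positions` and `apply_task` are hypothetical helper names, ranges are taken to be 0-indexed and end-exclusive, and the handling of overlapping or unsorted ranges (positions are simply processed in sequence order) is an assumption.

```python
def flag_positions(length, ranges):
    """Return a list of booleans marking positions covered by any range."""
    flags = [False] * length
    for start, end in ranges:
        # Clip each range to the sequence so out-of-bounds ends are ignored.
        for i in range(max(0, start), min(end, length)):
            flags[i] = True
    return flags


def apply_task(seq, ranges, task="hard_mask", hard_mask_letter="N"):
    """Apply one of the four tasks to a sequence string (illustrative only)."""
    flags = flag_positions(len(seq), ranges)
    if task == "hard_mask":
        return "".join(hard_mask_letter if f else c for c, f in zip(seq, flags))
    if task == "soft_mask":
        return "".join(c.lower() if f else c for c, f in zip(seq, flags))
    if task == "exclude":
        return "".join(c for c, f in zip(seq, flags) if not f)
    if task == "extract":
        return "".join(c for c, f in zip(seq, flags) if f)
    raise ValueError(f"unknown task: {task}")


seq = "ACGTACGTAC"
ranges = [(2, 4), (6, 8)]
print(apply_task(seq, ranges, "hard_mask"))  # ACNNACNNAC
print(apply_task(seq, ranges, "soft_mask"))  # ACgtACgtAC
print(apply_task(seq, ranges, "exclude"))    # ACACAC
print(apply_task(seq, ranges, "extract"))    # GTGT
```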

Other tools for handling FASTA files (e.g. bedtools maskfasta, bedtools getfasta, pybedtools) require each range to be given together with a sequence name that matches a FASTA header. Specifying sequence names allows different masking operations to be applied to different records in a multi-FASTA file.

figleaf_fasta is a simple, lightweight tool that takes a (multi-)FASTA file and a set of range start and end positions as input. Masking, exclusion, or extraction is applied to every sequence in the (multi-)FASTA file, regardless of FASTA header names. This is useful when the same masking should be applied to all FASTA files or to all records of a multi-FASTA file. A common use case is handling reference-aligned (same-length) consensus FASTAs.
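To make the header-independent behaviour concrete, the sketch below applies one set of ranges to every record of a small multi-FASTA string. It is an illustration under stated assumptions rather than the tool's own code: `read_fasta`, `hard_mask`, and `mask_all_records` are hypothetical helpers, the record names are made up, and output is written one sequence per line without wrapping.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def hard_mask(seq, ranges, letter="N"):
    """Replace positions in 0-indexed, end-exclusive ranges with `letter`."""
    chars = list(seq)
    for start, end in ranges:
        for i in range(max(0, start), min(end, len(chars))):
            chars[i] = letter
    return "".join(chars)


def mask_all_records(text, ranges):
    """Apply the same ranges to every record; headers are never consulted."""
    return "".join(
        f">{header}\n{hard_mask(seq, ranges)}\n" for header, seq in read_fasta(text)
    )


example = ">sample_A\nACGTACGT\n>sample_B\nTTTTCCCC\n"
print(mask_all_records(example, [(0, 2), (6, 8)]), end="")
# >sample_A
# NNGTACNN
# >sample_B
# NNTTCCNN
```

Because the ranges refer to positions only, this approach is meaningful mainly when all records share a coordinate system, as with same-length reference-aligned consensus sequences.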

Installation

From PyPI

pip3 install figleaf_fasta

From GitHub repository

git clone https://github.com/AlexOrlek/figleaf_fasta.git
cd figleaf_fasta
pip3 install .

Options and usage

figleaf_fasta can be run from a Linux command-line as follows:
figleaf [arguments...]

figleaf_fasta can be used within a Python script as follows:
from figleaf_fasta.figleaf import figleaf
figleaf([arguments...])

Running figleaf -h on the command-line produces a summary of the command-line options:

usage: figleaf [-h] -fi FASTA_INPUT -r RANGES_PATH -fo FASTA_OUTPUT [--task TASK] [--hard_mask_letter HARD_MASK_LETTER] [--inverse_mask]

figleaf_fasta: apply hard/soft mask to FASTA file or exclude/extract sub-sequences

optional arguments:
  -h, --help            show this help message and exit

Input:
  -fi FASTA_INPUT, --fasta_input FASTA_INPUT
                        Filepath to input fasta file to be masked (required)
  -r RANGES_PATH, --ranges_path RANGES_PATH
                        Two-column tsv file with rows containing 0-indexed end-exclusive ranges to be masked/excluded/extracted (required)

Output:
  -fo FASTA_OUTPUT, --fasta_output FASTA_OUTPUT
                        Filepath for masked output fasta file (required)

Task:
  --task TASK           "hard_mask","soft_mask","exclude","extract" (default: hard_mask)

Mask:
  --hard_mask_letter HARD_MASK_LETTER
                        Letter to represent hard_mask regions (N, X or ?) (default: N)
  --inverse_mask        If flag is provided, all except mask ranges will be masked
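One way to picture --inverse_mask is as taking the complement of the supplied ranges over the sequence length, so that everything outside the ranges is masked instead. The sketch below computes that complement; `invert_ranges` is a hypothetical helper written for illustration, and how figleaf_fasta implements the flag internally is not stated here.

```python
def invert_ranges(ranges, seq_length):
    """Return the 0-indexed, end-exclusive ranges NOT covered by `ranges`."""
    inverted = []
    pos = 0  # first position not yet accounted for
    for start, end in sorted(ranges):
        if start > pos and pos < seq_length:
            # Gap between the previous range and this one.
            inverted.append((pos, min(start, seq_length)))
        pos = max(pos, end)
    if pos < seq_length:
        # Tail of the sequence after the last range.
        inverted.append((pos, seq_length))
    return inverted


print(invert_ranges([(2, 4), (6, 8)], 10))  # [(0, 2), (4, 6), (8, 10)]
```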

The same arguments apply when calling the figleaf function from a Python script, except that start and end positions can be provided either as a filepath (ranges_path) or as a Python list (ranges_list).
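The two ways of supplying ranges can be related with a short sketch that writes a two-column TSV file and reads it back into a Python list. Several details are assumptions: the file is taken to have no header row, `write_ranges` and `read_ranges` are hypothetical helpers, the example positions are arbitrary, and the exact structure expected for ranges_list (shown here as a list of (start, end) pairs) should be checked against the package itself.

```python
import csv
import os
import tempfile


def write_ranges(path, ranges):
    """Write (start, end) pairs as a two-column, tab-separated file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(ranges)


def read_ranges(path):
    """Read a two-column TSV file into a list of (start, end) integer pairs."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        return [(int(row[0]), int(row[1])) for row in reader if row]


with tempfile.TemporaryDirectory() as tmpdir:
    ranges_path = os.path.join(tmpdir, "ranges.tsv")
    # 0-indexed, end-exclusive: (0, 10) covers the first ten positions.
    write_ranges(ranges_path, [(0, 10), (25, 40)])
    ranges_list = read_ranges(ranges_path)

print(ranges_list)  # [(0, 10), (25, 40)]
```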

Example

To generate example output in the example/ directory, run:
python figleaf_fasta.py or bash figleaf_fasta.sh

License

MIT License

History

1.1.1

  • Changed constraints on hardmask letters - can now use "?"
  • Fixed bugs when using fasta file with more than one sequence, with --task='exclude' or with --inverse_mask=False

1.1.0

  • First release on PyPI

Changed

  • Packaged code with setup.py and unit testing; uploaded to PyPI

1.0.0

  • First release, working code
