figleaf_fasta applies hard/soft masking to a FASTA file or excludes/extracts sub-sequences from a FASTA file.

  • hard_mask: replace sequence with N, X, or ?
  • soft_mask: convert sequence to lowercase
  • exclude: exclude sub-sequences and concatenate non-excluded remainder
  • extract: extract and concatenate sub-sequences
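The four tasks can be pictured on a single sequence string. The sketch below is illustrative only and is not taken from the figleaf_fasta source: the function name apply_task and its parameters are hypothetical, and it assumes 0-indexed, end-exclusive, non-overlapping ranges.

```python
# Illustrative sketch only: NOT the figleaf_fasta implementation.
# apply_task and its parameters are hypothetical names.

def apply_task(seq, ranges, task="hard_mask", hard_mask_letter="N"):
    """Apply one task to seq using 0-indexed, end-exclusive (start, end) ranges."""
    # Flag every position that falls inside at least one range.
    inside = [False] * len(seq)
    for start, end in ranges:
        for i in range(max(start, 0), min(end, len(seq))):
            inside[i] = True

    if task == "hard_mask":
        # Replace in-range positions with the mask letter.
        return "".join(hard_mask_letter if m else c for c, m in zip(seq, inside))
    if task == "soft_mask":
        # Convert in-range positions to lowercase.
        return "".join(c.lower() if m else c for c, m in zip(seq, inside))
    if task == "exclude":
        # Drop in-range positions and concatenate the remainder.
        return "".join(c for c, m in zip(seq, inside) if not m)
    if task == "extract":
        # Keep only in-range positions, concatenated in sequence order.
        return "".join(c for c, m in zip(seq, inside) if m)
    raise ValueError(f"unknown task: {task}")


seq = "ACGTACGTAC"
ranges = [(2, 4), (6, 8)]
print(apply_task(seq, ranges, "hard_mask"))  # ACNNACNNAC
print(apply_task(seq, ranges, "soft_mask"))  # ACgtACgtAC
print(apply_task(seq, ranges, "exclude"))    # ACACAC
print(apply_task(seq, ranges, "extract"))    # GTGT
```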

Other tools for handling FASTA files (e.g. bedtools maskfasta, bedtools getfasta, pybedtools) require sequence names, matching the FASTA header names, to be specified alongside the range information. Specifying sequence names allows different masking operations to be applied to different records in a multi-FASTA file.

figleaf_fasta is a simple, lightweight tool that takes as input a (multi-)FASTA file and range start and end positions. Masking, exclusion, or extraction is applied to every sequence in the (multi-)FASTA, regardless of FASTA header names. This is useful when the same masking should be applied to all FASTA files or to all records of a multi-FASTA. A common use case is handling reference-aligned (same-length) consensus FASTAs.
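To make the header-independent behaviour concrete, here is a minimal sketch that hard-masks the same ranges in every record of a multi-FASTA, without looking at the header names. It is not the package's code: read_fasta and hard_mask_all are hypothetical helpers, and the ranges are assumed to be 0-indexed and end-exclusive.

```python
# Illustrative sketch only: NOT the figleaf_fasta implementation.
# read_fasta and hard_mask_all are hypothetical helper names.

def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def hard_mask_all(text, ranges, letter="N"):
    """Hard-mask the same ranges in every record, ignoring header names."""
    records = []
    for header, seq in read_fasta(text):
        chars = list(seq)
        for start, end in ranges:
            for i in range(max(start, 0), min(end, len(chars))):
                chars[i] = letter
        records.append(f">{header}\n{''.join(chars)}")
    return "\n".join(records) + "\n"


multi_fasta = ">sample1\nACGTAC\nGT\n>sample2\nTTTTTTTT\n"
print(hard_mask_all(multi_fasta, [(0, 3)]), end="")
# >sample1
# NNNTACGT
# >sample2
# NNNTTTTT
```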

Installation

From PyPI

pip3 install figleaf_fasta

From GitHub repository

git clone https://github.com/AlexOrlek/figleaf_fasta.git
cd figleaf_fasta
pip3 install .

Options and usage

figleaf_fasta can be run from a Linux command line as follows:
figleaf [arguments...]

figleaf_fasta can be used within a Python script as follows:
from figleaf_fasta.figleaf import figleaf
figleaf([arguments...])

Running figleaf -h on the command line produces a summary of the command-line options:

usage: figleaf [-h] -fi FASTA_INPUT -r RANGES_PATH -fo FASTA_OUTPUT [--task TASK] [--hard_mask_letter HARD_MASK_LETTER] [--inverse_mask]

figleaf_fasta: apply hard/soft mask to FASTA file or exclude/extract sub-sequences

optional arguments:
  -h, --help            show this help message and exit

Input:
  -fi FASTA_INPUT, --fasta_input FASTA_INPUT
                        Filepath to input fasta file to be masked (required)
  -r RANGES_PATH, --ranges_path RANGES_PATH
                        Two-column tsv file with rows containing 0-indexed end-exclusive ranges to be masked/excluded/extracted (required)

Output:
  -fo FASTA_OUTPUT, --fasta_output FASTA_OUTPUT
                        Filepath for masked output fasta file (required)

Task:
  --task TASK           "hard_mask","soft_mask","exclude","extract" (default: hard_mask)

Mask:
  --hard_mask_letter HARD_MASK_LETTER
                        Letter to represent hard_mask regions (N, X or ?) (default: N)
  --inverse_mask        If flag is provided, all except mask ranges will be masked
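As a rough sketch of how these options fit together, the Python below writes a two-column ranges TSV and assembles a figleaf command line from the flags listed above. The helper names (write_ranges_tsv, build_figleaf_command) and the file names are hypothetical, and the sketch assumes the ranges file has no header row, which the help text does not state.

```python
# Illustrative sketch only: helper names and file names are hypothetical.
import csv


def write_ranges_tsv(path, ranges):
    """Write 0-indexed, end-exclusive (start, end) ranges as a two-column TSV.

    Assumption: the ranges file has no header row.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(ranges)


def build_figleaf_command(fasta_input, ranges_path, fasta_output,
                          task="hard_mask", hard_mask_letter="N",
                          inverse_mask=False):
    """Assemble the argument list using only flags shown in `figleaf -h`."""
    cmd = [
        "figleaf",
        "-fi", fasta_input,
        "-r", ranges_path,
        "-fo", fasta_output,
        "--task", task,
        "--hard_mask_letter", hard_mask_letter,
    ]
    if inverse_mask:
        # --inverse_mask is a flag, so it takes no value.
        cmd.append("--inverse_mask")
    return cmd


cmd = build_figleaf_command("input.fasta", "ranges.tsv", "masked.fasta",
                            task="soft_mask")
print(" ".join(cmd))
```

With figleaf_fasta installed, the resulting list could be passed to subprocess.run(cmd, check=True) after writing the ranges file with write_ranges_tsv.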

The same arguments are required when using the figleaf function within a Python script, except that the start and end positions can be provided either as a filepath (ranges_path) or as a Python list (ranges_list).
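When ranges are held as a Python list, the effect of --inverse_mask can be pictured as masking the complement of those ranges within each sequence. The sketch below computes that complement. It is not the package's code: invert_ranges is a hypothetical helper, and representing ranges as a list of (start, end) pairs is an assumption, since the exact structure expected by ranges_list is not described here.

```python
# Illustrative sketch only: NOT the figleaf_fasta implementation.
# invert_ranges is a hypothetical helper; ranges are assumed to be
# 0-indexed, end-exclusive (start, end) pairs.

def invert_ranges(ranges, seq_length):
    """Return the ranges of a sequence NOT covered by the given ranges."""
    inverted = []
    cursor = 0  # first position not yet accounted for
    for start, end in sorted(ranges):
        # Clamp each range to the sequence bounds.
        start = min(max(start, 0), seq_length)
        end = min(max(end, 0), seq_length)
        if start > cursor:
            inverted.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < seq_length:
        inverted.append((cursor, seq_length))
    return inverted


print(invert_ranges([(2, 4), (6, 8)], 10))  # [(0, 2), (4, 6), (8, 10)]
```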

Example

To generate example output in the example/ directory, run:
python figleaf_fasta.py or bash figleaf_fasta.sh

License

MIT License

History

1.1.1

  • Changed constraints on hardmask letters - can now use "?"
  • Fixed bugs when using fasta file with more than one sequence, with --task='exclude' or with --inverse_mask=False

1.1.0

  • First release on PyPI

Changed

  • Packaged code with setup.py and unit testing; uploaded to PyPI

1.0.0

  • First release, working code
