
Compute ARSC (N/C/S) from protein FASTA files


quickARSC: ARSC-based stoichiometry utility



quickARSC is a lightweight command-line tool and web interface for quantifying elemental stoichiometry from protein FASTA files. It calculates the average number of nitrogen (N), carbon (C), and sulfur (S) atoms per amino acid residue side chain (ARSC) for individual proteins or whole proteomes.

These metrics follow the definitions used in Mende et al., Nature Microbiology (2017).
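Written out, the per-element metric is a residue-weighted average of side-chain atom counts. This is our own notation for the ARSC idea, not a formula copied from Mende et al.: $n_a$ is the number of residues of amino acid $a$ in the protein(s), and $x_a$ is the number of atoms of element X in the side chain of $a$ (for carbon, alanine's CH3 side chain gives $x_a = 1$ and glycine's gives $x_a = 0$).

```latex
\mathrm{ARSC}_X \;=\; \frac{\sum_{a} n_a \, x_a}{\sum_{a} n_a},
\qquad X \in \{\mathrm{N},\, \mathrm{C},\, \mathrm{S}\}
```

The denominator is simply the total number of residues counted.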

For more details, please visit our wiki.


Web Interface

The static web interface is available at https://stsnsn.github.io/quickARSC/

Features

  • Pre-computed Results: Browse and download ARSC metrics for all 143,614 GTDB r226.0 representatives.
  • Interactive Filtering: Filter results by taxonomy information.
  • Custom Analysis: Upload your own amino acid FASTA file (.fa, .faa, .fasta) to compute ARSC metrics on-the-fly.

Standalone Package

The standalone package is available at https://pypi.org/project/arsc/

Features

  • Elemental stoichiometry calculation: Calculate N, C, and S-ARSC directly from protein FASTA files or directories.
  • Multiprocessing: Process large genome or proteome datasets in parallel (-t/--threads).
  • Simple CLI tool: Run with a single command; easy to integrate into UNIX pipelines.
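To make the calculation concrete, here is a minimal sketch of N/C/S-ARSC for a set of protein sequences. It is not quickARSC's source code. The side-chain atom counts are standard amino acid chemistry, with backbone atoms excluded. Pooling all residues of a file before dividing and skipping non-standard letters (X, B, Z, `*`) are assumptions. The function name `arsc` is hypothetical.

```python
# Minimal ARSC sketch (hypothetical; not quickARSC's implementation).
# Side-chain atom counts (C, N, S) per residue, backbone excluded,
# so glycine (side chain = H) contributes no heavy atoms.
SIDE_CHAIN_ATOMS = {
    "A": (1, 0, 0), "R": (4, 3, 0), "N": (2, 1, 0), "D": (2, 0, 0),
    "C": (1, 0, 1), "Q": (3, 1, 0), "E": (3, 0, 0), "G": (0, 0, 0),
    "H": (4, 2, 0), "I": (4, 0, 0), "L": (4, 0, 0), "K": (4, 1, 0),
    "M": (3, 0, 1), "F": (7, 0, 0), "P": (3, 0, 0), "S": (1, 0, 0),
    "T": (2, 0, 0), "W": (9, 1, 0), "Y": (7, 0, 0), "V": (3, 0, 0),
}


def arsc(sequences):
    """Return (N_ARSC, C_ARSC, S_ARSC, length) pooled over all sequences.

    Assumption: residues outside the 20 standard amino acids are skipped.
    """
    c_total = n_total = s_total = length = 0
    for seq in sequences:
        for aa in seq.upper():
            atoms = SIDE_CHAIN_ATOMS.get(aa)
            if atoms is None:
                continue  # ambiguous residue or stop symbol: ignored here
            c, n, s = atoms
            c_total += c
            n_total += n
            s_total += s
            length += 1
    if length == 0:
        raise ValueError("no standard amino acid residues found")
    return n_total / length, c_total / length, s_total / length, length


if __name__ == "__main__":
    # One copy of each standard amino acid: 9 N, 67 C, 2 S side-chain atoms.
    n_arsc, c_arsc, s_arsc, total = arsc(["ACDEFGHIKLMNPQRSTVWY"])
    print(f"{n_arsc:.6f}\t{c_arsc:.6f}\t{s_arsc:.6f}\t{total}")
```

Dividing pooled totals once per file, rather than averaging per-protein values, weights every residue equally. The "per-sequence" option further below is where per-protein values would come from.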

Installation from PyPI

pip install arsc

Usage

quickARSC <FASTA_FILE (.faa/.faa.gz) or input_dir/> [options]

For consistency with the project name, the command quickARSC is now available. The shorter arsc command is still maintained for faster typing.

By default, quickARSC automatically detects nucleotide sequences and computes ARSC from the amino acid sequences predicted by Prodigal, which must be in your PATH. We nevertheless recommend supplying protein FASTA files, or passing the --nucleotide flag explicitly for nucleotide input. Automatic detection can be disabled with the --no-auto-detection flag.
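The README does not say how nucleotide input is recognised. The sketch below shows one common kind of heuristic, purely as an assumption: the function name `looks_like_nucleotide` and the 0.9 cutoff are hypothetical, and quickARSC's actual rule may differ.

```python
# Hypothetical heuristic for illustration; quickARSC's real detection rule may differ.
NUCLEOTIDE_CHARS = set("ACGTUN")


def looks_like_nucleotide(sequence: str, threshold: float = 0.9) -> bool:
    """Guess whether `sequence` is DNA/RNA rather than protein.

    Assumption: if at least `threshold` of the characters (gaps and stop
    symbols removed) are A/C/G/T/U/N, the sequence is treated as nucleotide.
    """
    chars = [ch for ch in sequence.upper() if ch not in "-.*"]
    if not chars:
        return False
    nucleotide_like = sum(ch in NUCLEOTIDE_CHARS for ch in chars)
    return nucleotide_like / len(chars) >= threshold


print(looks_like_nucleotide("ATGGCGTAA"))               # True
print(looks_like_nucleotide("MKTAYIAKQRQISFVKSHFSRQ"))  # False
```

Any heuristic like this can misclassify short peptides made mostly of A, C, G and T. That is one reason to prefer the explicit -n or --no-auto-detection flags when the input type is known.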

  • -h or --help : show help message

  • -v or --version : show version

  • -o or --output : output TSV file name (optional; results go to stdout otherwise)

  • -t or --threads N : number of threads (default: 1)

  • -s or --stats : print summary statistics to stderr (default: False)

  • -p or --per-sequence : process each sequence individually instead of the entire file

  • --no-auto-detection : disable automatic sequence type detection and treat all inputs as amino acid sequences (default: False)

  • -n or --nucleotide : calculate GC content and ARSC values from nucleotide files (fna, fna.gz, fa, fa.gz, fasta, fasta.gz); requires Prodigal in your PATH for gene prediction

  • output format options

    • -a or --aa-composition : include amino acid composition ratios in output (default: False)
    • -d or --decimal-places N : number of decimal places (default: 6)
    • --no-header : suppress the header line in output (default: False)
    • --max-length N : maximum amino acid length (default: None)
    • --min-length N : minimum amino acid length (default: None)

Example

1. Compute ARSC values for a single .faa file.

quickARSC test_data/genome_a.faa
  • Example output:
query N_ARSC C_ARSC S_ARSC AvgResMW TotalLength
genome_a 0.148438 3.132812 0.023438 123.568566 128
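The example values are exact multiples of 1/128. That is consistent with pooling side-chain atom totals over all 128 residues and dividing once, although this is our inference from the numbers, not a documented statement. The snippet back-calculates the implied totals (19 N, 401 C and 3 S atoms) and formats them at the default 6 decimal places.

```python
# Atom totals inferred from the genome_a example output (an inference, not tool output).
total_length = 128
n_atoms, c_atoms, s_atoms = 19, 401, 3

# Format with 6 decimal places, matching the default -d 6.
row = [f"{count / total_length:.6f}" for count in (n_atoms, c_atoms, s_atoms)]
print(" ".join(row))  # 0.148438 3.132812 0.023438
```

Note that 401/128 = 3.1328125 is an exact tie at the sixth decimal. Python's float formatting rounds exact ties to even, which gives 3.132812 rather than 3.132813, matching the printed value.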

2. Process all .faa / .faa.gz files in a directory using 5 threads and save the results as ARSC_output.tsv.

quickARSC test_data/ -t 5 -o ARSC_output.tsv

3. Include amino acid composition ratios (-a), write the results to ARSC_output_full.tsv, and print summary statistics (-s).

quickARSC test_data/ -t 5 -as -o ARSC_output_full.tsv

4. Sort results by N-ARSC in descending order using a pipe.

arsc test_data/ -t 3 --no-header | sort -k2,2nr

5. Process each sequence individually instead of the entire file, keeping only sequences with amino acid length >= 65.

arsc test_data/ -t 3 --min-length 65 -p
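Conceptually, per-sequence mode with a length filter computes one value per record and drops records outside the length bounds. The sketch below is hypothetical and not quickARSC's code. To keep it short it computes only N_ARSC, and it assumes length is counted over the 20 standard residues with inclusive bounds (matching the ">= 65" wording above).

```python
# Hypothetical sketch of -p combined with --min-length / --max-length (N_ARSC only).
# Side-chain nitrogen atoms; all other standard residues contribute 0.
SIDE_CHAIN_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def per_sequence_n_arsc(records, min_length=None, max_length=None):
    """Yield (sequence_id, N_ARSC, length) for records within the length bounds.

    records: iterable of (id, sequence) pairs.
    """
    for seq_id, seq in records:
        residues = [aa for aa in seq.upper() if aa in STANDARD_AA]
        length = len(residues)
        if length == 0:
            continue
        if min_length is not None and length < min_length:
            continue  # keep only length >= min_length
        if max_length is not None and length > max_length:
            continue  # keep only length <= max_length
        n_atoms = sum(SIDE_CHAIN_N.get(aa, 0) for aa in residues)
        yield seq_id, n_atoms / length, length


records = [("short_peptide", "MKR"), ("poly_lys", "K" * 70)]
for row in per_sequence_n_arsc(records, min_length=65):
    print(*row, sep="\t")
```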

6. Process nucleotide files (fna/fna.gz) and report genomic GC content, base composition, and ARSC values with 2 decimal places (requires Prodigal).

arsc -n test_data/ -t 2 -d 2
query genomic_GC base_A base_T base_G base_C N_ARSC C_ARSC S_ARSC AvgResMW TotalLength
pSAR11 36.45 31.59 31.95 16.37 20.09 0.46 3.12 0.02 133.91 830
pSAR12 36.45 31.59 31.95 16.37 20.09 0.46 3.12 0.02 133.91 830
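The GC and base columns can be sketched as plain percentages of nucleotide counts. The function `base_composition` is hypothetical. Using A/C/G/T as the denominator (ignoring N and other IUPAC codes) is an assumption, since the README does not state quickARSC's exact rule.

```python
from collections import Counter


def base_composition(sequences, decimals=2):
    """Return genomic GC and per-base percentages (hypothetical sketch).

    Assumption: percentages are taken over A, C, G and T counts only.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    pct = {base: 100 * counts[base] / total for base in "ATGC"}
    gc = pct["G"] + pct["C"]  # summed before rounding
    result = {"genomic_GC": round(gc, decimals)}
    result.update({f"base_{b}": round(pct[b], decimals) for b in "ATGC"})
    return result


print(base_composition(["ATGC", "GGCC"]))
```

Because each value is rounded separately, genomic_GC need not equal the sum of the printed base_G and base_C. This presumably explains why the example shows 36.45 while 16.37 + 20.09 = 36.46.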

Input requirements

  • An input directory must contain one or more FASTA files with one of the following extensions:
    • .faa, .faa.gz, .fna, .fna.gz, .fa, .fa.gz, .fasta, .fasta.gz
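A minimal sketch of collecting such files from a directory and reading them, with gzip handled transparently. It uses only the standard library (quickARSC itself depends on Biopython). The helpers `find_fasta_files` and `read_fasta` are hypothetical, and the non-recursive scan and "first word of the header as ID" rule are assumptions.

```python
import gzip
from pathlib import Path

# Extensions listed under "Input requirements"; each may also end in .gz.
FASTA_SUFFIXES = (".faa", ".fna", ".fa", ".fasta")


def find_fasta_files(input_dir):
    """Return sorted FASTA paths in input_dir (non-recursive, an assumption)."""
    paths = []
    for path in Path(input_dir).iterdir():
        name = path.name.lower()
        base = name[:-3] if name.endswith(".gz") else name
        if path.is_file() and base.endswith(FASTA_SUFFIXES):
            paths.append(path)
    return sorted(paths)


def read_fasta(path):
    """Yield (id, sequence) pairs, opening .gz files transparently."""
    opener = gzip.open if str(path).endswith(".gz") else open
    seq_id, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seq_id is not None:
                    yield seq_id, "".join(chunks)
                header = line[1:].split()
                seq_id = header[0] if header else ""
                chunks = []
            elif line:
                chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)
```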

Output

  • stdout (use the --no-header option to omit the header line)
  • TSV file (via -o or --output, optional)

Default format columns: query, N_ARSC, C_ARSC, S_ARSC, AvgResMW, TotalLength

  • N_ARSC : average number of nitrogen atoms per amino acid residue side chain.
  • C_ARSC : average number of carbon atoms per amino acid residue side chain.
  • S_ARSC : average number of sulfur atoms per amino acid residue side chain.
  • AvgResMW : average molecular weight of amino acid residues (whole residues, not only side chains).
  • TotalLength : total amino acid length (number of residues).
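For downstream analysis in Python, the TSV output can be read with the standard `csv` module. This is a sketch: the helper `read_arsc_table` is hypothetical. It assumes the default header shown in the example output, and any extra columns (for example from -a) are left as strings.

```python
import csv
import io


def read_arsc_table(handle):
    """Parse quickARSC TSV output (with header) into a list of dicts."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["TotalLength"] = int(row["TotalLength"])
        for col in ("N_ARSC", "C_ARSC", "S_ARSC", "AvgResMW"):
            row[col] = float(row[col])
        rows.append(row)
    return rows


# The genome_a example row from above, written as tab-separated text.
example = io.StringIO(
    "query\tN_ARSC\tC_ARSC\tS_ARSC\tAvgResMW\tTotalLength\n"
    "genome_a\t0.148438\t3.132812\t0.023438\t123.568566\t128\n"
)
table = read_arsc_table(example)
# Highest N_ARSC first, the Python counterpart of `sort -k2,2nr`.
table.sort(key=lambda r: r["N_ARSC"], reverse=True)
print(table[0]["query"], table[0]["N_ARSC"])
```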

Dependencies

Core Dependencies (Required)

  • Python >= 3.8
  • Biopython >= 1.79
  • For a minimal setup without Prodigal, use the --no-auto-detection flag with amino acid inputs.

Optional Dependencies

  • Prodigal >= 2.6.3: Required only for nucleotide mode to perform gene prediction.
    • Must be installed and available in your system PATH for nucleotide inputs.
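To check in advance whether Prodigal is reachable on your PATH, a small standard-library sketch is enough. The helper name `prodigal_available` is hypothetical, and it assumes the executable is named `prodigal`.

```python
import shutil


def prodigal_available(executable="prodigal"):
    """Return the full path to the executable if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = prodigal_available()
    if path is None:
        print("Prodigal not found: use amino acid input with --no-auto-detection")
    else:
        print(f"Prodigal found at {path}")
```

The shell equivalent is `command -v prodigal`.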

Citation

Please cite the following articles:


License

This project is distributed under the GPL-2.0 license.




Download files


Source Distribution

arsc-0.5.1.tar.gz (17.5 kB)

Uploaded Source

Built Distribution


arsc-0.5.1-py3-none-any.whl (19.2 kB)

Uploaded Python 3

File details

Details for the file arsc-0.5.1.tar.gz.

File metadata

  • Download URL: arsc-0.5.1.tar.gz
  • Upload date:
  • Size: 17.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.0

File hashes

Hashes for arsc-0.5.1.tar.gz
Algorithm Hash digest
SHA256 eadd343a1fd166707ca95bd18f4cb3020113296ef14cd4bd34bb2f07d2323810
MD5 811600f8971c0197407f6e3119e0a3bf
BLAKE2b-256 9fadb89862a558fd0c8d88dc3f303d6882b585f6cfb3d4d0899f791e18a93506


File details

Details for the file arsc-0.5.1-py3-none-any.whl.

File metadata

  • Download URL: arsc-0.5.1-py3-none-any.whl
  • Upload date:
  • Size: 19.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.0

File hashes

Hashes for arsc-0.5.1-py3-none-any.whl
Algorithm Hash digest
SHA256 97b1b1e2438b94e97be901ae1855f2687453b9dae1cedd6ad992bfdff77da6ad
MD5 71e100655e1259b9825eea5c7bdcf7d9
BLAKE2b-256 3874ebf2257bb35f1599181f12b24417c8e9044f80e252f1756477d775db6462

