
Compute ARSC (N/C/S) from protein FASTA files

Project description

quickARSC: ARSC-based stoichiometry utility



quickARSC is a lightweight command-line tool and a web interface for quantifying elemental stoichiometry from protein FASTA files. It calculates the number of nitrogen (N), carbon (C), and sulfur (S) atoms per amino acid residue side chain (ARSC) across proteins or proteomes.

These metrics follow the definitions used in Mende et al., Nature Microbiology (2017): https://doi.org/10.1038/s41564-017-0008-3
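As a rough illustration of what the metrics mean, the following Python sketch averages side-chain atom counts over the residues of one sequence. It is not the package's code: the function name `arsc` and the table `SIDE_CHAIN_ATOMS` are hypothetical, the atom counts are taken from the standard side-chain structures of the 20 canonical amino acids, and skipping any other character (X, *, gaps) is an assumption about how such input might be handled.

```python
from collections import Counter

# Side-chain atom counts as (N, C, S) for the 20 standard amino acids.
# Backbone atoms are excluded, so glycine contributes nothing.
SIDE_CHAIN_ATOMS = {
    "A": (0, 1, 0), "R": (3, 4, 0), "N": (1, 2, 0), "D": (0, 2, 0),
    "C": (0, 1, 1), "Q": (1, 3, 0), "E": (0, 3, 0), "G": (0, 0, 0),
    "H": (2, 4, 0), "I": (0, 4, 0), "L": (0, 4, 0), "K": (1, 4, 0),
    "M": (0, 3, 1), "F": (0, 7, 0), "P": (0, 3, 0), "S": (0, 1, 0),
    "T": (0, 2, 0), "W": (1, 9, 0), "Y": (0, 7, 0), "V": (0, 3, 0),
}


def arsc(sequence):
    """Return (N_ARSC, C_ARSC, S_ARSC) for one amino acid sequence.

    Assumption: characters outside the 20 standard residues are ignored
    and do not count toward the length used for averaging.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in SIDE_CHAIN_ATOMS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acid residues")
    # Weighted sum of each element over all residues, divided by residue count.
    sums = [0, 0, 0]
    for aa, atoms in SIDE_CHAIN_ATOMS.items():
        for i in range(3):
            sums[i] += counts[aa] * atoms[i]
    return tuple(value / total for value in sums)


if __name__ == "__main__":
    n, c, s = arsc("MKRWC")
    print(f"N_ARSC={n:.6f} C_ARSC={c:.6f} S_ARSC={s:.6f}")
```

Pooling the residues of every protein in a file before averaging gives a proteome-level value, while calling the function once per record gives per-sequence values.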


Web Interface

The static web interface is available at https://stsnsn.github.io/quickARSC/

Features:

  • Pre-computed Results: Browse and download ARSC metrics for all 143,614 GTDB r226.0 representatives.
  • Interactive Filtering: Filter results by taxonomy information.
  • Custom Analysis: Upload your own amino acid FASTA file (.fa, .faa, .fasta) to compute ARSC metrics on-the-fly.

Standalone Package

The standalone package is available at https://pypi.org/project/arsc/

Features

  • Elemental stoichiometry calculation: Calculate N-, C-, and S-ARSC directly from protein FASTA files or directories.
  • Multiprocessing: Fast and scalable analysis of large genome or proteome datasets.
  • Simple CLI tool: Run with a single command; easy to integrate into UNIX pipelines.

Installation From PyPI

pip install arsc

Usage

arsc <FASTA_FILE (.faa / .faa.gz) or input_dir/>
  • -h or --help : show help message

  • -v or --version : show version

  • -o or --output : output TSV file name (optional)

  • -t or --threads N : number of threads (default: 1)

  • -s or --stats : output summary statistics to stderr (default: False)

  • -p or --per-sequence : process each sequence individually instead of the entire file

  • output format options

    • -a or --aa-composition : Include amino acid composition ratios and total length in output (default: False)
    • -d or --decimal-places N : Number of decimal places for floating point values (default: 6)
    • --no-header : Suppress header line in output (default: False)
    • --max-length N : Maximum sequence length in amino acids (default: None)
    • --min-length N : Minimum sequence length in amino acids (default: None)

Example

1. Compute ARSC values on a .faa file.

arsc test_data/genome_a.faa
  • output example:
query N_ARSC C_ARSC S_ARSC AvgResMW TotalLength
genome_a 0.148438 3.132812 0.023438 123.568566 194

2. Process all .faa / .faa.gz files in a directory using 3 threads and save results as ARSC_output.tsv.

arsc test_data/ -t 3 -o ARSC_output.tsv

3. Output with amino acid composition table as ARSC_output_full.tsv and show statistics summary.

arsc test_data/ -t 3 -as -o ARSC_output_full.tsv

4. Sort results by N-ARSC (descending) using pipe.

arsc test_data/ -t 3 --no-header | sort -k2,2nr

5. Process each sequence individually instead of the entire file and filter results by amino acid length > 130.

arsc test_data/ -t 3 --min-length 130 -p

Input requirements

  • Input directory must contain one or more amino acid FASTA (*.faa or *.faa.gz) files
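For readers who want to inspect such input themselves, here is a minimal sketch of reading plain or gzip-compressed protein FASTA with the Python standard library only. The helper names `open_fasta` and `read_fasta` are hypothetical and are not part of the arsc package; choosing the decompressor from the `.gz` suffix and keeping only the first word of each header as the record ID are assumptions.

```python
import gzip
import io


def open_fasta(path):
    """Open a .faa or .faa.gz file in text mode (chosen by file suffix)."""
    path = str(path)
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fasta(handle):
    """Yield (record_id, sequence) tuples from an open text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Small in-memory demonstration; a real file would go through open_fasta().
demo = io.StringIO(">protein_1 example description\nMKT\nAYI\n>protein_2\nGGS\n")
demo_records = list(read_fasta(demo))
```

Filtering on `len(sequence)` at this stage mirrors the idea behind length-based filtering of individual sequences.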

Output

  • stdout (if you need no header, use --no-header option)
  • TSV file (via -o or --output, optional)

Default format columns: query, N_ARSC, C_ARSC, S_ARSC, AvgResMW, TotalLength

  • N-ARSC — Average number of nitrogen atoms per amino-acid residue side chain.
  • C-ARSC — Average number of carbon atoms per amino-acid residue side chain.
  • S-ARSC — Average number of sulfur atoms per amino-acid residue side chain.
  • AvgResMW — Average molecular weight of amino-acid residues (the whole residue, not only the side chain).
  • TotalLength — Total amino acid length.
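The default table can be loaded for downstream analysis with the standard library alone. The sketch below assumes tab-separated output with the header line present and the six default columns listed above; `read_arsc_table` is a hypothetical helper, and the `genome_b` row is made up purely so that the sort has something to order (the `genome_a` row is the one from the example output).

```python
import csv
import io

FLOAT_COLUMNS = ("N_ARSC", "C_ARSC", "S_ARSC", "AvgResMW")


def read_arsc_table(handle):
    """Parse a default-format ARSC table (tab-separated, with header)."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        record = {"query": row["query"]}
        for name in FLOAT_COLUMNS:
            record[name] = float(row[name])
        record["TotalLength"] = int(row["TotalLength"])
        rows.append(record)
    return rows


example = (
    "query\tN_ARSC\tC_ARSC\tS_ARSC\tAvgResMW\tTotalLength\n"
    "genome_a\t0.148438\t3.132812\t0.023438\t123.568566\t194\n"
    # Hypothetical row, not real output:
    "genome_b\t0.300000\t2.900000\t0.040000\t120.000000\t250\n"
)

rows = read_arsc_table(io.StringIO(example))
# Rank genomes by N-ARSC, highest first.
ranked = sorted(rows, key=lambda r: r["N_ARSC"], reverse=True)
```

For a file written with -o, the same function can be given `open("ARSC_output.tsv", newline="")` instead of the in-memory example.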

Dependencies

  • Python >= 3.8
  • Biopython >= 1.79

Citation

Please cite the following articles:


License

This project is distributed under the GPL-2.0 license.

