Compute ARSC (N/C/S) from protein fasta files
quickARSC: ARSC-based stoichiometry utility
quickARSC is a lightweight command-line tool and a web interface for quantifying elemental stoichiometry from protein FASTA files. It calculates the number of nitrogen (N), carbon (C), and sulfur (S) atoms per amino acid residue side chain (ARSC) across proteins or proteomes.
These metrics follow the definitions used in Mende et al., Nature Microbiology, (2017). https://doi.org/10.1038/s41564-017-0008-3
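To make the definition concrete, here is a minimal Python sketch that averages side-chain N, C, and S atom counts over the residues of one sequence. The atom table follows standard amino acid structures. The function name `arsc_from_sequence` is hypothetical, and skipping non-standard symbols (X, *, and so on) is an assumption for illustration; the package itself may treat such characters differently.

```python
# Side-chain atom counts (N, C, S) for the 20 standard amino acids.
SIDE_CHAIN_ATOMS = {
    "A": (0, 1, 0), "R": (3, 4, 0), "N": (1, 2, 0), "D": (0, 2, 0),
    "C": (0, 1, 1), "Q": (1, 3, 0), "E": (0, 3, 0), "G": (0, 0, 0),
    "H": (2, 4, 0), "I": (0, 4, 0), "L": (0, 4, 0), "K": (1, 4, 0),
    "M": (0, 3, 1), "F": (0, 7, 0), "P": (0, 3, 0), "S": (0, 1, 0),
    "T": (0, 2, 0), "W": (1, 9, 0), "Y": (0, 7, 0), "V": (0, 3, 0),
}


def arsc_from_sequence(sequence):
    """Return (N_ARSC, C_ARSC, S_ARSC) for one amino acid sequence.

    Assumption: symbols outside the 20 standard residues are skipped.
    """
    counted = [SIDE_CHAIN_ATOMS[aa] for aa in sequence.upper()
               if aa in SIDE_CHAIN_ATOMS]
    if not counted:
        raise ValueError("sequence contains no standard amino acids")
    n_residues = len(counted)
    # Average each element's side-chain atom count over all counted residues.
    return tuple(sum(atoms[i] for atoms in counted) / n_residues
                 for i in range(3))


n_arsc, c_arsc, s_arsc = arsc_from_sequence("MKR")
print(f"N={n_arsc:.4f} C={c_arsc:.4f} S={s_arsc:.4f}")
```

For the tripeptide MKR, the side chains hold 4 nitrogen, 11 carbon, and 1 sulfur atoms in total, so each value is that total divided by 3.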
Web Interface
The static web interface is available at https://stsnsn.github.io/quickARSC/
Features:
- Pre-computed Results: Browse and download ARSC metrics for all 143,614 GTDB r226.0 representatives.
- Interactive Filtering: Filter results by taxonomy information.
- Custom Analysis: Upload your own amino acid FASTA file (.fa, .faa, .fasta) to compute ARSC metrics on-the-fly.
Standalone Package
The standalone package is available at https://pypi.org/project/arsc/
Features
- Elemental stoichiometry calculation: Calculate N, C, and S-ARSC directly from protein FASTA files or directories.
- Multiprocessing: Fast and scalable analysis of large genome or proteome datasets.
- Simple CLI tool: Run with a single command; easy to integrate into UNIX pipelines.
Installation From PyPI
pip install arsc
Usage
arsc <FASTA_FILE (.faa/.faa.gz) or input_dir/>
- -h or --help: show help message
- -v or --version: show version
- -o or --output: output TSV file name (optional)
- -t or --threads N: number of threads (default: 1)
- -s or --stats: output summary statistics to stderr (default: False)
- -p or --per-sequence: process each sequence individually instead of the entire file
- output format options
  - -a or --aa-composition: include amino acid composition ratios in output (default: False)
  - -d or --decimal-places N: number of decimal places (default: 6)
  - --no-header: suppress header line in output (default: False)
  - --max-length N: maximum sequence length in amino acids (default: None)
  - --min-length N: minimum sequence length in amino acids (default: None)
- -n or --nucleotide: calculate GC content and ARSC values from nucleotide files (fna, fna.gz, fa, fa.gz, fasta, fasta.gz). Requires Prodigal to be installed in your PATH for gene prediction.
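If you want to drive the CLI from a Python script, one option is to assemble the argument list and hand it to `subprocess`. The sketch below uses only flags listed above. The helper names `build_arsc_command` and `run_arsc` are hypothetical and not part of the package, and it assumes the `arsc` executable is on your PATH.

```python
import subprocess


def build_arsc_command(input_path, threads=1, output=None,
                       per_sequence=False, min_length=None):
    """Assemble an `arsc` command line as a list of arguments."""
    cmd = ["arsc", str(input_path), "-t", str(threads)]
    if output is not None:
        cmd += ["-o", str(output)]
    if per_sequence:
        cmd.append("-p")
    if min_length is not None:
        cmd += ["--min-length", str(min_length)]
    return cmd


def run_arsc(input_path, **options):
    """Run `arsc` and return its stdout (the TSV text when -o is not given)."""
    cmd = build_arsc_command(input_path, **options)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


# Show the command that would be executed, without running it.
print(" ".join(build_arsc_command("test_data/", threads=3,
                                  per_sequence=True, min_length=65)))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.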
Example
1. Compute ARSC values on a .faa file.
arsc test_data/genome_a.faa
- output example:
| query | N_ARSC | C_ARSC | S_ARSC | AvgResMW | TotalLength |
|---|---|---|---|---|---|
| genome_a | 0.148438 | 3.132812 | 0.023438 | 123.568566 | 194 |
2. Process all .faa / .faa.gz files in a directory using 3 threads and save results as ARSC_output.tsv.
arsc test_data/ -t 3 -o ARSC_output.tsv
3. Output with amino acid composition table as ARSC_output_full.tsv and show statistics summary.
arsc test_data/ -t 3 -as -o ARSC_output_full.tsv
4. Sort results by N-ARSC (descending) using a pipe.
arsc test_data/ -t 3 --no-header | sort -k2,2nr
5. Process each sequence individually instead of the entire file and filter results by amino acid length >= 65.
arsc test_data/ -t 3 --min-length 65 -p
6. Process nucleotide files (fna/fna.gz) and show base compositions and ARSC values. (Requires Prodigal)
arsc -n test_data/ -t 2 -d 2
| query | genomic_GC | base_A | base_T | base_G | base_C | N_ARSC | C_ARSC | S_ARSC | AvgResMW | TotalLength |
|---|---|---|---|---|---|---|---|---|---|---|
| pSAR11 | 36.45 | 31.59 | 31.95 | 16.37 | 20.09 | 0.46 | 3.12 | 0.02 | 133.91 | 830 |
| pSAR12 | 36.45 | 31.59 | 31.95 | 16.37 | 20.09 | 0.46 | 3.12 | 0.02 | 133.91 | 830 |
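The base composition columns in the table above are percentages. As a rough illustration of how such values can be derived from a nucleotide sequence, here is a small sketch. The function name `base_composition` is hypothetical, and excluding ambiguous bases such as N from the denominator is an assumption; the tool may handle them differently.

```python
from collections import Counter


def base_composition(sequence):
    """Return percentages of A, T, G, C and the GC content of a sequence.

    Assumption: only unambiguous bases (A, T, G, C) are counted.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[base] for base in "ATGC")
    if total == 0:
        raise ValueError("sequence contains no A, T, G, or C bases")
    result = {f"base_{base}": 100.0 * counts[base] / total for base in "ATGC"}
    result["genomic_GC"] = 100.0 * (counts["G"] + counts["C"]) / total
    return result


composition = base_composition("ATGCNN")
print({key: round(value, 2) for key, value in composition.items()})
```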
Input requirements
- Input directory must contain one or more amino-acid FASTA (*.faa or *.faa.gz) files
Output
- stdout (if you need no header, use the --no-header option)
- TSV file (via -o or --output, optional)
Default format columns: query, N_ARSC, C_ARSC, S_ARSC, AvgResMW, TotalLength
- N-ARSC: Average number of nitrogen atoms per amino-acid residue side chain.
- C-ARSC: Average number of carbon atoms per amino-acid residue side chain.
- S-ARSC: Average number of sulfur atoms per amino-acid residue side chain.
- AvgResMW: Average molecular weight of amino-acid residues (whole residue, not only the side chain).
- TotalLength: Total amino acid length.
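For downstream analysis, the default TSV output can be read with Python's standard `csv` module. The sketch below assumes the default columns and a header line (that is, no --no-header, -a, or -n). The function names `read_arsc_tsv` and `sort_by_n_arsc` are hypothetical, and the sample row is the one from the example output above.

```python
import csv
import io

FLOAT_COLUMNS = ("N_ARSC", "C_ARSC", "S_ARSC", "AvgResMW")


def read_arsc_tsv(handle):
    """Parse default arsc TSV output into a list of dicts with numeric values."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for column in FLOAT_COLUMNS:
            row[column] = float(row[column])
        row["TotalLength"] = int(row["TotalLength"])
        rows.append(row)
    return rows


def sort_by_n_arsc(rows):
    """Return rows ordered by N_ARSC, highest first."""
    return sorted(rows, key=lambda row: row["N_ARSC"], reverse=True)


# Sample text mirroring the example output shown earlier.
SAMPLE_TSV = (
    "query\tN_ARSC\tC_ARSC\tS_ARSC\tAvgResMW\tTotalLength\n"
    "genome_a\t0.148438\t3.132812\t0.023438\t123.568566\t194\n"
)

rows = read_arsc_tsv(io.StringIO(SAMPLE_TSV))
print(rows[0]["query"], rows[0]["N_ARSC"], rows[0]["TotalLength"])
```

To read a saved file instead, open it with `open("ARSC_output.tsv", newline="")` and pass the handle to `read_arsc_tsv`.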
Dependencies
- Python >= 3.8
- Biopython >= 1.79
Optional Dependencies
- Prodigal >= 2.6.3: Required only for nucleotide mode to perform gene prediction.
  - Must be installed and available in your system PATH for the -n/--nucleotide option.
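Before launching nucleotide mode from a script, you may want to confirm that Prodigal is reachable. A minimal check with the standard library is sketched below. It assumes the executable is named `prodigal`, which is the usual name but may differ on your system; the helper name `executable_on_path` is hypothetical.

```python
import shutil


def executable_on_path(name="prodigal"):
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(name) is not None


if executable_on_path():
    print("Prodigal found; nucleotide mode (-n) should be usable.")
else:
    print("Prodigal not found on PATH; install it before using -n.")
```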
Citation
Please cite the following articles:
- (To be added)
- Mende et al., Nature Microbiology, (2017). https://doi.org/10.1038/s41564-017-0008-3
License
This project is distributed under the GPL-2.0 license.