Compute ARSC (N/C/S) from protein fasta files
quickARSC: ARSC-based stoichiometry utility
quickARSC is a lightweight command-line tool and a web interface for quantifying elemental stoichiometry from protein FASTA files. It calculates the number of nitrogen (N), carbon (C), and sulfur (S) atoms per amino acid residue side chain (ARSC) across proteins or proteomes.
These metrics follow the definitions used in Mende et al., Nature Microbiology, (2017).
For more details, please visit our wiki.
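As an illustration of the metric itself, here is a minimal sketch of computing N-, C-, and S-ARSC for a single amino acid sequence. The function name `arsc` and the choice to skip non-standard characters (such as X or *) are assumptions made for this example, not a description of how quickARSC is implemented internally.

```python
from collections import Counter

# (N, C, S) atom counts in the side chain of each of the 20 standard amino acids.
SIDE_CHAIN_ATOMS = {
    "A": (0, 1, 0), "R": (3, 4, 0), "N": (1, 2, 0), "D": (0, 2, 0),
    "C": (0, 1, 1), "Q": (1, 3, 0), "E": (0, 3, 0), "G": (0, 0, 0),
    "H": (2, 4, 0), "I": (0, 4, 0), "L": (0, 4, 0), "K": (1, 4, 0),
    "M": (0, 3, 1), "F": (0, 7, 0), "P": (0, 3, 0), "S": (0, 1, 0),
    "T": (0, 2, 0), "W": (1, 9, 0), "Y": (0, 7, 0), "V": (0, 3, 0),
}


def arsc(sequence):
    """Return (N_ARSC, C_ARSC, S_ARSC) averaged over standard residues.

    Assumption: characters outside the 20 standard amino acids are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in SIDE_CHAIN_ATOMS)
    if total == 0:
        raise ValueError("no standard amino acids found")
    sums = [0, 0, 0]
    for aa, atoms in SIDE_CHAIN_ATOMS.items():
        for i in range(3):
            sums[i] += counts[aa] * atoms[i]
    return tuple(value / total for value in sums)


n_arsc, c_arsc, s_arsc = arsc("MKRC")
print(f"N={n_arsc:.4f} C={c_arsc:.4f} S={s_arsc:.4f}")
```

For a whole proteome, the same averaging would be applied to the pooled residue counts of all sequences in the file.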
Web Interface
The static web interface is available at https://stsnsn.github.io/quickARSC/
Features:
- Pre-computed Results: Browse and download ARSC metrics for all 143,614 GTDB r226.0 representatives.
- Interactive Filtering: Filter results by taxonomy information.
- Custom Analysis: Upload your own amino acid FASTA file (.fa, .faa, .fasta) to compute ARSC metrics on-the-fly.
Standalone Package
The standalone package is available at https://pypi.org/project/arsc/
Features
- Elemental stoichiometry calculation: Calculate N, C, and S-ARSC directly from protein FASTA files or directories.
- Multiprocessing: Fast and scalable analysis of large genome or proteome datasets.
- Simple CLI tool: Run with a single command; easy to integrate into UNIX pipelines.
Installation From PyPI
pip install arsc
Usage
quickARSC <FASTA_FILE(faa/faa.gz) or input_dir/>
For consistency with the package name, the command quickARSC is now available. We also maintain arsc as a shorter command for faster typing.
We recommend providing protein FASTA files as input, or passing the --nucleotide flag explicitly for nucleotide files. By default, quickARSC also detects nucleotide sequences automatically and computes ARSC from the amino acid sequences predicted by Prodigal (which must be in your PATH). This auto-detection can be disabled with the --no-auto-detection flag.
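One common way to tell nucleotide input from protein input is to check what fraction of the letters belong to the nucleotide alphabet. The sketch below shows that idea only; the function name `looks_like_nucleotide`, the alphabet, and the 0.9 threshold are hypothetical, and the heuristic quickARSC actually uses may differ.

```python
# Assumption: A, C, G, T, U and N are treated as nucleotide characters.
NUCLEOTIDE_CHARS = set("ACGTUN")


def looks_like_nucleotide(sequence, threshold=0.9):
    """Guess whether a sequence is nucleotide rather than protein.

    Returns True when at least `threshold` of the alphabetic characters
    are nucleotide characters. The threshold is an illustrative choice.
    """
    letters = [ch for ch in sequence.upper() if ch.isalpha()]
    if not letters:
        return False
    fraction = sum(ch in NUCLEOTIDE_CHARS for ch in letters) / len(letters)
    return fraction >= threshold


print(looks_like_nucleotide("ATGCGTNNACGT"))
print(looks_like_nucleotide("MKVLAAGIVGLLLAQ"))
```

Short or low-complexity protein sequences can fool any heuristic of this kind, which is one reason to pass --nucleotide or --no-auto-detection explicitly when the input type is known.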
- -h or --help: show help message
- -v or --version: show version
- -o or --output: output TSV file name (optional)
- -t or --threads N: number of threads (default: 1)
- -s or --stats: output summary statistics to stderr (default: False)
- -p or --per-sequence: process each sequence individually instead of the entire file
- --no-auto-detection: disable automatic sequence type detection and treat all inputs as amino acids (default: False)
- output format options
  - -a or --aa-composition: include amino acid composition ratios in output (default: False)
  - -d or --decimal-places N: number of decimal places (default: 6)
  - --no-header: suppress header line in output (default: False)
  - --max-length N: maximum amino acid length (default: None)
  - --min-length N: minimum amino acid length (default: None)
- -n or --nucleotide: calculate GC content and ARSC values from nucleotide files (fna, fna.gz, fa, fa.gz, fasta, fasta.gz). Requires Prodigal to be installed in your PATH for gene prediction.
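When quickARSC is called from a Python workflow, the options above can be assembled into an argument list. The helper below is a hypothetical convenience written for this example (it is not part of the arsc package); it only uses flags listed above and prints the resulting command line.

```python
import shlex


def build_quickarsc_command(input_path, threads=1, output=None,
                            per_sequence=False, min_length=None,
                            no_header=False):
    """Build an argument list for the quickARSC CLI (hypothetical helper)."""
    cmd = ["quickARSC", str(input_path), "-t", str(threads)]
    if output is not None:
        cmd += ["-o", str(output)]
    if per_sequence:
        cmd.append("-p")
    if min_length is not None:
        cmd += ["--min-length", str(min_length)]
    if no_header:
        cmd.append("--no-header")
    return cmd


cmd = build_quickarsc_command("test_data/", threads=5, output="ARSC_output.tsv")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once quickARSC is installed and on the PATH.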
Example
1. Compute ARSC values on a .faa file.
quickARSC test_data/genome_a.faa
- output example:
| query | N_ARSC | C_ARSC | S_ARSC | AvgResMW | TotalLength |
|---|---|---|---|---|---|
| genome_a | 0.148438 | 3.132812 | 0.023438 | 123.568566 | 128 |
2. Process all .faa / .faa.gz files in a directory using 5 threads and save results as ARSC_output.tsv.
quickARSC test_data/ -t 5 -o ARSC_output.tsv
3. Output with amino acid composition table as ARSC_output_full.tsv and show statistics summary.
quickARSC test_data/ -t 5 -as -o ARSC_output_full.tsv
4. Sort results by N-ARSC (descending) using pipe.
arsc test_data/ -t 3 --no-header | sort -k2,2nr
5. Process each sequence individually instead of the entire file and filter results by amino acid length >= 65.
arsc test_data/ -t 3 --min-length 65 -p
6. Process nucleotide files (fna/fna.gz) and show base GC compositions and ARSC values. (Requires Prodigal)
arsc -n test_data/ -t 2 -d 2
| query | genomic_GC | base_A | base_T | base_G | base_C | N_ARSC | C_ARSC | S_ARSC | AvgResMW | TotalLength |
|---|---|---|---|---|---|---|---|---|---|---|
| pSAR11 | 36.45 | 31.59 | 31.95 | 16.37 | 20.09 | 0.46 | 3.12 | 0.02 | 133.91 | 830 |
| pSAR12 | 36.45 | 31.59 | 31.95 | 16.37 | 20.09 | 0.46 | 3.12 | 0.02 | 133.91 | 830 |
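The nucleotide columns in the table above can be read as simple base percentages. Here is a minimal sketch, assuming that percentages are taken over A, T, G, and C only (ambiguous bases such as N are excluded) and that genomic_GC is the sum of the G and C percentages; quickARSC may handle ambiguous bases differently. The function name `base_composition` is made up for this example.

```python
def base_composition(sequence):
    """Return base percentages and GC content for a nucleotide sequence.

    Assumption: only A, T, G and C contribute to the denominator.
    """
    seq = sequence.upper()
    counts = {base: seq.count(base) for base in "ATGC"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/T/G/C bases found")
    result = {f"base_{base}": 100 * counts[base] / total for base in "ATGC"}
    result["genomic_GC"] = result["base_G"] + result["base_C"]
    return result


for name, value in base_composition("ATGCGCGTTAAN").items():
    print(f"{name}\t{value:.2f}")
```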
Input requirements
- Input directory must contain one or more FASTA files with one of the following extensions: .faa, .faa.gz, .fna, .fna.gz, .fa, .fa.gz, .fasta, .fasta.gz
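To check in advance which files in a directory match the accepted extensions, something like the following sketch can be used. The function name `find_fasta_files` is hypothetical, and the case-insensitive, non-recursive matching is an assumption of this example rather than documented quickARSC behavior.

```python
from pathlib import Path

# Extensions listed in the input requirements.
FASTA_EXTENSIONS = (
    ".faa", ".faa.gz", ".fna", ".fna.gz",
    ".fa", ".fa.gz", ".fasta", ".fasta.gz",
)


def find_fasta_files(directory):
    """Return a sorted list of files in `directory` with a FASTA extension.

    Assumption: matching is case-insensitive and does not recurse
    into subdirectories.
    """
    return sorted(
        path for path in Path(directory).iterdir()
        if path.is_file() and path.name.lower().endswith(FASTA_EXTENSIONS)
    )
```

For example, `find_fasta_files("test_data/")` would list the candidate inputs before running quickARSC on that directory.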
Output
- stdout (use the --no-header option to suppress the header line)
- TSV file (via -o or --output, optional)
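Because the output is plain tab-separated text with a header line, it can be post-processed with the Python standard library. The sketch below sorts rows by a numeric column; the function name `sort_by_column` is hypothetical, and the `genome_b` row is made-up data included only so that there is something to sort.

```python
import csv
import io


def sort_by_column(tsv_text, column="N_ARSC", descending=True):
    """Parse TSV text with a header line and sort rows by a numeric column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    return sorted(rows, key=lambda row: float(row[column]), reverse=descending)


# The genome_a row is taken from the example output;
# the genome_b row is invented for illustration.
example = (
    "query\tN_ARSC\tC_ARSC\tS_ARSC\tAvgResMW\tTotalLength\n"
    "genome_a\t0.148438\t3.132812\t0.023438\t123.568566\t128\n"
    "genome_b\t0.300000\t3.000000\t0.020000\t120.000000\t200\n"
)

for row in sort_by_column(example):
    print(row["query"], row["N_ARSC"])
```

To process a saved file instead, read it with `Path("ARSC_output.tsv").read_text()` and pass the text to the same function.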
Default format columns: query, N_ARSC, C_ARSC, S_ARSC, AvgResMW, TotalLength
- N-ARSC — Average number of nitrogen atoms per amino-acid residue side chain.
- C-ARSC — Average number of carbon atoms per amino-acid residue side chain.
- S-ARSC — Average number of sulfur atoms per amino-acid residue side chain.
- AvgResMW — Average molecular weight of amino-acid residues (not only side chain!).
- TotalLength — Total amino acid length.
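To make the AvgResMW column concrete, the following sketch averages per-residue masses over a sequence. It assumes that a residue's mass is the average mass of the free amino acid minus one water molecule, and that non-standard characters are skipped; whether quickARSC uses exactly this convention is not stated here, so its values may differ. The function name `average_residue_mass` is hypothetical.

```python
# Average residue masses in Da (free amino acid minus one water molecule).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}


def average_residue_mass(sequence):
    """Return the mean residue mass over the standard amino acids in a sequence.

    Assumption: characters outside the 20 standard amino acids are ignored.
    """
    masses = [RESIDUE_MASS[aa] for aa in sequence.upper() if aa in RESIDUE_MASS]
    if not masses:
        raise ValueError("no standard amino acids found")
    return sum(masses) / len(masses)


print(f"{average_residue_mass('MKVLAAGIVG'):.4f}")
```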
Dependencies
Core Dependencies (Required)
- Python >= 3.8
- Biopython >= 1.79
- For a minimal setup without Prodigal, use the --no-auto-detection flag with amino acid inputs.
Optional Dependencies
- Prodigal >= 2.6.3: Required only for nucleotide mode to perform gene prediction.
- Must be installed and available in your system PATH for nucleotide inputs.
Citation
Please cite the following articles:
- (To be added)
- Mende et al., Nature Microbiology, (2017). https://doi.org/10.1038/s41564-017-0008-3
License
This project is distributed under the GPL-2.0 license.
Download files
File details
Details for the file arsc-0.5.1.tar.gz.
File metadata
- Download URL: arsc-0.5.1.tar.gz
- Upload date:
- Size: 17.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eadd343a1fd166707ca95bd18f4cb3020113296ef14cd4bd34bb2f07d2323810 |
| MD5 | 811600f8971c0197407f6e3119e0a3bf |
| BLAKE2b-256 | 9fadb89862a558fd0c8d88dc3f303d6882b585f6cfb3d4d0899f791e18a93506 |
File details
Details for the file arsc-0.5.1-py3-none-any.whl.
File metadata
- Download URL: arsc-0.5.1-py3-none-any.whl
- Upload date:
- Size: 19.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 97b1b1e2438b94e97be901ae1855f2687453b9dae1cedd6ad992bfdff77da6ad |
| MD5 | 71e100655e1259b9825eea5c7bdcf7d9 |
| BLAKE2b-256 | 3874ebf2257bb35f1599181f12b24417c8e9044f80e252f1756477d775db6462 |