SonicParanoid: fast, accurate, and comprehensive orthology inference with machine learning and language models

SonicParanoid

Fast, accurate, and comprehensive orthology inference with machine learning and language models

Description

SonicParanoid is a stand-alone software tool for identifying orthologous relationships among multiple species. It is open-source software released under the GNU General Public License, Version 3.0 (GPLv3), and is implemented in Python 3, Cython, and C++. It runs on Linux and macOS.

Fast and Scalable

SonicParanoid can infer the orthologs for hundreds of prokaryotes in hours, or for eukaryotes in days, using a desktop computer with 8 CPUs. Run times are much shorter on HPC servers with dozens of CPUs (e.g. <1h for the QfO benchmark datasets). It is also highly scalable: it inferred the orthologs for 2000 MAGs in only 1 day using 128 CPUs.

Fast and scalable domain-aware orthology inference

SonicParanoid uses language models to infer orthologs at the domain level. The artificial neural networks are trained directly on the input proteome set, and they show quasi-linear scalability with the number of input proteomes.

Accurate

SonicParanoid was tested using a benchmark proteome dataset from the Quest for Orthologs consortium, and the correctness of its predictions was evaluated using a standardized Orthology Benchmarking service. SonicParanoid showed a balanced trade-off between precision and recall, with an accuracy comparable to that of well-established inference methods.

Easy to use

Thanks to its speed, accuracy, and usability SonicParanoid substantially relieves the difficulties of orthology inference for biologists who need to construct and maintain their own genomic datasets.

Installation

For more details on how to install and use SonicParanoid, go to its wiki page: https://gitlab.com/salvo981/sonicparanoid2/-/wikis/home

Citation

Salvatore Cosentino, Sira Sriswasdi and Wataru Iwasaki (2024), SonicParanoid2: fast, accurate, and comprehensive orthology inference with machine learning and language models. Genome Biology. 25, Article number: 195 (2024) https://doi.org/10.1186/s13059-024-03298-4

Salvatore Cosentino and Wataru Iwasaki (2019), SonicParanoid: fast, accurate and easy orthology inference. Bioinformatics. Volume 35, Issue 1, 1 January 2019, Pages 149–151. https://doi.org/10.1093/bioinformatics/bty631

Changelog

For the complete changelog, visit the release page on GitLab.

2.0.9 (September 2025)

Python version: 3.10<=python<3.13 (this means that Python 3.9 is not supported anymore)
Enhancement: when installing MMseqs, the human-readable version is now shown together with the commit hash for that version.
Enhancement: allow installation using Mamba (mini-forge)
Fix: pip install error due to obsolete cython code
Fix: package conflict when installing using Micromamba
Maintenance: upgraded Blast+ to v2.15.0
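
The Python version constraint above (3.10<=python<3.13) can be checked before installation with a few lines of Python. This is only an illustrative sketch: the function name `is_supported_python` is hypothetical and is not part of SonicParanoid.

```python
import sys

# Supported range taken from the 2.0.9 changelog entry: 3.10 <= python < 3.13.
MIN_VERSION = (3, 10)
MAX_VERSION_EXCLUSIVE = (3, 13)


def is_supported_python(version=None):
    """Return True if the (major, minor) version is inside the supported range.

    Hypothetical helper, not part of the SonicParanoid package.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE
```

Calling `is_supported_python()` without arguments checks the interpreter that runs the snippet.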

2.0.8 (August 7, 2024)

Announcement: SonicParanoid2 was published!
Citation links were updated.
Maintenance: upgrade to latest Diamond version (v2.1.9)
Fix: prevent Diamond from failing when the input contains proteins composed only of letters that are also DNA bases. This is done by adding --ignore-warnings to the makedb and blastp commands.
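
As a rough illustration of the Diamond fix above, the sketch below builds makedb and blastp argument lists that end with --ignore-warnings. Only the --ignore-warnings flag comes from the changelog; the helper names and the other arguments (standard Diamond options for input, database, query, and output) are assumptions, and the commands SonicParanoid actually runs may differ.

```python
def diamond_makedb_cmd(fasta_path, db_path):
    """Build a `diamond makedb` argument list (hypothetical helper)."""
    return [
        "diamond", "makedb",
        "--in", fasta_path,
        "--db", db_path,
        "--ignore-warnings",  # flag named in the 2.0.8 changelog entry
    ]


def diamond_blastp_cmd(query_path, db_path, out_path):
    """Build a `diamond blastp` argument list (hypothetical helper)."""
    return [
        "diamond", "blastp",
        "--query", query_path,
        "--db", db_path,
        "--out", out_path,
        "--ignore-warnings",  # flag named in the 2.0.8 changelog entry
    ]
```

The lists can be passed to `subprocess.run` on a system where Diamond is installed.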

2.0.7 (June 27, 2024)

New: Added a new program called sonicparanoid-get-profiles to download the MMseqs-PFam profile DB files. This can be used if the Profile DB could not be built locally.
Maintenance: upgraded Cython code to resolve an issue with dataclasses.
Fix: scikit-learn issue and upgrade the dependency to 1.5.0
Fix: inclusion of hits with lower bitscores due to overwrite
Fix: in species-species ortholog tables single relations were counted as 2
Fix: slowdowns due to queue timeout (only happened with thousands of proteomes and using slow storage)
Python version: 3.9<=python<3.13 (this means that Python 3.8 is not supported anymore)
Maintenance: update to support the latest Cython release
Maintenance: include early version of pyproject.toml (still use setup.py to compile Cython source files)

2.0.5 (April 9, 2024)

Fix: SciPy version-related issue.
New feature: as requested, it is now possible to extract multi-FASTA files for selected (or all) output OGs.
Fix: issue that caused errors using some ANACONDA installations.

2.0.4 (July 3, 2023)

Maintenance update.

Fix: issue that caused SP2 to fail to predict the fastest alignments when using scikit-learn v1.3.0 and above.
Others: The installation guides in the web-page were updated to reflect the above fixed issue.

2.0.3 (June 6, 2023)

Small but important bug-fixes.

Fix: issue that caused the domain-based orthology to fail when processing huge proteomes.
Fix: error during creation of the PfamA profile DB.
Others: updated information on how to cite SP2.

2.0.2 (May 28, 2023)

This is a small maintenance update.

New: Support installation using Mamba/micromamba environments.
Fixed an error caused by shutil.rmtree on NFS file systems.

2.0.1 (May 2, 2023)

This is a massive update which introduces many new features and improvements. SonicParanoid2 uses machine learning for faster and more comprehensive orthology inference. Visit the web-page for more details.

New: reduced execution time for all-vs-all alignments by 20~50% (depending on the dataset).
New: domain-aware orthology inference
Enhancement: you can now see the state of your run in real-time through status bars
Breaking change: many parameters have been removed or added; check the web-page for more details.
Breaking change: removed single-linkage clustering for OGs
Python version: 3.8<=python<=3.10

1.3.8 (November 10, 2021)

Summary: fixed some important issues related to Diamond introduced with version v1.3.7.
Hot-fix: Missing ortholog table.
Hot-fix: Error when using Diamond and index files.
Others: The minimum required memory per thread was reduced to 1 GigaByte.

1.3.7 (November 8, 2021)

Maintenance: upgraded to Diamond (v2.0.12)
Breaking change: the ortholog tables do not have their own directory anymore. For example, for species 1 and 2 the ortholog table will be stored under /project/orthologs_db/1/table.1-2
Breaking change: the ortholog matrixes are now stored under the directory '/project/ortholog_matrixes/'
Enhancement: more efficient directory structure for the orthologs_db directory.
Fix: Inconsistent OG counts with the same input dataset.
Others: set default value for the --max-len-diff parameter to 0.75.
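
The new ortholog table location described above can be sketched as a small path helper. The function name `ortholog_table_path` is hypothetical, and the sketch assumes that the subdirectory is named after the first species of the pair, which is what the single example in the changelog suggests.

```python
from pathlib import PurePosixPath


def ortholog_table_path(project_dir, species_a, species_b):
    """Return the path of the ortholog table for a species pair.

    Hypothetical helper; the layout is inferred from the example
    /project/orthologs_db/1/table.1-2 in the 1.3.7 changelog entry.
    """
    return (
        PurePosixPath(project_dir)
        / "orthologs_db"
        / str(species_a)
        / f"table.{species_a}-{species_b}"
    )
```

For instance, `ortholog_table_path("/project", 1, 2)` reproduces the example path from the changelog.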

1.3.6 (September 17, 2021)

Feature: BLAST can now be selected using the parameter --aln-tool
Feature: Diamond (v2.0.11) can now be selected using the parameter --aln-tool
Feature: added parameter --min-bitscore to set minimum bitscore for all-vs-all alignments (default is 40)
Usability: ANACONDA should now be used for installation on MacOS (and Linux where needed). Check the web-page for more details
Enhancement: added support for Python 3.9
Enhancement: retrained Adaboost model with new training data
Maintenance: upgraded to MMseqs2 version 13-45111
Fix: Throw an ERROR when empty files are input
Fix: Wrong automatic project naming
Breaking change: binaries (e.g., of MMSeqs) are now inside a single directory called software_packages
Breaking change: the -ml parameter is set to 1 by default
Breaking change: single linkage clustering was removed. The -slc parameter was accordingly removed
Breaking change: the parameter --max-gene-per-sp was removed
Others: minimum coverages for orthologs set to 20% and 20%
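
To illustrate what a minimum bitscore such as --min-bitscore (default 40) does, the sketch below filters a list of alignment hits. The `Hit` record and `filter_hits` function are hypothetical, and the sketch assumes the threshold is inclusive, which the changelog does not state.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Minimal alignment hit record (hypothetical, for illustration only)."""
    query: str
    target: str
    bitscore: float


def filter_hits(hits, min_bitscore=40.0):
    """Keep hits whose bitscore is at least min_bitscore (assumed inclusive)."""
    return [hit for hit in hits if hit.bitscore >= min_bitscore]
```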

1.3.5 (December 11, 2020)

Enhancement: by default alignments are now compressed using the DEFLATE method in order to save storage space. The default compression level is 5 but it can be changed using the --compression-lev parameter.
Enhancement: reduces the I/O operations.
Usability: Added guide for the installation using CONDA to the web-page
Usability: removed homebrew as a requirement on MacOS
Usability: general improvements to the web-page
Maintenance: added filetype as a dependency
Fix: Execution error when using python 3.6
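
The effect of DEFLATE compression at level 5 can be tried with Python's standard zlib module, as in the sketch below. This is not SonicParanoid's own code: the helper names are hypothetical, and the container format the tool uses for its compressed alignment files is not specified here.

```python
import zlib


def compress_alignments(text, level=5):
    """Compress alignment text with DEFLATE (zlib) at the given level."""
    return zlib.compress(text.encode("utf-8"), level)


def decompress_alignments(blob):
    """Restore the original text from a zlib-compressed blob."""
    return zlib.decompress(blob).decode("utf-8")
```

Higher levels (up to 9) trade speed for smaller output, while lower levels do the opposite.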

1.3.4 (July 25, 2020)

Enhancement: execution is 5~10% faster when many small proteomes are given as input (e.g. > 1000)
Enhancement: considerably reduced IO when generating the alignments
Enhancement: when there are more available CPUs than required alignment jobs, the CPUs are split equally between jobs instead of using 1 thread per job. This considerably reduces execution times when few big proteomes are given as input and many threads are available.
Enhancement: more informative output from the command line
Enhancement: output directories are now easier to browse even when many input files are provided
Enhancement: MCL binaries automatically installed for Linux and MacOS
Enhancement: warnings are shown only in debug mode
Enhancement: prevent users from restarting a run using a different MMseqs sensitivity
Enhancement: automatically remove incomplete alignments when restarting a run
Maintenance: added wheel as a dependency and removed sh
Maintenance: upgraded to MMseqs2 version 11-e1a1c
Fix: Inconsistent results when using non-indexed target databases. Big thanks to Keito for providing the dataset.
Fix: wrongly formatted execution times in the alignments stats file.
Breaking change: alignments and ortholog tables are now organized into subdirectories, please check the web-page for details
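
The thread allocation described for this release (CPUs split equally between alignment jobs when there are more CPUs than jobs) can be sketched as follows. The function `threads_per_job` is hypothetical, and how leftover CPUs are handed out when the division is not exact is an assumption of this sketch.

```python
def threads_per_job(available_cpus, n_jobs):
    """Return the number of threads assigned to each alignment job.

    Hypothetical sketch: with more CPUs than jobs the CPUs are split as
    evenly as possible; otherwise every job gets a single thread.
    """
    if available_cpus < 1 or n_jobs < 1:
        raise ValueError("available_cpus and n_jobs must be positive")
    if available_cpus <= n_jobs:
        return [1] * n_jobs
    base, extra = divmod(available_cpus, n_jobs)
    # Assumption: leftover CPUs go to the first `extra` jobs, one each.
    return [base + 1 if i < extra else base for i in range(n_jobs)]
```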

1.3.2 (April 23, 2020)

Enhancement: Added support for Python 3.8
Maintenance: Increased minimum version for packages, Cython(0.29); pandas(1.0); numpy(1.18); scikit-learn(0.22); scipy(1.2.1); mypy(0.720); biopython(1.73)
Maintenance: Retrained prediction models using the latest version of scikit-learn (0.22)
Fix: Too many open files error. Big thanks to Eva Deutekom
Fix: Removed scikit-learn warnings

1.3.0 (November 26, 2019)

Enhancement: SonicParanoid is much faster when using high sensitivity modes! Check the web-page
Enhancement: run directory names embed information about the run settings
Enhancement: generated temporary files are much smaller now
Fix: error with only 2 input species. Big thanks to Benjamin Hume
Fix: force overwriting of MMseqs2 binaries if the version is different from the supported one
Usability: Tested on Arch-based Manjaro Linux
Others: Big thanks to Shun Yamanouchi for providing some challenging datasets used for testing
Maintenance: upgraded to MMseqs2 version 10-6d92c

1.2.6 (August 26, 2019)

Fix: too many open files error which sometimes happened when using more than 20 threads

1.2.5 (August 7, 2019)

Fix: Logical threads are considered instead of physical cores in the adjustment of the threads number
Requirements: a minimum of 1.75 gigabytes per thread is required (the number of threads is automatically adjusted)
Enhancement: added parameter --force-all-threads to bypass the check for minimum per-thread memory
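
The per-thread memory requirement above (1.75 gigabytes per thread, with automatic adjustment and a --force-all-threads bypass) can be sketched as a small function. The name `adjust_threads` and its exact rounding behaviour are assumptions of this sketch, not SonicParanoid's implementation.

```python
def adjust_threads(requested, available_mem_gb,
                   min_gb_per_thread=1.75, force_all_threads=False):
    """Cap the thread count so each thread has the minimum memory.

    Hypothetical sketch of the behaviour described in the 1.2.5 changelog
    entry; at least one thread is always returned.
    """
    if force_all_threads:
        # Mirrors the idea of --force-all-threads: skip the memory check.
        return requested
    max_by_memory = int(available_mem_gb // min_gb_per_thread)
    return max(1, min(requested, max_by_memory))
```

With 7 GB of available memory, for example, a request for 8 threads would be reduced to 4 under these assumptions.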

1.2.4 (July 14, 2019)

Enhancement: Added a check to avoid selecting a number of threads higher than the available physical CPU cores (big thanks to Shun Yamanouchi)
Fix: Removed some scipy warnings, now shown only in debug mode (thanks to Alexie Papanicolaou)
Requirements: psutil>=5.6.0 is now required
Requirements: mypy>=0.701 is now required
Requirements: at least Python 3.6 is now required

1.2.3 (June 7, 2019)

Enhancement: some error messages are more informative (big thanks to Jeff Stein)

1.2.2 (May 13, 2019)

Fix: solved a bug that caused MCL to be not properly compiled on some Linux distributions
Info: source code migrated to GitLab

1.2.1 (May 10, 2019)

Fix: solved bug related to random missing alignments
Info: this issue was first described in here

1.2.0 (April 26, 2019)

Change: Markov Clustering (MCL) is now used by default for the creation of ortholog groups
Enhancement: the MCL inflation can be controlled through the parameter --inflation
Enhancement: Output file with single-copy ortholog groups
Feature: single-linkage clustering for ortholog groups creation through the --single-linkage parameter
Enhancement: added secondary program to filter ortholog groups
Info: type sonicparanoid-extract --help to see the list of options
Enhancement: Filter ortholog groups by species ID
Enhancement: Filter ortholog groups by species composition (e.g. only groups with a given number of species)
Enhancement: Extract FASTA sequences of orthologs in selected groups
Fix: The correct version of SonicParanoid is now shown in the help
Others: General bug fixes and under-the-hood improvements

1.1.2 (March, 2019)

Enhancement: Filter ortholog groups by species ID
Enhancement: Filter ortholog groups by species composition (e.g. only groups with a given number of species)
Enhancement: Extract FASTA files corresponding to orthologs in selected groups
Fix: The correct version of SonicParanoid is now shown in the help

1.1.1 (January 24, 2019)

Enhancement: No restriction on file names
Enhancement: No restriction on symbols used in FASTA headers
Enhancement: Added file with genes that could not be inserted in any group (not orthologs)
Enhancement: Added some statistics on the predicted ortholog groups
Enhancement: Update runs are automatically detected
Enhancement: Improved inference of in-paralogs
Enhancement: The directory structure has been redesigned to better support run updates

1.0.14 (October 19, 2018)

Enhancement: a warning is shown if non-protein sequences are given as input
Enhancement: upgraded to MMseqs2 6-f5a1c
Enhancement: SonicParanoid is now available through Bioconda

1.0.13 (September 18, 2018)

Fix: allow FASTA headers containing the '@' symbol

1.0.12 (September 7, 2018)

Improved accuracy
Added new sensitivity mode (most-sensitive)
Fix: internal input directory is wiped at every new run
Fix: available disk space calculation

1.0.11 (August 7, 2018)

Added new program (sonicparanoid-extract) to process output multi-species clusters
Added the possibility to analyse only 2 proteomes
Added support for Python3.7
Python3 versions: 3.5, 3.6, 3.7
Upgraded MMseqs2 (commit: a856ce, August 6, 2018)

1.0.9 (May 10, 2018)

First public release
Python3 versions: 3.4, 3.5, 3.6

Release history

This version

2.0.9

Sep 12, 2025

2.0.9a16 pre-release

Sep 11, 2025

2.0.9a9 pre-release

Aug 28, 2024

2.0.9a5 pre-release

Aug 26, 2024

2.0.9a3 pre-release

Aug 20, 2024

2.0.8

Aug 7, 2024

2.0.8a2 pre-release

Aug 7, 2024

2.0.7

Jun 27, 2024

2.0.5

Apr 9, 2024

2.0.5a2 pre-release yanked

Jul 5, 2023

Reason this release was yanked:

Unstable version

2.0.4

Jul 3, 2023

2.0.3

Jun 6, 2023

2.0.2

May 27, 2023

2.0.1

May 2, 2023

2.0.0

May 2, 2023

1.3.8

Nov 10, 2021

1.3.7

Nov 8, 2021

1.3.6

Sep 17, 2021

1.3.5

Dec 11, 2020

1.3.4

Jul 25, 2020

1.3.2

Apr 23, 2020

1.3.0

Nov 26, 2019

1.3.0b8 pre-release

Nov 26, 2019

1.3.0b7 pre-release

Nov 26, 2019

1.3.0b6 pre-release

Nov 14, 2019

1.3.0b5 pre-release

Nov 12, 2019

1.3.0b4 pre-release

Nov 14, 2019

1.3.0b3 pre-release

Oct 28, 2019

1.3.0b1 pre-release

Oct 24, 2019

1.2.6

Aug 26, 2019

1.2.5

Aug 7, 2019

1.2.4

Jul 15, 2019

1.2.3

Jun 7, 2019

1.2.2

May 13, 2019

1.2.1

May 10, 2019

1.2.0

Apr 26, 2019

1.1.1

Jan 24, 2019

1.0.14

Oct 19, 2018

1.0.13

Sep 18, 2018

1.0.12

Sep 7, 2018

1.0.11

Aug 7, 2018

1.0.10

Aug 7, 2018

1.0.9

May 10, 2018

1.0.8

May 10, 2018

1.0.7

Mar 8, 2018

1.0.6

Mar 5, 2018

1.0.0.dev6 pre-release

Aug 2, 2018

1.0.0.dev5 pre-release

Jul 23, 2018

1.0.0.dev4 pre-release

Jul 23, 2018

1.0.0.dev2 pre-release

Apr 20, 2018

1.0.0.dev1 pre-release

Apr 20, 2018

0.7.3

Mar 5, 2018

0.7.1

Mar 5, 2018

0.7.0

Mar 5, 2018

0.6.8

Mar 1, 2018

0.6.7

Feb 28, 2018

0.6.6

Feb 28, 2018

0.6.5

Feb 28, 2018

0.6.4

Feb 28, 2018

0.6.3

Feb 28, 2018

0.6.2

Feb 28, 2018

0.6.1

Feb 28, 2018

0.6.0

Feb 28, 2018

0.1.9

May 10, 2018

0.1.8

May 10, 2018

0.1.7

May 10, 2018

0.1.6

May 10, 2018

0.1.5

May 10, 2018

0.1.4

May 10, 2018

0.1.3

May 3, 2018

0.1.2

May 2, 2018

0.1.0

Feb 27, 2018

Download files

Source Distribution

sonicparanoid-2.0.9.tar.gz (97.9 MB)

Uploaded Sep 12, 2025 Source

File details

Details for the file sonicparanoid-2.0.9.tar.gz.

File metadata

Download URL: sonicparanoid-2.0.9.tar.gz
Upload date: Sep 12, 2025
Size: 97.9 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.17

File hashes

Hashes for sonicparanoid-2.0.9.tar.gz
Algorithm	Hash digest
SHA256	`0c62b9584c4f4f614f299935842033e7998cc84a528ae1c2bbf8c84b2beb80cc`
MD5	`6bb648cfd110148d6929674a5573e79e`
BLAKE2b-256	`721382a4a45ab1f129de345d3a26e8f351f3e3d156911bf42fd81771b1112788`
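
A downloaded archive can be checked against the SHA256 digest listed above with Python's standard hashlib module. The sketch below is illustrative: the helper names are hypothetical, and the file path passed to them is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed above for sonicparanoid-2.0.9.tar.gz.
EXPECTED_SHA256 = "0c62b9584c4f4f614f299935842033e7998cc84a528ae1c2bbf8c84b2beb80cc"


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected one."""
    return sha256_of_file(path) == expected.lower()
```

Reading in chunks keeps memory use low, which matters for an archive of about 98 MB.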

