Tools for working with OrthoXML files.
Project description
orthoxml-tools
Tools for working with OrthoXML files.
What is OrthoXML Format?
OrthoXML is a standard for sharing and exchanging orthology predictions. OrthoXML is designed broadly to allow the storage and comparison of orthology data from any ortholog database. It establishes a structure for describing orthology relationships while still allowing flexibility for database-specific information to be encapsulated in the same format.
OrthoXML
Installation
pip install orthoxml-tools
Usage
orthoxml-tools [options] <subcommand> [options]
Note: Input OrthoXML files can be in plain text or compressed format. Both gzip (.gz) and bzip2 (.bz2) compression are supported.
Subcommands
🛠️ validate
Validate an OrthoXML file against the schema version specified in the file itself.
orthoxml-tools validate --infile path/to/file.orthoxml
Options:
--infile <file>: Specify the input file (required).
Example:
orthoxml-tools validate --infile examples/data/ex1.orthoxml
🛠️ stats
Display basic statistics.
orthoxml-tools stats --infile path/to/file.orthoxml [--outfile <file>]
Options:
--infile <file>: Specify the input file (required).
Example:
orthoxml-tools stats --infile examples/data/ex3-int-taxon.orthoxml
🛠️ gene-stats
Display statistics for gene count per taxon.
orthoxml-tools gene-stats --infile path/to/file.orthoxml [--outfile <file>]
Options:
--infile <file>: Specify the input file (required).--outfile <file>: Write stats to a txt file.
Example:
orthoxml-tools gene-stats --infile examples/data/ex3-int-taxon.orthoxml --outfile gene_stats.txt
🛠️ filter
Filter orthology groups based on CompletenessScore score and a threshold and strategy.
orthoxml-tools filter --infile path/to/file.orthoxml --threshold <value> --strategy <cascade-remove|extract|reparent> --outfile <file>
Options:
--infile <file>: Specify the input file. (required)--threshold <value>: Set the threshold for filtering. value below this will be removed. (required)--strategy <cascade-remove|extract|reparent>: Choose the filtering strategy (default iscascade-remove).--outfile <file>: Save output to a file. if not specified, the output will be printed to stdout. (required)
Examples:
orthoxml-tools filter --infile examples/data/sample-for-filter.orthoxml --score-name CompletenessScore --strategy top-down --threshold 0.24 --outfile tests_output/filtered_stream.orthoxml
🛠️ taxonomy
Print a human-readable taxonomy tree from the OrthoXML file.
orthoxml-tools taxonomy --infile path/to/file.orthoxml
Example:
>>> orthoxml-tools taxonomy --infile examples/data/ex3-int-taxon.orthoxml
Root
├── Mus musculus
└── Primates
├── Homo sapiens
└── Pan troglodytes
🛠️ export-pairs
Export pairs (orthologs or paralogs) in TSV form, with configurable chunking and buffering.
orthoxml-tools export-pairs <ortho|para> \
--infile <file> \
--outfile <file> \
[--id <tag>] \
[--chunk-size <number>] \
[--buffer-size <bytes>]
Positional arguments: <ortho|para> Choose which pair type to export:
ortho: orthologous pairspara: paralogous pairs
Options:
--infile <file>: Input OrthoXML file (required).--outfile <file>: Write output CSV to this file (required).--id <tag>: Gene attribute to use as identifier (default: id). other values: protId, geneId--chunk-size <number>: Number of pairs to process per chunk (default: 20_000).--buffer-size <bytes>: I/O buffer size in bytes (default: 4194304).
Examples:
# [5.1] Export ortholog pairs with default chunk & buffer sizes
orthoxml-tools export-pairs ortho \
--infile examples/data/ex1-int-taxon.orthoxml \
--outfile orthos.csv
# [5.2] Export paralog pairs with default chunk & buffer sizes
orthoxml-tools export-pairs para \
--infile examples/data/ex1-int-taxon.orthoxml \
--outfile paras.csv
# [5.3] Export ortholog pairs using `geneId` as the identifier column
orthoxml-tools export-pairs ortho \
--infile examples/data/ex1-int-taxon.orthoxml \
--outfile orthos_geneid.csv \
--id geneId
# [5.4] Export ortholog pairs with custom chunk and buffer sizes
orthoxml-tools export-pairs ortho \
--infile examples/data/ex1-int-taxon.orthoxml \
--outfile orthos_custom.csv \
--chunk-size 5000 \
--buffer-size 1048576
🛠️ export-ogs
Export Orthologous Groups as TSV file.
orthoxml-tools export-ogs --infile path/to/file.orthoxml --outfile path/to/output.tsv [--id <tag>]
Options:
--infile <file>: Input OrthoXML file (required).--outfile <file>: Write output CSV to this file (required).--id <tag>: Gene attribute to use as identifier (default: id).
Examples:
orthoxml-tools export-ogs --infile examples/data/sample-for-og.orthoxml --outfile tests_output/ogs.tsv --id protId
🛠️ split
Split the tree into multiple trees based on rootHOGs.
orthoxml-tools split --infile path/to/file.orthoxml --outdir path/to/output_folder
Options:
--infile <file>: Specify the input OrthoXML file (required).--outdir <folder>: Specify the output folder where the trees will be saved.
Examples:
orthoxml-tools split --infile examples/data/ex4-int-taxon-multiple-rhogs.orthoxml --outdir tests_output/splits
File Conversions
🛠️ OrthoXML to Newick Tree (NHX)
Convert OrthoXML to Newick (NHX) format.
orthoxml-tools to-nhx --infile path/to/file.orthoxml --outdir path/to/output_folder --xref-tag [geneId,protId,...]
Options:
--infile <file>: Specify the input OrthoXML file (required).--outdir <folder>: Specify the output folder where the NHX files will be saved (required).--xref-tag <tag>: Specify the attribute of the<gene>element to use as the label for the leaves. Default isprotId.--encode-levels: If set, encode group levels as NHX comments in the output tree. This is useful for visualizing the hierarchy of orthologous groups.
Example:
orthoxml-tools to-nhx --infile examples/data/sample-for-nhx.orthoxml --outdir ./tests_output/trees --xref-tag protId --encode-levels
🛠️ Newick Tree (NHX) to OrthoXML
Convert Newick (NHX) format to OrthoXML.
orthoxml-tools from-nhx --infile path/to/file.nhx --outfile path/to/file.orthoxml [--species-encode nhx|underscore]
Options:
--infile <file>: Specify the input nhx file or files. (at least one file is required).- You can specify multiple files by providing them as a space-separated list.
- If you provide multiple files, they will be combined into a single OrthoXML output.
--outfile <folder>: Specify the output OrthoXML file (required).--species-encode <nhx|underscore>: How species/taxonomic levels are encoded in the Newick files. (default:underscore)nhx→ Species encoded in NHX comments using S= or T= tags. For example:(A_s1:0.1[&&NHX:S=s1],B_s2:0.2[&&NHX:S=s2]);underscore→ Species encoded in leaf labels using underscores (e.g.,GeneID_SpeciesID).
Example:
orthoxml-tools from-nhx --infile examples/data/sample.nhx --outfile ./tests_output/from_nhx.orthoxml
orthoxml-tools from-nhx --infile examples/data/sample2.nhx examples/data/sample.nhx --outfile ./tests_output/from_nhx21.orthoxml
orthoxml-tools from-nhx \
--species-encode nhx \
--infile examples/data/sample.nhx \
--outfile tests_output/from_nhx_nhxspecies.orthoxml
🛠️ CSV to OrthoXML (exploratory feature)
Convert a CSV file to OrthoXML. The CSV file is structured such that each row represents an orthogroup (OG), each column corresponds to a species, and each cell contains a gene name. This format is generated by OrthoFinder e.g. examples/data/InputOrthogroups.csv.
[!WARNING] Note that since the CSV does not contain the full information required to represent the hierarchical structure of HOGs, the output OrthoXML file is reported at the root level. It should not be considered a full-fledged OrthoXML file.
orthoxml-tools from-csv --infile path/to/file.csv --outfile path/to/file.orthoxml
Options:
--infile <file>: Specify the input orthogroups.csv file (required).--outfile <folder>: Specify the output OrthoXML file (required).
Example:
orthoxml-tools from-csv --infile examples/data/InputOrthogroups.csv --outfile tests_output/orthofinder.orthoxml
🛠️ filter
Filter the OrthoXML tree by a completeness score.
--score-name <str>: Name of the field for completeness score annotation (e.g. 'CompletenessScore')--threshold <float>: Threshold value for the completeness score--strategy <bottomup|topdown>: Filtering strategy. Bottom-up will keep complete subHOGs even if they parents are incomplete.--outfile <file>: If provided, write the filtered OrthoXML to this file; otherwise, print to stdout
orthoxml-tools tests/test-data/case_filtering.orthoxml filter --score-name CompletenessScore \
--threshold 0.75 \
--strategy bottomup \
--outfile output-oxml.orthoxml
Help
To see help for any command:
orthoxml-tools --help
orthoxml-tools -h
orthoxml-tools stats --help
orthoxml-tools stats -h
Legacy API
The orthoxml-tools package used to provides a object oriented interface for working with OrthoXML files. This API is deprecated and will be removed in v1.0.0. Please use the new streaming CLI method. The documentation on it can be found here.
Testing
uv install `.[test]`
pytest -vv
# test cli
tests/test_cli.sh
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file orthoxml_tools-1.3.2.tar.gz.
File metadata
- Download URL: orthoxml_tools-1.3.2.tar.gz
- Upload date:
- Size: 15.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b8646bb3f1bf0de994f6ba970d34f201d0df2a1fc53dead78fe73de77b664f94
|
|
| MD5 |
177a112c7f08e4d7bb443e6c78e76d9e
|
|
| BLAKE2b-256 |
deff9d227a5d59eb7db1f9b51799b5c682eb007264bd39b798fb78217684f472
|
Provenance
The following attestation bundles were made for orthoxml_tools-1.3.2.tar.gz:
Publisher:
publish.yml on DessimozLab/orthoxml-tools
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
orthoxml_tools-1.3.2.tar.gz -
Subject digest:
b8646bb3f1bf0de994f6ba970d34f201d0df2a1fc53dead78fe73de77b664f94 - Sigstore transparency entry: 1429634946
- Sigstore integration time:
-
Permalink:
DessimozLab/orthoxml-tools@f16965978ef5353b3ee9131b0ca91f0a74adf2e5 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/DessimozLab
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@f16965978ef5353b3ee9131b0ca91f0a74adf2e5 -
Trigger Event:
push
-
Statement type:
File details
Details for the file orthoxml_tools-1.3.2-py3-none-any.whl.
File metadata
- Download URL: orthoxml_tools-1.3.2-py3-none-any.whl
- Upload date:
- Size: 65.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
77db96158f594e121148c587fdffc57d09fc029ebb016637ee7576307ad8730e
|
|
| MD5 |
78fda1ded81b0c778c7f0931e692b630
|
|
| BLAKE2b-256 |
87ecf36eb09a18c3e4bb82473c0693ecd6749763e4a7f43baf1a2daca5ca41da
|
Provenance
The following attestation bundles were made for orthoxml_tools-1.3.2-py3-none-any.whl:
Publisher:
publish.yml on DessimozLab/orthoxml-tools
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
orthoxml_tools-1.3.2-py3-none-any.whl -
Subject digest:
77db96158f594e121148c587fdffc57d09fc029ebb016637ee7576307ad8730e - Sigstore transparency entry: 1429634954
- Sigstore integration time:
-
Permalink:
DessimozLab/orthoxml-tools@f16965978ef5353b3ee9131b0ca91f0a74adf2e5 -
Branch / Tag:
refs/heads/main - Owner: https://github.com/DessimozLab
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@f16965978ef5353b3ee9131b0ca91f0a74adf2e5 -
Trigger Event:
push
-
Statement type: