XLDataGraph
XLDataGraph is a feature-rich Python library for processing, filtering, comparing, and visualizing crosslinking mass spectrometry results. Built for advanced structural biology and proteomics workflows, it offers seamless integration of sequence, domain, and structural data, supporting publication-ready visualizations and network analyses.
Table of Contents
- Features
- Installation
- Supported File Types
- Quick Start Example
- API Overview
- Detailed Method Documentation
- Examples
- Contributing
- License
- Contact
- Project Status
Features
- Crosslink data: Imports MeroX .zhrm results and extracts crosslink site data.
- Metadata: Includes support for domain (.dmn), FASTA (.fasta), protein chain (.pcd), and structure (.cif, .pdb) files.
- Flexible filtering: By score, replica number, crosslink type (inter/intra/homeotypic).
- Data merging/replica handling: Directly supports combining, blanking, and comparing datasets.
- Visualization suite:
- Circos protein interaction plots with domain coloring.
- Venn diagrams for cross-dataset overlap.
- Gephi-compatible exports (AAI/PPI networks).
- ChimeraX pseudo-bond distance constraints.
- Structural predictions: Fast CA-CA or advanced A* path-based 3D predictions for hypothetical crosslinks.
- Scriptable and extensible for custom workflows.
Installation
pip install xldg
Supported File Types
| Extension | Description |
|---|---|
| .fasta | Protein sequences (Uniprot, Araport11, Custom) |
| .dmn | Protein domain annotation |
| .zhrm | MeroX result |
| .pcd | Protein-chain assignment descriptor |
| .cif/.pdb | Structure files (mmCIF or PDB format) |
Quick Start Example
import os
from xldg.data import Path, MeroX, Domain, Fasta, CrossLink
from xldg.graphics import CircosConfig, Circos
cwd = './examples/files'
fasta = Fasta.load_data(os.path.join(cwd, 'example.fasta'), 'Custom')
domains = Domain.load_data(Path.list_given_type_files(cwd, 'dmn'))
crosslinks = MeroX.load_data(Path.list_given_type_files(cwd, 'zhrm'), 'DSBU')
combined = CrossLink.combine_all(crosslinks)
config = CircosConfig(fasta, domains)
circos = Circos(combined, config)
circos.save(os.path.join(cwd, 'results', 'circos_basic.svg'))
API Overview
Data Loading & Filtering
Loading Data
- Fasta.load_data(path, fasta_format, remove_parenthesis=False)
- Load one or more FASTA files. See below for argument details.
- Domain.load_data(path)
- Load one or more domain annotation files.
- MeroX.load_data(path, linker=None)
- Import one or more .zhrm crosslink result files.
- ProteinChain.load_data(path)
- Load a protein-chain map (.pcd).
- ProteinStructure.load_data(path)
- Load a 3D structure file (.pdb or .cif).
Filtering and Dataset Manipulation
- CrossLink.filter_by_score(dataset, min_score=0, max_score=sys.maxsize)
- Keep only crosslinks in the specified score window.
- CrossLink.filter_by_replica(dataset, min_replica=1, max_replica=sys.maxsize)
- Filter crosslinks by replica count.
- CrossLink.remove_interprotein(dataset)
- Remove all interprotein crosslinks.
- CrossLink.remove_intraprotein(dataset)
- Remove all intraprotein crosslinks.
- CrossLink.remove_homeotypic(dataset)
- Remove all homeotypic crosslinks (same residue/peptide on both sides).
- CrossLink.combine_all([datasets])
- Merge the given datasets into a single dataset.
- CrossLink.combine_replicas(dataset_list, n)
- Combine every n datasets into a multi-replicate dataset group.
- CrossLink.blank_replica(dataset)
- Set all replica counts to 1 for plotting/overlap analyses.
Visualization
- CircosConfig: Configures Circos protein plot visuals and filters.
- Circos: Generates and saves Circos plots.
- VennConfig: Configures Venn diagrams.
- Venn2/Venn3: 2- or 3-group overlap plots.
- CrossLinkDataset.export_ppis_for_gephi(folder, filename, pcd)
- Exports protein-protein interaction graphs for Gephi visualization.
- CrossLinkDataset.export_aais_for_gephi(folder, filename, pcd)
- Exports residue-residue networks.
- CrossLinkDataset.export_for_chimerax(...)
- Exports .pb pseudo-bond files for ChimeraX (see below for expanded argument docs).
Structural Prediction
- ProteinStructureDataset.predict_crosslinks(...)
- Predicts possible crosslinks based on atomic-residue coordinates; can use direct or sampled pathfinding.
Detailed Method Documentation
Data Loading and Filtering
Fasta.load_data(path, fasta_format, remove_parenthesis=False)
- path: str or list of str. Filepaths to FASTA files.
- fasta_format: str. "Uniprot", "Araport11", or "Custom" (header parser).
- remove_parenthesis: bool (optional). Remove parentheses content from headers.
- returns: FastaDataset.
Domain.load_data(path)
- path: str or list of str. Path(s) to .dmn file(s).
- returns: DomainDataset.
MeroX.load_data(path, linker=None)
- path: str or list of str. Path(s) to .zhrm zipped result file(s).
- linker: str (optional). Linker type, for annotation purposes.
- returns: One CrossLinkDataset for a single file; list of datasets for multiple files.
CrossLink.filter_by_score(dataset, min_score=0, max_score=sys.maxsize)
- dataset: CrossLinkDataset or list. Input data.
- min_score / max_score: int. Score window.
- returns: Filtered dataset.
CrossLink.filter_by_replica(dataset, min_replica=1, max_replica=sys.maxsize)
- dataset: CrossLinkDataset or list. Input data.
- min_replica / max_replica: int. Allowed replica count window.
- returns: Filtered dataset.
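The two window filters above can be pictured with a small pure-Python sketch. It does not use xldg: the `Link` record, the `filter_links` helper, and the choice of inclusive bounds are assumptions made for illustration, not the library's implementation.

```python
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical stand-in for one crosslink record (not an xldg class)."""
    site_a: str
    site_b: str
    score: int
    replicas: int


def in_window(value, low, high):
    # Assumption: both bounds are inclusive.
    return low <= value <= high


def filter_links(links, min_score=0, max_score=sys.maxsize,
                 min_replica=1, max_replica=sys.maxsize):
    """Keep links whose score and replica count both fall inside their windows."""
    return [
        link for link in links
        if in_window(link.score, min_score, max_score)
        and in_window(link.replicas, min_replica, max_replica)
    ]


links = [
    Link("P1:K10", "P2:K45", score=120, replicas=3),
    Link("P1:K10", "P1:K88", score=35, replicas=1),
    Link("P2:K45", "P3:K7", score=90, replicas=2),
]

# Keep links scoring at least 50 that were seen in at least 2 replicas.
kept = filter_links(links, min_score=50, min_replica=2)
print([(link.site_a, link.site_b) for link in kept])
```

With the default arguments every link passes, which mirrors the open-ended defaults (`sys.maxsize`) in the documented signatures.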
CrossLink.remove_interprotein(dataset)
- dataset: CrossLinkDataset or list. Dataset(s) to filter.
- returns: Dataset with all crosslinks between distinct proteins removed (only intraprotein and homeotypic remain).
- Effect: Used to focus on internal protein organization, e.g., in Circos plots.
CrossLink.remove_intraprotein(dataset)
- dataset: CrossLinkDataset or list. Dataset(s) to filter.
- returns: Dataset with all intraprotein crosslinks removed (only interprotein and homeotypic links remain).
- Effect: Good for focusing on protein-protein interactions (interactions between different proteins).
CrossLink.remove_homeotypic(dataset)
- dataset: CrossLinkDataset or list. Dataset(s) to filter.
- returns: Dataset with all homeotypic crosslinks removed (where both sites correspond to the same residue or peptide).
- Effect: Streamlines network/structural analyses by removing redundancy.
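The three removal methods rest on a three-way classification of each crosslink. The sketch below shows one way to express that classification in plain Python. The `Link` record and both helpers are hypothetical, and the homeotypic test is simplified to "same protein, same residue number", which may be narrower than the peptide-level rule the library applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical crosslink record: protein name and residue number per side."""
    protein_a: str
    residue_a: int
    protein_b: str
    residue_b: int


def link_type(link):
    """Classify a link as 'inter', 'intra', or 'homeotypic'."""
    if link.protein_a != link.protein_b:
        return "inter"
    # Assumption: same protein and same residue number marks a homeotypic link.
    if link.residue_a == link.residue_b:
        return "homeotypic"
    return "intra"


def remove_type(links, unwanted):
    """Drop every link of one type; the other two types are kept."""
    return [link for link in links if link_type(link) != unwanted]


links = [
    Link("P1", 10, "P2", 45),   # inter
    Link("P1", 10, "P1", 88),   # intra
    Link("P1", 30, "P1", 30),   # homeotypic
]

ppi_view = remove_type(links, "intra")
print([link_type(link) for link in ppi_view])  # → ['inter', 'homeotypic']
```

Removing one type always leaves the other two in place, which matches the "only ... remain" notes in the method descriptions above.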
CrossLink.combine_all(datasets)
- datasets: list of CrossLinkDataset. All datasets to merge.
- returns: Combined CrossLinkDataset.
CrossLink.combine_replicas(dataset_list, n)
- dataset_list: List of datasets (e.g., by replicate).
- n: Number per group (e.g., n=3 for three-replicate overlays).
- returns: List of merged multi-replicate datasets.
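A rough picture of "combine every n datasets" is sketched below in plain Python. Datasets are modeled as sets of site pairs, and the replica count of a link is assumed to be the number of datasets in its group that contain it. Both the data model and the helper names (`chunk`, `combine_every_n`, `set_counts_to_one`) are illustrative assumptions, not xldg internals.

```python
from collections import Counter


def chunk(items, n):
    """Split a list into consecutive groups of n items."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [items[i:i + n] for i in range(0, len(items), n)]


def combine_group(group):
    """Merge one group; the value is the number of datasets containing the link."""
    counts = Counter()
    for dataset in group:
        counts.update(set(dataset))  # count each link once per dataset
    return dict(counts)


def combine_every_n(datasets, n):
    """Merge every n consecutive datasets into one replica-counted dataset."""
    return [combine_group(group) for group in chunk(datasets, n)]


def set_counts_to_one(merged):
    """Analogue of blanking: keep the links, reset every count to 1."""
    return {link: 1 for link in merged}


a = ("P1:K10", "P2:K45")
b = ("P1:K10", "P1:K88")
c = ("P2:K45", "P3:K7")

# Six runs: the first three belong to one condition, the last three to another.
runs = [{a, b}, {a}, {a, c}, {b}, {b, c}, {c}]
merged = combine_every_n(runs, 3)
print(merged[0][a])  # → 3
```

Here `merged` holds two datasets, one per condition, and a replica filter with a minimum of 2 would keep only link `a` in the first.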
CrossLink.blank_replica(dataset)
- dataset: CrossLinkDataset or list. All replica counts set to 1.
- returns: Dataset(s) for plotting or overlap comparison.
Visualization Methods
CircosConfig(fasta, domains=None, ...)
- fasta: FastaDataset.
- domains: DomainDataset (optional).
- legend, title: str (optional).
- figsize: (float, float), e.g. (9, 9) for image size.
- label_interval, space_between_sectors, font sizes, color overrides: See docstrings/defaults in code for advanced settings.
- returns: CircosConfig object.
Circos(crosslinks, config)
- crosslinks: CrossLinkDataset.
- config: CircosConfig.
- .save(path): Renders and saves Circos plot.
VennConfig(label_1, label_2, label_3=None, title=None, ...)
- label_1, label_2, label_3: str. Categories for Venn sets (up to 3).
- title: str (optional). Plot title. Further options cover color and font size.
- returns: VennConfig.
Venn2(dataset1, dataset2, config)
- dataset1, dataset2: CrossLinkDataset. Sets to compare.
- config: VennConfig.
- .save(path): Save the Venn plot.
Venn3(dataset1, dataset2, dataset3, config)
- Add third dataset. Otherwise, usage as in Venn2.
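The overlap that a two-set Venn diagram displays comes down to three set operations. The sketch below computes those regions in plain Python, assuming that two crosslinks count as the same when they join the same pair of sites regardless of order. The `link_key` and `venn2_regions` helpers are hypothetical and do not reflect how xldg compares datasets internally.

```python
def link_key(site_1, site_2):
    """Order-independent identity for a crosslink (assumption for this sketch)."""
    return frozenset((site_1, site_2))


def venn2_regions(set_a, set_b):
    """Return the three regions of a two-set Venn diagram."""
    return {
        "only_a": set_a - set_b,
        "both": set_a & set_b,
        "only_b": set_b - set_a,
    }


dataset_1 = {link_key("P1:K10", "P2:K45"), link_key("P1:K10", "P1:K88")}
dataset_2 = {link_key("P2:K45", "P1:K10"), link_key("P2:K45", "P3:K7")}

regions = venn2_regions(dataset_1, dataset_2)
print({name: len(links) for name, links in regions.items()})
# → {'only_a': 1, 'both': 1, 'only_b': 1}
```

The P1:K10 to P2:K45 link lands in the shared region even though its sites are listed in opposite order in the two datasets.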
CrossLinkDataset.export_ppis_for_gephi(folder, filename, pcd)
- folder: str. Output folder.
- filename: str. Output .gexf filename.
- pcd: ProteinChainDataset.
- returns: None. Writes file.
CrossLinkDataset.export_aais_for_gephi(folder, filename, pcd)
- Same as above, but for residue-residue/AAI level network.
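The difference between the two network levels can be sketched as two ways of aggregating the same crosslink list: proteins as nodes for PPI, individual residues as nodes for AAI. The tuple layout, the helper names, and the use of crosslink counts as edge weights are assumptions for illustration; the sketch does not write a .gexf file and ignores the protein-chain mapping.

```python
from collections import Counter


def ppi_edges(links):
    """Protein-level edges: node = protein, weight = number of crosslinks."""
    weights = Counter()
    for protein_a, _res_a, protein_b, _res_b in links:
        weights[tuple(sorted((protein_a, protein_b)))] += 1
    return weights


def aai_edges(links):
    """Residue-level edges: node = (protein, residue number)."""
    weights = Counter()
    for protein_a, res_a, protein_b, res_b in links:
        edge = tuple(sorted(((protein_a, res_a), (protein_b, res_b))))
        weights[edge] += 1
    return weights


# Hypothetical crosslinks as (protein, residue, protein, residue) tuples.
links = [
    ("P1", 10, "P2", 45),
    ("P2", 50, "P1", 12),
    ("P1", 10, "P1", 88),   # intraprotein link becomes a self-loop at protein level
]

print(ppi_edges(links)[("P1", "P2")])  # → 2
print(len(aai_edges(links)))           # → 3
```

Two distinct residue pairs between P1 and P2 collapse into one weighted edge at the protein level but stay separate at the residue level.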
CrossLinkDataset.export_for_chimerax(pcd, folder, filename, diameter=0.2, dashes=1, color_valid_distance='#48cae4', color_invalid_outsider='#d62828', protein_structure=None, min_distance=0, max_distance=sys.maxsize, atom_type='CA')
- pcd: ProteinChainDataset.
- folder: str.
- filename: str.
- diameter: float. Bond diameter for visualization.
- dashes: int. Style parameter for ChimeraX.
- color_valid_distance: str. For valid range links.
- color_invalid_outsider: str. For out-of-range links.
- protein_structure: ProteinStructureDataset (optional). Used for distance validation.
- min_distance/max_distance: float. Site distance boundaries.
- atom_type: str. (Usually "CA").
- returns: None. Writes one or more *.pb files for ChimeraX.
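To show how the color and distance arguments interact, here is a minimal sketch that builds pseudo-bond lines by hand. It follows the general ChimeraX pseudobond text layout (two atom specifiers and an optional color per line, with `; radius` and `; dashes` settings in comment lines). The exact content xldg writes may differ. The `pseudobond_lines` helper, the mapping of diameter to radius, the precomputed distances, and the 35 angstrom cutoff are all assumptions.

```python
def pseudobond_lines(bonds, diameter=0.2, dashes=1,
                     min_distance=0.0, max_distance=35.0,
                     color_valid="#48cae4", color_invalid="#d62828"):
    """Build the text lines of a pseudo-bond file.

    bonds: iterable of (atom_spec_a, atom_spec_b, distance_in_angstrom).
    """
    lines = [
        f"; radius = {diameter / 2}",  # assumption: radius is half the diameter
        f"; dashes = {dashes}",
    ]
    for spec_a, spec_b, distance in bonds:
        in_range = min_distance <= distance <= max_distance
        color = color_valid if in_range else color_invalid
        lines.append(f"{spec_a} {spec_b} {color}")
    return lines


# Hypothetical crosslinks with precomputed CA-CA distances.
bonds = [
    ("/A:10@CA", "/A:25@CA", 12.4),   # inside the allowed range
    ("/A:10@CA", "/B:7@CA", 48.9),    # outside the allowed range
]

lines = pseudobond_lines(bonds)
print("\n".join(lines))
```

Links inside the distance window get the "valid" color and the rest get the "outsider" color, so violations stand out when the file is opened alongside the structure.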
Structure-Based Crosslink Prediction
ProteinStructureDataset.predict_crosslinks(pcd, residues_1, residues_2, min_length=1.0, max_length=sys.maxsize, linker=None, atom_type='CA', direct_path=True, radius=1.925, node_multiplier=100, num_processes=1)
- pcd: ProteinChainDataset.
- residues_1/residues_2: str. Residue selectors, e.g. {K for N-term lysine, K for all lysines.
- min_length: float (angstrom). Minimum allowed CA-CA distance.
- max_length: float (angstrom). Maximum allowed distance.
- linker: str. Linker identifier.
- atom_type: str (default "CA"). Atom on which the dummy links are modeled.
- direct_path: bool. If True, use direct CA-CA Euclidean distance; if False, use A* path sampling (models obstacles, slow).
- radius: float. Excluded-volume radius.
- node_multiplier: int. Controls sampling density if A* search is used.
- num_processes: int. CPU count for parallel calculation (A* only).
- returns: CrossLinkDataset of predicted sites.
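The direct-path mode (direct_path=True) amounts to a Euclidean distance check between candidate atoms. The sketch below illustrates that idea on a handful of made-up CA coordinates. It uses a single residue set for both sides, and the `predict_direct` helper, the labels, and the 30 angstrom cutoff are assumptions. The A* path mode is not covered.

```python
import math
from itertools import combinations


def predict_direct(ca_coords, min_length=1.0, max_length=30.0):
    """Return residue pairs whose CA-CA distance lies in [min_length, max_length].

    ca_coords: dict mapping a residue label to its CA (x, y, z) in angstroms.
    """
    hits = []
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sorted(ca_coords.items()), 2):
        distance = math.dist(xyz_a, xyz_b)  # straight-line (Euclidean) distance
        if min_length <= distance <= max_length:
            hits.append((res_a, res_b, round(distance, 2)))
    return hits


# Hypothetical lysine CA coordinates (angstroms).
lysines = {
    "A:K10": (0.0, 0.0, 0.0),
    "A:K25": (3.0, 4.0, 0.0),
    "B:K7": (0.0, 0.0, 40.0),
}

print(predict_direct(lysines))  # → [('A:K10', 'A:K25', 5.0)]
```

A straight-line check ignores atoms that sit between the two residues, which is the gap the slower path-based mode is described as addressing.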
Examples
See examples/ directory for:
- Circos plotting
- Crosslink prediction and ChimeraX export
- Gephi (network) and Venn visualization
- Combined, advanced dataset filtering and merging
Contributing
Issues and pull requests welcome! See GitHub Issues.
License
This project is licensed under the GNU GPLv3.
Contact
- GitHub: @a-helix
Project Status
XLDataGraph is actively developed - please see the repository for the latest features, bugfixes, and documentation.