sPYce is a Python package for the cross-species integration of single-nucleus ATAC-seq data (snATAC-seq)

(Credits avatar: Created with https://deepai.org/machine-learning-model/pop-art-generator)

Single Cell Integration for Single Cell and Single Nucleus ATAC-seq Across Evolution (sPYce)

We present sPYce, a single cell and single nucleus ATAC-seq integration method across different species, implemented in Python. sPYce takes single cell data (i.e. a cell x feature matrix) and transforms it into single cell k-mer histograms. It was developed for single cell and single nucleus ATAC-seq data, but it can similarly be applied to any single cell data that follows an on-off relationship. In the case of ATAC-seq, the k-mer histogram represents the sequence composition of accessible regions. sPYce allows the straightforward combination of these k-mer histograms across several species, performs appropriate normalization, and defines an easy-to-use interface to run dimensionality reduction, clustering, embedding, visualisation, and automated label transfer.
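
To make the idea of a cell-by-k-mer histogram concrete, here is a minimal sketch in plain Python. It is not sPYce's implementation: the function names `kmer_counts` and `cell_kmer_histograms` are hypothetical, and the sketch assumes that a cell's histogram is simply the sum of the k-mer counts of all peaks that are accessible (non-zero) in that cell.

```python
from itertools import product


def kmer_counts(sequence, k):
    """Count every DNA k-mer in one sequence (k-mers with non-ACGT letters are skipped)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = sequence.upper()
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
    return counts


def cell_kmer_histograms(cell_by_peak, peak_sequences, k):
    """Sum the k-mer counts of all peaks that are accessible in each cell."""
    peak_by_kmer = [kmer_counts(seq, k) for seq in peak_sequences]
    histograms = []
    for cell in cell_by_peak:
        hist = [0] * (4 ** k)
        for is_open, counts in zip(cell, peak_by_kmer):
            if is_open:
                hist = [h + c for h, c in zip(hist, counts)]
        histograms.append(hist)
    return histograms


# Toy data (illustrative only): 2 cells x 3 peaks, with one short sequence per peak
cell_by_peak = [[1, 0, 1], [0, 1, 0]]
peak_sequences = ["ACGT", "AAAA", "ACAC"]
hists = cell_kmer_histograms(cell_by_peak, peak_sequences, k=2)
```

Each row of `hists` has 4**k entries, ordered lexicographically over the alphabet ACGT. In practice the peak sequences would be extracted from the reference genome using the bed file coordinates.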

For more information, please see our documentation with tutorials, instructions, and detailed API explanations.

Installation

The code was implemented and tested with Python 3.9, but any Python version greater than or equal to 3.6 should work. Make sure you have pip installed.

pip

The easiest way to install our package is using pip. Simply run

python3 -m pip install spyceATAC

Installing the package this way also makes the command-line entry points available (see below).

Git

If you want to contribute to or customize the code, you can also clone the repository. To build and install the package from source, navigate to the folder into which you cloned the repository. Then run the following commands (the first one requires the build package to be installed):

python3 -m build
python3 -m pip install .

This will install sPYce from the files in the repository. Note that any changes you have made to the code yourself will be reflected in the installed package.

If you only want to install the requirements, use pip via

python3 -m pip install -r requirements.txt

or, if you're a developer, we recommend running

python3 -m pip install -r requirements_dev.txt

Data

sPYce requires the following input files:

a peak matrix per sample and species, with rows representing cells and columns representing peaks
a bed file listing the peaks in the same order as the columns of the peak matrix. You need either one bed file per peak matrix or one per species
a reference genome per species as a fasta (.fa) file
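
Because the bed file must list the peaks in the same order as the columns of the peak matrix, a quick sanity check before running sPYce can save time. The following is a minimal sketch in plain Python and is not part of sPYce: `read_bed_peaks` and `check_peak_alignment` are hypothetical helper names, and the check only compares the number of peaks with the number of matrix columns. It cannot detect peaks that are present but listed in a different order.

```python
def read_bed_peaks(bed_text):
    """Parse BED lines into (chrom, start, end) tuples, skipping blank and header lines."""
    peaks = []
    for line in bed_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        peaks.append((chrom, int(start), int(end)))
    return peaks


def check_peak_alignment(peaks, n_matrix_columns):
    """Raise a ValueError if the number of BED peaks differs from the number of matrix columns."""
    if len(peaks) != n_matrix_columns:
        raise ValueError(
            f"bed file has {len(peaks)} peaks but the peak matrix has {n_matrix_columns} columns"
        )
    return True


# Toy data (illustrative only): 3 peaks and a 2 cells x 3 peaks matrix
bed_text = "chr1\t100\t600\nchr1\t900\t1400\nchr2\t50\t550\n"
cell_by_peak = [[1, 0, 1], [0, 1, 0]]

peaks = read_bed_peaks(bed_text)
aligned = check_peak_alignment(peaks, n_matrix_columns=len(cell_by_peak[0]))
```

For real data, read the bed file contents from disk and take the column count from the shape of your peak matrix.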

If you follow our tutorial on our website, you can download the example data by running:

sh shell/get_testdata.sh /path/to/spyce/dir /output/dir n_cpus

Please replace the paths and the number of used processes with the desired values.

Example

How you use sPYce depends on your data. Please follow the tutorial for more information. Generally, sPYce follows this workflow:

KMer matrix creation: sPYce requires a setup file that stores the file paths for each species. This makes it easy to modify and archive your configuration if you want to test different data versions. We developed the .embed file format to represent hierarchical data easily without requiring any knowledge of JSON or XML. It roughly follows a Pythonic syntax, and hierarchy levels are represented by tab indents. Follow the tutorial for more information.

Create your own KMer matrices based on your data using our command-line interface

spyce-create --setup_file /path/to/setup/file --k k [optional parameters]

where you should replace /path/to/setup/file with the path to your own setup file, k with the desired k-mer length, and [optional parameters] with any additional parameters that you want to add. For a full description of the parameters, run

spyce-create --help

Analysis: Once the snATAC-seq peak matrices from all species have been converted to a KMer matrix, you can analyse and process the data (that is, the cell-by-k-mer histogram) as you wish. Our interface aims to be NumPy-like.

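# standard scientific libraries used below
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# spyce libraries
from spyce.kmerMatrix import KMerCollection
from spyce.plotting import plot_dr, plot_umap
# NOTE: get_nn_mat (used below) also needs to be imported from spyce; see the API documentation for its module path

species_c_vec = ...  # define your species color vector
species_vec = ...  # define your species label vector (values along which a species effect can occur)
species_colors = ...  # define a dictionary mapping each species name to its color (used for the legends)
n_pca_components = ...  # define the number of principal components
n_umap_neighbors, min_dist = ..., ...  # define the UMAP parameters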

# Load kmer collection
kmer_collect = KMerCollection.load("../data/kmer/tutorial/tutorial_kmer_collection_obj.pkl")
# centered unit-sum normalization
kmer_collect.set_normalization("centered_sum") 

# Perform PCA and get explained variance ratio of the first 3 PCs
explained_var_ratio = kmer_collect.reduce_dimensionality(
    algorithm="pca",
    save_name="pca",
    n_pca_components=n_pca_components
)[:3]

# Remove non-linear species effects that can occur due to unequal cell type distribution
kmer_collect.remove_species_effect(
    batch_vec=species_vec,  # indicate along which values a batch/species effect can occur
    dr_key="pca",  # Use the dimensionality reduction saved under the pca key
    save_name="harmony_pca",  # save the Harmony corrected PCs under harmony_pca 
    algorithm="harmony"  # use Harmony batch correction algorithm
)

# plot PCA
fig, ax = plot_dr(
    kmer_collect,
    ck=species_c_vec,
    dr_key="harmony_pca",
    cmap=None,
    randomize=True,
    title="PCA Species (centered-sum)",
)
handles = [
    Line2D(
        [0], [0], marker="o", color="w", 
        markerfacecolor=c, markersize=5, label=s
    )
    for s, c in species_colors.items()
] 
fig.legend(handles=handles, loc=7)
fig.tight_layout()
fig.subplots_adjust(right=0.7)
plt.show()

# compute UMAP
kmer_collect.umap(
    dr_name="harmony_pca",  # use the Harmony-corrected PCs for UMAP
    n_neighbors=n_umap_neighbors, 
    min_dist=min_dist
)

fig, ax = plot_umap(
    kmer_collect,
    ck=species_c_vec,
    randomize=True,
    title="UMAP Species (centered-sum)",
    cmap=None
)
handles = [
    Line2D(
        [0], [0], marker="o", color="w", 
        markerfacecolor=c, markersize=5, label=s
    )
    for s, c in species_colors.items()
] 
fig.legend(handles=handles, loc=7)
fig.tight_layout()
fig.subplots_adjust(right=0.7)
plt.show()

# Calculate nearest neighbor adjacency matrix to avoid re-calculating it for different leiden parameters.
# However, this can also be done directly by running `run_clustering`.
nn_mat = get_nn_mat(
    kmer_collect.dr["harmony_pca"], 
    n_neighbors=15, 
    verbosity=1, 
    return_distance=True
)

# calculate leiden clustering and test several resolution parameters
for resolution in (0.6, 0.4, 0.2, 0.1):
    kmer_collect.run_clustering(
        algorithm="leiden",  # set algorithm
        save_name=f"leiden_{resolution}",  # save result under this name, e.g. leiden_0.6
        resolution=resolution,  # additional leiden parameter - resolution
        leiden_beta=0.,  # additional leiden parameter - beta
        adj_mat=nn_mat  # additional leiden parameter - adjacency matrix
    )

# plot Leiden embedding
fig, ax = plt.subplots(2, 2, figsize=(8, 8))
x_umap = kmer_collect.x_umap["umap"]  # get UMAP coordinates
idc = np.arange(x_umap.shape[0])
np.random.shuffle(idc)  # randomly shuffle indices

# plot one panel per Leiden resolution
for axis, res in zip(ax.reshape(-1), ["0.1", "0.2", "0.4", "0.6"]):
    axis.scatter(
        x_umap[idc, 0],
        x_umap[idc, 1],
        marker=".",
        s=1.,
        cmap="jet",
        c=kmer_collect.clustering[f"leiden_{res}"][idc],  # set Leiden clustering as color vector
    )
    axis.set_title(rf"$\sigma={res}$")

fig.suptitle("Leiden clusters")
fig.tight_layout()
plt.show()

Note that this example is not exhaustive. You can also perform label transfer and TFBS enrichment analysis, but these steps depend more strongly on your research question. Please see our documentation and tutorials for more information.
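
The example above selects the "centered_sum" normalization, which its comment describes as a centered unit-sum normalization. The sketch below shows one plausible reading of that term in plain Python and is an assumption, not sPYce's actual code: each cell's histogram is first scaled to sum to one, and then the mean across cells is subtracted from every k-mer column. The function name `centered_sum_normalize` is hypothetical. Please consult the sPYce documentation for the exact definition.

```python
def centered_sum_normalize(histograms):
    """Scale each row to unit sum, then subtract the per-column mean across rows.

    This is an assumed interpretation of "centered unit-sum", not sPYce's implementation.
    """
    unit_sum = []
    for row in histograms:
        total = sum(row)
        # rows that sum to zero are left as all zeros to avoid dividing by zero
        unit_sum.append([v / total if total else 0.0 for v in row])
    n_rows = len(unit_sum)
    col_means = [sum(col) / n_rows for col in zip(*unit_sum)]
    return [[v - m for v, m in zip(row, col_means)] for row in unit_sum]


# Toy data (illustrative only): 2 cells x 3 k-mers
toy_hists = [[3, 1, 0], [1, 1, 2]]
normalized = centered_sum_normalize(toy_hists)
```

Under this reading, cells with many accessible peaks and cells with few become comparable, and every k-mer column has zero mean before PCA.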

Reference

If you found sPYce helpful for your own work, please consider citing it. A pre-print is coming soon!

Release history

0.1.2

Jan 24, 2026

0.0.5

Nov 2, 2025

0.0.4

Jul 9, 2025

0.0.3

Apr 9, 2025

This version

0.0.2

Apr 9, 2025

Download files

Source Distribution

spyceatac-0.0.2.tar.gz (68.8 kB view details)

Uploaded Apr 9, 2025 Source

Built Distribution

spyceatac-0.0.2-py3-none-any.whl (149.2 kB view details)

Uploaded Apr 9, 2025 Python 3

File details

Details for the file spyceatac-0.0.2.tar.gz.

File metadata

Download URL: spyceatac-0.0.2.tar.gz
Upload date: Apr 9, 2025
Size: 68.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.18

File hashes

Hashes for spyceatac-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`c38dd7ae24f55effedadfcf1bc19e238f3bc2dd22d061467400aac5de329f6c6`
MD5	`cd00d6a746c240fd46530d10738a9da4`
BLAKE2b-256	`98e97fac2939dc5d7fff6699bf822491454c76ab76849f5bfe757bdc7492d19b`

File details

Details for the file spyceatac-0.0.2-py3-none-any.whl.

File metadata

Download URL: spyceatac-0.0.2-py3-none-any.whl
Upload date: Apr 9, 2025
Size: 149.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.18

File hashes

Hashes for spyceatac-0.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e84066ac822175b965de86cd885219e5ba8a8f37042b76d5460967c89ce9f218`
MD5	`bd3a459a20a87ca04d39fad7f4044e79`
BLAKE2b-256	`204678f53c42005cab40ef27e0416c7e2cd19467da06b7400b60b136171728ab`
