JDtI – Python library for scRNAseq/RNAseq data analysis

Author: Jakub Kubiś

Institute of Bioorganic Chemistry
Polish Academy of Sciences
Laboratory of Single Cell Analyses

Description

JDtI (JDataIntegration) is a Python library for data integration and advanced post-processing of single-cell datasets.

JDtI supports basic quality control, such as checking the number of cells per cluster and the number of genes per cell, as well as more advanced tasks like subclustering, integration, and extensive visualization. Cell information is not dropped when sets are analyzed separately; instead, the cluster and cell lineage information from those earlier analyses is used to integrate the data based on cluster markers and data harmonization. After integration, cell interactions and correlations can be visualized in several ways, including cell distances and correlation plots.
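The two basic quality-control measures named above, genes detected per cell and cells per cluster, can be illustrated on a toy count table. This is a minimal sketch in plain Python, not JDtI's implementation; the helper names `genes_per_cell`, `cells_per_cluster`, and `clusters_to_keep`, and the toy data, are hypothetical.

```python
from collections import Counter

def genes_per_cell(counts):
    """Count, for each cell, how many genes have a non-zero count.

    `counts` maps gene -> {cell: count}; a missing cell means zero.
    """
    detected = Counter()
    for per_cell in counts.values():
        for cell, value in per_cell.items():
            if value > 0:
                detected[cell] += 1
    return dict(detected)

def cells_per_cluster(cell_to_cluster):
    """Count how many cells were assigned to each cluster label."""
    return dict(Counter(cell_to_cluster.values()))

def clusters_to_keep(cell_to_cluster, min_cells):
    """Return the cluster labels that contain at least `min_cells` cells."""
    sizes = cells_per_cluster(cell_to_cluster)
    return sorted(label for label, n in sizes.items() if n >= min_cells)

# Toy data (hypothetical): three genes measured in three cells.
toy_counts = {
    "KIT": {"cell_a": 3, "cell_b": 0, "cell_c": 1},
    "PAX3": {"cell_a": 0, "cell_b": 2, "cell_c": 4},
    "EDNRB": {"cell_a": 5, "cell_c": 2},
}
toy_clusters = {"cell_a": "0", "cell_b": "1", "cell_c": "0"}

print(genes_per_cell(toy_counts))
print(cells_per_cluster(toy_clusters))
print(clusters_to_keep(toy_clusters, min_cells=2))
```

Thresholds on these two counts are what the `gene_threshold` and `cluster_threshold` calls in the integration example appear to apply, although their exact behavior is described only in the Documentation.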

JDtI can also perform differential expression (DEG) analysis between sets, selected cells, or grouped cells, and visualize the results on UMAP, volcano plots, and regression plots comparing pairs of cells. This makes it well suited to targeted analyses of specific questions that basic workflows may not reveal.
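To make the quantities behind a volcano plot concrete, here is a minimal sketch of a log2 fold change and the usual significance call. It is an illustration only: the function names are hypothetical, the pseudocount and cutoffs are arbitrary example values, and the log base and statistical test that JDtI uses are not stated in this README.

```python
import math

def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean expression (group_a over group_b).

    The pseudocount avoids division by zero for unexpressed genes.
    """
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))

def volcano_status(lfc, p_value, lfc_cutoff, p_cutoff):
    """Label a gene as 'up', 'down', or 'not significant'."""
    if p_value >= p_cutoff or abs(lfc) < lfc_cutoff:
        return "not significant"
    return "up" if lfc > 0 else "down"

def volcano_coordinates(lfc, p_value):
    """x and y position of a gene on a volcano plot."""
    return lfc, -math.log10(p_value)

# Toy example: one gene in two groups of four cells each.
lfc_toy = log2_fold_change([7, 7, 7, 7], [1, 1, 1, 1])
status_toy = volcano_status(lfc_toy, p_value=0.001, lfc_cutoff=0.25, p_cutoff=0.05)
print(lfc_toy, status_toy)
```

The p-value is taken as given here; in practice it would come from a statistical test and is often adjusted for multiple testing before the cutoff is applied.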

Additionally, JDtI offers many functions for data processing and visualization with clean visual outputs, such as volcano plots, gene expression plots for different data types, clustering, heatmaps, and more.

It is compatible with various sequencing approaches, including scRNA-seq and bulk RNA-seq, and supports interoperability with tools such as Seurat, Scanpy, and other bioinformatics frameworks using the 10x sparse matrix format as input. More details about the available functions can be found in the Documentation and Example Usage section on GitHub.
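The 10x sparse matrix format is conventionally a directory holding `matrix.mtx`, `barcodes.tsv`, and `features.tsv` (`genes.tsv` in older outputs), where the matrix is stored in Matrix Market coordinate format with genes as rows and cells as columns. The sketch below parses that coordinate format from a string using only the standard library. It shows the file layout, not how `load_sparse` reads it; `parse_matrix_market` and `to_dense` are hypothetical names.

```python
def parse_matrix_market(text):
    """Parse Matrix Market coordinate text.

    Returns (n_rows, n_cols, entries), where entries maps a zero-based
    (row, col) pair to the stored value.
    """
    # Drop blank lines and header/comment lines, which start with '%'.
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("%")]
    n_rows, n_cols, n_entries = (int(x) for x in lines[0].split())
    entries = {}
    for ln in lines[1:]:
        row, col, value = ln.split()
        # The file uses one-based indices; convert to zero-based.
        entries[(int(row) - 1, int(col) - 1)] = float(value)
    if len(entries) != n_entries:
        raise ValueError("number of entries does not match the size line")
    return n_rows, n_cols, entries

def to_dense(n_rows, n_cols, entries):
    """Expand the sparse entries into a dense list of rows."""
    return [[entries.get((r, c), 0.0) for c in range(n_cols)] for r in range(n_rows)]

# Toy file content: 3 genes x 2 cells with 3 non-zero counts.
example = """%%MatrixMarket matrix coordinate integer general
%
3 2 3
1 1 4
3 1 2
2 2 7
"""

n_rows, n_cols, entries = parse_matrix_market(example)
dense = to_dense(n_rows, n_cols, entries)
print(dense)
```

Real 10x matrices are large, so a sparse representation is normally kept rather than expanded to a dense table as done here for display.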

Table of contents

Installation
Documentation
Example usage:
1. Basic functions
2. Data clustering
3. Data integration
4. Data subclustering

Installation

pip install jdti

Documentation

Documentation for classes and functions is available in the project Documentation 📄

Example usage

1. Basic functions

1.1 Loading functions

from jdti import *
import pandas as pd  # used below to read tabular files

1.2 Loading data

# load a sparse matrix as a pd.DataFrame and create the matching metadata
data, metadata = load_sparse(path = 'data/set1', name = 'set1')

# alternatively, load a data frame from another file type (.csv, .txt, .tsv)
data = pd.read_csv('example_data.csv')

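# check which of the requested features (genes) are present in the data;
# 'MC1' is not the symbol of the melanocortin 1 receptor gene ('MC1R'),
# so it is presumably reported as not found
fl = find_features(data, features =['KIT', 'MC1', 'EDNRB', 'PAX3'])

# the same query with the corrected symbol 'MC1R'
fl2 = find_features(data, features =['KIT', 'MC1R', 'EDNRB', 'PAX3'])

# check which names are present in the data; '1&' is presumably not a valid name
nam = find_names(data, names = ['0', '1', '2','10', '1&'])

# keep only the features and names that were found
data_reduced = reduce_data(data,
                features = fl2['included'],
                names = nam['included'])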

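# NOTE: compare_dict is not defined in this example and must be created beforehand;
# see the Documentation for the format expected by the entities argument

# DEG analysis on the reduced data, without a metadata list
DEG = calc_DEG(data_reduced, 
             metadata_list  = None, 
             entities = compare_dict, 
             sets = None, 
             min_exp = 0, 
             min_pct = 0.1, 
             n_proc = 10)

# DEG analysis on the full data, using the set labels from the metadata
DEG2 = calc_DEG(data, 
             metadata_list = metadata['sets'], 
             entities = compare_dict, 
             sets = None, 
             min_exp = 0, 
             min_pct = 0.1, 
             n_proc = 10)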


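# NOTE: DEG3 is not defined in this example; it presumably stands for
# a result table returned by calc_DEG
fig = volcano_plot(DEG3, 
                 p_adj = True, 
                 top = 25, 
                 p_val = 0.05, 
                 lfc = 0.25, 
                 standard_scale = False, 
                 rescale_adj = True, 
                 image_width = 12, 
                 image_high = 12)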


DEG3_10 = DEG3.sort_values(['p_val', 'esm', 'log(FC)'], ascending=[True, False, False]).head(10)

data_reduced = reduce_data(data,
                features = list(set(DEG3_10['feature'])),
                names = nam['included'])

avg = average(data_reduced)
occ = occurrence(data_reduced)

fig = features_scatter(expression_data = avg, 
                     occurence_data = occ,
                     features = None, 
                     metadata_list = None, 
                     colors = 'viridis', 
                     hclust = 'complete', 
                     img_width = 8, 
                     img_high = 5, 
                     label_size = 10, 
                     size_scale = 100,
                     x_lab = 'Genes', 
                     legend_lab = 'log(CPM + 1)',
                     bbox_to_anchor_scale = 25,
                     bbox_to_anchor_perc=(0.91, 0.55),
                     bbox_to_anchor_group=(1.01, 0.4))

fig = development_clust(data = avg, 
                      method = 'ward',
                      img_width = 5,
                      img_high = 5)
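The `average` and `occurrence` calls above feed a dot-plot style figure. As a rough illustration of what such summaries usually contain, the sketch below assumes that `average` reports the mean expression per group and `occurrence` the fraction of cells with non-zero expression. That reading is inferred from the function names and is not confirmed by this README; the helper names are hypothetical.

```python
from collections import defaultdict

def average_by_group(values, labels):
    """Mean of `values` within each group label."""
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for value, label in zip(values, labels):
        totals[label] += value
        sizes[label] += 1
    return {label: totals[label] / sizes[label] for label in totals}

def occurrence_by_group(values, labels):
    """Fraction of cells in each group with a value above zero."""
    expressing = defaultdict(int)
    sizes = defaultdict(int)
    for value, label in zip(values, labels):
        sizes[label] += 1
        if value > 0:
            expressing[label] += 1
    return {label: expressing[label] / sizes[label] for label in sizes}

# Toy expression of one gene in six cells from two clusters.
expression = [0, 2, 4, 0, 0, 3]
clusters = ["0", "0", "0", "1", "1", "1"]

avg_toy = average_by_group(expression, clusters)
occ_toy = occurrence_by_group(expression, clusters)
print(avg_toy, occ_toy)
```

In a dot plot, the first summary typically sets the color of each point and the second its size.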

2. Data clustering

from jdti import Clustering, load_sparse

data, metadata = load_sparse(path = 'data/set2', name = 'set2')
clusters = Clustering.add_data_frame(data, metadata)

clusters.clustering_data
clusters.clustering_metadata

clusters.perform_PCA(pc_num=100, width=8, height=6)

clusters.knee_plot_PCA(width=8, height=6)

clusters.harmonize_sets(harmonize_type='harmony')

clusters.find_clusters_PCA(pc_num=0, eps=0.5, min_samples=10, width=8, height=6, harmonized=False)

clusters.perform_UMAP(factorize=False, umap_num=0, pc_num=5, harmonized=False)


clusters.knee_plot_umap(eps=0.5, min_samples=10)

clusters.find_clusters_UMAP(umap_n=5, eps=0.5, min_samples=10, width=8, height=6)


clusters.UMAP_vis(names_slot='cell_names', set_sep=True, point_size=0.6)

clusters.UMAP_feature(feature_name = 'KIT', features_data=None, point_size=0.6)

clusters.get_umap_data()

clusters.get_pca_data()

clusters.return_clusters(clusters='umap')
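The `eps` and `min_samples` arguments of `find_clusters_PCA` and `find_clusters_UMAP` match the usual parameters of density-based clustering (DBSCAN). Whether JDtI uses DBSCAN internally is an assumption here, not something this README states. Under that assumption, the following plain-Python sketch shows how the two parameters shape the result; `dbscan` is a hypothetical teaching implementation, not JDtI code.

```python
import math

def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise.

    A point is a core point when at least `min_samples` points
    (itself included) lie within distance `eps` of it.
    """
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise joins as a border point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep growing
    return labels

# Two tight groups of three points and one isolated point.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
toy_labels = dbscan(toy_points, eps=1.5, min_samples=3)
print(toy_labels)
```

A larger `eps` merges nearby groups, while a larger `min_samples` turns small groups into noise (label -1).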

3. Data integration


from jdti import COMPsc, volcano_plot

jseq_object = COMPsc.project_dir('data', ['set1', 'set2'])

jseq_object.load_sparse_from_projects(normalized_data=True)

dt = jseq_object.get_partial_data(names=['10'], features=['KIT', 'PAX3', 'MITF'], name_slot='cell_names')

jseq_object.gene_histograme(bins=100)

jseq_object.gene_threshold(min_n = 50, max_n = 3000)

jseq_object.gene_histograme(bins=100)

jseq_object.reduce(reg = '5', inc_set = False)

jseq_object.gene_histograme(bins=100)

jseq_object.cell_histograme(name_slot = 'cell_names')

jseq_object.cluster_threshold(min_n = 20, name_slot = 'cell_names')

jseq_object.cell_histograme(name_slot = 'cell_names')

# returned data

met = jseq_object.input_metadata

data = jseq_object.get_data(set_info=True) 

metadata = jseq_object.get_metadata()

jseq_object.calculate_difference_markers(min_exp = 0, 
                                         min_pct = 0.25, 
                                         n_proc=10, 
                                         force = False)



jseq_object.estimating_similarity(method = 'pearson', 
                                  p_val = 0.05,
                                  top_n = 10)
    

pl = jseq_object.similarity_plot(split_sets = True, 
                                 set_info = True,
                                 cmap='seismic', 
                                 width = 16, height = 14)

   
# pl.savefig(f'sim_plot_top_{top}.svg', dpi=300, bbox_inches='tight')

pl2 = jseq_object.spatial_similarity(set_info = True,
                                     bandwidth = 1,
                                     n_neighbors = 5,
                                     min_dist = 0.1,
                                     legend_split = 2,
                                     point_size = 20,
                                     spread = 1.0,
                                     set_op_mix_ratio = 1.0,
                                     local_connectivity = 1,
                                     repulsion_strength = 1.0,
                                     negative_sample_rate = 5,
                                     width = 12,
                                     height = 10)

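top = 10  # not defined earlier in this example; set here to the top_n value passed to estimating_similarity
pl2.savefig(f'sim_plot_map_top_{top}.svg', dpi=300, bbox_inches='tight')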

sim_data = jseq_object.similarity
sim_data = sim_data[sim_data['set1'] != sim_data['set2']]

jseq_object.cell_regression( cell_x = '2', cell_y = '6', set_x = 'set1', set_y = 'set2', threshold = 6, image_width = 12, image_high = 7, color = 'black')

jseq_object.clustering_features(name_slot = 'cell_names', features_list = None, p_val = 0.05, top_n = 10, adj_mean = False, beta = 0.2)

jseq_object.perform_PCA(pc_num = 50)

jseq_object.knee_plot_PCA()

jseq_object.harmonize_sets(harmonize_type = 'harmony')

# jseq_object.find_clusters_PCA(pc_num = 100, eps = 0.5, min_samples = 10)

jseq_object.perform_UMAP(factorize=False, umap_num = 2, pc_num = 10, harmonized = True)


# jseq_object.knee_plot_umap(eps = 0.5, min_samples = 10)


# jseq_object.find_clusters_UMAP(umap_n = 6, eps = 1, min_samples = 20)


plu = jseq_object.UMAP_vis( 
             names_slot = 'cell_names', 
             set_sep = True,
             point_size = 1,
             font_size = 6,
             legend_split_col = 2,
             width = 8,
             height = 6,
             inc_num = True)

# plu.savefig(f'sim_umap_top.svg', dpi=300, bbox_inches='tight')


plu = jseq_object.UMAP_vis( 
             names_slot = 'sets', 
             set_sep = True,
             point_size = 1,
             font_size = 6,
             legend_split_col = 1,
             width = 8,
             height = 6,
             inc_num = False)

# plu.savefig(f'sim_umap_sets_top_.svg', dpi=300, bbox_inches='tight')

vis = jseq_object.UMAP_feature( 
             features_data = jseq_object.get_data(set_info = False) ,
             feature_name = 'MAP1B',
             point_size = 0.6,
             font_size = 6,
             width = 8,
             height = 6,
             palette = 'light')

# vis.savefig(f'sim_umap_sets_top_vis.svg', dpi=300, bbox_inches='tight')

jseq_object.var_data


# jseq_object.save_project(name = 'topola')

stats = jseq_object.statistic(cells=None, sets='All', min_exp=0, min_pct=0.025, n_proc=10)
stats_5 = stats.sort_values(['valid_group', 'esm', 'log(FC)'], ascending=[True, False, False]).groupby('valid_group').head(5)



fig = volcano_plot(stats)

jseq_object.scatter_plot(
                 names = None,
                 features = list(set(stats_5['feature'])),
                 name_slot = 'cell_names',
                 scale = False,
                 colors = 'viridis', 
                 hclust = 'complete', 
                 img_width  = 15, 
                 img_high  = 3, 
                 label_size = 10, 
                 size_scale = 200,
                 x_lab = 'Genes', 
                 legend_lab = 'log(CPM + 1)',
                 set_box_size = 5,
                 set_box_high = 0.1,
                 bbox_to_anchor_scale = 25,
                 bbox_to_anchor_perc=(0.90, 0.5),
                 bbox_to_anchor_group=(0.9, 0.3))

import re

jseq_object.data_composition( 
                     features_count = list(set([re.sub(r' .*$', '',x) for x in list(set(jseq_object.input_metadata['cell_names']))])),
                     name_slot = 'cell_names',
                     set_sep = True
                     )


jseq_object.composition_pie( 
                    width = 6, 
                    height = 6, 
                    font_size = 15,
                    cmap  = "tab20",
                    legend_split_col = 1,
                    offset_labels = 0.5,
                    legend_bbox = (1.15, 0.95))


jseq_object.bar_composition( 
                    cmap = 'tab20b', 
                    width = 2, 
                    height = 6, 
                    font_size = 15,
                    legend_split_col = 1,
                    legend_bbox = (1.3, 1))
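The `estimating_similarity(method = 'pearson', ...)` step above compares clusters across sets. As a minimal sketch of the underlying idea, the code below computes the Pearson correlation between marker expression profiles and keeps only pairs that come from different sets, mirroring the `sim_data['set1'] != sim_data['set2']` filter in the example. The profile layout and the function names are assumptions made for illustration; JDtI's own calculation may differ.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def cross_set_similarity(profiles):
    """Correlate every pair of clusters that belong to different sets.

    `profiles` maps (set_name, cluster_name) -> list of marker values,
    with the markers in the same order for every cluster.
    """
    rows = []
    for key_a, key_b in combinations(sorted(profiles), 2):
        if key_a[0] != key_b[0]:  # skip pairs from the same set
            rows.append((key_a, key_b, pearson(profiles[key_a], profiles[key_b])))
    return rows

# Toy marker profiles (hypothetical values) for three clusters.
toy_profiles = {
    ("set1", "2"): [1, 2, 3, 4],
    ("set2", "6"): [2, 4, 6, 8],
    ("set2", "7"): [4, 3, 2, 1],
}

for key_a, key_b, r in cross_set_similarity(toy_profiles):
    print(key_a, key_b, round(r, 3))
```

Here cluster 2 of set1 correlates positively with cluster 6 of set2 and negatively with cluster 7, so only the first pair would be a candidate match.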

4. Data subclustering

from jdti import COMPsc

jseq_object = COMPsc.project_dir('data', ['set2'])

jseq_object.load_sparse_from_projects(normalized_data=True)

jseq_object.subcluster_prepare(features = ['HMGCS1', 'MAP1B', 'SOX4'], 
                               cluster='10')

jseq_object.define_subclusters( 
                          umap_num = 5,
                          eps = 1, 
                          min_samples = 5,
                          n_neighbors = 5,  
                          min_dist = 0.1, 
                          spread = 1.0,              
                          set_op_mix_ratio = 1.0,    
                          local_connectivity = 1,    
                          repulsion_strength = 1.0,  
                          negative_sample_rate = 5,  
                          width = 8, 
                          height = 6)

jseq_object.subcluster_features_scatter(
                                        colors = 'viridis', 
                                        hclust = 'complete', 
                                        img_width = 3, 
                                        img_high = 5, 
                                        label_size = 6, 
                                        size_scale = 70,
                                        x_lab = 'Genes', 
                                        legend_lab = 'normalized')

mapping = {
    "old_name": ["-1", "1", "4"],
    "new_name": ["1", "1", "1"]
}

jseq_object.rename_subclusters(mapping)

jseq_object.subcluster_DEG_scatter(
                                    top_n = 3,
                                    min_exp = 0, 
                                    min_pct = 0.1, 
                                    p_val = 0.05,
                                    colors = 'viridis', 
                                    hclust = 'complete', 
                                    img_width = 3, 
                                    img_high = 5, 
                                    label_size = 6, 
                                    size_scale = 70,
                                    x_lab = 'Genes', 
                                    legend_lab = 'normalized',
                                    n_proc=10)

jseq_object.accept_subclusters()

l = set(jseq_object.input_metadata['cell_names'])
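The `mapping` passed to `rename_subclusters` above pairs each old subcluster name with a new one by position. The sketch below shows the effect such a mapping would have on a list of labels, assuming a simple one-to-one lookup; how JDtI applies it internally is not shown in this README, and `apply_rename` is a hypothetical helper.

```python
def apply_rename(labels, mapping):
    """Replace labels listed in mapping['old_name'] with mapping['new_name'].

    Labels that are not listed are left unchanged.
    """
    old_names = mapping["old_name"]
    new_names = mapping["new_name"]
    if len(old_names) != len(new_names):
        raise ValueError("old_name and new_name must have the same length")
    lookup = dict(zip(old_names, new_names))
    return [lookup.get(label, label) for label in labels]

# Same mapping as in the example above: merge '-1' and '4' into '1'.
toy_mapping = {
    "old_name": ["-1", "1", "4"],
    "new_name": ["1", "1", "1"],
}
toy_subclusters = ["-1", "0", "1", "4", "2"]

renamed = apply_rename(toy_subclusters, toy_mapping)
print(renamed)
```

Mapping several old names to the same new name merges those subclusters, which is useful when small or unassigned groups belong with a larger one.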

Have fun JBS

Release history

Version	Release date
0.2.7	Mar 31, 2026
0.2.6	Feb 17, 2026
0.2.5	Feb 13, 2026
0.2.4	Feb 12, 2026
0.2.3	Feb 10, 2026
0.2.2	Jan 29, 2026
0.2.1	Dec 17, 2025
0.2.0	Dec 16, 2025
0.1.9	Dec 16, 2025
0.1.8	Dec 16, 2025
0.1.7	Dec 8, 2025
0.1.6	Oct 16, 2025
0.1.5	Oct 16, 2025
0.1.4	Oct 15, 2025
0.1.3	Oct 14, 2025
0.1.2	Oct 14, 2025
0.1.1	Oct 14, 2025
0.1.0	Oct 10, 2025 (this version)

Download files

Source Distribution

jdti-0.1.0.tar.gz (45.1 kB)

Uploaded Oct 10, 2025 Source

Built Distribution

jdti-0.1.0-py3-none-any.whl (45.8 kB)

Uploaded Oct 10, 2025 Python 3

File details

Details for the file jdti-0.1.0.tar.gz.

File metadata

Download URL: jdti-0.1.0.tar.gz
Upload date: Oct 10, 2025
Size: 45.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.10.10

File hashes

Hashes for jdti-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`8c1d3bfae8cf81df9c4fb575df585cc1f2ffd4a428f9469e3ba27e5eb92d8b90`
MD5	`114dfcd70868d4442bd1b249bcbbc503`
BLAKE2b-256	`dfe2514fbdaf47be2317aeb6a82bf6afb65eae87629597d1bff3149575e857ec`


File details

Details for the file jdti-0.1.0-py3-none-any.whl.

File metadata

Download URL: jdti-0.1.0-py3-none-any.whl
Upload date: Oct 10, 2025
Size: 45.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.10.10

File hashes

Hashes for jdti-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4570d8ea59df817b8b2688d852a5aa428489b0dafdae834a9d9f5f9a5737b0f0`
MD5	`49e31287379c8523b9af920b053bf73f`
BLAKE2b-256	`161ac4038b30f3e78ffa07177b18fc8b9be56ece4c2d6795a297e63298c37a0a`
