Project description
JDtI – Python library for scRNAseq/RNAseq data analysis
Author: Jakub Kubiś
Polish Academy of Sciences
Laboratory of Single Cell Analyses
Description
JDtI supports basic quality control steps, such as controlling the number of cells per cluster and the number of genes per cell, as well as more advanced tasks such as subclustering, integration, and a wide range of visualizations. In this approach, cell information is not dropped when sets are analyzed separately; instead, the cell lineage information from previous clustering is used to integrate data based on cluster markers and data harmonization. After integration, cell interactions and relationships can be visualized in several ways, including cell distances and correlations.
JDtI can also perform differential expression (DEG) analysis between sets, selected cells, or grouped cells, and visualize the results on UMAP, volcano plots, and regression plots comparing pairs of cells. This is useful for more advanced analyses that focus on specific issues in the data that basic analyses may miss.
Additionally, JDtI offers many functions for data processing and visualization with clean visual outputs, such as volcano plots, gene expression analysis of different data types, clustering, heatmaps, and more.
It is compatible with various sequencing approaches, including scRNA-seq and bulk RNA-seq, and supports interoperability with tools such as Seurat, Scanpy, and other bioinformatics frameworks using the 10x sparse matrix format as input. More details about the available functions can be found in the Documentation and Example Usage section on GitHub.
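For orientation, a 10x-style sparse matrix directory usually holds a Matrix Market file (`matrix.mtx`) with genes as rows and cell barcodes as columns, plus `features.tsv` (or `genes.tsv`) and `barcodes.tsv`. The sketch below is not how JDtI reads these files; it only illustrates the coordinate layout using the standard library. The helper `parse_mtx` and the toy data are hypothetical.

```python
import io


def parse_mtx(handle):
    """Parse a Matrix Market coordinate file into (n_rows, n_cols, entries).

    `entries` maps (row, col) -> value using 0-based indices.
    """
    header_seen = False
    entries = {}
    n_rows = n_cols = 0
    for line in handle:
        line = line.strip()
        if not line or line.startswith('%'):
            continue  # skip the banner and comment lines
        parts = line.split()
        if not header_seen:
            # first data line: number of rows, columns, and non-zero entries
            n_rows, n_cols, _nnz = (int(p) for p in parts)
            header_seen = True
            continue
        row, col, value = int(parts[0]), int(parts[1]), float(parts[2])
        entries[(row - 1, col - 1)] = value  # indices in the file are 1-based
    return n_rows, n_cols, entries


# Toy stand-in for matrix.mtx: 3 genes x 2 cells, 3 non-zero counts
toy_mtx = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    "3 2 3\n"
    "1 1 5\n"
    "3 1 2\n"
    "2 2 7\n"
)
genes = ['KIT', 'PAX3', 'MITF']    # rows, as listed in features.tsv
barcodes = ['cell_a', 'cell_b']    # columns, as listed in barcodes.tsv

n_rows, n_cols, entries = parse_mtx(toy_mtx)
# Expand to a dense genes x cells table for inspection
dense = [[entries.get((r, c), 0.0) for c in range(n_cols)] for r in range(n_rows)]
```

Only non-zero counts are stored, which is why this layout suits single-cell data where most gene/cell pairs are zero.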
Table of contents
Example usage:
1. Basic functions
2. Data clustering
3. Data integration
4. Data subclustering
Installation
pip install jdti
Documentation
Documentation for classes and functions is available here 👉 Documentation 📄
Example usage
1. Basic functions
1. Loading functions
from jdti import *
2. Loading data
# load a sparse matrix as a pd.DataFrame and create the accompanying metadata
data, metadata = load_sparse(path = 'data/set1', name = 'set1')
# load a data frame from another file type (.csv, .tsv, .txt)
import pandas as pd
data = pd.read_csv('example_data.csv')
fl = find_features(data, features =['KIT', 'MC1', 'EDNRB', 'PAX3'])
fl2 = find_features(data, features =['KIT', 'MC1R', 'EDNRB', 'PAX3'])
nam = find_names(data, names = ['0', '1', '2','10', '1&'])
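The two `find_features` calls above differ only in `'MC1'` versus `'MC1R'`, and `find_names` is given `'1&'`, which suggests that requested entries are split into those found in the data and those that are not. The sketch below mimics that idea with a hypothetical helper, `split_requested`; only the `'included'` key appears in the calls that follow, so the `'not_included'` key is an assumption.

```python
def split_requested(available, requested):
    """Split requested labels into those present in `available` and those missing."""
    available_set = set(available)
    return {
        'included': [x for x in requested if x in available_set],
        'not_included': [x for x in requested if x not in available_set],
    }


# Toy gene list standing in for the features of a loaded data set
toy_genes = ['KIT', 'MC1R', 'EDNRB', 'PAX3', 'MITF']

res = split_requested(toy_genes, ['KIT', 'MC1', 'EDNRB', 'PAX3'])
print(res['included'])      # ['KIT', 'EDNRB', 'PAX3']
print(res['not_included'])  # ['MC1']
```

Checking the unmatched entries before reducing the data helps catch misspelled gene symbols early.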
data_reduced = reduce_data(
    data,
    features = fl2['included'],
    names = nam['included'])
DEG = calc_DEG(
    data_reduced,
    metadata_list = None,
    entities = compare_dict,
    sets = None,
    min_exp = 0,
    min_pct = 0.1,
    n_proc = 10)
DEG2 = calc_DEG(
    data,
    metadata_list = metadata['sets'],
    entities = compare_dict,
    sets = None,
    min_exp = 0,
    min_pct = 0.1,
    n_proc = 10)
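The `min_exp` and `min_pct` arguments above presumably control which features enter the comparison: an expression cut-off and a minimum fraction of expressing cells. The sketch below shows one common way such quantities are computed; it is an assumption for illustration, not JDtI's actual statistics, and the helper names and pseudocount are hypothetical.

```python
import math


def pct_expressing(values, min_exp=0):
    """Fraction of cells whose expression is strictly above min_exp."""
    return sum(v > min_exp for v in values) / len(values)


def log2_fold_change(group, rest, pseudocount=1.0):
    """log2 ratio of mean expression; the pseudocount avoids division by zero."""
    mean_group = sum(group) / len(group)
    mean_rest = sum(rest) / len(rest)
    return math.log2((mean_group + pseudocount) / (mean_rest + pseudocount))


group = [3.0, 0.0, 5.0, 4.0]   # one gene in the cells of interest (toy values)
rest = [0.0, 0.0, 1.0, 0.0]    # the same gene in the remaining cells

pct = pct_expressing(group, min_exp=0)   # 3 of 4 cells express the gene
keep = pct >= 0.1                        # analogous to min_pct = 0.1
lfc = log2_fold_change(group, rest)
```

Filtering on the expressing fraction before testing reduces the number of comparisons and removes features that are detected in only a handful of cells.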
fig = volcano_plot(
    DEG3,
    p_adj = True,
    top = 25,
    p_val = 0.05,
    lfc = 0.25,
    standard_scale = False,
    rescale_adj = True,
    image_width = 12,
    image_high = 12)
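On a volcano plot, thresholds such as `p_val = 0.05` and `lfc = 0.25` are typically used to separate up-regulated, down-regulated, and non-significant features. The sketch below shows that classification in its usual form; how `volcano_plot` applies its arguments internally is not documented here, so `classify` and `volcano_y` are hypothetical helpers.

```python
import math


def classify(log_fc, p_value, p_val=0.05, lfc=0.25):
    """Label a feature by significance and direction of change."""
    if p_value < p_val and log_fc >= lfc:
        return 'up'
    if p_value < p_val and log_fc <= -lfc:
        return 'down'
    return 'ns'  # not significant or effect too small


def volcano_y(p_value):
    """Vertical coordinate conventionally used on volcano plots."""
    return -math.log10(p_value)


# Toy results: (feature, log fold change, p-value)
toy = [('KIT', 1.2, 0.001), ('PAX3', -0.8, 0.01), ('MITF', 0.1, 0.001), ('SOX4', 2.0, 0.2)]
labels = {name: classify(fc, p) for name, fc, p in toy}
```

A feature needs both a small p-value and a large enough fold change to be highlighted, which is why `MITF` and `SOX4` in the toy data remain unlabelled.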
DEG3_10 = DEG3.sort_values(['p_val', 'esm', 'log(FC)'], ascending=[True, False, False]).head(10)
data_reduced = reduce_data(
    data,
    features = list(set(DEG3_10['feature'])),
    names = nam['included'])
avg = average(data_reduced)
occ = occurrence(data_reduced)
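Judging by their names and by how they feed the scatter plot that follows, `average` and `occurrence` presumably return the mean expression per group and the fraction of cells in which each feature is detected. The sketch below computes both for a single feature; it is an assumption about the general idea, and `group_stats` is a hypothetical helper.

```python
from collections import defaultdict


def group_stats(expression, labels):
    """Return (mean expression, fraction of detecting cells) per label."""
    grouped = defaultdict(list)
    for value, label in zip(expression, labels):
        grouped[label].append(value)
    means = {k: sum(v) / len(v) for k, v in grouped.items()}
    detected = {k: sum(x > 0 for x in v) / len(v) for k, v in grouped.items()}
    return means, detected


# Toy expression of one gene across six cells in two clusters
kit = [2.0, 0.0, 4.0, 0.0, 0.0, 1.0]
clusters = ['0', '0', '0', '1', '1', '1']

means, detected = group_stats(kit, clusters)
```

In a dot plot, the mean usually drives the colour and the detected fraction drives the dot size, so the two tables carry complementary information.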
fig = features_scatter(
    expression_data = avg,
    occurence_data = occ,
    features = None,
    metadata_list = None,
    colors = 'viridis',
    hclust = 'complete',
    img_width = 8,
    img_high = 5,
    label_size = 10,
    size_scale = 100,
    x_lab = 'Genes',
    legend_lab = 'log(CPM + 1)',
    bbox_to_anchor_scale = 25,
    bbox_to_anchor_perc=(0.91, 0.55),
    bbox_to_anchor_group=(1.01, 0.4))
fig = development_clust(
    data = avg,
    method = 'ward',
    img_width = 5,
    img_high = 5)
2. Data clustering
from jdti import Clustering, load_sparse
data, metadata = load_sparse(path = 'data/set2', name = 'set2')
clusters = Clustering.add_data_frame(data, metadata)
clusters.clustering_data
clusters.clustering_metadata
clusters.perform_PCA(pc_num=100, width=8, height=6)
clusters.knee_plot_PCA(width=8, height=6)
clusters.harmonize_sets(harmonize_type='harmony')
clusters.find_clusters_PCA(pc_num=0, eps=0.5, min_samples=10, width=8, height=6, harmonized=False)
clusters.perform_UMAP(factorize=False, umap_num=0, pc_num=5, harmonized=False)
clusters.knee_plot_umap(eps=0.5, min_samples=10)
clusters.find_clusters_UMAP(umap_n=5, eps=0.5, min_samples=10, width=8, height=6)
clusters.UMAP_vis(names_slot='cell_names', set_sep=True, point_size=0.6)
clusters.UMAP_feature(feature_name = 'KIT', features_data=None, point_size=0.6)
clusters.get_umap_data()
clusters.get_pca_data()
clusters.return_clusters(clusters='umap')
3. Data integration
from jdti import COMPsc, volcano_plot
jseq_object = COMPsc.project_dir('data', ['set1', 'set2'])
jseq_object.load_sparse_from_projects(normalized_data=True)
dt = jseq_object.get_partial_data(names=['10'], features=['KIT', 'PAX3', 'MITF'], name_slot='cell_names')
jseq_object.gene_histograme(bins=100)
jseq_object.gene_threshold(min_n = 50, max_n = 3000)
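The `gene_threshold` call above presumably keeps cells whose number of detected genes falls between `min_n` and `max_n`. The sketch below shows that kind of filter on a toy structure; the inclusive bounds and the helper names are assumptions, not JDtI's implementation.

```python
def genes_per_cell(matrix):
    """matrix: dict cell -> dict gene -> count. Returns cell -> number of genes with count > 0."""
    return {cell: sum(c > 0 for c in genes.values()) for cell, genes in matrix.items()}


def filter_cells(matrix, min_n, max_n):
    """Keep cells whose detected-gene count lies within [min_n, max_n]."""
    counts = genes_per_cell(matrix)
    return {cell: genes for cell, genes in matrix.items() if min_n <= counts[cell] <= max_n}


toy = {
    'cell_a': {'KIT': 3, 'PAX3': 1, 'MITF': 2},   # 3 detected genes
    'cell_b': {'KIT': 0, 'PAX3': 0, 'MITF': 1},   # 1 detected gene
    'cell_c': {'KIT': 5, 'PAX3': 0, 'MITF': 4},   # 2 detected genes
}

kept = filter_cells(toy, min_n=2, max_n=3)
```

Plotting the histogram before and after the threshold, as the surrounding calls do, makes it easy to see how many cells the cut-offs remove.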
jseq_object.gene_histograme(bins=100)
jseq_object.reduce(reg = '5', inc_set = False)
jseq_object.gene_histograme(bins=100)
jseq_object.cell_histograme(name_slot = 'cell_names')
jseq_object.cluster_threshold(min_n = 20, name_slot = 'cell_names')
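The `cluster_threshold` call with `min_n = 20` presumably removes clusters that contain too few cells. The sketch below illustrates the idea with a smaller threshold on toy labels; `drop_small_clusters` is a hypothetical helper and the exact rule used by JDtI may differ.

```python
from collections import Counter


def drop_small_clusters(labels, min_n):
    """Return a keep-mask over cells and the size of every cluster."""
    sizes = Counter(labels)
    keep_mask = [sizes[label] >= min_n for label in labels]
    return keep_mask, sizes


# Toy cluster labels: cluster '0' has 5 cells, '1' has 2, '2' has 3
labels = ['0'] * 5 + ['1'] * 2 + ['2'] * 3

mask, sizes = drop_small_clusters(labels, min_n=3)
kept_labels = [label for label, keep in zip(labels, mask) if keep]
```

Very small clusters give unstable marker statistics, so removing them before marker calculation is a common precaution.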
jseq_object.cell_histograme(name_slot = 'cell_names')
# returned objects
met = jseq_object.input_metadata
data = jseq_object.get_data(set_info=True)
metadata = jseq_object.get_metadata()
jseq_object.calculate_difference_markers(
    min_exp = 0,
    min_pct = 0.25,
    n_proc=10,
    force = False)
jseq_object.estimating_similarity(
    method = 'pearson',
    p_val = 0.05,
    top_n = 10)
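With `method = 'pearson'`, similarity between clusters is presumably measured as the Pearson correlation of their expression profiles over selected marker features. The sketch below computes that coefficient for toy profiles; how JDtI selects markers (`p_val`, `top_n`) and builds the profiles is not shown here, so the inputs are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy mean expression of four marker genes in three clusters
cluster_a = [1.0, 2.0, 3.0, 4.0]
cluster_b = [2.0, 4.0, 6.0, 8.0]   # same pattern as cluster_a, scaled
cluster_c = [4.0, 3.0, 2.0, 1.0]   # opposite pattern

r_ab = pearson(cluster_a, cluster_b)
r_ac = pearson(cluster_a, cluster_c)
```

Because the coefficient ignores scale, two clusters with the same marker pattern score close to 1 even when their overall expression levels differ.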
pl = jseq_object.similarity_plot(
    split_sets = True,
    set_info = True,
    cmap='seismic',
    width = 16, height = 14)
# pl.savefig(f'sim_plot_top_{top}.svg', dpi=300, bbox_inches='tight')
pl2 = jseq_object.spatial_similarity(
    set_info= True,
    bandwidth = 1,
    n_neighbors = 5,
    min_dist = 0.1,
    legend_split = 2,
    point_size = 20,
    spread=1.0,
    set_op_mix_ratio=1.0,
    local_connectivity=1,
    repulsion_strength=1.0,
    negative_sample_rate=5,
    width = 12,
    height = 10)
# pl2.savefig(f'sim_plot_map_top_{top}.svg', dpi=300, bbox_inches='tight')
sim_data = jseq_object.similarity
sim_data = sim_data[sim_data['set1'] != sim_data['set2']]
jseq_object.cell_regression( cell_x = '2', cell_y = '6', set_x = 'set1', set_y = 'set2', threshold = 6, image_width = 12, image_high = 7, color = 'black')
jseq_object.clustering_features(name_slot = 'cell_names', features_list = None, p_val = 0.05, top_n = 10, adj_mean = False, beta = 0.2)
jseq_object.perform_PCA(pc_num = 50)
jseq_object.knee_plot_PCA()
jseq_object.harmonize_sets(harmonize_type = 'harmony')
# jseq_object.find_clusters_PCA(pc_num = 100, eps = 0.5, min_samples = 10)
jseq_object.perform_UMAP(factorize=False, umap_num = 2, pc_num = 10, harmonized = True)
# jseq_object.knee_plot_umap(eps = 0.5, min_samples = 10)
# jseq_object.find_clusters_UMAP(umap_n = 6, eps = 1, min_samples = 20)
plu = jseq_object.UMAP_vis(
    names_slot = 'cell_names',
    set_sep = True,
    point_size = 1,
    font_size = 6,
    legend_split_col = 2,
    width = 8,
    height = 6,
    inc_num = True)
# plu.savefig(f'sim_umap_top.svg', dpi=300, bbox_inches='tight')
plu = jseq_object.UMAP_vis(
    names_slot = 'sets',
    set_sep = True,
    point_size = 1,
    font_size = 6,
    legend_split_col = 1,
    width = 8,
    height = 6,
    inc_num = False)
# plu.savefig(f'sim_umap_sets_top_.svg', dpi=300, bbox_inches='tight')
vis = jseq_object.UMAP_feature(
    features_data = jseq_object.get_data(set_info = False),
    feature_name = 'MAP1B',
    point_size = 0.6,
    font_size = 6,
    width = 8,
    height = 6,
    palette = 'light')
# vis.savefig(f'sim_umap_sets_top_vis.svg', dpi=300, bbox_inches='tight')
jseq_object.var_data
# jseq_object.save_project(name = 'topola')
stats = jseq_object.statistic(cells=None, sets='All', min_exp=0, min_pct=0.025, n_proc=10)
stats_5 = stats.sort_values(['valid_group', 'esm', 'log(FC)'], ascending=[True, False, False]).groupby('valid_group').head(5)
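The line above sorts the statistics table and then keeps the first five rows of every `valid_group`, giving the top features per group. The sketch below reproduces the same sort-then-take-first-n logic with the standard library on toy rows; the column names follow the call above, while the values and the helper `top_per_group` are hypothetical.

```python
from itertools import groupby


def top_per_group(rows, n):
    """Sort by group (ascending), then esm and log(FC) (descending); keep n rows per group."""
    ordered = sorted(rows, key=lambda r: (r['valid_group'], -r['esm'], -r['log(FC)']))
    selected = []
    for _, group_rows in groupby(ordered, key=lambda r: r['valid_group']):
        selected.extend(list(group_rows)[:n])
    return selected


toy_stats = [
    {'feature': 'PAX3', 'valid_group': 'A', 'esm': 0.5, 'log(FC)': 2.0},
    {'feature': 'KIT',  'valid_group': 'A', 'esm': 0.9, 'log(FC)': 1.2},
    {'feature': 'MITF', 'valid_group': 'B', 'esm': 0.7, 'log(FC)': 1.0},
    {'feature': 'SOX4', 'valid_group': 'B', 'esm': 0.7, 'log(FC)': 2.0},
]

top1 = top_per_group(toy_stats, n=1)
top_features = [row['feature'] for row in top1]
```

Ties on `esm` are broken by `log(FC)`, which is why `SOX4` is chosen over `MITF` in group `B`.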
fig = volcano_plot(stats)
jseq_object.scatter_plot(
    names = None,
    features = list(set(stats_5['feature'])),
    name_slot = 'cell_names',
    scale = False,
    colors = 'viridis',
    hclust = 'complete',
    img_width = 15,
    img_high = 3,
    label_size = 10,
    size_scale = 200,
    x_lab = 'Genes',
    legend_lab = 'log(CPM + 1)',
    set_box_size = 5,
    set_box_high = 0.1,
    bbox_to_anchor_scale = 25,
    bbox_to_anchor_perc=(0.90, 0.5),
    bbox_to_anchor_group=(0.9, 0.3))
import re
jseq_object.data_composition(
    features_count = list(set([re.sub(r' .*$', '', x) for x in list(set(jseq_object.input_metadata['cell_names']))])),
    name_slot = 'cell_names',
    set_sep = True
)
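The `features_count` argument above is built by stripping everything from the first space onward in each cell name and de-duplicating the result. The sketch below shows what that regular expression does on toy names and how the stripped names can be counted; the cell names are hypothetical.

```python
import re
from collections import Counter

# Toy cell names: a base name optionally followed by a space and a suffix
cell_names = ['melanocyte 1', 'melanocyte 2', 'keratinocyte', 'fibroblast 3 a']

# ' .*$' matches from the first space to the end of the string
base_names = [re.sub(r' .*$', '', x) for x in cell_names]

unique_names = sorted(set(base_names))
composition = Counter(base_names)
```

Names without a space are left unchanged, so mixed naming schemes can be passed through the same expression.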
jseq_object.composition_pie(
    width = 6,
    height = 6,
    font_size = 15,
    cmap = "tab20",
    legend_split_col = 1,
    offset_labels = 0.5,
    legend_bbox = (1.15, 0.95))
jseq_object.bar_composition(
    cmap = 'tab20b',
    width = 2,
    height = 6,
    font_size = 15,
    legend_split_col = 1,
    legend_bbox = (1.3, 1))
4. Data subclustering
from jdti import COMPsc
jseq_object = COMPsc.project_dir('data', ['set2'])
jseq_object.load_sparse_from_projects(normalized_data=True)
jseq_object.subcluster_prepare(
    features = ['HMGCS1', 'MAP1B', 'SOX4'],
    cluster='10')
jseq_object.define_subclusters(
    umap_num = 5,
    eps = 1,
    min_samples = 5,
    n_neighbors = 5,
    min_dist = 0.1,
    spread = 1.0,
    set_op_mix_ratio = 1.0,
    local_connectivity = 1,
    repulsion_strength = 1.0,
    negative_sample_rate = 5,
    width = 8,
    height = 6)
jseq_object.subcluster_features_scatter(
    colors = 'viridis',
    hclust = 'complete',
    img_width = 3,
    img_high = 5,
    label_size = 6,
    size_scale = 70,
    x_lab = 'Genes',
    legend_lab = 'normalized')
mapping = {
    "old_name": ["-1", "1", "4"],
    "new_name": ["1", "1", "1"]
}
jseq_object.rename_subclusters(mapping)
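The mapping above pairs each entry of `old_name` with the entry of `new_name` at the same position, so subclusters `-1`, `1`, and `4` would all end up labelled `1`. The sketch below applies such a mapping to a toy list of labels; leaving unlisted labels unchanged is an assumption, and `apply_mapping` is a hypothetical helper rather than JDtI's implementation.

```python
def apply_mapping(labels, mapping):
    """Rename labels using parallel 'old_name' / 'new_name' lists."""
    lookup = dict(zip(mapping['old_name'], mapping['new_name']))
    # labels not present in the lookup are kept as they are
    return [lookup.get(label, label) for label in labels]


mapping = {
    "old_name": ["-1", "1", "4"],
    "new_name": ["1", "1", "1"],
}

toy_labels = ['-1', '0', '1', '4', '2']
renamed = apply_mapping(toy_labels, mapping)
print(renamed)  # ['1', '0', '1', '1', '2']
```

Mapping several old names to one new name is a simple way to merge subclusters that turn out to share the same markers.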
jseq_object.subcluster_DEG_scatter(
    top_n = 3,
    min_exp = 0,
    min_pct = 0.1,
    p_val = 0.05,
    colors = 'viridis',
    hclust = 'complete',
    img_width = 3,
    img_high = 5,
    label_size = 6,
    size_scale = 70,
    x_lab = 'Genes',
    legend_lab = 'normalized',
    n_proc=10)
jseq_object.accept_subclusters()
l = set(jseq_object.input_metadata['cell_names'])
Have fun JBS