Quickly convert parquet files exported from stereo-seq to adata format and add appropriate metadata
stereoseq-to-adata
Quickly convert a DataFrame exported from stereo-seq to AnnData (adata) format and add the appropriate metadata.
Installation
pip install stereoseq-to-adata
Usage
For a single DataFrame
import s2a
adata = s2a.stereo_df_to_adata(
    <your df or your df path>,
    obs_src='cell_label',  # 'cell_label' or 'spot_bin'
    # spot_bin_size=50,  # required when obs_src='spot_bin'
    verbose=True,
)
print(adata)
output:
2025-02-18 16:58:16 - reading /mnt/inner-data/sde/total_gene_2D/macaque-20240814-cla-all/total_gene_T89_macaque_f001_2D_macaque-20240814-cla-all.parquet...
2025-02-18 16:58:32 - raw df shape: (68613121, 8)
2025-02-18 16:58:32 - df after drop non-cell expr: (19689074, 8)
2025-02-18 16:58:32 - df columns: ['gene', 'x', 'y', 'umi_count', 'rx', 'ry', 'gene_area', 'cell_label']
2025-02-18 16:58:32 - has_rxry: True
2025-02-18 16:58:32 - start mapping...
2025-02-18 16:58:33 - n_genes: 15638, n_cells: 181398
2025-02-18 16:58:33 - creating sparse matrix...
2025-02-18 16:58:34 - creating AnnData...
2025-02-18 16:58:35 - done in 18.90 seconds
AnnData object with n_obs × n_vars = 181398 × 15638
obs: 'region_global_id'
obsm: 'spatial', 'spatial_r'
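The log above suggests the conversion runs in stages: drop rows that belong to no cell, map genes and cells to integer indices, build a sparse matrix, then wrap it in an AnnData object. The sketch below illustrates the long-to-sparse step with pandas and SciPy. It is not the package's actual implementation: the function name `long_df_to_matrix`, the convention that `cell_label == 0` marks non-cell rows, and the use of the mean x/y as each cell's position are assumptions made for illustration only.

```python
import pandas as pd
from scipy import sparse


def long_df_to_matrix(df, obs_col="cell_label"):
    """Turn a long (gene, x, y, umi_count, label) table into a cell x gene matrix."""
    # ASSUMPTION: label 0 means "not inside any cell"; the real export may differ.
    df = df[df[obs_col] != 0]
    # Map every cell label and gene name to a dense integer index.
    cell_codes, cells = pd.factorize(df[obs_col], sort=True)
    gene_codes, genes = pd.factorize(df["gene"], sort=True)
    # Converting COO to CSR sums duplicate (cell, gene) entries.
    mat = sparse.coo_matrix(
        (df["umi_count"].to_numpy(), (cell_codes, gene_codes)),
        shape=(len(cells), len(genes)),
    ).tocsr()
    # ASSUMPTION: one position per cell, taken as the mean of its x/y values.
    spatial = df.groupby(obs_col, sort=True)[["x", "y"]].mean().to_numpy()
    return mat, cells, genes, spatial


toy = pd.DataFrame(
    {
        "gene": ["A", "B", "A", "A", "C"],
        "x": [0, 2, 4, 10, 12],
        "y": [0, 2, 4, 10, 10],
        "umi_count": [1, 2, 3, 4, 5],
        "cell_label": [1, 1, 1, 2, 0],
    }
)
mat, cells, genes, spatial = long_df_to_matrix(toy)
print(mat.toarray())
```

With a matrix, index arrays, and coordinates in hand, something like `anndata.AnnData(X=mat)` plus `obsm['spatial']` would give an object shaped like the one printed above.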
For a standard folder
Use Python:
import s2a
adatas = s2a.process_stereo_folder(
    <your df folder>,
    save_to=<folder to save adatas>,
    obs_src='cell_label',  # 'cell_label' or 'spot_bin'
    # spot_bin_size=50,  # required when obs_src='spot_bin'
)
print(adatas, end='\n...\n')
print(adatas[0].obs, end='\n...\n')
print(adatas[0].uns['export_meta'], end='\n...\n')
output:
processing files: 100%|██████████████████████████████████████████| 46/46 [00:33<00:00, 1.37it/s]
[AnnData object with n_obs × n_vars = 101001 × 15579
obs: 'region_global_id', 'region_name'
uns: 'export_meta'
obsm: 'spatial', 'spatial_r', AnnData object with n_obs × n_vars = 115651 × 15352
obs: 'region_global_id', 'region_name'
uns: 'export_meta'
obsm: 'spatial', 'spatial_r', AnnData object with n_obs × n_vars = 197038 × 15911
obs: 'region_global_id', 'region_name'
uns: 'export_meta'
obsm: 'spatial', 'spatial_r', AnnData object with n_obs × n_vars = 156175 × 15842
obs: 'region_global_id', 'region_name'
uns: 'export_meta'
...
...
]
---
region_global_id region_name
T89-cell-78 716 L-F3-l1
T89-cell-79 716 L-F3-l1
T89-cell-80 716 L-F3-l1
T89-cell-83 716 L-F3-l1
T89-cell-84 716 L-F3-l1
... ... ...
T89-cell-429516 585 L-F5-l5
T89-cell-429760 585 L-F5-l5
T89-cell-429821 647 L-F5-l6
T89-cell-430276 585 L-F5-l5
T89-cell-430305 585 L-F5-l5
[101001 rows x 2 columns]
---
{'animal_id': np.int64(1),
'cell_mask_root': '/data/sdbd/cell-mask-rechunk-by-row/macaque',
'cell_mask_version': 'macaque-20230418-v5',
'chip': 'T89',
'end_time': '2024-11-06T15:21:25.127952',
'export_parquet': np.True_,
'export_root': '/data/sde/total_gene_2D/macaque-20241106-mq179-F1-F7',
'export_tsv': np.False_,
'export_version': 'macaque-20241106-mq179-F1-F7',
'ignore_when_no_cell': np.False_,
'ignore_when_no_region': np.False_,
'ignored_areas': array([], dtype=float64),
'ntp_version': 'Mq179-motor',
'only_for_region_mapping': np.False_,
'pid': np.int64(3070353),
'sec_para': {'dx': np.int64(5924), 'dy': np.int64(28129)},
'selected_areas': array([ 900, 901, 902, 722, 1030, 709, 710, 687, 711, 712, 716,
715, 714, 686, 713, 1183, 1192, 1193, 1194, 1195, 644, 645,
646, 585, 647, 496, 497, 498, 500, 499, 491, 492, 493,
495, 494]),
'selected_areas_as_rect': np.float64(-1.0),
'skip_unselected_areas': np.True_,
'species': 'macaque',
'start_time': '2024-11-06T15:17:49.475968',
'status': 'success',
'user': 'myuan',
'with_cell_size': np.True_}
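The `export_meta` dictionary shown above holds NumPy scalars (`np.int64`, `np.True_`) and arrays, which `json.dumps` rejects. If you want to log or store the metadata as JSON, one option is to convert it to plain Python types first. The helper name `to_plain` and the shortened `meta` dictionary below are illustrative and not part of the package.

```python
import json

import numpy as np


def to_plain(value):
    """Recursively convert NumPy scalars and arrays to built-in Python types."""
    if isinstance(value, dict):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, np.generic):
        return value.item()
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    return value


# A shortened stand-in for adatas[0].uns['export_meta'].
meta = {
    "animal_id": np.int64(1),
    "chip": "T89",
    "export_parquet": np.True_,
    "sec_para": {"dx": np.int64(5924), "dy": np.int64(28129)},
    "selected_areas": np.array([900, 901, 902]),
}
plain = to_plain(meta)
text = json.dumps(plain, sort_keys=True)
print(text)
```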
Use shell:
Related parameters are the same as above
$ python -m process_stereo_folder --help
usage: process-stereo-folder [-h] [OPTIONS]
Process a folder of stereo dataframes. Folder format should be zhengmingyuan's format:
/path/to/stereo_folder/
    region-*.csv                      # region id and region name
    total_gene_{chip_a}_*.parquet     # gene expression matrix
    total_gene_{chip_a}_*.meta.json   # meta data
    ...
    total_gene_{chip_z}_*.parquet     # gene expression matrix
    total_gene_{chip_z}_*.meta.json   # meta data
    ...
save_to: Path | None
    the path to save the AnnData objects
obs_add_prefix: str
    the prefix to add to the observation names
verbose: bool
    whether to print debug information
workers: int
    the number of workers to use
╭─ options ───────────────────────────────────────────────╮
│ -h, --help              show this help message and exit │
│ --folder PATH|STR       (required)                      │
│ --save-to {None}|PATH|STR                               │
│                         (default: None)                 │
│ --obs-add-prefix STR    (default: '{chip}-cell-')       │
│ --obs-src {cell_label,spot_bin}                         │
│                         (default: cell_label)           │
│ --spot-bin-size INT     (default: 50)                   │
│ --verbose, --no-verbose                                 │
│                         (default: False)                │
│ --workers INT           (default: 4)                    │
│ --enable-tqdm, --no-enable-tqdm                         │
│                         (default: True)                 │
╰─────────────────────────────────────────────────────────╯
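Before running the folder command, it can help to check that a folder follows the layout described in the help text. The sketch below uses only the standard library to group `total_gene_{chip}_*.parquet` files by chip and pair each with its `.meta.json` file. It assumes that the chip ID contains no underscore (as in `T89`) and that the meta file shares the parquet file's name up to the extension. Neither assumption is confirmed by the package, and `group_by_chip` is a hypothetical helper.

```python
import re
import tempfile
from pathlib import Path

# ASSUMPTION: the chip ID is the underscore-free token after "total_gene_".
CHIP_PATTERN = re.compile(r"^total_gene_([^_]+)_.*\.parquet$")


def group_by_chip(folder):
    """Return {chip: [(parquet_path, meta_path_or_None), ...]} for a folder."""
    groups = {}
    for parquet in sorted(Path(folder).glob("total_gene_*.parquet")):
        match = CHIP_PATTERN.match(parquet.name)
        if match is None:
            continue
        # ASSUMPTION: the meta file has the same name with ".meta.json" instead.
        meta = parquet.with_name(parquet.name[: -len(".parquet")] + ".meta.json")
        entry = (parquet, meta if meta.exists() else None)
        groups.setdefault(match.group(1), []).append(entry)
    return groups


# Demonstrate on a throwaway folder with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "region-demo.csv",
        "total_gene_T89_demo.parquet",
        "total_gene_T89_demo.meta.json",
        "total_gene_T90_demo.parquet",
    ):
        (Path(tmp) / name).touch()
    demo = {
        chip: [(p.name, m.name if m else None) for p, m in files]
        for chip, files in group_by_chip(tmp).items()
    }
print(demo)
```

A chip whose entry has `None` in place of the meta file is missing its `.meta.json` and may be worth checking before conversion.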