Quickly convert parquet files exported from stereo-seq to adata format and add appropriate metadata

Project description

stereoseq-to-adata

快速将从 stereo-seq 导出的 dataframe 转换为 adata 格式并添加合适的元信息

Quickly convert the dataframe exported from stereo-seq to adata format and add appropriate metadata

Installation

pip install stereoseq-to-adata

Usage

For single df

import s2a

adata = s2a.stereo_df_to_adata(
    <your df or your df path>,
    obs_src='cell_label' # cell_label or spot_bin
    # spot_bin_size=50 # if you are using spot_bin, spot_bin_size is required
    verbose=True
)
print(adata)

output:

2025-02-18 16:58:16 - reading /mnt/inner-data/sde/total_gene_2D/macaque-20240814-cla-all/total_gene_T89_macaque_f001_2D_macaque-20240814-cla-all.parquet...
2025-02-18 16:58:32 - raw df shape: (68613121, 8)
2025-02-18 16:58:32 - df after drop non-cell expr: (19689074, 8)
2025-02-18 16:58:32 - df columns: ['gene', 'x', 'y', 'umi_count', 'rx', 'ry', 'gene_area', 'cell_label']
2025-02-18 16:58:32 - has_rxry: True
2025-02-18 16:58:32 - start mapping...
2025-02-18 16:58:33 - n_genes: 15638, n_cells: 181398
2025-02-18 16:58:33 - creating sparse matrix...
2025-02-18 16:58:34 - creating AnnData...
2025-02-18 16:58:35 - done in 18.90 seconds

AnnData object with n_obs × n_vars = 181398 × 15638
    obs: 'region_global_id'
    obsm: 'spatial', 'spatial_r'

For std folder

Use Python:

import s2a

adatas = s2a.process_stereo_folder(
    <your df folder>,
    save_to=<folder to save adatas>,
    obs_src='cell_label' # cell_label or spot_bin
    # spot_bin_size=50 # if you are using spot_bin, spot_bin_size is required

)
print(adatas, end='\n...\n')
print(adatas[0].obs, end='\n...\n')
print(adatas[0].uns['export_meta'], end='\n...\n')

output:

processing files: 100%|██████████████████████████████████████████| 46/46 [00:33<00:00,  1.37it/s]
[AnnData object with n_obs × n_vars = 101001 × 15579
    obs: 'region_global_id', 'region_name'
    uns: 'export_meta'
    obsm: 'spatial', 'spatial_r', AnnData object with n_obs × n_vars = 115651 × 15352
    obs: 'region_global_id', 'region_name'
    uns: 'export_meta'
    obsm: 'spatial', 'spatial_r', AnnData object with n_obs × n_vars = 197038 × 15911
    obs: 'region_global_id', 'region_name'
    uns: 'export_meta'
    obsm: 'spatial', 'spatial_r', AnnData object with n_obs × n_vars = 156175 × 15842
    obs: 'region_global_id', 'region_name'
    uns: 'export_meta'
    ...
    ...
]
---
                region_global_id region_name
T89-cell-78                  716     L-F3-l1
T89-cell-79                  716     L-F3-l1
T89-cell-80                  716     L-F3-l1
T89-cell-83                  716     L-F3-l1
T89-cell-84                  716     L-F3-l1
...                          ...         ...
T89-cell-429516              585     L-F5-l5
T89-cell-429760              585     L-F5-l5
T89-cell-429821              647     L-F5-l6
T89-cell-430276              585     L-F5-l5
T89-cell-430305              585     L-F5-l5

[101001 rows x 2 columns]
---

{'animal_id': np.int64(1),
 'cell_mask_root': '/data/sdbd/cell-mask-rechunk-by-row/macaque',
 'cell_mask_version': 'macaque-20230418-v5',
 'chip': 'T89',
 'end_time': '2024-11-06T15:21:25.127952',
 'export_parquet': np.True_,
 'export_root': '/data/sde/total_gene_2D/macaque-20241106-mq179-F1-F7',
 'export_tsv': np.False_,
 'export_version': 'macaque-20241106-mq179-F1-F7',
 'ignore_when_no_cell': np.False_,
 'ignore_when_no_region': np.False_,
 'ignored_areas': array([], dtype=float64),
 'ntp_version': 'Mq179-motor',
 'only_for_region_mapping': np.False_,
 'pid': np.int64(3070353),
 'sec_para': {'dx': np.int64(5924), 'dy': np.int64(28129)},
 'selected_areas': array([ 900,  901,  902,  722, 1030,  709,  710,  687,  711,  712,  716,
        715,  714,  686,  713, 1183, 1192, 1193, 1194, 1195,  644,  645,
        646,  585,  647,  496,  497,  498,  500,  499,  491,  492,  493,
        495,  494]),
 'selected_areas_as_rect': np.float64(-1.0),
 'skip_unselected_areas': np.True_,
 'species': 'macaque',
 'start_time': '2024-11-06T15:17:49.475968',
 'status': 'success',
 'user': 'myuan',
 'with_cell_size': np.True_}

Use shell:

Related parameters are the same as above

$ python -m process_stereo_folder --help


usage: process-stereo-folder [-h] [OPTIONS]

Process a folder of stereo dataframes. Folder format should be zhengmingyuan's format:
/path/to/stereo_folder/
    region-*.csv                      # region id and region name
    total_gene_{chip_a}_*.parquet     # gene expression matrix
    total_gene_{chip_a}_*.meta.json   # meta data
    ...
    total_gene_{chip_z}_*.parquet     # gene expression matrix
    total_gene_{chip_z}_*.meta.json   # meta data
    ...

save_to: Path | None
    the path to save the AnnData objects obs_add_prefix: str
    the prefix to add to the cell names verbose: bool
    whether to print debug information workers: int
    the number of workers to use

╭─ options ───────────────────────────────────────────────╮
│ -h, --help              show this help message and exit │
│ --folder PATH|STR       (required)                      │
│ --save-to {None}|PATH|STR                               │
│                         (default: None)                 │
│ --obs-add-prefix STR   (default: '{chip}-cell-')       │
│ --verbose, --no-verbose                                 │
│                         (default: False)                │
│ --workers INT           (default: 4)                    │
│ --enable-tqdm, --no-enable-tqdm                         │
│                         (default: True)                 │
╰─────────────────────────────────────────────────────────╯

Project details

Release history Release notifications | RSS feed

0.3.1

Dec 17, 2025

0.3.0

Dec 17, 2025

0.2.2

Mar 21, 2025

0.2.1

Mar 21, 2025

This version

0.2.0

Mar 17, 2025

0.1.2

Feb 19, 2025

0.1.0

Feb 18, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

stereoseq_to_adata-0.2.0.tar.gz (67.0 kB view details)

Uploaded Mar 17, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

stereoseq_to_adata-0.2.0-py3-none-any.whl (7.7 kB view details)

Uploaded Mar 17, 2025 Python 3

File details

Details for the file stereoseq_to_adata-0.2.0.tar.gz.

File metadata

Download URL: stereoseq_to_adata-0.2.0.tar.gz
Upload date: Mar 17, 2025
Size: 67.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.4

File hashes

Hashes for stereoseq_to_adata-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`a5ddf20cfbe5e89c3a962ae704e842bbc849d7cdc1995b84bbcb0556729e26bb`
MD5	`cb85c0174a196a0b0ba56065b526812f`
BLAKE2b-256	`396ff9850d41ef681dc21de2220c3348a63c770f2fb71834ae4ddd8d110d8411`

See more details on using hashes here.

File details

Details for the file stereoseq_to_adata-0.2.0-py3-none-any.whl.

File metadata

Download URL: stereoseq_to_adata-0.2.0-py3-none-any.whl
Upload date: Mar 17, 2025
Size: 7.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.4

File hashes

Hashes for stereoseq_to_adata-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8d97e4553d7542ef5469d364e9e7ee6c3be5b7ceb551fa5ea1a5d830f258b45d`
MD5	`0e76a89c6c5f589f33be3b9404ae3d42`
BLAKE2b-256	`aeffd0ca02a4c382557d0ef05270612fdda28fa3fad9f8160d359006dbe12b2e`

See more details on using hashes here.

stereoseq-to-adata 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Project description

stereoseq-to-adata

Installation

Usage

For single df

For std folder

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes