Skip to main content

Quickly convert parquet files exported from stereo-seq to adata format and add appropriate metadata

Project description

stereoseq-to-adata

快速将从 stereo-seq 导出的 dataframe 转换为 adata 格式并添加合适的元信息

Quickly convert the dataframe exported from stereo-seq to adata format and add appropriate metadata

Installation

pip install stereoseq-to-adata

Usage

For single df

import s2a

adata = s2a.stereo_df_to_adata(
    <your df or your df path>,
    obs_src='cell_label' # cell_label or spot_bin
    # spot_bin_size=50 # if you are using spot_bin, spot_bin_size is required
    verbose=True
)
print(adata)

output:

2025-02-18 16:58:16 - reading /mnt/inner-data/sde/total_gene_2D/macaque-20240814-cla-all/total_gene_T89_macaque_f001_2D_macaque-20240814-cla-all.parquet...
2025-02-18 16:58:32 - raw df shape: (68613121, 8)
2025-02-18 16:58:32 - df after drop non-cell expr: (19689074, 8)
2025-02-18 16:58:32 - df columns: ['gene', 'x', 'y', 'umi_count', 'rx', 'ry', 'gene_area', 'cell_label']
2025-02-18 16:58:32 - has_rxry: True
2025-02-18 16:58:32 - start mapping...
2025-02-18 16:58:33 - n_genes: 15638, n_cells: 181398
2025-02-18 16:58:33 - creating sparse matrix...
2025-02-18 16:58:34 - creating AnnData...
2025-02-18 16:58:35 - done in 18.90 seconds

AnnData object with n_obs × n_vars = 181398 × 15638
    obs: 'region_global_id'
    obsm: 'spatial', 'spatial_r'

For std folder

Use Python:

import s2a

adatas = s2a.process_stereo_folder(
    <your df folder>,
    save_to=<folder to save adatas>,
    obs_src='cell_label' # cell_label or spot_bin
    # spot_bin_size=50 # if you are using spot_bin, spot_bin_size is required

)
print(adatas, end='\n...\n')
print(adatas[0].obs, end='\n...\n')
print(adatas[0].uns['export_meta'], end='\n...\n')

output:

processing files: 100%|██████████████████████████████████████████| 46/46 [00:33<00:00,  1.37it/s]
[AnnData object with n_obs × n_vars = 101001 × 15579
    obs: 'region_global_id', 'region_name'
    uns: 'export_meta'
    obsm: 'spatial', 'spatial_r', AnnData object with n_obs × n_vars = 115651 × 15352
    obs: 'region_global_id', 'region_name'
    uns: 'export_meta'
    obsm: 'spatial', 'spatial_r', AnnData object with n_obs × n_vars = 197038 × 15911
    obs: 'region_global_id', 'region_name'
    uns: 'export_meta'
    obsm: 'spatial', 'spatial_r', AnnData object with n_obs × n_vars = 156175 × 15842
    obs: 'region_global_id', 'region_name'
    uns: 'export_meta'
    ...
    ...
]
---
                region_global_id region_name
T89-cell-78                  716     L-F3-l1
T89-cell-79                  716     L-F3-l1
T89-cell-80                  716     L-F3-l1
T89-cell-83                  716     L-F3-l1
T89-cell-84                  716     L-F3-l1
...                          ...         ...
T89-cell-429516              585     L-F5-l5
T89-cell-429760              585     L-F5-l5
T89-cell-429821              647     L-F5-l6
T89-cell-430276              585     L-F5-l5
T89-cell-430305              585     L-F5-l5

[101001 rows x 2 columns]
---

{'animal_id': np.int64(1),
 'cell_mask_root': '/data/sdbd/cell-mask-rechunk-by-row/macaque',
 'cell_mask_version': 'macaque-20230418-v5',
 'chip': 'T89',
 'end_time': '2024-11-06T15:21:25.127952',
 'export_parquet': np.True_,
 'export_root': '/data/sde/total_gene_2D/macaque-20241106-mq179-F1-F7',
 'export_tsv': np.False_,
 'export_version': 'macaque-20241106-mq179-F1-F7',
 'ignore_when_no_cell': np.False_,
 'ignore_when_no_region': np.False_,
 'ignored_areas': array([], dtype=float64),
 'ntp_version': 'Mq179-motor',
 'only_for_region_mapping': np.False_,
 'pid': np.int64(3070353),
 'sec_para': {'dx': np.int64(5924), 'dy': np.int64(28129)},
 'selected_areas': array([ 900,  901,  902,  722, 1030,  709,  710,  687,  711,  712,  716,
        715,  714,  686,  713, 1183, 1192, 1193, 1194, 1195,  644,  645,
        646,  585,  647,  496,  497,  498,  500,  499,  491,  492,  493,
        495,  494]),
 'selected_areas_as_rect': np.float64(-1.0),
 'skip_unselected_areas': np.True_,
 'species': 'macaque',
 'start_time': '2024-11-06T15:17:49.475968',
 'status': 'success',
 'user': 'myuan',
 'with_cell_size': np.True_}

Use shell:

Related parameters are the same as above

$ python -m process_stereo_folder --help

usage: process-stereo-folder [-h] [OPTIONS]

Process a folder of stereo dataframes. Folder format should be zhengmingyuan's format:
/path/to/stereo_folder/
    region-*.csv                      # region id and region name
    total_gene_{chip_a}_*.parquet     # gene expression matrix
    total_gene_{chip_a}_*.meta.json   # meta data
    ...
    total_gene_{chip_z}_*.parquet     # gene expression matrix
    total_gene_{chip_z}_*.meta.json   # meta data
    ...

save_to: Path | None
    the path to save the AnnData objects obs_add_prefix: str
    the prefix to add to the observation names verbose: bool
    whether to print debug information workers: int
    the number of workers to use

╭─ options ───────────────────────────────────────────────╮
│ -h, --help              show this help message and exit │
│ --folder PATH|STR       (required)                      │
│ --save-to {None}|PATH|STR                               │
│                         (default: None)                 │
│ --obs-add-prefix STR    (default: '{chip}-cell-')       │
│ --obs-src {cell_label,spot_bin}                         │
│                         (default: cell_label)           │
│ --spot-bin-size INT     (default: 50)                   │
│ --verbose, --no-verbose                                 │
│                         (default: False)                │
│ --workers INT           (default: 4)                    │
│ --enable-tqdm, --no-enable-tqdm                         │
│                         (default: True)                 │
╰─────────────────────────────────────────────────────────╯

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

stereoseq_to_adata-0.2.1.tar.gz (67.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

stereoseq_to_adata-0.2.1-py3-none-any.whl (7.9 kB view details)

Uploaded Python 3

File details

Details for the file stereoseq_to_adata-0.2.1.tar.gz.

File metadata

  • Download URL: stereoseq_to_adata-0.2.1.tar.gz
  • Upload date:
  • Size: 67.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.9

File hashes

Hashes for stereoseq_to_adata-0.2.1.tar.gz
Algorithm Hash digest
SHA256 312b13571beb6dcbde40504c469958b04a766265f6a81edabf1f26c0e914406c
MD5 09fb8fc66a00e52a389484abb6264784
BLAKE2b-256 39e4cb1f9292acf564a4d51c432c419a9056c2ab4834ba03c5f66bf177bb0506

See more details on using hashes here.

File details

Details for the file stereoseq_to_adata-0.2.1-py3-none-any.whl.

File metadata

File hashes

Hashes for stereoseq_to_adata-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e4add67e9424f5f040594c1d6690de7eb1427816c34f3353f3b9d28f4f019ee2
MD5 1d39bb1709075ce1b5362455eeda897b
BLAKE2b-256 df7af7c6030afda1d450dceb5a16ab25068e59b46dc51bd8b914263157f7fe16

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page