ECA Client

ECAUGT

Caution: This package does not work on Windows.

ECAUGT is a package designed for customized in data cell sorting under the human Ensemble Cell Atlas (hECA). It contains APIs to search and download data from hECA's database.

You are welcome to use our web version at http://eca.xglab.tech/#/cellSorting

You can also find more information at https://github.com/XuegongLab/ECAUGT and http://eca.xglab.tech/ecaugt/index.html

About hECA

hECA provides a platform for assembling massive scattered single-cell data into a unified Giant Table (uGT). We keep exploring information frameworks and future ways of building and utilizing cell atlases. Here we provide entries for customized in data cell sorting, access to the unified Hierarchical Annotation Framework (uHAF), and multifaceted portraits of genes, cell types and organs.

hECA and ECAUGT are designed and developed by XGlab at Tsinghua University.

Visit hECA's homepage at http://eca.xglab.tech/

Read our pre-print paper at https://www.biorxiv.org/content/10.1101/2021.07.21.453289v1

Install

pip install ECAUGT

Tutorial

1. Configuration

1.1 Load packages

import sys
import pandas as pd
import ECAUGT
import time
import multiprocessing
import numpy as np

1.2 Connect to server

# set parameters
endpoint = "https://HCAd-Datasets.cn-beijing.ots.aliyuncs.com"
access_id = "LTAI5t7t216W9amUD1crMVos" #enter your id and keys
access_key = "ZJPlUbpLCij5qUPjbsU8GnQHm97IxJ"
instance_name = "HCAd-Datasets"
table_name = 'HCA_d'
# setup client
ECAUGT.Setup_Client(endpoint, access_id, access_key, instance_name, table_name)
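
The connection parameters above are written directly into the script. As a minimal sketch, they could instead be read from environment variables so that keys stay out of shared notebooks. The variable names (ECA_ENDPOINT and so on) and the helper load_client_settings are hypothetical and not part of ECAUGT; only the Setup_Client call in the trailing comment comes from this tutorial.

```python
import os

# Hypothetical environment variable names; ECAUGT itself does not define them.
_SETTING_VARS = (
    "ECA_ENDPOINT",
    "ECA_ACCESS_ID",
    "ECA_ACCESS_KEY",
    "ECA_INSTANCE_NAME",
    "ECA_TABLE_NAME",
)


def load_client_settings(env=None):
    """Return (endpoint, access_id, access_key, instance_name, table_name).

    Raises KeyError naming every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in _SETTING_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return tuple(env[name] for name in _SETTING_VARS)


# Usage with the call shown above (requires ECAUGT and network access):
# import ECAUGT
# ECAUGT.Setup_Client(*load_client_settings())
```

The tuple is returned in the same order as the arguments of the Setup_Client call shown in this section, so it can be unpacked directly.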

1.3 Build index

We should check if the index has been built.

ECAUGT.build_index()

2. Search cells with a metadata condition

Conditions are presented in a structured string which is a combination of several logical expressions.

Each logical expression should take one of the following forms:

field_name1 == value1,                          here '==' means equal

field_name2 <> value2,                          here '<>' means unequal

Three symbols are used for logical operation between expressions:

logical_expression1 && logical_expression2,     here '&&' means AND operation

logical_expression1 || logical_expression2,     here '||' means OR operation

! logical_expression1,                          here '!' means NOT operation

Brackets are allowed, and the logical operators follow the usual precedence. Extra spaces in the metadata condition string are tolerated.

# get primary keys
rows_to_get = ECAUGT.query_cells("organ == Lung && cell_type == T cell  ")

The variable rows_to_get is a list containing the primary keys of the matching cells.
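
When a query has several fields, it can be convenient to assemble the condition string programmatically. The following is a small sketch that only uses the '==', '<>' and '&&' syntax described above; build_condition is a hypothetical helper, not an ECAUGT function.

```python
def build_condition(equals=None, not_equals=None):
    """Join field/value pairs into one condition string using '&&'."""
    parts = []
    for field, value in (equals or {}).items():
        parts.append(f"{field} == {value}")
    for field, value in (not_equals or {}).items():
        parts.append(f"{field} <> {value}")
    if not parts:
        raise ValueError("at least one condition is required")
    return " && ".join(parts)


condition = build_condition(equals={"organ": "Lung", "cell_type": "T cell"})
print(condition)  # organ == Lung && cell_type == T cell

# The string can then be passed to the query call shown above:
# rows_to_get = ECAUGT.query_cells(condition)
```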

3. Download data

We first download three columns for the queried cells and return them as a pandas DataFrame. The first column of the result holds the primary keys.

For illustration, we only download the first 20 cells.

rows_to_get_2 = rows_to_get[0:20]

3.1 Download columns of interest

# download data in pandas DataFrame form
ECAUGT.get_columnsbycell_para(rows_to_get = rows_to_get_2, cols_to_get=['cl_name','uHAF_name','cell_type'], col_filter=None, do_transfer = True, thread_num = multiprocessing.cpu_count()-1)
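
The calls in this tutorial pass thread_num = multiprocessing.cpu_count()-1, which evaluates to 0 on a single-core machine. How ECAUGT handles a value of 0 is not stated here, so a cautious sketch is to clamp the value to at least 1. The helper pick_thread_num is hypothetical and uses only the standard library.

```python
import os


def pick_thread_num(reserve=1):
    """Return the CPU count minus `reserve`, but never less than 1."""
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, cpus - reserve)


print(pick_thread_num())

# Possible use with the download call shown above:
# ECAUGT.get_columnsbycell_para(..., thread_num=pick_thread_num())
```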

Next we show what the result looks like when do_transfer is set to False.

# download data in list form
ECAUGT.get_columnsbycell_para(rows_to_get = rows_to_get_2, cols_to_get=['cl_name','uHAF_name','cell_type'], col_filter=None, do_transfer = False, thread_num = multiprocessing.cpu_count()-1)

3.2 Download all columns

We also compare the time taken by the parallel and non-parallel download functions for the first 20 cells. In our run, the parallel version took only about one third of the time.

# the parallel version
start_time = time.time()
result = ECAUGT.get_columnsbycell_para(rows_to_get = rows_to_get_2, cols_to_get=None, col_filter=None, do_transfer = False, thread_num = multiprocessing.cpu_count()-1)
print(time.time()-start_time)  # elapsed seconds
# the non-parallel version
start_time = time.time()
result = ECAUGT.get_columnsbycell(rows_to_get = rows_to_get_2, cols_to_get=None,col_filter=None,do_transfer = False)
print(time.time()-start_time)  # elapsed seconds
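
The same start/stop timing pattern can be wrapped in a small reusable function. This is only a sketch: timed is a hypothetical helper, and a stand-in workload (sum) replaces the ECAUGT calls so that the example runs without a server connection.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload; replace with a download call to time it instead.
result, elapsed = timed(sum, range(1000))
print(f"result={result}, elapsed={elapsed:.6f} s")

# For example (requires a configured client):
# data, seconds = timed(ECAUGT.get_columnsbycell, rows_to_get=rows_to_get_2,
#                       cols_to_get=None, col_filter=None, do_transfer=False)
```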

4. Search cells with both metadata and gene conditions

Now we show how to add gene conditions when downloading cells. Here we download some genes of the queried cells and select the cells whose expression level of PTPRC is larger than 0.1 and whose expression level of CD3D is no less than 0.1.

# add col_filter on gene
gene_condition = ECAUGT.seq2filter("PTPRC > 0.1 && CD3D>=0.1")
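
To make the meaning of the condition concrete, the sketch below applies the same two comparisons to a few toy records in plain Python. The expression values and the field name cid are invented for illustration, and this is not how ECAUGT evaluates the filter; it only shows which cells such a condition keeps, including the difference between '>' and '>='.

```python
# Toy expression values, invented for illustration only.
cells = [
    {"cid": 1, "PTPRC": 0.50, "CD3D": 0.10},
    {"cid": 2, "PTPRC": 0.10, "CD3D": 0.90},
    {"cid": 3, "PTPRC": 0.30, "CD3D": 0.05},
]


def passes(cell):
    """Mirror 'PTPRC > 0.1 && CD3D>=0.1' for one record."""
    return cell["PTPRC"] > 0.1 and cell["CD3D"] >= 0.1


kept = [cell["cid"] for cell in cells if passes(cell)]
print(kept)  # [1]
```

Cell 2 is dropped because PTPRC must be strictly greater than 0.1, while cell 1 is kept because CD3D equal to 0.1 satisfies '>='.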

4.1 Download some of the columns

df_result = ECAUGT.get_columnsbycell_para(rows_to_get = rows_to_get, cols_to_get=['CD3D','PTPRC','donor_id','uHAF_name'], col_filter=gene_condition, do_transfer = True, thread_num = multiprocessing.cpu_count()-1)

We find that 7403 of the 14870 queried cells have expression levels that satisfy PTPRC > 0.1 && CD3D>=0.1. When downloading only some columns of these cells with the parameter cols_to_get, the genes involved in the condition must be included in cols_to_get.

4.2 Download all columns of these cells

We can get all expression levels and metadata of these cells by setting the parameter cols_to_get to None.

df_result = ECAUGT.get_columnsbycell_para(rows_to_get = rows_to_get, cols_to_get=None, col_filter=gene_condition, do_transfer = True, thread_num = multiprocessing.cpu_count()-1)

Download files

Source Distribution

ECAUGT-1.0.6.tar.gz (17.7 kB)

Uploaded Source

Built Distribution

ECAUGT-1.0.6-py3-none-any.whl (18.0 kB)

Uploaded Python 3

File details

Details for the file ECAUGT-1.0.6.tar.gz.

File metadata

  • Download URL: ECAUGT-1.0.6.tar.gz
  • Upload date:
  • Size: 17.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 pkginfo/1.7.0 requests/2.25.1 setuptools/51.0.0.post20201207 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.6.10

File hashes

Hashes for ECAUGT-1.0.6.tar.gz

Algorithm    Hash digest
SHA256       77f25b941513fad950804d2ee02348dee997a60e9f1c0641285ea524a28165a7
MD5          5901b1f5a1239858be71e8b7a2538039
BLAKE2b-256  396e31c915d706c8eff15a7e704269570420b336040d613f51a76f0b6e832088

File details

Details for the file ECAUGT-1.0.6-py3-none-any.whl.

File metadata

  • Download URL: ECAUGT-1.0.6-py3-none-any.whl
  • Upload date:
  • Size: 18.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 pkginfo/1.7.0 requests/2.25.1 setuptools/51.0.0.post20201207 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.6.10

File hashes

Hashes for ECAUGT-1.0.6-py3-none-any.whl

Algorithm    Hash digest
SHA256       167457249f5e0988febcc08251d560de55c90980d96f0057ffa7d185d778afb7
MD5          9469bb5b7c97aa902fcdb24377cc3ba1
BLAKE2b-256  2ce1e257089ada2aa26654f33240c0a50ad19a65592b7fbe92904237e3cba44d