AICAT: Agentic-AI Cell-type Annotation Tool

AICAT (Agentic-AI Cell-type Annotation Tool) is a computational method for automatic cell type annotation of single-cell RNA-seq data. It uses an OpenAI GPT model, accessed through OpenAI's API, to generate in-depth, context-aware cell type predictions from cluster-level expression profiles.


Installation

To install the package in development mode:

git clone https://github.com/RavenGan/AICAT.git
cd AICAT
pip install -e .

API Key Requirement

AICAT requires access to an OpenAI GPT model, so users must provide their own OpenAI API key. To reduce the risk of exposing the key (for example, in a browser or in client-side scripts), set it as a system environment variable before running AICAT.

To generate an API key, log in to your OpenAI account. In the pop-up window, click “->” next to “API”, then click the “API key” icon on the left-hand side, and click “Create new secret key”; this creates your key and takes you to the API key page. Copy the key and store it somewhere safe for later use. Do not share your API key with others or upload it to public spaces, and make sure it is not visible in browsers or any client-side scripts. Finally, click “Settings” in the left bar, choose “Billing” from the drop-down list, and make sure you have a non-zero credit balance.

With the API key in hand, there are two ways to set it up:

Option 1: Using a .env file

OPENAI_API_KEY=your-api-key-here

Option 2: Using terminal (temporary)

export OPENAI_API_KEY=your-api-key-here
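
For scripts that call AICAT from Python, the two options above can be combined into one lookup. The following is a minimal sketch, not part of AICAT: the helper name get_openai_api_key is hypothetical, and it assumes the .env file sits where python-dotenv can find it (the working directory or a parent directory).

```python
import os


def get_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    try:
        # python-dotenv is optional here; if it is installed, load_dotenv()
        # copies entries from a .env file into os.environ without
        # overwriting variables that are already set.
        from dotenv import load_dotenv

        load_dotenv()
    except ImportError:
        pass

    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it or add it to a .env file."
        )
    return key
```

Failing early with a clear message avoids starting a long annotation run that would only stop at the first API call.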

Supported Input

The main input is a clustered .h5ad file containing single-cell gene expression data.

Required Structure

  • The .obs DataFrame should contain a cluster label column.
  • The expression matrix should be preprocessed (e.g., log-normalized) so that differentially expressed genes can be computed later in the program.
  • The .h5ad file can be generated with the Python package Scanpy.
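
Before spending API calls, it can help to confirm that the input has the cluster label column described above. The sketch below is a hypothetical helper (check_cluster_column is not an AICAT function); it only assumes that the object exposes an .obs table with a columns attribute, as an AnnData object loaded from a .h5ad file does.

```python
def check_cluster_column(adata, cluster_col_name):
    """Return the sorted cluster labels found in adata.obs[cluster_col_name].

    Raises KeyError if the column is missing and ValueError if it is empty.
    """
    obs = adata.obs
    if cluster_col_name not in obs.columns:
        raise KeyError(
            f"'{cluster_col_name}' not found in .obs; "
            f"available columns: {list(obs.columns)}"
        )
    # Convert to strings so numeric and categorical labels are handled alike.
    labels = sorted({str(label) for label in obs[cluster_col_name]})
    if not labels:
        raise ValueError(f"'{cluster_col_name}' contains no cluster labels")
    return labels
```

With Scanpy installed, the test file could be loaded with scanpy.read_h5ad("tests/data/CRC_SMC05-T_processed.h5ad") and passed to check_cluster_column together with "Cell_type".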

Usage

CLI Usage

In-depth annotation CLI

Run the following from your terminal. A small test dataset is used here as an example; it can be found in the folder ./tests/data.

aicat-indepth \
  --openai_api_key "OPENAI_API_KEY" \
  --adata_path "tests/data/CRC_SMC05-T_processed.h5ad" \
  --species "human" \
  --tissue "primary colorectal cancer" \
  --cluster_col_name "Cell_type" \
  --data_name "CRC_SMC05-T" \
  --save_path "tests/res/CRC_SMC05-T" 

Display CLI options with

aicat-indepth --help
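
When annotating several datasets, the command above can be assembled from Python instead of typed by hand. This is a sketch under stated assumptions: build_indepth_command and run_indepth are hypothetical helpers, the flags are only those shown in the example above, and --openai_api_key is assumed to take the key value itself, read here from the OPENAI_API_KEY environment variable.

```python
import os
import subprocess


def build_indepth_command(api_key, adata_path, species, tissue,
                          cluster_col_name, data_name, save_path):
    """Assemble the aicat-indepth argument list.

    Passing a list to subprocess avoids shell quoting problems with
    values that contain spaces, such as the tissue name.
    """
    return [
        "aicat-indepth",
        "--openai_api_key", api_key,
        "--adata_path", adata_path,
        "--species", species,
        "--tissue", tissue,
        "--cluster_col_name", cluster_col_name,
        "--data_name", data_name,
        "--save_path", save_path,
    ]


def run_indepth(**kwargs):
    """Run aicat-indepth with the key taken from the environment."""
    api_key = os.environ["OPENAI_API_KEY"]
    cmd = build_indepth_command(api_key=api_key, **kwargs)
    # check=True raises CalledProcessError if the CLI exits with an error.
    return subprocess.run(cmd, check=True)
```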

Subcluster CLI (optional)

To perform subclustering annotation, run the following from the terminal. The argument --AnnoSingle_res_path requires the output from the previous command aicat-indepth.

aicat-subcluster \
  --openai_api_key "OPENAI_API_KEY" \
  --adata_path "tests/data/CRC_SMC05-T_processed.h5ad" \
  --tissue "primary colorectal cancer" \
  --cluster_col_name 'Cell_type' \
  --chosen_cluster "B cells" \
  --AnnoSingle_res_path "tests/res/CRC_SMC05-T/AnnoSingle_primary colorectal cancer_res_dict.json" \
  --save_path "tests/res/CRC_SMC05-T_subcluster"

Display CLI options with

aicat-subcluster --help
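
The --AnnoSingle_res_path value in the example above appears to follow the pattern <save_path>/AnnoSingle_<tissue>_res_dict.json. Assuming that pattern holds (it is inferred from this single example, not from documented behavior), a pair of hypothetical helpers can derive the path and confirm that the in-depth step has written the file before subclustering starts:

```python
from pathlib import Path


def annosingle_res_path(save_path, tissue):
    """Guess the in-depth result file path from the example's naming pattern."""
    return Path(save_path) / f"AnnoSingle_{tissue}_res_dict.json"


def require_indepth_output(save_path, tissue):
    """Return the result path, or raise if the file does not exist yet."""
    path = annosingle_res_path(save_path, tissue)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; run aicat-indepth first.")
    return path
```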

Programmatic Usage

Users can also call AICAT from Python:

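import os

from aicat.main_indepth import indepth_annotation
# subcluster_annotation must be imported as well; its module path is not
# given in this README, so check the package source.

# Example values taken from the CLI examples above
api_key = os.environ["OPENAI_API_KEY"]
adata_path = "tests/data/CRC_SMC05-T_processed.h5ad"
species = "human"
tissue = "primary colorectal cancer"
cluster_col_name = "Cell_type"
chosen_cluster = "B cells"
# JSON written by the in-depth step; adjust to where your results were saved
AnnoSingle_res_path = "tests/res/CRC_SMC05-T/AnnoSingle_primary colorectal cancer_res_dict.json"

# In-depth annotation
indepth_annotation(api_key,
                   adata_path,
                   species,
                   tissue,
                   cluster_col_name)

# Subclustering annotation (requires the output of the in-depth annotation)
subcluster_annotation(api_key,
                      adata_path,
                      tissue,
                      cluster_col_name,
                      chosen_cluster,
                      AnnoSingle_res_path)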

Output

AICAT will output the following:

  • A .json file with predicted cell types
  • The full conversation history including the prompts and GPT response content
  • A log of the full running progress
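
The structure of the prediction file is not documented here, so the sketch below only loads it with the standard json module and summarizes the top-level entries. The helper name summarize_predictions and the assumption that the file holds a JSON object at the top level are illustrative, not taken from AICAT.

```python
import json


def summarize_predictions(json_path):
    """Load a result file and return (key, value type name) pairs."""
    with open(json_path, encoding="utf-8") as handle:
        results = json.load(handle)
    if not isinstance(results, dict):
        raise TypeError("expected a JSON object at the top level")
    # Report each top-level key with the type of its value, which is
    # usually enough to see how the predictions are organized.
    return [(key, type(value).__name__) for key, value in results.items()]
```

Inspecting the keys first makes it easier to write downstream code, such as mapping predicted labels back onto the cluster column in .obs.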

Testing

Users can also run unit tests using

pytest tests/

To test a specific function manually:

python tests/test_indepth.py

Make sure OPENAI_API_KEY is set in your environment.

Dependencies

Major dependencies used in this package:

  • langchain, langchain-openai, langchain-core
  • pydantic, pydantic-settings
  • scanpy, pandas, dotenv, requests
  • argparse, json, os, re, subprocess

See pyproject.toml for full details about the package versions.

License

This project is licensed under the MIT License.

Contributions

Contributions are welcome! If you’d like to add features or fix bugs, feel free to fork the repository and open a pull request.
