
Genomic decomposition and reconstruction of non-tumor diploid subclones


CLEMENT

  • Genomic decomposition and reconstruction of non-tumor diploid subclones (2023)
  • CLonal decomposition via Expectation-Maximization algorithm established in Non-Tumor setting
  • Supports multiple diploid samples
  • Biallelic variants (homozygous, 1/1) can degrade the performance of CLEMENT.

Overview of CLEMENT workflow and core algorithms


[Figure: CLEMENT_overview]

Installation

Dependencies

  • python 3.6.x
  • matplotlib 3.5.2
  • seaborn 0.11.2
  • numpy 1.21.5
  • pandas 1.3.4
  • scikit-learn 1.0.2
  • scipy 1.7.3
  • palettable 3.3.0

Install from GitHub

  1. git clone https://github.com/Yonsei-TGIL/CLEMENT.git
    cd CLEMENT
    pip3 install .

  2. pip3 install git+https://github.com/Yonsei-TGIL/CLEMENT.git

Install from PyPI

  1. pip3 install CLEMENTDNA

Version update

1.0.11 (Jan 1st, 2024)

Input format

As of 1.0.4, CLEMENT supports only standardized TSV input. Example input files are provided in the "example" directory.

  • 1st column: mutation ID (CHR_POS is recommended)
  • 2nd column: label (answer), if available. If the label (answer) is unknown, set it to 0.
  • 3rd column: Depth1,Alt1,Depth2,Alt2,...,Depth_n,Alt_n (comma-separated; no spaces permitted)
  • 4th column: BQ1,BQ2,...,BQ_n (comma-separated; no spaces permitted). If absent, CLEMENT sets the default BQ to 20.
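The column layout above can be checked before running CLEMENT with a small parser. This is a minimal sketch, not part of CLEMENT itself: the names `VariantRow`, `parse_row`, and `vafs` are hypothetical, and it assumes the columns are tab-separated with one BQ value per sample.

```python
from typing import List, NamedTuple


class VariantRow(NamedTuple):
    """One parsed input line (hypothetical helper type, not a CLEMENT class)."""
    mutation_id: str
    label: str
    depths: List[int]
    alts: List[int]
    base_qualities: List[int]


def parse_row(line: str, default_bq: int = 20) -> VariantRow:
    """Parse one tab-separated input line into per-sample depth/alt/BQ lists."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected at least 3 tab-separated columns")
    mutation_id, label, counts = fields[0], fields[1], fields[2]
    if " " in counts:
        raise ValueError("spaces are not permitted in the 3rd column")
    values = [int(v) for v in counts.split(",")]
    if len(values) % 2 != 0:
        raise ValueError("3rd column must hold Depth,Alt pairs")
    # Even positions are depths, odd positions are alt read counts.
    depths, alts = values[0::2], values[1::2]
    if len(fields) > 3 and fields[3]:
        bqs = [int(v) for v in fields[3].split(",")]
    else:
        # Mirror the documented fallback: BQ defaults to 20 when absent.
        bqs = [default_bq] * len(depths)
    return VariantRow(mutation_id, label, depths, alts, bqs)


def vafs(row: VariantRow) -> List[float]:
    """Variant allele fraction per sample (0.0 where depth is zero)."""
    return [a / d if d else 0.0 for d, a in zip(row.depths, row.alts)]


row = parse_row("chr1_12345\t0\t100,25,80,0\n")
print(row.depths, row.alts, row.base_qualities, vafs(row))
```

The number of samples is simply the number of Depth,Alt pairs, which matches the statement that the tool detects the sample count from the input.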

Running

command line interface

CLEMENT [OPTIONS]   

options

(Mandatory) These options specify the user's input and output
	--INPUT_TSV		Input data in TSV format. The tool automatically detects the number of samples
	--CLEMENT_DIR 		Directory where the outputs of CLEMENT are saved

These options control whether the user's input is downsized
	--RANDOM_PICK 		Set this to the target size if you want to downsize the input. To disable downsizing, set -1. (default : -1).

These options select the likelihood model
	--MODEL 		Likelihood model for TP and FN in the E-step.  (default: betabinomial)
	--CONSTANT   Constant multiplier for alpha and beta in the beta-binomial distribution. (default:1)

These options adjust the E-M algorithm parameters
	--NUM_CLONE_TRIAL_START 	Minimum number of expected clusters (initial K) 	(default: 3)
	--NUM_CLONE_TRIAL_END 		Maximum number of expected clusters (final K)	 (default: 5)
	--TRIAL_NO 			Number of trials for each candidate cluster number. Values over 15 are not recommended (default: 5)
	--FP_PRIOR FP_PRIOR   		Prior of false positive (FP). Recommendation : <= 0.1. (default : 0.01)
	--TN_PRIOR TN_PRIOR   		Prior of true negative (TN). Recommendation : > 0.99. (default : 0.99)
	--KMEANS_CLUSTERNO		Number of initial K-means clusters. Recommendation : 5-8 for a single sample, 8-15 for multiple samples (default: 8)
	--MIN_CLUSTER_SIZE		Minimum acceptable cluster size. Recommendation : 1-3% of the total number of variants 	(default: 9)

Other options
	--MODE			Selection of clustering method. "Hard": hard clustering only,  "Both": both hard and soft (fuzzy) clustering (default: "Both")
	--MAKEONE_STRICT  	1: strict, 2: lenient, 3: most lenient (default : 1)
	--SCORING		True : comparing with the answer set, False : just visualization (default: False)
	

Miscellaneous
	--FONT_FAMILY		Font family used in the plots (default : "arial")
	--VISUALIZATION		Whether to produce an image at every E-M step (default: True)
	--IMAGE_FORMAT		Image format of the plots (default : jpg)
	--VERBOSE		0: no record,  1: simplified record,  2: verbose record (default: 2)
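To illustrate what a constant multiplier on alpha and beta does in a beta-binomial likelihood (the --MODEL and --CONSTANT options above), here is a minimal sketch using only the standard library. It is not CLEMENT's internal code: how CLEMENT derives alpha and beta is not stated here, so the values 2 and 6 and the function names are assumptions chosen for illustration. Scaling both parameters by the same constant keeps the mean alpha/(alpha+beta) fixed while concentrating the distribution around it.

```python
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_pmf(alt: int, depth: int, alpha: float, beta: float,
                  constant: float = 1.0) -> float:
    """Beta-binomial probability of `alt` variant reads out of `depth` reads.

    `constant` multiplies both shape parameters (illustrative assumption
    about how a multiplier such as --CONSTANT could enter the model).
    """
    a, b = constant * alpha, constant * beta
    log_choose = lgamma(depth + 1) - lgamma(alt + 1) - lgamma(depth - alt + 1)
    return exp(log_choose + log_beta(alt + a, depth - alt + b) - log_beta(a, b))


# Hypothetical parameters: mean VAF = 2 / (2 + 6) = 0.25 in both cases.
for c in (1.0, 10.0):
    p = betabinom_pmf(25, 100, alpha=2.0, beta=6.0, constant=c)
    print(f"constant={c}: P(alt=25 | depth=100) = {p:.4f}")
```

A larger constant puts more probability mass near the expected read count, so variants whose VAF deviates from a cluster centroid are penalized more strongly.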

output

${CLEMENT_DIR}"/result"

  • CLEMENT_decision: CLEMENT's best recommendation among hard and soft clustering.
  • CLEMENT_hard_1st: CLEMENT's best decomposition by hard clustering.
  • CLEMENT_hard.gapstatistics.txt: Selection of the optimal K in hard clustering based on gap* statistics.
  • CLEMENT_soft_1st: CLEMENT's best decomposition by soft (fuzzy) clustering.
  • membership.txt: Membership assignment of all variants to each cluster.
  • membership_count.txt: Count matrix of the membership assignment to each cluster.
  • mixture.txt: Centroid of each cluster.

Example

DIR=[YOUR_DIRECTORY]

# Example 1
CLEMENT \
	--INPUT_TSV ${DIR}"/example/1.SimData/SimData_1D/n500_125x/lump/0.0/clone_4/1/1.txt" \
	--CLEMENT_DIR ${DIR}"/example/1.SimData/SimData_1D/n500_125x/lump/0.0/clone_4/1" \
	--NUM_CLONE_TRIAL_START 1 \
	--NUM_CLONE_TRIAL_END 5

# Example 2
CLEMENT \
	--INPUT_TSV ${DIR}"/example/2.CellData/MRS_2D/M1-8_M2-4/M1-8_M2-4_input.txt" \
	--CLEMENT_DIR ${DIR}"/example/2.CellData/MRS_2D/M1-8_M2-4"  \
	--NUM_CLONE_TRIAL_START 2 \
	--NUM_CLONE_TRIAL_END 6 \
	--RANDOM_PICK 500
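When many inputs need to be processed, the same command lines can be assembled from Python. The sketch below only builds the argument list from the options documented above; the helper name `build_clement_command` and the placeholder paths are hypothetical, and it assumes the `CLEMENT` executable is on PATH after installation.

```python
import shlex
import subprocess  # only needed if the command is actually executed


def build_clement_command(input_tsv, clement_dir, **options):
    """Return the argv list for one CLEMENT run.

    Keyword names are upper-cased into flags, e.g. random_pick=500
    becomes "--RANDOM_PICK 500". No validation of flag names is done.
    """
    cmd = ["CLEMENT", "--INPUT_TSV", str(input_tsv), "--CLEMENT_DIR", str(clement_dir)]
    for name, value in options.items():
        cmd += ["--" + name.upper(), str(value)]
    return cmd


cmd = build_clement_command(
    "input.txt",   # placeholder input path
    "output_dir",  # placeholder output directory
    num_clone_trial_start=2,
    num_clone_trial_end=6,
    random_pick=500,
)
print(" ".join(shlex.quote(part) for part in cmd))
# To run it for real (requires CLEMENT to be installed):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.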

[Figures: example1, example2]

Contact

goldpm1@yuhs.ac
