
ChimeraLM: A genomic language model to identify chimera artifacts introduced by whole genome amplification (WGA).

ChimeraLM

A genomic language model to identify chimera artifacts introduced by whole genome amplification (WGA).

Overview

ChimeraLM is a deep learning model designed to detect artificial chimeric reads that arise during whole genome amplification processes.

Installation

Install from PyPI

pip install chimeralm

Install from Source

# Clone the repository
git clone https://github.com/ylab-hi/ChimeraLM.git
cd ChimeraLM

# Install in development mode with uv
uv sync

# Verify the installation
uv run chimeralm --version
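To confirm from Python that the package is visible in the active environment, a minimal sketch along these lines may help. It uses only the standard library. The function name `installed_version` is a hypothetical helper, and the distribution name `chimeralm` is taken from the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "chimeralm") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the result without raising if the package is missing.
print(installed_version() or "chimeralm is not installed in this environment")
```

This only checks package metadata; it does not load the model or run the CLI.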

CLI Usage

ChimeraLM provides a Python command-line interface (CLI) for detecting and filtering chimeric reads. The predict command is documented below.

Command Structure

chimeralm [OPTIONS] COMMAND [ARGS]...

Available Commands

predict - Detect Chimeric Reads

Predict chimeric reads in a BAM file using the pre-trained ChimeraLM model.

chimeralm predict [OPTIONS] DATA_PATH

Arguments:

  • DATA_PATH: Path to the input BAM file

Options:

  • -g, --gpus INTEGER: Number of GPUs to use (default: 0)
  • -o, --output PATH: Output path for predictions (default: {input}.predictions)
  • -b, --batch-size INTEGER: Batch size for processing (default: 12)
  • -w, --workers INTEGER: Number of worker threads (default: 0)
  • -v, --verbose: Enable verbose output
  • -m, --max-sample INTEGER: Maximum number of samples to process
  • -l, --limit-batches INTEGER: Limit the number of batches used for prediction
  • -p, --progress-bar: Show progress bar
  • --random-seed: Use a random seed, which makes predictions non-deterministic

Examples:

# Basic prediction on CPU
chimeralm predict input.bam

# Prediction with GPU acceleration
chimeralm predict input.bam --gpus 1 --batch-size 24

# Prediction with custom output path and progress bar
chimeralm predict input.bam --output results/ --progress-bar --verbose
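
When predictions are launched from a pipeline script rather than typed by hand, the documented flags can be assembled programmatically. The following is a sketch, not part of ChimeraLM itself: `build_predict_command` and `run_predict` are hypothetical helper names, and only flags listed under Options above are used, with their documented defaults.

```python
import shlex
import subprocess
from typing import Optional


def build_predict_command(
    bam_path: str,
    gpus: int = 0,
    batch_size: int = 12,
    workers: int = 0,
    output: Optional[str] = None,
    progress_bar: bool = False,
    verbose: bool = False,
) -> list[str]:
    """Assemble an argument list for `chimeralm predict` from the documented flags."""
    cmd = [
        "chimeralm", "predict", bam_path,
        "--gpus", str(gpus),
        "--batch-size", str(batch_size),
        "--workers", str(workers),
    ]
    if output is not None:
        cmd += ["--output", output]
    if progress_bar:
        cmd.append("--progress-bar")
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_predict(bam_path: str, **options) -> None:
    """Run the CLI; raises subprocess.CalledProcessError on a non-zero exit code."""
    subprocess.run(build_predict_command(bam_path, **options), check=True)


# Print the command instead of running it, so the sketch works without a BAM file.
print(shlex.join(build_predict_command("input.bam", gpus=1, batch_size=24)))
# chimeralm predict input.bam --gpus 1 --batch-size 24 --workers 0
```

Passing the arguments as a list (rather than a single shell string) avoids quoting problems with file names that contain spaces.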

Performance Tips

  1. GPU Usage: Pass --gpus 1 for faster processing if CUDA is available
  2. Batch Size: Increase --batch-size (e.g., 24-32) for better GPU utilization
  3. Memory: Larger batch sizes use more memory; monitor usage and lower --batch-size if it runs short
  4. Threading: Set --workers according to the number of CPU cores on your system
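
The tips above could be turned into a small helper that suggests flags for the current machine. This is a rough sketch under stated assumptions: the presence of `nvidia-smi` on the PATH is used as a proxy for a usable CUDA GPU (ChimeraLM may detect devices differently), and the function name `suggest_predict_flags`, the GPU batch size of 24, and the cap of 8 workers are illustrative choices, not project recommendations.

```python
import os
import shutil


def suggest_predict_flags(gpu_batch_size: int = 24, max_workers: int = 8) -> list[str]:
    """Suggest `chimeralm predict` flags based on the local hardware.

    Assumption: `nvidia-smi` on PATH is treated as a hint that CUDA is available.
    """
    has_gpu = shutil.which("nvidia-smi") is not None
    cpu_cores = os.cpu_count() or 1
    # Leave one core for the main process and cap the worker count.
    workers = max(0, min(cpu_cores - 1, max_workers))
    if has_gpu:
        return ["--gpus", "1", "--batch-size", str(gpu_batch_size), "--workers", str(workers)]
    # CPU mode: keep the documented defaults for --gpus and --batch-size.
    return ["--gpus", "0", "--batch-size", "12", "--workers", str(workers)]


print(" ".join(suggest_predict_flags()))
```

The returned list can be appended to `["chimeralm", "predict", "input.bam"]` before the command is run.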

Output Files

The predict command generates:

  • Prediction results in the specified output directory
  • Filtered and sorted BAM file with index (automatically created)

Troubleshooting

Common Issues:

  1. CUDA out of memory: Reduce --batch-size or use CPU mode
  2. Slow processing: Enable GPU acceleration with --gpus 1
  3. Missing dependencies: For source installs, run uv sync in the repository to install all dependencies

Debug Mode: Use the --verbose flag to get detailed logging information about the prediction process.
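
The advice to reduce --batch-size after a CUDA out-of-memory error can be automated with a simple back-off loop. The sketch below is illustrative only: `predict_with_backoff` is a hypothetical helper, it assumes the CLI exits with a non-zero status when it runs out of memory, and it does not inspect the cause of a failure, so any error triggers a retry with a smaller batch.

```python
import subprocess
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def predict_with_backoff(
    bam_path: str,
    batch_size: int = 24,
    min_batch_size: int = 1,
    runner: Callable[[Sequence[str]], int] = _run,
) -> int:
    """Retry `chimeralm predict`, halving --batch-size after each failure.

    Returns the batch size that succeeded; raises RuntimeError if none did.
    """
    floor = max(1, min_batch_size)  # guard against a floor of zero
    size = batch_size
    while size >= floor:
        cmd = [
            "chimeralm", "predict", bam_path,
            "--gpus", "1", "--batch-size", str(size), "--verbose",
        ]
        if runner(cmd) == 0:
            return size
        size //= 2  # halve and try again
    raise RuntimeError(f"prediction failed down to batch size {floor}; consider CPU mode")
```

The `runner` parameter exists so the loop can be exercised with a stand-in function before it is pointed at real data.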

Version Information

chimeralm --version

Getting Help

# General help
chimeralm --help

# Command-specific help
chimeralm predict --help

Citation

If you use ChimeraLM in your research, please cite:

@software{chimeralm2025,
  title={ChimeraLM: A genomic language model to identify chimera artifacts},
  author={Li, Yangyang and Guo, Qingxiang and Yang, Rendong},
  year={2025},
  url={https://github.com/ylab-hi/ChimeraLM}
}

License

This project is licensed under the Apache License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
