ChimeraLM: A genomic language model to identify chimera artifacts introduced by whole genome amplification (WGA).
Overview
ChimeraLM is a deep learning model designed to detect artificial chimeric reads that arise during whole genome amplification processes.
Installation
Install from PyPI
pip install chimeralm
Install from Source
# Clone the repository
git clone https://github.com/ylab-hi/ChimeraLM.git
cd ChimeraLM
# Install in development mode with uv
uv sync
uv run chimeralm --version
CLI Usage
ChimeraLM provides a Python command-line interface (CLI) for chimeric read detection and filtering; its main command, predict, is documented below.
Command Structure
chimeralm [OPTIONS] COMMAND [ARGS]...
Available Commands
predict - Detect Chimeric Reads
Predict chimeric reads in a BAM file using the pre-trained ChimeraLM model.
chimeralm predict [OPTIONS] DATA_PATH
Arguments:
DATA_PATH: Path to the input BAM file
Options:
- -g, --gpus INTEGER: Number of GPUs to use (default: 0)
- -o, --output PATH: Output path for predictions (default: {input}.predictions)
- -b, --batch-size INTEGER: Batch size for processing (default: 12)
- -w, --workers INTEGER: Number of worker threads (default: 0)
- -v, --verbose: Enable verbose output
- -m, --max-sample INTEGER: Maximum number of samples to process
- -l, --limit-batches INTEGER: Limit the number of prediction batches
- -p, --progress-bar: Show a progress bar
- --random-seed: Make predictions non-deterministic
Examples:
# Basic prediction on CPU
chimeralm predict input.bam
# Prediction with GPU acceleration
chimeralm predict input.bam --gpus 1 --batch-size 24
# Prediction with custom output path and progress bar
chimeralm predict input.bam --output results/ --progress-bar --verbose
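To call the CLI from a Python pipeline, one option is to assemble the argument list and pass it to subprocess. This is a minimal sketch, assuming the chimeralm executable is on PATH; the helpers build_predict_command and run_predict are hypothetical and use only the options listed above.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_predict_command(
    data_path: str,
    gpus: int = 0,
    batch_size: int = 12,
    workers: int = 0,
    output: Optional[str] = None,
    progress_bar: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Hypothetical helper: assemble argv for `chimeralm predict`."""
    cmd = [
        "chimeralm", "predict", str(data_path),
        "--gpus", str(gpus),
        "--batch-size", str(batch_size),
        "--workers", str(workers),
    ]
    # Only pass --output when set, so the CLI default ({input}.predictions) applies.
    if output is not None:
        cmd += ["--output", str(output)]
    if progress_bar:
        cmd.append("--progress-bar")
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_predict(data_path: str, **options) -> None:
    """Hypothetical helper: check the BAM exists, then run the CLI (raises on failure)."""
    if not Path(data_path).is_file():
        raise FileNotFoundError(data_path)
    subprocess.run(build_predict_command(data_path, **options), check=True)
```

For example, run_predict("input.bam", gpus=1, batch_size=24) mirrors the GPU example above. Keeping command construction separate from execution makes the argument list easy to log or test.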
Performance Tips
- GPU Usage: Use --gpus 1 for faster processing if CUDA is available
- Batch Size: Increase --batch-size for better GPU utilization (e.g., 24-32)
- Memory: Monitor memory usage with large batch sizes
- Threading: Adjust --workers based on your system's CPU cores
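The tips above can be turned into concrete starting values. This is a sketch under stated assumptions: detect_cuda treats the presence of nvidia-smi on PATH as a rough sign that CUDA is usable (if PyTorch is installed, torch.cuda.is_available() is a more reliable check), and the "CPU cores minus one" worker count is a hypothetical rule of thumb, not a ChimeraLM recommendation. All function names are illustrative.

```python
import os
import shutil
from typing import Dict, List, Optional


def detect_cuda() -> bool:
    """Rough heuristic (assumption): treat CUDA as available if nvidia-smi is on PATH."""
    return shutil.which("nvidia-smi") is not None


def suggest_predict_options(cuda_available: bool, cpu_count: Optional[int] = None) -> Dict[str, int]:
    """Hypothetical helper turning the performance tips into concrete values."""
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return {
        # Tip: use one GPU when CUDA is available; 0 (CPU) is the CLI default.
        "gpus": 1 if cuda_available else 0,
        # Tip: larger batches (e.g. 24-32) improve GPU utilization; 12 is the default.
        "batch_size": 24 if cuda_available else 12,
        # Assumed rule of thumb: leave one core free for the main process.
        "workers": max(0, cores - 1),
    }


def to_cli_args(options: Dict[str, int]) -> List[str]:
    """Render the suggested values as chimeralm predict flags."""
    return [
        "--gpus", str(options["gpus"]),
        "--batch-size", str(options["batch_size"]),
        "--workers", str(options["workers"]),
    ]
```

A command line could then be built as ["chimeralm", "predict", "input.bam", *to_cli_args(suggest_predict_options(detect_cuda()))]. Treat the output as a starting point and watch memory usage before raising the batch size further.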
Output Files
The predict command generates:
- Prediction results in the specified output directory
- Filtered and sorted BAM file with index (automatically created)
Troubleshooting
Common Issues:
- CUDA out of memory: Reduce --batch-size or use CPU mode (--gpus 0, the default)
- Slow processing: Enable GPU acceleration with --gpus 1
- Missing dependencies: Run uv sync to install all dependencies
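The out-of-memory advice can be automated by retrying with a smaller batch and finally falling back to CPU mode. This is a hypothetical sketch: it assumes a CUDA OOM shows up on stderr containing PyTorch's usual "CUDA out of memory" text, and predict_with_backoff is an illustrative name, not part of ChimeraLM. The runner parameter lets the subprocess call be swapped out for testing.

```python
import subprocess
from typing import Callable, List, Sequence

# Assumption: a CUDA OOM surfaces on stderr with PyTorch's usual message text.
OOM_MARKER = "CUDA out of memory"

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def default_runner(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run the command, capturing stderr so OOM errors can be detected."""
    return subprocess.run(list(cmd), capture_output=True, text=True)


def predict_with_backoff(
    data_path: str,
    batch_size: int = 24,
    min_batch_size: int = 1,
    runner: Runner = default_runner,
) -> List[str]:
    """Halve --batch-size on CUDA OOM, then fall back to CPU; return the argv that succeeded."""
    size = batch_size
    while size >= min_batch_size:
        cmd = ["chimeralm", "predict", data_path,
               "--gpus", "1", "--batch-size", str(size)]
        result = runner(cmd)
        if result.returncode == 0:
            return cmd
        if OOM_MARKER not in (result.stderr or ""):
            # Not a memory problem: shrinking the batch will not help.
            raise RuntimeError(f"chimeralm predict failed: {result.stderr}")
        size //= 2  # retry with a smaller batch
    # GPU attempts exhausted: use CPU mode (--gpus 0, the documented default).
    cmd = ["chimeralm", "predict", data_path, "--gpus", "0"]
    result = runner(cmd)
    if result.returncode != 0:
        raise RuntimeError(f"chimeralm predict failed on CPU: {result.stderr}")
    return cmd
```

Errors other than OOM are raised immediately, since a smaller batch would not fix a missing file or dependency.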
Debug Mode:
Use the --verbose flag to get detailed logging about the prediction process.
Version Information
chimeralm --version
Getting Help
# General help
chimeralm --help
# Command-specific help
chimeralm predict --help
Citation
If you use ChimeraLM in your research, please cite:
@software{chimeralm2025,
title={ChimeraLM: A genomic language model to identify chimera artifacts},
author={Li, Yangyang and Guo, Qingxiang and Yang, Rendong},
year={2025},
url={https://github.com/ylab-hi/ChimeraLM}
}
License
This project is licensed under the Apache License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Download files
File details
Details for the file chimeralm-1.0.0.tar.gz.
File metadata
- Download URL: chimeralm-1.0.0.tar.gz
- Upload date:
- Size: 12.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 223aa04cb9fe1f462ed4ce3ca2999732b8c40b71cd78743ae381bdf5145d94ef |
| MD5 | e47a2308225e2396640b374343844de4 |
| BLAKE2b-256 | a6c4918b9425625d2cc978b3922d8ffadea4cb5aa4062a23d56c0d0b39f075d1 |
File details
Details for the file chimeralm-1.0.0-py3-none-any.whl.
File metadata
- Download URL: chimeralm-1.0.0-py3-none-any.whl
- Upload date:
- Size: 57.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 47a803f16131954afa5bd4474f956fba54ad7e0cbc818aec43e068852063bf23 |
| MD5 | 5eccc52ac37e2e8d40720600cbc5a3c8 |
| BLAKE2b-256 | c2d114245d6381d660a5d144a6f587f14a894068733dd1a5bf87f1c9cef35cca |
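A downloaded file can be checked against the published SHA256 digest before installing it. This is a minimal sketch using Python's standard hashlib; the digests are copied from the file-hash tables for chimeralm 1.0.0, and the helper names sha256_of and verify_download are illustrative.

```python
import hashlib
from pathlib import Path
from typing import Optional

# SHA256 digests published for chimeralm 1.0.0.
EXPECTED_SHA256 = {
    "chimeralm-1.0.0.tar.gz": "223aa04cb9fe1f462ed4ce3ca2999732b8c40b71cd78743ae381bdf5145d94ef",
    "chimeralm-1.0.0-py3-none-any.whl": "47a803f16131954afa5bd4474f956fba54ad7e0cbc818aec43e068852063bf23",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: Optional[str] = None) -> bool:
    """Compare a file's SHA256 with an expected digest (looked up by file name if omitted)."""
    path = Path(path)
    if expected is None:
        expected = EXPECTED_SHA256[path.name]
    return sha256_of(path) == expected.lower()
```

For example, verify_download(Path("chimeralm-1.0.0.tar.gz")) returns True only if the local archive matches the published digest.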