YASIM -- Yet Another SIMulator for Alternative Splicing Events and Realistic Gene Expression Profile

Markdown compatibility guide: This file is written in MyST-flavored Markdown and may render incorrectly on the default landing page of PyPI or Git hosting services. You can preview it correctly in the generated Sphinx documentation or in Visual Studio Code with the ExecutableBookProject.myst-highlight plugin.


URLs: PYPI, GitHub, Documentation.

With the development of Third-Generation Sequencing (TGS) and related technologies, accurate quantification of transcripts at the isoform level and precise detection of novel isoforms arising from Alternative Splicing (AS) events or relocation of Transposable Elements (TEs) have become possible. YASIM is a tool that simulates Next- or Third-Generation bulk RNA-Sequencing raw FASTQ reads with ground-truth genome annotation and a realistic gene expression profile (GEP). It can be used to benchmark tools that claim to detect isoforms (e.g., StringTie) or to quantify reads at the isoform level (e.g., featureCounts).

YASIM serves several simulation purposes. For example, it can simulate a count matrix from a reference genome annotation, or simulate raw FASTQ reads from a user-provided count matrix. When combined with other tools, it can also simulate reads from a genome carrying Single Nucleotide Polymorphisms (SNPs), Insertions & Deletions (InDels), Structural Variations (SVs), and other genomic variations.

YASIM cannot generate sequencer-specific machine noise by itself. Third-party DNA- or RNA-Seq simulators (referred to as Low-Level Read Generators, LLRGs) are needed to convert cDNA sequences into reads with machine errors and quality information (except for the PacBio Sequel/Sequel II models). This design gives YASIM great flexibility over sequencer models. Currently, YASIM can simulate most Illumina NGS sequencers and most PacBio/ONT TGS sequencing platforms.

YASIM is designed to be modular: some of its modules are general-purpose and can be reused in other simulation tasks. Implemented in Python 3, YASIM follows an Object-Oriented Programming (OOP) style and can be easily extended. In theory, YASIM can run on any platform that supports Python 3. However, most LLRGs are POSIX-only (i.e., they work on GNU/Linux, macOS, and friends), so we recommend deploying this tool on a major GNU/Linux distribution such as Ubuntu, Debian, CentOS, or Fedora. Using YASIM on Microsoft Windows Subsystem for Linux (WSL), version 1 or 2, is NOT recommended: it may lead to impaired performance and cause other problems due to LLRG incompatibilities. Using YASIM on other platforms (e.g., Oracle Solaris) is neither tested nor recommended.
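The platform advice above can be sketched as a small pre-flight check. This is an illustrative helper, not part of YASIM: the function name `classify_platform` and its labels are hypothetical, and detecting WSL by looking for "microsoft" in the kernel release string is a common heuristic rather than an official test.

```python
import platform


def classify_platform(system: str, release: str) -> str:
    """Return a coarse label for a host, following the platform advice above.

    `system` and `release` are the values reported by platform.system()
    and platform.release(). Hypothetical helper, not a YASIM API.
    """
    if system == "Linux":
        # Heuristic (assumption): WSL 1 and WSL 2 kernels usually carry
        # "Microsoft" or "microsoft" in their release string.
        if "microsoft" in release.lower():
            return "wsl (not recommended)"
        return "linux (recommended)"
    if system == "Darwin":
        # macOS is POSIX, so most LLRGs are expected to build there.
        return "macos (posix)"
    return "other (untested)"


if __name__ == "__main__":
    print(classify_platform(platform.system(), platform.release()))
```

Passing the system and release strings as arguments keeps the function easy to exercise with values from machines other than the current one.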

Installation

Using the Pre-Built Version from PYPI

You need a working Python interpreter (CPython implementation) >= 3.7 (3.8 recommended) and the latest pip to install this software from PyPI. Command:

pip install yasim==3.2.1

We recommend using this application inside a virtual environment managed by venv, virtualenv, pipenv, conda, or poetry.
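After installing, it may help to confirm that the interpreter meets the stated minimum and that the expected release is present in the active environment. The following is a minimal sketch using only the standard library; the helper names `python_is_supported` and `installed_version` are hypothetical, and the distribution name `yasim` is taken from the pip command above.

```python
import sys
from typing import Optional, Tuple

MINIMUM_PYTHON = (3, 7)  # minimum interpreter version stated above


def python_is_supported(version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Return True if `version` (major, minor) meets the stated minimum."""
    return version >= MINIMUM_PYTHON


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if unknown."""
    try:
        # importlib.metadata is in the standard library from Python 3.8 on;
        # on 3.7 the version cannot be determined this way.
        from importlib import metadata
    except ImportError:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("yasim version:", installed_version("yasim"))
```

Running this inside the virtual environment you installed into shows whether that environment, rather than the system interpreter, holds the package.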

Build from Source

Before building from the source, get a copy of the latest source code from https://github.com/WanluLiuLab/yasim using Git:

git clone https://github.com/WanluLiuLab/yasim

Or, if you prefer GNU Wget:

wget -O yasim-master.zip https://github.com/WanluLiuLab/yasim/archive/refs/heads/master.zip
unzip yasim-master.zip

You need a Python interpreter (CPython implementation) >= 3.7, the latest PyPA build, and setuptools to build this software. We recommend building it in a virtual environment provided by virtualenv or a similar tool.
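Whether the build prerequisites are importable in the current environment can be checked before starting. The sketch below assumes the import names `build` (for PyPA build) and `setuptools`; the helper `missing_modules` is hypothetical and only tests importability, not versions.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["build", "setuptools"])
    if missing:
        print("Missing build prerequisites:", ", ".join(missing))
    else:
        print("All build prerequisites are importable.")
```

If something is reported missing, installing it with pip inside the same virtual environment is the usual fix.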

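Build and install the simulator using the commands below. If you downloaded the ZIP archive instead of cloning, the unpacked directory is probably named yasim-master rather than yasim.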

cd yasim
python3 -m build
pip install dist/yasim-3.2.1-py3-none-any.whl

Installation of Third-Party Programs

For TGS LLRGs:

  • PBSIM, which simulates PacBio RS C1 and C2 chemistries, with CCS support.
  • PBSIM2, which simulates PacBio RS II P4C2, P5C3, and P6C4 chemistries (CLR only) and ONT R9.4, R9.5, and R10.3 pores.
  • PBSIM3, which simulates PacBio RS II and Sequel models, with CCS support.
  • BadRead, which simulates arbitrary PacBio and ONT models.

For NGS LLRGs:

  • ART, which simulates various Illumina NGS platforms, namely GenomeAnalyzer I, GenomeAnalyzer II, HiSeq 1000, HiSeq 2000, HiSeq 2500, HiSeqX PCR free, HiSeqX TruSeq, MiniSeq TruSeq, MiSeq v3, and NextSeq500 v2.
  • DWGSIM, which simulates arbitrary Illumina models.

Refer to the LLRG tutorial for detailed guidance on using these tools.
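Once the LLRGs are installed, a quick way to see which of them are reachable is to look them up on `PATH`. The sketch below is only illustrative: the executable names in `ASSUMED_LLRG_EXECUTABLES` are assumptions about how each tool is commonly installed (several PBSIM generations may share one binary name), so adjust them to match your system and the LLRG tutorial.

```python
import shutil
from typing import Dict, Iterable, Optional

# Assumed executable names, not confirmed by this document; edit as needed.
ASSUMED_LLRG_EXECUTABLES = ("pbsim", "badread", "art_illumina", "dwgsim")


def locate_executables(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_executables(ASSUMED_LLRG_EXECUTABLES).items():
        print(f"{name}: {path if path else 'not found on PATH'}")
```

You only need the LLRGs for the sequencer models you intend to simulate, so a missing entry is not necessarily a problem.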
