Post-process *CLIP reads after alignment. A replacement for htseq-clip.

Project description

Shoji

Shoji is a flexible command-line toolset for the analysis of iCLIP and eCLIP sequencing data. It is designed as a replacement for htseq-clip, providing streamlined workflows for annotation parsing, crosslink site extraction, counting, and matrix generation.

Features

  • Annotation Parsing: Extract and flatten features from GFF3 files to BED format.
  • Sliding Window Generation: Create sliding windows over genomic annotations for downstream analysis.
  • Crosslink Extraction: Extract crosslink sites from BAM files with flexible options for site type, mate, and filtering.
  • Counting: Count crosslink sites per window and output results in Apache Parquet format.
  • Matrix Creation: Aggregate counts across samples into R-friendly matrices (CSV or Parquet).
  • Tabix Conversion: Convert BED files to bgzipped, tabix-indexed format for efficient querying.
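
The sliding window and counting steps listed above can be illustrated with a minimal Python sketch. This is not Shoji's implementation: the function names, the window size of 50, and the step of 20 are assumptions chosen for illustration, and coordinates are treated as BED-style, 0-based and half-open.

```python
from bisect import bisect_left


def sliding_windows(start, end, size=50, step=20):
    """Yield half-open (start, end) windows covering the feature [start, end).

    size and step are illustrative values, not Shoji defaults.
    """
    pos = start
    while pos < end:
        # Clip the last window at the feature end.
        yield (pos, min(pos + size, end))
        if pos + size >= end:
            break
        pos += step


def count_sites(windows, sites):
    """Count crosslink positions that fall inside each half-open window."""
    ordered = sorted(sites)
    return [
        bisect_left(ordered, w_end) - bisect_left(ordered, w_start)
        for w_start, w_end in windows
    ]


if __name__ == "__main__":
    windows = list(sliding_windows(100, 200))
    crosslinks = [105, 125, 125, 160, 199]  # made-up positions
    for window, n in zip(windows, count_sites(windows, crosslinks)):
        print(window, n)
```

Because windows overlap, a single crosslink site can contribute to the count of more than one window.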

Major differences to htseq-clip

  • No --splitExons flag: Shoji cannot split exons into components.
  • New --split-intron flag: if an intron overlaps an exon from another gene, this flag splits the intron into non-overlapping chunks.
  • Piping output is disabled. Output file names MUST be specified.
  • Output of the count function is available only in Apache Parquet format.
  • createMatrix does not write duplicate windows by default. If adjacent overlapping windows have the same crosslink counts across all samples, it writes only the most 5' window (relative to strand) to the output file.
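
The duplicate-window rule described for createMatrix can be sketched in Python as follows. This is one reading of the rule, not Shoji's code: the `Window` type and `drop_duplicate_windows` function are hypothetical, and the sketch assumes all windows passed in belong to one feature on one strand.

```python
from typing import NamedTuple


class Window(NamedTuple):
    start: int      # 0-based, inclusive
    end: int        # exclusive
    strand: str     # "+" or "-"
    counts: tuple   # crosslink counts, one entry per sample


def drop_duplicate_windows(windows):
    """Collapse runs of overlapping windows with identical counts.

    Within each run only the most 5' window is kept: the lowest
    coordinate on the "+" strand, the highest on the "-" strand.
    """
    runs = []
    for win in sorted(windows, key=lambda w: (w.start, w.end)):
        prev = runs[-1][-1] if runs else None
        same_run = (
            prev is not None
            and win.strand == prev.strand
            and win.start < prev.end        # overlaps the previous window
            and win.counts == prev.counts   # same counts in every sample
        )
        if same_run:
            runs[-1].append(win)
        else:
            runs.append([win])
    return [run[-1] if run[0].strand == "-" else run[0] for run in runs]


if __name__ == "__main__":
    example = [
        Window(0, 50, "+", (2, 1)),
        Window(20, 70, "+", (2, 1)),   # duplicate of the window before it
        Window(40, 90, "+", (0, 1)),
    ]
    for win in drop_duplicate_windows(example):
        print(win)
```

Windows that merely touch (one ends where the next starts) are not treated as overlapping in this sketch, so both are kept even when their counts match.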

Download files

Source Distribution

shoji-0.31.0.tar.gz (38.3 kB view details)

Uploaded Source

Built Distribution

shoji-0.31.0-py3-none-any.whl (46.3 kB view details)

Uploaded Python 3

File details

Details for the file shoji-0.31.0.tar.gz.

File metadata

  • Download URL: shoji-0.31.0.tar.gz
  • Upload date:
  • Size: 38.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.12.3 Linux/6.11.0-29-generic

File hashes

Hashes for shoji-0.31.0.tar.gz
Algorithm Hash digest
SHA256 89628872a9756553e80b38e17266c8217c6f15816d36f4cd4838199518af04c3
MD5 1d9c30183a934e80ddd92ef186bcca7b
BLAKE2b-256 409abb410b513501b8bc48debf6571ad8f2a0d5ff0046c0a6803a27166bb11f0

File details

Details for the file shoji-0.31.0-py3-none-any.whl.

File metadata

  • Download URL: shoji-0.31.0-py3-none-any.whl
  • Upload date:
  • Size: 46.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.12.3 Linux/6.11.0-29-generic

File hashes

Hashes for shoji-0.31.0-py3-none-any.whl
Algorithm Hash digest
SHA256 628c4f15c373ac3c9554a708f01c899698a8d8ab4b1f6d9d274ec099dc55a766
MD5 470a71b24ec09de0bd1ed88ce5c66cfa
BLAKE2b-256 21f3a0b84745f10aee386518ad07be970236db71871054a73d2c59c81a215321
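
A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib` module. The helper names below are illustrative, and the file path is whatever location the file was saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, calling `matches_published_hash` with the path of the downloaded `shoji-0.31.0.tar.gz` and the SHA256 value listed for that file should return `True` if the download is intact.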
