BETA: Binding and Expression Target Analysis - Integrative analysis of ChIP-seq and RNA-seq data
BETA2: Binding and Expression Target Analysis
BETA is a computational tool for integrative analysis of ChIP-seq and RNA-seq/microarray data to predict transcription factor (TF) direct target genes and identify whether the TF primarily functions as a transcriptional activator or repressor.
Overview
BETA integrates ChIP-seq binding data with differential gene expression data to:
- Predict direct target genes by combining:
  - Regulatory potential scores based on distance to transcription start sites (TSS)
  - Binding peak signals
  - Gene expression changes
- Infer TF function as activator or repressor through:
  - Regulatory potential analysis
  - Gene set enrichment analysis
- Identify enriched motifs in target regions (optional)
Key Features
- Integrative Analysis: Combines ChIP-seq peaks with gene expression data
- Regulatory Potential Scoring: Distance-weighted scoring system
- Statistical Assessment: Kolmogorov-Smirnov test and permutation-based FDR
- Motif Analysis: Optional motif scanning and enrichment analysis
- Multiple Input Formats: Supports LIMMA, Cuffdiff, and custom formats
- Genome Support: Human (hg38, hg19, hg18) and Mouse (mm10, mm9)
Installation
Requirements
- Python 3.8 or higher
- C compiler (gcc) for motif scanning module
From Source
git clone https://github.com/yourusername/BETA2.git
cd BETA2
pip install -e .
Using pip (when available)
pip install beta-binding-analysis
Quick Start
Basic Analysis
Predict TF target genes and function (activator/repressor):
beta basic \
-p peaks.bed \
-e diff_expr.txt \
-k LIM \
-g hg38 \
-n my_experiment \
-o output_dir
Plus Mode (with Motif Analysis)
Include motif analysis:
beta plus \
-p peaks.bed \
-e diff_expr.txt \
-k LIM \
-g hg38 \
--gs hg38.fa \
-n my_experiment \
-o output_dir
Minus Mode (Peaks Only)
Analyze binding data without expression data:
beta minus \
-p peaks.bed \
-g hg38 \
-n my_experiment \
-o output_dir
Input Files
ChIP-seq Peaks (required)
BED format file (3 or 5 columns):
chr1 1000 2000
chr1 5000 6000 peak1 100
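To check a peak file before running BETA, the two layouts shown above can be validated with a short script. This is a minimal sketch, not BETA's own parser: the names Peak, parse_bed_line, and parse_bed are hypothetical, and the handling of comment, track, and browser lines is an assumption.

```python
from typing import List, NamedTuple, Optional


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    name: Optional[str] = None     # only present in 5-column files
    score: Optional[float] = None  # only present in 5-column files


def parse_bed_line(line: str) -> Optional[Peak]:
    """Parse one 3- or 5-column BED line; return None for blank or header lines."""
    line = line.strip()
    if not line or line.startswith(("#", "track", "browser")):
        return None
    fields = line.split()
    if len(fields) not in (3, 5):
        raise ValueError(f"expected 3 or 5 columns, got {len(fields)}: {line!r}")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if start < 0 or end <= start:
        raise ValueError(f"invalid interval: {line!r}")
    if len(fields) == 3:
        return Peak(chrom, start, end)
    return Peak(chrom, start, end, fields[3], float(fields[4]))


def parse_bed(text: str) -> List[Peak]:
    """Parse BED-formatted text into a list of peaks, skipping header lines."""
    parsed = (parse_bed_line(line) for line in text.splitlines())
    return [peak for peak in parsed if peak is not None]


peaks = parse_bed("chr1 1000 2000\nchr1 5000 6000 peak1 100\n")
```

Rejecting malformed lines early gives a clearer error than a failure partway through an analysis run.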
Differential Expression (required for basic/plus modes)
Supported formats:
- LIMMA (-k LIM): Standard LIMMA output
- Cuffdiff (-k CUF): Cuffdiff gene_exp.diff format
- BETA Standard Format (-k BSF):
  GeneSymbol log2FoldChange FDR
  TP53 2.5 0.001
  MYC -1.8 0.01
- Other (-k O): Specify columns with --info
Genome Sequence (for plus mode)
FASTA format genome sequence file (required for motif analysis)
Output Files
Basic Mode
- {name}_targets.txt: Predicted target genes with statistics
- {name}_uptarget.txt: Up-regulated targets
- {name}_downtarget.txt: Down-regulated targets
- {name}_function.pdf: TF function prediction plot
Plus Mode
Additional files:
- {name}_motif.html: Motif enrichment results
- {name}_motif_logo/: Motif logos
- Motif scanning results
Algorithm
Regulatory Potential Score
For each gene, BETA calculates a regulatory potential score based on nearby binding peaks:
Score = Σ exp(-0.5 - 4 × distance/max_distance)
Here, distance is the distance from a peak center to the gene's TSS, and the sum runs over the peaks near that gene.
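The formula above can be sketched in Python as follows. This is an illustration rather than BETA's own code: the function name regulatory_potential is hypothetical, all peaks are assumed to lie on the same chromosome as the gene, peaks farther than max_distance from the TSS are assumed to contribute nothing, and max_distance is assumed to correspond to the -d option (default 100000).

```python
import math
from typing import Iterable


def regulatory_potential(
    tss: int,
    peak_centers: Iterable[int],
    max_distance: int = 100_000,
) -> float:
    """Sum exp(-0.5 - 4 * distance / max_distance) over peaks near the TSS."""
    score = 0.0
    for center in peak_centers:
        distance = abs(center - tss)
        if distance <= max_distance:  # assumption: more distant peaks are ignored
            score += math.exp(-0.5 - 4.0 * distance / max_distance)
    return score


# A peak directly on the TSS contributes exp(-0.5); one at max_distance, exp(-4.5).
example_score = regulatory_potential(tss=50_000, peak_centers=[50_000, 150_000, 400_000])
```

Under this weighting, a peak at the TSS counts about 55 times more than one at the edge of the window (exp(4) ≈ 54.6), so nearby binding dominates the score.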
Target Prediction
- Rank genes by regulatory potential
- Rank genes by differential expression
- Combine rankings using Kolmogorov-Smirnov test
- Calculate FDR through permutation testing
Function Prediction
Assess enrichment of up-regulated vs down-regulated genes among predicted targets using one-sided KS test.
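A one-sided two-sample KS test can be sketched with the standard library alone. The example below is an assumption-laden illustration, not BETA's implementation: the function name ks_greater is hypothetical, the choice to compare regulatory potential scores of a gene group (for example, up-regulated genes) against a background of unchanged genes is an assumption, and the p-value uses the leading-order large-sample approximation rather than an exact distribution.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def ks_greater(group: Sequence[float], background: Sequence[float]) -> Tuple[float, float]:
    """One-sided two-sample KS test.

    Tests whether `group` is shifted toward larger values than `background`.
    Returns (D, p), where D = max over x of [F_background(x) - F_group(x)]
    and p is the asymptotic approximation exp(-2 * D^2 * n * m / (n + m)).
    """
    g, b = sorted(group), sorted(background)
    n, m = len(g), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Empirical CDFs are step functions, so checking every observed value suffices.
    for x in g + b:
        f_group = bisect_right(g, x) / n
        f_background = bisect_right(b, x) / m
        d = max(d, f_background - f_group)
    p = math.exp(-2.0 * d * d * n * m / (n + m))
    return d, p


# Toy scores: the group has uniformly higher values than the background.
d_stat, p_value = ks_greater([5.0, 6.0, 7.0, 8.0], [1.0, 2.0, 3.0, 4.0])
```

Running the same test separately for up-regulated and down-regulated genes, and comparing the two p-values, is one way to read activating versus repressive function from such scores.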
Command-line Options
Common Options
| Option | Description | Default |
|---|---|---|
| -p, --peakfile | ChIP-seq peaks (BED format) | Required |
| -g, --genome | Genome assembly (hg38/hg19/hg18/mm10/mm9) | Required |
| -n, --name | Output prefix | "NA" |
| -o, --output | Output directory | Current directory |
| -d, --distance | Distance from TSS (bp) | 100000 |
| --pn | Number of peaks to consider | 10000 |
Expression Options
| Option | Description |
|---|---|
| -e, --diff_expr | Differential expression file |
| -k, --kind | Expression file format (LIM/CUF/BSF/O) |
| --info | Column specification (for -k O) |
| --df | FDR threshold |
| --da | Top genes to consider (fraction or number) |
Advanced Options
| Option | Description |
|---|---|
| --method | Scoring method (score/distance) |
| -c, --cutoff | P-value cutoff for targets |
| --bl | Use CTCF boundary filtering |
| --gname2 | Gene IDs are gene symbols |
Examples
Example 1: Basic TF Analysis (hg38)
beta basic \
-p ERalpha_peaks.bed \
-e ERalpha_treatment_vs_control.txt \
-k LIM \
-g hg38 \
-n ERalpha \
-d 100000 \
-c 0.001
Example 2: With Custom Expression Format
beta basic \
-p TF_peaks.bed \
-e expression.txt \
-k O \
--info 1,3,7 \
-g hg38 \
-n TF_experiment
(Column 1: gene ID, Column 3: log2FC, Column 7: FDR)
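To preview what a column specification such as 1,3,7 would select, the same 1-based indexing can be applied to a file by hand. This is a sketch only: the function name select_columns is hypothetical, the tab delimiter is an assumption, and skipping rows with non-numeric values (such as a header row or NA entries) is an illustrative choice, not documented BETA behavior.

```python
import csv
import io
from typing import Iterator, Tuple


def select_columns(
    text: str,
    info: str = "1,3,7",
    delimiter: str = "\t",
) -> Iterator[Tuple[str, float, float]]:
    """Yield (gene ID, log2FC, FDR) using 1-based column numbers from `info`."""
    gene_col, fc_col, fdr_col = (int(i) - 1 for i in info.split(","))
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row or row[0].startswith("#"):
            continue
        try:
            record = (row[gene_col], float(row[fc_col]), float(row[fdr_col]))
        except ValueError:
            continue  # header row or non-numeric value such as "NA"
        yield record


table = (
    "gene\tbaseMean\tlog2FC\tlfcSE\tstat\tpvalue\tFDR\n"
    "TP53\t120.0\t2.5\t0.3\t8.3\t1e-5\t0.001\n"
    "MYC\t300.0\t-1.8\t0.4\t-4.5\t1e-4\t0.01\n"
)
records = list(select_columns(table, info="1,3,7"))
```

The column names in the toy table are made up for the example; only the positions matter to the column specification.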
Example 3: Mouse Analysis with Motif Scanning
beta plus \
-p mm10_peaks.bed \
-e mm10_expression.txt \
-k CUF \
-g mm10 \
--gs mm10.fa \
-n mouse_TF \
--mn 20
Migration from BETA 1.x
This is a modernized Python 3 version of BETA. Key changes:
- Python 3.8+ required (was Python 2.6/2.7)
- Default genome: hg38 (was hg19)
- Improved performance: Optimized algorithms
- Better logging: Structured logging instead of print statements
- Modern packaging: Uses pyproject.toml and pip installable
Command Compatibility
All BETA 1.x commands should work with BETA 2.0 without changes. However, you may need to update:
- Genome references to hg38
- Python environment to 3.8+
Reference Data
BETA includes reference gene annotations for:
- Human: hg38 (default), hg19, hg18
- Mouse: mm10, mm9
Default CTCF boundary data are included for hg19 and mm9 only.
Custom Genomes
For other genome assemblies, provide your own reference:
beta basic \
-p peaks.bed \
-e expression.txt \
-k LIM \
-r custom_refseq.txt \
-n experiment
RefSeq format: tab-delimited with columns:
bin name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds score name2 ...
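When preparing a custom reference, it can help to confirm that each row yields a sensible TSS. The sketch below reads one row in the column order listed above and takes txStart as the TSS on the + strand and txEnd on the - strand, which is the usual convention for this table layout. The names Gene and parse_refseq_line are hypothetical, the example row uses made-up values, and any coordinate adjustment BETA may apply internally is not covered.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    transcript_id: str  # "name" column
    symbol: str         # "name2" column
    chrom: str
    strand: str
    tss: int


def parse_refseq_line(line: str) -> Gene:
    """Parse one tab-delimited row: bin, name, chrom, strand, txStart, txEnd, ..., name2."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 13:
        raise ValueError(f"expected at least 13 columns, got {len(fields)}")
    name, chrom, strand = fields[1], fields[2], fields[3]
    tx_start, tx_end = int(fields[4]), int(fields[5])
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {strand!r}")
    # Transcription starts at txStart on the + strand and at txEnd on the - strand.
    tss = tx_start if strand == "+" else tx_end
    return Gene(name, fields[12], chrom, strand, tss)


# Made-up row for illustration only.
row = "585\tNM_EXAMPLE\tchr1\t-\t1000\t5000\t1200\t4800\t2\t1000,3000,\t2000,5000,\t0\tGENE1"
gene = parse_refseq_line(row)
```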
Citation
If you use BETA in your research, please cite:
Wang S, Sun H, Ma J, et al. Target analysis by integration of transcriptome and ChIP-seq data with BETA. Nature Protocols. 2013;8(12):2502-2515. doi:10.1038/nprot.2013.150
License
BETA is distributed under the Artistic License 2.0.
Support
- Documentation: http://cistrome.org/BETA/tutorial.html
- Issues: https://github.com/yourusername/BETA2/issues
- Original paper: https://doi.org/10.1038/nprot.2013.150
Authors
- Original Author: Su Wang (wangsu0623atgmail.com)
- Python 3 Port: Tommy Tang (tangming2005atgmail.com)
Changelog
Version 2.0.0 (2025)
- Python 3.8+ support (dropped Python 2)
- Modern project structure and packaging
- Default genome changed to hg38
- Improved code quality and type hints
- Enhanced logging and error handling
- Updated dependencies
- Performance optimizations
Version 1.0.7 (2015)
- Original Python 2 version
- Basic, plus, and minus modes
- Support for hg18, hg19, mm9, mm10