EMD-guided masked autoencoder for chromatin interaction map restoration
EMMA
EMMA is an EMD-guided (empirical mode decomposition) restoration toolkit for chromatin interaction maps. It restores missing or low-quality genomic-bin regions in Hi-C, Pore-C, and other contact-like 3D genome matrices by combining distance-diagonal signal decomposition, masked IMF (intrinsic mode function) autoencoder correction, and mode-weighted reconstruction.
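One way to read "distance-diagonal signal decomposition" is that each diagonal of the contact matrix, that is, all bin pairs at one fixed genomic distance, is handled as a one-dimensional signal. The NumPy sketch below only illustrates that view. The helper names `diagonal_signal` and `write_diagonal` are hypothetical and are not part of EMMA's API.

```python
import numpy as np

def diagonal_signal(matrix, offset):
    """Return the contacts at a fixed bin distance `offset` as a 1-D signal."""
    return np.diagonal(matrix, offset=offset).copy()

def write_diagonal(matrix, offset, signal):
    """Write a 1-D signal back to both symmetric diagonals at `offset` (in place)."""
    n = matrix.shape[0]
    idx = np.arange(n - offset)
    matrix[idx, idx + offset] = signal
    matrix[idx + offset, idx] = signal
    return matrix

# Toy symmetric matrix standing in for a contact map.
toy = np.arange(16, dtype=float).reshape(4, 4)
toy = (toy + toy.T) / 2
sig = diagonal_signal(toy, 1)  # contacts between neighbouring bins
```

Each such signal could then be decomposed, corrected, and written back independently of the other distances.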
What EMMA Does
EMMA supports three common workflows:
- Restore known missing regions from a user-provided mask.
- Automatically detect low-coverage or missing genomic bins, then restore them.
- Reconstruct or lightly enhance a contact matrix without an explicit missing mask.
The restore mode keeps observed entries unchanged and only replaces entries marked by the imputation mask.
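The rule above can be sketched in a few lines of NumPy. This is an illustration of the stated behaviour, not EMMA's internal code, and `merge_restored` is a hypothetical helper name.

```python
import numpy as np

def merge_restored(observed, prediction, mask):
    """Keep observed entries; take the prediction only where mask is True."""
    return np.where(mask, prediction, observed)

observed = np.array([[5.0, 0.0], [0.0, 3.0]])
prediction = np.array([[4.0, 1.5], [1.5, 2.0]])
mask = np.array([[False, True], [True, False]])  # True marks entries to impute

restored = merge_restored(observed, prediction, mask)
```

Here the diagonal entries keep their observed values even though the prediction differs, while the two masked off-diagonal entries are filled from the prediction.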
Installation
Option A. Install From PyPI
After the package is released to PyPI:
pip install emma-3dgenome
Option B. Install Directly From GitHub
pip install "git+https://github.com/ydduanran/EMMA.git"
Option C. Install From Local Source
Clone the repository and install it locally:
git clone https://github.com/ydduanran/EMMA.git
cd EMMA
pip install .
For editable development:
pip install -e ".[dev]"
If your machine has no internet access but already has the required build tools installed, use:
pip install -e . --no-build-isolation
Core dependencies include numpy, scipy, torch, cooler, EMD-signal, scikit-learn, and scikit-image.
Quick Start
1. Restore With A BED Missing-Region File
Use this when you know which genomic intervals need imputation.
emma restore sample.mcool \
    --resolution 10000 \
    --chrom chr2 \
    --mask-regions missing_regions.bed \
    --output emma_out/
missing_regions.bed should use three columns:
chrom start end
Example:
chr2 2000000 2250000
chr2 7600000 7900000
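To see how such intervals map onto matrix bins, the sketch below converts BED lines into a per-bin flag and then into a square mask. It assumes half-open BED intervals and that every entry in a flagged row or column is imputed; both function names are hypothetical, and EMMA's own `load_mask_regions` may behave differently.

```python
import numpy as np

def bed_to_bin_mask(bed_lines, chrom, resolution, n_bins):
    """Mark every bin that overlaps a BED interval on `chrom`."""
    missing = np.zeros(n_bins, dtype=bool)
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] != chrom:
            continue  # skip headers, blank lines, and other chromosomes
        start, end = int(fields[1]), int(fields[2])
        first = start // resolution
        last = min(n_bins, -(-end // resolution))  # ceiling division; BED end is exclusive
        missing[first:last] = True
    return missing

def bins_to_matrix_mask(missing_bins):
    """Flag every matrix entry whose row or column is a missing bin (assumption)."""
    return missing_bins[:, None] | missing_bins[None, :]

bed = ["chr2 2000000 2250000", "chr2 7600000 7900000"]
bins = bed_to_bin_mask(bed, "chr2", resolution=10000, n_bins=1000)
mask = bins_to_matrix_mask(bins)
```

At 10 kb resolution the two example intervals cover bins 200 to 224 and 760 to 789.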
2. Restore With A Boolean Matrix Mask
Use this when you already have a square boolean mask where True marks entries to impute.
emma restore sample.npy \
    --mask mask.npy \
    --output emma_out/
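If you do not have a mask file yet, a matching pair of inputs can be written with NumPy. The sketch below is only an illustration: the matrix is synthetic, the choice of bins 100 to 119 is arbitrary, and marking whole rows and columns is an assumption about what you want imputed.

```python
import os
import tempfile

import numpy as np

n_bins = 500
rng = np.random.default_rng(0)
matrix = rng.poisson(2.0, size=(n_bins, n_bins)).astype(float)
matrix = (matrix + matrix.T) / 2  # contact matrices are symmetric

# Hypothetical example: every entry touching bins 100-119 should be imputed.
mask = np.zeros((n_bins, n_bins), dtype=bool)
mask[100:120, :] = True
mask[:, 100:120] = True

# A temporary directory keeps the example self-contained; use any path you like.
out_dir = tempfile.mkdtemp()
np.save(os.path.join(out_dir, "sample.npy"), matrix)
np.save(os.path.join(out_dir, "mask.npy"), mask)
```

The mask must have the same square shape as the matrix and a boolean dtype.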
3. Auto-Detect Missing Bins And Restore
Use this when missing or low-coverage bins are not known in advance.
emma restore sample.mcool \
    --resolution 10000 \
    --chrom chr2 \
    --auto-mask \
    --auto-mask-mode balanced \
    --output emma_auto_out/
Available auto-mask modes:
- conservative
- balanced
- aggressive
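This README does not describe how the auto-mask modes differ internally. As a rough intuition only, the sketch below flags bins whose total coverage falls below a mode-dependent fraction of the median. The thresholds and the function name are invented for illustration and are not EMMA's actual settings.

```python
import numpy as np

# Hypothetical thresholds: fraction of the median coverage below which a bin
# is flagged. These numbers are illustrative, not EMMA's defaults.
THRESHOLDS = {"conservative": 0.05, "balanced": 0.20, "aggressive": 0.40}

def detect_low_coverage_bins(matrix, mode="balanced"):
    """Flag bins whose total contact count is far below the median of covered bins."""
    coverage = np.nansum(matrix, axis=0)
    covered = coverage[coverage > 0]
    if covered.size == 0:
        return np.ones(matrix.shape[0], dtype=bool)
    return coverage < THRESHOLDS[mode] * np.median(covered)

rng = np.random.default_rng(0)
toy = rng.poisson(5.0, size=(50, 50)).astype(float)
toy = toy + toy.T
toy[10, :] = toy[:, 10] = 0.0  # a fully missing bin
toy[20, :] *= 0.1              # a low-coverage bin
toy[:, 20] *= 0.1

flagged = {mode: detect_low_coverage_bins(toy, mode) for mode in THRESHOLDS}
```

In this toy setup the empty bin is flagged in every mode, while the low-coverage bin is flagged only by the two less strict modes.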
You can exclude assembly gaps, centromeres, telomeres, blacklist regions, or other regions that should not be imputed:
emma restore sample.mcool \
    --resolution 10000 \
    --chrom chr2 \
    --auto-mask \
    --exclude-bed hg38_exclude_regions.bed \
    --output emma_auto_out/
4. Reconstruct Without Explicit Missing Regions
Use this for conservative EMMA-style matrix reconstruction or enhancement.
emma reconstruct sample.mcool \
    --resolution 10000 \
    --chrom chr2 \
    --mode conservative \
    --blend 0.2 \
    --output reconstructed_out/
Modes:
- conservative: lightly blends the reconstruction with the original matrix.
- full: uses the reconstructed matrix directly.
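The difference between the two modes can be pictured as a linear blend. The sketch below assumes that `--blend` is the weight given to the reconstruction, so 0.2 means 80% original and 20% reconstruction. That reading is an assumption, and `blend_reconstruction` is a hypothetical helper rather than an EMMA function.

```python
import numpy as np

def blend_reconstruction(original, reconstructed, mode="conservative", blend=0.2):
    """Assumed semantics: `blend` is the weight given to the reconstruction."""
    if mode == "full":
        return reconstructed.copy()
    if mode == "conservative":
        return (1.0 - blend) * original + blend * reconstructed
    raise ValueError(f"unknown mode: {mode}")

original = np.array([[10.0, 2.0], [2.0, 8.0]])
reconstructed = np.array([[9.0, 3.0], [3.0, 8.0]])

light = blend_reconstruction(original, reconstructed, "conservative", 0.2)
full = blend_reconstruction(original, reconstructed, "full")
```

With a blend of 0.2 each entry moves one fifth of the way from the original value toward the reconstructed one.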
Input Formats
EMMA currently supports:
- .cool
- .mcool
- .npy
- .npz
Rules:
- .mcool requires --resolution.
- .cool and .mcool require --chrom.
- .npy should contain a square contact matrix.
- .npz reads the key matrix if present; otherwise it reads the first array. Use --key to choose a specific array.
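The .npz key rule can be mirrored with plain NumPy if you want to check beforehand which array would be picked up. This is a sketch of the stated rule, not EMMA's loader, and `read_npz_matrix` is a hypothetical name.

```python
import io

import numpy as np

def read_npz_matrix(path_or_file, key=None):
    """Pick `key` if given, else 'matrix' if present, else the first array."""
    with np.load(path_or_file) as archive:
        if key is None:
            key = "matrix" if "matrix" in archive.files else archive.files[0]
        matrix = archive[key]
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        raise ValueError(f"array {key!r} is not a square matrix: {matrix.shape}")
    return matrix

# In-memory archive with two arrays; a file path works the same way.
buffer = io.BytesIO()
np.savez(buffer, raw=np.zeros((3, 3)), matrix=np.eye(4))
buffer.seek(0)
loaded = read_npz_matrix(buffer)  # picks the array stored under "matrix"
```

Passing `key="raw"` would select the other array instead, which corresponds to using --key on the command line.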
Output Files
emma restore writes:
restored.npy
prediction_only.npy
masked_input.npy
mask.npy
mask_regions.bed
config.json
report.json
diag_stats.json
log.txt
emma detect writes:
mask.npy
detected_missing_bins.tsv
detected_missing_regions.bed
excluded_bins.tsv
auto_mask_diagnostics.tsv
report.json
emma reconstruct writes:
reconstructed.npy
difference.npy
config.json
report.json
diag_stats.json
log.txt
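The array and JSON outputs can be inspected with NumPy and the standard library. The sketch below loads the files listed for emma restore and checks that entries outside the mask were left untouched. It assumes that the four .npy files share one shape and that masked_input.npy equals the input outside the mask; the helper names are hypothetical.

```python
import json
import os

import numpy as np

def load_restore_outputs(out_dir):
    """Load the arrays and the JSON report written by `emma restore`."""
    outputs = {
        name: np.load(os.path.join(out_dir, f"{name}.npy"))
        for name in ("restored", "prediction_only", "masked_input", "mask")
    }
    with open(os.path.join(out_dir, "report.json")) as handle:
        outputs["report"] = json.load(handle)
    return outputs

def observed_entries_unchanged(outputs):
    """Check that restored and masked_input agree wherever the mask is False."""
    keep = ~outputs["mask"].astype(bool)
    return bool(
        np.allclose(
            outputs["restored"][keep],
            outputs["masked_input"][keep],
            equal_nan=True,
        )
    )
```

For example, `observed_entries_unchanged(load_restore_outputs("emma_out"))` should return True if the restore mode behaved as described.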
Python API
from emma_3dgenome import EmmaRestorer
from emma_3dgenome.io import load_contact_matrix
from emma_3dgenome.masks import load_mask_regions

# Load the chr2 contact matrix at 10 kb resolution.
matrix = load_contact_matrix(
    "sample.mcool",
    chrom="chr2",
    resolution=10000,
)

# Convert the BED intervals into a mask that matches the matrix size.
mask_info = load_mask_regions(
    "missing_regions.bed",
    chrom="chr2",
    resolution=10000,
    n_bins=matrix.shape[0],
)

# "cuda:0" selects the first CUDA GPU.
restorer = EmmaRestorer(preset="default", device="cuda:0")
result = restorer.restore(
    matrix,
    mask=mask_info.mask,
    regions=mask_info.regions,
)

# Write the restore outputs to emma_out/.
result.save("emma_out")
Auto-mask restoration:
from emma_3dgenome import EmmaRestorer
from emma_3dgenome.io import load_contact_matrix

matrix = load_contact_matrix("sample.mcool", chrom="chr2", resolution=10000)

restorer = EmmaRestorer(preset="default", device="cuda:0")
result = restorer.restore_auto(
    matrix,
    chrom="chr2",
    resolution=10000,
    auto_mask_mode="balanced",
)
result.save("emma_auto_out")
Matrix reconstruction:
result = restorer.reconstruct(matrix, mode="conservative", blend=0.2)
result.save("reconstructed_out")
IMF Parameters
Default mode-weighted reconstruction parameters:
max_imfs = 5
imf_weights = 0.08 1.35 1.20 1.90 0.80
residual_weight = 1.0
diag_calib_strength = 0.20
Interpretation:
- IMF1: high-frequency noise component, strongly down-weighted.
- IMF2: local structure component, enhanced.
- IMF3: intermediate-scale structure component, enhanced.
- IMF4: domain or boundary-related structure component, strongly enhanced.
- IMF5: low-frequency structure component, slightly retained.
- Residual: global trend, retained.
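EMD splits a signal into IMFs plus a residual whose plain sum returns the original signal. The sketch below assumes that mode-weighted reconstruction means a weighted version of that sum, using the default weights listed above. The sinusoids stand in for real IMFs, and `weighted_reconstruction` is a hypothetical helper, not EMMA's API.

```python
import numpy as np

IMF_WEIGHTS = np.array([0.08, 1.35, 1.20, 1.90, 0.80])
RESIDUAL_WEIGHT = 1.0

def weighted_reconstruction(imfs, residual,
                            imf_weights=IMF_WEIGHTS,
                            residual_weight=RESIDUAL_WEIGHT):
    """Recombine IMFs (shape: n_imfs x n_points) and the residual with per-mode weights."""
    imfs = np.atleast_2d(imfs)
    # If fewer IMFs were extracted than weights exist, use the leading weights.
    weights = np.asarray(imf_weights)[: imfs.shape[0]]
    return weights @ imfs + residual_weight * residual

# Synthetic stand-ins for IMF1..IMF5 (high to low frequency) and a trend.
x = np.linspace(0.0, 1.0, 200)
imfs = np.stack([np.sin(2 * np.pi * f * x) for f in (64, 32, 16, 8, 4)])
residual = 1.0 - x

signal = weighted_reconstruction(imfs, residual)
plain = weighted_reconstruction(imfs, residual, imf_weights=np.ones(5))
```

With all weights set to 1 the function returns the plain sum, so the weights can be read as per-scale gains relative to the unmodified signal.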
Override from CLI:
emma restore sample.mcool \
    --resolution 10000 \
    --chrom chr2 \
    --mask-regions missing_regions.bed \
    --max-imfs 5 \
    --imf-weights 0.08 1.35 1.20 1.90 0.80 \
    --residual-weight 1.0 \
    --diag-calib-strength 0.20 \
    --output emma_out/
Presets
Available presets:
- default
- paper
- smooth
- sharp
- conservative
- fast
Use fast for small smoke tests. Use default or paper for standard restoration.
Minimal Test
python -m compileall -q src tests
python -m pytest -q tests
If pytest is not installed:
pip install -e ".[dev]"
Citation
If you use EMMA in your work, please cite the EMMA manuscript when it becomes available.
@article{emma2026,
title = {EMMA: EMD-guided masked autoencoder restoration of chromatin interaction maps},
author = {To be updated},
journal = {To be updated},
year = {2026}
}
License
This project is released under the MIT License.
File details
Details for the file emma_3dgenome-0.1.0.tar.gz.
File metadata
- Download URL: emma_3dgenome-0.1.0.tar.gz
- Upload date:
- Size: 37.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bd40f63297361851fc1e9dd69cc39d7ca1fc3243aceb336ea6c547c426cee27f |
| MD5 | 3dccfbb351a0c5698f8bf654f3c16a89 |
| BLAKE2b-256 | c62b0f1fb970504647b7476b717f347a187a8cea3857fd9e8cf126ac20540076 |
File details
Details for the file emma_3dgenome-0.1.0-py3-none-any.whl.
File metadata
- Download URL: emma_3dgenome-0.1.0-py3-none-any.whl
- Upload date:
- Size: 40.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 926a15bc98049c4be22d50a1644e697395f847c0fb72ca0295ec1646b411d331 |
| MD5 | cc8a4971e92f6fb65c096e73a6f6d1e4 |
| BLAKE2b-256 | fb8984e9f025ead59661b7969769a2cab6aa77fba47a31e7af270b7fe01e4559 |