ENCODEfetch: a command-line tool for retrieving matched case-control data and standardized metadata from ENCODE
ENCODEfetch 
ENCODEfetch is a command-line tool and Python package for retrieving matched case–control datasets and standardized metadata from the ENCODE Project.
ENCODEfetch automates:
- Search for ENCODE experiments by assay, target, organism, status, and more
- Get case-control matched experiments
- File retrieval (FASTQ, BAM, BED, bigWig, etc.) with filtering by status/assembly
- Parallel file downloads with resumable transfers and interactive progress bars
- Standardized metadata outputs (manifest.tsv, metadata.jsonl)
- Plug-and-play samplesheets for nf-core and Snakemake workflows, for reproducible analysis
- An interactive API returning tidy pandas.DataFrame objects of metadata with file paths for downstream analysis
🚀 Installation
From PyPI (recommended)
pip install encodefetch
From source
git clone https://github.com/khan-lab/ENCODEfetch.git
cd ENCODEfetch
pip install -e .
Requires Python 3.9 or newer.
🔧 Command-line usage
encodefetch --assay-title "TF ChIP-seq" \
    --target-label BRD4,SMAD3 \
    --organism "Homo sapiens" \
    --file-type fastq \
    --status released \
    --progress \
    --download \
    --threads 8 \
    --nfcore
Key options
- --accessions ENCSR123ABC,ENCSR456DEF: fetch experiments by accession directly.
- --accessions accessions.txt: read one experiment accession per line; blank lines and # comments are ignored.
- --assay-title: e.g. TF ChIP-seq, Histone ChIP-seq, ATAC-seq, RNA-seq.
- --target-label: one or more targets (comma-separated).
- --organism: e.g. Homo sapiens, Mus musculus.
- --file-type: restrict formats (fastq, bam, bed, bigWig…).
- --status: default released (can also include archived).
- --perturbed true|false: filter perturbed experiments.
- --download: actually download matched files.
- --threads: number of worker threads for metadata fetching, control fetching, and downloads.
- --max-retries / --chunk-size: tune download retry count and streamed chunk size.
- --nfcore / --snakemake: export pipeline-ready sample sheets.
- --control-strategy all|pool|best: choose how samplesheets represent multiple controls.
Run encodefetch --help to see all options.
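The accessions file accepted by --accessions (one accession per line, blank lines and # comments ignored) can be illustrated with a small parser. This is a sketch of the file format only, not ENCODEfetch's own code: the function name read_accessions is hypothetical, and it assumes that only whole-line comments need to be skipped.

```python
from pathlib import Path
import tempfile


def read_accessions(path):
    """Hypothetical helper: return the accessions listed one per line in `path`."""
    accessions = []
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        # Skip blank lines and whole-line comments starting with '#'
        if not line or line.startswith("#"):
            continue
        accessions.append(line)
    return accessions


# Try it on a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "accessions.txt"
    sample.write_text("# ChIP-seq experiments\nENCSR123ABC\n\nENCSR456DEF\n")
    print(read_accessions(sample))
```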
📦 Outputs
After a run, outdir/ contains:
- manifest.tsv: tidy table of case/control files with metadata.
- metadata.jsonl: raw record dump (one JSON per line).
- files/: downloaded files, organized by experiment/control.
- nfcore_*_samplesheet.csv: optional nf-core samplesheet.
- snakemake_samples.tsv: optional Snakemake sample table.
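Both metadata outputs can be read with the Python standard library. The sketch below assumes manifest.tsv is tab-separated with a header row and metadata.jsonl holds one JSON object per line. The column names and values in the sample data are invented for illustration and may not match what ENCODEfetch actually writes.

```python
import csv
import io
import json


def read_manifest(handle):
    """Parse a tab-separated manifest into a list of dicts keyed by header."""
    return list(csv.DictReader(handle, delimiter="\t"))


def read_jsonl(handle):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in handle if line.strip()]


# Made-up sample data standing in for the real files
manifest_text = "accession\trole\nENCFF000AAA\tcase\nENCFF000BBB\tcontrol\n"
jsonl_text = '{"accession": "ENCFF000AAA"}\n{"accession": "ENCFF000BBB"}\n'

rows = read_manifest(io.StringIO(manifest_text))
records = read_jsonl(io.StringIO(jsonl_text))
print(rows[1]["role"], len(records))
```

For real output, pass open("outdir/manifest.tsv", newline="") and open("outdir/metadata.jsonl") in place of the in-memory handles.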
ENCODEfetch preserves all experiment-level controls in matched_control_experiments. File-level controlled_by links are normalized into controlled_by_files when ENCODE provides them. Samplesheets prefer file-level control mappings, then fall back to experiment-level controls; --control-strategy all duplicates case rows per control, pool joins controls with semicolons, and best currently chooses the first control deterministically.
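The three --control-strategy modes described above can be sketched in a few lines. This is an illustration of the described behaviour, not the package's implementation: the function expand_controls, the control column name, and the handling of a case with no controls are all assumptions, and "first control" is taken to mean first in the order given.

```python
def expand_controls(case_row, controls, strategy="all"):
    """Hypothetical sketch: return samplesheet rows for one case and its controls."""
    if not controls:
        # Assumption: a case without controls keeps one row with an empty control
        return [{**case_row, "control": ""}]
    if strategy == "all":
        # One copy of the case row per control
        return [{**case_row, "control": c} for c in controls]
    if strategy == "pool":
        # A single row with all controls joined by semicolons
        return [{**case_row, "control": ";".join(controls)}]
    if strategy == "best":
        # A single row using the first control in the given order
        return [{**case_row, "control": controls[0]}]
    raise ValueError(f"unknown control strategy: {strategy!r}")


case = {"sample": "BRD4_rep1"}
controls = ["ENCFF000CT1", "ENCFF000CT2"]
for strategy in ("all", "pool", "best"):
    print(strategy, expand_controls(case, controls, strategy))
```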
🐍 Python API
import encodefetch as ef
# Search experiments
metadata, recs = ef.search_experiments(
    assay_title="TF ChIP-seq",
    target_labels=["BRD4", "SMAD3"],
    organism="Homo sapiens",
    file_types={"fastq"},
    status="released",
    progress=False,
    threads=2,
)
metadata.head()
# Collapse paired-end FASTQs to one row
metadata_collapse = ef.collapse_fastq_pairs(metadata)
# Write nf-core samplesheet
ef.write_nfcore_sheet(metadata_collapse, "nfcore_chipseq.csv")
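To make the idea of collapsing paired-end FASTQs to one row concrete, here is a standalone sketch on plain dictionaries. It does not reproduce ef.collapse_fastq_pairs: the function collapse_pairs and the field names (accession, paired_end, paired_with, path, sample) are assumptions chosen for illustration, with fastq_1/fastq_2 following the usual nf-core column naming.

```python
def collapse_pairs(rows):
    """Hypothetical sketch: merge R1/R2 rows into one row per read pair."""
    # Index each R2 row by the accession of the R1 file it is paired with
    mates = {r["paired_with"]: r for r in rows if r.get("paired_end") == "2"}
    collapsed = []
    for r in rows:
        if r.get("paired_end") == "2":
            # R2 rows are emitted together with their R1 mate;
            # an R2 whose R1 is missing is dropped in this sketch
            continue
        mate = mates.get(r["accession"])
        collapsed.append({
            "sample": r["sample"],
            "fastq_1": r["path"],
            # Single-end files keep an empty second column
            "fastq_2": mate["path"] if mate else "",
        })
    return collapsed


rows = [
    {"accession": "ENCFF000AAA", "paired_end": "1", "path": "a_R1.fastq.gz", "sample": "s1"},
    {"accession": "ENCFF000AAB", "paired_end": "2", "paired_with": "ENCFF000AAA",
     "path": "a_R2.fastq.gz", "sample": "s1"},
    {"accession": "ENCFF000AAC", "path": "b.fastq.gz", "sample": "s2"},
]
for row in collapse_pairs(rows):
    print(row)
```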
🧬 Assay support
ENCODEfetch currently provides assay-aware normalization and exporters to nf-core/snakemake samplesheets for:
- ChIP-seq (production)
- ATAC-seq (in beta)
- RNA-seq (in beta)
- More to be added
Each assay can plug in its own normalization (e.g., FASTQ collapsing, strandedness detection) and samplesheet exporters.
🤝 Contributing
Contributions are welcome!
- Add new assay classes under encodefetch/assays/.
- Add new exporters under encodefetch/exporters/.
- Extend metadata fields in build_file_record.
See CONTRIBUTING.md for details.