MCP server for searching European Nucleotide Archive (ENA) datasets. Find RNA-seq studies, retrieve metadata, and discover related publications to validate research hypotheses.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

mkuehl

These details have not been verified by PyPI

Project links

Documentation

Project description

RNA Dataset Search - MCP Server

A Model Context Protocol (MCP) server for searching and accessing RNA sequencing datasets from the European Nucleotide Archive (ENA). Find publicly available bulk RNA-seq and single-cell RNA-seq datasets to validate research hypotheses or reproduce published analyses.

Optimized for: Human and mouse disease-related RNA-seq studies with support for bulk, single-cell, and spatial transcriptomics.

Features

Disease-Focused Search: Find datasets by disease, organism, and tissue type
Advanced Technology Filtering:
- Simple presets: bulk, single-cell, small-rna, ribo-seq, rna-all
- Granular control: Filter by 50+ library strategies (RNA-Seq, miRNA-Seq, ChIP-Seq, ATAC-seq, etc.)
- Source filtering: TRANSCRIPTOMIC, GENOMIC, METAGENOMIC, etc.
Common Organism Names: Use "human", "mouse", "rat" instead of scientific names
Download Support: Generate wget/curl scripts for downloading FASTQ files
Study Metadata: Retrieve comprehensive metadata including PubMed IDs
Publication Links: Discover datasets associated with PubMed publications
Flexible Queries: Build custom queries with multiple field conditions
Field Discovery: Explore available search and return fields
Environment Configuration: Customize API endpoints, timeouts, and logging via environment variables
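Common organism names still have to become something the ENA Portal API can query. Here is a minimal sketch. It assumes the server maps each name to its NCBI Taxonomy ID and filters with ENA's `tax_tree()` query function. `ORGANISM_TAXIDS` and `organism_clause` are hypothetical names and are not part of this package's API.

```python
# Hypothetical alias table; the names the server actually accepts may differ.
ORGANISM_TAXIDS = {
    "human": 9606,   # Homo sapiens
    "mouse": 10090,  # Mus musculus
    "rat": 10116,    # Rattus norvegicus
}


def organism_clause(organism: str) -> str:
    """Turn a common name (or a numeric NCBI Taxonomy ID) into an ENA query clause."""
    key = organism.strip().lower()
    if key in ORGANISM_TAXIDS:
        tax_id = ORGANISM_TAXIDS[key]
    elif key.isdigit():
        tax_id = int(key)
    else:
        raise ValueError(f"unknown organism name: {organism!r}")
    # tax_tree() matches the taxon and its descendants; tax_eq() would match it exactly.
    return f"tax_tree({tax_id})"


print(organism_clause("Mouse"))  # tax_tree(10090)
```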

Available Tools

The MCP server provides 10 specialized tools:

Search & Discovery

search_rna_studies - Unified search with preset filters or advanced library strategy/source filtering
list_library_types - List all 50+ available library strategies and sources
get_study_details - Get comprehensive metadata for a specific study (includes PubMed IDs)
find_studies_by_publication - Find studies associated with a PubMed ID
search_studies_by_keywords - Flexible keyword search across study titles

Download & Access

get_download_urls - Get FTP download URLs for all data files in a study
generate_download_script - Generate bash scripts (wget/curl) for downloading data

Advanced

get_available_fields - Discover searchable and returnable fields for different data types
get_result_types - List all available data types in ENA
build_custom_query - Construct advanced queries with multiple field conditions
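A tool like build_custom_query ultimately sends a request to the ENA Portal API search endpoint (`/search` with `result`, `query`, `fields`, `format` and `limit` parameters). The sketch below shows one way to AND-combine exact-match field conditions into such a request URL. `build_search_url` is a hypothetical helper written for illustration. It is not the tool's actual implementation, and it builds the URL without sending the request.

```python
from urllib.parse import urlencode

PORTAL_API = "https://www.ebi.ac.uk/ena/portal/api"


def build_search_url(result: str, conditions: dict[str, str],
                     fields: list[str], limit: int = 20) -> str:
    """AND-combine field="value" conditions into an ENA Portal API /search URL."""
    if not conditions:
        raise ValueError("at least one condition is required")
    query = " AND ".join(f'{name}="{value}"' for name, value in conditions.items())
    params = {
        "result": result,             # ENA result type, e.g. read_run or read_study
        "query": query,
        "fields": ",".join(fields),   # columns to return
        "format": "json",
        "limit": limit,
    }
    return f"{PORTAL_API}/search?{urlencode(params)}"


url = build_search_url(
    "read_run",
    {"library_strategy": "RNA-Seq", "library_source": "TRANSCRIPTOMIC"},
    ["study_accession", "run_accession", "scientific_name"],
)
print(url)
```

The available result types and field names can be checked with get_result_types and get_available_fields before composing a query.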

Example Use Cases

Simple Searches (Preset Filters)

Find human cancer bulk RNA-seq datasets: disease="cancer"
Search for single-cell RNA-seq in mouse brain: organism="mouse", tissue="brain", technology="single-cell"
Find small RNA sequencing studies: technology="small-rna"
Ribosome profiling experiments: technology="ribo-seq"
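A preset can be read as shorthand for the library strategy and source filters used in advanced searches. The table below is a hypothetical, partial mapping that leaves out ribo-seq and rna-all. It only illustrates the idea, and the server's real presets may use other values.

```python
# Hypothetical, partial preset table (ribo-seq and rna-all omitted); values are
# illustrative and may not match the server's real presets.
PRESETS: dict[str, tuple[list[str], list[str]]] = {
    # preset: (library_strategies, library_sources)
    "bulk": (["RNA-Seq"], ["TRANSCRIPTOMIC"]),
    "single-cell": ([], ["TRANSCRIPTOMIC SINGLE CELL"]),
    "small-rna": (["miRNA-Seq", "ncRNA-Seq"], []),
}


def expand_preset(technology: str) -> tuple[list[str], list[str]]:
    """Return copies of the (library_strategies, library_sources) behind a preset."""
    try:
        strategies, sources = PRESETS[technology]
    except KeyError:
        raise ValueError(f"unknown preset {technology!r}; known: {sorted(PRESETS)}") from None
    return list(strategies), list(sources)


print(expand_preset("small-rna"))  # (['miRNA-Seq', 'ncRNA-Seq'], [])
```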

Advanced Searches (Specific Library Types)

ChIP-Seq chromatin studies: library_strategies=["ChIP-Seq"]
ATAC-seq accessibility data: library_strategies=["ATAC-seq"]
Combined small RNA types: library_strategies=["miRNA-Seq", "ncRNA-Seq"]
Any single-cell data: library_sources=["TRANSCRIPTOMIC SINGLE CELL"]
Metagenomic RNA: library_sources=["METATRANSCRIPTOMIC"]

Workflow Examples

Download FASTQ files from a specific study
Discover datasets from a specific publication
Generate download scripts with MD5 verification
List all available sequencing technologies: list_library_types()
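You can sketch FASTQ downloading with MD5 verification using ENA's `filereport` endpoint. That endpoint reports `fastq_ftp` and `fastq_md5` columns with one entry per file, separated by `;` for paired-end runs. The helpers here (`filereport_url`, `parse_filereport`, `download_script`) are hypothetical. They only approximate what get_download_urls and generate_download_script may produce. The generated script assumes GNU `md5sum` is available, and the `wget -c` / `curl -C -` resume flags are an illustrative choice.

```python
import posixpath
import shlex
from urllib.parse import urlencode, urlsplit

PORTAL_API = "https://www.ebi.ac.uk/ena/portal/api"


def filereport_url(accession: str) -> str:
    """ENA Portal API filereport URL listing the FASTQ files of a study or run."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp,fastq_md5",
        "format": "tsv",
    }
    return f"{PORTAL_API}/filereport?{urlencode(params)}"


def parse_filereport(tsv_text: str) -> list[tuple[str, str]]:
    """Turn a filereport TSV into (url, md5) pairs, one per FASTQ file."""
    rows = [line.split("\t") for line in tsv_text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    ftp_i, md5_i = header.index("fastq_ftp"), header.index("fastq_md5")
    pairs = []
    for cols in body:
        paths = [p for p in cols[ftp_i].split(";") if p]
        md5s = cols[md5_i].split(";")
        # ENA reports host/path without a scheme, so add ftp:// for wget/curl.
        pairs.extend((f"ftp://{p}", m) for p, m in zip(paths, md5s))
    return pairs


def download_script(files: list[tuple[str, str]], tool: str = "wget") -> str:
    """Bash script that fetches each file, then verifies all of them with md5sum -c."""
    if tool not in ("wget", "curl"):
        raise ValueError("tool must be 'wget' or 'curl'")
    lines = ["#!/usr/bin/env bash", "set -euo pipefail", ""]
    checks = []
    for url, md5 in files:
        name = posixpath.basename(urlsplit(url).path)
        if tool == "wget":
            lines.append(f"wget -c -O {shlex.quote(name)} {shlex.quote(url)}")
        else:
            lines.append(f"curl -fL -C - -o {shlex.quote(name)} {shlex.quote(url)}")
        checks.append(f"{md5}  {name}")  # md5sum format: hash, two spaces, file name
    lines += ["", "md5sum -c <<'EOF'", *checks, "EOF"]
    return "\n".join(lines) + "\n"
```

In practice, the TSV text would come from an HTTP GET on `filereport_url("PRJEBxxxx")`. The sketch leaves networking out so that parsing and script generation can be checked offline.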

Getting started

Please refer to the documentation, in particular, the API documentation.

You can also find the project on BioContextAI, the community hub for biomedical MCP servers: nucleotide_archive_mcp on BioContextAI.

Installation

You need to have Python 3.11 or newer installed on your system. If you don't have Python installed, we recommend installing uv.

There are several alternative options to install nucleotide_archive_mcp:

1. Use `uvx` to run it immediately

From PyPI:

uvx nucleotide_archive_mcp

Or from a Git repository:

uvx git+https://github.com/biocontext-ai/nucleotide_archive_mcp.git@main

2. Include it in any client that supports the `mcp.json` standard

To run the version published on PyPI, use the following configuration:

{
  "mcpServers": {
    "nucleotide_archive_mcp": {
      "command": "uvx",
      "args": ["nucleotide_archive_mcp"]
    }
  }
}

To run the latest version directly from the Git repository instead, use this configuration:

{
  "mcpServers": {
    "nucleotide_archive_mcp": {
      "command": "uvx",
      "args": ["git+https://github.com/biocontext-ai/nucleotide_archive_mcp.git@main"]
    }
  }
}

For purely local development (e.g., in Cursor or VS Code), use the following configuration:

{
  "mcpServers": {
    "nucleotide_archive_mcp": {
      "command": "uvx",
      "args": [
        "--refresh",
        "--from",
        "path/to/repository",
        "nucleotide_archive_mcp"
      ]
    }
  }
}

If you want to reuse an existing environment for local development, use the following configuration:

{
  "mcpServers": {
    "nucleotide_archive_mcp": {
      "command": "uv",
      "args": ["run", "--directory", "path/to/repository", "nucleotide_archive_mcp"]
    }
  }
}

3. Install it through `pip`:

pip install --user nucleotide_archive_mcp

4. Install the latest development version:

pip install git+https://github.com/biocontext-ai/nucleotide_archive_mcp.git@main

Configuration

The server can be configured via environment variables. Copy .env.example to .env and customize:

# ENA API Configuration
ENA_PORTAL_API_BASE=https://www.ebi.ac.uk/ena/portal/api  # Override API base URL
ENA_BROWSER_API_BASE=https://www.ebi.ac.uk/ena/browser/api
ENA_TIMEOUT=30.0                # Request timeout in seconds
ENA_SEARCH_LIMIT=20            # Default search result limit
ENA_MAX_RPS=10.0               # Rate limiting (requests per second)

# Logging
LOG_LEVEL=INFO                 # DEBUG, INFO, WARNING, ERROR, CRITICAL

These settings allow you to:

Use custom or mirror ENA API endpoints
Adjust timeouts for slow connections
Control default result limits
Configure rate limiting for large batch operations
Set logging verbosity for debugging

Data Citation and Attribution

When using data from ENA in publications, please cite the data appropriately:

How to Cite ENA Data

The top-level Project accession should be cited along with a link to the data in the ENA browser:

"The data for this study have been deposited in the European Nucleotide Archive (ENA) at EMBL-EBI under accession number PRJEBxxxx (https://www.ebi.ac.uk/ena/browser/view/PRJEBxxxx)."

Replace PRJEBxxxx with the actual study accession number from your search results.
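You can automate filling in the citation sentence. The small sketch below assumes the accession is an INSDC project accession with a PRJEB, PRJNA or PRJDB prefix and rejects anything else. `ena_citation` is a hypothetical helper.

```python
import re

# INSDC project accessions: PRJEB (ENA), PRJNA (NCBI), PRJDB (DDBJ).
PROJECT_ACCESSION = re.compile(r"PRJ(EB|NA|DB)\d+")


def ena_citation(project_accession: str) -> str:
    """Fill in the ENA citation sentence for a top-level project accession."""
    if not PROJECT_ACCESSION.fullmatch(project_accession):
        raise ValueError(f"expected a project accession such as PRJEB..., got {project_accession!r}")
    url = f"https://www.ebi.ac.uk/ena/browser/view/{project_accession}"
    return ("The data for this study have been deposited in the European Nucleotide Archive "
            f"(ENA) at EMBL-EBI under accession number {project_accession} ({url}).")
```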

Accessing Data in ENA Browser

All accessions can be viewed in the ENA browser:

Direct URL: https://www.ebi.ac.uk/ena/browser/view/<accession>
Example: https://www.ebi.ac.uk/ena/browser/view/PRJDB2345

ORCID Data Claiming

ENA studies can be claimed against your ORCID iD through the EBI Search interface. Search for your projects and click "Claim to ORCID" to link them to your ORCID profile.

Data Policy and Usage

ENA/INSDC Data Policy

This tool accesses data from the European Nucleotide Archive (ENA), which is part of the International Nucleotide Sequence Database Collaboration (INSDC) with DDBJ and GenBank.

Key Points:

Open Access: All data in ENA/INSDC databases are freely and publicly accessible
No Restrictions: Data have no use restrictions or licensing requirements
Redistribution: Free redistribution and use of data is permitted
Permanence: All submitted records remain permanently accessible
Attribution: Proper citation of original submissions is expected (see above)

Data Availability

Data in ENA can be:

Public: Freely accessible through this tool and ENA browser
Confidential: Pre-publication data not yet publicly available (not searchable through this tool)

Released data should be cited appropriately in publications and claimed via ORCID where applicable.

Data Standards

ENA promotes data harmonization through:

Sample Checklists: Minimum information standards for different data types
MIxS Standards: Genomic Standards Consortium (GSC) minimum information standards
Community Standards: Research community-developed reporting standards

For more information, see the ENA Data Standards documentation.

Disclaimer

This tool provides access to data from the European Nucleotide Archive (ENA) at EMBL-EBI. The tool is:

Independent: Not officially affiliated with or endorsed by ENA, EMBL-EBI, or INSDC
Quality: Data quality and accuracy are the responsibility of the original submitters
Updates: ENA data and APIs may change; this tool is maintained to reflect current ENA services
Support: For issues with ENA data or services, contact ENA Support

The European Nucleotide Archive is developed and maintained at EMBL-EBI under the guidance of the INSDC International Advisory Committee.

Contact

If you found a bug with this MCP server, please use the issue tracker.

For questions about ENA data or services, contact ENA Support.

Acknowledgments

This tool accesses data from:

European Nucleotide Archive (ENA) at EMBL-EBI
International Nucleotide Sequence Database Collaboration (INSDC)

Special thanks to the ENA team for maintaining the public API and comprehensive documentation.

Project details

Release history

0.0.7 (Feb 27, 2026)
0.0.6 (Nov 28, 2025)
0.0.5 (Nov 28, 2025)
0.0.4 (Nov 28, 2025)
0.0.3 (Nov 19, 2025)
0.0.2 (Nov 17, 2025)
0.0.1 (Nov 17, 2025), this version

Download files

Source Distribution

nucleotide_archive_mcp-0.0.1.tar.gz (168.0 kB)

Uploaded Nov 17, 2025 Source

Built Distribution

nucleotide_archive_mcp-0.0.1-py3-none-any.whl (36.3 kB)

Uploaded Nov 17, 2025 Python 3

File details

Details for the file nucleotide_archive_mcp-0.0.1.tar.gz.

File metadata

Download URL: nucleotide_archive_mcp-0.0.1.tar.gz
Upload date: Nov 17, 2025
Size: 168.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nucleotide_archive_mcp-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`4e9fde9f3bd25b77af936d81c2714a8e41f0a710a870182078795b4b6e18629f`
MD5	`0d00bff3275fd62c5cb513bfdefb957b`
BLAKE2b-256	`68ea3a62d9b63fd990f75071dff9f1809c7f23f63bacbaa5f4ed905e9a564749`

Provenance

The following attestation bundles were made for nucleotide_archive_mcp-0.0.1.tar.gz:

Publisher: release.yaml on biocontext-ai/nucleotide_archive_mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nucleotide_archive_mcp-0.0.1.tar.gz
- Subject digest: 4e9fde9f3bd25b77af936d81c2714a8e41f0a710a870182078795b4b6e18629f
- Sigstore transparency entry: 705393856
- Sigstore integration time: Nov 17, 2025
Source repository:
- Permalink: biocontext-ai/nucleotide_archive_mcp@b162bd227d621e38bff80eebbb31785eb2ef726f
- Branch / Tag: refs/tags/0.0.1
- Owner: https://github.com/biocontext-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yaml@b162bd227d621e38bff80eebbb31785eb2ef726f
- Trigger Event: release

File details

Details for the file nucleotide_archive_mcp-0.0.1-py3-none-any.whl.

File metadata

Download URL: nucleotide_archive_mcp-0.0.1-py3-none-any.whl
Upload date: Nov 17, 2025
Size: 36.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nucleotide_archive_mcp-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`341428caf42b52aaa33f05916fb7201c76b80866b6ffa2710f0654d44b961e4d`
MD5	`8b0a9912ee10d5a1cc9fbdd2ec2c2cf5`
BLAKE2b-256	`28439548afb7896948396271b88f492c8504882c529b2a142ca5f6c4356bef6b`

Provenance

The following attestation bundles were made for nucleotide_archive_mcp-0.0.1-py3-none-any.whl:

Publisher: release.yaml on biocontext-ai/nucleotide_archive_mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nucleotide_archive_mcp-0.0.1-py3-none-any.whl
- Subject digest: 341428caf42b52aaa33f05916fb7201c76b80866b6ffa2710f0654d44b961e4d
- Sigstore transparency entry: 705393860
- Sigstore integration time: Nov 17, 2025
Source repository:
- Permalink: biocontext-ai/nucleotide_archive_mcp@b162bd227d621e38bff80eebbb31785eb2ef726f
- Branch / Tag: refs/tags/0.0.1
- Owner: https://github.com/biocontext-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yaml@b162bd227d621e38bff80eebbb31785eb2ef726f
- Trigger Event: release
