
PydanticAI and MCP approaches for getting textual representations of scientific literature from PMIDs, DOIs, etc.

ARTL-MCP: All Roads to Literature

An MCP (Model Context Protocol) server and CLI toolkit for comprehensive scientific literature retrieval and analysis using PMIDs, DOIs, PMCIDs, and keyword searches.

Quick Start

MCP Server (Recommended)

Add this to your Claude Desktop MCP configuration:

{
  "mcpServers": {
    "artl-mcp": {
      "command": "uvx",
      "args": ["artl-mcp"]
    }
  }
}
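If the configuration file already lists other servers, the entry can be merged in rather than pasted over them. The following is a minimal sketch, not part of ARTL-MCP: `add_artl_server` is a hypothetical helper, and the location of the configuration file depends on your client and platform, so it is passed in as a parameter.

```python
import json
from pathlib import Path

# The server entry from the snippet above.
ARTL_ENTRY = {"command": "uvx", "args": ["artl-mcp"]}


def add_artl_server(config_path: Path) -> dict:
    """Merge the artl-mcp entry into an MCP config file, keeping other servers."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    # setdefault preserves any servers that are already configured.
    config.setdefault("mcpServers", {})["artl-mcp"] = ARTL_ENTRY
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Restart the client after editing the file so that it picks up the new server.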

Standalone CLI

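# Run CLI commands without a separate install step.
# artl-cli is provided by the artl-mcp package, so uvx needs --from.
uvx --from artl-mcp artl-cli get-doi-metadata --doi "10.1038/nature12373"
uvx --from artl-mcp artl-cli search-papers-by-keyword --query "CRISPR gene editing" --max-results 5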

Core Features

🔍 Literature Search & Discovery

  • Keyword-based paper search with advanced filtering
  • Recent publication discovery
  • PubMed search with multiple output formats

📄 Metadata & Content Retrieval

  • DOI/PMID/PMCID metadata extraction
  • Abstract retrieval from PubMed
  • Full-text access via multiple sources (PMC, Unpaywall, BioC)
  • PDF text extraction and processing

🔗 Identifier Management

  • Universal identifier conversion (DOI ↔ PMID ↔ PMCID)
  • Support for multiple input formats (URLs, CURIEs, raw IDs)
  • Comprehensive identifier validation

📊 Citation Networks

  • Reference analysis (papers cited BY a given paper)
  • Citation analysis (papers that CITE a given paper)
  • Multi-source citation data (CrossRef, OpenAlex, Semantic Scholar)
  • Related paper discovery through citation networks

💾 File Management

  • MCP Mode: Returns data directly without file saving (optimal for AI assistants)
  • CLI Mode: Full file saving with path reporting and content management
  • Content size management: large content is handled automatically
  • Memory-efficient streaming for large files (PDFs, datasets)
  • Cross-platform filename sanitization
  • Multiple output formats (JSON, TXT, CSV, PDF) in CLI mode
  • Configurable directories and temp file management in CLI mode
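To illustrate what cross-platform filename sanitization involves, here is a minimal sketch. It is not ARTL-MCP's implementation: `sanitize_filename`, its replacement character, and its length limit are assumptions chosen for the example.

```python
import re

# Characters that are invalid in file names on Windows, plus control characters.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(name: str, max_length: int = 200) -> str:
    """Return a file name that is safe on Windows, macOS, and Linux."""
    # Replace invalid characters, truncate, then drop leading and trailing
    # dots and spaces (Windows rejects names that end in either).
    cleaned = _INVALID.sub("_", name)[:max_length].strip(" .")
    return cleaned or "unnamed"


print(sanitize_filename("10.1038/nature12373.json"))  # 10.1038_nature12373.json
```

A DOI is a typical input here, because its slash would otherwise be read as a directory separator.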

Available MCP Tools

When running as an MCP server, ARTL-MCP exposes 32 tools organized into categories. A selection is listed by category after the note on mode differences.

🔄 MCP vs CLI Mode Differences

MCP Mode (AI assistants) returns data directly without saving files:

{
  "data": { /* tool-specific content */ },
  "mcp_mode": true,
  "note": "Data returned directly - use CLI for file saving"
}

CLI Mode (command line) saves files and reports the path:

{
  "data": { /* tool-specific content */ },
  "saved_to": "/path/to/saved/file.json"
}
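Code that consumes results from both modes can branch on the keys shown above. This is a minimal sketch that assumes only the two response shapes in this section; `describe_result` is a hypothetical helper, not part of the package.

```python
from typing import Any


def describe_result(result: dict[str, Any]) -> str:
    """Summarise where a tool result ended up, based on the two shapes above."""
    if result.get("mcp_mode"):
        return "returned inline (MCP mode)"
    saved_to = result.get("saved_to")
    if saved_to:
        return f"saved to {saved_to} (CLI mode)"
    return "unknown shape"


mcp_result = {"data": {}, "mcp_mode": True}
cli_result = {"data": {}, "saved_to": "/path/to/saved/file.json"}
print(describe_result(mcp_result))  # returned inline (MCP mode)
print(describe_result(cli_result))  # saved to /path/to/saved/file.json (CLI mode)
```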

Literature Search

  • search_papers_by_keyword - Advanced keyword search with filtering
  • search_recent_papers - Find recent publications
  • search_pubmed_for_pmids - PubMed search returning PMIDs

Metadata & Abstracts

  • get_doi_metadata - Comprehensive DOI metadata
  • get_abstract_from_pubmed_id - PubMed abstracts
  • get_doi_fetcher_metadata - Enhanced metadata (requires email)
  • get_unpaywall_info - Open access availability

Full Text Access

  • get_full_text_from_doi - Multi-source full text (requires email)
  • extract_pdf_text - PDF text extraction
  • get_pmcid_text - PMC full text
  • get_full_text_from_bioc - BioC format text

Identifier Conversion

  • get_all_identifiers - Get all IDs for any identifier
  • doi_to_pmid, pmid_to_doi - Individual conversions
  • validate_identifier - Format validation

Citation Networks

  • get_paper_references - Papers cited by a given paper
  • get_paper_citations - Papers citing a given paper
  • get_citation_network - Comprehensive citation data
  • find_related_papers - Citation-based recommendations
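The difference between references and citations is one of direction in the citation graph. The toy sketch below makes that concrete. The graph, the paper names, and both functions are placeholders for illustration and do not call ARTL-MCP or any citation source.

```python
# Toy citation graph: each key cites the papers in its list.
# The identifiers are placeholders, not real papers.
cites = {
    "paper_A": ["paper_B", "paper_C"],
    "paper_B": ["paper_C"],
    "paper_C": [],
}


def references(paper: str) -> list[str]:
    """Papers cited BY `paper` (outgoing edges)."""
    return list(cites.get(paper, []))


def citations(paper: str) -> list[str]:
    """Papers that CITE `paper` (incoming edges)."""
    return [src for src, targets in cites.items() if paper in targets]


print(references("paper_B"))  # ['paper_C']
print(citations("paper_C"))   # ['paper_A', 'paper_B']
```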

CLI Commands

The artl-cli command provides access to all functionality:

# Metadata retrieval
artl-cli get-doi-metadata --doi "10.1038/nature12373"
artl-cli get-abstract-from-pubmed-id --pmid "23851394"

# Literature search
artl-cli search-papers-by-keyword --query "machine learning" --max-results 10
artl-cli search-recent-papers --query "COVID-19" --years-back 2

# Full text (requires email for some sources)
artl-cli get-full-text-from-doi --doi "10.1038/nature12373" --email "user@institution.edu"

# Identifier conversion
artl-cli doi-to-pmid --doi "10.1038/nature12373"
artl-cli get-all-identifiers --identifier "PMC3737249"

# Citation analysis
artl-cli get-paper-citations --doi "10.1038/nature12373"
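The commands above can also be driven from a script. The sketch below assumes that each CLI command is the MCP tool name with underscores replaced by hyphens, and that options follow the same pattern. That matches the examples shown here but is not confirmed for every tool. `build_artl_command` and `run_artl` are hypothetical helpers, and `run_artl` needs `artl-cli` on your PATH.

```python
import subprocess


def build_artl_command(tool_name: str, **options: object) -> list[str]:
    """Build an artl-cli argv from an MCP-style tool name and keyword options."""
    argv = ["artl-cli", tool_name.replace("_", "-")]
    for key, value in options.items():
        argv += [f"--{key.replace('_', '-')}", str(value)]
    return argv


def run_artl(tool_name: str, **options: object) -> str:
    """Run the command and return its standard output."""
    completed = subprocess.run(
        build_artl_command(tool_name, **options),
        capture_output=True, text=True, check=True,
    )
    return completed.stdout


print(build_artl_command("search_papers_by_keyword", query="machine learning", max_results=10))
# ['artl-cli', 'search-papers-by-keyword', '--query', 'machine learning', '--max-results', '10']
```

Passing the arguments as a list avoids shell quoting problems with queries that contain spaces.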

Configuration

Email Requirements

Several APIs require institutional email addresses:

export ARTL_EMAIL_ADDR="researcher@university.edu"
# or create a local/.env file containing: ARTL_EMAIL_ADDR=researcher@university.edu

MCP Client Configuration: MCP clients differ in how they pass configuration to a server, so the way you supply the email address depends on the client:

  • Claude Desktop: Inherits system environment variables automatically
  • Goose Desktop: Requires MCP extension configuration (see USERS.md)
  • Other clients: May support client-specific configuration injection

See USERS.md for comprehensive configuration instructions.
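A lookup that honors both the environment variable and the local/.env file could look like the sketch below. This is not ARTL-MCP's configuration code: `resolve_email` is a hypothetical helper, and giving the environment variable precedence over the file is an assumption.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_email(env_file: Path = Path("local/.env")) -> Optional[str]:
    """Return ARTL_EMAIL_ADDR from the environment, else from a .env file."""
    email = os.environ.get("ARTL_EMAIL_ADDR")
    if email:
        return email
    if env_file.is_file():
        for line in env_file.read_text(encoding="utf-8").splitlines():
            key, sep, value = line.strip().partition("=")
            # Lines without "=" (comments, blanks) have an empty separator.
            if sep and key.strip() == "ARTL_EMAIL_ADDR":
                return value.strip().strip("\"'") or None
    return None
```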

File Output (CLI Mode Only)

Configure where files are saved when using CLI commands:

export ARTL_OUTPUT_DIR="$HOME/Papers"       # Default: ~/Documents/artl-mcp
export ARTL_TEMP_DIR="/tmp/my-artl-temp"    # Default: system temp + artl-mcp
export ARTL_KEEP_TEMP_FILES=true            # Default: false

Note: MCP mode returns data directly without file saving.
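The three variables and their defaults can be read as in the following sketch. It is an illustration under stated assumptions rather than the package's own loader: `OutputConfig` and `load_output_config` are hypothetical names, "system temp + artl-mcp" is interpreted as a subdirectory of the system temp directory, and the accepted truthy spellings are a guess.

```python
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class OutputConfig:
    output_dir: Path
    temp_dir: Path
    keep_temp_files: bool


def load_output_config(env: Mapping[str, str] = os.environ) -> OutputConfig:
    """Read the three ARTL_* variables, falling back to the documented defaults."""
    output_dir = env.get("ARTL_OUTPUT_DIR") or "~/Documents/artl-mcp"
    temp_dir = env.get("ARTL_TEMP_DIR") or str(Path(tempfile.gettempdir()) / "artl-mcp")
    keep = env.get("ARTL_KEEP_TEMP_FILES", "false").strip().lower() in {"1", "true", "yes"}
    # expanduser resolves a leading "~" that the shell left unexpanded.
    return OutputConfig(Path(output_dir).expanduser(), Path(temp_dir).expanduser(), keep)
```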

Supported Identifier Formats

DOI: 10.1038/nature12373, doi:10.1038/nature12373, https://doi.org/10.1038/nature12373

PMID: 23851394, PMID:23851394, pmid:23851394

PMCID: PMC3737249, 3737249, PMC:3737249

All tools automatically detect and normalize identifier formats.
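As a rough picture of what detection and normalization involve, the sketch below handles the formats listed above with regular expressions. It is not the package's implementation: `normalize_identifier` is a hypothetical function. It also makes one simplifying assumption: a bare number is ambiguous between a PMID and a PMCID, and this sketch treats it as a PMID.

```python
import re
from typing import Tuple

_DOI = re.compile(r"(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)", re.IGNORECASE)
_PMCID = re.compile(r"PMC:?(\d+)", re.IGNORECASE)
_PMID = re.compile(r"(?:PMID:)?(\d+)", re.IGNORECASE)


def normalize_identifier(raw: str) -> Tuple[str, str]:
    """Return (kind, normalized_id) for the DOI, PMID, and PMCID forms listed above."""
    text = raw.strip()
    m = _DOI.fullmatch(text)  # bare DOI, doi: prefix, or doi.org URL
    if m:
        return "doi", m.group(1)
    m = _PMCID.fullmatch(text)  # PMC3737249 or PMC:3737249
    if m:
        return "pmcid", f"PMC{m.group(1)}"
    m = _PMID.fullmatch(text)  # PMID:23851394, pmid:23851394, or bare digits
    if m:
        return "pmid", m.group(1)
    raise ValueError(f"Unrecognized identifier: {raw!r}")


print(normalize_identifier("https://doi.org/10.1038/nature12373"))  # ('doi', '10.1038/nature12373')
print(normalize_identifier("PMC:3737249"))                          # ('pmcid', 'PMC3737249')
```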

Development Setup

git clone https://github.com/contextualizer-ai/artl-mcp.git
cd artl-mcp
uv sync --group dev

# Run tests
make test                    # Fast development tests
make test-coverage          # Full test suite with coverage

# Code quality
make lint                   # Ruff linting
make format                 # Black formatting
make mypy                   # Type checking
