PydanticAI and MCP approaches for getting textual representations of scientific literature from PMIDs, DOIs, etc.
ARTL-MCP: All Roads to Literature
An MCP (Model Context Protocol) server and CLI toolkit for comprehensive scientific literature retrieval and analysis using PMIDs, DOIs, PMCIDs, and keyword searches.
Quick Start
MCP Server (Recommended)
Add this to your Claude Desktop MCP configuration:
{
"mcpServers": {
"artl-mcp": {
"command": "uvx",
"args": ["artl-mcp"]
}
}
}
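If a Claude Desktop configuration file already lists other servers, the entry above needs to be merged into it rather than pasted over it. The following is a minimal Python sketch of that merge, assuming the configuration is plain JSON with a top-level `mcpServers` object as shown. The helper name `add_artl_server` and the `other-server` entry are hypothetical, and because the file location is not given here, the sketch works on an in-memory dict.

```python
import copy
import json


def add_artl_server(config: dict) -> dict:
    """Return a copy of `config` with the artl-mcp server entry added.

    Hypothetical helper: existing servers are preserved, and an existing
    "artl-mcp" entry is overwritten with the entry shown above.
    """
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["artl-mcp"] = {"command": "uvx", "args": ["artl-mcp"]}
    return merged


# Illustrative existing configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}
updated = add_artl_server(existing)
print(json.dumps(updated, indent=2))
```

Working on a copy leaves the original dict untouched, so the result can be inspected before anything is written back to disk.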
Standalone CLI
# Run CLI commands without a separate install step (uvx fetches the package on demand)
uvx artl-cli get-doi-metadata --doi "10.1038/nature12373"
uvx artl-cli search-papers-by-keyword --query "CRISPR gene editing" --max-results 5
Core Features
🔍 Literature Search & Discovery
- Keyword-based paper search with advanced filtering
- Recent publication discovery
- PubMed search with multiple output formats
📄 Metadata & Content Retrieval
- DOI/PMID/PMCID metadata extraction
- Abstract retrieval from PubMed
- Full-text access via multiple sources (PMC, Unpaywall, BioC)
- PDF text extraction and processing
🔗 Identifier Management
- Universal identifier conversion (DOI ↔ PMID ↔ PMCID)
- Support for multiple input formats (URLs, CURIEs, raw IDs)
- Comprehensive identifier validation
📊 Citation Networks
- Reference analysis (papers cited BY a given paper)
- Citation analysis (papers that CITE a given paper)
- Multi-source citation data (CrossRef, OpenAlex, Semantic Scholar)
- Related paper discovery through citation networks
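References and citations are the same links read in opposite directions. The toy Python sketch below makes that explicit by inverting a reference map into a citation map. The paper labels `A`, `B`, `C` are made-up data, and the code illustrates the concept only; it is not how ARTL-MCP queries CrossRef, OpenAlex, or Semantic Scholar.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical toy data: each paper maps to the papers it cites (its references).
references: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
}


def invert(refs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Turn "paper -> papers it cites" into "paper -> papers that cite it"."""
    cited_by: Dict[str, List[str]] = defaultdict(list)
    for paper, cited in refs.items():
        for target in cited:
            cited_by[target].append(paper)
    return dict(cited_by)


citations = invert(references)
print(references["A"])  # papers cited BY A
print(citations["C"])   # papers that CITE C
```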
💾 File Management
- MCP Mode: Returns data directly without file saving (optimal for AI assistants)
- CLI Mode: Full file saving with path reporting and content management
- Content size management: large content is handled automatically
- Memory-efficient streaming for large files (PDFs, datasets)
- Cross-platform filename sanitization
- Multiple output formats (JSON, TXT, CSV, PDF) in CLI mode
- Configurable directories and temp file management in CLI mode
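Cross-platform filename sanitization matters because identifiers such as DOIs contain `/`, which is not valid inside a filename. The sketch below shows one common approach in Python. It is an illustration of the general technique, not ARTL-MCP's actual rules: the function name, the replacement character, and the length limit are all assumptions.

```python
import re

# Characters that are invalid in Windows filenames, plus ASCII control characters.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
# Device names that Windows reserves regardless of extension.
_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def sanitize_filename(name: str, max_length: int = 200) -> str:
    """Return a filename that is safe on Windows, macOS, and Linux."""
    cleaned = _INVALID.sub("_", name)
    cleaned = cleaned.strip(" .")  # Windows rejects leading/trailing dots and spaces
    if not cleaned:
        cleaned = "unnamed"
    stem = cleaned.split(".", 1)[0]
    if stem.upper() in _RESERVED:
        cleaned = "_" + cleaned
    return cleaned[:max_length]


print(sanitize_filename("10.1038/nature12373.json"))
```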
Available MCP Tools
When running as an MCP server, you get access to 32 tools organized into categories:
🔄 MCP vs CLI Mode Differences
MCP Mode (AI assistants): Returns data directly without file saving:
{
"data": { /* tool-specific content */ },
"mcp_mode": true,
"note": "Data returned directly - use CLI for file saving"
}
CLI Mode (command line): Full file saving with path reporting:
{
"data": { /* tool-specific content */ },
"saved_to": "/path/to/saved/file.json"
}
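Code that consumes tool results can treat both shapes uniformly by checking for the `saved_to` key. The Python sketch below assumes only the keys shown in the two examples above. The function name `unpack_result` and the `{"title": "example"}` payload are hypothetical.

```python
from typing import Any, Optional, Tuple


def unpack_result(result: dict) -> Tuple[Any, Optional[str]]:
    """Split a tool result into (data, saved_path).

    saved_path is None for MCP-mode results, which carry no "saved_to" key.
    """
    data = result.get("data")
    saved_path = result.get("saved_to")  # only present in CLI mode
    return data, saved_path


# Placeholder payloads following the two shapes shown above.
mcp_result = {
    "data": {"title": "example"},
    "mcp_mode": True,
    "note": "Data returned directly - use CLI for file saving",
}
cli_result = {"data": {"title": "example"}, "saved_to": "/path/to/saved/file.json"}

print(unpack_result(mcp_result))
print(unpack_result(cli_result))
```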
Literature Search
- search_papers_by_keyword - Advanced keyword search with filtering
- search_recent_papers - Find recent publications
- search_pubmed_for_pmids - PubMed search returning PMIDs
Metadata & Abstracts
- get_doi_metadata - Comprehensive DOI metadata
- get_abstract_from_pubmed_id - PubMed abstracts
- get_doi_fetcher_metadata - Enhanced metadata (requires email)
- get_unpaywall_info - Open access availability
Full Text Access
- get_full_text_from_doi - Multi-source full text (requires email)
- extract_pdf_text - PDF text extraction
- get_pmcid_text - PMC full text
- get_full_text_from_bioc - BioC format text
Identifier Conversion
- get_all_identifiers - Get all IDs for any identifier
- doi_to_pmid, pmid_to_doi - Individual conversions
- validate_identifier - Format validation
Citation Networks
- get_paper_references - Papers cited by a given paper
- get_paper_citations - Papers citing a given paper
- get_citation_network - Comprehensive citation data
- find_related_papers - Citation-based recommendations
CLI Commands
The artl-cli command provides access to all functionality:
# Metadata retrieval
artl-cli get-doi-metadata --doi "10.1038/nature12373"
artl-cli get-abstract-from-pubmed-id --pmid "23851394"
# Literature search
artl-cli search-papers-by-keyword --query "machine learning" --max-results 10
artl-cli search-recent-papers --query "COVID-19" --years-back 2
# Full text (requires email for some sources)
artl-cli get-full-text-from-doi --doi "10.1038/nature12373" --email "user@institution.edu"
# Identifier conversion
artl-cli doi-to-pmid --doi "10.1038/nature12373"
artl-cli get-all-identifiers --identifier "PMC3737249"
# Citation analysis
artl-cli get-paper-citations --doi "10.1038/nature12373"
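When calling these commands from a script, it can help to build the argument list programmatically rather than concatenating strings. The Python sketch below assembles an argv list using only the subcommands and flags shown above. The helper `build_artl_command` is hypothetical, and the sketch prints the command instead of running it.

```python
import shlex
from typing import List


def build_artl_command(subcommand: str, **options: object) -> List[str]:
    """Assemble an argv list such as ["artl-cli", "doi-to-pmid", "--doi", "..."].

    Keyword names use underscores and are converted to the dashed flag
    spelling shown above (max_results -> --max-results).
    """
    argv = ["artl-cli", subcommand]
    for name, value in options.items():
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


cmd = build_artl_command(
    "search-papers-by-keyword", query="machine learning", max_results=10
)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with queries that contain spaces.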
Configuration
Email Requirements
Several APIs require institutional email addresses:
export ARTL_EMAIL_ADDR="researcher@university.edu"
# or create local/.env file with: ARTL_EMAIL_ADDR=researcher@university.edu
MCP Client Configuration: MCP clients differ in how they pass configuration to a server, so ARTL-MCP accepts the email address through several methods:
- Claude Desktop: Inherits system environment variables automatically
- Goose Desktop: Requires MCP extension configuration (see USERS.md)
- Other clients: May support client-specific configuration injection
See USERS.md for comprehensive configuration instructions.
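The two email sources described in this section (the `ARTL_EMAIL_ADDR` environment variable and a `local/.env` file) can be checked in sequence. The Python sketch below assumes the environment variable takes precedence over the file. That ordering is an assumption rather than documented behavior, and `resolve_email` is a hypothetical helper with a deliberately minimal `KEY=VALUE` parser.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_email(
    env: Mapping[str, str] = os.environ,
    dotenv_path: Path = Path("local/.env"),
) -> Optional[str]:
    """Return ARTL_EMAIL_ADDR from the environment, else from a .env file."""
    value = env.get("ARTL_EMAIL_ADDR", "").strip()
    if value:
        return value
    if dotenv_path.is_file():
        for line in dotenv_path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line.startswith("#") or "=" not in line:
                continue  # skip comments and lines without an assignment
            key, _, raw = line.partition("=")
            if key.strip() == "ARTL_EMAIL_ADDR":
                return raw.strip().strip('"\'') or None
    return None


print(resolve_email({"ARTL_EMAIL_ADDR": "researcher@university.edu"}))
```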
File Output (CLI Mode Only)
Configure where files are saved when using CLI commands:
export ARTL_OUTPUT_DIR="~/Papers" # Default: ~/Documents/artl-mcp
export ARTL_TEMP_DIR="/tmp/my-artl-temp" # Default: system temp + artl-mcp
export ARTL_KEEP_TEMP_FILES=true # Default: false
Note: MCP mode returns data directly without file saving.
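The three variables and their stated defaults can be summarized in a small loader. This Python sketch is one plausible reading, not ARTL-MCP's code: the names `OutputSettings` and `load_output_settings` are hypothetical, and the set of accepted truthy spellings is an assumption. Because a quoted `~` is not expanded by the shell, the sketch expands it itself.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, NamedTuple


class OutputSettings(NamedTuple):
    output_dir: Path
    temp_dir: Path
    keep_temp_files: bool


def load_output_settings(env: Mapping[str, str] = os.environ) -> OutputSettings:
    """Read the ARTL_* output variables, falling back to the stated defaults."""
    output_dir = Path(env.get("ARTL_OUTPUT_DIR", "~/Documents/artl-mcp")).expanduser()
    default_temp = os.path.join(tempfile.gettempdir(), "artl-mcp")
    temp_dir = Path(env.get("ARTL_TEMP_DIR", default_temp))
    # Assumption: "1", "true", and "yes" (any case) all enable the flag.
    raw_keep = env.get("ARTL_KEEP_TEMP_FILES", "false")
    keep = raw_keep.strip().lower() in {"1", "true", "yes"}
    return OutputSettings(output_dir, temp_dir, keep)


settings = load_output_settings(
    {"ARTL_OUTPUT_DIR": "~/Papers", "ARTL_KEEP_TEMP_FILES": "true"}
)
print(settings)
```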
Supported Identifier Formats
DOI: 10.1038/nature12373, doi:10.1038/nature12373, https://doi.org/10.1038/nature12373
PMID: 23851394, PMID:23851394, pmid:23851394
PMCID: PMC3737249, 3737249, PMC:3737249
All tools automatically detect and normalize identifier formats.
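Detection and normalization of this kind is usually done with a few anchored patterns. The Python sketch below covers the formats listed above, with one caveat: a bare number such as `3737249` is ambiguous between a PMID and a PMCID, and the sketch treats bare digits as a PMID. The regular expressions and the name `normalize_identifier` are illustrative assumptions, not ARTL-MCP's implementation.

```python
import re
from typing import Tuple

_DOI = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE
)
_PMCID = re.compile(r"^PMC:?(\d+)$", re.IGNORECASE)
_PMID = re.compile(r"^(?:PMID:)?(\d+)$", re.IGNORECASE)


def normalize_identifier(raw: str) -> Tuple[str, str]:
    """Return (kind, canonical_form) for a DOI, PMID, or PMCID string."""
    text = raw.strip()
    if m := _DOI.match(text):
        return "doi", m.group(1)
    if m := _PMCID.match(text):
        return "pmcid", "PMC" + m.group(1)
    if m := _PMID.match(text):
        # Bare digits are ambiguous (the list above also allows a bare PMCID
        # number); this sketch treats them as a PMID.
        return "pmid", m.group(1)
    raise ValueError(f"Unrecognized identifier: {raw!r}")


for example in ["https://doi.org/10.1038/nature12373", "pmid:23851394", "PMC:3737249"]:
    print(example, "->", normalize_identifier(example))
```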
Development Setup
git clone https://github.com/contextualizer-ai/artl-mcp.git
cd artl-mcp
uv sync --group dev
# Run tests
make test # Fast development tests
make test-coverage # Full test suite with coverage
# Code quality
make lint # Ruff linting
make format # Black formatting
make mypy # Type checking
Documentation
- USERS.md - Comprehensive user guide with examples
- DEVELOPERS.md - Development setup and architecture