Systematic literature search library for scientific papers
scimesh
A Python library for systematic literature search across multiple academic databases.
Search arXiv, OpenAlex, and Scopus with a unified API. Export to BibTeX, RIS, CSV, or JSON. Download PDFs via Open Access (Unpaywall).
Features
- Multi-provider search - arXiv, OpenAlex, Scopus (parallel queries)
- Scopus-style query syntax - TITLE(transformers) AND AUTHOR(Vaswani)
- Programmatic query API - Compose queries with Python operators (&, |, ~)
- Export formats - BibTeX, RIS, CSV, JSON
- PDF download - Open Access via Unpaywall (Sci-Hub opt-in)
- Async streaming - Results arrive as they're found
- Automatic deduplication - By DOI or title+year across providers
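The deduplication rule in the last bullet (DOI, or title plus year) can be sketched as follows. This is a minimal illustration, not scimesh's actual implementation: the `Paper` dataclass, the `dedupe` function, and the title normalization are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Paper:
    # Hypothetical record type; scimesh's own paper model may differ.
    title: str
    year: Optional[int] = None
    doi: Optional[str] = None


def dedupe(papers: Iterable[Paper]) -> List[Paper]:
    """Keep the first paper seen for each DOI or (title, year) pair."""
    seen = set()
    unique = []
    for paper in papers:
        keys = []
        if paper.doi:
            # DOIs are case-insensitive, so compare them lowercased.
            keys.append(("doi", paper.doi.strip().lower()))
        if paper.title and paper.year is not None:
            # Assumed normalization: lowercase and collapse whitespace.
            normalized = " ".join(paper.title.lower().split())
            keys.append(("title", normalized, paper.year))
        is_duplicate = any(key in seen for key in keys)
        seen.update(keys)
        if not is_duplicate:
            unique.append(paper)
    return unique


papers = [
    Paper("Attention Is All You Need", 2017, "10.1/abc"),  # first sighting
    Paper("attention is  all you need", 2017),             # same title + year
    Paper("Other", 2017, "10.1/ABC"),                      # same DOI
    Paper("BERT", 2018),
]
unique_titles = [p.title for p in dedupe(papers)]
```

Papers with neither a DOI nor a year produce no key in this sketch, so they are always kept.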
Installation
Run directly without installing:
uvx scimesh search "TITLE(transformer)"
Install as a CLI tool (recommended):
uv tool install scimesh
Add to a project:
uv add scimesh
With pip:
pip install scimesh
Quick Start
CLI
# Search arXiv and OpenAlex (default providers)
scimesh search "TITLE(transformer) AND AUTHOR(Vaswani)"
# Export to BibTeX
scimesh search "TITLE(BERT)" -f bibtex -o papers.bib
# Download PDFs from search results
scimesh search "TITLE(attention)" -f json | scimesh download -o ./pdfs
Python API
import asyncio
from scimesh import search, title, author, year
from scimesh.providers import Arxiv, OpenAlex
async def main():
    query = title("transformer") & author("Vaswani") & year(2017, 2023)
    result = await search(
        query,
        providers=[Arxiv(), OpenAlex()],
        max_results=100,
    )
    for paper in result.papers:
        print(f"{paper.title} ({paper.year})")
asyncio.run(main())
Query Syntax
Scopus-Style Strings
The library parses Scopus-compatible query strings automatically.
Field Operators:
| Operator | Description | Example |
|---|---|---|
| TITLE(...) | Search in title | TITLE(transformer) |
| ABS(...) | Search in abstract | ABS(attention mechanism) |
| KEY(...) | Search in keywords | KEY(machine learning) |
| TITLE-ABS(...) | Title OR abstract | TITLE-ABS(neural network) |
| TITLE-ABS-KEY(...) | Title OR abstract OR keywords | TITLE-ABS-KEY(deep learning) |
| AUTHOR(...) | Search by author | AUTHOR(Vaswani) |
| AUTH(...) | Alias for AUTHOR | AUTH(Hinton) |
| DOI(...) | Search by DOI | DOI(10.1038/nature14539) |
| ALL(...) | Full text search | ALL(protein folding) |
Year Operators:
| Operator | Description | Matches |
|---|---|---|
| PUBYEAR = 2023 | Exact year | Papers from 2023 |
| PUBYEAR > 2020 | After year | Papers from 2021+ |
| PUBYEAR < 2020 | Before year | Papers until 2019 |
| PUBYEAR >= 2020 | From year | Papers from 2020+ |
| PUBYEAR <= 2023 | Until year | Papers until 2023 |
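To make the strict and inclusive comparisons concrete, here is a small sketch that turns a single PUBYEAR expression into an inclusive (start, end) year range. The `pubyear_to_range` helper is hypothetical and written only for this illustration; it is not part of scimesh and says nothing about how the library's own parser works.

```python
import re
from typing import Optional, Tuple

# Two-character operators come first so ">=" is not read as ">".
_PUBYEAR = re.compile(r"^\s*PUBYEAR\s*(>=|<=|=|>|<)\s*(\d{4})\s*$")


def pubyear_to_range(expr: str) -> Tuple[Optional[int], Optional[int]]:
    """Return an inclusive (start, end) range; None means unbounded."""
    match = _PUBYEAR.match(expr)
    if match is None:
        raise ValueError(f"not a PUBYEAR expression: {expr!r}")
    op, year = match.group(1), int(match.group(2))
    if op == "=":
        return (year, year)
    if op == ">":
        return (year + 1, None)   # strictly after: starts the next year
    if op == ">=":
        return (year, None)
    if op == "<":
        return (None, year - 1)   # strictly before: ends the previous year
    return (None, year)           # "<="
```

Under these assumptions, `PUBYEAR > 2018 AND PUBYEAR < 2022` covers 2019 through 2021.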
Logical Operators:
| Operator | Description | Example |
|---|---|---|
| AND | Both conditions | TITLE(BERT) AND AUTHOR(Google) |
| OR | Either condition | TITLE(GPT) OR TITLE(BERT) |
| AND NOT | Exclude condition | TITLE(neural) AND NOT AUTHOR(Smith) |
| (...) | Grouping | (TITLE(A) OR TITLE(B)) AND AUTHOR(C) |
Examples:
# Basic title search
scimesh search "TITLE(transformer)"
# Author + title
scimesh search "TITLE(attention is all you need) AND AUTHOR(Vaswani)"
# Multiple terms with OR
scimesh search "TITLE(GPT-4) OR TITLE(GPT-3) OR TITLE(ChatGPT)"
# Exclusion
scimesh search "TITLE(machine learning) AND NOT AUTHOR(Smith)"
# Year range
scimesh search "TITLE(BERT) AND PUBYEAR > 2018 AND PUBYEAR < 2022"
# Complex nested query
scimesh search "(TITLE(transformer) OR TITLE(attention)) AND AUTHOR(Google) AND PUBYEAR >= 2017"
# Search across title, abstract, and keywords
scimesh search "TITLE-ABS-KEY(reinforcement learning) AND PUBYEAR = 2023"
# Full text search
scimesh search "ALL(CRISPR gene editing)"
Programmatic Query API
Build queries with Python operators for type safety and composability.
Field Builders:
from scimesh import title, abstract, author, keyword, doi, fulltext, year
# Single field queries
q = title("transformer architecture")
q = author("Yoshua Bengio")
q = abstract("self-attention mechanism")
q = keyword("natural language processing")
q = doi("10.1038/nature14539")
q = fulltext("protein structure prediction")
Year Filters:
from scimesh import year
q = year(2020, 2024) # Range: 2020-2024 inclusive
q = year(start=2020) # From 2020 onwards
q = year(end=2023) # Until 2023
q = year(2023, 2023) # Exact year 2023
Combining with Operators:
from scimesh import title, author, keyword, year
# AND: both conditions must match
q = title("BERT") & author("Google")
# OR: either condition matches
q = title("GPT-3") | title("GPT-4")
# NOT: exclude matches
q = title("neural networks") & ~author("Smith")
# Complex combinations
q = (
    (title("transformer") | title("attention"))
    & author("Vaswani")
    & year(2017, 2023)
    & ~keyword("computer vision")
)
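For readers curious how `&`, `|`, and `~` can compose queries, the sketch below builds a small query tree through Python operator overloading and renders it back to a Scopus-style string. The classes `Query`, `Field`, `And`, `Or`, `Not` and the `to_string` function are hypothetical names invented for this example; scimesh's internal types may look quite different.

```python
from dataclasses import dataclass


class Query:
    """Base class: the operators build tree nodes instead of evaluating."""

    def __and__(self, other: "Query") -> "Query":
        return And(self, other)

    def __or__(self, other: "Query") -> "Query":
        return Or(self, other)

    def __invert__(self) -> "Query":
        return Not(self)


@dataclass(frozen=True)
class Field(Query):
    name: str
    value: str


@dataclass(frozen=True)
class And(Query):
    left: Query
    right: Query


@dataclass(frozen=True)
class Or(Query):
    left: Query
    right: Query


@dataclass(frozen=True)
class Not(Query):
    operand: Query


def to_string(q: Query) -> str:
    """Render the tree as a fully parenthesized Scopus-style string."""
    if isinstance(q, Field):
        return f"{q.name}({q.value})"
    if isinstance(q, And):
        if isinstance(q.right, Not):
            return f"({to_string(q.left)} AND NOT {to_string(q.right.operand)})"
        return f"({to_string(q.left)} AND {to_string(q.right)})"
    if isinstance(q, Or):
        return f"({to_string(q.left)} OR {to_string(q.right)})"
    if isinstance(q, Not):
        return f"NOT {to_string(q.operand)}"
    raise TypeError(f"unsupported query node: {q!r}")


q = (Field("TITLE", "transformer") | Field("TITLE", "attention")) & ~Field("AUTHOR", "Smith")
rendered = to_string(q)
```

Python gives `&` higher precedence than `|`, so the parentheses around the OR group are required in combined expressions like the one above.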
Full Example:
import asyncio
from scimesh import search, title, author, year
from scimesh.providers import Arxiv, OpenAlex, Scopus
async def main():
    # Build query programmatically
    query = title("large language model") & year(2022, 2024)
    # Or use the equivalent string syntax
    query = "TITLE(large language model) AND PUBYEAR >= 2022 AND PUBYEAR <= 2024"
    result = await search(
        query,
        providers=[Arxiv(), OpenAlex()],
        max_results=50,
    )
    print(f"Found {len(result.papers)} papers")
    # Export to BibTeX
    from scimesh.export import get_exporter
    get_exporter("bibtex").export(result, "papers.bib")
asyncio.run(main())
Streaming Mode:
# Process papers as they arrive from providers
async for paper in search(query, providers, stream=True):
    print(f"Found: {paper.title}")
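One common way to get "results as they arrive" from several sources is to pump each source into a shared `asyncio.Queue` and yield from the queue. The sketch below shows that pattern with fake providers; `merge_streams` and `fake_provider` are hypothetical names, and nothing here is taken from scimesh's internals.

```python
import asyncio
from typing import AsyncIterator, List, Sequence


async def fake_provider(name: str, delay: float, count: int) -> AsyncIterator[str]:
    # Stand-in for a real provider: yields `count` items, one per `delay` seconds.
    for i in range(count):
        await asyncio.sleep(delay)
        yield f"{name}-{i}"


async def merge_streams(streams: Sequence[AsyncIterator[str]]) -> AsyncIterator[str]:
    """Yield items from all streams in the order they become available."""
    queue: asyncio.Queue = asyncio.Queue()
    done = object()  # sentinel marking the end of one stream

    async def pump(stream: AsyncIterator[str]) -> None:
        try:
            async for item in stream:
                queue.put_nowait(item)
        finally:
            queue.put_nowait(done)

    tasks = [asyncio.create_task(pump(s)) for s in streams]
    remaining = len(tasks)
    try:
        while remaining:
            item = await queue.get()
            if item is done:
                remaining -= 1
            else:
                yield item
    finally:
        # Stop any provider still running if the consumer exits early.
        for task in tasks:
            task.cancel()


async def collect() -> List[str]:
    streams = [fake_provider("slow", 0.05, 2), fake_provider("fast", 0.01, 2)]
    return [item async for item in merge_streams(streams)]


results = asyncio.run(collect())
```

The fast provider's first item is yielded before the slow provider has produced anything, which is the behavior streaming mode is meant to give.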
CLI Reference
scimesh search
scimesh search <query> [OPTIONS]
| Flag | Description | Default |
|---|---|---|
| -p, --provider | Providers: arxiv, openalex, scopus | arxiv, openalex |
| -n, --max | Max results per provider | 100 |
| -t, --total | Max total results across all providers | - |
| -f, --format | Output: tree, csv, json, bibtex, ris | tree |
| -o, --output | Output file path | stdout |
| --on-error | Error handling: fail, warn, ignore | warn |
| --no-dedupe | Disable deduplication | false |
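The three `--on-error` values suggest a per-provider failure policy. The sketch below shows one plausible reading of them: `fail` re-raises, `warn` reports and continues, `ignore` continues silently. The `run_providers` function and the exact behavior of each mode are assumptions for illustration, not a description of what scimesh does.

```python
import warnings
from typing import Callable, Dict, List


def run_providers(
    providers: Dict[str, Callable[[], List[str]]],
    on_error: str = "warn",
) -> List[str]:
    """Collect results from each provider, applying the chosen error policy."""
    if on_error not in {"fail", "warn", "ignore"}:
        raise ValueError(f"unknown on_error mode: {on_error!r}")
    results: List[str] = []
    for name, fetch in providers.items():
        try:
            results.extend(fetch())
        except Exception as exc:
            if on_error == "fail":
                raise  # abort the whole search
            if on_error == "warn":
                warnings.warn(f"provider {name} failed: {exc}")
            # "ignore": drop the failure silently and keep going
    return results


def ok() -> List[str]:
    return ["paper-a"]


def broken() -> List[str]:
    raise RuntimeError("rate limited")


ignored = run_providers({"ok": ok, "broken": broken}, on_error="ignore")
```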
scimesh download
scimesh download [DOI] [OPTIONS]
| Flag | Description | Default |
|---|---|---|
| -f, --from | File with DOIs (one per line) | - |
| -o, --output | Output directory | current dir |
| --scihub | Enable Sci-Hub fallback (see disclaimer) | false |
Examples:
# Single DOI (Open Access only)
scimesh download "10.1038/nature14539" -o ./pdfs
# With Sci-Hub fallback enabled
scimesh download "10.1038/nature14539" -o ./pdfs --scihub
# From file
scimesh download -f dois.txt -o ./pdfs
# From search results (piped JSON)
scimesh search "TITLE(attention)" -f json | scimesh download -o ./pdfs
Requires UNPAYWALL_EMAIL env var for Open Access.
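The email is needed because the Unpaywall REST API expects an `email` query parameter on every request. As a rough sketch of what an Open Access lookup involves, the code below builds the Unpaywall v2 URL for a DOI and picks a PDF link out of a response payload. The helper names are hypothetical, the sample payload is hand-written for the example, and no network request is made; how scimesh itself performs the lookup is not shown here.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def unpaywall_url(doi: str, email: str) -> str:
    """Build the Unpaywall v2 lookup URL for a DOI."""
    return f"https://api.unpaywall.org/v2/{quote(doi, safe='/')}?{urlencode({'email': email})}"


def pdf_url_from_response(payload: dict) -> Optional[str]:
    """Return the first PDF URL found, preferring the best OA location."""
    best = payload.get("best_oa_location") or {}
    if best.get("url_for_pdf"):
        return best["url_for_pdf"]
    for location in payload.get("oa_locations") or []:
        if location.get("url_for_pdf"):
            return location["url_for_pdf"]
    return None  # no Open Access PDF is known for this DOI


url = unpaywall_url("10.1038/nature14539", "you@example.com")

# Hand-written sample shaped like an Unpaywall response (illustrative only).
sample = {
    "is_oa": True,
    "best_oa_location": {"url_for_pdf": None},
    "oa_locations": [{"url_for_pdf": "https://example.org/paper.pdf"}],
}
pdf_url = pdf_url_from_response(sample)
```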
Disclaimer: Sci-Hub is disabled by default. The --scihub flag enables it as a fallback when Open Access sources fail. Using Sci-Hub may violate copyright laws in your jurisdiction. Use at your own discretion and risk.
Providers
| Provider | API Key | Notes |
|---|---|---|
| arXiv | No | Preprints |
| OpenAlex | No | 61M+ papers, largest open database |
| Scopus | SCOPUS_API_KEY | Requires institutional access |
from scimesh.providers import Arxiv, OpenAlex, Scopus
providers = [
    Arxiv(),
    OpenAlex(mailto="you@example.com"),  # Optional, for polite pool
    Scopus(),  # Uses SCOPUS_API_KEY env var
]
Local Development
git clone https://github.com/gabfssilva/scimesh
cd scimesh
uv sync
# Run CLI
uv run scimesh search "TITLE(transformer)"
# Install as tool
uv tool install --reinstall .
# Tests
uv run pytest
License
MIT