MOSAIC
Multi-sOurce Scientific Article Index and Collector
Search, discover, and download scientific papers from multiple open databases — with a single command.
What is MOSAIC?
Instead of visiting a dozen or more different websites to hunt for a paper, MOSAIC queries them all simultaneously, deduplicates results by DOI, and downloads open-access PDFs — including those found via Unpaywall — in one shot. Results can also be sent directly to a Google NotebookLM notebook for AI-powered Q&A, audio overviews, video summaries, slide decks, mind maps, flashcards, quizzes, infographics, study guides, and more.
```bash
mosaic search "attention is all you need" --oa-only --download
mosaic notebook create "Attention Papers" --query "attention is all you need" --oa-only --podcast
```
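The DOI-based deduplication mentioned above can be pictured with a minimal Python sketch. This is not MOSAIC's actual code: the `Paper` dataclass, its fields, and the `normalize_doi` and `dedupe_by_doi` helpers are hypothetical names chosen for illustration, and the rule of preferring the record that carries a PDF URL is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Paper:
    """Hypothetical record type; MOSAIC's real model may differ."""
    title: str
    doi: Optional[str] = None
    pdf_url: Optional[str] = None
    source: str = ""


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip common resolver prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def dedupe_by_doi(papers: list[Paper]) -> list[Paper]:
    """Keep one record per DOI, preferring one that has a PDF URL.

    Records without a DOI cannot be matched, so they are all kept.
    """
    by_doi: dict[str, Paper] = {}
    no_doi: list[Paper] = []
    for paper in papers:
        if not paper.doi:
            no_doi.append(paper)
            continue
        key = normalize_doi(paper.doi)
        kept = by_doi.get(key)
        # Replace the kept record only if the new one adds a PDF URL.
        if kept is None or (kept.pdf_url is None and paper.pdf_url is not None):
            by_doi[key] = paper
    return list(by_doi.values()) + no_doi
```

Normalizing before comparison matters because DOIs are case-insensitive and different sources may return them with or without a resolver prefix.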
Sources
| Source | Shorthand | Coverage | Auth | OA PDF |
|---|---|---|---|---|
| arXiv | arxiv | Physics, CS, Math, Biology… | None | Always |
| Semantic Scholar | ss | 214 M papers, all disciplines | Optional key | When indexed |
| ScienceDirect | sd | Elsevier journals & books | API key required | OA articles |
| DOAJ | doaj | 8 M+ fully open-access articles | None | Always |
| Europe PMC | epmc | 45 M biomedical papers | None | PMC articles |
| OpenAlex | oa | 250 M+ works, all disciplines | None | When available |
| BASE | base | 300 M+ docs from 10 000+ repos | None | When OA + PDF format |
| CORE | core | 200 M+ OA full-text from repos | Free API key | downloadUrl field |
| Unpaywall | — | PDF resolver for any DOI | Email only | Legal OA copy |
Installation
```bash
# recommended: isolated install, globally available
pipx install mosaic-search # or: uv tool install mosaic-search

# pip: must be inside a virtualenv (modern systems enforce PEP 668)
python -m venv ~/.venvs/mosaic && source ~/.venvs/mosaic/bin/activate
pip install mosaic-search

# from source
git clone https://github.com/szaghi/mosaic
cd mosaic
python -m venv .venv && source .venv/bin/activate
pip install -e .
```
Requires Python 3.11+
Quick Start
```bash
# 1. Set your email (enables Unpaywall PDF fallback)
mosaic config --unpaywall-email you@example.com

# 2. Optional: add an Elsevier API key to unlock ScienceDirect
mosaic config --elsevier-key YOUR_KEY

# 3. Search and download
mosaic search "transformer architecture" --oa-only --download
```
Usage
Search
```bash
# Search all enabled sources (10 results per source by default)
mosaic search "protein folding"

# More results, open-access only
mosaic search "deep learning" -n 25 --oa-only

# Single source
mosaic search "RNA velocity" --source epmc
```
Source shorthands: arxiv · ss · sd · doaj · epmc · oa · base · core
Filters
```bash
# By year: single, range, or list
mosaic search "BERT" --year 2019
mosaic search "diffusion models" -y 2020-2023
mosaic search "GPT" -y 2020,2022,2024

# By author (repeatable, OR logic, case-insensitive substring)
mosaic search "attention" -a Vaswani -a Shazeer

# By journal (case-insensitive substring)
mosaic search "CRISPR" --journal "Nature"

# Combine freely
mosaic search "graph neural" -y 2021-2023 -a Kipf -j "ICLR" --oa-only --download
```
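To make the filter semantics above concrete, here is a small Python sketch of how a year specification (single, range, or list) and the case-insensitive, OR-combined substring match for authors and journals could be evaluated. The function names `parse_years` and `matches_any` are hypothetical, and MOSAIC's own parsing may differ in details such as error handling.

```python
def parse_years(spec: str) -> set[int]:
    """Expand '2019', '2020-2023', or '2020,2022,2024' into a set of years."""
    years: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # Inclusive range, e.g. 2020-2023 covers four years.
            start, end = (int(p) for p in part.split("-", 1))
            years.update(range(start, end + 1))
        elif part:
            years.add(int(part))
    return years


def matches_any(needles: list[str], haystack: str) -> bool:
    """True if any needle occurs in haystack, ignoring case (OR logic)."""
    text = haystack.casefold()
    return any(needle.casefold() in text for needle in needles)
```

Under these assumptions, `-a Vaswani -a Shazeer` corresponds to `matches_any(["Vaswani", "Shazeer"], author_string)`, which accepts a paper as soon as either name appears.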
Download by DOI
```bash
mosaic get 10.48550/arXiv.1706.03762
```
The command checks the local cache first, then falls back to Unpaywall if no PDF URL is known.
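The cache-then-Unpaywall order can be sketched as follows. The endpoint and the `best_oa_location` / `url_for_pdf` response fields belong to the public Unpaywall v2 API; everything else is an assumption made for illustration: the function names are hypothetical, and a plain dictionary stands in for whatever cache MOSAIC really uses. The `fetch` parameter is injectable so the logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body (standard library only)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def unpaywall_pdf_url(doi: str, email: str,
                      fetch: Callable[[str], dict] = fetch_json) -> Optional[str]:
    """Ask the Unpaywall v2 API for the best open-access PDF of a DOI."""
    url = (f"https://api.unpaywall.org/v2/{urllib.parse.quote(doi)}"
           f"?email={urllib.parse.quote(email)}")
    # best_oa_location is null when no open-access copy is known.
    location = fetch(url).get("best_oa_location") or {}
    return location.get("url_for_pdf")


def resolve_pdf(doi: str, cache: dict[str, str], email: str,
                fetch: Callable[[str], dict] = fetch_json) -> Optional[str]:
    """Cache first, Unpaywall second; `cache` is a hypothetical DOI -> URL map."""
    known = cache.get(doi)
    if known:
        return known
    found = unpaywall_pdf_url(doi, email, fetch)
    if found:
        cache[doi] = found  # remember the answer for next time
    return found
```

A second call for the same DOI is answered from the cache and does not contact Unpaywall again.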
Configuration
```bash
mosaic config --show # print current config
mosaic config --unpaywall-email me@uni.edu
mosaic config --elsevier-key abc123
mosaic config --ss-key xyz789
mosaic config --download-dir ~/papers
```
Config is stored at ~/.config/mosaic/config.toml. Downloaded PDFs go to ~/mosaic-papers/ by default.
NotebookLM
Send search results directly to a Google NotebookLM notebook:
```bash
# 1. Inject into MOSAIC (--include-apps exposes the notebooklm CLI)
pipx inject --include-apps mosaic-search "notebooklm-py[browser]"

# 2. Install Chromium (playwright lives inside the pipx venv, so call it directly)
~/.local/share/pipx/venvs/mosaic-search/bin/playwright install chromium

# 3. Authenticate once
notebooklm login

# 4. Search, download, and create a notebook in one command
mosaic notebook create "Transformers" --query "transformer architecture" --oa-only --podcast

# Or import PDFs you already have
mosaic notebook create "My Papers" --from-dir ~/mosaic-papers/
```
MOSAIC uploads local PDFs when available, falls back to URLs otherwise, and respects NotebookLM's 50-source limit. With --podcast, an Audio Overview is queued automatically.
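The upload policy described above (local PDF first, URL as fallback, at most 50 sources) can be expressed in a few lines. This is a sketch under assumptions: `pick_sources` and its tuple-based input are hypothetical, and it only decides what would be sent; it does not call NotebookLM.

```python
from pathlib import Path
from typing import Optional

MAX_SOURCES = 50  # NotebookLM's per-notebook limit, as stated above


def pick_sources(papers: list[tuple[Optional[Path], Optional[str]]],
                 limit: int = MAX_SOURCES) -> list[tuple[str, str]]:
    """Return up to `limit` ("file" | "url", value) pairs.

    Each input pair is (local_pdf, url); a local PDF that exists on disk
    wins, a URL is the fallback, and papers with neither are skipped.
    """
    chosen: list[tuple[str, str]] = []
    for local_pdf, url in papers:
        if len(chosen) >= limit:
            break
        if local_pdf is not None and local_pdf.is_file():
            chosen.append(("file", str(local_pdf)))
        elif url:
            chosen.append(("url", url))
    return chosen
```

Truncating in input order means the ranking of the search results decides which papers make it into the notebook when more than 50 are available.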
Architecture
```mermaid
flowchart LR
    CLI -->|query + filters| Search
    Search --> arXiv & SS[Semantic Scholar] & SD[ScienceDirect] & DOAJ & EPMC[Europe PMC] & OA[OpenAlex] & BASE & CORE
    arXiv & SS & SD & DOAJ & EPMC & OA & BASE & CORE -->|Paper list| Dedup{Deduplicate\nby DOI}
    Dedup --> Cache[(SQLite\ncache)]
    Dedup --> Table[Rich table]
    Table -->|--download| DL[Downloader]
    DL -->|no pdf_url| UPW[Unpaywall]
    UPW --> DL
    DL --> Disk[(~/mosaic-papers/)]
    DL -->|mosaic notebook create| NLM[NotebookLM]
```
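The fan-out stage of the flowchart above, where one query goes to every source and the per-source paper lists are merged, might look like the following Python sketch. Whether MOSAIC uses threads, async I/O, or something else is not stated here, so the thread pool is an assumption, as are the `Source` alias and the `search_all` name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A "source" here is any callable taking a query and returning a list of
# result dicts; the real adapters for arXiv, OpenAlex, etc. are not shown.
Source = Callable[[str], list[dict]]


def search_all(query: str, sources: dict[str, Source]) -> list[dict]:
    """Query every source in parallel and concatenate what succeeds."""
    results: list[dict] = []
    if not sources:
        return results
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                results.extend(future.result())
            except Exception as exc:  # one failing source should not sink the rest
                print(f"{name}: skipped ({exc})")
    return results
```

The merged list would then be handed to the deduplication step before anything is cached or displayed.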
Development
```bash
pip install -e ".[dev]"

# with NotebookLM integration (includes Playwright for auth)
pip install -e ".[dev,notebooklm]"
playwright install chromium

# run tests + coverage
pytest

# live docs
cd docs && npm install && npm run docs:dev
```
Coverage report and badge JSON are written to docs/public/ after every test run.
License
MOSAIC is available under your choice of license:
| License | SPDX | File |
|---|---|---|
| GNU General Public License v3 | GPL-3.0-or-later | LICENSE.gpl3.md |
| BSD 2-Clause | BSD-2-Clause | LICENSE.bsd-2.md |
| BSD 3-Clause | BSD-3-Clause | LICENSE.bsd-3.md |
| MIT | MIT | LICENSE.mit.md |