Multi-source Scientific Article Index and Collector

MOSAIC

Multi-sOurce Scientific Article Index and Collector

Search, discover, and download scientific papers from multiple open databases — with a single command.

Full documentation


What is MOSAIC?

Instead of visiting each database's website to hunt for a paper, MOSAIC queries them all concurrently, deduplicates the results by DOI, and downloads open-access PDFs (including those located via Unpaywall) in one step. Results can also be sent directly to a Google NotebookLM notebook for AI-powered Q&A, audio overviews, video summaries, slide decks, mind maps, flashcards, quizzes, infographics, study guides, and more.

mosaic search "attention is all you need" --oa-only --download
mosaic notebook create "Attention Papers" --query "attention is all you need" --oa-only --podcast

Sources

Source            Shorthand  Coverage                         Auth              OA PDF
arXiv             arxiv      Physics, CS, Math, Biology…      None              Always
Semantic Scholar  ss         214 M papers, all disciplines    Optional key      When indexed
ScienceDirect     sd         Elsevier journals & books        API key required  OA articles
DOAJ              doaj       8 M+ fully open-access articles  None              Always
Europe PMC        epmc       45 M biomedical papers           None              PMC articles
OpenAlex          oa         250 M+ works, all disciplines    None              When available
BASE              base       300 M+ docs from 10 000+ repos   None              When OA + PDF format
CORE              core       200 M+ OA full-text from repos   Free API key      downloadUrl field
Unpaywall         -          PDF resolver for any DOI         Email only        Legal OA copy

Installation

# recommended — isolated install, globally available
pipx install mosaic-search        # or: uv tool install mosaic-search
# pip: must be inside a virtualenv (many distributions mark the system Python as externally managed, PEP 668)
python -m venv ~/.venvs/mosaic && source ~/.venvs/mosaic/bin/activate
pip install mosaic-search
# from source
git clone https://github.com/szaghi/mosaic
cd mosaic
python -m venv .venv && source .venv/bin/activate
pip install -e .

Requires Python 3.11+


Quick Start

# 1. Set your email (enables Unpaywall PDF fallback)
mosaic config --unpaywall-email you@example.com

# 2. Optional: add an Elsevier API key to unlock ScienceDirect
mosaic config --elsevier-key YOUR_KEY

# 3. Search and download
mosaic search "transformer architecture" --oa-only --download

Usage

Search

# Search all enabled sources (10 results per source by default)
mosaic search "protein folding"

# More results, open-access only
mosaic search "deep learning" -n 25 --oa-only

# Single source
mosaic search "RNA velocity" --source epmc

Source shorthands: arxiv · ss · sd · doaj · epmc · oa · base · core

Filters

# By year — single, range, or list
mosaic search "BERT" --year 2019
mosaic search "diffusion models" -y 2020-2023
mosaic search "GPT" -y 2020,2022,2024

# By author (repeatable, OR logic, case-insensitive substring)
mosaic search "attention" -a Vaswani -a Shazeer

# By journal (case-insensitive substring)
mosaic search "CRISPR" --journal "Nature"

# Combine freely
mosaic search "graph neural" -y 2021-2023 -a Kipf -j "ICLR" --oa-only --download

Download by DOI

mosaic get 10.48550/arXiv.1706.03762

Checks the local cache first, then tries Unpaywall if no PDF URL is known.

Configuration

mosaic config --show                          # print current config
mosaic config --unpaywall-email me@uni.edu
mosaic config --elsevier-key abc123
mosaic config --ss-key xyz789
mosaic config --download-dir ~/papers

Config is stored at ~/.config/mosaic/config.toml. Downloaded PDFs go to ~/mosaic-papers/ by default.

NotebookLM

Send search results directly to a Google NotebookLM notebook:

# 1. Add notebooklm-py to MOSAIC's pipx environment (--include-apps exposes the notebooklm CLI)
pipx inject --include-apps mosaic-search "notebooklm-py[browser]"

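# 2. Install Chromium. Playwright lives inside the pipx venv, so call it by its full path
#    (the path below is a common pipx default on Linux; run `pipx environment` to find yours)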
~/.local/share/pipx/venvs/mosaic-search/bin/playwright install chromium

# 3. Authenticate once
notebooklm login

# 4. Search, download, and create a notebook in one command
mosaic notebook create "Transformers" --query "transformer architecture" --oa-only --podcast

# Or import PDFs you already have
mosaic notebook create "My Papers" --from-dir ~/mosaic-papers/

MOSAIC uploads local PDFs when available, falls back to URLs otherwise, and respects NotebookLM's 50-source limit. With --podcast, an Audio Overview is queued automatically.
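The selection rule described above (local PDF first, URL otherwise, at most 50 sources) can be sketched like this. The `pdf_path` and `url` fields and the `plan_sources` name are hypothetical; the upload itself, which goes through notebooklm-py, is left out.

```python
from pathlib import Path

NOTEBOOKLM_SOURCE_LIMIT = 50  # per-notebook source limit mentioned above


def plan_sources(papers: list[dict],
                 limit: int = NOTEBOOKLM_SOURCE_LIMIT) -> list[tuple[str, str]]:
    """Return (kind, location) pairs: a local PDF if it exists, else the paper URL."""
    plan: list[tuple[str, str]] = []
    for paper in papers:
        if len(plan) >= limit:
            break  # stop once the notebook would be full
        pdf_path = paper.get("pdf_path")
        if pdf_path and Path(pdf_path).is_file():
            plan.append(("file", str(pdf_path)))
        elif paper.get("url"):
            plan.append(("url", paper["url"]))
        # papers with neither a local PDF nor a URL are skipped
    return plan
```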


Architecture

flowchart LR
    CLI -->|query + filters| Search
    Search --> arXiv & SS[Semantic Scholar] & SD[ScienceDirect] & DOAJ & EPMC[Europe PMC] & OA[OpenAlex] & BASE & CORE
    arXiv & SS & SD & DOAJ & EPMC & OA & BASE & CORE -->|Paper list| Dedup{Deduplicate\nby DOI}
    Dedup --> Cache[(SQLite\ncache)]
    Dedup --> Table[Rich table]
    Table -->|--download| DL[Downloader]
    DL -->|no pdf_url| UPW[Unpaywall]
    UPW --> DL
    DL --> Disk[(~/mosaic-papers/)]
    DL -->|mosaic notebook create| NLM[NotebookLM]

Development

pip install -e ".[dev]"

# with NotebookLM integration (includes Playwright for auth)
pip install -e ".[dev,notebooklm]"
playwright install chromium

# run tests + coverage
pytest

# live docs
cd docs && npm install && npm run docs:dev

Coverage report and badge JSON are written to docs/public/ after every test run.


License

MOSAIC is available under your choice of the following licenses:

License                          SPDX               File
GNU General Public License v3    GPL-3.0-or-later   LICENSE.gpl3.md
BSD 2-Clause                     BSD-2-Clause       LICENSE.bsd-2.md
BSD 3-Clause                     BSD-3-Clause       LICENSE.bsd-3.md
MIT                              MIT                LICENSE.mit.md

© Stefano Zaghi


Download files

Source Distribution

mosaic_search-0.0.7.tar.gz (45.5 kB)

Uploaded Source

Built Distribution

mosaic_search-0.0.7-py3-none-any.whl (42.5 kB)

Uploaded Python 3

File details

Details for the file mosaic_search-0.0.7.tar.gz.

File metadata

  • Download URL: mosaic_search-0.0.7.tar.gz
  • Size: 45.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mosaic_search-0.0.7.tar.gz
Algorithm Hash digest
SHA256 9e1e3d1a8c414134a73a1722abb408a7d600dce0888d7cbef52b0a75ec0b0271
MD5 36764b72460984f85c357240d5c0161d
BLAKE2b-256 888ce2d830647edfc243872ccce1fe3c6ac9c8eb5f1b37afd6d7588a0117bb3c

Provenance

The following attestation bundles were made for mosaic_search-0.0.7.tar.gz:

Publisher: tests.yml on szaghi/mosaic

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mosaic_search-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: mosaic_search-0.0.7-py3-none-any.whl
  • Size: 42.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mosaic_search-0.0.7-py3-none-any.whl
Algorithm Hash digest
SHA256 66da8c59713873e99a920787ebae9f3e9da649e2802b8be6f78dbacb1cef8cf9
MD5 5658a76636a1c3564a34c1950d0f6af4
BLAKE2b-256 a2c3ca6fbaba7a73a6b82f7450016d12a80062d6c614e7bbb8b0dcde4eb0ca91

Provenance

The following attestation bundles were made for mosaic_search-0.0.7-py3-none-any.whl:

Publisher: tests.yml on szaghi/mosaic
