Automated research paper tracking and knowledge synthesis
research-cruise
An autonomous, serverless, multi-agent system that tracks academic papers, extracts structured data, and weaves them into a local, interconnected Markdown knowledge graph: a Second Brain for ML research.
Built to eventually communicate with other identical systems, forming a decentralised Hive Mind.
Architecture
┌─────────────────────────────────────────────┐
│                  Triggers                   │
└──────────────────────┬──────────────────────┘
                       │
          ┌────────────▼────────────┐
          │    Federation Agent     │  ← consumes external public_feed.json feeds
          └────────────┬────────────┘
                       │
          ┌────────────▼────────────┐
          │         Watcher         │  ← queries ArXiv API by keyword
          └────────────┬────────────┘
                       │ RawPaper[]
          ┌────────────▼────────────┐
          │  Router (Skill          │  ← routes each paper to a domain skill
          │          Registry)      │    (NLP, Vision, TimeSeries, ...)
          └────────────┬────────────┘
                       │ Skill
          ┌────────────▼────────────┐
          │         Analyst         │  ← pydantic-ai structured extraction
          │      (pydantic-ai)      │    with taxonomy injection
          └────────────┬────────────┘
                       │ PaperAnalysis
          ┌────────────▼────────────┐
          │      Vault Writer       │  ← writes .md to tmp_vault/
          │                         │    generates concept stubs
          │                         │    updates public_feed.json
          └────────────┬────────────┘
                       │ atomic move
          ┌────────────▼────────────┐
          │         /vault          │  ← permanent, file-based knowledge graph
          │   papers/  concepts/    │
          │        datasets/        │
          └─────────────────────────┘
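The Watcher-to-Router hand-off in the diagram can be sketched in a few lines. This is a minimal illustration, not the project's code: `RawPaper`, `SKILLS`, `DEFAULT_SKILL`, `route`, and the keyword-matching rule are all hypothetical stand-ins for the Watcher output and the Router's skill registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawPaper:
    """Hypothetical stand-in for what the Watcher emits."""
    paper_id: str
    title: str
    abstract: str


# Hypothetical registry: skill name -> keywords that select it.
SKILLS: dict[str, tuple[str, ...]] = {
    "NLPSkill": ("language model", "translation", "transformer"),
    "VisionSkill": ("image", "segmentation", "object detection"),
    "TimeSeriesSkill": ("forecasting", "time series"),
}
DEFAULT_SKILL = "GeneralSkill"  # assumed fallback, not named in the README


def route(paper: RawPaper) -> str:
    """Return the first skill whose keywords appear in the title or abstract."""
    text = f"{paper.title} {paper.abstract}".lower()
    for skill, keywords in SKILLS.items():
        if any(keyword in text for keyword in keywords):
            return skill
    return DEFAULT_SKILL
```

A real router would likely use richer signals (for example, the source's subject categories); keyword matching is used here only to keep the sketch self-contained.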
Directory Structure
research-cruise/
├── .github/
│   └── workflows/
│       └── autonomous-tracker.yml   # CI/CD pipeline
├── vault/
│   ├── papers/                      # One .md file per paper
│   ├── concepts/                    # Auto-generated concept stubs
│   └── datasets/                    # Dataset stubs
├── swarm_notes/
│   ├── config.py                    # Configuration & env vars
│   ├── vault_manager.py             # Staging pattern (tmp_vault → vault)
│   ├── watcher.py                   # Configurable paper-source watcher
│   ├── router.py                    # Skill registry router
│   ├── analyst.py                   # pydantic-ai extraction agent
│   ├── vault_writer.py              # Markdown writer + public_feed.json
│   ├── federation.py                # Hive Mind federation agent
│   └── main.py                      # Pipeline orchestrator
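The staging pattern noted for vault_manager.py (write into tmp_vault, then move into vault) could look roughly like the sketch below. It rests on assumptions: the function name `publish_staged` and the note filename are hypothetical, and the move relies on `os.replace`, which is atomic per file only when the staging directory and the vault live on the same filesystem.

```python
import os
import tempfile
from pathlib import Path


def publish_staged(staging: Path, vault: Path) -> list[Path]:
    """Move every staged file into the vault, preserving relative paths."""
    published = []
    # sorted() materialises the listing before any file is moved.
    for src in sorted(p for p in staging.rglob("*") if p.is_file()):
        dest = vault / src.relative_to(staging)
        dest.parent.mkdir(parents=True, exist_ok=True)
        os.replace(src, dest)  # atomic rename on the same filesystem
        published.append(dest)
    return published


# Usage: stage one note (hypothetical filename), then publish it.
with tempfile.TemporaryDirectory() as root:
    staging, vault = Path(root) / "tmp_vault", Path(root) / "vault"
    note = staging / "papers" / "1706.03762.md"
    note.parent.mkdir(parents=True)
    note.write_text("# Attention Is All You Need\n", encoding="utf-8")
    moved = publish_staged(staging, vault)
    moved_names = [p.relative_to(vault).as_posix() for p in moved]
    staged_left = [p for p in staging.rglob("*") if p.is_file()]
```

Writing to a staging directory first means a run that fails midway leaves the permanent vault untouched.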
Quick Start
Prerequisites
- Python 3.11+
- An LLM API key
Local Dev Run
# Install dependencies
uv sync
# Set your API key in .env file
export LLM_API_KEY="sk-..."
export PAPER_SOURCE="semantic_scholar"
export SEMANTIC_SCHOLAR_API_KEY="..."
# prepare configs in configs/ folder
...
# Run the pipeline
python -m swarm_notes.main
Configuration (Environment Variables)
Use the example in the configs folder as a template for your own configuration.
CI/CD Setup
Add the required secret
The pipeline needs an OpenAI-compatible API key to run the LLM analyst step.
- Open your forked repository on GitHub.
- Go to Settings → Secrets and variables → Actions.
- Click New repository secret.
- Set Name to LLM_API_KEY and Secret to your API key (e.g. sk-...).
- Click Add secret.
Note: The workflow exposes LLM_API_KEY as both LLM_API_KEY and OPENAI_API_KEY so that pydantic-ai's OpenAI provider picks it up automatically.
The Hive Mind (Federation)
Every successful run updates public_feed.json at the root of the repository with the metadata and summaries of the last 20 processed papers.
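The "last 20 processed papers" rule can be illustrated with a small helper. The README does not show the feed schema, so the top-level `papers` key, the per-entry `id` field, and the function name `update_feed` are assumptions made for this sketch.

```python
MAX_FEED_ITEMS = 20  # the README states the feed keeps the last 20 papers


def update_feed(feed: dict, new_entries: list[dict], limit: int = MAX_FEED_ITEMS) -> dict:
    """Prepend new entries, drop duplicates by id, and keep the newest `limit`."""
    seen: set[str] = set()
    merged = []
    # New entries come first so they win over older copies with the same id.
    for entry in [*new_entries, *feed.get("papers", [])]:
        if entry["id"] not in seen:
            seen.add(entry["id"])
            merged.append(entry)
    return {**feed, "papers": merged[:limit]}
```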
To subscribe to other agents' feeds, set FEDERATION_FEEDS to a comma-separated list of their raw public_feed.json URLs:
export FEDERATION_FEEDS="https://raw.githubusercontent.com/alice/research-cruise/main/public_feed.json,https://raw.githubusercontent.com/bob/research-cruise/main/public_feed.json"
python -m swarm_notes.main
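A comma-separated variable like this is typically split and sanitised before any feed is fetched. The helper below is a sketch under that assumption; `parse_feed_urls` is a hypothetical name, and how the project actually parses the variable is not shown in this README.

```python
import os
from urllib.parse import urlparse


def parse_feed_urls(raw: str) -> list[str]:
    """Split a comma-separated value into unique http(s) URLs, keeping order."""
    urls: list[str] = []
    for item in raw.split(","):
        url = item.strip()
        parsed = urlparse(url)
        # Skip empty items, non-HTTP schemes, and repeated URLs.
        if parsed.scheme in ("http", "https") and parsed.netloc and url not in urls:
            urls.append(url)
    return urls


feeds = parse_feed_urls(os.environ.get("FEDERATION_FEEDS", ""))
```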
Conflict resolution: If an external feed contains a review of a paper that already exists locally, the local metadata is preserved. The external summary is appended under a ### External Perspectives section:
### External Perspectives
> "Transformers are over-engineered for this dataset." - @Agent_alice
> *(Retrieved 2024-01-15)*
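The conflict-resolution rule can be sketched as a pure function over the note text. This is an illustration, not the federation agent's code: `append_external_perspective` is a hypothetical name, and the sketch assumes the External Perspectives section, when present, is the last section of the note.

```python
from datetime import date

SECTION = "### External Perspectives"


def append_external_perspective(note: str, summary: str, agent: str, retrieved: date) -> str:
    """Return the note with the external summary quoted under the section heading.

    Everything already in the note, including its frontmatter, is left as-is.
    """
    quote = (
        f'> "{summary}" - @Agent_{agent}\n'
        f"> *(Retrieved {retrieved.isoformat()})*\n"
    )
    body = note.rstrip("\n") + "\n\n"
    if SECTION not in note:
        body += SECTION + "\n"  # create the section on first use
    return body + quote
```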
Vault File Format
Each paper note uses hybrid YAML frontmatter (CSL-compatible fields + custom fields):
---
# CSL-compatible fields
title: "Attention Is All You Need"
author:
  - literal: "Ashish Vaswani"
issued:
  date-parts:
    - [2017, 6, 12]
url: "https://arxiv.org/abs/1706.03762"
# Custom fields
arxiv_id: "1706.03762"
domain: "nlp"
tags:
  - "transformer"
  - "attention-mechanism"
architectures:
  - "encoder-decoder"
datasets:
  - "WMT 2014"
skill: "NLPSkill"
processed_at: "2024-01-15T06:00:00Z"
---
Body sections: Summary, Key Contributions, Key Concepts (with relative links to ../concepts/), Datasets, Limitations, Links.
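The relative links from a paper note to its concept stubs could be generated as follows. The slug scheme and the `.md` stub filename are assumptions made for illustration; the README does not specify how stub files are named.

```python
import re


def slugify(name: str) -> str:
    """Lowercase the name and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def concept_link(name: str) -> str:
    """Markdown link from a note in papers/ to the concept's stub file."""
    return f"[{name}](../concepts/{slugify(name)}.md)"
```

For example, `concept_link("Attention Mechanism")` yields `[Attention Mechanism](../concepts/attention-mechanism.md)`.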
Taxonomy
taxonomy.json contains the controlled vocabulary of tags, architectures, and domains injected into the analyst's system prompt. This constrains the LLM to known terms, which limits hallucinated labels and keeps metadata consistent. Edit taxonomy.json to add new terms.
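Taxonomy injection can be sketched as two steps: render the vocabulary into the prompt, then filter the model's output against it. The JSON layout, the prompt wording, and the helper names below are hypothetical; the real taxonomy.json schema may differ.

```python
import json

# Hypothetical taxonomy.json contents; the real file's schema may differ.
TAXONOMY_JSON = """
{
  "domains": ["nlp", "vision", "timeseries"],
  "tags": ["transformer", "attention-mechanism"],
  "architectures": ["encoder-decoder"]
}
"""


def build_system_prompt(taxonomy: dict[str, list[str]]) -> str:
    """Render the controlled vocabulary as plain-text prompt lines."""
    lines = ["Use only the following vocabulary when filling metadata fields."]
    for field, terms in sorted(taxonomy.items()):
        lines.append(f"{field}: {', '.join(terms)}")
    return "\n".join(lines)


def keep_known(values: list[str], allowed: list[str]) -> list[str]:
    """Drop any value the taxonomy does not list, preserving order."""
    allowed_set = set(allowed)
    return [v for v in values if v in allowed_set]


taxonomy = json.loads(TAXONOMY_JSON)
prompt = build_system_prompt(taxonomy)
```

Filtering after extraction is a second line of defence: even with the vocabulary in the prompt, a model can still emit an unlisted term.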
License
MIT. See LICENSE.
Download files
File details
Details for the file swarm_notes-0.1.2.tar.gz.
File metadata
- Download URL: swarm_notes-0.1.2.tar.gz
- Upload date:
- Size: 33.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e7cfabaa499ab562389da3d2d50778f8a72f50d75c780aa821192197a59735a6 |
| MD5 | 2155e64cbb409e145ae771bd1798a26d |
| BLAKE2b-256 | bb92f57ecec227c968749a3b44f6f773bd9d2cb3d42b18625e4654fd313bcbdb |
File details
Details for the file swarm_notes-0.1.2-py3-none-any.whl.
File metadata
- Download URL: swarm_notes-0.1.2-py3-none-any.whl
- Upload date:
- Size: 45.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 69da652276a9ca4de18901176a140ddc559d8a82031d247fb452c5f6d733bea5 |
| MD5 | 17a5271072e96c29ca7e78c0a04b8f64 |
| BLAKE2b-256 | 28493d8cb5d459309e430e82e137df9b5cbffe003e83489d3d899491f2db2e31 |