Tracking daily arXiv updates and generating intelligent summaries with LLMs.
arxiv-daily
AI-powered arXiv research assistant - Beautiful terminal interface for tracking arXiv preprints and generating intelligent summaries with LLMs.
Build your own personal RAG (Retrieval-Augmented Generation) knowledge base - track daily papers, generate structured summaries with rich metadata, and export them to Markdown for seamless integration with vector databases, semantic search engines, and note-taking workflows like Obsidian.
Key capabilities:
- Daily arXiv Updates: Fetch and filter the latest preprints from any arXiv channel.
- AI-Powered Summaries: Generate structured, organized summaries using LLMs.
- Paper Metadata: Fetch detailed metadata for any arXiv paper.
- Beautiful Output: Colorful terminal output, syntax highlighting, and progress bars using the Rich library.
- Smart Filtering: Filter by arXiv categories and channels for focused research.
- Obsidian Integration: Export summaries as Markdown with frontmatter for knowledge management.
Quick Start
Install
Install the package from PyPI:
pip install arxiv-daily
Or install from source for development:
git clone https://github.com/GZU-MuTian/arxiv-daily.git
cd arxiv-daily
pip install -e .
Environment Setup (Recommended)
To avoid repeating CLI flags on every run, we recommend setting the environment variables below. This also keeps credentials out of your shell command history.
# LLM Configuration (required)
export DEEPSEEK_API_KEY="your-deepseek-api-key-here"
# Default arXiv categories (comma-separated)
export ARXIV_CATEGORY="cs.AI,astro-ph.HE,hep-ph"
# Default output directory for summaries (optional)
export ARXIV_SUMMARIZE_OUTPUT="/path/to/your/obsidian/vault"
# Default output directory for knowledge graph concepts (optional)
export ARXIV_EXTRACTOR_OUTPUT="/path/to/your/obsidian/vault/concepts"
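If you drive the tool from your own scripts, the same variables can be read from Python. The following is a minimal sketch, not the package's own configuration code: the helper name load_settings and the returned dictionary layout are hypothetical, and how arxiv-daily parses these variables internally is an assumption.

```python
import os


def load_settings(env=None):
    """Read the arxiv-daily environment variables into a plain dict.

    Hypothetical helper for illustration; not part of the arxiv-daily API.
    """
    env = os.environ if env is None else env
    raw_categories = env.get("ARXIV_CATEGORY", "")
    return {
        "api_key": env.get("DEEPSEEK_API_KEY"),
        # Split the comma-separated list and drop empty or padded entries.
        "categories": [c.strip() for c in raw_categories.split(",") if c.strip()],
        "summarize_output": env.get("ARXIV_SUMMARIZE_OUTPUT"),
        "extractor_output": env.get("ARXIV_EXTRACTOR_OUTPUT"),
    }


if __name__ == "__main__":
    settings = load_settings()
    if settings["api_key"] is None:
        print("DEEPSEEK_API_KEY is not set")
    print("Categories:", settings["categories"])
```

Passing a dictionary instead of relying on os.environ makes the helper easy to check without touching your real shell configuration.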
Usage Guide
Command-Line Interface
arxiv-daily includes a CLI named arXiv.
Tip: Run arXiv --help for an overview, or arXiv <command> --help for command-specific options.
Fetch the latest preprints from any arXiv channel with beautiful terminal formatting:
# Get the latest papers in Astrophysics
arXiv new
# Specific channel (e.g., Computer Science - AI)
arXiv new --channel cs.AI
# Filter by multiple categories
arXiv new --channel astro-ph --category astro-ph.HE,astro-ph.IM
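Conceptually, category filtering keeps the entries that carry at least one of the requested categories. The sketch below illustrates that idea under stated assumptions: each paper is represented as a dict with a categories list, and a paper matches when any of its categories was requested. Both the data shape and the function name filter_by_category are hypothetical, and the matching rule used by the CLI itself may differ.

```python
def filter_by_category(papers, wanted):
    """Keep papers that list at least one of the wanted categories.

    `wanted` is a comma-separated string, in the same style as --category.
    Hypothetical helper; not part of the arxiv-daily API.
    """
    wanted_set = {w.strip() for w in wanted.split(",") if w.strip()}
    return [p for p in papers if wanted_set.intersection(p["categories"])]


if __name__ == "__main__":
    # Placeholder entries, not real papers.
    sample = [
        {"id": "paper-a", "categories": ["astro-ph.HE"]},
        {"id": "paper-b", "categories": ["cs.AI"]},
        {"id": "paper-c", "categories": ["astro-ph.IM", "astro-ph.GA"]},
    ]
    for paper in filter_by_category(sample, "astro-ph.HE,astro-ph.IM"):
        print(paper["id"])
```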
Fetch Paper Metadata:
# Get metadata for a specific paper
arXiv meta 2401.12345
# Supports various input formats
arXiv meta arXiv:2401.12345
arXiv meta arXiv:2401.12345v1
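The accepted input formats above all reduce to a base identifier plus an optional version. A minimal sketch of that normalization, assuming new-style identifiers only (YYMM.NNNNN); the function name parse_arxiv_id is hypothetical, and the CLI's own parsing may accept more forms:

```python
import re

# New-style arXiv identifiers: YYMM.NNNN or YYMM.NNNNN, with an optional
# "arXiv:" prefix and an optional version suffix such as "v1".
_ARXIV_ID = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5})(?:v(\d+))?$", re.IGNORECASE)


def parse_arxiv_id(text):
    """Return (base_id, version), where version is an int or None.

    Hypothetical helper; not part of the arxiv-daily API.
    """
    match = _ARXIV_ID.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised arXiv identifier: {text!r}")
    base_id, version = match.groups()
    return base_id, int(version) if version else None


if __name__ == "__main__":
    for raw in ("2401.12345", "arXiv:2401.12345", "arXiv:2401.12345v1"):
        print(raw, "->", parse_arxiv_id(raw))
```

Old-style identifiers that include an archive name and a slash are deliberately left out of this sketch.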
Generate AI Summaries:
# Basic summary with default model (DeepSeek)
arXiv summarize 2401.12345
# Specify model and provider
arXiv summarize 2401.12345 --model deepseek-chat --provider deepseek
# Short form
arXiv summarize 2401.12345 -m deepseek-chat -p deepseek -t 0.5
# Save to file (if ARXIV_SUMMARIZE_OUTPUT is set)
arXiv summarize 2401.12345
# Save to specific directory
arXiv summarize 2401.12345 -o /path/to/output
Extract Knowledge Graph Relationships:
# Basic extraction with default model (DeepSeek)
arXiv extractor 2401.12345
# Specify model and provider
arXiv extractor 2401.12345 --model deepseek-chat --provider deepseek
# Short form
arXiv extractor 2401.12345 -m deepseek-chat -p deepseek -t 0.5
# Save concept files to directory (if ARXIV_EXTRACTOR_OUTPUT is set)
arXiv extractor 2401.12345
# Save to specific directory
arXiv extractor 2401.12345 -o /path/to/concepts
The extractor command analyzes paper summaries and extracts key concepts together with their relationships, producing a structured knowledge graph. Each concept is categorized and linked to its source paper, which makes the output well suited to building a personal research knowledge base.
Obsidian Integration:
When using the -o option, concepts are saved as individual Markdown files with:
- YAML frontmatter for metadata
- Obsidian-style links ([[arxiv-id]])
- Automatic deduplication (same paper won't be added twice)
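To make the three properties above concrete, here is a minimal sketch of writing one concept file. It is an illustration only: the function name add_paper_to_concept, the frontmatter keys (concept, category), and the "Papers" section layout are assumptions, not the format arxiv-daily actually writes.

```python
from pathlib import Path


def add_paper_to_concept(directory, concept, category, arxiv_id):
    """Create or update <concept>.md, linking the paper at most once.

    Returns True if the file was written, False if the link already existed.
    Hypothetical helper; the real file layout may differ.
    """
    path = Path(directory) / f"{concept}.md"
    link = f"[[{arxiv_id}]]"  # Obsidian-style wiki link

    if path.exists():
        text = path.read_text(encoding="utf-8")
        if link in text:
            return False  # deduplication: this paper is already linked
        if not text.endswith("\n"):
            text += "\n"
        path.write_text(text + f"- {link}\n", encoding="utf-8")
        return True

    # New concept: write YAML frontmatter followed by the first link.
    path.parent.mkdir(parents=True, exist_ok=True)
    frontmatter = f"---\nconcept: {concept}\ncategory: {category}\n---\n"
    path.write_text(frontmatter + f"\n## Papers\n\n- {link}\n", encoding="utf-8")
    return True
```

Calling the function twice with the same arXiv ID leaves the file unchanged on the second call, which is the behavior the deduplication bullet describes.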
Adjust verbosity for debugging or quiet runs:
# Production - errors only (default)
arXiv --log-level ERROR new
# Short form for detailed debugging
arXiv -v DEBUG new
Knowledge Graph Extraction
The arXiv extractor command builds a structured knowledge base by extracting key concepts and relationships from academic papers.
Concept Categories
The extractor classifies concepts into these research domains:
- galaxy-physics: Galaxy formation, evolution, dynamics
- cosmology: Dark matter, cosmic microwave background, large-scale structure
- earth-planetary: Exoplanets, planetary atmospheres, astrobiology
- high-energy-astrophysics: Black holes, neutron stars, gamma-ray bursts
- solar-stellar: Stellar evolution, solar physics, star formation
- statistics-ai: Machine learning, statistical methods, neural networks
- numerical-simulation: N-body simulations, hydrodynamics, radiative transfer
- instrumental-design: Telescopes, spectrographs, detectors
- astronomical-events: Supernovae, gravitational waves, fast radio bursts
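When post-processing exported concept files, it can be useful to check category slugs against the list above. A small sketch; the names CONCEPT_CATEGORIES and is_known_category are hypothetical and not part of the package:

```python
# Category slugs copied from the list above.
CONCEPT_CATEGORIES = frozenset({
    "galaxy-physics",
    "cosmology",
    "earth-planetary",
    "high-energy-astrophysics",
    "solar-stellar",
    "statistics-ai",
    "numerical-simulation",
    "instrumental-design",
    "astronomical-events",
})


def is_known_category(slug):
    """Return True if slug (ignoring case and padding) is a listed category."""
    return slug.strip().lower() in CONCEPT_CATEGORIES


if __name__ == "__main__":
    for candidate in ("cosmology", "Solar-Stellar", "chemistry"):
        print(candidate, is_known_category(candidate))
```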
Example Workflow
# 1. Generate summary first
arXiv summarize 2401.12345 -o ./summaries
# 2. Extract knowledge graph
arXiv extractor 2401.12345 -o ./concepts
Integration with Obsidian
The extractor is designed to work seamlessly with Obsidian:
- Backlinks: Use [[arxiv-id]] syntax for paper references
- Tags: Automatic tagging for easy filtering
- Graph View: Visualize connections between papers and concepts
- Search: Find all papers mentioning a specific concept
Project Structure
arxiv_daily/
├── agents.py # LangGraph agents for complex summarization workflows
├── chains.py # LangChain chains for LLM interactions (includes KnowledgeGraphExtractor)
├── cli.py # Command-line interface built with Typer
├── core.py # Core functions (_run_new, _run_summarize, _run_extractor)
├── llm_client.py # Unified LLM provider interface
├── utils.py # Utility functions
└── __init__.py
Related Resources
Contact
For questions and support:
- Author: Yu Liu
- Email: yuliu@gzu.edu.cn
- GitHub Issues: Report bugs or request features