Local-first paper manager with semantic search and LLM reasoning
📚 Lemma - Local-First Research Paper Manager
A privacy-first research paper manager with local semantic search and optional AI-powered insights.
✨ Features
- 🔒 Privacy-First: All papers stored locally, no cloud uploads
- 🚀 Fast Semantic Search: Local vector search across all papers
- 🤖 AI Q&A (Optional): Ask questions using cloud LLMs
- 📊 Auto-Processing: One command to scan, rename, and embed
- 🔄 Incremental Updates: Re-embedding is 70-90% faster than a full rebuild
- 📂 Smart Cleanup: Automatically removes deleted papers from database
- 👀 Watch Mode: Auto-process new papers as they're added
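To make the "local vector search" feature concrete, here is a minimal sketch of ranking papers by cosine similarity between a query vector and stored paper vectors. It assumes embeddings are plain lists of floats; the `cosine` and `top_k` helpers and the toy vectors are hypothetical illustrations, not Lemma's actual code or embedding model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k (title, score) pairs most similar to the query, best first."""
    scored = [(title, cosine(query, vec)) for title, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real models produce hundreds of dimensions.
    papers = {
        "attention-paper": [0.9, 0.1],
        "graph-paper": [0.1, 0.9],
        "survey-paper": [0.6, 0.6],
    }
    for title, score in top_k([1.0, 0.0], papers, k=2):
        print(f"{score:.3f}  {title}")
```

Because both the stored vectors and the query vector live on your machine, a search of this kind needs no network access.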
🚀 Quick Start
1. Installation
pip install -r requirements.txt
2. Set Your Papers Folder (One-Time Setup)
# Set default papers directory and process all papers
lemma sync ~/Papers --set-default
# That's it! Your papers are now:
# ✓ Scanned and indexed
# ✓ Renamed with metadata
# ✓ Embedded for semantic search
# ✓ Ready for questions
3. Add New Papers (Automatic)
Option A: Manual Sync
# Just drop PDFs into ~/Papers, then run:
lemma sync
Option B: Auto-Sync (Watch Mode)
# Start watching (leave running in terminal)
lemma sync --watch
# Now just drop PDFs into ~/Papers
# They're automatically processed in seconds!
4. Query Your Papers
lemma ask "What are the main findings?"
📖 Common Workflows
First-Time Setup
# 1. Set your papers folder and sync everything
lemma sync ~/Papers --set-default
# 2. Query your papers
lemma ask "What is the main contribution?"
Daily Use
# Download new papers to ~/Papers, then:
lemma sync
# Or enable auto-processing:
lemma sync --watch # Leave running
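Re-running a sync on a mostly unchanged folder is where incremental updates pay off. One common way to get that behavior is to remember a content hash per file and re-embed only files whose hash changed. The sketch below assumes that approach; the source does not describe Lemma's actual mechanism, and `files_needing_embedding` and the `seen` mapping are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_needing_embedding(folder: Path, seen: dict[str, str]) -> list[Path]:
    """Return PDFs that are new or changed since `seen` was last updated.

    `seen` maps file name -> digest from the previous run (a hypothetical store).
    It is updated in place, so an immediate second call returns an empty list.
    """
    pending = []
    for pdf in sorted(folder.glob("*.pdf")):
        digest = file_digest(pdf)
        if seen.get(pdf.name) != digest:
            pending.append(pdf)
            seen[pdf.name] = digest
    return pending
```

In a real tool the `seen` mapping would be persisted between runs (for example in the local database) rather than kept in memory.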
Browse Your Library
lemma list # List all papers
lemma search "transformers" # Search by keyword
lemma show 5 # Show paper details
🔧 Advanced Usage
Sync Options
lemma sync # Use default directory
lemma sync ~/Papers # Specify directory
lemma sync --no-rename # Skip automatic renaming
lemma sync --no-embed # Skip embedding (faster)
lemma sync --watch # Continuous monitoring
Manual Control (If Needed)
lemma scan ~/Papers # Just scan (no rename/embed)
lemma organize # Rename existing files
lemma embed # Generate embeddings only
lemma embed-status # Check embedding coverage
lemma verify --remove # Clean up missing files
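As a rough picture of what cleaning up missing files involves, the sketch below deletes database rows whose file no longer exists on disk. Everything about the storage layer here is an assumption made for illustration: the SQLite format, the `papers` table, and its `id` and `path` columns are hypothetical and may not match Lemma's real schema.

```python
import sqlite3
from pathlib import Path


def remove_missing(conn: sqlite3.Connection) -> list[str]:
    """Delete rows from the (hypothetical) `papers` table whose file is gone.

    Returns the paths that were removed from the index.
    """
    rows = conn.execute("SELECT id, path FROM papers").fetchall()
    missing = [(pid, path) for pid, path in rows if not Path(path).exists()]
    conn.executemany("DELETE FROM papers WHERE id = ?", [(pid,) for pid, _ in missing])
    conn.commit()
    return [path for _, path in missing]


if __name__ == "__main__":
    # In-memory demo database; nothing on disk is touched.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE papers (id INTEGER PRIMARY KEY, path TEXT)")
    conn.execute("INSERT INTO papers (path) VALUES (?)", ("/no/such/paper.pdf",))
    print(remove_missing(conn))
```

Only index entries are deleted in this sketch; it never removes files from disk.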
🤖 AI Q&A Setup (Optional)
Set up an API key to enable question answering:
# Option 1: Environment variable
export GROQ_API_KEY="your_key_here"
# Option 2: .env file (>> appends, so existing entries are kept)
echo "GROQ_API_KEY=your_key_here" >> ~/.lemma/.env
# Then ask questions
lemma ask "What are the main approaches discussed?"
Get Free API Keys:
- Groq - Fast and generous free tier (recommended)
- Google Gemini - Alternative option
📋 Key Commands
| Command | Description |
|---|---|
| `lemma sync` | Auto-process papers (scan + rename + embed) |
| `lemma sync --watch` | Monitor folder and auto-process new papers |
| `lemma list` | List all indexed papers |
| `lemma ask <question>` | Ask questions across papers (requires API key) |
| `lemma search <query>` | Search papers by keyword |
| `lemma show <id>` | Show paper details |
| `lemma embed-status` | Check embedding coverage |
🔒 Privacy
- All papers stay on your machine - never uploaded anywhere
- Embeddings generated locally - no external API calls
- Cloud APIs only for Q&A - and only if you configure them
- Database stored locally at `~/.lemma/lemma.db`
📦 Requirements
- Python 3.10 or higher
- ~500MB disk space for embedding models
- Internet connection only for optional AI Q&A
License
MIT License - see LICENSE file for details
Support
- Report issues: GitHub Issues
- Questions: Open a discussion on GitHub