Local-first paper manager with semantic search and LLM reasoning

📚 Lemma - Local-First Research Paper Manager

A privacy-first research paper manager with local semantic search and optional AI-powered insights.

Python 3.10+ | License: MIT

✨ Features

  • 🔒 Privacy-First: All papers stored locally, no cloud uploads
  • 🚀 Fast Semantic Search: Local vector search across all papers
  • 🤖 AI Q&A (Optional): Ask questions using cloud LLMs
  • 📊 Auto-Processing: One command to scan, rename, and embed
  • 🔄 Incremental Updates: Re-embedding is 70-90% faster than rebuilding all embeddings from scratch
  • 📂 Smart Cleanup: Automatically removes deleted papers from database
  • 👀 Watch Mode: Auto-process new papers as they're added
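
To illustrate what local vector search means in practice, here is a minimal sketch that ranks papers by cosine similarity between a query embedding and stored paper embeddings. The function names, file names, and toy three-dimensional vectors are hypothetical. The embedding model and index that Lemma actually uses are not described in this README, so treat this as an illustration of the general technique only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def rank_papers(query_vec, paper_vecs, top_k=3):
    """Return the top_k (title, score) pairs, best match first."""
    scored = [(title, cosine_similarity(query_vec, vec))
              for title, vec in paper_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy embeddings (hypothetical); real embeddings have hundreds of dimensions.
papers = {
    "attention.pdf": [0.9, 0.1, 0.0],
    "graph_nets.pdf": [0.1, 0.9, 0.2],
    "diffusion.pdf": [0.0, 0.2, 0.9],
}
results = rank_papers([1.0, 0.0, 0.0], papers, top_k=2)
```

Because both the query and the papers are embedded on the same machine, a search like this needs no network access.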

🚀 Quick Start

1. Installation

# Run from the root of a cloned copy of the repository
pip install -r requirements.txt

2. Set Your Papers Folder (One-Time Setup)

# Set default papers directory and process all papers
lemma sync ~/Papers --set-default

# That's it! Your papers are now:
# ✓ Scanned and indexed
# ✓ Renamed with metadata
# ✓ Embedded for semantic search
# ✓ Ready for questions

3. Add New Papers (Manual or Automatic)

Option A: Manual Sync

# Just drop PDFs into ~/Papers, then run:
lemma sync

Option B: Auto-Sync (Watch Mode)

# Start watching (leave running in terminal)
lemma sync --watch

# Now just drop PDFs into ~/Papers
# They're automatically processed in seconds!
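
For readers curious how a watch mode can work, the sketch below shows one common approach: repeatedly listing the folder and picking out PDFs that have not been seen before. This is an assumption about the general technique, not Lemma's implementation (which may rely on OS file events instead), and `find_new_pdfs` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

def find_new_pdfs(folder, seen):
    """Return PDFs in `folder` not yet in `seen`, and record them as seen."""
    new_files = []
    for path in sorted(Path(folder).glob("*.pdf")):
        if path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files

# Demo in a temporary folder standing in for ~/Papers.
with tempfile.TemporaryDirectory() as folder:
    seen = set()
    (Path(folder) / "first.pdf").write_bytes(b"%PDF-1.4")
    first_pass = [p.name for p in find_new_pdfs(folder, seen)]
    (Path(folder) / "second.pdf").write_bytes(b"%PDF-1.4")
    second_pass = [p.name for p in find_new_pdfs(folder, seen)]
```

A long-running watcher would call `find_new_pdfs` in a loop with a short `time.sleep` between passes and hand each new file to the processing step.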

4. Query Your Papers

lemma ask "What are the main findings?"

📖 Common Workflows

First-Time Setup

# 1. Set your papers folder and sync everything
lemma sync ~/Papers --set-default

# 2. Query your papers
lemma ask "What is the main contribution?"

Daily Use

# Download new papers to ~/Papers, then:
lemma sync

# Or enable auto-processing:
lemma sync --watch  # Leave running
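
Repeated syncs stay cheap when only new or changed files are re-embedded. The sketch below shows one way to detect such files by comparing content hashes against the digests recorded on the previous run. It is a hypothetical illustration of incremental processing; how Lemma itself decides what to re-embed is not documented here.

```python
import hashlib
import tempfile
from pathlib import Path

def file_digest(path):
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def files_needing_embedding(folder, known_digests):
    """Return names of PDFs whose content is new or changed since the last run.

    `known_digests` maps file name -> digest and is updated in place.
    """
    stale = []
    for path in sorted(Path(folder).glob("*.pdf")):
        digest = file_digest(path)
        if known_digests.get(path.name) != digest:
            known_digests[path.name] = digest
            stale.append(path.name)
    return stale

with tempfile.TemporaryDirectory() as folder:
    known = {}
    (Path(folder) / "a.pdf").write_bytes(b"paper A")
    (Path(folder) / "b.pdf").write_bytes(b"paper B")
    first_run = files_needing_embedding(folder, known)   # everything is new
    second_run = files_needing_embedding(folder, known)  # nothing changed
    (Path(folder) / "b.pdf").write_bytes(b"paper B, revised")
    third_run = files_needing_embedding(folder, known)   # only b.pdf changed
```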

Browse Your Library

lemma list                    # List all papers
lemma search "transformers"   # Search by keyword
lemma show 5                  # Show paper details

🔧 Advanced Usage

Sync Options

lemma sync                    # Use default directory
lemma sync ~/Papers           # Specify directory
lemma sync --no-rename        # Skip automatic renaming
lemma sync --no-embed         # Skip embedding (faster)
lemma sync --watch            # Continuous monitoring

Manual Control (If Needed)

lemma scan ~/Papers           # Just scan (no rename/embed)
lemma organize                # Rename existing files
lemma embed                   # Generate embeddings only
lemma embed-status            # Check embedding coverage
lemma verify --remove         # Clean up missing files
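
As a rough picture of what cleaning up missing files involves, the sketch below deletes database rows whose file no longer exists on disk. The `papers` table and its columns are assumptions made for the example; the real schema of `lemma.db` is not described in this README.

```python
import sqlite3
import tempfile
from pathlib import Path

def remove_missing(conn):
    """Delete rows whose file no longer exists; return the removed paths."""
    rows = conn.execute("SELECT id, path FROM papers").fetchall()
    missing = [(paper_id, path) for paper_id, path in rows
               if not Path(path).exists()]
    conn.executemany("DELETE FROM papers WHERE id = ?",
                     [(paper_id,) for paper_id, _ in missing])
    conn.commit()
    return [path for _, path in missing]

with tempfile.TemporaryDirectory() as folder:
    kept = Path(folder) / "kept.pdf"
    kept.write_bytes(b"%PDF-1.4")
    gone = Path(folder) / "gone.pdf"  # never created, so it counts as missing

    conn = sqlite3.connect(":memory:")  # hypothetical stand-in for the real database
    conn.execute("CREATE TABLE papers (id INTEGER PRIMARY KEY, path TEXT)")
    conn.executemany("INSERT INTO papers (path) VALUES (?)",
                     [(str(kept),), (str(gone),)])
    removed = remove_missing(conn)
    remaining = [row[0] for row in conn.execute("SELECT path FROM papers")]
    conn.close()
```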

🤖 AI Q&A Setup (Optional)

Set up an API key to enable question answering:

# Option 1: Environment variable
export GROQ_API_KEY="your_key_here"

# Option 2: .env file (append, so keys already in the file are kept)
mkdir -p ~/.lemma
echo "GROQ_API_KEY=your_key_here" >> ~/.lemma/.env

# Then ask questions
lemma ask "What are the main approaches discussed?"

Get Free API Keys:

  • Groq - Fast and generous free tier (recommended)
  • Google Gemini - Alternative option
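
The two configuration options above can be pictured with the sketch below, which looks for a key in the environment first and falls back to a `.env` file. The precedence order, the simple `KEY=value` parsing, and the helper names `read_env_file` and `resolve_api_key` are assumptions for illustration; Lemma's actual lookup logic may differ.

```python
import os
import tempfile
from pathlib import Path

def read_env_file(path):
    """Parse simple KEY=value lines, ignoring blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values

def resolve_api_key(name, env_file, environ=os.environ):
    """Prefer the environment variable; fall back to the .env file (assumed order)."""
    return environ.get(name) or read_env_file(env_file).get(name)

with tempfile.TemporaryDirectory() as folder:
    env_file = Path(folder) / ".env"
    env_file.write_text('# keys\nGROQ_API_KEY="from_file"\n')
    from_file = resolve_api_key("GROQ_API_KEY", env_file, environ={})
    from_env = resolve_api_key("GROQ_API_KEY", env_file,
                               environ={"GROQ_API_KEY": "from_env"})
    missing = resolve_api_key("OTHER_KEY", env_file, environ={})
```

If neither source provides a key, the function returns `None`, which matches the README's point that Q&A is optional and everything else works without it.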

📋 Key Commands

Command                 Description
lemma sync              Auto-process papers (scan + rename + embed)
lemma sync --watch      Monitor folder and auto-process new papers
lemma list              List all indexed papers
lemma ask <question>    Ask questions across papers (requires API key)
lemma search <query>    Search papers by keyword
lemma show <id>         Show paper details
lemma embed-status      Check embedding coverage

🔒 Privacy

  • All papers stay on your machine - never uploaded anywhere
  • Embeddings generated locally - no external API calls
  • Cloud APIs only for Q&A - and only if you configure them
  • Database stored locally at ~/.lemma/lemma.db

📦 Requirements

  • Python 3.10 or higher
  • ~500MB disk space for embedding models
  • Internet connection only for optional AI Q&A

License

MIT License - see LICENSE file for details

Support

  • Report issues: GitHub Issues
  • Questions: Open a discussion on GitHub
