Systematic review, literature analysis, PRISMA export, and citation-safe research workflows.

LitSynth: Research Management System

Research Management System (RMS) is a local-first research workflow for systematic review, literature analysis, provenance-aware retrieval, and citation-safe writing.

It combines:

  • PDF and RIS/BibTeX ingestion
  • screening and review-table generation
  • PRISMA-oriented export workflows
  • evidence-grounded research chat over indexed papers
  • deterministic citation insertion backed by authoritative metadata

RMS is designed to run in three operating modes:

  • local only: everything stays on your machine, including vector storage and project files
  • cloud only: the API and web app run on Google Cloud Run with cloud-managed storage and hosted LLM providers
  • hybrid: local indexing and private project files stay on the workstation while selected artifacts, configuration, or indexes are mirrored to Google Cloud for demos or team access

The system architecture and deployment design live in docs/architecture.md.

System Design Diagram

flowchart TD
  user[Researcher] --> ui[Streamlit UI\napps/webapp/app.py]
  ui --> api[FastAPI API\napps/api/main.py]
  api --> orch[RMS Orchestrator\nand Review Pipeline]

  orch --> ingest[PDF + RIS/BibTeX Ingestion]
  orch --> review[Screening + Review Matrix\nPRISMA Exports]
  orch --> citation[Citation Store +\nDeterministic Insertion]
  orch --> rag[RAG Routing]

  rag --> chroma[Chroma Vector DB\nBAAI/bge-base-en-v1.5]
  rag --> rmsllm[Local RMS Answering\nOllama qwen3:8b]
  rag --> paperqa[PaperQA Adapter\nollama/nomic-embed-text]

  ingest --> files[Project Files\nPDFs RIS Outputs]
  review --> files
  citation --> files
  chroma --> files

  files -. hybrid sync .-> gcs[Google Cloud Storage\nproject artifacts + manifests]
  gcs --> cloudapi[Cloud Run API]
  gcs --> cloudui[Cloud Run Web App]
  cloudapi --> hosted[Hosted LLM Providers\nOpenAI Claude Gemini]

What RMS Does

RMS is built around an end-to-end research pipeline rather than a single chat surface.

Core capabilities:

  • ingest RIS metadata and PDF full text
  • validate and screen papers before review
  • index research papers into a local vector database
  • answer questions with supporting evidence chunks
  • generate review-ready outputs such as Excel matrices and PRISMA artifacts
  • insert citations into markdown drafts using imported RIS/BibTeX records only

Citation integrity is a hard constraint in this repository:

  • citations come from imported RIS/BibTeX metadata
  • the system does not treat LLM-generated references as authoritative
  • citation insertion requires source document and character-range provenance
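The provenance requirement can be illustrated with a minimal sketch. The class and field names here are illustrative only, not the repository's actual API; the point is that insertion refuses any record lacking a source document and character range:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CitationRecord:
    """A citation backed by imported RIS/BibTeX metadata (illustrative)."""
    key: str            # e.g. "smith2021", taken from the RIS/BibTeX import
    source_file: str    # metadata file the record came from
    char_start: int     # provenance: character range in the source document
    char_end: int

def insert_citation(draft: str, offset: int, record: CitationRecord) -> str:
    """Insert a citation marker into a markdown draft at a character offset.

    Rejects records without a valid provenance range, mirroring the hard
    constraint that LLM-generated references are never authoritative.
    """
    if record.char_end <= record.char_start:
        raise ValueError(f"citation {record.key!r} lacks a valid character range")
    marker = f"[@{record.key}]"
    return draft[:offset] + marker + draft[offset:]
```

Because insertion is a pure string operation over vetted records, the same draft and the same records always produce the same output, which is what "deterministic" means here.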

Repository Layout

apps/api/                 FastAPI service for search, RAG, citations, status, and indexing
apps/webapp/              Streamlit UI for review workflows and Research Copilot
apps/web-static/          Static frontend assets
docs/                     Architecture, roadmap, PRISMA, and product documentation
infra/aws/                AWS deployment artifacts kept for reference
infra/gcp/                Cloud Run deployment assets for API and Streamlit UI
papers/                   Research writeups and project papers
rms/                      Core pipeline, retrieval, citation, and orchestration modules
extensions/asreview-rms/  ASReview-oriented extension package

System Components

1. Core RMS library

The rms package contains the research pipeline:

  • PDF parsing and chunking
  • keyword and threshold-based screening
  • local vector indexing with Chroma
  • semantic retrieval and reranking hooks
  • local and API-backed LLM integrations
  • citation-store and deterministic insertion workflows
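Chunking, for instance, can be sketched as a sliding window over extracted text. This is a simplification for illustration, not the rms package's actual implementation:

```python
def chunk_text(text: str, size: int = 800, overlap: int = 120) -> list[str]:
    """Split extracted text into overlapping windows for embedding.

    Overlap keeps sentences that straddle a boundary retrievable from
    at least one chunk (sizes here are illustrative defaults).
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```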

2. FastAPI backend

The API in apps/api/main.py exposes the main service surface:

  • GET /health
  • GET /system-status
  • POST /index-documents
  • POST /semantic-search
  • POST /rag-with-provenance
  • citation load and insertion endpoints

This service owns retrieval orchestration, provider routing, status inspection, and corpus indexing.
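As an example, the semantic-search endpoint can be exercised from Python with only the standard library. The payload fields below are assumptions based on the endpoint names, not a documented schema; a running instance exposes the real schema at the FastAPI /docs page:

```python
import json
import urllib.request

def build_request(query: str, top_k: int = 5,
                  base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build a POST request to /semantic-search (payload shape is assumed)."""
    payload = json.dumps({"query": query, "top_k": top_k}).encode()
    return urllib.request.Request(
        f"{base_url}/semantic-search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def semantic_search(query: str, top_k: int = 5) -> dict:
    """Send the request to a running local API and parse the JSON response."""
    with urllib.request.urlopen(build_request(query, top_k)) as resp:
        return json.load(resp)
```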

3. Streamlit web application

The UI in apps/webapp/app.py is the main operator console for:

  • review setup and filtering
  • Excel and PRISMA export
  • Research Copilot chat
  • provider/model configuration
  • vector corpus status and indexing controls
  • citation-safe export to markdown

4. Local and hosted LLM paths

RMS currently supports local-first and hosted model usage:

  • local Ollama for RMS and PaperQA-backed workflows
  • direct local Qwen inference for review generation on macOS via the PyTorch mps backend
  • OpenAI, Claude, and Gemini for hosted or user-provided API workflows

The retrieval embedding path and the answer-generation path are intentionally separate. Today the default split is:

  • RMS local vector embedding: BAAI/bge-base-en-v1.5
  • RMS default answer model: ollama/qwen3:8b
  • PaperQA local embedding: ollama/nomic-embed-text
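That split can be made explicit in configuration, so swapping the answer model never silently changes the retrieval index. The class and method names below are illustrative, not the repository's config schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelRoutes:
    """Keeps the retrieval and generation paths independently swappable."""
    rms_embedding: str = "BAAI/bge-base-en-v1.5"       # vectors stored in Chroma
    rms_answer: str = "ollama/qwen3:8b"                # generates grounded answers
    paperqa_embedding: str = "ollama/nomic-embed-text" # PaperQA retrieval path

    def requires_reindex(self, other: "ModelRoutes") -> bool:
        """Changing an embedding model invalidates stored vectors;
        changing only the answer model does not."""
        return (self.rms_embedding != other.rms_embedding
                or self.paperqa_embedding != other.paperqa_embedding)
```

Under this scheme, moving from qwen3:8b to another answer model is a zero-cost swap, while any embedding change forces a rebuild of the vector store.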

Quick Start

Prerequisites

  • Python 3.10+
  • macOS, Linux, or a compatible container runtime
  • optional: Ollama for local inference

Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e .

If you are installing the published package from PyPI, the public install flow is:

pip install litsynth

If you prefer requirements-based installation:

pip install -r requirements.txt

Local LLM setup with Ollama

If you want fully local answering:

ollama serve
ollama pull qwen3:8b
ollama pull llama3.1:8b
ollama pull nomic-embed-text

Run the API and web app

Once the package is installed, you can launch both services with one command:

litsynth launch

This starts the FastAPI backend and the Streamlit UI together, then opens the browser.

If you prefer the manual two-terminal workflow, use the commands below.

From the repository root:

PYTHONPATH="$PWD" ./.venv/bin/python -m uvicorn apps.api.main:app --host 127.0.0.1 --port 8000

In a second terminal:

PYTHONPATH="$PWD" RMS_API_URL="http://127.0.0.1:8000" ./.venv/bin/streamlit run apps/webapp/app.py --server.address 127.0.0.1 --server.port 8501 --server.headless true

Open the UI in a browser and use the system-status panel to confirm:

  • indexed paper count
  • active RMS embedding model
  • active RMS and PaperQA answer models
  • available Ollama models

Index papers

You can index papers from the UI or directly through the API:

curl -X POST http://127.0.0.1:8000/index-documents \
  -H "Content-Type: application/json" \
  -d '{"directory_path": "data/mdpi", "max_files": 10}'

Run the CLI

The package installs an rms command:

rms --help

It also installs a litsynth launcher, so the beginner-friendly local startup flow is:

litsynth launch

Run litsynth launch --help to see options for overriding ports or disabling automatic browser opening.

Typical review run:

rms run-review \
  --ris-dir data \
  --pdf-dir data/mdpi \
  --output-dir outputs/literature_review \
  --provider ollama \
  --model qwen3:8b

Configuration

Important runtime settings:

  • RMS_API_URL: Streamlit UI target for the backend API
  • OLLAMA_BASE_URL: local or remote Ollama endpoint
  • RMS_EMBEDDING_MODEL: RMS vector embedding model, default BAAI/bge-base-en-v1.5
  • RMS_PAPERQA_EMBEDDING: optional PaperQA embedding override
  • OPENAI_API_KEY: hosted OpenAI provider
  • ANTHROPIC_API_KEY: hosted Claude provider
  • GEMINI_API_KEY: hosted Gemini provider

If you change the RMS embedding model, reindex the corpus. Vectors produced by different embedding models are not comparable, so mixing old and new embeddings in the same store silently degrades retrieval.

Local, Cloud, and Hybrid Operation

Local only

Use this mode when data privacy and low-latency iteration matter most.

  • PDFs, RIS files, outputs, and Chroma persistence remain on local disk
  • Ollama and local Qwen can serve all generation paths
  • no external cloud service is required for the main workflow

Cloud only

Use this mode when you want a hosted demo or a shared environment.

  • the API and web app run on Google Cloud Run
  • storage is cloud-managed rather than workstation-local
  • answer generation uses hosted providers such as OpenAI, Claude, or Gemini

Hybrid

Use this mode when local research data stays private but you still want a hosted demo or sync target.

  • local workstation remains the source of truth for PDFs and indexing
  • selected outputs, manifests, and review artifacts are mirrored to GCS
  • cloud deployment uses the same embedding configuration and project manifest to avoid retrieval drift
  • hosted UI can point to a local or cloud API depending on the demo topology

Recommended Google Cloud Sync Design

RMS already has Cloud Run deployment assets. For consistent local, cloud, and hybrid behavior, keep these assets synchronized at the project level:

  1. Project files. Store PDFs, RIS files, exported review sheets, and PRISMA outputs under a project-scoped directory locally and a matching prefix in GCS.

  2. Embedding manifest. Persist a small project manifest with at least:

   • embedding model name
   • chunk size and overlap
   • vector store type
   • index build timestamp
   • corpus file list or content hashes

  3. Vector index lifecycle. Rebuild the cloud index whenever the embedding model, chunking policy, or document set changes. Do not assume vector files produced with one embedding family are valid for another.

  4. Provider separation. Keep retrieval embeddings, answer-generation models, and citation metadata as separate concerns. A Qwen or Llama answer model can work well with a BGE or Nomic embedding model as long as the retrieval layer is internally consistent.

  5. Citation authority. Sync RIS/BibTeX files and citation logs as first-class project artifacts. Citation metadata should not be reconstructed from chat output.
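A manifest capturing those fields might be built like this. The field names and helper are a suggestion, not an existing file format in the repository:

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

def build_manifest(corpus_dir: str, embedding_model: str,
                   chunk_size: int, chunk_overlap: int) -> dict:
    """Build a project manifest recording everything that invalidates an index."""
    files = sorted(Path(corpus_dir).glob("**/*.pdf"))
    return {
        "embedding_model": embedding_model,
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
        "vector_store": "chroma",
        "built_at": datetime.now(timezone.utc).isoformat(),
        # content hashes let local and cloud sides detect corpus drift
        "corpus": {
            f.name: hashlib.sha256(f.read_bytes()).hexdigest() for f in files
        },
    }
```

Writing this file next to the vector index, and comparing it before any reuse, is what prevents retrieval drift between local and cloud deployments.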

The detailed system design for these modes is documented in docs/architecture.md.

Current State and Boundaries

What is implemented now:

  • local Chroma-backed retrieval
  • API and Streamlit app for indexing, search, and citation-safe workflows
  • Cloud Run packaging for the API and UI
  • local-first research chat with evidence chunks and visible corpus status

What still requires deployment choices outside the core local default:

  • managed cloud vector storage
  • object-storage sync and project manifests as an operational convention
  • production secret handling for hosted LLM providers

That separation is intentional. The repository already runs well locally, and the cloud design can be layered on without weakening the local research workflow.

Packaging Direction

A single-command beginner flow is now wired into the package metadata.

  • after package installation, users can run litsynth launch
  • once published to PyPI under the same name, the public install flow becomes pip install litsynth

A macOS desktop app is also a realistic next step.

The clean upgrade path is:

  1. stabilize the local litsynth launch flow
  2. package the same launcher and backend into a desktop shell
  3. ship a beginner-friendly macOS app bundle that starts the local services automatically

For a future desktop version, the lowest-friction options are usually:

  • PyInstaller or Briefcase for a Python-first desktop package
  • Tauri or Electron if you want a more polished native app shell later

For this codebase, the simplest near-term path is a Python-packaged macOS app that wraps the same FastAPI plus Streamlit launcher you now have.
