LitSynth: Research Management System
Systematic review, literature analysis, PRISMA export, and citation-safe research workflows.
Research Management System (RMS) is a local-first research workflow for systematic review, literature analysis, provenance-aware retrieval, and citation-safe writing.
It combines:
- PDF and RIS/BibTeX ingestion
- screening and review-table generation
- PRISMA-oriented export workflows
- evidence-grounded research chat over indexed papers
- deterministic citation insertion backed by authoritative metadata
RMS is designed to run in three operating modes:
- local only: everything stays on your machine, including vector storage and project files
- cloud only: the API and web app run on Google Cloud Run with cloud-managed storage and hosted LLM providers
- hybrid: local indexing and private project files stay on the workstation while selected artifacts, configuration, or indexes are mirrored to Google Cloud for demos or team access
The system architecture and deployment design live in docs/architecture.md.
System Design Diagram
```mermaid
flowchart TD
    user[Researcher] --> ui[Streamlit UI\napps/webapp/app.py]
    ui --> api[FastAPI API\napps/api/main.py]
    api --> orch[RMS Orchestrator\nand Review Pipeline]
    orch --> ingest[PDF + RIS/BibTeX Ingestion]
    orch --> review[Screening + Review Matrix\nPRISMA Exports]
    orch --> citation[Citation Store +\nDeterministic Insertion]
    orch --> rag[RAG Routing]
    rag --> chroma[Chroma Vector DB\nBAAI/bge-base-en-v1.5]
    rag --> rmsllm[Local RMS Answering\nOllama qwen3:8b]
    rag --> paperqa[PaperQA Adapter\nollama/nomic-embed-text]
    ingest --> files[Project Files\nPDFs RIS Outputs]
    review --> files
    citation --> files
    chroma --> files
    files -. hybrid sync .-> gcs[Google Cloud Storage\nproject artifacts + manifests]
    gcs --> cloudapi[Cloud Run API]
    gcs --> cloudui[Cloud Run Web App]
    cloudapi --> hosted[Hosted LLM Providers\nOpenAI Claude Gemini]
```
What RMS Does
RMS is built around an end-to-end research pipeline rather than a single chat surface.
Core capabilities:
- ingest RIS metadata and PDF full text
- validate and screen papers before review
- index research papers into a local vector database
- answer questions with supporting evidence chunks
- generate review-ready outputs such as Excel matrices and PRISMA artifacts
- insert citations into markdown drafts using imported RIS/BibTeX records only
Citation integrity is a hard constraint in this repository:
- citations come from imported RIS/BibTeX metadata
- the system does not treat LLM-generated references as authoritative
- citation insertion requires source document and character-range provenance
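To make the provenance constraint concrete, here is a minimal sketch of what a provenance-checked insertion step could look like. The names `CitationRecord` and `insert_citation` are hypothetical illustrations, not the actual rms API:

```python
from dataclasses import dataclass

# Hypothetical record shape; the real rms citation store may differ.
@dataclass
class CitationRecord:
    key: str          # RIS/BibTeX key, e.g. "smith2021"
    source_doc: str   # document the evidence came from
    start: int        # character offset of the supporting span
    end: int          # end offset (exclusive)

def insert_citation(draft: str, record: CitationRecord, at: int) -> str:
    """Insert a [@key] marker at a character offset, refusing records
    without a valid character-range provenance."""
    if not (0 <= record.start < record.end):
        raise ValueError("citation requires character-range provenance")
    if not (0 <= at <= len(draft)):
        raise ValueError("insertion point outside draft")
    return draft[:at] + f" [@{record.key}]" + draft[at:]

draft = "Prior work reports similar effects."
rec = CitationRecord(key="smith2021", source_doc="smith2021.pdf", start=120, end=310)
print(insert_citation(draft, rec, at=34))
# → Prior work reports similar effects [@smith2021].
```

The key property is that the marker can only come from an imported record that names its source document and span, never from free-form model output.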
Repository Layout
apps/api/ FastAPI service for search, RAG, citations, status, and indexing
apps/webapp/ Streamlit UI for review workflows and Research Copilot
apps/web-static/ Static frontend assets
docs/ Architecture, roadmap, PRISMA, and product documentation
infra/aws/ AWS deployment artifacts kept for reference
infra/gcp/ Cloud Run deployment assets for API and Streamlit UI
papers/ Research writeups and project papers
rms/ Core pipeline, retrieval, citation, and orchestration modules
extensions/asreview-rms/ ASReview-oriented extension package
System Components
1. Core RMS library
The rms package contains the research pipeline:
- PDF parsing and chunking
- keyword and threshold-based screening
- local vector indexing with Chroma
- semantic retrieval and reranking hooks
- local and API-backed LLM integrations
- citation-store and deterministic insertion workflows
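For intuition about the chunking step, a fixed-size character chunker with overlap can be sketched as follows. This is a simplified stand-in, not the actual rms implementation; the real chunk size and overlap defaults may differ:

```python
def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size character windows that overlap, so
    evidence spans near a boundary appear whole in at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("a" * 1000, size=400, overlap=100)
print(len(chunks))  # windows start at 0, 300, 600 → 3 chunks
```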
2. FastAPI backend
The API in apps/api/main.py exposes the main service surface:
- GET /health
- GET /system-status
- POST /index-documents
- POST /semantic-search
- POST /rag-with-provenance
- citation load and insertion endpoints
This service owns retrieval orchestration, provider routing, status inspection, and corpus indexing.
3. Streamlit web application
The UI in apps/webapp/app.py is the main operator console for:
- review setup and filtering
- Excel and PRISMA export
- Research Copilot chat
- provider/model configuration
- vector corpus status and indexing controls
- citation-safe export to markdown
4. Local and hosted LLM paths
RMS currently supports local-first and hosted model usage:
- local Ollama for RMS and PaperQA-backed workflows
- direct local Qwen inference for review generation on macOS with MPS
- OpenAI, Claude, and Gemini for hosted or user-provided API workflows
The retrieval embedding path and the answer-generation path are intentionally separate. Today the default split is:
- RMS local vector embedding: BAAI/bge-base-en-v1.5
- RMS default answer model: ollama/qwen3:8b
- PaperQA local embedding: ollama/nomic-embed-text
Quick Start
Prerequisites
- Python 3.10+
- macOS, Linux, or a compatible container runtime
- optional: Ollama for local inference
Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e .
Once the package is published to PyPI, the public install flow is:
pip install litsynth
If you prefer requirements-based installation:
pip install -r requirements.txt
Local LLM setup with Ollama
If you want fully local answering:
ollama serve
ollama pull qwen3:8b
ollama pull llama3.1:8b
ollama pull nomic-embed-text
Run the API and web app
After installing the package, you can launch both services with one command:
litsynth launch
This starts the FastAPI backend and the Streamlit UI together, then opens the browser.
If you prefer the manual two-terminal workflow, use the commands below.
From the repository root:
PYTHONPATH="$PWD" ./.venv/bin/python -m uvicorn apps.api.main:app --host 127.0.0.1 --port 8000
In a second terminal:
PYTHONPATH="$PWD" RMS_API_URL="http://127.0.0.1:8000" ./.venv/bin/streamlit run apps/webapp/app.py --server.address 127.0.0.1 --server.port 8501 --server.headless true
Open the UI in a browser and use the system-status panel to confirm:
- indexed paper count
- active RMS embedding model
- active RMS and PaperQA answer models
- available Ollama models
Index papers
You can index papers from the UI or directly through the API:
curl -X POST http://127.0.0.1:8000/index-documents \
-H "Content-Type: application/json" \
-d '{"directory_path": "data/mdpi", "max_files": 10}'
Run the CLI
The package installs an rms command:
rms --help
It also installs a litsynth launcher, so the beginner-friendly local startup flow is:
litsynth launch
Use litsynth launch --help to override ports or disable automatic browser opening.
Typical review run:
rms run-review \
--ris-dir data \
--pdf-dir data/mdpi \
--output-dir outputs/literature_review \
--provider ollama \
--model qwen3:8b
Configuration
Important runtime settings:
- RMS_API_URL: Streamlit UI target for the backend API
- OLLAMA_BASE_URL: local or remote Ollama endpoint
- RMS_EMBEDDING_MODEL: RMS vector embedding model, default BAAI/bge-base-en-v1.5
- RMS_PAPERQA_EMBEDDING: optional PaperQA embedding override
- OPENAI_API_KEY: hosted OpenAI provider
- ANTHROPIC_API_KEY: hosted Claude provider
- GEMINI_API_KEY: hosted Gemini provider
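The pattern for consuming these settings is ordinary environment lookup with the documented defaults. A minimal sketch (defaults taken from this README; the actual rms config module may organize this differently):

```python
import os

# Defaults mirror the README; override via environment variables.
RMS_API_URL = os.environ.get("RMS_API_URL", "http://127.0.0.1:8000")
RMS_EMBEDDING_MODEL = os.environ.get("RMS_EMBEDDING_MODEL", "BAAI/bge-base-en-v1.5")
OLLAMA_BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://127.0.0.1:11434")

print(RMS_EMBEDDING_MODEL)
```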
If you change the RMS embedding model, reindex the corpus. Mixing old and new embeddings in the same vector store breaks retrieval: similarity scores between vectors produced by different models are not comparable.
Local, Cloud, and Hybrid Operation
Local only
Use this mode when data privacy and low-latency iteration matter most.
- PDFs, RIS files, outputs, and Chroma persistence remain on local disk
- Ollama and local Qwen can serve all generation paths
- no external cloud service is required for the main workflow
Cloud only
Use this mode when you want a hosted demo or a shared environment.
- deploy the API and Streamlit UI from infra/gcp/cloudrun-api/README.md and infra/gcp/cloudrun-webapp/README.md
- keep project files and generated outputs in Google Cloud Storage
- use hosted LLM providers or a remotely reachable Ollama endpoint
- keep the embedding model fixed across index build and query time
Hybrid
Use this mode when local research data stays private but you still want a hosted demo or sync target.
- local workstation remains the source of truth for PDFs and indexing
- selected outputs, manifests, and review artifacts are mirrored to GCS
- cloud deployment uses the same embedding configuration and project manifest to avoid retrieval drift
- hosted UI can point to a local or cloud API depending on the demo topology
Recommended Google Cloud Sync Design
RMS already has Cloud Run deployment assets. For consistent local, cloud, and hybrid behavior, keep these assets synchronized at the project level:
- project files: store PDFs, RIS files, exported review sheets, and PRISMA outputs under a project-scoped directory locally and a matching prefix in GCS
- embedding manifest: persist a small project manifest with at least:
  - embedding model name
  - chunk size and overlap
  - vector store type
  - index build timestamp
  - corpus file list or content hashes
- vector index lifecycle: rebuild the cloud index whenever the embedding model, chunking policy, or document set changes; do not assume vector files produced with one embedding family are valid for another
- provider separation: keep retrieval embeddings, answer-generation models, and citation metadata as separate concerns; a Qwen or Llama answer model can work well with a BGE or Nomic embedding model as long as the retrieval layer is internally consistent
- citation authority: sync RIS/BibTeX files and citation logs as first-class project artifacts; citation metadata should not be reconstructed from chat output
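The manifest and rebuild rules above can be sketched in a few lines. Field names here are illustrative, not the actual rms manifest schema:

```python
import hashlib
import time

def build_manifest(embedding_model: str, chunk_size: int, overlap: int,
                   corpus: dict[str, bytes]) -> dict:
    """Capture everything that must match between index build and query time."""
    return {
        "embedding_model": embedding_model,
        "chunk_size": chunk_size,
        "chunk_overlap": overlap,
        "vector_store": "chroma",
        "built_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        # Content hashes let a replica detect document-set drift cheaply.
        "files": {name: hashlib.sha256(data).hexdigest()
                  for name, data in corpus.items()},
    }

def needs_reindex(manifest: dict, embedding_model: str,
                  chunk_size: int, overlap: int) -> bool:
    """A cloud or local replica must rebuild if any indexing parameter drifted."""
    return (manifest["embedding_model"] != embedding_model
            or manifest["chunk_size"] != chunk_size
            or manifest["chunk_overlap"] != overlap)

m = build_manifest("BAAI/bge-base-en-v1.5", 512, 64, {"paper.pdf": b"%PDF-1.7 ..."})
print(needs_reindex(m, "ollama/nomic-embed-text", 512, 64))  # True: model changed
```

Syncing this manifest next to the project files in GCS gives the cloud deployment an unambiguous signal for when the vector index must be rebuilt rather than reused.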
The detailed system design for these modes is documented in docs/architecture.md.
Additional Documentation
- docs/architecture.md
- docs/rms_features_prisma_pipeline.md
- apps/api/README.md
- apps/webapp/README.md
- infra/gcp/cloudrun-api/README.md
- infra/gcp/cloudrun-webapp/README.md
Current State and Boundaries
What is implemented now:
- local Chroma-backed retrieval
- API and Streamlit app for indexing, search, and citation-safe workflows
- Cloud Run packaging for the API and UI
- local-first research chat with evidence chunks and visible corpus status
What still requires deployment choices outside the core local default:
- managed cloud vector storage
- object-storage sync and project manifests as an operational convention
- production secret handling for hosted LLM providers
That separation is intentional. The repository already runs well locally, and the cloud design can be layered on without weakening the local research workflow.
Packaging Direction
A single-command beginner flow is wired into the package metadata:
- after package installation, users can run litsynth launch
- once published to PyPI under the same name, the public install flow becomes pip install litsynth
A macOS desktop app is also realistic. The clean upgrade path is:
- stabilize the local litsynth launch flow
- package the same launcher and backend into a desktop shell
- ship a beginner-friendly macOS app bundle that starts the local services automatically
For a future desktop version, the lowest-friction options are usually:
- PyInstaller or Briefcase for a Python-first desktop package
- Tauri or Electron if you want a more polished native app shell later
For this codebase, the simplest near-term path is a Python-packaged macOS app that wraps the same FastAPI plus Streamlit launcher you now have.
Download files
File details
Details for the file litsynth-0.1.0.tar.gz.
File metadata
- Download URL: litsynth-0.1.0.tar.gz
- Upload date:
- Size: 91.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | f2e5ad37b9f96182f327f4859ba216a2227ec11bb6a53c9b4b8880f992087b2f |
| MD5 | 0ecd3282ece7dc671ff29c3838898eb8 |
| BLAKE2b-256 | cfea52cbaa8c9f2899948aef325abebf047f82d009c85a2da512c9eaaacab54e |
File details
Details for the file litsynth-0.1.0-py3-none-any.whl.
File metadata
- Download URL: litsynth-0.1.0-py3-none-any.whl
- Upload date:
- Size: 103.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | daa884dcb3f931ed5b5c0b45834398111e5d05e6960d3fd6ba0897216537942e |
| MD5 | 721b45d01806f942bda362f2b47a5d8d |
| BLAKE2b-256 | 6adc14d19e926fbe9596f30347f77cf9b10e3ded59138b2b80819ed186bc4879 |