LitSynth: Research Management System
Systematic review, literature analysis, PRISMA export, and citation-safe research workflows.
Research Management System (RMS) is a local-first research workflow for systematic review, literature analysis, provenance-aware retrieval, and citation-safe writing.
It combines:
- PDF and RIS/BibTeX ingestion
- screening and review-table generation
- PRISMA-oriented export workflows
- evidence-grounded research chat over indexed papers
- deterministic citation insertion backed by authoritative metadata
RMS is designed to run in three operating modes:
- local only: everything stays on your machine, including vector storage and project files
- cloud only: the API and web app run on Google Cloud Run with cloud-managed storage and hosted LLM providers
- hybrid: local indexing and private project files stay on the workstation while selected artifacts, configuration, or indexes are mirrored to Google Cloud for demos or team access
The system architecture and deployment design live in docs/architecture.md.
System Design Diagram
```mermaid
flowchart TD
    user[Researcher] --> ui[Streamlit UI\napps/webapp/app.py]
    ui --> api[FastAPI API\napps/api/main.py]
    api --> orch[RMS Orchestrator\nand Review Pipeline]
    orch --> ingest[PDF + RIS/BibTeX Ingestion]
    orch --> review[Screening + Review Matrix\nPRISMA Exports]
    orch --> citation[Citation Store +\nDeterministic Insertion]
    orch --> rag[RAG Routing]
    rag --> chroma[Chroma Vector DB\nBAAI/bge-base-en-v1.5]
    rag --> rmsllm[Local RMS Answering\nOllama qwen3:8b]
    rag --> paperqa[PaperQA Adapter\nollama/nomic-embed-text]
    ingest --> files[Project Files\nPDFs RIS Outputs]
    review --> files
    citation --> files
    chroma --> files
    files -. hybrid sync .-> gcs[Google Cloud Storage\nproject artifacts + manifests]
    gcs --> cloudapi[Cloud Run API]
    gcs --> cloudui[Cloud Run Web App]
    cloudapi --> hosted[Hosted LLM Providers\nOpenAI Claude Gemini]
```
What RMS Does
RMS is built around an end-to-end research pipeline rather than a single chat surface.
Core capabilities:
- ingest RIS metadata and PDF full text
- validate and screen papers before review
- index research papers into a local vector database
- answer questions with supporting evidence chunks
- generate review-ready outputs such as Excel matrices and PRISMA artifacts
- insert citations into markdown drafts using imported RIS/BibTeX records only
Citation integrity is a hard constraint in this repository:
- citations come from imported RIS/BibTeX metadata
- the system does not treat LLM-generated references as authoritative
- citation insertion requires source document and character-range provenance
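To make the provenance constraint concrete, here is a minimal sketch of what a provenance-checked insertion step could look like. The names `CitationRecord` and `insert_citation` are hypothetical illustrations, not the actual rms API:

```python
from dataclasses import dataclass

# Hypothetical record shape; the real rms citation store may differ.
@dataclass
class CitationRecord:
    key: str          # RIS/BibTeX key, e.g. "smith2021"
    source_doc: str   # document the evidence came from
    start: int        # character offset of the supporting span
    end: int          # end offset (exclusive)

def insert_citation(draft: str, record: CitationRecord, at: int) -> str:
    """Insert a [@key] marker at a character offset, refusing records
    without a valid character-range provenance."""
    if not (0 <= record.start < record.end):
        raise ValueError("citation requires character-range provenance")
    if not (0 <= at <= len(draft)):
        raise ValueError("insertion point outside draft")
    return draft[:at] + f" [@{record.key}]" + draft[at:]

draft = "Prior work reports similar effects."
rec = CitationRecord(key="smith2021", source_doc="smith2021.pdf", start=120, end=310)
print(insert_citation(draft, rec, at=34))
# → Prior work reports similar effects [@smith2021].
```

The key property is that the marker can only come from an imported record that names its source document and span, never from free-form model output.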
Repository Layout
apps/api/ FastAPI service for search, RAG, citations, status, and indexing
apps/webapp/ Streamlit UI for review workflows and Research Copilot
apps/web-static/ Static frontend assets
docs/ Architecture, roadmap, PRISMA, and product documentation
infra/aws/ AWS deployment artifacts kept for reference
infra/gcp/ Cloud Run deployment assets for API and Streamlit UI
papers/ Research writeups and project papers
rms/ Core pipeline, retrieval, citation, and orchestration modules
extensions/asreview-rms/ ASReview-oriented extension package
System Components
1. Core RMS library
The rms package contains the research pipeline:
- PDF parsing and chunking
- keyword and threshold-based screening
- local vector indexing with Chroma
- semantic retrieval and reranking hooks
- local and API-backed LLM integrations
- citation-store and deterministic insertion workflows
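For intuition about the chunking step, a fixed-size character chunker with overlap can be sketched as follows. This is a simplified stand-in, not the actual rms implementation; the real chunk size and overlap defaults may differ:

```python
def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size character windows that overlap, so
    evidence spans near a boundary appear whole in at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("a" * 1000, size=400, overlap=100)
print(len(chunks))  # windows start at 0, 300, 600 → 3 chunks
```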
2. FastAPI backend
The API in apps/api/main.py exposes the main service surface:
- GET /health
- GET /system-status
- POST /index-documents
- POST /semantic-search
- POST /rag-with-provenance
- citation load and insertion endpoints
This service owns retrieval orchestration, provider routing, status inspection, and corpus indexing.
3. Streamlit web application
The UI in apps/webapp/app.py is the main operator console for:
- review setup and filtering
- Excel and PRISMA export
- Research Copilot chat
- provider/model configuration
- vector corpus status and indexing controls
- citation-safe export to markdown
4. Local and hosted LLM paths
RMS currently supports local-first and hosted model usage:
- local Ollama for RMS and PaperQA-backed workflows
- direct local Qwen inference for review generation on macOS with MPS
- OpenAI, Claude, and Gemini for hosted or user-provided API workflows
The retrieval embedding path and the answer-generation path are intentionally separate. Today the default split is:
- RMS local vector embedding: BAAI/bge-base-en-v1.5
- RMS default answer model: ollama/qwen3:8b
- PaperQA local embedding: ollama/nomic-embed-text
Quick Start
Prerequisites
- Python 3.10+
- macOS, Linux, or a compatible container runtime
- optional: Ollama for local inference
Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e .
Once the package is published to PyPI, the public install flow is:
pip install litsynth
If you prefer requirements-based installation:
pip install -r requirements.txt
Local LLM setup with Ollama
If you want fully local answering:
ollama serve
ollama pull qwen3:8b
ollama pull llama3.1:8b
ollama pull nomic-embed-text
Run the API and web app
After installing the package, you can launch both services with one command:
litsynth launch
This starts the FastAPI backend and the Streamlit UI together, then opens the browser.
If you prefer the manual two-terminal workflow, use the commands below.
From the repository root:
PYTHONPATH="$PWD" ./.venv/bin/python -m uvicorn apps.api.main:app --host 127.0.0.1 --port 8000
In a second terminal:
PYTHONPATH="$PWD" RMS_API_URL="http://127.0.0.1:8000" ./.venv/bin/streamlit run apps/webapp/app.py --server.address 127.0.0.1 --server.port 8501 --server.headless true
Open the UI in a browser and use the system-status panel to confirm:
- indexed paper count
- active RMS embedding model
- active RMS and PaperQA answer models
- available Ollama models
Index papers
You can index papers from the UI or directly through the API:
curl -X POST http://127.0.0.1:8000/index-documents \
-H "Content-Type: application/json" \
-d '{"directory_path": "data/mdpi", "max_files": 10}'
Run the CLI
The package installs an rms command:
rms --help
It also installs a litsynth launcher, so the beginner-friendly local startup flow is:
litsynth launch
Use litsynth launch --help to override ports or disable automatic browser opening.
Typical review run:
rms run-review \
--ris-dir data \
--pdf-dir data/mdpi \
--output-dir outputs/literature_review \
--provider ollama \
--model qwen3:8b
Configuration
Important runtime settings:
- RMS_API_URL: Streamlit UI target for the backend API
- OLLAMA_BASE_URL: local or remote Ollama endpoint
- RMS_EMBEDDING_MODEL: RMS vector embedding model, default BAAI/bge-base-en-v1.5
- RMS_PAPERQA_EMBEDDING: optional PaperQA embedding override
- OPENAI_API_KEY: hosted OpenAI provider
- ANTHROPIC_API_KEY: hosted Claude provider
- GEMINI_API_KEY: hosted Gemini provider
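The pattern for consuming these settings is ordinary environment lookup with the documented defaults. A minimal sketch (defaults taken from this README; the actual rms config module may organize this differently):

```python
import os

# Defaults mirror the README; override via environment variables.
RMS_API_URL = os.environ.get("RMS_API_URL", "http://127.0.0.1:8000")
RMS_EMBEDDING_MODEL = os.environ.get("RMS_EMBEDDING_MODEL", "BAAI/bge-base-en-v1.5")
OLLAMA_BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://127.0.0.1:11434")

print(RMS_EMBEDDING_MODEL)
```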
If you change the RMS embedding model, reindex the corpus. Mixing old and new embeddings in the same vector store breaks retrieval: similarity scores between vectors produced by different models are not comparable.
Local, Cloud, and Hybrid Operation
Local only
Use this mode when data privacy and low-latency iteration matter most.
- PDFs, RIS files, outputs, and Chroma persistence remain on local disk
- Ollama and local Qwen can serve all generation paths
- no external cloud service is required for the main workflow
Cloud only
Use this mode when you want a hosted demo or a shared environment.
- deploy the API and Streamlit UI from infra/gcp/cloudrun-api/README.md and infra/gcp/cloudrun-webapp/README.md
- keep project files and generated outputs in Google Cloud Storage
- use hosted LLM providers or a remotely reachable Ollama endpoint
- keep the embedding model fixed across index build and query time
Hybrid
Use this mode when local research data stays private but you still want a hosted demo or sync target.
- local workstation remains the source of truth for PDFs and indexing
- selected outputs, manifests, and review artifacts are mirrored to GCS
- cloud deployment uses the same embedding configuration and project manifest to avoid retrieval drift
- hosted UI can point to a local or cloud API depending on the demo topology
Recommended Google Cloud Sync Design
RMS already has Cloud Run deployment assets. For consistent local, cloud, and hybrid behavior, keep these assets synchronized at the project level:
- project files: store PDFs, RIS files, exported review sheets, and PRISMA outputs under a project-scoped directory locally and a matching prefix in GCS
- embedding manifest: persist a small project manifest with at least:
  - embedding model name
  - chunk size and overlap
  - vector store type
  - index build timestamp
  - corpus file list or content hashes
- vector index lifecycle: rebuild the cloud index whenever the embedding model, chunking policy, or document set changes; do not assume vector files produced with one embedding family are valid for another
- provider separation: keep retrieval embeddings, answer-generation models, and citation metadata as separate concerns; a Qwen or Llama answer model can work well with a BGE or Nomic embedding model as long as the retrieval layer is internally consistent
- citation authority: sync RIS/BibTeX files and citation logs as first-class project artifacts; citation metadata should not be reconstructed from chat output
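The manifest and rebuild rules above can be sketched in a few lines. Field names here are illustrative, not the actual rms manifest schema:

```python
import hashlib
import time

def build_manifest(embedding_model: str, chunk_size: int, overlap: int,
                   corpus: dict[str, bytes]) -> dict:
    """Capture everything that must match between index build and query time."""
    return {
        "embedding_model": embedding_model,
        "chunk_size": chunk_size,
        "chunk_overlap": overlap,
        "vector_store": "chroma",
        "built_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        # Content hashes let a replica detect document-set drift cheaply.
        "files": {name: hashlib.sha256(data).hexdigest()
                  for name, data in corpus.items()},
    }

def needs_reindex(manifest: dict, embedding_model: str,
                  chunk_size: int, overlap: int) -> bool:
    """A cloud or local replica must rebuild if any indexing parameter drifted."""
    return (manifest["embedding_model"] != embedding_model
            or manifest["chunk_size"] != chunk_size
            or manifest["chunk_overlap"] != overlap)

m = build_manifest("BAAI/bge-base-en-v1.5", 512, 64, {"paper.pdf": b"%PDF-1.7 ..."})
print(needs_reindex(m, "ollama/nomic-embed-text", 512, 64))  # True: model changed
```

Syncing this manifest next to the project files in GCS gives the cloud deployment an unambiguous signal for when the vector index must be rebuilt rather than reused.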
The detailed system design for these modes is documented in docs/architecture.md.
Additional Documentation
- docs/architecture.md
- docs/rms_features_prisma_pipeline.md
- apps/api/README.md
- apps/webapp/README.md
- infra/gcp/cloudrun-api/README.md
- infra/gcp/cloudrun-webapp/README.md
Current State and Boundaries
What is implemented now:
- local Chroma-backed retrieval
- API and Streamlit app for indexing, search, and citation-safe workflows
- Cloud Run packaging for the API and UI
- local-first research chat with evidence chunks and visible corpus status
What still requires deployment choices outside the core local default:
- managed cloud vector storage
- object-storage sync and project manifests as an operational convention
- production secret handling for hosted LLM providers
That separation is intentional. The repository already runs well locally, and the cloud design can be layered on without weakening the local research workflow.
Packaging Direction
A single-command beginner flow is wired into the package metadata:
- after package installation, users can run litsynth launch
- once published to PyPI under the same name, the public install flow becomes pip install litsynth
A macOS desktop app is also realistic. The clean upgrade path is:
- stabilize the local litsynth launch flow
- package the same launcher and backend into a desktop shell
- ship a beginner-friendly macOS app bundle that starts the local services automatically
For a future desktop version, the lowest-friction options are usually:
- PyInstaller or Briefcase for a Python-first desktop package
- Tauri or Electron if you want a more polished native app shell later
For this codebase, the simplest near-term path is a Python-packaged macOS app that wraps the same FastAPI plus Streamlit launcher you now have.
Download files
File details
Details for the file litsynth-0.1.0.tar.gz.
File metadata
- Download URL: litsynth-0.1.0.tar.gz
- Upload date:
- Size: 91.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | f2e5ad37b9f96182f327f4859ba216a2227ec11bb6a53c9b4b8880f992087b2f |
| MD5 | 0ecd3282ece7dc671ff29c3838898eb8 |
| BLAKE2b-256 | cfea52cbaa8c9f2899948aef325abebf047f82d009c85a2da512c9eaaacab54e |
File details
Details for the file litsynth-0.1.0-py3-none-any.whl.
File metadata
- Download URL: litsynth-0.1.0-py3-none-any.whl
- Upload date:
- Size: 103.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | daa884dcb3f931ed5b5c0b45834398111e5d05e6960d3fd6ba0897216537942e |
| MD5 | 721b45d01806f942bda362f2b47a5d8d |
| BLAKE2b-256 | 6adc14d19e926fbe9596f30347f77cf9b10e3ded59138b2b80819ed186bc4879 |