Lutz

AI-powered academic article screening and analysis tool

Languages: English | Português | Español

Python library and command-line tool for organizing, vectorizing, and analyzing academic PDF articles with AI.


Tags: systematic review, academic screening, scientific articles, generative AI, LLM, RAG, embeddings, PDF, LanceDB, Python, open science, academic research.

Lutz helps researchers, students, and literature review teams work with large sets of PDF articles. It creates a reproducible project structure, copies PDFs into the right place, performs basic security checks, extracts text, generates embeddings, stores everything in a local vector database, and uses a language model to answer analysis prompts.

Current package version: 0.1.2.

The package is named after Bertha Maria Julia Lutz, a pioneering Brazilian biologist whose work advanced both science itself and its public recognition in Brazil.


What Lutz is for

Use Lutz when you need to:

  • Organize a folder of scientific articles in PDF format.
  • Prepare a systematic review, narrative review, literature map, or initial study screening.
  • Ask questions about a set of articles using a language model.
  • Generate structured analysis from Markdown prompts.
  • Keep files, prompts, vector data, and reports inside a reproducible project.

Lutz does not replace critical reading or methodological decisions by researchers. It is a support tool for accelerating organization, semantic search, and first-pass synthesis of texts.


How Lutz works

PDFs -> security check -> text extraction -> [section parsing] -> embeddings -> vector database -> LLM analysis -> JSON report

Basic flow:

  1. lutz init creates a project folder with subfolders, prompt templates, and .env.example.
  2. lutz load copies your PDFs into articles/.
  3. lutz vectorize checks PDFs, extracts text, optionally splits articles into labeled sections (abstract, introduction, methodology…), chunks, and creates embeddings.
  4. lutz analysis uses a Markdown prompt to analyze the vectorized articles.
  5. Results are stored in analysis/execution_reports/.

Before you start

You will need:

  • A computer running Windows, macOS, or Linux.
  • Terminal access. On Windows, use PowerShell; on macOS and Linux, use Terminal.
  • Python 3.10 or higher.
  • A folder with your PDF articles.
  • An AI model for analysis: self-hosted via Docker Model Runner, Ollama, or llama.cpp; OpenAI/OpenRouter; or Anthropic.

The recommended installation path uses the package published on PyPI.


Installation

From PyPI

  1. Install Python 3.10 or higher.

Check your version:

python --version

On some systems, the command may be python3 --version.

  2. Create and activate a virtual environment.

Linux or macOS:

python -m venv .venv
source .venv/bin/activate

Windows PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
  3. Install Lutz.

python -m pip install --upgrade pip
pip install lutz-research

  4. Test the installation.

lutz --help
lutz --version

From source

Use this option if you want to contribute or run the latest code from the repository.

git clone https://github.com/jooguilhermesc/lutz.git
cd lutz
python -m pip install --upgrade pip
pip install -e .

First use, step by step

The commands below assume that lutz already works in your terminal.

1. Create a folder for your review

mkdir my-review
cd my-review
lutz init

Lutz creates a structure similar to this:

articles/                   research PDFs
prompts/                    prompt templates
analysis/execution_reports/ generated reports
.env.example                configuration example
README.md                   project notes

2. Configure AI models

Copy the example file:

Linux or macOS:

cp .env.example .env

Windows PowerShell:

Copy-Item .env.example .env

Open .env in a text editor and choose one of the configurations from Model configuration.

3. Add PDFs to the project

You can manually copy files into articles/ or use the load command.

Linux example:

lutz load --f ~/Downloads/my-articles --so linux

macOS example:

lutz load --f ~/Desktop/articles --so mac

Windows example:

lutz load --f "C:\Users\Ana\Downloads\articles" --so windows

If the PDFs are already in articles/, you can skip this step.

4. Create the article vector index

lutz vectorize

This command may take time on the first run, especially if there are many PDFs or if a local model still needs to be downloaded.

5. Run an analysis

lutz analysis --p prompts/systematic_review.md

To analyze each article separately, use:

lutz analysis --p prompts/systematic_review.md --per-article

6. Open the result

Files are stored in:

analysis/execution_reports/

Each run generates a .json file with metadata, articles used, token usage, and the model response.


Model configuration

Configuration lives in .env, created from .env.example.

Local/self-hosted option: Docker Model Runner

This option uses local models through Docker Model Runner and does not require an external API key.

  1. Pull the models.
docker model pull nomic-embed-text
docker model pull ai/llama3.2

  2. Configure .env.

EMBEDDING_PROVIDER=docker_model_runner
EMBEDDING_MODEL=nomic-embed-text

LLM_PROVIDER=docker_model_runner
LLM_MODEL=ai/llama3.2

DOCKER_MODEL_HOST=http://localhost:12434/engines/v1

Self-hosted option with Ollama or llama.cpp

Lutz can also use local servers compatible with the OpenAI API, including Ollama and llama.cpp server.

For local endpoints, OPENAI_API_KEY can be a dummy value when the server does not require authentication.

Example with Ollama:

EMBEDDING_PROVIDER=sentence_transformers
EMBEDDING_MODEL=all-MiniLM-L6-v2

LLM_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama
LLM_MODEL=llama3.2

Example with llama.cpp server:

EMBEDDING_PROVIDER=sentence_transformers
EMBEDDING_MODEL=all-MiniLM-L6-v2

LLM_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:8080/v1
OPENAI_API_KEY=llama-cpp
LLM_MODEL=model-loaded-in-server

If the self-hosted server also provides embeddings through an OpenAI-compatible API, you can set EMBEDDING_PROVIDER=openai and use the corresponding embedding model.
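All three self-hosted setups above speak the same OpenAI-compatible protocol: a POST to the /chat/completions path under the configured base URL. A minimal sketch of the request Lutz's LLM client effectively issues (the helper name and payload construction here are illustrative, not Lutz's actual code):

```python
import json

def chat_request(base_url: str, model: str, prompt: str, api_key: str = "ollama"):
    """Build (but do not send) an OpenAI-compatible chat completion request.

    Any OpenAI-compatible server (Ollama, llama.cpp server, OpenRouter)
    accepts POST <base_url>/chat/completions with this payload shape.
    """
    return {
        "url": base_url.rstrip("/") + "/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",  # dummy value is fine locally
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = chat_request("http://localhost:11434/v1", "llama3.2", "Summarize this abstract.")
```

This is why swapping Ollama for llama.cpp only requires changing OPENAI_BASE_URL and LLM_MODEL in .env.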

OpenRouter or OpenAI-compatible API

Use this option if you have an API key or want to use OpenRouter models.

  1. Create an account at https://openrouter.ai.
  2. Generate a key at https://openrouter.ai/keys.
  3. Configure .env.
EMBEDDING_PROVIDER=sentence_transformers
EMBEDDING_MODEL=all-MiniLM-L6-v2

LLM_PROVIDER=openai
OPENAI_BASE_URL=https://openrouter.ai/api/v1
OPENAI_API_KEY=your-key-here
LLM_MODEL=google/gemma-3-12b-it:free

Standard OpenAI also works:

EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small

LLM_PROVIDER=openai
OPENAI_API_KEY=your-key-here
LLM_MODEL=gpt-4o-mini

Anthropic

EMBEDDING_PROVIDER=sentence_transformers
EMBEDDING_MODEL=all-MiniLM-L6-v2

LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=your-key-here
LLM_MODEL=claude-haiku-4-5-20251001

Useful variables

Variable Purpose
EMBEDDING_PROVIDER Embedding provider: docker_model_runner, openai, or sentence_transformers.
EMBEDDING_MODEL Embedding model name.
LLM_PROVIDER Language model provider: docker_model_runner, openai, or anthropic.
LLM_MODEL Model used for analysis.
OPENAI_API_KEY Key for OpenAI or a compatible service. For unauthenticated local endpoints, it can be a dummy value.
OPENAI_BASE_URL Alternative URL for OpenAI-compatible APIs.
ANTHROPIC_API_KEY Anthropic API key.
DOCKER_MODEL_HOST Docker Model Runner address when using a local Python installation.
DOCKER_MODEL_API_KEY Key used by the OpenAI-compatible Docker Model Runner client. Usually does not need to be changed.
LLM_MAX_TOKENS Maximum response size. Default: 4096.
LLM_TEMPERATURE Response variation. Default: 0.2.
HUGGINGFACE_TOKEN Optional token for gated models used through sentence_transformers.
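All of these variables follow plain KEY=VALUE lines in .env. A minimal sketch of how such a file is parsed (illustrative only; Lutz itself most likely relies on a dotenv-style loader):

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines from .env text, skipping blanks and # comments.

    Minimal sketch; real dotenv loaders also handle quoting and export syntax.
    """
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

cfg = parse_env("LLM_PROVIDER=openai\nLLM_MODEL=gpt-4o-mini\n# a comment\n")
```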

Main commands

lutz init [PROJECT_NAME]

Creates a new Lutz project.

lutz init
lutz init my-review

The command creates:

  • articles/
  • prompts/
  • analysis/execution_reports/
  • .env.example
  • .gitignore
  • project README.md
  • local Git repository

lutz load --f FOLDER [--so OS] [--overwrite]

Copies PDFs from a source folder into articles/.

Option Description Default
--f Path to the folder containing PDFs. required
--so Operating system of the source path: linux, windows, or mac. choose your system
--overwrite Overwrite files that already exist in articles/. disabled

Examples:

lutz load --f ~/Downloads/articles --so linux
lutz load --f ~/Desktop/articles --so mac

Windows PowerShell:

lutz load --f "C:\Users\Ana\Downloads\articles" --so windows

lutz vectorize [options]

Processes PDFs from articles/ and creates the local vector database in .lutz/vector_store/.

Option Description Default
--skip-security Skip security checks. Not recommended. disabled
--chunk-size Text chunk size in words. 512
--chunk-overlap Overlap between chunks. 64
--quarantine Process files in articles/_quarantine/. disabled
--section-parse Split each article into labeled sections (abstract, introduction, methodology, results, discussion, conclusion, references…) before chunking. Each chunk is tagged with its section name. Chunks never cross section boundaries. disabled
--layout-parse / --no-layout-parse When --section-parse is active, use layout-parser for visual section detection. Requires pip install "lutz-research[layout]". Falls back to text heuristics if not installed. Has no effect without --section-parse. enabled

Examples:

lutz vectorize
lutz vectorize --chunk-size 256 --chunk-overlap 32

# Section-aware vectorization (text heuristics, no extra deps)
lutz vectorize --section-parse --no-layout-parse

# Section-aware vectorization with visual layout detection
pip install "lutz-research[layout]"
lutz vectorize --section-parse
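The --chunk-size and --chunk-overlap options describe a sliding window over words: each chunk holds up to chunk-size words, and consecutive chunks share chunk-overlap words. A sketch of that windowing with the CLI defaults (illustrative; Lutz's internal chunker may differ in details):

```python
def chunk_words(text: str, chunk_size: int = 512, overlap: int = 64):
    """Split text into word chunks with overlap, mirroring the CLI defaults.

    Each step advances by chunk_size - overlap words, so adjacent chunks
    share `overlap` words of context.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# A 1000-word document yields 3 chunks at the default settings.
chunks = chunk_words(" ".join(str(i) for i in range(1000)))
```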

Installing the layout detection backend

Visual layout detection uses layout-parser with a Detectron2 model trained on PubLayNet. Model weights (~250 MB) are downloaded on first use.

# Install optional deps
pip install "lutz-research[layout]"

# System dependency (required by pdf2image)
# Debian/Ubuntu:
apt install poppler-utils
# macOS:
brew install poppler

If layout-parser is not installed, --section-parse falls back to regex-based text heuristics with no extra dependencies.

lutz unvectorize

Deletes the vector database, but does not delete your PDFs.

lutz unvectorize

Use it when you want to rebuild the index from scratch.

lutz analysis --p PROMPT [options]

Analyzes vectorized articles using a Markdown prompt. Two modes are available.

RAG mode (default)

Embeds the prompt, retrieves the most relevant chunks from the full corpus, and makes one model call. Useful for general synthesis and semantic search.

Per-article mode (--per-article)

Makes a separate model call for each article in the vector database. Useful for systematic screening where you need an inclusion or exclusion decision per article.

Option Description Default
--p Path to the .md prompt. required
--top-k Chunks to retrieve in RAG mode. Use '*' for all. 10
--per-article Analyze each article in a separate model call. disabled
--workers Parallel model calls in --per-article mode. 1
--max-chunks-per-article Chunk limit per article in --per-article mode. no limit
--filter-sections Comma-separated list of sections to include (e.g. abstract,methodology,results). Only chunks with a matching section label are retrieved. Requires articles vectorized with --section-parse. Use lutz vector-store --sections to check what is available. no filter
--output-name Base output filename. generated automatically

Examples:

# Default RAG mode
lutz analysis --p prompts/systematic_review.md

# RAG retrieving more chunks
lutz analysis --p prompts/methodology_analysis.md --top-k 20

# RAG using all chunks in the corpus
lutz analysis --p prompts/systematic_review.md --top-k '*'

# Sequential per-article screening
lutz analysis --p prompts/screening.md --per-article

# Per-article screening with 4 parallel calls
lutz analysis --p prompts/screening.md --per-article --workers 4

# Per-article screening with a 10-chunk context limit per article
lutz analysis --p prompts/screening.md --per-article --workers 4 --max-chunks-per-article 10

# Analyze only methodology and results sections (RAG mode)
lutz analysis --p prompts/methodology_analysis.md \
  --filter-sections methodology,results

# Screen articles using only the abstract (per-article, parallel)
lutz analysis --p prompts/screening.md --per-article --workers 4 \
  --filter-sections abstract

# Custom output name
lutz analysis --p prompts/systematic_review.md --output-name my-analysis-v1

Section filter (--filter-sections)

When articles have been vectorized with --section-parse, each chunk carries a section label (abstract, introduction, background, methodology, results, discussion, conclusion, references, acknowledgements, appendix). The --filter-sections flag restricts the analysis to only those sections, reducing context size and focusing the model's attention.

  • In RAG mode the similarity search is run only over the specified sections, then ranked by relevance as usual.
  • In per-article mode each article receives only the chunks from the specified sections. Articles with no chunks in those sections show chunks_used: 0 in the report.
  • Articles vectorized without --section-parse have no section label and are excluded when the filter is active.
  • Run lutz vector-store --sections first to confirm which sections are present in the store.
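Conceptually the filter is a simple set-membership test over each chunk's section label. A sketch, assuming chunk records are dicts with an optional "section" key (the record shape is an assumption for illustration):

```python
def filter_by_section(chunks, filter_sections: str):
    """Keep only chunks whose section label matches a --filter-sections value.

    Chunks vectorized without --section-parse carry no label and are
    dropped when the filter is active.
    """
    wanted = {s.strip().lower() for s in filter_sections.split(",")}
    return [c for c in chunks if c.get("section", "").lower() in wanted]

kept = filter_by_section(
    [{"section": "abstract", "text": "..."},
     {"section": "results", "text": "..."},
     {"text": "vectorized without --section-parse"}],
    "abstract,results",
)
```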

Performance in --per-article mode

With many articles, --per-article can take a long time because each article requires a model call. Use --workers to parallelize:

Articles                   --workers 1   --workers 4   --workers 8
52 articles at ~50s each   ~43 min       ~11 min       ~6 min
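These estimates are simple batch arithmetic: calls run in parallel groups of --workers, so wall-clock time is roughly the number of batches times the per-call latency. A sketch:

```python
import math

def wall_clock_minutes(articles: int, secs_per_call: float, workers: int) -> float:
    """Estimate --per-article wall time: calls run in batches of `workers`."""
    return math.ceil(articles / workers) * secs_per_call / 60

w1 = wall_clock_minutes(52, 50, 1)  # ~43 min
w4 = wall_clock_minutes(52, 50, 4)  # ~11 min
w8 = wall_clock_minutes(52, 50, 8)  # ~6 min
```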

The practical limit depends on the provider. Remote APIs such as OpenRouter have rate limits; self-hosted models may bottleneck on CPU, GPU, memory, or request queues. Tune --workers according to your service capacity.

Use --max-chunks-per-article to reduce context size per call, which lowers latency and cost. Chunks are sent in document order.

Context size note: --chunk-size in lutz vectorize is measured in words, not model tokens. A 512-word chunk is roughly 680 tokens. With 23 chunks per article, a typical article can produce around 15,000 to 16,000 input tokens. Check that your configured model supports the required context window.
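The word-to-token arithmetic can be checked with a back-of-the-envelope function (the ~1.33 tokens-per-word factor is a common English-text rule of thumb, not an exact tokenizer count):

```python
def estimated_input_tokens(chunks: int, words_per_chunk: int = 512,
                           tokens_per_word: float = 1.33) -> int:
    """Rough input-token estimate for `chunks` chunks of `words_per_chunk` words."""
    return round(chunks * words_per_chunk * tokens_per_word)

estimated_input_tokens(1)   # a 512-word chunk: roughly 680 tokens
estimated_input_tokens(23)  # a 23-chunk article: roughly 15,000-16,000 tokens
```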

lutz citations --analysis FILE [options]

Extracts structured citations from a report generated by lutz analysis --per-article.

Option Description Default
--analysis Path to the per-article analysis JSON. required
--workers Parallel model calls. 1
--only-relevant Include only relevant articles in the report. disabled
--output-name Base output filename. generated automatically

Internal flow:

  1. Reads the JSON produced by lutz analysis --per-article.
  2. Classifies each article as relevant, not relevant, or unknown using the analysis text, without an LLM call.
  3. For each relevant article, retrieves original chunks from the vector database and asks the LLM to extract the 3 to 5 passages that best justify the classification.
  4. Saves a JSON report in analysis/execution_reports/.
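Step 2, the LLM-free triage, amounts to scanning the analysis text for decision phrases. A plausible sketch (the phrase patterns are illustrative; Lutz's actual classification rules are internal and may differ):

```python
import re

def classify_relevance(analysis_text: str) -> str:
    """Rule-based triage of an analysis blurb into relevant / not relevant / unknown."""
    t = analysis_text.lower()
    # Check negative phrasing first so "not relevant" is not read as "relevant".
    if re.search(r"\bnot relevant\b|\bexcluded?\b", t):
        return "not relevant"
    if re.search(r"\brelevant\b|\bincluded?\b", t):
        return "relevant"
    return "unknown"
```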

The output filename follows <analysis_name>_citations_<timestamp>.json.

# Basic extraction
lutz citations --analysis analysis/execution_reports/screening_20260501.json

# Parallel calls and relevant articles only
lutz citations --analysis analysis/execution_reports/screening_20260501.json \
  --workers 4 --only-relevant

# Custom output name
lutz citations --analysis analysis/execution_reports/screening_20260501.json \
  --output-name review_citations_v1

Prerequisite: the input report must have been generated with lutz analysis --per-article. The vector database must be available at .lutz/vector_store/ because citations are extracted from original article chunks.

lutz vector-store [--summarize] [--sections] [--export [FILE]]

Inspects the local vector database.

Option Description
--summarize Display a summary in the terminal.
--sections Show a per-article section breakdown (abstract, introduction, methodology…). Articles vectorized without --section-parse appear under (no section).
--export Export the summary as JSON, with an automatic path in .lutz/.
--export FILE Export to a specific path. Use - to print to stdout.

The options can be combined.

# Display summary
lutz vector-store --summarize

# Check which sections were detected per article
lutz vector-store --sections

# Summary + section breakdown together
lutz vector-store --summarize --sections

# Export JSON with automatic path
lutz vector-store --export

# Export to a specific file
lutz vector-store --export summary.json

# Print JSON to stdout
lutz vector-store --export -

How to write prompts

Prompts are Markdown files inside prompts/. They tell the model what you want to analyze.

A good prompt usually includes:

# Analysis title

## Objective
Explain in a few lines what you want to discover.

## Questions
1. What is the main question?
2. What information should be extracted from the articles?
3. Which inclusion or exclusion criteria should be considered?

## Response format
Ask for a table, a list, or sections with clear headings.

## Research topic
Describe the topic or research question.

lutz init creates ready-to-edit prompt templates:

File Suggested use
prompts/systematic_review.md Systematic review with evidence table.
prompts/methodology_analysis.md Comparison of research methods.
prompts/evidence_quality.md Quality and bias assessment.
prompts/thematic_synthesis.md Thematic synthesis across articles.

Before running lutz analysis, open the chosen prompt and replace example fields with your research question.


Where results are stored

After lutz analysis, results are stored in:

analysis/execution_reports/

The generated file is a .json. It includes:

  • prompt used in the analysis;
  • execution date and duration;
  • analysis mode, such as rag or per_article;
  • embedding model and language model used;
  • token counts;
  • covered articles;
  • model response.

Example filename:

systematic_review_20260501_153000.json
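The filename packs the prompt name and a run timestamp, which makes reports easy to sort and parse. A sketch of recovering those parts (the pattern is inferred from the example filename above and may not cover every name Lutz generates):

```python
import re

def parse_report_name(filename: str):
    """Split '<prompt>_<YYYYMMDD>_<HHMMSS>.json' into its parts, or None."""
    m = re.fullmatch(r"(?P<prompt>.+)_(?P<date>\d{8})_(?P<time>\d{6})\.json", filename)
    return m.groupdict() if m else None

info = parse_report_name("systematic_review_20260501_153000.json")
```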

Security model

Before vectorizing, Lutz can check PDFs to reduce common risks in malicious or unsuitable files.

Check What it looks for
Structural analysis Embedded JavaScript, automatic actions, and XFA forms.
Prompt injection Phrases that try to override model instructions.
Academic structure Basic signs of academic articles, such as abstract, methodology, and references.
Corpus anomaly When there are 5 or more documents, identifies possible statistical outliers.
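As an intuition for the prompt-injection check, a naive version scans extracted text for instruction-override phrases (the phrase list below is illustrative; Lutz's real check is more thorough):

```python
INJECTION_PHRASES = (
    "ignore previous instructions",
    "ignore all prior instructions",
    "disregard the above",
    "you are now",
)

def flags_prompt_injection(text: str) -> bool:
    """Naive case-insensitive substring scan for instruction-override phrases."""
    t = text.lower()
    return any(phrase in t for phrase in INJECTION_PHRASES)
```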

Suspicious files can be moved to:

articles/_quarantine/

To process quarantined files after manual review:

lutz vectorize --quarantine

To skip security checks:

lutz vectorize --skip-security

Use --skip-security only if you trust the PDF source.


Architecture

lutz/
├── cli.py                    # main Click CLI entry point
├── commands/
│   ├── init.py               # lutz init
│   ├── load.py               # lutz load
│   ├── vectorize.py          # lutz vectorize / lutz unvectorize
│   ├── analysis.py           # lutz analysis
│   ├── citations.py          # lutz citations
│   └── vector_store.py       # lutz vector-store
├── core/
│   ├── security_checker.py   # PDF security checks
│   ├── pdf_processor.py      # text extraction and chunking
│   ├── section_parser.py     # section detection (layout-parser or text heuristics)
│   ├── vector_store.py       # LanceDB wrapper
│   ├── embedding_client.py   # embedding providers
│   └── llm_client.py         # LLM providers
└── utils/
    ├── pdf.py                # basic PDF validation
    ├── project.py            # project detection and .env loading
    └── templates.py          # files created by lutz init

The vector database uses LanceDB and is stored in .lutz/vector_store/ inside the project. This directory should not be committed to Git.


Complete systematic review workflow

# 1. Create project
lutz init my-review && cd my-review

# 2. Add PDFs
lutz load --f ~/Downloads/articles --so linux

# 3. Vectorize with section-aware parsing (optional but recommended)
lutz vectorize --section-parse

# 4. Inspect the section breakdown to confirm detection worked
lutz vector-store --sections

# 5. Per-article screening (abstract only — faster and cheaper)
lutz analysis --p prompts/screening.md --per-article --workers 4 \
  --filter-sections abstract

# 6. Deep analysis on methodology and results sections
lutz analysis --p prompts/methodology_analysis.md \
  --filter-sections methodology,results

# 7. Extract citations from relevant articles
lutz citations --analysis analysis/execution_reports/screening_<timestamp>.json \
  --workers 4 --only-relevant

# 8. Inspect the vector database
lutz vector-store --summarize
lutz vector-store --export

Contributing

Contributions are welcome. To prepare a development environment:

git clone https://github.com/jooguilhermesc/lutz.git
cd lutz
pip install -e ".[dev]"
pytest

Before proposing large changes, open an issue to discuss the idea.


How to cite

If you use Lutz in your research, please cite it using the information below or refer to the CITATION.cff file.

APA

Cabral, J. G. S., & Azevedo Farias, A. K. (2026). Lutz: AI-powered academic article screening and analysis tool (Version 0.1.2) [Software]. Zenodo. https://doi.org/10.5281/zenodo.19982571

BibTeX

@software{cabral2026lutz,
  author  = {Cabral, João Guilherme Silva and Azevedo Farias, Anna Karoline},
  title   = {{Lutz: AI-powered academic article screening and analysis tool}},
  year    = {2026},
  version = {0.1.2},
  doi     = {10.5281/zenodo.19982571},
  url     = {https://github.com/jooguilhermesc/lutz},
  license = {MIT}
}

License

MIT
