Lutz

AI-powered academic article screening and analysis tool

Languages: English | Português | Español

Python library and command-line tool for organizing, vectorizing, and analyzing academic PDF articles with AI.


Tags: systematic review, academic screening, scientific articles, generative AI, LLM, RAG, embeddings, PDF, LanceDB, Python, open science, academic research.

Lutz helps researchers, students, and literature review teams work with large sets of PDF articles. It creates a reproducible project structure, copies PDFs into the right place, performs basic security checks, extracts text, generates embeddings, stores everything in a local vector database, and uses a language model to answer analysis prompts.

Current package version: 0.1.2.

The package is named after Bertha Maria Julia Lutz, a pioneering Brazilian biologist whose work advanced both science itself and its public recognition in Brazil.


What Lutz is for

Use Lutz when you need to:

  • Organize a folder of scientific articles in PDF format.
  • Prepare a systematic review, narrative review, literature map, or initial study screening.
  • Ask questions about a set of articles using a language model.
  • Generate structured analysis from Markdown prompts.
  • Keep files, prompts, vector data, and reports inside a reproducible project.

Lutz does not replace critical reading or methodological decisions by researchers. It is a support tool for accelerating organization, semantic search, and first-pass synthesis of texts.


How Lutz works

PDFs -> security check -> text extraction -> [section parsing] -> embeddings -> vector database -> LLM analysis -> JSON report

Basic flow:

  1. lutz init creates a project folder with subfolders, prompt templates, and .env.example.
  2. lutz load copies your PDFs into articles/.
  3. lutz vectorize checks PDFs, extracts text, optionally splits articles into labeled sections (abstract, introduction, methodology…), chunks, and creates embeddings.
  4. lutz analysis uses a Markdown prompt to analyze the vectorized articles.
  5. Results are stored in analysis/execution_reports/.

Before you start

You will need:

  • A computer running Windows, macOS, or Linux.
  • Terminal access. On Windows, use PowerShell; on macOS and Linux, use Terminal.
  • Python 3.10 or higher.
  • A folder with your PDF articles.
  • An AI model for analysis: self-hosted via Docker Model Runner, Ollama, or llama.cpp; OpenAI/OpenRouter; or Anthropic.

The recommended installation path uses the package published on PyPI.


Installation

From PyPI

  1. Install Python 3.10 or higher.

Check your version:

python --version

On some systems, the command may be python3 --version.

  2. Create and activate a virtual environment.

Linux or macOS:

python -m venv .venv
source .venv/bin/activate

Windows PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
  3. Install Lutz.

python -m pip install --upgrade pip
pip install lutz-research

  4. Test the installation.

lutz --help
lutz --version

From source

Use this option if you want to contribute or run the latest code from the repository.

git clone https://github.com/jooguilhermesc/lutz.git
cd lutz
python -m pip install --upgrade pip
pip install -e .

First use, step by step

The commands below assume that lutz already works in your terminal.

1. Create a folder for your review

mkdir my-review
cd my-review
lutz init

Lutz creates a structure similar to this:

articles/                   research PDFs
prompts/                    prompt templates
analysis/execution_reports/ generated reports
.env.example                configuration example
README.md                   project notes

2. Configure AI models

Copy the example file:

Linux or macOS:

cp .env.example .env

Windows PowerShell:

Copy-Item .env.example .env

Open .env in a text editor and choose one of the configurations from Model configuration.

3. Add PDFs to the project

You can manually copy files into articles/ or use the load command.

Linux example:

lutz load --f ~/Downloads/my-articles --so linux

macOS example:

lutz load --f ~/Desktop/articles --so mac

Windows example:

lutz load --f "C:\Users\Ana\Downloads\articles" --so windows

If the PDFs are already in articles/, you can skip this step.

4. Create the article vector index

lutz vectorize

This command may take time on the first run, especially if there are many PDFs or if a local model still needs to be downloaded.

5. Run an analysis

lutz analysis --p prompts/systematic_review.md

To analyze each article separately, use:

lutz analysis --p prompts/systematic_review.md --per-article

6. Open the result

Files are stored in:

analysis/execution_reports/

Each run generates a .json file with metadata, articles used, token usage, and the model response.


Model configuration

Configuration lives in .env, created from .env.example.

Local/self-hosted option: Docker Model Runner

This option uses local models through Docker Model Runner and does not require an external API key.

  1. Pull the models.
docker model pull nomic-embed-text
docker model pull ai/llama3.2

  2. Configure .env.

EMBEDDING_PROVIDER=docker_model_runner
EMBEDDING_MODEL=nomic-embed-text

LLM_PROVIDER=docker_model_runner
LLM_MODEL=ai/llama3.2

DOCKER_MODEL_HOST=http://localhost:12434/engines/v1

Self-hosted option with Ollama or llama.cpp

Lutz can also use local servers compatible with the OpenAI API, including Ollama and llama.cpp server.

For local endpoints, OPENAI_API_KEY can be a dummy value when the server does not require authentication.

Example with Ollama:

EMBEDDING_PROVIDER=sentence_transformers
EMBEDDING_MODEL=all-MiniLM-L6-v2

LLM_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama
LLM_MODEL=llama3.2

Example with llama.cpp server:

EMBEDDING_PROVIDER=sentence_transformers
EMBEDDING_MODEL=all-MiniLM-L6-v2

LLM_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:8080/v1
OPENAI_API_KEY=llama-cpp
LLM_MODEL=model-loaded-in-server

If the self-hosted server also provides embeddings through an OpenAI-compatible API, you can set EMBEDDING_PROVIDER=openai and use the corresponding embedding model.
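All three self-hosted setups above speak the same OpenAI-compatible protocol: a POST to the /chat/completions path under the configured base URL. A minimal sketch of the request Lutz's LLM client effectively issues (the helper name and payload construction here are illustrative, not Lutz's actual code):

```python
import json

def chat_request(base_url: str, model: str, prompt: str, api_key: str = "ollama"):
    """Build (but do not send) an OpenAI-compatible chat completion request.

    Any OpenAI-compatible server (Ollama, llama.cpp server, OpenRouter)
    accepts POST <base_url>/chat/completions with this payload shape.
    """
    return {
        "url": base_url.rstrip("/") + "/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",  # dummy value is fine locally
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = chat_request("http://localhost:11434/v1", "llama3.2", "Summarize this abstract.")
```

This is why swapping Ollama for llama.cpp only requires changing OPENAI_BASE_URL and LLM_MODEL in .env.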

OpenRouter or OpenAI-compatible API

Use this option if you have an API key or want to use OpenRouter models.

  1. Create an account at https://openrouter.ai.
  2. Generate a key at https://openrouter.ai/keys.
  3. Configure .env.
EMBEDDING_PROVIDER=sentence_transformers
EMBEDDING_MODEL=all-MiniLM-L6-v2

LLM_PROVIDER=openai
OPENAI_BASE_URL=https://openrouter.ai/api/v1
OPENAI_API_KEY=your-key-here
LLM_MODEL=google/gemma-3-12b-it:free

Standard OpenAI also works:

EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small

LLM_PROVIDER=openai
OPENAI_API_KEY=your-key-here
LLM_MODEL=gpt-4o-mini

Anthropic

EMBEDDING_PROVIDER=sentence_transformers
EMBEDDING_MODEL=all-MiniLM-L6-v2

LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=your-key-here
LLM_MODEL=claude-haiku-4-5-20251001

Useful variables

Variable Purpose
EMBEDDING_PROVIDER Embedding provider: docker_model_runner, openai, or sentence_transformers.
EMBEDDING_MODEL Embedding model name.
LLM_PROVIDER Language model provider: docker_model_runner, openai, or anthropic.
LLM_MODEL Model used for analysis.
OPENAI_API_KEY Key for OpenAI or a compatible service. For unauthenticated local endpoints, it can be a dummy value.
OPENAI_BASE_URL Alternative URL for OpenAI-compatible APIs.
ANTHROPIC_API_KEY Anthropic API key.
DOCKER_MODEL_HOST Docker Model Runner address when using a local Python installation.
DOCKER_MODEL_API_KEY Key used by the OpenAI-compatible Docker Model Runner client. Usually does not need to be changed.
LLM_MAX_TOKENS Maximum response size. Default: 4096.
LLM_TEMPERATURE Response variation. Default: 0.2.
HUGGINGFACE_TOKEN Optional token for gated models used through sentence_transformers.
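All of these variables follow plain KEY=VALUE lines in .env. A minimal sketch of how such a file is parsed (illustrative only; Lutz itself most likely relies on a dotenv-style loader):

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines from .env text, skipping blanks and # comments.

    Minimal sketch; real dotenv loaders also handle quoting and export syntax.
    """
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

cfg = parse_env("LLM_PROVIDER=openai\nLLM_MODEL=gpt-4o-mini\n# a comment\n")
```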

Main commands

lutz init [PROJECT_NAME]

Creates a new Lutz project.

lutz init
lutz init my-review

The command creates:

  • articles/
  • prompts/
  • analysis/execution_reports/
  • .env.example
  • .gitignore
  • project README.md
  • local Git repository

lutz load --f FOLDER [--so OS] [--overwrite]

Copies PDFs from a source folder into articles/.

Option Description Default
--f Path to the folder containing PDFs. required
--so Operating system of the source path: linux, windows, or mac. choose your system
--overwrite Overwrite files that already exist in articles/. disabled

Examples:

lutz load --f ~/Downloads/articles --so linux
lutz load --f ~/Desktop/articles --so mac

Windows PowerShell:

lutz load --f "C:\Users\Ana\Downloads\articles" --so windows

lutz vectorize [options]

Processes PDFs from articles/ and creates the local vector database in .lutz/vector_store/.

Option Description Default
--skip-security Skip security checks. Not recommended. disabled
--chunk-size Text chunk size in words. 512
--chunk-overlap Overlap between chunks. 64
--quarantine Process files in articles/_quarantine/. disabled
--section-parse Split each article into labeled sections (abstract, introduction, methodology, results, discussion, conclusion, references…) before chunking. Each chunk is tagged with its section name. Chunks never cross section boundaries. disabled
--layout-parse / --no-layout-parse When --section-parse is active, use layout-parser for visual section detection. Requires pip install "lutz-research[layout]". Falls back to text heuristics if not installed. Has no effect without --section-parse. enabled

Examples:

lutz vectorize
lutz vectorize --chunk-size 256 --chunk-overlap 32

# Section-aware vectorization (text heuristics, no extra deps)
lutz vectorize --section-parse --no-layout-parse

# Section-aware vectorization with visual layout detection
pip install "lutz-research[layout]"
lutz vectorize --section-parse
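The --chunk-size and --chunk-overlap options describe a sliding window over words: each chunk holds up to chunk-size words, and consecutive chunks share chunk-overlap words. A sketch of that windowing with the CLI defaults (illustrative; Lutz's internal chunker may differ in details):

```python
def chunk_words(text: str, chunk_size: int = 512, overlap: int = 64):
    """Split text into word chunks with overlap, mirroring the CLI defaults.

    Each step advances by chunk_size - overlap words, so adjacent chunks
    share `overlap` words of context.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# A 1000-word document yields 3 chunks at the default settings.
chunks = chunk_words(" ".join(str(i) for i in range(1000)))
```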

Installing the layout detection backend

Visual layout detection uses layout-parser with a Detectron2 model trained on PubLayNet. Model weights (~250 MB) are downloaded on first use.

# Install optional deps
pip install "lutz-research[layout]"

# System dependency (required by pdf2image)
# Debian/Ubuntu:
apt install poppler-utils
# macOS:
brew install poppler

If layout-parser is not installed, --section-parse falls back to regex-based text heuristics with no extra dependencies.

lutz unvectorize

Deletes the vector database, but does not delete your PDFs.

lutz unvectorize

Use it when you want to rebuild the index from scratch.

lutz analysis --p PROMPT [options]

Analyzes vectorized articles using a Markdown prompt. Two modes are available.

RAG mode (default)

Embeds the prompt, retrieves the most relevant chunks from the full corpus, and makes one model call. Useful for general synthesis and semantic search.

Per-article mode (--per-article)

Makes a separate model call for each article in the vector database. Useful for systematic screening where you need an inclusion or exclusion decision per article.

Option Description Default
--p Path to the .md prompt. required
--top-k Chunks to retrieve in RAG mode. Use '*' for all. 10
--per-article Analyze each article in a separate model call. disabled
--workers Parallel model calls in --per-article mode. 1
--max-chunks-per-article Chunk limit per article in --per-article mode. no limit
--filter-sections Comma-separated list of sections to include (e.g. abstract,methodology,results). Only chunks with a matching section label are retrieved. Requires articles vectorized with --section-parse. Use lutz vector-store --sections to check what is available. no filter
--output-name Base output filename. generated automatically

Examples:

# Default RAG mode
lutz analysis --p prompts/systematic_review.md

# RAG retrieving more chunks
lutz analysis --p prompts/methodology_analysis.md --top-k 20

# RAG using all chunks in the corpus
lutz analysis --p prompts/systematic_review.md --top-k '*'

# Sequential per-article screening
lutz analysis --p prompts/screening.md --per-article

# Per-article screening with 4 parallel calls
lutz analysis --p prompts/screening.md --per-article --workers 4

# Per-article screening with a 10-chunk context limit per article
lutz analysis --p prompts/screening.md --per-article --workers 4 --max-chunks-per-article 10

# Analyze only methodology and results sections (RAG mode)
lutz analysis --p prompts/methodology_analysis.md \
  --filter-sections methodology,results

# Screen articles using only the abstract (per-article, parallel)
lutz analysis --p prompts/screening.md --per-article --workers 4 \
  --filter-sections abstract

# Custom output name
lutz analysis --p prompts/systematic_review.md --output-name my-analysis-v1

Section filter (--filter-sections)

When articles have been vectorized with --section-parse, each chunk carries a section label (abstract, introduction, background, methodology, results, discussion, conclusion, references, acknowledgements, appendix). The --filter-sections flag restricts the analysis to only those sections, reducing context size and focusing the model's attention.

  • In RAG mode the similarity search is run only over the specified sections, then ranked by relevance as usual.
  • In per-article mode each article receives only the chunks from the specified sections. Articles with no chunks in those sections show chunks_used: 0 in the report.
  • Articles vectorized without --section-parse have no section label and are excluded when the filter is active.
  • Run lutz vector-store --sections first to confirm which sections are present in the store.
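Conceptually the filter is a simple set-membership test over each chunk's section label. A sketch, assuming chunk records are dicts with an optional "section" key (the record shape is an assumption for illustration):

```python
def filter_by_section(chunks, filter_sections: str):
    """Keep only chunks whose section label matches a --filter-sections value.

    Chunks vectorized without --section-parse carry no label and are
    dropped when the filter is active.
    """
    wanted = {s.strip().lower() for s in filter_sections.split(",")}
    return [c for c in chunks if c.get("section", "").lower() in wanted]

kept = filter_by_section(
    [{"section": "abstract", "text": "..."},
     {"section": "results", "text": "..."},
     {"text": "vectorized without --section-parse"}],
    "abstract,results",
)
```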

Performance in --per-article mode

With many articles, --per-article can take a long time because each article requires a model call. Use --workers to parallelize:

Articles                   --workers 1   --workers 4   --workers 8
52 articles at ~50s each   ~43 min       ~11 min       ~6 min
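These estimates are simple batch arithmetic: calls run in parallel groups of --workers, so wall-clock time is roughly the number of batches times the per-call latency. A sketch:

```python
import math

def wall_clock_minutes(articles: int, secs_per_call: float, workers: int) -> float:
    """Estimate --per-article wall time: calls run in batches of `workers`."""
    return math.ceil(articles / workers) * secs_per_call / 60

w1 = wall_clock_minutes(52, 50, 1)  # ~43 min
w4 = wall_clock_minutes(52, 50, 4)  # ~11 min
w8 = wall_clock_minutes(52, 50, 8)  # ~6 min
```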

The practical limit depends on the provider. Remote APIs such as OpenRouter have rate limits; self-hosted models may bottleneck on CPU, GPU, memory, or request queues. Tune --workers according to your service capacity.

Use --max-chunks-per-article to reduce context size per call, which lowers latency and cost. Chunks are sent in document order.

Context size note: --chunk-size in lutz vectorize is measured in words, not model tokens. A 512-word chunk is roughly 680 tokens. With 23 chunks per article, a typical article can produce around 15,000 to 16,000 input tokens. Check that your configured model supports the required context window.
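The word-to-token arithmetic can be checked with a back-of-the-envelope function (the ~1.33 tokens-per-word factor is a common English-text rule of thumb, not an exact tokenizer count):

```python
def estimated_input_tokens(chunks: int, words_per_chunk: int = 512,
                           tokens_per_word: float = 1.33) -> int:
    """Rough input-token estimate for `chunks` chunks of `words_per_chunk` words."""
    return round(chunks * words_per_chunk * tokens_per_word)

estimated_input_tokens(1)   # a 512-word chunk: roughly 680 tokens
estimated_input_tokens(23)  # a 23-chunk article: roughly 15,000-16,000 tokens
```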

lutz citations --analysis FILE [options]

Extracts structured citations from a report generated by lutz analysis --per-article.

Option Description Default
--analysis Path to the per-article analysis JSON. required
--workers Parallel model calls. 1
--only-relevant Include only relevant articles in the report. disabled
--output-name Base output filename. generated automatically

Internal flow:

  1. Reads the JSON produced by lutz analysis --per-article.
  2. Classifies each article as relevant, not relevant, or unknown using the analysis text, without an LLM call.
  3. For each relevant article, retrieves original chunks from the vector database and asks the LLM to extract the 3 to 5 passages that best justify the classification.
  4. Saves a JSON report in analysis/execution_reports/.
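Step 2, the LLM-free triage, amounts to scanning the analysis text for decision phrases. A plausible sketch (the phrase patterns are illustrative; Lutz's actual classification rules are internal and may differ):

```python
import re

def classify_relevance(analysis_text: str) -> str:
    """Rule-based triage of an analysis blurb into relevant / not relevant / unknown."""
    t = analysis_text.lower()
    # Check negative phrasing first so "not relevant" is not read as "relevant".
    if re.search(r"\bnot relevant\b|\bexcluded?\b", t):
        return "not relevant"
    if re.search(r"\brelevant\b|\bincluded?\b", t):
        return "relevant"
    return "unknown"
```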

The output filename follows <analysis_name>_citations_<timestamp>.json.

# Basic extraction
lutz citations --analysis analysis/execution_reports/screening_20260501.json

# Parallel calls and relevant articles only
lutz citations --analysis analysis/execution_reports/screening_20260501.json \
  --workers 4 --only-relevant

# Custom output name
lutz citations --analysis analysis/execution_reports/screening_20260501.json \
  --output-name review_citations_v1

Prerequisite: the input report must have been generated with lutz analysis --per-article. The vector database must be available at .lutz/vector_store/ because citations are extracted from original article chunks.

lutz vector-store [--summarize] [--sections] [--export [FILE]]

Inspects the local vector database.

Option Description
--summarize Display a summary in the terminal.
--sections Show a per-article section breakdown (abstract, introduction, methodology…). Articles vectorized without --section-parse appear under (no section).
--export Export the summary as JSON, with an automatic path in .lutz/.
--export FILE Export to a specific path. Use - to print to stdout.

The options can be combined.

# Display summary
lutz vector-store --summarize

# Check which sections were detected per article
lutz vector-store --sections

# Summary + section breakdown together
lutz vector-store --summarize --sections

# Export JSON with automatic path
lutz vector-store --export

# Export to a specific file
lutz vector-store --export summary.json

# Print JSON to stdout
lutz vector-store --export -

How to write prompts

Prompts are Markdown files inside prompts/. They tell the model what you want to analyze.

A good prompt usually includes:

# Analysis title

## Objective
Explain in a few lines what you want to discover.

## Questions
1. What is the main question?
2. What information should be extracted from the articles?
3. Which inclusion or exclusion criteria should be considered?

## Response format
Ask for a table, a list, or sections with clear headings.

## Research topic
Describe the topic or research question.

lutz init creates ready-to-edit prompt templates:

File Suggested use
prompts/systematic_review.md Systematic review with evidence table.
prompts/methodology_analysis.md Comparison of research methods.
prompts/evidence_quality.md Quality and bias assessment.
prompts/thematic_synthesis.md Thematic synthesis across articles.

Before running lutz analysis, open the chosen prompt and replace example fields with your research question.


Where results are stored

After lutz analysis, results are stored in:

analysis/execution_reports/

The generated file is a .json. It includes:

  • prompt used in the analysis;
  • execution date and duration;
  • analysis mode, such as rag or per_article;
  • embedding model and language model used;
  • token counts;
  • covered articles;
  • model response.

Example filename:

systematic_review_20260501_153000.json
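The filename packs the prompt name and a run timestamp, which makes reports easy to sort and parse. A sketch of recovering those parts (the pattern is inferred from the example filename above and may not cover every name Lutz generates):

```python
import re

def parse_report_name(filename: str):
    """Split '<prompt>_<YYYYMMDD>_<HHMMSS>.json' into its parts, or None."""
    m = re.fullmatch(r"(?P<prompt>.+)_(?P<date>\d{8})_(?P<time>\d{6})\.json", filename)
    return m.groupdict() if m else None

info = parse_report_name("systematic_review_20260501_153000.json")
```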

Security model

Before vectorizing, Lutz can check PDFs to reduce common risks in malicious or unsuitable files.

Check What it looks for
Structural analysis Embedded JavaScript, automatic actions, and XFA forms.
Prompt injection Phrases that try to override model instructions.
Academic structure Basic signs of academic articles, such as abstract, methodology, and references.
Corpus anomaly When there are 5 or more documents, identifies possible statistical outliers.
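As an intuition for the prompt-injection check, a naive version scans extracted text for instruction-override phrases (the phrase list below is illustrative; Lutz's real check is more thorough):

```python
INJECTION_PHRASES = (
    "ignore previous instructions",
    "ignore all prior instructions",
    "disregard the above",
    "you are now",
)

def flags_prompt_injection(text: str) -> bool:
    """Naive case-insensitive substring scan for instruction-override phrases."""
    t = text.lower()
    return any(phrase in t for phrase in INJECTION_PHRASES)
```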

Suspicious files can be moved to:

articles/_quarantine/

To process quarantined files after manual review:

lutz vectorize --quarantine

To skip security checks:

lutz vectorize --skip-security

Use --skip-security only if you trust the PDF source.


Architecture

lutz/
├── cli.py                    # main Click CLI entry point
├── commands/
│   ├── init.py               # lutz init
│   ├── load.py               # lutz load
│   ├── vectorize.py          # lutz vectorize / lutz unvectorize
│   ├── analysis.py           # lutz analysis
│   ├── citations.py          # lutz citations
│   └── vector_store.py       # lutz vector-store
├── core/
│   ├── security_checker.py   # PDF security checks
│   ├── pdf_processor.py      # text extraction and chunking
│   ├── section_parser.py     # section detection (layout-parser or text heuristics)
│   ├── vector_store.py       # LanceDB wrapper
│   ├── embedding_client.py   # embedding providers
│   └── llm_client.py         # LLM providers
└── utils/
    ├── pdf.py                # basic PDF validation
    ├── project.py            # project detection and .env loading
    └── templates.py          # files created by lutz init

The vector database uses LanceDB and is stored in .lutz/vector_store/ inside the project. This directory should not be committed to Git.


Complete systematic review workflow

# 1. Create project
lutz init my-review && cd my-review

# 2. Add PDFs
lutz load --f ~/Downloads/articles --so linux

# 3. Vectorize with section-aware parsing (optional but recommended)
lutz vectorize --section-parse

# 4. Inspect the section breakdown to confirm detection worked
lutz vector-store --sections

# 5. Per-article screening (abstract only — faster and cheaper)
lutz analysis --p prompts/screening.md --per-article --workers 4 \
  --filter-sections abstract

# 6. Deep analysis on methodology and results sections
lutz analysis --p prompts/methodology_analysis.md \
  --filter-sections methodology,results

# 7. Extract citations from relevant articles
lutz citations --analysis analysis/execution_reports/screening_<timestamp>.json \
  --workers 4 --only-relevant

# 8. Inspect the vector database
lutz vector-store --summarize
lutz vector-store --export

Contributing

Contributions are welcome. To prepare a development environment:

git clone https://github.com/jooguilhermesc/lutz.git
cd lutz
pip install -e ".[dev]"
pytest

Before proposing large changes, open an issue to discuss the idea.


How to cite

If you use Lutz in your research, please cite it using the information below or refer to the CITATION.cff file.

APA

Cabral, J. G. S., & Azevedo Farias, A. K. (2026). Lutz: AI-powered academic article screening and analysis tool (Version 0.1.2) [Software]. Zenodo. https://doi.org/10.5281/zenodo.19982571

BibTeX

@software{cabral2026lutz,
  author  = {Cabral, João Guilherme Silva and Azevedo Farias, Anna Karoline},
  title   = {{Lutz: AI-powered academic article screening and analysis tool}},
  year    = {2026},
  version = {0.1.2},
  doi     = {10.5281/zenodo.19982571},
  url     = {https://github.com/jooguilhermesc/lutz},
  license = {MIT}
}

License

MIT
