
PhD Deep Read Workflow

License: MIT · PyPI · Python 3.9+ · Claude Code Skill

Transform academic PDFs into structured literature notes and critical-thinking canvases for Obsidian using AI-assisted analysis.


What you get

When you run this workflow on a PDF, you get three output files:

| Output | What it is |
| --- | --- |
| `paper.md` | The full text of your PDF, converted to Markdown |
| `paper_literature_note.md` | A structured academic note with summary, critique, wikilinks, and Obsidian frontmatter, written by Claude |
| `paper.canvas` | A 9-node critical-thinking canvas for deep analysis, ready to open in Obsidian |
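The `.canvas` file uses the open JSON Canvas format that Obsidian reads. As a rough sketch of what such a file looks like on disk (two nodes instead of nine, with illustrative IDs and text, not the tool's actual template), one could build one like this:

```python
import json

# Minimal JSON Canvas document (the open format behind Obsidian .canvas
# files). Node IDs, positions, and text below are illustrative only.
canvas = {
    "nodes": [
        {"id": "claim", "type": "text", "text": "Core claim of the paper",
         "x": 0, "y": 0, "width": 400, "height": 200},
        {"id": "evidence", "type": "text", "text": "Key supporting evidence",
         "x": 500, "y": 0, "width": 400, "height": 200},
    ],
    "edges": [
        {"id": "e1", "fromNode": "claim", "toNode": "evidence",
         "fromSide": "right", "toSide": "left"},
    ],
}

with open("sketch.canvas", "w", encoding="utf-8") as f:
    json.dump(canvas, f, indent=2)
```

Opening such a file in Obsidian renders each `text` node as a card and each edge as an arrow between them.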

Installation

Step 1 — Install Python (if you don't have it)

pip is Python's built-in package installer and ships with Python automatically, so installing Python is all you need to get pip.

Check if you already have Python:

python3 --version

If you see Python 3.9.x or higher, skip to Step 2. If not, download it from python.org.


Step 2 — Install the workflow

This one command installs the phd-deepread CLI and all the PDF/OCR libraries it needs:

pip install phd-deepread-workflow

What this installs:

  • The phd-deepread command you'll type in the terminal
  • PyMuPDF — fast PDF text extraction
  • pytesseract + Pillow — OCR for scanned PDFs
  • The built-in templates for notes and canvases

What this does NOT install (because they are not Python packages):

  • An AI provider — needed for note generation; set up in Step 3
  • Tesseract OCR engine — needed only for scanned PDFs; see Step 4

Step 3 — Set up an AI provider (required for note generation)

The workflow uses AI to read your paper and write the structured literature note. Set up an API key for your chosen provider and export it as an environment variable before running:

export OPENAI_API_KEY=sk-...        # OpenAI — use with --openai flag
# or
export ANTHROPIC_API_KEY=sk-...     # Anthropic / Claude Code

To make it permanent, add the line to your ~/.zshrc or ~/.bashrc.

For users in China: OpenAI and Anthropic may not be directly accessible from mainland China. Consider using a domestic provider such as DeepSeek — it is OpenAI-compatible. Set OPENAI_API_KEY to your DeepSeek key and pass --model deepseek-chat --base-url https://api.deepseek.com when running.
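Putting those pieces together, a full DeepSeek invocation would look something like this (a sketch assembled from the flags above; verify the model name and endpoint against DeepSeek's documentation):

```
export OPENAI_API_KEY=sk-...   # your DeepSeek key
phd-deepread run paper.pdf --openai --model deepseek-chat --base-url https://api.deepseek.com
```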


Step 4 — Install Tesseract (optional — only for scanned PDFs)

Tesseract reads text from image-based PDFs (e.g. old scanned papers). Skip this if your PDFs come from a journal website — they almost always have selectable text.

Not sure if you need it? Open your PDF and try to highlight text. If you can highlight it, you don't need Tesseract.

brew install tesseract          # macOS
sudo apt install tesseract-ocr  # Ubuntu/Debian

Verify everything is ready

phd-deepread setup

Running the workflow

The simplest way — one command

# With OpenAI (fully automatic — no copy-pasting)
phd-deepread run paper.pdf --openai

# Without an API key (prints a prompt to paste into Claude Code manually)
phd-deepread run paper.pdf

Tip for beginners: Not sure how to type a file path? Drag and drop the PDF from Finder/Explorer directly into the terminal window — it fills in the path for you.


Step by step (if you prefer more control)

Step 1 — Extract the PDF

phd-deepread extract paper.pdf

Creates markdown_output/paper/ with the extracted text and images.

Step 2 — Generate the literature note

# With OpenAI — writes the note directly to a file
phd-deepread generate markdown_output/paper/ --openai -o notes/paper.md

# Without OpenAI — prints a prompt to paste into Claude Code
phd-deepread generate markdown_output/paper/

Step 3 — Create the canvas from the note

# With OpenAI — fills all 9 canvas nodes from the note automatically
phd-deepread canvas -o notes/paper.canvas --from-note notes/paper.md --openai

# Without OpenAI — creates a canvas with blank node templates to fill in yourself
phd-deepread canvas --title "Paper Title" --authors "Smith, J." --year "2024" \
  -o notes/paper.canvas

Batch process a whole folder of PDFs

phd-deepread batch papers/ --output literature-notes/

Tip: You can drag and drop folders too: type phd-deepread batch, drag your PDF folder into the terminal, type --output, drag your output folder, then press Enter.


What to do with the output

After running the workflow, open your output folder. You will find:

markdown_output/paper/
├── paper.md                  ← Full extracted text (Markdown)
├── paper_literature_note.md  ← Structured note written by Claude
├── paper.canvas              ← Critical-thinking canvas for Obsidian
├── paper_meta.json           ← Extraction metadata (for reference)
└── _page_*_*.png             ← Any images extracted from the PDF

  • Open .md files in Obsidian, Typora, or any Markdown editor
  • Open .canvas files in Obsidian with the Canvas plugin enabled
  • Copy notes into your Obsidian vault — they are already formatted with YAML frontmatter and Dataview callouts
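For reference, the top of a generated note might look something like this. This is a plausible sketch only, since the exact field names come from the built-in templates; the citekey field is the one the Zotero integration relies on:

```yaml
---
title: "An Illustrative Paper Title"
authors: ["Smith, J."]
year: 2024
citekey: smith2024illustrative
tags: [literature-note]
---
```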

Troubleshooting

"command not found: phd-deepread" The package installed but your terminal can't find it. Try:

python3 -m pip install phd-deepread-workflow
python3 -m phd_deepread_workflow

Or open a new terminal window after installing.

"command not found: claude" Claude Code isn't installed or not in your PATH. Install it:

npm install -g @anthropic-ai/claude-code

Then open a new terminal window and try again. Alternatively, use --openai with an OpenAI API key instead.

The literature note was not generated

  • With --openai: check that OPENAI_API_KEY is set (echo $OPENAI_API_KEY). If empty, run export OPENAI_API_KEY=sk-...
  • Without --openai: the command prints a prompt — copy it and paste it into a Claude Code session manually.

"Tesseract not found"

brew install tesseract          # macOS
sudo apt install tesseract-ocr  # Ubuntu/Debian

"PyMuPDF missing"

pip install PyMuPDF

"Template not found" after installing

pip install --upgrade phd-deepread-workflow

Using a virtual environment (recommended for clean installs)

python3 -m venv venv
source venv/bin/activate      # macOS/Linux
# or: venv\Scripts\activate   # Windows
pip install phd-deepread-workflow

All commands

| Command | What it does |
| --- | --- |
| `setup` | Check that all dependencies are installed |
| `extract <pdf>` | Extract text and images from a PDF |
| `generate <dir> [--openai]` | Generate the literature note; calls OpenAI directly with `--openai`, otherwise prints a prompt |
| `canvas -o <file> [--from-note <md> --openai]` | Create a 9-node canvas; populate it from a note automatically with `--openai` |
| `run <pdf> [--openai]` | Full pipeline: extract → generate → canvas |
| `batch <dir>` | Process all PDFs in a folder |
| `verify <dir>` | Quality-check output files |
| `guide` | Show the workflow guide |

Integration with Obsidian and Zotero

Obsidian: Notes use YAML frontmatter and Dataview-compatible callouts out of the box. Canvas files open with the Obsidian Canvas plugin. Wikilinks connect to your existing notes.

Zotero: Use your Zotero citation key as the citekey field in the generated note. Export PDFs from Zotero into your processing folder before running the workflow.


Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/my-feature
  3. Commit and push, then open a Pull Request

See CONTRIBUTING.md for details.

License

MIT — see LICENSE.

Support


Made with love for the academic community

If this workflow helps your research, consider giving it a star on GitHub!


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

phd_deepread_workflow-0.2.0.tar.gz (54.1 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

phd_deepread_workflow-0.2.0-py3-none-any.whl (32.3 kB)

Uploaded Python 3

File details

Details for the file phd_deepread_workflow-0.2.0.tar.gz.

File metadata

  • Download URL: phd_deepread_workflow-0.2.0.tar.gz
  • Upload date:
  • Size: 54.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for phd_deepread_workflow-0.2.0.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `1cea814b74bd52c6e4421c649c284da523b3dfcc3769986f7c1880f2fbab0012` |
| MD5 | `7f345b11af9f05a4ab28816e1bd27dce` |
| BLAKE2b-256 | `91fc8facec9ff6176e9e091b30026562815cad597a41ac5ceb776b38bb43a393` |

See more details on using hashes here.
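If you want to check a downloaded archive against the SHA256 digest above, a small stdlib-only helper is enough. The expected value here is the tar.gz digest from the table, and the example filename assumes the archive sits in the current directory:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

EXPECTED = "1cea814b74bd52c6e4421c649c284da523b3dfcc3769986f7c1880f2fbab0012"
# Example: sha256_of("phd_deepread_workflow-0.2.0.tar.gz") == EXPECTED
```

Streaming in chunks keeps memory flat even for large files, which is why the helper reads the file incrementally instead of loading it whole.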

Provenance

The following attestation bundles were made for phd_deepread_workflow-0.2.0.tar.gz:

Publisher: publish.yml on heleninsights-dot/phd-deepread-workflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file phd_deepread_workflow-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for phd_deepread_workflow-0.2.0-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `bb6dbbde889469000418751c1e8096b30a5cff20c275741aa9316228ef36ca4a` |
| MD5 | `a46281ddbbd1f97920825f130da72804` |
| BLAKE2b-256 | `18b81abcc3d3d37f41e6cb174584629bad019b8d8df0eab7cddd174b28e3fa20` |

See more details on using hashes here.

Provenance

The following attestation bundles were made for phd_deepread_workflow-0.2.0-py3-none-any.whl:

Publisher: publish.yml on heleninsights-dot/phd-deepread-workflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
