Download and organise PDFs from Physics & Maths Tutor pages

Maintainers

Chromed

Project description

pmt-scraper

Download and organise PDFs from Physics & Maths Tutor pages.

Point it at any PMT page that lists PDF links and it scrapes every PDF, sorts them into folders, and downloads them politely (rate-limited, resumable, skips existing files).

Install

pip install requests beautifulsoup4

Usage

python pmt_scrape.py <url> [options]

Options

Output

| Flag | Default | Description |
| --- | --- | --- |
| `--out <dir>` | `downloads` | Root output folder |
| `--organise heading` | ✓ | Group by section heading on the page |
| `--organise path` | | Mirror PMT's own folder structure |
| `--organise flat` | | All files in one folder |
| `--delay <secs>` | `1.0` | Pause between downloads (be polite) |
| `--dry-run` | | Print what would be saved, download nothing |
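
The three `--organise` modes can be read as three ways of mapping a PDF link to a destination path. The sketch below is only an illustration of that idea, not the code in `downloader.py`: the function name `output_path`, its parameters, the `misc` fallback folder, and the example URL are all assumptions, and it skips the filename sanitisation the real tool performs.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def output_path(root: str, mode: str, heading: str, pdf_url: str) -> Path:
    """Map one PDF URL to a destination path (hypothetical helper)."""
    url_path = PurePosixPath(unquote(urlparse(pdf_url).path))
    filename = url_path.name
    if mode == "flat":
        # Everything lands directly in the root folder.
        return Path(root) / filename
    if mode == "heading":
        # One sub-folder per section heading; "misc" is an assumed fallback.
        return Path(root) / (heading.strip() or "misc") / filename
    if mode == "path":
        # Keep the URL's directory components, minus the leading "/".
        return Path(root).joinpath(*url_path.parent.parts[1:], filename)
    raise ValueError(f"unknown organise mode: {mode!r}")


if __name__ == "__main__":
    url = "https://example.com/download/Maths/2019/Paper-1.pdf"
    for mode in ("heading", "path", "flat"):
        print(mode, "->", output_path("downloads", mode, "Paper 1", url))
```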

Filtering

| Flag | Description |
| --- | --- |
| `--keywords k1 k2 …` | Filter by keywords (see syntax below) |
| `--years y1 y2 …` | Keep only PDFs mentioning any of these years |
| `--year-range FROM TO` | Keep only PDFs whose year falls within FROM–TO (inclusive) |

`--years` and `--year-range` can be used together; a PDF must then satisfy both constraints (AND).
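
As a rough illustration of how the two year options might combine, the sketch below extracts four-digit years from a string (for example a heading, link text, or URL joined together) and applies both constraints. The names `YEAR_RE` and `passes_year_filters` are hypothetical, the 1900–2099 pattern is an assumption, and when a string contains several years this version lets each constraint be satisfied by any of them, which may differ from what `filters.py` does.

```python
import re
from typing import Optional, Sequence, Tuple

# Assumed pattern: a standalone four-digit year between 1900 and 2099.
YEAR_RE = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def passes_year_filters(
    text: str,
    years: Optional[Sequence[int]] = None,
    year_range: Optional[Tuple[int, int]] = None,
) -> bool:
    """Return True if `text` survives both year constraints (hypothetical)."""
    found = {int(y) for y in YEAR_RE.findall(text)}
    if not found:
        return True  # undated files are always kept
    if years is not None and not found & set(years):
        return False
    if year_range is not None:
        low, high = year_range
        if not any(low <= y <= high for y in found):
            return False
    return True


if __name__ == "__main__":
    print(passes_year_filters("june-2019-ms.pdf", years=[2019, 2021]))
    print(passes_year_filters("june-2023-ms.pdf", years=[2023], year_range=(2018, 2022)))
```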

Keyword syntax — prefix each token to control how it matches:

| Prefix | Meaning |
| --- | --- |
| `word` or `+word` | Must be present (positive) |
| `-word` | Must be absent (negative) |

Matching is case-insensitive and searches the section heading, link text, and filename. Years embedded in PMT's URL paths (e.g. .../2019/...) are detected automatically. Undated files are always kept.
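
The prefix rules can be sketched as a small predicate. This is a minimal illustration under assumptions, not the logic in `filters.py`: the function name `matches_keywords` is hypothetical, and it uses a plain case-insensitive substring test over the three fields, so whether the real tool also normalises spaces or punctuation (for instance treating `markscheme` and `mark scheme` alike) is not something this sketch settles.

```python
from typing import Iterable


def matches_keywords(tokens: Iterable[str], *fields: str) -> bool:
    """Apply +/- keyword tokens to heading, link text and filename (hypothetical)."""
    haystack = " ".join(fields).lower()
    for token in tokens:
        negative = token.startswith("-")
        word = token.lstrip("+-").lower()
        if not word:
            continue  # ignore a bare "+" or "-"
        present = word in haystack
        # Positive tokens must be present; negative tokens must be absent.
        if present == negative:
            return False
    return True


if __name__ == "__main__":
    fields = ("Paper 1", "June 2019 MS", "june-2019-markscheme.pdf")
    print(matches_keywords(["+markscheme", "-questions"], *fields))
```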

Examples

# All papers, grouped by heading
python pmt_scrape.py https://www.physicsandmathstutor.com/maths-revision/a-level-papers/

# Mark schemes only
python pmt_scrape.py <url> --keywords "mark scheme"

# Mark schemes only (positive keyword)
python pmt_scrape.py <url> --keywords +markscheme

# Mark schemes, excluding question papers
python pmt_scrape.py <url> --keywords +markscheme -questions

# Papers from 2018 to 2022
python pmt_scrape.py <url> --year-range 2018 2022

# Mark schemes for specific years (combine --years and --year-range)
python pmt_scrape.py <url> --keywords +markscheme --years 2019 2021 2023 --year-range 2019 2023

# Paper 1 only, no mark schemes, preview before downloading
python pmt_scrape.py <url> --keywords +paper1 -markscheme --dry-run

# Mirror PMT's folder structure
python pmt_scrape.py <url> --organise path

Project structure

pmt-scraper/
├── pmt_scrape.py          # entry point
├── pmt_scraper/
│   ├── __init__.py
│   ├── cli.py             # argument parsing and main loop
│   ├── scraper.py         # page fetching and PDF link extraction
│   ├── downloader.py      # file download and output path logic
│   ├── filters.py         # keyword and year filtering
│   └── utils.py           # filename sanitisation, URL helpers
└── downloads/             # default output folder

Notes

- Downloads use a `.part` suffix until complete, so interrupted runs are safe to resume.
- Files already present (non-zero size) are skipped automatically.
- Pages that load links via JavaScript will not work; PMT's static pages are fine.
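
The `.part` and skip-existing behaviour described in these notes follows a common pattern: write to a temporary name, then rename once the file is complete. The sketch below shows that pattern in isolation and is not taken from `downloader.py`. The function name `save_via_part` is hypothetical, and it takes an iterable of byte chunks (for example from a streamed HTTP response) so that it stays independent of any HTTP library; a leftover `.part` file from an interrupted run is simply overwritten.

```python
import os
from pathlib import Path
from typing import Iterable


def save_via_part(dest: Path, chunks: Iterable[bytes]) -> bool:
    """Write chunks to `dest` via a .part file; return False if skipped."""
    if dest.exists() and dest.stat().st_size > 0:
        return False  # already downloaded, nothing to do
    dest.parent.mkdir(parents=True, exist_ok=True)
    part = dest.with_name(dest.name + ".part")
    with open(part, "wb") as fh:
        for chunk in chunks:
            fh.write(chunk)
    # Rename only after every chunk is written, so `dest` is never partial.
    os.replace(part, dest)
    return True


if __name__ == "__main__":
    import tempfile

    target = Path(tempfile.mkdtemp()) / "Paper 1" / "june-2019.pdf"
    print(save_via_part(target, [b"%PDF-", b"example"]))  # first run writes
    print(save_via_part(target, [b"%PDF-", b"example"]))  # second run skips
```

If a run is interrupted mid-write, only the `.part` file is left behind, so the next run does not mistake it for a finished download.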


Release history

0.1.0 (this version), Jun 4, 2026

Download files

Source Distribution

pmt_scraper-0.1.0.tar.gz (7.4 kB)

Uploaded Jun 4, 2026 Source

Built Distribution

pmt_scraper-0.1.0-py3-none-any.whl (8.9 kB)

Uploaded Jun 4, 2026 Python 3

File details

Details for the file pmt_scraper-0.1.0.tar.gz.

File metadata

Download URL: pmt_scraper-0.1.0.tar.gz
Upload date: Jun 4, 2026
Size: 7.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pmt_scraper-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`f6ac24c67813c160eee3126f0bff7958c89ae65ffc7365ab4d7505ebeead6196`
MD5	`e1b777969be815ec3ae29a7f86300e40`
BLAKE2b-256	`f2fc7aac9c5c8847283abcfdd46cfdba44c9a0f7e2fd2b5261609fd069bc6d9b`
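
One way to use these digests is to hash the downloaded archive locally and compare the result with the SHA256 value listed above. The sketch below uses only the standard library; the helper name `sha256_of` and the local file path in the usage note are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `sha256_of(Path("pmt_scraper-0.1.0.tar.gz"))` should equal the SHA256 digest in the table if the file in the current directory is intact.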

Provenance

The following attestation bundles were made for pmt_scraper-0.1.0.tar.gz:

Publisher: publish.yml on yvanlok/pmt-scraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pmt_scraper-0.1.0.tar.gz
- Subject digest: f6ac24c67813c160eee3126f0bff7958c89ae65ffc7365ab4d7505ebeead6196
- Sigstore transparency entry: 1722727558
- Sigstore integration time: Jun 4, 2026
Source repository:
- Permalink: yvanlok/pmt-scraper@4ae9894f474cc947428f24d69915372aa3049b33
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/yvanlok
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4ae9894f474cc947428f24d69915372aa3049b33
- Trigger Event: push

File details

Details for the file pmt_scraper-0.1.0-py3-none-any.whl.

File metadata

Download URL: pmt_scraper-0.1.0-py3-none-any.whl
Upload date: Jun 4, 2026
Size: 8.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pmt_scraper-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`457133ced104c539a857d3756b041dc4fbc7a945930155b1da5c030b9e451fb3`
MD5	`9f9c9a6fea75048b66260b64bc44c177`
BLAKE2b-256	`eee12c0fba6c37eabd1d529c718cf3a1b2ebe9de9cc4317d958915ec07f5d3e7`

Provenance

The following attestation bundles were made for pmt_scraper-0.1.0-py3-none-any.whl:

Publisher: publish.yml on yvanlok/pmt-scraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pmt_scraper-0.1.0-py3-none-any.whl
- Subject digest: 457133ced104c539a857d3756b041dc4fbc7a945930155b1da5c030b9e451fb3
- Sigstore transparency entry: 1722727655
- Sigstore integration time: Jun 4, 2026
Source repository:
- Permalink: yvanlok/pmt-scraper@4ae9894f474cc947428f24d69915372aa3049b33
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/yvanlok
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4ae9894f474cc947428f24d69915372aa3049b33
- Trigger Event: push
