Capcat — A command-line tool designed to solve content preservation challenges with Ethical Scraping.

Captures articles from 17 sources as clean Markdown files, with optional self-contained HTML output. Supports an interactive TUI and batch automation.

Installation

pipx install capcat

Requires Python 3.8+.

Quick Start

# Interactive TUI
capcat catch

# Fetch a bundle
capcat bundle tech --count 10

# Fetch specific sources
capcat fetch hn,bbc --count 15

# Archive a single article
capcat single https://example.com/article

# List available sources
capcat list sources

# Show version
capcat --version

Capcat initializes the vault automatically on first run.

Commands

| Command | Description |
| --- | --- |
| `catch` | Launch the interactive TUI |
| `single <url>` | Archive a single article |
| `fetch <sources>` | Batch fetch from sources (comma-separated) |
| `bundle <name>` | Fetch a pre-configured bundle |
| `list sources` | List all available sources |
| `list bundles` | List all available bundles |
| `add-source --url <url>` | Add a custom RSS/news source |
| `remove-source` | Remove a source |
| `generate-config` | Generate a YAML config |
| `init` | Explicitly scaffold the vault (runs automatically on first use) |

Options

| Flag | Description |
| --- | --- |
| `--count N` | Number of articles to fetch (default: 30) |
| `--output DIR` | Output directory (default: current dir) |
| `--media` | Download images, video, audio, and PDF files |
| `--pdfs` | Download PDF files only (independent of `--media`) |
| `--html` | Generate self-contained HTML output |
| `--update` | Re-fetch and update existing articles |
| `-V, --verbose` | Verbose output |
| `-q, --quiet` | Quiet output |
| `-L <file>` | Log output to file |
| `--version` | Show version and exit |
| `--help` | Show help and exit |
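
For readers who script around the CLI, the flag table can be mirrored in a small `argparse` parser. This is only a sketch of how such flags are typically declared, not Capcat's own parser: the program name `capcat-sketch`, the function name, the `log_file` destination, and the version string are assumptions made for illustration.

```python
import argparse


def build_option_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the flags listed in the table."""
    parser = argparse.ArgumentParser(prog="capcat-sketch")
    parser.add_argument("--count", type=int, default=30, metavar="N",
                        help="number of articles to fetch")
    parser.add_argument("--output", default=".", metavar="DIR",
                        help="output directory (current dir by default)")
    parser.add_argument("--media", action="store_true",
                        help="download images, video, audio, and PDF files")
    parser.add_argument("--pdfs", action="store_true",
                        help="download PDF files only")
    parser.add_argument("--html", action="store_true",
                        help="generate self-contained HTML output")
    parser.add_argument("--update", action="store_true",
                        help="re-fetch and update existing articles")
    parser.add_argument("-V", "--verbose", action="store_true")
    parser.add_argument("-q", "--quiet", action="store_true")
    # The destination name log_file is an assumption of this sketch.
    parser.add_argument("-L", dest="log_file", metavar="FILE",
                        help="log output to file")
    parser.add_argument("--version", action="version",
                        version="capcat-sketch 0.0")
    return parser


if __name__ == "__main__":
    print(build_option_parser().parse_args(["--count", "10", "--html"]))
```

Boolean flags default to off and `--count` falls back to 30, matching the defaults stated in the table.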

Bundles

Pre-configured topic collections:

| Bundle | Sources | Description |
| --- | --- | --- |
| `tech` | IEEE, Mashable | Consumer technology news |
| `techpro` | HN, Lobsters, InfoQ | Professional developer news |
| `ai` | MIT News, Google Research | AI research and developments |
| `science` | Nature, Scientific American | Scientific publications |
| `news` | BBC, Guardian | General news |
| `sports` | BBC Sport | Sports coverage |

Available Sources

Tech Pro: Hacker News (hn), Lobsters (lb), InfoQ (iq)

Tech: IEEE Spectrum (ieee), Mashable (mashable)

AI: Google Research (google-research), MIT News (mitnews)

News: BBC (bbc), The Guardian (guardian)

Science: Nature (nature), Scientific American (scientificamerican)

Sports: BBC Sport (bbcsport)

Custom: Medium, Substack (add via capcat add-source)
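
The relationship between bundles and source IDs can be sketched as a lookup table. The mapping below is copied from the bundle table and the source list above; the function names are hypothetical, and custom sources added with `capcat add-source` are not covered, so this is an illustration of the data model rather than Capcat's implementation.

```python
from typing import Dict, List

# Bundle membership, using the short source IDs listed above.
BUNDLES: Dict[str, List[str]] = {
    "tech": ["ieee", "mashable"],
    "techpro": ["hn", "lb", "iq"],
    "ai": ["mitnews", "google-research"],
    "science": ["nature", "scientificamerican"],
    "news": ["bbc", "guardian"],
    "sports": ["bbcsport"],
}

# Every built-in source ID that appears in at least one bundle.
KNOWN_SOURCES = {source for ids in BUNDLES.values() for source in ids}


def parse_sources(argument: str) -> List[str]:
    """Split a comma-separated list such as "hn,bbc" and reject unknown IDs."""
    requested = [part.strip().lower() for part in argument.split(",") if part.strip()]
    unknown = [source for source in requested if source not in KNOWN_SOURCES]
    if unknown:
        raise ValueError("unknown source(s): " + ", ".join(unknown))
    return requested


def expand_bundle(name: str) -> List[str]:
    """Return a copy of the source IDs that belong to a bundle."""
    return list(BUNDLES[name])


if __name__ == "__main__":
    print(parse_sources("hn,bbc"))
    print(expand_bundle("techpro"))
```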

Output Structure

Batch mode (`fetch` / `bundle`)

News/news_DD-MM-YYYY/
├── Hacker-News_DD-MM-YYYY/
│   ├── 01_Article_Title/
│   │   ├── article.md
│   │   ├── comments.md
│   │   ├── html/
│   │   │   ├── article.html
│   │   │   └── comments.html
│   │   └── images/
│   └── 02_Another_Article/
└── BBC_DD-MM-YYYY/

Single article mode

Capcats/cc_DD-MM-YYYY-Title/
├── article.md
├── html/
│   └── article.html
└── images/

HTML output is fully self-contained — embedded CSS, no external dependencies. Open in any browser, share via email, archive permanently.
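
The directory names in both layouts follow a date-stamped pattern, which the sketch below reproduces with `pathlib`. The `DD-MM-YYYY` stamp and the folder prefixes come from the trees above; the title cleanup in `slug` (keep letters and digits, join with underscores) is an assumption, since the exact sanitization rules are not stated here.

```python
import re
from datetime import date
from pathlib import Path


def stamp(day: date) -> str:
    """Format a date as DD-MM-YYYY, as used in the folder names."""
    return day.strftime("%d-%m-%Y")


def slug(title: str) -> str:
    """Assumed cleanup: keep alphanumeric words and join them with underscores."""
    return "_".join(re.findall(r"[A-Za-z0-9]+", title))


def batch_article_dir(source_label: str, index: int, title: str, day: date) -> Path:
    """Path of one article folder in batch mode (fetch / bundle)."""
    return (Path("News") / f"news_{stamp(day)}"
            / f"{source_label}_{stamp(day)}"
            / f"{index:02d}_{slug(title)}")


def single_article_dir(title: str, day: date) -> Path:
    """Path of the article folder in single article mode."""
    return Path("Capcats") / f"cc_{stamp(day)}-{slug(title)}"


if __name__ == "__main__":
    today = date.today()
    print(batch_article_dir("Hacker-News", 1, "Article Title", today))
    print(single_article_dir("Title", today))
```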

Configuration

Optional capcat.yml in your project directory:

output_base_dir: "../MyNews"
max_workers: 8
download_media: false

Config priority: CLI flag, TUI prompt, per-source `Config/sources/active/<source>/config.yaml`, `Config/Global-settings.yaml`.
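
A layered lookup is one common way to implement this kind of priority list. The sketch below assumes the list runs from highest to lowest priority and that an unset value is represented by a missing key or `None`; both are assumptions, and `resolve_setting` is a hypothetical helper, not part of Capcat.

```python
from typing import Any, Mapping, Sequence

_MISSING = object()


def resolve_setting(key: str,
                    layers: Sequence[Mapping[str, Any]],
                    default: Any = None) -> Any:
    """Return the first value found for key, scanning layers in priority order."""
    for layer in layers:
        value = layer.get(key, _MISSING)
        # A missing key or an explicit None means "not set at this level".
        if value is not _MISSING and value is not None:
            return value
    return default


if __name__ == "__main__":
    cli_flags = {"max_workers": None, "download_media": None}
    tui_choices = {}
    per_source = {"max_workers": 4}
    global_settings = {"max_workers": 8, "download_media": False,
                       "output_base_dir": "../MyNews"}
    layers = [cli_flags, tui_choices, per_source, global_settings]
    for name in ("max_workers", "download_media", "output_base_dir"):
        print(name, "=", resolve_setting(name, layers))
```

In the usage example the per-source value for `max_workers` wins over the global one because no CLI flag or TUI choice sets it, while `download_media` falls through to the global `False`.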

Automation

# Daily tech news
0 9 * * * cd ~/news && capcat bundle tech --count 20 --html

# Weekly science digest
0 10 * * 0 cd ~/news && capcat bundle science --count 30 --media
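
Each crontab line above consists of five schedule fields (minute, hour, day of month, month, day of week) followed by the command, so the first entry runs every day at 09:00 and the second on Sundays at 10:00. The helper below is a small illustrative sketch for taking such a line apart; its name and return shape are assumptions.

```python
from typing import Dict, Tuple

CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_crontab_line(line: str) -> Tuple[Dict[str, str], str]:
    """Separate the five schedule fields of a crontab line from its command."""
    parts = line.split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    schedule = dict(zip(CRON_FIELDS, parts[:5]))
    return schedule, parts[5]


if __name__ == "__main__":
    entry = "0 10 * * 0 cd ~/news && capcat bundle science --count 30 --media"
    schedule, command = split_crontab_line(entry)
    print(schedule)
    print(command)
```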

Privacy and Ethics

- Usernames anonymized as "Anonymous" in comment archives
- Respects robots.txt
- Rate limiting: 1 request per 10 seconds
- Prefers RSS/APIs over HTML scraping
- No paywall circumvention
- Proper source attribution
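
Three of the practices listed above (the robots.txt check, the 10-second rate limit, and username anonymization) can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not Capcat's implementation: the class and function names, the `capcat-sketch` user agent, and the `author` key on comment records are all hypothetical.

```python
import time
from typing import Callable, Dict, List, Optional
from urllib.robotparser import RobotFileParser


class RateLimiter:
    """Allow at most one request per interval seconds (10 s in the list above)."""

    def __init__(self, interval: float = 10.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against the text of an already downloaded robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def anonymize(comments: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Replace the (assumed) author field of each comment with "Anonymous"."""
    return [{**comment, "author": "Anonymous"} for comment in comments]


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    print(allowed_by_robots(rules, "capcat-sketch", "https://example.com/news/story"))
    print(anonymize([{"author": "alice", "text": "Nice article"}]))
```

The clock and sleep functions are injectable so the limiter can be exercised in tests without actually waiting ten seconds.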

Documentation

Full documentation is available at capcat.org.

Contributing

Open an issue or pull request on GitHub.

License

MIT License — see LICENSE.txt

Project details

Release history

1.9.85

May 6, 2026

1.9.84

May 6, 2026

1.9.83

May 6, 2026

1.9.82

May 6, 2026

1.9.81

May 6, 2026

1.9.80

May 6, 2026

1.9.79

May 6, 2026

1.9.78

May 5, 2026

1.9.77

May 5, 2026

1.9.76

May 5, 2026

1.9.75

May 5, 2026

1.9.74

May 5, 2026

1.9.73

May 1, 2026

1.9.72

May 1, 2026

1.9.71

May 1, 2026

This version

1.9.70

Apr 26, 2026

1.9.69

Apr 26, 2026

1.9.68

Apr 25, 2026

1.9.67

Apr 24, 2026

1.9.66

Apr 24, 2026

1.9.65

Apr 24, 2026

1.9.64

Apr 24, 2026

1.9.63

Apr 24, 2026

1.9.62

Apr 24, 2026

1.9.61

Apr 23, 2026

1.9.60

Apr 23, 2026

1.9.59

Apr 23, 2026

1.9.58

Apr 19, 2026

1.9.57

Apr 19, 2026

1.9.56

Apr 18, 2026

1.9.55

Apr 18, 2026

1.9.54

Apr 17, 2026

1.9.53

Apr 17, 2026

1.9.52

Apr 17, 2026

1.9.51

Apr 17, 2026

1.9.50

Apr 17, 2026

1.9.49

Apr 17, 2026

1.9.48

Apr 16, 2026

1.9.47

Apr 16, 2026

1.9.46

Apr 16, 2026

1.9.45

Apr 16, 2026

1.9.44

Apr 16, 2026

1.9.43

Apr 16, 2026

1.9.41

Apr 16, 2026

1.9.40

Apr 16, 2026

1.9.39

Apr 15, 2026

1.9.38

Apr 15, 2026

1.9.37

Apr 15, 2026

1.9.36

Apr 15, 2026

1.9.34

Apr 15, 2026

1.9.33

Apr 15, 2026

1.9.32

Apr 15, 2026

1.9.31

Apr 13, 2026

1.9.30

Apr 13, 2026

1.9.29

Apr 13, 2026

1.9.28

Apr 13, 2026

1.9.27

Apr 13, 2026

1.9.26

Apr 13, 2026

1.9.25

Apr 12, 2026

1.9.24

Apr 12, 2026

1.9.23

Apr 12, 2026

1.9.22

Apr 5, 2026

1.9.21

Apr 5, 2026

1.9.20

Apr 5, 2026

1.9.19

Apr 5, 2026

1.9.18

Apr 4, 2026

1.9.17

Apr 4, 2026

1.9.16

Apr 4, 2026

1.9.14

Apr 4, 2026

1.9.13

Apr 4, 2026

1.9.12

Apr 3, 2026

1.9.11

Apr 3, 2026

1.9.10

Apr 3, 2026

1.9.9

Apr 3, 2026

1.9.8

Apr 3, 2026

1.9.7

Apr 3, 2026

1.9.6

Apr 3, 2026

1.9.5

Apr 3, 2026

1.9.4

Apr 3, 2026

1.9.3

Apr 3, 2026

1.9.2

Apr 3, 2026

1.9.1

Apr 3, 2026

1.9.0

Apr 3, 2026

1.8.5

Apr 2, 2026

1.8.4

Apr 1, 2026

1.8.3

Apr 1, 2026

1.8.2

Apr 1, 2026

1.8.1

Apr 1, 2026

1.7.2

Apr 1, 2026

1.7.1

Apr 1, 2026

1.7.0

Apr 1, 2026

1.6.8

Mar 31, 2026

1.6.7

Mar 31, 2026

1.6.6

Mar 30, 2026

1.6.5

Mar 30, 2026

1.6.4

Mar 30, 2026

1.6.3

Mar 27, 2026

1.6.2

Mar 27, 2026

1.6.1

Mar 27, 2026

1.6.0

Mar 27, 2026

1.5.12

Mar 27, 2026

1.5.10

Mar 27, 2026

1.5.9

Mar 27, 2026

1.5.8

Mar 27, 2026

1.5.7

Mar 26, 2026

1.5.6

Mar 26, 2026

1.5.5

Mar 26, 2026

1.5.4

Mar 26, 2026

1.5.3

Mar 26, 2026

1.5.2

Mar 26, 2026

1.5.1

Mar 26, 2026

1.5.0

Mar 26, 2026

1.4.28

Mar 25, 2026

1.4.26

Mar 25, 2026

1.4.25

Mar 25, 2026

1.4.24

Mar 25, 2026

1.4.23

Mar 24, 2026

1.4.22

Mar 24, 2026

1.4.21

Mar 24, 2026

1.4.20

Mar 24, 2026

1.4.19

Mar 24, 2026

1.4.17

Mar 24, 2026

1.4.16

Mar 24, 2026

1.4.15

Mar 24, 2026

1.4.13

Mar 24, 2026

1.4.12

Mar 24, 2026

1.4.11

Mar 24, 2026

1.4.10

Mar 24, 2026

1.4.9

Mar 24, 2026

1.4.8

Mar 24, 2026

1.4.7

Mar 24, 2026

1.4.6

Mar 24, 2026

1.4.4

Mar 24, 2026

1.4.1

Mar 20, 2026

1.4.0

Mar 20, 2026

1.3.1

Mar 19, 2026

1.3.0

Mar 19, 2026

1.2.0

Mar 19, 2026

1.1.9

Mar 19, 2026

1.1.8

Mar 19, 2026

1.1.7

Mar 19, 2026

1.1.6

Mar 19, 2026

1.1.5

Mar 19, 2026

1.1.4

Mar 19, 2026

1.1.0

Mar 18, 2026

1.0.49

Mar 18, 2026

1.0.48

Mar 18, 2026

1.0.47

Mar 17, 2026

1.0.46

Mar 17, 2026

1.0.45

Mar 17, 2026

1.0.44

Mar 17, 2026

1.0.43

Mar 17, 2026

1.0.42

Mar 16, 2026

1.0.41

Mar 16, 2026

1.0.40

Mar 16, 2026

1.0.39

Mar 16, 2026

1.0.38

Mar 16, 2026

1.0.37

Mar 16, 2026

1.0.36

Mar 15, 2026

1.0.35

Mar 15, 2026

1.0.32

Mar 15, 2026

1.0.31

Mar 15, 2026

1.0.30

Mar 14, 2026

1.0.29

Mar 14, 2026

1.0.28

Mar 14, 2026

1.0.27

Mar 14, 2026

1.0.26

Mar 14, 2026

1.0.25

Mar 14, 2026

1.0.24

Mar 14, 2026

1.0.23

Mar 14, 2026

1.0.22

Mar 14, 2026

1.0.21

Mar 14, 2026

1.0.20

Mar 14, 2026

1.0.19

Mar 14, 2026

1.0.18

Mar 14, 2026

1.0.17

Mar 14, 2026

1.0.16

Mar 14, 2026

1.0.15

Mar 14, 2026

1.0.14

Mar 14, 2026

1.0.13

Mar 14, 2026

1.0.12

Mar 14, 2026

1.0.11

Mar 14, 2026

1.0.10

Mar 14, 2026

1.0.9

Mar 14, 2026

1.0.8

Mar 14, 2026

1.0.7

Mar 14, 2026

1.0.6

Mar 14, 2026

1.0.5

Mar 14, 2026

1.0.4

Mar 14, 2026

1.0.3

Mar 14, 2026

1.0.2

Mar 14, 2026

1.0.1

Mar 14, 2026

1.0.0

Mar 14, 2026

Download files

Source Distribution

capcat-1.9.70.tar.gz (388.3 kB)

Uploaded Apr 26, 2026 Source

Built Distribution

capcat-1.9.70-py3-none-any.whl (415.2 kB)

Uploaded Apr 26, 2026 Python 3

File details

Details for the file capcat-1.9.70.tar.gz.

File metadata

Download URL: capcat-1.9.70.tar.gz
Upload date: Apr 26, 2026
Size: 388.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for capcat-1.9.70.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `34ed60b083dbde21f395a4f384ddd71f7ad2168a5e5d6348347356b859a312ff` |
| MD5 | `7a8cea60c780f27dc7f26a8c7035b86d` |
| BLAKE2b-256 | `e1f568c3231243376f21f647eec5f741b0af87ba81d5850daa08ffaf81da0f2d` |

File details

Details for the file capcat-1.9.70-py3-none-any.whl.

File metadata

Download URL: capcat-1.9.70-py3-none-any.whl
Upload date: Apr 26, 2026
Size: 415.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for capcat-1.9.70-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `6a0747c82cef2649ba24a43a0b95dd092ff82297d057d547080cb734b718d875` |
| MD5 | `7428bb6c9d7fb784df39b03c932bec56` |
| BLAKE2b-256 | `6b2789ec45a31e87fcd1b339170433cb39016062cd8fbe81b64549ba94a2c1bc` |
