CLI access to ALL_POSTS.txt, ARCHIVE.txt and documentation from vojtamaur.cz

vojtamaur

A small pip-installable command-line tool for working with selected public artifacts of vojtamaur.cz:

  • ALL_POSTS.txt
  • ARCHIVE.txt
  • the documentation page at /documentation/
  • kurt-godel-rat.jpg

The tool is not a website parser, crawler, CMS, sync daemon, image processor, or background service. It works mainly with published plaintext/HTML/image endpoints, keeps a local cache of the last successfully loaded artifacts, and includes embedded package snapshots of ALL_POSTS.txt, ARCHIVE.txt, and kurt-godel-rat.jpg as final fallbacks.

Installation

From PyPI:

python -m pip install vojtamaur

From a local repository checkout:

python -m pip install .

For development:

python -m pip install -e .

From GitHub:

python -m pip install git+https://github.com/VojtaMaur/vojtamaur-python.git

From a specific Git tag:

python -m pip install git+https://github.com/VojtaMaur/vojtamaur-python.git@v0.1.4

Quick usage

vojtamaur --help
vojtamaur posts
vojtamaur posts --save
vojtamaur archive
vojtamaur archive --save
vojtamaur docs
vojtamaur docs --save
vojtamaur rat
vojtamaur rat --save
vojtamaur rat --offline
vojtamaur grep metaprogram
vojtamaur search-url archive.today
vojtamaur stats
vojtamaur head 40
vojtamaur random
vojtamaur random --print-only
vojtamaur status
vojtamaur status --limit 10
vojtamaur verify
vojtamaur open site
vojtamaur open docs
vojtamaur open rat
vojtamaur open rat --offline
vojtamaur open archive-link 1

Artifacts

The CLI currently knows these artifact kinds:

| Command / kind | Online path | Embedded in package | Notes |
| --- | --- | --- | --- |
| posts | /ALL_POSTS.txt | yes | Plain-text export of all posts. |
| archive | /ARCHIVE.txt | yes | Archive map and external preservation links. |
| docs | /documentation/ | no | Documentation HTML; cached after a successful fetch. |
| rat | /images/kurt-godel-rat.jpg | yes | Binary image artifact. Saved as a file, not printed to the terminal. |

Source strategy

For each artifact, the tool tries to use the most current available source first.

For posts, archive, and rat, the fallback order is:

  1. primary URL from SITE_SOURCES
  2. additional live fallback URLs from SITE_SOURCES, in order
  3. local cache
  4. embedded package snapshot

For docs, the fallback order is:

  1. primary URL from SITE_SOURCES
  2. additional live fallback URLs from SITE_SOURCES, in order
  3. local cache

docs is not embedded in the package. The embedded snapshots are intended for compact, directly reusable artifacts, not for every page of the website.

If all available sources fail, the command exits with an error.
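
The fallback order above can be sketched as a chain of loaders that are tried in turn. This is only an illustration of the idea, not the package's actual fetch code: `first_available`, `Loader`, and the stub loaders are hypothetical names introduced here.

```python
from typing import Callable, Iterable, Optional

# A loader returns the artifact bytes, or None if its source has nothing to offer.
Loader = Callable[[], Optional[bytes]]


def first_available(loaders: Iterable[tuple[str, Loader]]) -> tuple[str, bytes]:
    """Return (label, data) from the first loader that yields data."""
    for label, load in loaders:
        try:
            data = load()
        except OSError:
            # Network and file errors both derive from OSError; try the next source.
            continue
        if data is not None:
            return label, data
    raise RuntimeError("all available sources failed")


def failing_fetch() -> bytes:
    # Stand-in for a live URL that cannot be reached.
    raise OSError("simulated network failure")


# Hypothetical chain for "posts": live URLs, then cache, then embedded snapshot.
chain = [
    ("main", failing_fetch),
    ("fallback", failing_fetch),
    ("cache", lambda: None),  # no cache file yet
    ("embedded", lambda: b"snapshot text"),
]
label, data = first_available(chain)
print(label, len(data))
```

For docs, the same chain would simply end at the cache entry, so an empty cache plus unreachable live sources leads to the error case.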

Adding another live fallback

Live deployment fallbacks are configured in src/vojtamaur/constants.py through SITE_SOURCES:

SITE_SOURCES: list[tuple[str, str]] = [
    ("main", "https://vojtamaur.cz"),
    ("fallback", "https://vojtamaur.neocities.org"),
    ("github_pages", "https://vojtamaur.github.io/vojtamaur-web"),
    (
        "ardrive",
        "https://db6beycsnxhli2vxsahgn3ajpsi6qv5alttkr4d3sfwrj7uurqfq.ardrive.net/GHwSYFJtzrRqt5AOZuwJfJHoV6Bc5qjwe5FtFP6UjAs",
    ),
]

To add another deployment, append another (label, base_url) tuple. Use a base URL without a trailing slash. The posts, archive, docs, and rat endpoint URLs are generated from this list. The order is the priority order used by the fetch logic.

Only add live deployments that expose the same relative artifact paths. Repository browsers, catalog records, web archives, and one-off snapshots belong in ARCHIVE.txt, not in SITE_SOURCES.
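
As a rough illustration of how endpoint URLs could be derived from SITE_SOURCES, the sketch below joins each base URL with a relative artifact path. `ARTIFACT_PATHS` and `endpoint_urls` are hypothetical names for this example; the real module may be organized differently.

```python
# Shortened copy of the list shown above, for illustration only.
SITE_SOURCES: list[tuple[str, str]] = [
    ("main", "https://vojtamaur.cz"),
    ("fallback", "https://vojtamaur.neocities.org"),
]

# Relative artifact paths as listed in the Artifacts section.
ARTIFACT_PATHS = {
    "posts": "/ALL_POSTS.txt",
    "archive": "/ARCHIVE.txt",
    "docs": "/documentation/",
    "rat": "/images/kurt-godel-rat.jpg",
}


def endpoint_urls(kind: str, sources=SITE_SOURCES) -> list[tuple[str, str]]:
    """Return (label, url) pairs in priority order for one artifact kind."""
    path = ARTIFACT_PATHS[kind]
    # Base URLs carry no trailing slash, so plain concatenation is enough.
    return [(label, base_url + path) for label, base_url in sources]


for label, url in endpoint_urls("rat"):
    print(label, url)
```

Because the paths start with a slash, a base URL with a trailing slash would produce a double slash, which is why the rule about trailing slashes matters.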

Cache

The cache location is platform-specific:

  • Windows: %LOCALAPPDATA%/vojtamaur/
  • macOS: ~/Library/Caches/vojtamaur/
  • Linux/Unix: $XDG_CACHE_HOME/vojtamaur/ or ~/.cache/vojtamaur/

Override on Windows CMD:

set VOJTAMAUR_CACHE_DIR=C:\temp\vojtamaur-cache

Override on PowerShell:

$env:VOJTAMAUR_CACHE_DIR = "C:\temp\vojtamaur-cache"

Override on Unix-like systems:

export VOJTAMAUR_CACHE_DIR=/tmp/vojtamaur-cache
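
A minimal sketch of how the cache directory could be resolved from the rules above. `resolve_cache_dir` is a hypothetical helper, and the Windows fallback used when LOCALAPPDATA is unset is an assumption of this example, not documented behavior.

```python
import os
import sys
from pathlib import Path


def resolve_cache_dir(env=None, platform=None, home=None) -> Path:
    """Pick the cache directory: explicit override first, then platform default."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    home = Path.home() if home is None else Path(home)

    override = env.get("VOJTAMAUR_CACHE_DIR")
    if override:
        return Path(override)
    if platform.startswith("win"):
        base = env.get("LOCALAPPDATA")
        # Assumption: use the conventional AppData/Local folder if the variable is unset.
        return (Path(base) if base else home / "AppData" / "Local") / "vojtamaur"
    if platform == "darwin":
        return home / "Library" / "Caches" / "vojtamaur"
    xdg = env.get("XDG_CACHE_HOME")
    return (Path(xdg) if xdg else home / ".cache") / "vojtamaur"


print(resolve_cache_dir())
```

The `env`, `platform`, and `home` parameters exist only so the logic can be exercised without touching the real environment.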

Offline mode

vojtamaur posts --offline
vojtamaur archive --offline
vojtamaur docs --offline
vojtamaur rat --offline
vojtamaur stats --offline
vojtamaur open rat --offline

Or globally through the environment:

export VOJTAMAUR_OFFLINE=1

In offline mode, posts, archive, and rat use the local cache first and the embedded package snapshot if no cache exists. docs uses only the local cache because documentation HTML is not embedded.
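
The effect of offline mode on the source order can be sketched as follows. `is_offline` and `source_order` are hypothetical helpers; in particular, treating only the value `1` as "offline" is an assumption based on the example above.

```python
import os

# Kinds that ship with an embedded snapshot; docs does not.
EMBEDDED_KINDS = {"posts", "archive", "rat"}


def is_offline(flag: bool = False, env=None) -> bool:
    """Offline if --offline was passed or VOJTAMAUR_OFFLINE is set to 1."""
    env = os.environ if env is None else env
    # Assumption: only the literal value "1" enables offline mode.
    return flag or env.get("VOJTAMAUR_OFFLINE") == "1"


def source_order(kind: str, offline: bool, live_labels=("main", "fallback")) -> list[str]:
    """List the sources to try, in order, for one artifact kind."""
    order = [] if offline else list(live_labels)
    order.append("cache")
    if kind in EMBEDDED_KINDS:
        order.append("embedded")
    return order


print(source_order("posts", offline=True))
print(source_order("docs", offline=True))
```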

Timeout

The default network timeout is 3 seconds.

vojtamaur status --timeout 5

Or:

export VOJTAMAUR_TIMEOUT=5
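
One plausible way to combine the flag, the environment variable, and the default is shown below. The precedence (flag over environment over default) and the handling of invalid values are assumptions of this sketch; `read_timeout` is a hypothetical helper.

```python
import os

DEFAULT_TIMEOUT = 3.0  # seconds, the documented default


def read_timeout(cli_value=None, env=None) -> float:
    """Resolve the network timeout in seconds."""
    env = os.environ if env is None else env
    # Assumption: an explicit --timeout value wins over the environment.
    if cli_value is not None:
        return float(cli_value)
    raw = env.get("VOJTAMAUR_TIMEOUT", "").strip()
    if not raw:
        return DEFAULT_TIMEOUT
    try:
        value = float(raw)
    except ValueError:
        # Assumption: unparsable values fall back to the default.
        return DEFAULT_TIMEOUT
    return value if value > 0 else DEFAULT_TIMEOUT


print(read_timeout(env={"VOJTAMAUR_TIMEOUT": "5"}))
```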

Commands

posts

Prints or saves ALL_POSTS.txt.

vojtamaur posts
vojtamaur posts --save
vojtamaur posts --save my_copy.txt

archive

Prints or saves ARCHIVE.txt.

vojtamaur archive
vojtamaur archive --save
vojtamaur archive --save my_archive.txt

docs

Downloads the documentation page from /documentation/. By default, it prints a simple plaintext extraction from the HTML. With --raw, it prints the original HTML. With --save, it saves the raw HTML.

vojtamaur docs
vojtamaur docs --raw
vojtamaur docs --save
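
A simple plaintext extraction of this kind can be done with the standard library alone. The sketch below is one possible approach, not necessarily the one the tool uses; `TextExtractor` and `html_to_text` are names made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of script and style."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            # Collapse runs of whitespace inside each text chunk.
            self.parts.append(" ".join(data.split()))


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


print(html_to_text("<h1>Docs</h1><p>Hello &amp; bye</p><script>x()</script>"))
```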

rat

Downloads kurt-godel-rat.jpg.

Because this is a binary artifact, the command saves the image file instead of printing raw JPEG bytes to the terminal.

vojtamaur rat
vojtamaur rat --save
vojtamaur rat --save my_rat.jpg
vojtamaur rat --offline
vojtamaur kurt-godel-rat --offline

grep

Searches ALL_POSTS.txt as plain text.

vojtamaur grep DullGPT
vojtamaur grep "Boltzmannovy mozky" --context 2
vojtamaur grep Metaweb --case-sensitive
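
The behavior of these options can be approximated with a short substring search. This sketch assumes case-insensitive matching by default and plain substring (not regex) matching; `grep_lines` is a hypothetical helper, and the real output format may differ.

```python
def grep_lines(text: str, pattern: str, context: int = 0, case_sensitive: bool = False):
    """Return (line_number, line) pairs for matches plus surrounding context."""
    lines = text.splitlines()
    needle = pattern if case_sensitive else pattern.casefold()
    keep = set()
    for i, line in enumerate(lines):
        haystack = line if case_sensitive else line.casefold()
        if needle in haystack:
            # Include `context` lines before and after, clamped to the file.
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    return [(i + 1, lines[i]) for i in sorted(keep)]


sample = "alpha\nbeta Metaweb\ngamma\ndelta\nmetaweb again"
for number, line in grep_lines(sample, "Metaweb", context=1, case_sensitive=True):
    print(f"{number}: {line}")
```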

search-url

Searches URLs found in ARCHIVE.txt.

vojtamaur search-url arquivo
vojtamaur search-url archive.today
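
A URL search like this can be sketched with a regular expression over the text. The pattern and the trailing-punctuation cleanup below are assumptions of this example, and `find_urls` and `search_urls` are hypothetical names; the tool's own URL parsing may be stricter or looser.

```python
import re

# Deliberately simple pattern: scheme, then anything up to whitespace or a quote.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def find_urls(text: str) -> list[str]:
    """Return unique URLs in order of first appearance."""
    seen = {}
    for match in URL_RE.findall(text):
        # Drop punctuation that usually belongs to the sentence, not the URL.
        url = match.rstrip(".,;:)]")
        seen.setdefault(url, None)
    return list(seen)


def search_urls(text: str, needle: str) -> list[str]:
    """Return the URLs that contain `needle`, ignoring case."""
    needle = needle.casefold()
    return [url for url in find_urls(text) if needle in url.casefold()]


sample = "Snapshot: https://archive.today/abc.\nMirror https://arquivo.pt/wayback/x (see https://archive.today/abc)"
print(search_urls(sample, "arquivo"))
```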

stats

Prints basic statistics for ALL_POSTS.txt and ARCHIVE.txt: byte size, character count, word count, line count, entry count, unique slug count, languages, sections, and the number of unique archive links.

vojtamaur stats
vojtamaur stats --offline
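
The format-independent part of these statistics is easy to reproduce. The sketch below covers only byte size, character count, word count, and line count, assuming UTF-8 text; entry, slug, language, and section counts depend on the file format and are left out. `basic_stats` is a hypothetical helper.

```python
def basic_stats(data: bytes) -> dict[str, int]:
    """Compute size and simple text counts for a UTF-8 encoded artifact."""
    text = data.decode("utf-8")  # assumption: the artifact is UTF-8
    return {
        "bytes": len(data),
        "characters": len(text),
        "words": len(text.split()),
        "lines": len(text.splitlines()),
    }


sample = "Gödel rat\nsecond line\n".encode("utf-8")
print(basic_stats(sample))
```

Byte size and character count differ whenever the text contains non-ASCII characters, which is why both are worth reporting.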

head

Prints the first N lines of ALL_POSTS.txt.

vojtamaur head
vojtamaur head 80

random

Selects a random URL from URL: headers in ALL_POSTS.txt.

vojtamaur random
vojtamaur random --print-only

By default, the selected URL is also opened in the browser.
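
A minimal sketch of this selection, assuming each header is a line that starts with `URL:` followed by the address. `url_headers` and `pick_random_url` are hypothetical names, and the exact header layout in ALL_POSTS.txt is not verified here.

```python
import random
import webbrowser


def url_headers(text: str) -> list[str]:
    """Collect the values of lines that start with 'URL:'."""
    urls = []
    for line in text.splitlines():
        if line.startswith("URL:"):
            value = line[len("URL:"):].strip()
            if value:
                urls.append(value)
    return urls


def pick_random_url(text: str, print_only: bool = False, rng=random) -> str:
    """Print a random post URL and, unless print_only is set, open it."""
    url = rng.choice(url_headers(text))
    print(url)
    if not print_only:
        webbrowser.open(url)
    return url


sample = "TITLE: One\nURL: https://vojtamaur.cz/one/\n\nTITLE: Two\nURL: https://vojtamaur.cz/two/\n"
pick_random_url(sample, print_only=True)
```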

status

Checks URLs found in ARCHIVE.txt.

vojtamaur status
vojtamaur status --limit 10

The command uses HEAD first and falls back to GET for selected failures. Plain HTTP URLs are marked as INSECURE_HTTP.
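
The HEAD-then-GET pattern can be sketched with `urllib`. Several details here are assumptions: which status codes trigger the GET retry, the result labels other than INSECURE_HTTP, and the choice to label plain HTTP URLs without contacting them. `check_url` and `RETRY_WITH_GET` are hypothetical names.

```python
import urllib.error
import urllib.request

# Assumption: statuses that often mean "HEAD is not supported here".
RETRY_WITH_GET = {403, 405, 501}


def check_url(url: str, timeout: float = 3.0, opener=urllib.request.urlopen) -> str:
    """Return a short status label for one URL."""
    if url.startswith("http://"):
        # Simplification: mark plain HTTP without making a request.
        return "INSECURE_HTTP"
    for method in ("HEAD", "GET"):
        request = urllib.request.Request(url, method=method)
        try:
            with opener(request, timeout=timeout) as response:
                return f"OK {response.status}"
        except urllib.error.HTTPError as exc:
            if method == "HEAD" and exc.code in RETRY_WITH_GET:
                continue  # retry the same URL with GET
            return f"HTTP_ERROR {exc.code}"
        except OSError as exc:
            # Covers URLError, timeouts, and connection failures.
            return f"UNREACHABLE {type(exc).__name__}"
    return "UNREACHABLE"  # not reached; keeps the return type consistent


print(check_url("http://example.org/"))
```

The `opener` parameter is there so the logic can be exercised with a stub instead of real network access.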

verify

Runs a basic health check:

  • primary and fallback sources for posts, archive, and docs
  • embedded package snapshots for text artifacts
  • local cache writability
  • cache decoding for text/HTML cache files, if they exist
  • URL parsing from ALL_POSTS.txt and ARCHIVE.txt

vojtamaur verify

This is a practical availability and parser check. It is not a cryptographic provenance system. If you need strict integrity verification, use the website's generated checksum artifacts or external SHA-256 tooling.
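
For the external SHA-256 route, a check along these lines works with the standard library. `sha256_of` and `matches` are hypothetical helper names, and the expected digest has to come from a source you trust, such as a published checksum file.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

Typical use would be `matches("ALL_POSTS.txt", expected)` after saving the artifact with `vojtamaur posts --save`.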

open

Opens a known target or explicit URL. For posts, archive, docs, and rat, normal mode opens the canonical online URL. With --offline, it opens the corresponding local cache file.

vojtamaur open site
vojtamaur open fallback
vojtamaur open posts
vojtamaur open archive
vojtamaur open docs
vojtamaur open rat
vojtamaur open posts --offline
vojtamaur open archive --offline
vojtamaur open docs --offline
vojtamaur open rat --offline
vojtamaur open random
vojtamaur open archive-link 1
vojtamaur open https://vojtamaur.cz/metawebovy-clanek/

Embedded snapshots

The package includes bundled fallback copies of:

  • src/vojtamaur/data/ALL_POSTS.txt
  • src/vojtamaur/data/ARCHIVE.txt
  • src/vojtamaur/data/kurt-godel-rat.jpg

These files make the installed package partially useful even if the website, fallback deployments, and cache are unavailable. They also make package distributions, wheels, source archives, PyPI mirrors, pip caches, and installed environments act as additional copies of selected public artifacts.
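
Bundled files like these are commonly read through `importlib.resources`. The sketch below assumes the snapshots end up in a `data` directory inside the installed `vojtamaur` package, as the paths above suggest; `read_embedded` is a hypothetical helper, not a documented API of the package.

```python
from importlib import resources


def read_embedded(name: str, package: str = "vojtamaur") -> bytes:
    """Read a bundled snapshot from the package's data directory."""
    # files() returns a Traversable; "/" joins path segments on it.
    return (resources.files(package) / "data" / name).read_bytes()
```

With the package installed, `read_embedded("ARCHIVE.txt").decode("utf-8")` would return the archive snapshot shipped with that release.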

Before publishing a new release, refresh the embedded snapshots:

python scripts/refresh_embedded_data.py

Then verify:

python -m unittest
vojtamaur verify
vojtamaur stats --offline
vojtamaur rat --offline

What this tool does not do

  • it does not parse the rendered website as the source of articles
  • it does not download Markdown/MDX source files
  • it does not synchronize the repository
  • it does not compute diffs
  • it does not store a database
  • it does not run in the background
  • it does not process or rewrite image pixels
  • it does not replace external checksum verification
  • it does not require any runtime dependencies outside the Python standard library

Limitations

ALL_POSTS.txt is a text export, not a complete replica of the website. Media, iframes, PDFs, and selected long blocks are represented by placeholders or omitted. This is intentional. The tool works with the text sediment, not the full rendered website.

kurt-godel-rat.jpg is treated as a binary public artifact. The CLI downloads, caches, saves, and opens it; it does not inspect or modify its image metadata.

The embedded snapshots are release snapshots. The live endpoints remain the preferred source when available.

Tests

python -m unittest

Build

python -m pip install build
python -m build

Publishing

Publish through PyPI Trusted Publishing in GitHub Actions or upload manually with Twine. A PyPI account and project access are required.
