CLI access to ALL_POSTS.txt, ARCHIVE.txt and documentation from vojtamaur.cz

vojtamaur

A small, pip-installable command-line tool for working with the public text artifacts of vojtamaur.cz:

  • ALL_POSTS.txt
  • ARCHIVE.txt
  • the documentation page at /documentation/

The tool is not a website parser, crawler, CMS, sync daemon, or background service. It works mainly with published plaintext/HTML endpoints, keeps a local cache of the last successfully loaded artifacts, and also includes embedded package snapshots of ALL_POSTS.txt and ARCHIVE.txt as a final fallback.

Installation

From a local repository checkout:

python -m pip install .

For development:

python -m pip install -e .

From PyPI:

python -m pip install vojtamaur

From GitHub:

python -m pip install git+https://github.com/VojtaMaur/vojtamaur-python.git

From a specific Git tag:

python -m pip install git+https://github.com/VojtaMaur/vojtamaur-python.git@v0.1.3

Quick usage

vojtamaur --help
vojtamaur posts
vojtamaur posts --save
vojtamaur archive
vojtamaur archive --save
vojtamaur docs
vojtamaur docs --save
vojtamaur grep metaprogram
vojtamaur search-url archive.today
vojtamaur stats
vojtamaur head 40
vojtamaur random
vojtamaur random --print-only
vojtamaur status
vojtamaur status --limit 10
vojtamaur verify
vojtamaur open site
vojtamaur open docs
vojtamaur open archive-link 1

Source strategy

For each artifact, the tool tries to use the most current available source first.

For posts and archive, the fallback order is:

  1. primary URL from SITE_SOURCES
  2. additional live fallback URLs from SITE_SOURCES, in order
  3. local cache
  4. embedded package snapshot

For docs, the fallback order is:

  1. primary URL from SITE_SOURCES
  2. additional live fallback URLs from SITE_SOURCES, in order
  3. local cache

docs is not embedded in the package. The package snapshot is intended only for the two compact plaintext artifacts.

If all available sources fail, the command exits with an error.
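The fallback order above amounts to trying an ordered list of sources until one succeeds. The following is a minimal sketch, not the package's code. It assumes each source is wrapped as a zero-argument loader, and the names `load_with_fallback` and `AllSourcesFailed` are hypothetical:

```python
from typing import Callable, List, Sequence, Tuple

# Each source is a (label, loader) pair; a loader returns the artifact text or raises.
# Hypothetical structure for illustration only.
Source = Tuple[str, Callable[[], str]]


class AllSourcesFailed(RuntimeError):
    """Raised when every source in the chain has failed."""


def load_with_fallback(sources: Sequence[Source]) -> Tuple[str, str]:
    """Try sources in priority order; return (label, text) from the first that succeeds."""
    errors: List[str] = []
    for label, loader in sources:
        try:
            return label, loader()
        except Exception as exc:  # e.g. network errors, missing cache file, decode errors
            errors.append(f"{label}: {exc}")
    raise AllSourcesFailed("all sources failed: " + "; ".join(errors))
```

For posts and archive, the list would hold the live URLs from SITE_SOURCES, then the cache, then the embedded snapshot. For docs, it would end after the cache.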

Adding another live fallback

Live deployment fallbacks are configured in src/vojtamaur/constants.py through SITE_SOURCES:

SITE_SOURCES = [
    ("main", "https://vojtamaur.cz"),
    ("fallback", "https://vojtamaur.neocities.org"),
]

To add another deployment, append another (label, base_url) tuple. The posts, archive, and docs endpoint URLs are generated from this list. The order is the priority order used by the fetch logic.
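Generating endpoint URLs from the list might look like the sketch below. Only the /documentation/ path is confirmed by this README. The root-level ALL_POSTS.txt and ARCHIVE.txt paths, the `ENDPOINT_PATHS` table, and the `endpoint_urls` helper are assumptions for illustration:

```python
from typing import List, Tuple

# Mirrors the SITE_SOURCES example above.
SITE_SOURCES = [
    ("main", "https://vojtamaur.cz"),
    ("fallback", "https://vojtamaur.neocities.org"),
]

# ASSUMPTION: the two plaintext artifacts live at the site root.
ENDPOINT_PATHS = {
    "posts": "/ALL_POSTS.txt",
    "archive": "/ARCHIVE.txt",
    "docs": "/documentation/",
}


def endpoint_urls(artifact: str, sources=SITE_SOURCES) -> List[Tuple[str, str]]:
    """Return (label, url) pairs for one artifact, in SITE_SOURCES priority order."""
    path = ENDPOINT_PATHS[artifact]
    return [(label, base.rstrip("/") + path) for label, base in sources]
```

With this structure, appending a third tuple such as ("mirror", "https://example.org") would add a third candidate URL for every artifact without further code changes.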

Cache

The cache location is platform-specific:

  • Windows: %LOCALAPPDATA%/vojtamaur/
  • macOS: ~/Library/Caches/vojtamaur/
  • Linux/Unix: $XDG_CACHE_HOME/vojtamaur/ or ~/.cache/vojtamaur/

To override the location, set VOJTAMAUR_CACHE_DIR. In Windows CMD:

set VOJTAMAUR_CACHE_DIR=C:\temp\vojtamaur-cache

In PowerShell:

$env:VOJTAMAUR_CACHE_DIR = "C:\temp\vojtamaur-cache"

On Unix-like systems:

export VOJTAMAUR_CACHE_DIR=/tmp/vojtamaur-cache
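The platform rules above can be expressed as a small resolver. This is a sketch under two assumptions: VOJTAMAUR_CACHE_DIR takes precedence over the platform default, and a missing LOCALAPPDATA on Windows falls back to the usual AppData\Local location. The `cache_dir` name is hypothetical:

```python
import os
import sys
from pathlib import Path


def cache_dir(environ=None, platform=None) -> Path:
    """Resolve the cache directory following the platform rules listed above."""
    env = os.environ if environ is None else environ
    plat = sys.platform if platform is None else platform

    # ASSUMPTION: the explicit override wins over every platform default.
    override = env.get("VOJTAMAUR_CACHE_DIR")
    if override:
        return Path(override)
    if plat.startswith("win"):
        # ASSUMPTION: fall back to ~/AppData/Local if LOCALAPPDATA is unset.
        base = env.get("LOCALAPPDATA") or str(Path.home() / "AppData" / "Local")
        return Path(base) / "vojtamaur"
    if plat == "darwin":
        return Path.home() / "Library" / "Caches" / "vojtamaur"
    # Linux/Unix: $XDG_CACHE_HOME/vojtamaur or ~/.cache/vojtamaur
    xdg = env.get("XDG_CACHE_HOME")
    base = Path(xdg) if xdg else Path.home() / ".cache"
    return base / "vojtamaur"
```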

Offline mode

vojtamaur posts --offline
vojtamaur archive --offline
vojtamaur docs --offline
vojtamaur stats --offline

Or globally through the environment:

export VOJTAMAUR_OFFLINE=1

In offline mode, posts and archive use the local cache first and the embedded package snapshot if no cache exists. docs uses only the local cache because documentation HTML is not embedded.
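One way to model which sources remain eligible in offline mode is sketched below. The set of values that count as "on" for VOJTAMAUR_OFFLINE is not documented here, so the accepted strings are a guess, and `is_offline` and `source_order` are hypothetical names:

```python
import os

# ASSUMPTION: accepted "on" values are a guess; the README only shows VOJTAMAUR_OFFLINE=1.
_TRUTHY = {"1", "true", "yes", "on"}


def is_offline(cli_flag: bool, environ=None) -> bool:
    """Offline if --offline was passed or VOJTAMAUR_OFFLINE holds a truthy value."""
    env = os.environ if environ is None else environ
    return cli_flag or env.get("VOJTAMAUR_OFFLINE", "").strip().lower() in _TRUTHY


def source_order(artifact: str, offline: bool) -> list:
    """Ordered source kinds for an artifact, following the rules described above."""
    live = ["primary", "live-fallbacks"]
    local = ["cache"]
    if artifact in ("posts", "archive"):
        local.append("embedded")  # docs has no embedded snapshot
    return local if offline else live + local
```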

Timeout

The default network timeout is 3 seconds. Override it per command with --timeout:

vojtamaur status --timeout 5

Or through the environment:

export VOJTAMAUR_TIMEOUT=5
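The README does not state which setting wins when both are given. The sketch below assumes the command-line flag overrides the environment variable and that an unparsable value falls back to the default. `resolve_timeout` is a hypothetical helper:

```python
import os

DEFAULT_TIMEOUT = 3.0  # seconds, as documented above


def resolve_timeout(cli_value=None, environ=None) -> float:
    """Pick the timeout: --timeout, then VOJTAMAUR_TIMEOUT, then the default."""
    # ASSUMPTION: the CLI flag takes precedence over the environment.
    if cli_value is not None:
        return float(cli_value)
    env = os.environ if environ is None else environ
    raw = env.get("VOJTAMAUR_TIMEOUT")
    if raw:
        try:
            return float(raw)
        except ValueError:
            pass  # ASSUMPTION: ignore an unparsable value and use the default
    return DEFAULT_TIMEOUT
```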

Commands

posts

Prints or saves ALL_POSTS.txt.

vojtamaur posts
vojtamaur posts --save
vojtamaur posts --save my_copy.txt

archive

Prints or saves ARCHIVE.txt.

vojtamaur archive
vojtamaur archive --save

docs

Downloads the documentation page from /documentation/. By default, it prints a simple plaintext extraction from the HTML. With --raw, it prints the original HTML. With --save, it saves the raw HTML.

vojtamaur docs
vojtamaur docs --raw
vojtamaur docs --save
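A comparable "simple plaintext extraction" can be built on the standard library's html.parser. This sketch does not claim to be the tool's method. The `html_to_text` name and the list of block-level tags that produce line breaks are assumptions:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script>/<style> and breaking lines at block tags."""

    _SKIP = {"script", "style"}
    # ASSUMPTION: an illustrative set of tags treated as line breaks.
    _BLOCK = {"p", "div", "li", "br", "h1", "h2", "h3", "h4", "h5", "h6", "pre", "tr"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. automatically
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1
        elif tag in self._BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self._BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```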

grep

Searches ALL_POSTS.txt as plain text.

vojtamaur grep DullGPT
vojtamaur grep "Boltzmannovy mozky" --context 2
vojtamaur grep Metaweb --case-sensitive
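The --case-sensitive flag suggests that matching is case-insensitive by default. The sketch below takes that as an assumption and treats the pattern as a plain substring rather than a regular expression. `grep` is a hypothetical helper that prints nothing itself:

```python
from typing import Iterator, Tuple


def grep(text: str, pattern: str, context: int = 0,
         case_sensitive: bool = False) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for plain-substring matches and their context lines."""
    lines = text.splitlines()
    # ASSUMPTION: case-insensitive by default; casefold handles Czech diacritics too.
    fold = (lambda s: s) if case_sensitive else str.casefold
    needle = fold(pattern)
    shown = set()
    for i, line in enumerate(lines):
        if needle in fold(line):
            # Include `context` lines before and after each match, clipped to the text.
            shown.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    for i in sorted(shown):
        yield i + 1, lines[i]
```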

search-url

Searches URLs found in ARCHIVE.txt.

vojtamaur search-url arquivo
vojtamaur search-url archive.today
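Searching archive URLs involves two steps: extract the URLs, then filter them. The sketch below assumes URLs appear as bare http(s) links in ARCHIVE.txt and that matching is a case-insensitive substring test. The real parser may differ, and `extract_urls` and `search_urls` are hypothetical names:

```python
import re

# ASSUMPTION: URLs in ARCHIVE.txt are bare http(s) links delimited by whitespace or brackets.
_URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text: str) -> list:
    """Return unique URLs in order of first appearance, minus trailing punctuation."""
    seen = {}
    for match in _URL_RE.finditer(text):
        seen.setdefault(match.group(0).rstrip(".,;:"), None)
    return list(seen)


def search_urls(text: str, needle: str) -> list:
    """Case-insensitive substring search over the extracted URLs."""
    needle = needle.casefold()
    return [url for url in extract_urls(text) if needle in url.casefold()]
```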

stats

Prints basic statistics: byte size, character count, word count, line count, entry count, unique slug count, languages, sections, and the number of unique archive links.

vojtamaur stats
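The first four statistics follow directly from the text. The sketch below assumes UTF-8 for byte size and whitespace splitting for words, so the tool's own definitions may differ. Entry, slug, language, and section counts depend on the export format, so they are left out. `basic_stats` is hypothetical:

```python
def basic_stats(text: str) -> dict:
    """Format-independent counts; ASSUMPTION: UTF-8 bytes and whitespace-separated words."""
    return {
        "bytes": len(text.encode("utf-8")),
        "characters": len(text),
        "words": len(text.split()),
        "lines": len(text.splitlines()),
    }
```

Byte size and character count differ for Czech text, because characters such as "ě" take two bytes in UTF-8.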

head

Prints the first N lines of ALL_POSTS.txt.

vojtamaur head
vojtamaur head 80

random

Selects a random URL from the URL: header lines in ALL_POSTS.txt.

vojtamaur random
vojtamaur random --print-only

By default, the selected URL is also opened in the browser. With --print-only, it is only printed.
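Picking a random post could look like the sketch below. It assumes each header line has the form "URL: https://..." at the start of a line. `post_urls` and `pick_random_url` are hypothetical names, and the `open_browser` switch stands in for the absence of --print-only:

```python
import random
import re
import webbrowser

# ASSUMPTION: header lines look like "URL: https://..." at the start of a line.
_URL_HEADER = re.compile(r"^URL:\s*(\S+)", re.MULTILINE)


def post_urls(all_posts: str) -> list:
    return _URL_HEADER.findall(all_posts)


def pick_random_url(all_posts: str, open_browser: bool = True, rng=random) -> str:
    urls = post_urls(all_posts)
    if not urls:
        raise ValueError("no URL: headers found")
    url = rng.choice(urls)
    if open_browser:  # --print-only would correspond to open_browser=False
        webbrowser.open(url)
    return url
```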

status

Checks URLs found in ARCHIVE.txt.

vojtamaur status
vojtamaur status --limit 10

The command uses HEAD first and falls back to GET for selected failures. Plain HTTP URLs are marked as INSECURE_HTTP.
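The HEAD-then-GET strategy can be sketched with urllib.request. Two points here are assumptions: which HEAD failures count as "selected failures" (here 403, 405, and 501, typical of servers that reject HEAD), and that plain HTTP URLs are flagged without a request. `check_url` is hypothetical, and the injectable `opener` exists only to make the sketch testable offline:

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit

# ASSUMPTION: the codes that trigger a GET retry are a guess.
_RETRY_WITH_GET = {403, 405, 501}


def check_url(url: str, timeout: float = 3.0, opener=urllib.request.urlopen) -> str:
    """Return a short status string for one archive URL."""
    if urlsplit(url).scheme == "http":
        # ASSUMPTION: plain HTTP is flagged without making a request.
        return "INSECURE_HTTP"

    def request(method: str) -> str:
        req = urllib.request.Request(url, method=method)
        with opener(req, timeout=timeout) as resp:
            return str(resp.status)

    try:
        return request("HEAD")
    except urllib.error.HTTPError as exc:  # must precede URLError (its base class)
        if exc.code not in _RETRY_WITH_GET:
            return str(exc.code)
    except (urllib.error.URLError, OSError) as exc:
        return f"ERROR ({exc})"
    # HEAD was rejected in a way that suggests the server dislikes HEAD: retry with GET.
    try:
        return request("GET")
    except urllib.error.HTTPError as exc:
        return str(exc.code)
    except (urllib.error.URLError, OSError) as exc:
        return f"ERROR ({exc})"
```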

verify

Runs a basic health check:

  • primary and fallback sources for posts, archive, and docs
  • embedded package snapshots for posts and archive
  • local cache writability
  • cache decoding, if cache files exist
  • URL parsing from ALL_POSTS.txt and ARCHIVE.txt

vojtamaur verify
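The "local cache writability" item can be checked by creating and removing a throwaway file. This is one plausible approach, assumed rather than taken from the package. `cache_is_writable` is hypothetical:

```python
import tempfile
from pathlib import Path


def cache_is_writable(directory: Path) -> bool:
    """Create the directory if needed, then probe it with a temporary file."""
    try:
        directory.mkdir(parents=True, exist_ok=True)
        with tempfile.NamedTemporaryFile(dir=directory, prefix=".probe-"):
            pass  # the probe file is deleted automatically when the context exits
        return True
    except OSError:
        return False
```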

open

Opens a known target or an explicit URL. For posts, archive, and docs, normal mode opens the canonical online URL. With --offline, it opens the corresponding local cache file.

vojtamaur open site
vojtamaur open fallback
vojtamaur open posts
vojtamaur open archive
vojtamaur open docs
vojtamaur open posts --offline
vojtamaur open archive --offline
vojtamaur open docs --offline
vojtamaur open random
vojtamaur open archive-link 1
vojtamaur open https://vojtamaur.cz/metawebovy-clanek/
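Resolving an `open` argument in online mode might look like the sketch below. The site and fallback URLs come from SITE_SOURCES, and the /documentation/ path is documented, but the ALL_POSTS.txt and ARCHIVE.txt paths are guesses. The sketch also assumes that archive-link indices are 1-based, based on `open archive-link 1`. The random target and offline mode are omitted, and `resolve_target` and `open_target` are hypothetical:

```python
import webbrowser

# ASSUMPTION: posts/archive paths are guesses; only /documentation/ is documented.
TARGETS = {
    "site": "https://vojtamaur.cz/",
    "fallback": "https://vojtamaur.neocities.org/",
    "posts": "https://vojtamaur.cz/ALL_POSTS.txt",
    "archive": "https://vojtamaur.cz/ARCHIVE.txt",
    "docs": "https://vojtamaur.cz/documentation/",
}


def resolve_target(target: str, index=None, archive_urls=()) -> str:
    """Map an `open` argument to a URL (online mode only)."""
    if target in TARGETS:
        return TARGETS[target]
    if target == "archive-link":
        # ASSUMPTION: the index is 1-based, matching `open archive-link 1`.
        if index is None or not 1 <= index <= len(archive_urls):
            raise IndexError(f"archive link {index} out of range")
        return archive_urls[index - 1]
    if target.startswith(("https://", "http://")):
        return target
    raise ValueError(f"unknown target: {target!r}")


def open_target(target: str, **kwargs) -> None:
    webbrowser.open(resolve_target(target, **kwargs))
```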

Embedded snapshots

The package includes bundled fallback copies of:

  • src/vojtamaur/data/ALL_POSTS.txt
  • src/vojtamaur/data/ARCHIVE.txt

These files make the installed package partially useful even if the website, fallback deployment, and cache are unavailable. They also turn every copy of the package (wheels, source archives, PyPI mirrors, pip caches, and installed environments) into an additional copy of the text artifacts.
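Reading a bundled snapshot from an installed package is typically done with importlib.resources (Python 3.9+). This sketch assumes the data/ directory ships as package data and is UTF-8 encoded. `load_embedded` is a hypothetical helper, not the package's API:

```python
from importlib.resources import files


def load_embedded(filename: str, package: str = "vojtamaur", subdir: str = "data") -> str:
    """Read a snapshot bundled inside an installed package, e.g. data/ALL_POSTS.txt."""
    resource = files(package)
    if subdir:
        resource = resource.joinpath(subdir)
    # ASSUMPTION: snapshots are stored as UTF-8 text.
    return resource.joinpath(filename).read_text(encoding="utf-8")
```

Usage, assuming the package is installed: load_embedded("ALL_POSTS.txt").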

Before publishing a new release, refresh the embedded snapshots:

python scripts/refresh_embedded_data.py

Then verify:

python -m unittest
vojtamaur verify
vojtamaur stats --offline

What this tool does not do

  • it does not parse the rendered website as the source of articles
  • it does not download Markdown/MDX source files
  • it does not synchronize the repository
  • it does not compute diffs
  • it does not store a database
  • it does not run in the background
  • it has no runtime dependencies outside the Python standard library

Limitations

ALL_POSTS.txt is a text export, not a complete replica of the website. Media, iframes, PDFs, and selected long blocks are represented by placeholders or omitted. This is intentional. The tool works with the text sediment, not the full rendered website.

The embedded snapshots are release snapshots. The live endpoints remain the preferred source when available.

Tests

python -m unittest

Build

python -m pip install build
python -m build

Publishing

Publish through PyPI Trusted Publishing in GitHub Actions or upload manually with Twine. A PyPI account is required.

Download files

Source Distribution

vojtamaur-0.1.3.tar.gz (108.8 kB)

Uploaded Source

Built Distribution

vojtamaur-0.1.3-py3-none-any.whl (106.0 kB)

Uploaded Python 3

File details

Details for the file vojtamaur-0.1.3.tar.gz.

File metadata

  • Download URL: vojtamaur-0.1.3.tar.gz
  • Size: 108.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.6

File hashes

Hashes for vojtamaur-0.1.3.tar.gz:

  • SHA256: d74d83a651ab5c2008d9925bbee2159005a437fc2607176690be5d319b87d5bc
  • MD5: 97c73176b8c064af536c769101745110
  • BLAKE2b-256: fab888203b636ae07e00416ddeb635057a4fda8087f69a175df795377a26a3e0

File details

Details for the file vojtamaur-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: vojtamaur-0.1.3-py3-none-any.whl
  • Size: 106.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.6

File hashes

Hashes for vojtamaur-0.1.3-py3-none-any.whl:

  • SHA256: c8822ec0ec38c91fc15a61d1dc049ec0e6433c7dc015e17870d73343fd246d05
  • MD5: da02e5896f3186684e5dfafb82ce2e2b
  • BLAKE2b-256: 38b15aae036094606665aa3d5737af5a6a0fe27342958bbf8c4ad35e79aca6bf
