CLI access to ALL_POSTS.txt, ARCHIVE.txt and documentation from vojtamaur.cz
Project description
vojtamaur
A small pip-installable command-line tool for working with selected public artifacts of vojtamaur.cz:
- ALL_POSTS.txt
- ARCHIVE.txt
- the documentation page at /documentation/
- kurt-godel-rat.jpg
The tool is not a website parser, crawler, CMS, sync daemon, image processor, or background service. It works mainly with published plaintext/HTML/image endpoints, keeps a local cache of the last successfully loaded artifacts, and includes embedded package snapshots of ALL_POSTS.txt, ARCHIVE.txt, and kurt-godel-rat.jpg as final fallbacks.
Installation
From PyPI:
python -m pip install vojtamaur
From a local repository checkout:
python -m pip install .
For development:
python -m pip install -e .
From GitHub:
python -m pip install git+https://github.com/VojtaMaur/vojtamaur-python.git
From a specific Git tag:
python -m pip install git+https://github.com/VojtaMaur/vojtamaur-python.git@v0.1.4
Quick usage
vojtamaur --help
vojtamaur posts
vojtamaur posts --save
vojtamaur archive
vojtamaur archive --save
vojtamaur docs
vojtamaur docs --save
vojtamaur rat
vojtamaur rat --save
vojtamaur rat --offline
vojtamaur grep metaprogram
vojtamaur search-url archive.today
vojtamaur stats
vojtamaur head 40
vojtamaur random
vojtamaur random --print-only
vojtamaur status
vojtamaur status --limit 10
vojtamaur verify
vojtamaur open site
vojtamaur open docs
vojtamaur open rat
vojtamaur open rat --offline
vojtamaur open archive-link 1
Artifacts
The CLI currently knows these artifact kinds:
| Command / kind | Online path | Embedded in package | Notes |
|---|---|---|---|
| posts | /ALL_POSTS.txt | yes | Plain-text export of all posts. |
| archive | /ARCHIVE.txt | yes | Archive map and external preservation links. |
| docs | /documentation/ | no | Documentation HTML; cached after a successful fetch. |
| rat | /images/kurt-godel-rat.jpg | yes | Binary image artifact. Saved as a file, not printed to the terminal. |
Source strategy
For each artifact, the tool tries to use the most current available source first.
For posts, archive, and rat, the fallback order is:
- primary URL from SITE_SOURCES
- additional live fallback URLs from SITE_SOURCES, in order
- local cache
- embedded package snapshot
For docs, the fallback order is:
- primary URL from SITE_SOURCES
- additional live fallback URLs from SITE_SOURCES, in order
- local cache
docs is not embedded in the package. The embedded snapshots are intended for compact, directly reusable artifacts, not for every page of the website.
If all available sources fail, the command exits with an error.
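The fallback order above amounts to a loop over candidate sources that stops at the first success. The following is a minimal sketch under assumptions, not the package's actual fetch logic: the name `fetch_with_fallback` and the idea of passing each source as a `(label, loader)` pair are hypothetical.

```python
from typing import Callable, Iterable

# Hypothetical: a source is a (label, loader) pair; the loader returns the
# artifact bytes or raises (for example OSError / urllib.error.URLError).
Source = tuple[str, Callable[[], bytes]]


def fetch_with_fallback(sources: Iterable[Source]) -> tuple[str, bytes]:
    """Return (label, data) from the first source that succeeds."""
    errors: list[str] = []
    for label, loader in sources:
        try:
            return label, loader()
        except Exception as exc:  # any failure moves on to the next source
            errors.append(f"{label}: {exc}")
    # Mirrors "If all available sources fail, the command exits with an error."
    raise RuntimeError("all sources failed: " + "; ".join(errors))
```

In this sketch the loaders would be, in order, the live URLs from SITE_SOURCES, a read of the cache file, and (for posts, archive, and rat only) a read of the embedded snapshot, for example via `importlib.resources.files("vojtamaur") / "data" / "ALL_POSTS.txt"`, assuming the data directory is shipped as package data.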
Adding another live fallback
Live deployment fallbacks are configured in src/vojtamaur/constants.py through SITE_SOURCES:
SITE_SOURCES: list[tuple[str, str]] = [
    ("main", "https://vojtamaur.cz"),
    ("fallback", "https://vojtamaur.neocities.org"),
    ("github_pages", "https://vojtamaur.github.io/vojtamaur-web"),
    (
        "ardrive",
        "https://db6beycsnxhli2vxsahgn3ajpsi6qv5alttkr4d3sfwrj7uurqfq.ardrive.net/GHwSYFJtzrRqt5AOZuwJfJHoV6Bc5qjwe5FtFP6UjAs",
    ),
]
To add another deployment, append another (label, base_url) tuple. Use a base URL without a trailing slash. The posts, archive, docs, and rat endpoint URLs are generated from this list. The order is the priority order used by the fetch logic.
Only add live deployments that expose the same relative artifact paths. Repository browsers, catalog records, web archives, and one-off snapshots belong in ARCHIVE.txt, not in SITE_SOURCES.
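To illustrate why base URLs must have no trailing slash and must expose the same relative paths, here is a minimal sketch of generating endpoint URLs from SITE_SOURCES. The relative paths come from the artifact table; `ARTIFACT_PATHS` and `endpoint_urls` are hypothetical names, and the ardrive entry is omitted for brevity.

```python
SITE_SOURCES: list[tuple[str, str]] = [
    ("main", "https://vojtamaur.cz"),
    ("fallback", "https://vojtamaur.neocities.org"),
    ("github_pages", "https://vojtamaur.github.io/vojtamaur-web"),
]

# Relative artifact paths from the Artifacts table (hypothetical mapping name).
ARTIFACT_PATHS: dict[str, str] = {
    "posts": "/ALL_POSTS.txt",
    "archive": "/ARCHIVE.txt",
    "docs": "/documentation/",
    "rat": "/images/kurt-godel-rat.jpg",
}


def endpoint_urls(kind: str) -> list[tuple[str, str]]:
    """Return (label, url) pairs for one artifact kind, in priority order."""
    path = ARTIFACT_PATHS[kind]
    # Base URLs carry no trailing slash, so plain concatenation is safe.
    return [(label, base + path) for label, base in SITE_SOURCES]
```

Plain concatenation is used deliberately: `urllib.parse.urljoin("https://vojtamaur.github.io/vojtamaur-web", "/ALL_POSTS.txt")` would drop the `/vojtamaur-web` subpath and point at the wrong location.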
Cache
The cache location is platform-specific:
- Windows: %LOCALAPPDATA%/vojtamaur/
- macOS: ~/Library/Caches/vojtamaur/
- Linux/Unix: $XDG_CACHE_HOME/vojtamaur/ or ~/.cache/vojtamaur/
Override on Windows CMD:
set VOJTAMAUR_CACHE_DIR=C:\temp\vojtamaur-cache
Override on PowerShell:
$env:VOJTAMAUR_CACHE_DIR = "C:\temp\vojtamaur-cache"
Override on Unix-like systems:
export VOJTAMAUR_CACHE_DIR=/tmp/vojtamaur-cache
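The cache rules above can be expressed as a small resolution function. This is a sketch, not the package's code: `cache_dir` is a hypothetical name, and the Windows fallback used when LOCALAPPDATA is unset is an assumption.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def cache_dir(env: Optional[Mapping[str, str]] = None, platform_name: str = sys.platform) -> Path:
    """Return the vojtamaur cache directory for the given environment and platform."""
    env = os.environ if env is None else env
    override = env.get("VOJTAMAUR_CACHE_DIR")
    if override:  # the explicit override wins on every platform
        return Path(override)
    if platform_name.startswith("win"):
        # Assumption: use the usual LocalAppData location if the variable is unset.
        base = env.get("LOCALAPPDATA") or str(Path.home() / "AppData" / "Local")
        return Path(base) / "vojtamaur"
    if platform_name == "darwin":
        return Path.home() / "Library" / "Caches" / "vojtamaur"
    # Linux/Unix: $XDG_CACHE_HOME if set, otherwise ~/.cache
    xdg = env.get("XDG_CACHE_HOME")
    return (Path(xdg) if xdg else Path.home() / ".cache") / "vojtamaur"
```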
Offline mode
vojtamaur posts --offline
vojtamaur archive --offline
vojtamaur docs --offline
vojtamaur rat --offline
vojtamaur stats --offline
vojtamaur open rat --offline
Or globally through the environment:
export VOJTAMAUR_OFFLINE=1
In offline mode, posts, archive, and rat use the local cache first and the embedded package snapshot if no cache exists. docs uses only the local cache because documentation HTML is not embedded.
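The offline rules can be summarized as an ordered list of source categories per artifact kind. A minimal sketch, assuming hypothetical helper names; the README only shows `VOJTAMAUR_OFFLINE=1`, so accepting "true", "yes", and "on" as well is an assumption.

```python
import os
from typing import Mapping, Optional

EMBEDDED_KINDS = {"posts", "archive", "rat"}  # docs has no embedded snapshot


def is_offline(flag: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """--offline wins; otherwise check VOJTAMAUR_OFFLINE (accepted values assumed)."""
    env = os.environ if env is None else env
    value = env.get("VOJTAMAUR_OFFLINE", "").strip().lower()
    return flag or value in {"1", "true", "yes", "on"}


def source_order(kind: str, offline: bool) -> list[str]:
    """Return the ordered source categories tried for one artifact kind."""
    order = [] if offline else ["live"]  # offline mode skips every network source
    order.append("cache")
    if kind in EMBEDDED_KINDS:
        order.append("embedded")
    return order
```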
Timeout
The default network timeout is 3 seconds.
vojtamaur status --timeout 5
Or:
export VOJTAMAUR_TIMEOUT=5
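The README does not state which setting wins when both `--timeout` and VOJTAMAUR_TIMEOUT are given. The sketch below assumes the common convention (command-line flag, then environment variable, then the 3-second default); `resolve_timeout` is a hypothetical name.

```python
import os
from typing import Mapping, Optional

DEFAULT_TIMEOUT = 3.0  # seconds, the documented default


def resolve_timeout(cli_value: Optional[float], env: Optional[Mapping[str, str]] = None) -> float:
    """Assumed precedence: --timeout, then VOJTAMAUR_TIMEOUT, then the default."""
    env = os.environ if env is None else env
    if cli_value is not None:
        return float(cli_value)
    raw = env.get("VOJTAMAUR_TIMEOUT")
    if raw:
        try:
            return float(raw)
        except ValueError:
            pass  # ignore a malformed value and fall back to the default
    return DEFAULT_TIMEOUT
```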
Commands
posts
Prints or saves ALL_POSTS.txt.
vojtamaur posts
vojtamaur posts --save
vojtamaur posts --save my_copy.txt
archive
Prints or saves ARCHIVE.txt.
vojtamaur archive
vojtamaur archive --save
vojtamaur archive --save my_archive.txt
docs
Downloads the documentation page from /documentation/. By default, it prints a simple plaintext extraction from the HTML. With --raw, it prints the original HTML. With --save, it saves the raw HTML.
vojtamaur docs
vojtamaur docs --raw
vojtamaur docs --save
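The README does not describe how the "simple plaintext extraction" works. As a standard-library-only sketch (an assumption, not the package's method), `html.parser` can collect visible text, skip script and style contents, and break lines at common block-level tags. The block-tag set and helper names are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text; skip <script>/<style>; newline at block tags."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the non-empty text lines, stripped of surrounding whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```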
rat
Downloads kurt-godel-rat.jpg.
Because this is a binary artifact, the command saves the image file instead of printing raw JPEG bytes to the terminal.
vojtamaur rat
vojtamaur rat --save
vojtamaur rat --save my_rat.jpg
vojtamaur rat --offline
vojtamaur kurt-godel-rat --offline
grep
Searches ALL_POSTS.txt as plain text.
vojtamaur grep DullGPT
vojtamaur grep "Boltzmannovy mozky" --context 2
vojtamaur grep Metaweb --case-sensitive
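A plain-text search with optional context lines can be sketched as follows. Treating the search as case-insensitive unless `--case-sensitive` is given is inferred from the flag; the `grep` function name and the `lineno:line` output format are hypothetical.

```python
def grep(text: str, pattern: str, context: int = 0, case_sensitive: bool = False) -> list[str]:
    """Return matching lines, plus `context` lines around each, as 'lineno:line'."""
    lines = text.splitlines()
    needle = pattern if case_sensitive else pattern.casefold()
    keep: set[int] = set()
    for i, line in enumerate(lines):
        haystack = line if case_sensitive else line.casefold()
        if needle in haystack:  # plain substring match, no regular expressions
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    return [f"{i + 1}:{lines[i]}" for i in sorted(keep)]
```

Collecting line indices in a set means overlapping context windows around nearby matches are printed only once.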
search-url
Searches URLs found in ARCHIVE.txt.
vojtamaur search-url arquivo
vojtamaur search-url archive.today
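One way to extract and filter URLs from ARCHIVE.txt is a simple regular expression followed by a case-insensitive substring filter. This is a sketch: the pattern, the trimming of trailing punctuation, and the helper names are assumptions, not the package's parser.

```python
import re

# Rough URL pattern: http(s) scheme followed by non-whitespace characters.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text: str) -> list[str]:
    """Return unique URLs in order of first appearance."""
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for match in URL_RE.finditer(text):
        seen.setdefault(match.group(0).rstrip(").,;"), None)
    return list(seen)


def search_urls(text: str, query: str) -> list[str]:
    """Case-insensitive substring filter over the extracted URLs."""
    q = query.casefold()
    return [url for url in extract_urls(text) if q in url.casefold()]
```

Stripping a trailing `)` handles URLs written in parentheses, but would also cut a URL that legitimately ends with one; a real parser may need a stricter rule.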
stats
Prints basic statistics for ALL_POSTS.txt and ARCHIVE.txt: byte size, character count, word count, line count, entry count, unique slug count, languages, sections, and the number of unique archive links.
vojtamaur stats
vojtamaur stats --offline
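The byte, character, word, and line counts can be computed with the standard library alone; the other statistics depend on the ALL_POSTS.txt and ARCHIVE.txt formats and are left out here. This sketch assumes UTF-8 and uses a hypothetical `basic_stats` name. It also shows why byte size and character count differ for Czech text with diacritics.

```python
def basic_stats(data: bytes, encoding: str = "utf-8") -> dict[str, int]:
    """Byte size, character, word, and line counts for one text artifact."""
    text = data.decode(encoding)
    return {
        "bytes": len(data),              # size on disk
        "characters": len(text),         # code points after decoding
        "words": len(text.split()),      # whitespace-separated tokens
        "lines": len(text.splitlines()),
    }
```

For example, `"Ahoj světe\nDruhý řádek\n"` has 23 characters but 27 bytes, because each of ě, ý, ř, and á takes two bytes in UTF-8.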
head
Prints the first N lines of ALL_POSTS.txt.
vojtamaur head
vojtamaur head 80
random
Selects a random URL from URL: headers in ALL_POSTS.txt.
vojtamaur random
vojtamaur random --print-only
By default, the selected URL is also opened in the browser.
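A sketch of the random command, assuming that URL headers appear as `URL: <address>` at the start of a line in ALL_POSTS.txt; the helper names are hypothetical. The browser is opened with the standard `webbrowser` module unless printing only.

```python
import random
import re
import webbrowser
from typing import Optional

# Assumption: headers look like "URL: https://..." at the start of a line.
URL_HEADER_RE = re.compile(r"^URL:\s*(\S+)", re.MULTILINE)


def pick_random_url(posts_text: str, rng: Optional[random.Random] = None) -> str:
    """Choose one URL from the URL: headers; pass rng for reproducible picks."""
    urls = URL_HEADER_RE.findall(posts_text)
    if not urls:
        raise ValueError("no URL: headers found")
    return (rng or random).choice(urls)


def random_command(posts_text: str, print_only: bool = False) -> str:
    url = pick_random_url(posts_text)
    print(url)
    if not print_only:
        webbrowser.open(url)  # default behaviour: also open the browser
    return url
```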
status
Checks URLs found in ARCHIVE.txt.
vojtamaur status
vojtamaur status --limit 10
The command uses HEAD first and falls back to GET for selected failures. Plain HTTP URLs are marked as INSECURE_HTTP.
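The HEAD-then-GET check can be sketched with `urllib.request`. The README does not list which failures trigger the GET retry; treating 403, 405, and 501 as "HEAD not supported" is an assumption, as are the status-string format and the injectable `opener` parameter (used here so the logic can be tested without a network).

```python
import urllib.error
import urllib.request
from typing import Any, Callable

# Assumption: HEAD responses that commonly mean "try GET instead".
RETRY_WITH_GET = {403, 405, 501}


def _request(opener: Callable[..., Any], url: str, method: str, timeout: float) -> int:
    with opener(urllib.request.Request(url, method=method), timeout=timeout) as resp:
        return resp.status


def check_url(url: str, timeout: float = 3.0,
              opener: Callable[..., Any] = urllib.request.urlopen) -> str:
    """Return a status string such as '200' or '404 INSECURE_HTTP'."""
    suffix = " INSECURE_HTTP" if url.lower().startswith("http://") else ""
    try:
        return f"{_request(opener, url, 'HEAD', timeout)}{suffix}"
    except urllib.error.HTTPError as exc:  # must precede OSError (subclass)
        if exc.code not in RETRY_WITH_GET:
            return f"{exc.code}{suffix}"
    except OSError as exc:  # URLError, timeouts, connection errors
        return f"ERROR {exc}{suffix}"
    # HEAD was refused in a way that suggests the server only supports GET.
    try:
        return f"{_request(opener, url, 'GET', timeout)}{suffix}"
    except urllib.error.HTTPError as exc:
        return f"{exc.code}{suffix}"
    except OSError as exc:
        return f"ERROR {exc}{suffix}"
```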
verify
Runs a basic health check:
- primary and fallback sources for posts, archive, and docs
- embedded package snapshots for text artifacts
- local cache writability
- cache decoding for text/HTML cache files, if they exist
- URL parsing from ALL_POSTS.txt and ARCHIVE.txt
vojtamaur verify
This is a practical availability and parser check. It is not a cryptographic provenance system. If you need strict integrity verification, use the website's generated checksum artifacts or external SHA-256 tooling.
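For the external SHA-256 check, Python's `hashlib` is sufficient. This sketch uses hypothetical helper names and reads the file in chunks so large files do not have to fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare against a published digest, ignoring case and stray whitespace."""
    return sha256_file(path) == expected_hex.strip().lower()
```

For example, a downloaded source archive could be checked against the SHA256 digest published for it in the PyPI file details: `matches(Path("vojtamaur-0.1.4.tar.gz"), "ff1b435ded935c3b195c394230a40f614b0e3cacc7c497b6def8cd32cef785e7")`.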
open
Opens a known target or explicit URL. For posts, archive, docs, and rat, normal mode opens the canonical online URL. With --offline, it opens the corresponding local cache file.
vojtamaur open site
vojtamaur open fallback
vojtamaur open posts
vojtamaur open archive
vojtamaur open docs
vojtamaur open rat
vojtamaur open posts --offline
vojtamaur open archive --offline
vojtamaur open docs --offline
vojtamaur open rat --offline
vojtamaur open random
vojtamaur open archive-link 1
vojtamaur open https://vojtamaur.cz/metawebovy-clanek/
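Target resolution for open can be sketched as a lookup: an explicit URL passes through, `--offline` maps an artifact to a `file://` URI in the cache, and otherwise the canonical online URL is used. The target table, the cache file names (especially for docs), mapping "fallback" to the Neocities deployment, and 1-based numbering for archive-link are all assumptions for illustration.

```python
import webbrowser
from pathlib import Path

# Hypothetical tables; the package's real target names and file names may differ.
TARGETS = {
    "site": "https://vojtamaur.cz/",
    "fallback": "https://vojtamaur.neocities.org/",
    "posts": "https://vojtamaur.cz/ALL_POSTS.txt",
    "archive": "https://vojtamaur.cz/ARCHIVE.txt",
    "docs": "https://vojtamaur.cz/documentation/",
    "rat": "https://vojtamaur.cz/images/kurt-godel-rat.jpg",
}
CACHE_NAMES = {
    "posts": "ALL_POSTS.txt",
    "archive": "ARCHIVE.txt",
    "docs": "documentation.html",
    "rat": "kurt-godel-rat.jpg",
}


def resolve_target(target: str, offline: bool, cache_dir: Path) -> str:
    if target.startswith(("http://", "https://")):
        return target  # explicit URL, opened as given
    if offline and target in CACHE_NAMES:
        # as_uri() needs an absolute path, hence resolve()
        return (cache_dir / CACHE_NAMES[target]).resolve().as_uri()
    if target in TARGETS:
        return TARGETS[target]
    raise ValueError(f"unknown target: {target}")


def archive_link(n: int, links: list[str]) -> str:
    """Pick the n-th URL (1-based, assumed) from the links parsed out of ARCHIVE.txt."""
    if not 1 <= n <= len(links):
        raise IndexError(f"archive link {n} out of range 1..{len(links)}")
    return links[n - 1]


def open_in_browser(url: str) -> bool:
    return webbrowser.open(url)
```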
Embedded snapshots
The package includes bundled fallback copies of:
- src/vojtamaur/data/ALL_POSTS.txt
- src/vojtamaur/data/ARCHIVE.txt
- src/vojtamaur/data/kurt-godel-rat.jpg
These files make the installed package partially useful even if the website, fallback deployments, and cache are unavailable. They also make package distributions, wheels, source archives, PyPI mirrors, pip caches, and installed environments act as additional copies of selected public artifacts.
Before publishing a new release, refresh the embedded snapshots:
python scripts/refresh_embedded_data.py
Then verify:
python -m unittest
vojtamaur verify
vojtamaur stats --offline
vojtamaur rat --offline
What this tool does not do
- it does not parse the rendered website as the source of articles
- it does not download Markdown/MDX source files
- it does not synchronize the repository
- it does not compute diffs
- it does not store a database
- it does not run in the background
- it does not process or rewrite image pixels
- it does not replace external checksum verification
- it does not use runtime dependencies outside the Python standard library
Limitations
ALL_POSTS.txt is a text export, not a complete replica of the website. Media, iframes, PDFs, and selected long blocks are represented by placeholders or omitted. This is intentional. The tool works with the text sediment, not the full rendered website.
kurt-godel-rat.jpg is treated as a binary public artifact. The CLI downloads, caches, saves, and opens it; it does not inspect or modify its image metadata.
The embedded snapshots are release snapshots. The live endpoints remain the preferred source when available.
Tests
python -m unittest
Build
python -m pip install build
python -m build
Publishing
Publish through PyPI Trusted Publishing in GitHub Actions or upload manually with Twine. A PyPI account and project access are required.
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file vojtamaur-0.1.4.tar.gz.
File metadata
- Download URL: vojtamaur-0.1.4.tar.gz
- Upload date:
- Size: 158.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ff1b435ded935c3b195c394230a40f614b0e3cacc7c497b6def8cd32cef785e7 |
| MD5 | 16db81b0b267078c8a133f24075da12d |
| BLAKE2b-256 | b078d9d3fb2e0fc51d358261a0d4993231b0ff45d192f5623ccd3ae6dd005e0d |
File details
Details for the file vojtamaur-0.1.4-py3-none-any.whl.
File metadata
- Download URL: vojtamaur-0.1.4-py3-none-any.whl
- Upload date:
- Size: 154.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5241a4e27aec6ac9db3dca3f321f9dc83f3f87487b80da1e7545eb1883aadaa8 |
| MD5 | 76ebe9f9001f622801ea16dd22e28b64 |
| BLAKE2b-256 | be52edfccb1545f713a932c36e47fb4922e1591593c9de32d2651dc019c5a911 |