CLI access to ALL_POSTS.txt, ARCHIVE.txt and documentation from vojtamaur.cz
vojtamaur
A small pip-installable command-line tool for working with the public text artifacts of vojtamaur.cz:
- ALL_POSTS.txt
- ARCHIVE.txt
- the documentation page at /documentation/
The tool is not a website parser, crawler, CMS, sync daemon, or background service. It works mainly with published plaintext/HTML endpoints, keeps a local cache of the last successfully loaded artifacts, and also includes embedded package snapshots of ALL_POSTS.txt and ARCHIVE.txt as a final fallback.
Installation
From a local repository checkout:
python -m pip install .
For development:
python -m pip install -e .
After publication on PyPI:
python -m pip install vojtamaur
From GitHub:
python -m pip install git+https://github.com/VojtaMaur/vojtamaur-python.git
From a specific Git tag:
python -m pip install git+https://github.com/VojtaMaur/vojtamaur-python.git@v0.1.3
Quick usage
vojtamaur --help
vojtamaur posts
vojtamaur posts --save
vojtamaur archive
vojtamaur archive --save
vojtamaur docs
vojtamaur docs --save
vojtamaur grep metaprogram
vojtamaur search-url archive.today
vojtamaur stats
vojtamaur head 40
vojtamaur random
vojtamaur random --print-only
vojtamaur status
vojtamaur status --limit 10
vojtamaur verify
vojtamaur open site
vojtamaur open docs
vojtamaur open archive-link 1
Source strategy
For each artifact, the tool tries to use the most current available source first.
For posts and archive, the fallback order is:
- primary URL from SITE_SOURCES
- additional live fallback URLs from SITE_SOURCES, in order
- local cache
- embedded package snapshot
For docs, the fallback order is:
- primary URL from SITE_SOURCES
- additional live fallback URLs from SITE_SOURCES, in order
- local cache
docs is not embedded in the package. The package snapshot is intended only for the two compact plaintext artifacts.
If all available sources fail, the command exits with an error.
Adding another live fallback
Live deployment fallbacks are configured in src/vojtamaur/constants.py through SITE_SOURCES:
SITE_SOURCES = [
("main", "https://vojtamaur.cz"),
("fallback", "https://vojtamaur.neocities.org"),
]
To add another deployment, append another (label, base_url) tuple. The posts, archive, and docs endpoint URLs are generated from this list. The order is the priority order used by the fetch logic.
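As a rough sketch of how endpoint URLs could be derived from this list: the helper `endpoint_urls` and the `ENDPOINT_PATHS` mapping below are hypothetical, and the paths for the two text files are assumed (only /documentation/ is stated in this README).

```python
from typing import List, Tuple

SITE_SOURCES = [
    ("main", "https://vojtamaur.cz"),
    ("fallback", "https://vojtamaur.neocities.org"),
]

# Assumed endpoint paths; the real constants may differ.
ENDPOINT_PATHS = {
    "posts": "/ALL_POSTS.txt",
    "archive": "/ARCHIVE.txt",
    "docs": "/documentation/",
}


def endpoint_urls(artifact: str, sources=SITE_SOURCES) -> List[Tuple[str, str]]:
    """Return (label, url) pairs in priority order for one artifact."""
    path = ENDPOINT_PATHS[artifact]
    return [(label, base.rstrip("/") + path) for label, base in sources]


# Appending a deployment extends every artifact's URL list at once.
extended = SITE_SOURCES + [("mirror", "https://example.org/")]
for label, url in endpoint_urls("docs", extended):
    print(label, url)
```

Because the list order is preserved, a newly appended deployment is always tried last among the live sources.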
Cache
The cache location is platform-specific:
- Windows: %LOCALAPPDATA%/vojtamaur/
- macOS: ~/Library/Caches/vojtamaur/
- Linux/Unix: $XDG_CACHE_HOME/vojtamaur/ or ~/.cache/vojtamaur/
Override on Windows CMD:
set VOJTAMAUR_CACHE_DIR=C:\temp\vojtamaur-cache
Override on PowerShell:
$env:VOJTAMAUR_CACHE_DIR = "C:\temp\vojtamaur-cache"
Override on Unix-like systems:
export VOJTAMAUR_CACHE_DIR=/tmp/vojtamaur-cache
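The lookup rules above could be resolved roughly as follows. This is a sketch under assumptions: `cache_dir` is a hypothetical helper, and the `AppData/Local` fallback used when LOCALAPPDATA is unset is a guess, not documented behavior.

```python
import os
import sys
from pathlib import Path
from typing import Mapping


def cache_dir(env: Mapping[str, str] = os.environ, platform: str = sys.platform) -> Path:
    """Resolve the cache directory from the override and platform rules."""
    override = env.get("VOJTAMAUR_CACHE_DIR")
    if override:
        # An explicit override wins on every platform.
        return Path(override)
    if platform.startswith("win"):
        base = env.get("LOCALAPPDATA")
        # Assumed fallback when LOCALAPPDATA is missing.
        root = Path(base) if base else Path.home() / "AppData" / "Local"
    elif platform == "darwin":
        root = Path.home() / "Library" / "Caches"
    else:
        xdg = env.get("XDG_CACHE_HOME")
        root = Path(xdg) if xdg else Path.home() / ".cache"
    return root / "vojtamaur"


print(cache_dir())
```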
Offline mode
vojtamaur posts --offline
vojtamaur archive --offline
vojtamaur docs --offline
vojtamaur stats --offline
Or globally through the environment:
export VOJTAMAUR_OFFLINE=1
In offline mode, posts and archive use the local cache first and the embedded package snapshot if no cache exists. docs uses only the local cache because documentation HTML is not embedded.
Timeout
The default network timeout is 3 seconds.
vojtamaur status --timeout 5
Or:
export VOJTAMAUR_TIMEOUT=5
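One plausible way to combine the flag, the environment variable, and the default is shown below. The helper `resolve_timeout` is hypothetical, and both the precedence (flag over environment) and the handling of invalid values are assumptions rather than documented behavior.

```python
import os
from typing import Mapping, Optional

DEFAULT_TIMEOUT = 3.0  # seconds, the documented default


def resolve_timeout(cli_value: Optional[float] = None,
                    env: Mapping[str, str] = os.environ) -> float:
    """Pick the timeout: --timeout first, then VOJTAMAUR_TIMEOUT, then 3 s."""
    if cli_value is not None:
        return float(cli_value)
    raw = env.get("VOJTAMAUR_TIMEOUT", "").strip()
    if raw:
        try:
            value = float(raw)
        except ValueError:
            # Assumed behavior: ignore unparsable values.
            return DEFAULT_TIMEOUT
        if value > 0:
            return value
    return DEFAULT_TIMEOUT


print(resolve_timeout(env={"VOJTAMAUR_TIMEOUT": "5"}))
```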
Commands
posts
Prints or saves ALL_POSTS.txt.
vojtamaur posts
vojtamaur posts --save
vojtamaur posts --save my_copy.txt
archive
Prints or saves ARCHIVE.txt.
vojtamaur archive
vojtamaur archive --save
docs
Downloads the documentation page from /documentation/. By default, it prints a simple plaintext extraction from the HTML. With --raw, it prints the original HTML. With --save, it saves the raw HTML.
vojtamaur docs
vojtamaur docs --raw
vojtamaur docs --save
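A "simple plaintext extraction" of this kind can be done with the standard library alone. The sketch below uses `html.parser.HTMLParser`; the class `TextExtractor` and the function `html_to_text` are illustrative names, and the tool's own extraction may keep or drop different elements.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return "\n".join(self._chunks)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.text()


sample = ("<html><head><style>p{}</style></head><body>"
          "<h1>Docs</h1><p>Hello &amp; bye</p><script>x()</script></body></html>")
print(html_to_text(sample))
```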
grep
Searches ALL_POSTS.txt as plain text.
vojtamaur grep DullGPT
vojtamaur grep "Boltzmannovy mozky" --context 2
vojtamaur grep Metaweb --case-sensitive
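For illustration, a plain-text search with context lines could look like the sketch below. It assumes substring matching (not regular expressions) and case-insensitive matching by default, which the --case-sensitive flag suggests but this README does not state outright; `grep_lines` is a hypothetical helper.

```python
from typing import Iterator, Tuple


def grep_lines(text: str, pattern: str, context: int = 0,
               case_sensitive: bool = False) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for matches plus surrounding context."""
    lines = text.splitlines()
    needle = pattern if case_sensitive else pattern.casefold()
    wanted = set()
    for i, line in enumerate(lines):
        haystack = line if case_sensitive else line.casefold()
        if needle in haystack:
            # Include `context` lines before and after each match.
            start = max(0, i - context)
            stop = min(len(lines), i + context + 1)
            wanted.update(range(start, stop))
    for i in sorted(wanted):
        yield i + 1, lines[i]


sample = "alpha\nbeta Metaweb\ngamma\ndelta\nmetaweb again"
for number, line in grep_lines(sample, "Metaweb", context=1, case_sensitive=True):
    print(number, line)
```

Collecting line indexes in a set keeps overlapping context windows from printing the same line twice.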
search-url
Searches URLs found in ARCHIVE.txt.
vojtamaur search-url arquivo
vojtamaur search-url archive.today
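The idea behind this command can be sketched as "extract every URL, then filter by substring". The exact layout of ARCHIVE.txt is not described here, so the regular expression, the trailing-punctuation cleanup, and the helper names `find_urls` and `search_urls` are all assumptions made for the example.

```python
import re
from typing import List

# Deliberately loose pattern: anything from http(s):// up to whitespace or a bracket.
URL_RE = re.compile(r"https?://[^\s<>()\[\]]+")


def find_urls(text: str) -> List[str]:
    """Return unique URLs in order of first appearance."""
    seen = {}
    for match in URL_RE.finditer(text):
        url = match.group(0).rstrip(".,;")
        seen.setdefault(url, None)
    return list(seen)


def search_urls(text: str, query: str) -> List[str]:
    """Return URLs containing `query`, ignoring case."""
    needle = query.casefold()
    return [url for url in find_urls(text) if needle in url.casefold()]


sample = (
    "Post A\n"
    "https://archive.today/abc\n"
    "https://arquivo.pt/wayback/123\n"
    "http://example.org/x.\n"
    "https://archive.today/abc\n"
)
print(search_urls(sample, "arquivo"))
```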
stats
Prints basic statistics: byte size, character count, word count, line count, entry count, unique slug count, languages, sections, and the number of unique archive links.
vojtamaur stats
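The format-independent part of these statistics is easy to reproduce. The sketch below covers only byte, character, word, and line counts; entry, slug, language, and section counts depend on the structure of ALL_POSTS.txt, which is not specified here, so they are left out. `basic_stats` is a hypothetical helper.

```python
def basic_stats(text: str) -> dict:
    """Compute size and count statistics for a piece of text."""
    return {
        "bytes": len(text.encode("utf-8")),  # size as UTF-8 on disk
        "characters": len(text),             # Unicode code points
        "words": len(text.split()),          # whitespace-separated tokens
        "lines": len(text.splitlines()),
    }


sample = "Ahoj světe\ndruhý řádek\n"
print(basic_stats(sample))
```

Byte size and character count differ whenever the text contains non-ASCII characters, which is why both are worth reporting for Czech text.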
head
Prints the first N lines of ALL_POSTS.txt.
vojtamaur head
vojtamaur head 80
random
Selects a random URL from URL: headers in ALL_POSTS.txt.
vojtamaur random
vojtamaur random --print-only
By default, the selected URL is also opened in the browser.
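A minimal sketch of this selection, assuming each post header contains a line of the form `URL: <address>` starting at the beginning of the line (the exact header layout is an assumption). The helpers `post_urls` and `pick_random_url` are hypothetical; the browser is opened with the standard `webbrowser` module only when requested.

```python
import random
import webbrowser
from typing import List


def post_urls(text: str) -> List[str]:
    """Collect the values of all 'URL:' header lines."""
    urls = []
    for line in text.splitlines():
        if line.startswith("URL:"):
            url = line[len("URL:"):].strip()
            if url:
                urls.append(url)
    return urls


def pick_random_url(text: str, rng=random, open_browser: bool = False) -> str:
    """Choose one post URL at random and optionally open it."""
    urls = post_urls(text)
    if not urls:
        raise ValueError("no URL: headers found")
    url = rng.choice(urls)
    if open_browser:
        webbrowser.open(url)
    return url


sample = (
    "TITLE: First\nURL: https://vojtamaur.cz/first/\n\ntext\n\n"
    "TITLE: Second\nURL: https://vojtamaur.cz/second/\n\ntext\n"
)
print(pick_random_url(sample))
```

Leaving `open_browser` at its default corresponds to the --print-only behavior.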
status
Checks URLs found in ARCHIVE.txt.
vojtamaur status
vojtamaur status --limit 10
The command uses HEAD first and falls back to GET for selected failures. Plain HTTP URLs are marked as INSECURE_HTTP.
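The HEAD-then-GET pattern can be sketched with `urllib.request`. Several details below are assumptions: the set of HEAD results that trigger a GET retry, the result labels other than INSECURE_HTTP, and the choice not to contact plain HTTP URLs at all. `http_status` and `check_url` are hypothetical helpers.

```python
import urllib.error
import urllib.request
from typing import Callable

# Assumed HEAD results worth retrying with GET (servers that reject HEAD).
RETRY_WITH_GET = {403, 405, 501}


def http_status(url: str, method: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for one request."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses arrive as exceptions; keep the code.
        return error.code


def check_url(url: str, fetch: Callable[[str, str, float], int] = http_status,
              timeout: float = 3.0) -> str:
    """Classify a URL using HEAD first and GET as a fallback."""
    if url.startswith("http://"):
        return "INSECURE_HTTP"
    try:
        status = fetch(url, "HEAD", timeout)
        if status in RETRY_WITH_GET:
            status = fetch(url, "GET", timeout)
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return "UNREACHABLE"
    return "OK" if 200 <= status < 400 else f"HTTP_{status}"


print(check_url("http://example.org/"))
```

Passing `fetch` as a parameter makes the classification testable without network access.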
verify
Runs a basic health check:
- primary and fallback sources for posts, archive, and docs
- embedded package snapshots for posts and archive
- local cache writability
- cache decoding, if cache files exist
- URL parsing from ALL_POSTS.txt and ARCHIVE.txt
vojtamaur verify
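As an example of one of these checks, cache writability can be probed by creating and removing a temporary file in the cache directory. This is only a sketch of the idea; `cache_is_writable` is a hypothetical helper, and the tool's own check may work differently.

```python
import tempfile
from pathlib import Path


def cache_is_writable(directory: Path) -> bool:
    """Return True if a file can be created inside `directory`."""
    try:
        directory.mkdir(parents=True, exist_ok=True)
        # The temporary file is deleted again when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
    except OSError:
        return False
    return True


with tempfile.TemporaryDirectory() as scratch:
    print(cache_is_writable(Path(scratch) / "vojtamaur"))
```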
open
Opens a known target or explicit URL. For posts, archive, and docs, normal mode opens the canonical online URL. With --offline, it opens the corresponding local cache file.
vojtamaur open site
vojtamaur open fallback
vojtamaur open posts
vojtamaur open archive
vojtamaur open docs
vojtamaur open posts --offline
vojtamaur open archive --offline
vojtamaur open docs --offline
vojtamaur open random
vojtamaur open archive-link 1
vojtamaur open https://vojtamaur.cz/metawebovy-clanek/
Embedded snapshots
The package includes bundled fallback copies of:
- src/vojtamaur/data/ALL_POSTS.txt
- src/vojtamaur/data/ARCHIVE.txt
These files make the installed package partially useful even if the website, fallback deployment, and cache are unavailable. They also make package distributions, wheels, source archives, PyPI mirrors, pip caches, and installed environments act as additional copies of the text artifacts.
Before publishing a new release, refresh the embedded snapshots:
python scripts/refresh_embedded_data.py
Then verify:
python -m unittest
vojtamaur verify
vojtamaur stats --offline
What this tool does not do
- it does not parse the rendered website as the source of articles
- it does not download Markdown/MDX source files
- it does not synchronize the repository
- it does not compute diffs
- it does not store a database
- it does not run in the background
- it has no runtime dependencies outside the Python standard library
Limitations
ALL_POSTS.txt is a text export, not a complete replica of the website. Media, iframes, PDFs, and selected long blocks are represented by placeholders or omitted. This is intentional. The tool works with the text sediment, not the full rendered website.
The embedded snapshots are release snapshots. The live endpoints remain the preferred source when available.
Tests
python -m unittest
Build
python -m pip install build
python -m build
Embedded fallback snapshots
The Python CLI package also contains embedded fallback copies of selected text artifacts:
- ALL_POSTS.txt
- ARCHIVE.txt
These embedded files are bundled directly inside the package so the CLI can still function in degraded or offline situations even if the live website or mirrors become unavailable.
Before publishing a new package release, refresh the embedded snapshots:
python scripts/refresh_embedded_data.py
Publishing
Publish through PyPI Trusted Publishing in GitHub Actions or upload manually with Twine. A PyPI account is required.