# hoardy-web

A tool for displaying and manipulating Web Request+Response (WRR) files of the `Hoardy-Web` WebExtension.

## What is `hoardy-web`?

`hoardy-web` is a tool for displaying, programmatically manipulating, organizing, importing, and exporting Web Request+Response (WRR) files produced by the `Hoardy-Web` WebExtension (also there).
## Quickstart

### Installation

- Install with:

  ```
  pip install hoardy-web
  ```

  and run as

  ```
  hoardy-web --help
  ```

- Alternatively, install it via Nix:

  ```
  nix-env -i -f ./default.nix hoardy-web
  hoardy-web --help
  ```

- Alternatively, run without installing:

  ```
  alias hoardy-web="python3 -m hoardy_web"
  hoardy-web --help
  ```
## Supported input file formats

### Simple WRR-dumps (`*.wrr`)

When you use the `Hoardy-Web` WebExtension together with the `hoardy-web-sas` archiving server, the latter writes the WRR-dumps the `Hoardy-Web` WebExtension generates into separate `.wrr` files (aka "WRR files") in its dumping directory.
No further actions are required to use that data.

The situation is similar if you instead use the `Hoardy-Web` WebExtension with the "Export via `saveAs`" option enabled but the `saveAs`-bundling option disabled (max bundle size set to zero).
The only difference is that the WRR files will be put into `~/Downloads` or similar.

```
ls ~/Downloads/Hoardy-Web-export-*
```
### Bundles of WRR-dumps (`*.wrrb`)

However, if instead of using any of the above you use the `Hoardy-Web` WebExtension with both the "Export via `saveAs`" and bundling options enabled, then, at the moment, you will need to import those `.wrrb` files (aka WRR-bundles) into separate WRR files first:

```
hoardy-web import bundle --to ~/hoardy-web/raw ~/Downloads/Hoardy-Web-export-*
```

Note that `hoardy-web` can parse `.wrr` files as single-dump `.wrrb` files, so the above will work even when some of the exported dumps are simple `.wrr` files (`Hoardy-Web` generates those when only a single dump is available for a bucket, or when a dump is larger than the set maximum bundle size).
So, essentially, the above command is equivalent to:

```
hoardy-web organize --copy --to ~/hoardy-web/raw ~/Downloads/Hoardy-Web-export-*.wrr
hoardy-web import bundle --to ~/hoardy-web/raw ~/Downloads/Hoardy-Web-export-*.wrrb
```
### Other file formats

`hoardy-web` can also use some other file formats as inputs.
See the documentation of the `hoardy-web import` sub-command below for more info.
## How to merge multiple archive directories

To merge multiple input directories into one, you can simply `hoardy-web organize` them `--to` a new directory.
`hoardy-web` will automatically deduplicate all the files in the generated result.

That is to say, for `hoardy-web organize` (see the documentation below for more info):

- `--move` is de-duplicating when possible,
- while `--copy`, `--hardlink`, and `--symlink` are non-duplicating when possible.

For example, if you duplicate an input directory via `--copy` or `--hardlink`:

```
hoardy-web organize --copy --to ~/hoardy-web/copy1 ~/hoardy-web/original
hoardy-web organize --hardlink --to ~/hoardy-web/copy2 ~/hoardy-web/original
```

(In real-life use, different copies usually end up on different backup drives or some such.)
Then, repeating the same commands would be a noop:

```
# noops
hoardy-web organize --copy --to ~/hoardy-web/copy1 ~/hoardy-web/original
hoardy-web organize --hardlink --to ~/hoardy-web/copy2 ~/hoardy-web/original
```

And running the opposite commands would also be a noop:

```
# noops
hoardy-web organize --hardlink --to ~/hoardy-web/copy1 ~/hoardy-web/original
hoardy-web organize --copy --to ~/hoardy-web/copy2 ~/hoardy-web/original
```

And copying between copies is also a noop:

```
# noops
hoardy-web organize --hardlink --to ~/hoardy-web/copy2 ~/hoardy-web/copy1
hoardy-web organize --copy --to ~/hoardy-web/copy2 ~/hoardy-web/copy1
```
But doing `hoardy-web organize --move` while supplying directories that have the same data will deduplicate the results:

```
hoardy-web organize --move --to ~/hoardy-web/all ~/hoardy-web/copy1 ~/hoardy-web/copy2
# `~/hoardy-web/all` will have each file only once
find ~/hoardy-web/copy1 ~/hoardy-web/copy2 -type f
# the output will be empty

hoardy-web organize --move --to ~/hoardy-web/original ~/hoardy-web/all
# `~/hoardy-web/original` will not change iff it is already organized using `--output default`
# otherwise, some files there will be duplicated

find ~/hoardy-web/all -type f
# the output will be empty
```
Similarly, `hoardy-web organize --symlink` resolves its input symlinks and deduplicates its output symlinks:

```
hoardy-web organize --symlink --output hupq_msn --to ~/hoardy-web/pointers ~/hoardy-web/original
hoardy-web organize --symlink --output shupq_msn --to ~/hoardy-web/schemed ~/hoardy-web/original

# noop
hoardy-web organize --symlink --output hupq_msn --to ~/hoardy-web/pointers ~/hoardy-web/original ~/hoardy-web/schemed
```

I.e. the above will produce `~/hoardy-web/pointers` with unique symlinks pointing to each file in `~/hoardy-web/original` only once.
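The deduplication described above is based on file contents. Conceptually, a content-based de-duplicating move can be sketched as the following Python (a simplified, hypothetical model; the real `hoardy-web organize --move` also renames files according to its `--output` format and compares metadata, not just raw bytes):

```python
import hashlib
from pathlib import Path

def dedup_move(inputs: list[Path], dest: Path) -> None:
    """Move files into `dest`, keeping one copy per distinct content.

    A toy sketch of content-based deduplication: files whose bytes
    hash to the same digest are considered duplicates.
    """
    dest.mkdir(parents=True, exist_ok=True)
    seen: dict[str, Path] = {}
    for root in inputs:
        for path in sorted(p for p in root.rglob("*") if p.is_file()):
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            if digest in seen:
                path.unlink()           # duplicate content: drop this copy
            else:
                target = dest / digest  # the real tool derives names from metadata
                path.rename(target)
                seen[digest] = target
```

After running this over two directories holding identical files, the destination holds each distinct content exactly once and the inputs are left empty, mirroring the `find ... -type f` checks above.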
## How to build a file system tree of the latest versions of all hoarded URLs

Assuming you keep your WRR-dumps in `~/hoardy-web/raw`, you can generate a hierarchy of symlinks under `~/hoardy-web/latest`, one per URL, each pointing to the most recent WRR file in `~/hoardy-web/raw` that contains a `200 OK` response, via:

```
hoardy-web organize --symlink --latest --output hupq --to ~/hoardy-web/latest --and "status|~= .200C" ~/hoardy-web/raw
```

Personally, I prefer the `flat_mhs` format (see the documentation of `--output` below), as I dislike deep file hierarchies; using it also simplifies filtering in my `ranger` file browser, so I do this:

```
hoardy-web organize --symlink --latest --output flat_mhs --and "status|~= .200C" --to ~/hoardy-web/latest ~/hoardy-web/raw
```
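The `--and "status|~= .200C"` filter matches the derived `status` attribute (documented below) against a regular expression; e.g. `"C200C"` means a complete request, an HTTP `200` code, and a complete response. A minimal Python sketch of that check (hypothetical helper; assuming `~=` has full-match semantics, which is why the pattern starts with `.`):

```python
import re

def matches_status(status: str, pattern: str) -> bool:
    # `status` is e.g. "C200C": complete request ("C"), HTTP code 200,
    # complete response ("C"); assuming `~=` is a full regex match
    return re.fullmatch(pattern, status) is not None
```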
### Update the tree incrementally, in real time

The above commands rescan the whole contents of `~/hoardy-web/raw` and so can take a while to complete.
If you have a lot of WRR files and you want to keep your symlink tree updated in near-real-time, you will need to use a two-stage pipeline, feeding the output of `hoardy-web organize --zero-terminated` to `hoardy-web organize --stdin0` to perform complex updates.

E.g. the following will move new reqres from `../simple_server/pwebarc-dump` to `~/hoardy-web/raw`, renaming them with `--output default` (the `for` loop is there to preserve buckets/profiles):

```
for arg in ../simple_server/pwebarc-dump/* ; do
  hoardy-web organize --zero-terminated --to ~/hoardy-web/raw/"$(basename "$arg")" "$arg"
done > changes
```

Then, you can reuse the paths saved in the `changes` file to update the symlink tree, like in the above:

```
hoardy-web organize --stdin0 --symlink --latest --output flat_mhs --and "status|~= .200C" --to ~/hoardy-web/latest ~/hoardy-web/raw < changes
```

Then, optionally, you can reuse the `changes` file again to symlink all new files from `~/hoardy-web/raw` to `~/hoardy-web/all`, showing all URL versions, by using the `--output hupq_msn` format:

```
hoardy-web organize --stdin0 --symlink --output hupq_msn --to ~/hoardy-web/all < changes
```
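The `--zero-terminated`/`--stdin0` convention used by the pipeline above simply separates paths with NUL bytes; NUL is chosen because POSIX file names may contain newlines but never NUL bytes. A minimal sketch of the consumer side (a hypothetical helper, not part of `hoardy-web`):

```python
def read_paths0(data: bytes) -> list[str]:
    """Split a NUL-terminated byte stream into file paths.

    The trailing NUL terminates the last path rather than starting
    a new (empty) one, so empty chunks are dropped.
    """
    return [chunk.decode("utf-8") for chunk in data.split(b"\0") if chunk]
```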
## How to generate a local offline website mirror like `wget -mpk`

If you want to render your WRR files into a local offline website mirror containing interlinked HTML files and their resources a-la `wget -mpk` (`wget --mirror --page-requisites --convert-links`), run one of the above `--symlink --latest` commands, and then do something like this:

```
hoardy-web export mirror --to ~/hoardy-web/mirror1 ~/hoardy-web/latest/archiveofourown.org
```

On completion, `~/hoardy-web/mirror1` will contain a bunch of interlinked minimized HTML files, their resources, and everything else available from the WRR files living under `~/hoardy-web/latest/archiveofourown.org`.

The above command might fail if the set of WRR-dumps you are trying to export contains two or more dumps with distinct URLs that map to the same `--output` path.
This will produce an error, since `hoardy-web` does not permit file overwrites.
With the default `--output hupq` format this can happen, for instance, when the URLs recorded in the reqres are long and so end up truncated into the same file system paths.

In this case you can either switch to a more verbose `--output` format:

```
hoardy-web export mirror --output hupq_n --to ~/hoardy-web/mirror1 ~/hoardy-web/latest/archiveofourown.org
```

or skip all reqres that would cause overwrites:

```
hoardy-web export mirror --skip-existing --to ~/hoardy-web/mirror1 ~/hoardy-web/latest/archiveofourown.org
```

or, almost equivalently for this use case, skip all export errors (which includes the "no overwrites allowed" error):

```
hoardy-web export mirror --errors skip --to ~/hoardy-web/mirror1 ~/hoardy-web/latest/archiveofourown.org
```

The latter command would also skip reqres that fail to be exported for other reasons.
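To see why two distinct long URLs can collide, consider a toy URL-to-path mapping in the spirit of (but not identical to) `--output hupq`: the hostname and path are joined and long results are truncated, so URLs differing only past the truncation point map to the same path. This is a hypothetical sketch, not the actual `hoardy-web` algorithm:

```python
from urllib.parse import urlsplit

def toy_output_path(url: str, max_len: int = 120) -> str:
    """Map a URL to a file system path, hostname first, truncating
    long results; a toy model of how distinct long URLs can end up
    at the same `--output` path."""
    parts = urlsplit(url)
    return (parts.hostname + parts.path)[:max_len]
```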
By default, all the links in the exported HTML files will be remapped to local files (even if the source WRR files for those would-be exported files are missing under `~/hoardy-web/latest/archiveofourown.org`; see the documentation for the `--remap-*` options below for more info), and those HTML files will also be stripped of all JavaScript, CSS, and other stuff of various levels of evil (see the documentation for the `scrub` function below for more info).
On the plus side, the result will be completely self-contained and safe to view with a dumb unconfigured browser.

If you are unhappy with this behaviour and, for instance, want to keep the CSS and produce human-readable HTML, run the following instead:

```
hoardy-web export mirror \
  -e 'response.body|eb|scrub response +all_refs,-actions,+styles,+pretty' \
  --to ~/hoardy-web/mirror2 ~/hoardy-web/latest/archiveofourown.org
```

Note, however, that CSS resource filtering and remapping is not implemented yet.

If you also want links that point to not-yet-hoarded Internet URLs to still point to those URLs in the exported files, instead of pointing to non-existent local files, similarly to what `wget -mpk` does, run `hoardy-web export mirror` with `--remap-open`, e.g.:

```
hoardy-web export mirror \
  -e 'response.body|eb|scrub response +all_refs,-actions,+styles,+pretty' \
  --remap-open \
  --to ~/hoardy-web/mirror3 ~/hoardy-web/latest/archiveofourown.org
```

Finally, if you want a mirror made of raw files without any content censorship or link conversions, run:

```
hoardy-web export mirror -e 'response.body|eb' --to ~/hoardy-web/mirror-raw ~/hoardy-web/latest/archiveofourown.org
```

The latter command will render your mirror pretty quickly, but the other above-mentioned commands will call the `scrub` function, and that will be pretty slow (as in avg ~5MB, ~3 files per second on my 2013-era laptop), mostly because `html5lib`, which `hoardy-web` uses for paranoid HTML parsing and filtering, is fairly slow.
### Using `--root` and `--depth`

As an alternative to (or in combination with) keeping a symlink hierarchy of latest versions, you can load (an index of) an assortment of WRR files into `hoardy-web`'s memory but then `export mirror` only select URLs (and all resources needed to properly render those pages) by running something like:

```
hoardy-web export mirror \
  --root 'https://archiveofourown.org/works/3733123?view_adult=true&view_full_work=true' \
  --root 'https://archiveofourown.org/works/30186441?view_adult=true&view_full_work=true' \
  --to ~/hoardy-web/mirror4 ~/hoardy-web/raw/*/2023
```

(`hoardy-web` loads (indexes) WRR files pretty fast, so if you are running from an SSD, you can totally feed it years of WRR files and then only export a couple of URLs, and it will take only a couple of seconds to finish anyway.)

There is also the `--depth` option, which works similarly to `wget`'s `--level` option in that it will follow all jump (`a href`) and action links accessible with no more than `--depth` browser navigations from the recursion `--root`s and then `export mirror` all those URLs (and their resources) too.

When using `--root` options, `--remap-open` works exactly like `wget`'s `--convert-links` in that it will only remap the URLs that are going to be exported and will keep the rest as-is.
Similarly, `--remap-closed` will consider only the URLs reachable from the `--root`s in no more than `--depth` jumps as available.
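Conceptually, `--depth` bounds a breadth-first traversal of the link graph starting from the `--root`s. A sketch of that selection over a toy link graph (hypothetical helper; real `hoardy-web` would additionally pull in each selected page's resources):

```python
from collections import deque

def reachable(links: dict[str, list[str]], roots: list[str], depth: int) -> set[str]:
    """Collect all URLs reachable from `roots` in at most `depth` link jumps."""
    seen = set(roots)
    queue = deque((root, 0) for root in roots)
    while queue:
        url, d = queue.popleft()
        if d == depth:
            continue  # do not navigate further from this page
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, d + 1))
    return seen
```

With `depth=0` only the roots themselves are exported, matching `wget --level`-style semantics.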
## How to generate local offline website mirrors like `wget -mpk` from your old `mitmproxy` stream dumps

Assuming `mitmproxy.001.dump`, `mitmproxy.002.dump`, etc are files that were produced by running something like

```
mitmdump -w +mitmproxy.001.dump
```

at some point, you can generate website mirrors from them by first importing them all to WRR:

```
hoardy-web import mitmproxy --to ~/hoardy-web/mitmproxy mitmproxy.*.dump
```

and then running `export mirror` like above, e.g. to generate mirrors for all URLs:

```
hoardy-web export mirror --to ~/hoardy-web/mirror ~/hoardy-web/mitmproxy
```
## How to generate previews for WRR files, listen to them via TTS, open them with `xdg-open`, etc

See the `script` sub-directory for examples that show how to use `pandoc` and/or `w3m` to turn WRR files into previews and readable plain-text that can be viewed or listened to via other tools, or dump them into temporary raw data files that can then be immediately fed to `xdg-open` for one-click viewing.
## Usage

### hoardy-web

A tool to pretty-print, compute and print values from, search, organize (programmatically rename/move/symlink/hardlink files), import, export, and (WIP) check, deduplicate, and edit `Hoardy-Web`'s WRR (Web Request+Response) archive files.

Terminology: a *reqres* (`Reqres` when a Python type) is an instance of a structure representing an HTTP request+response pair with some additional metadata.

- options:
  - `--version`: show program's version number and exit
  - `-h, --help`: show this help message and exit
  - `--markdown`: show help messages formatted in Markdown

- subcommands `{pprint,get,run,stream,find,organize,import,export}`:
  - `pprint`: pretty-print given WRR files
  - `get`: print values produced by computing given expressions on a given WRR file
  - `run`: spawn a process with generated temporary files produced by given expressions computed on given WRR files as arguments
  - `stream`: produce a stream of structured lists containing values produced by computing given expressions on given WRR files; a generalized `hoardy-web get`
  - `find`: print paths of WRR files matching specified criteria
  - `organize`: programmatically rename/move/hardlink/symlink WRR files based on their contents
  - `import`: convert other HTTP archive formats into WRR
  - `export`: convert WRR archives into other formats
### hoardy-web pprint

Pretty-print given WRR files to stdout.

- positional arguments:
  - `PATH`: inputs; can be a mix of files and directories (which will be traversed recursively)

- options:
  - `-u, --unabridged`: print all data in full
  - `--abridged`: shorten long strings for brevity, useful when you want to visually scan through batch data dumps; default
  - `--stdin0`: read zero-terminated `PATH`s from stdin; these will be processed after `PATH`s specified as command-line arguments

- error handling:
  - `--errors {fail,skip,ignore}`: when an error occurs:
    - `fail`: report failure and stop the execution; default
    - `skip`: report failure but skip the reqres that produced it from the output and continue
    - `ignore`: `skip`, but don't report the failure

- filters; both can be specified at the same time, both can be specified multiple times, both use the same expression format as `hoardy-web get --expr` (which see); the resulting logical expression that will be checked is `(O1 or O2 or ... or (A1 and A2 and ...))`, where `O1`, `O2`, ... are the arguments to `--or`s and `A1`, `A2`, ... are the arguments to `--and`s:
  - `--or EXPR`: only print reqres which match any of these expressions
  - `--and EXPR`: only print reqres which match all of these expressions

- MIME type sniffing:
  - `--naive`: populate "potentially" lists like `hoardy-web (get|run|export) --expr '(request|response).body|eb|scrub \2 defaults'` does; default
  - `--paranoid`: populate "potentially" lists in the output using paranoid MIME type sniffing like `hoardy-web (get|run|export) --expr '(request|response).body|eb|scrub \2 +paranoid'` does; this exists to answer "Hey! Why did it censor out my data?!" questions

- file system path ordering:
  - `--paths-given-order`: `argv` and `--stdin0` `PATH`s are processed in the order they are given; default
  - `--paths-sorted`: `argv` and `--stdin0` `PATH`s are processed in lexicographic order
  - `--paths-reversed`: `argv` and `--stdin0` `PATH`s are processed in reverse lexicographic order
  - `--walk-fs-order`: recursive file system walk is done in the order `readdir(2)` gives results
  - `--walk-sorted`: recursive file system walk is done in lexicographic order; default
  - `--walk-reversed`: recursive file system walk is done in reverse lexicographic order
### hoardy-web get

Compute output values by evaluating expressions `EXPR`s on a given reqres stored at `PATH`, then print them to stdout, terminating each value as specified.

- positional arguments:
  - `PATH`: input WRR file path

- expression evaluation:
  - `--expr-fd INT`: file descriptor to which the results of evaluation of the following `--expr`s should be written; can be specified multiple times, thus separating different `--expr`s into different output streams; default: `1`, i.e. `stdout`
  - `-e EXPR, --expr EXPR`: an expression to compute; can be specified multiple times, in which case computed outputs will be printed sequentially; see also "printing" options below; default: `response.body|eb`, which will dump the HTTP response body; each `EXPR` describes a state-transformer (pipeline) which starts from value `None` and evaluates a script built from the following:
    - constants and functions:
      - `es`: replace `None` value with an empty string `""`
      - `eb`: replace `None` value with an empty byte string `b""`
      - `false`: replace `None` value with `False`
      - `true`: replace `None` value with `True`
      - `missing`: `True` if the value is `None`
      - `0`: replace `None` value with `0`
      - `1`: replace `None` value with `1`
      - `not`: apply logical `not` to value
      - `len`: apply `len` to value
      - `str`: cast value to `str` or fail
      - `bytes`: cast value to `bytes` or fail
      - `bool`: cast value to `bool` or fail
      - `int`: cast value to `int` or fail
      - `float`: cast value to `float` or fail
      - `echo`: replace the value with the given string
      - `quote`: URL-percent-encoding quote value
      - `quote_plus`: URL-percent-encoding quote value and replace spaces with `+` symbols
      - `unquote`: URL-percent-encoding unquote value
      - `unquote_plus`: URL-percent-encoding unquote value and replace `+` symbols with spaces
      - `to_ascii`: encode `str` value into `bytes` with "ascii" codec
      - `to_utf8`: encode `str` value into `bytes` with "utf-8" codec
      - `sha256`: replace `bytes` value with its `sha256` hex digest (`hex(sha256(value))`)
      - `~=`: check if the current value matches the regular expression `arg`
      - `==`: apply `== arg`, `arg` is cast to the same type as the current value
      - `!=`: apply `!= arg`, similarly
      - `<`: apply `< arg`, similarly
      - `<=`: apply `<= arg`, similarly
      - `>`: apply `> arg`, similarly
      - `>=`: apply `>= arg`, similarly
      - `add_prefix`: add prefix to the current value
      - `add_suffix`: add suffix to the current value
      - `take_prefix`: take first `arg` characters or list elements from the current value
      - `take_suffix`: take last `arg` characters or list elements from the current value
      - `abbrev`: leave the current value as-is if its length is less than or equal to `arg` characters, otherwise take the first `arg/2` followed by the last `arg/2` characters
      - `abbrev_each`: `abbrev arg` each element in a value `list`
      - `replace`: replace all occurrences of the first argument in the current value with the second argument; casts arguments to the same type as the current value
      - `pp_to_path`: encode `path_parts` `list` into a POSIX path, quoting as little as needed
      - `qsl_urlencode`: encode parsed `query` `list` into a URL's query component `str`
      - `qsl_to_path`: encode `query` `list` into a POSIX path, quoting as little as needed
      - `scrub`: scrub the value by optionally rewriting links and/or removing dynamic content from it; what gets done depends on the `--remap-*` command line options, the MIME type of the value itself, and the scrubbing options described below; this function takes two arguments:
        - the first must be either of `request|response`; it controls which HTTP headers `scrub` should inspect to help it detect the MIME type;
        - the second is either `defaults` or a ","-separated string of `(+|-)(paranoid|unknown|jumps|actions|srcs|all_refs|scripts|iframes|styles|iepragmas|prefetches|tracking|dyndoc|all_dyns|verbose|whitespace|optional_tags|indent|pretty|debug)` tokens which control the scrubbing behaviour:
          - `+paranoid` will assume the server is lying in its `Content-Type` and `X-Content-Type-Options` HTTP headers, sniff the contents of `(request|response).body` to determine what it actually contains regardless of what the server said, and then use the most paranoid interpretation of both the HTTP headers and the sniffed possible MIME types to decide what should be kept and what should be removed by the options below; i.e., this will make the `-unknown`, `-scripts`, and `-styles` options below censor out more things; in particular, at the moment, most plain text files will get censored out as potential JavaScript; the default is `-paranoid`;
          - `(+|-)unknown` controls if the data with unknown content types should be passed to the output unchanged or censored out (respectively); the default is `+unknown`, which will keep data of unknown content types as-is;
          - `(+|-)(jumps|actions|srcs)` control which kinds of references to other documents should be remapped or censored out (respectively); i.e. it controls whether jump-links (HTML `a href`, `area href`, and similar), action-links (HTML `a ping`, `form action`, and similar), and/or resource references (HTML `img src`, `iframe src`, CSS `url` references, and similar) should be remapped using the specified `--remap-*` option (which see) or censored out similarly to how `--remap-void` would do it; the default is `+jumps,-actions,-srcs`, which will produce a self-contained result that can be fed into another tool --- be it a web browser or `pandoc` --- without that tool trying to access the Internet;
          - `(+|-)all_refs` is equivalent to enabling or disabling all of the above options simultaneously;
          - `(+|-)(scripts|iframes|styles|iepragmas|prefetches|tracking)` control which things should be kept or censored out w.r.t. HTML, CSS, and JavaScript; i.e. it controls whether JavaScript (both separate files and HTML tags and attributes), `<iframe>` HTML tags, CSS (both separate files and HTML tags and attributes; why? because CSS is Turing-complete), HTML Internet Explorer pragmas, HTML content prefetch `link` tags, and other tracking HTML tags and attributes (like `a ping` attributes) should be respectively kept in or censored out from the input; the default is `-scripts,-iframes,-styles,-iepragmas,-prefetches,-tracking`, which ensures the result will not produce any prefetch and tracking requests when loaded in a web browser, and that the whole result is simple data, not a program in some Turing-complete language, thus making it safe to feed the result to other tools too smart for their own users' good;
          - `(+|-)all_dyns` is equivalent to enabling or disabling all of the above (`scripts|...`) options simultaneously;
          - `(+|-)verbose` controls whether the tag censoring controlled by the above options is to be reported in the output (as comments) or stuff should be wiped from existence without evidence instead; the default is `-verbose`;
          - `(+|-)whitespace` controls whether the HTML renderer should keep the original HTML whitespace as-is or collapse it away (respectively); the default is `-whitespace`;
          - `(+|-)optional_tags` controls whether the HTML renderer should put optional HTML tags into the output or skip them (respectively); the default is `+optional_tags` (because many tools fail to parse minimized HTML properly);
          - `(+|-)indent` controls whether the HTML renderer should indent HTML elements (where whitespace placement in the original markup allows for it) or not (respectively); the default is `-indent`;
          - `+pretty` is an alias for `+verbose,-whitespace,+indent`, which produces the prettiest possible human-readable output that keeps the original whitespace semantics; `-pretty` is an alias for `+verbose,+whitespace,-indent`, which produces an approximation of the original markup with censoring applied; neither is the default;
          - `+debug` is an alias for `+pretty` that also uses a much more aggressive version of `indent` that ignores the semantics of original whitespace placement, i.e. it will indent `<p>not<em>sep</em>arated</p>` as if there was whitespace before and after the `p`, `em`, `/em`, and `/p` tags; this is useful for debugging custom mutations; `-debug` is a noop, which is the default;
    - reqres fields; these work the same way as the constants above, i.e. they replace the current value of `None` with the field's value; if the reqres is missing the field in question, which can happen for `response*` fields, the result is `None`:
      - `version`: WEBREQRES format version; int
      - `source`: `+`-separated list of applications that produced this reqres; str
      - `protocol`: protocol; e.g. `"HTTP/1.1"`, `"HTTP/2.0"`; str
      - `request.started_at`: request start time in seconds since 1970-01-01 00:00; Epoch
      - `request.method`: request HTTP method; e.g. `"GET"`, `"POST"`, etc; str
      - `request.url`: request URL, including the fragment/hash part; str
      - `request.headers`: request headers; list[tuple[str, bytes]]
      - `request.complete`: is request body complete?; bool
      - `request.body`: request body; bytes
      - `response.started_at`: response start time in seconds since 1970-01-01 00:00; Epoch
      - `response.code`: HTTP response code; e.g. `200`, `404`, etc; int
      - `response.reason`: HTTP response reason; e.g. `"OK"`, `"Not Found"`, etc; usually empty for Chromium and filled for Firefox; str
      - `response.headers`: response headers; list[tuple[str, bytes]]
      - `response.complete`: is response body complete?; bool
      - `response.body`: response body; Firefox gives raw bytes, Chromium gives UTF-8 encoded strings; bytes | str
      - `finished_at`: request completion time in seconds since 1970-01-01 00:00; Epoch
      - `websocket`: a list of WebSocket frames
    - derived attributes:
      - `fs_path`: file system path for the WRR file containing this reqres; str | bytes | None
      - `qtime`: alias for `request.started_at`; mnemonic: "reQuest TIME"; seconds since UNIX epoch; decimal float
      - `qtime_ms`: `qtime` in milliseconds rounded down to nearest integer; milliseconds since UNIX epoch; int
      - `qtime_msq`: three least significant digits of `qtime_ms`; int
      - `qyear`: year number of `gmtime(qtime)` (UTC year number of `qtime`); int
      - `qmonth`: month number of `gmtime(qtime)`; int
      - `qday`: day of the month of `gmtime(qtime)`; int
      - `qhour`: hour of `gmtime(qtime)` in 24h format; int
      - `qminute`: minute of `gmtime(qtime)`; int
      - `qsecond`: second of `gmtime(qtime)`; int
      - `stime`: `response.started_at` if there was a response, `finished_at` otherwise; mnemonic: "reSponse TIME"; seconds since UNIX epoch; decimal float
      - `stime_ms`: `stime` in milliseconds rounded down to nearest integer; milliseconds since UNIX epoch; int
      - `stime_msq`: three least significant digits of `stime_ms`; int
      - `syear`: similar to `qyear`, but for `stime`; int
      - `smonth`: similar to `qmonth`, but for `stime`; int
      - `sday`: similar to `qday`, but for `stime`; int
      - `shour`: similar to `qhour`, but for `stime`; int
      - `sminute`: similar to `qminute`, but for `stime`; int
      - `ssecond`: similar to `qsecond`, but for `stime`; int
      - `ftime`: alias for `finished_at`; seconds since UNIX epoch; decimal float
      - `ftime_ms`: `ftime` in milliseconds rounded down to nearest integer; milliseconds since UNIX epoch; int
      - `ftime_msq`: three least significant digits of `ftime_ms`; int
      - `fyear`: similar to `qyear`, but for `ftime`; int
      - `fmonth`: similar to `qmonth`, but for `ftime`; int
      - `fday`: similar to `qday`, but for `ftime`; int
      - `fhour`: similar to `qhour`, but for `ftime`; int
      - `fminute`: similar to `qminute`, but for `ftime`; int
      - `fsecond`: similar to `qsecond`, but for `ftime`; int
      - `status`: `"I"` or `"C"` depending on the value of `request.complete` (`false` or `true`, respectively), followed by either `"N"` when `response == None`, or `str(response.code)` followed by `"I"` or `"C"` depending on the value of `response.complete`; str
      - `method`: alias for `request.method`; str
      - `raw_url`: alias for `request.url`; str
      - `net_url`: `raw_url` with Punycode UTS46 IDNA encoded hostname, unsafe characters quoted, and without the fragment/hash part; this is the URL that actually gets sent to the server; str
      - `pretty_url`: `raw_url`, but using `hostname`, `mq_path`, and `mq_query`; str
      - `pretty_nurl`: `raw_url`, but using `hostname`, `mq_path`, and `mq_nquery`; str
      - `scheme`: scheme part of `raw_url`; e.g. `http`, `https`, etc; str
      - `raw_hostname`: hostname part of `raw_url` as it is recorded in the reqres; str
      - `net_hostname`: hostname part of `raw_url`, encoded as Punycode UTS46 IDNA; this is what actually gets sent to the server; ASCII str
      - `hostname`: `net_hostname` decoded back into UNICODE; this is the canonical hostname representation for which IDNA-encoding and decoding are bijective; UNICODE str
      - `rhostname`: `hostname` with the order of its parts reversed; e.g. `"www.example.org"` -> `"org.example.www"`; str
      - `port`: port part of `raw_url`; str
      - `netloc`: netloc part of `raw_url`; i.e., in the most general case, `<username>:<password>@<hostname>:<port>`; str
      - `raw_path`: raw path part of `raw_url` as it is recorded in the reqres; e.g. `"https://www.example.org"` -> `""`, `"https://www.example.org/"` -> `"/"`, `"https://www.example.org/index.html"` -> `"/index.html"`; str
      - `path_parts`: component-wise unquoted "/"-split `raw_path` with empty components removed and dots and double dots interpreted away; e.g. `"https://www.example.org"` -> `[]`, `"https://www.example.org/"` -> `[]`, `"https://www.example.org/index.html"` -> `["index.html"]`, `"https://www.example.org/skipped/.//../used/"` -> `["used"]`; list[str]
      - `mq_path`: `path_parts` turned back into a minimally-quoted string; str
      - `filepath_parts`: `path_parts` transformed into components usable as an exportable file name; i.e. `path_parts` with an optional additional `"index"` appended, depending on `raw_url` and `response` MIME type; the extension will be stored separately in `filepath_ext`; e.g. for HTML documents `"https://www.example.org/"` -> `["index"]`, `"https://www.example.org/test.html"` -> `["test"]`, `"https://www.example.org/test"` -> `["test", "index"]`, `"https://www.example.org/test.json"` -> `["test.json", "index"]`, but if it has a JSON MIME type then `"https://www.example.org/test.json"` -> `["test"]` (and `filepath_ext` will be set to `".json"`); this is similar to what `wget -mpk` does, but a bit smarter; list[str]
      - `filepath_ext`: extension of the last component of `filepath_parts` for recognized MIME types, `".data"` otherwise; str
      - `raw_query`: query part of `raw_url` (i.e. everything after the `?` character and before the `#` character) as it is recorded in the reqres; str
      - `query_parts`: parsed (and component-wise unquoted) `raw_query`; list[tuple[str, str]]
      - `query_ne_parts`: `query_parts` with empty query parameters removed; list[tuple[str, str]]
      - `mq_query`: `query_parts` turned back into a minimally-quoted string; str
      - `mq_nquery`: `query_ne_parts` turned back into a minimally-quoted string; str
      - `oqm`: optional query mark: `?` character if `query` is non-empty, an empty string otherwise; str
      - `fragment`: fragment (hash) part of the url; str
      - `ofm`: optional fragment mark: `#` character if `fragment` is non-empty, an empty string otherwise; str
    - a compound expression built by piping (`|`) the above, for example:
      - `response.body|eb` (the default for `get`) will print the raw `response.body` or an empty byte string, if there was no response;
      - `response.body|eb|scrub response defaults` will take the above value and `scrub` it using default content scrubbing settings, which will censor out all action and resource reference URLs;
      - `response.body|eb|scrub response +all_refs,-actions` (the default for `export`) will remap all `href` jump-links and `src` resource references to local files while still censoring out all action URLs (since those don't make sense for a static mirror);
      - `response.complete` will print the value of `response.complete` or `None`, if there was no response;
      - `response.complete|false` will print `response.complete` or `False`;
      - `net_url|to_ascii|sha256` will print the `sha256` hash of the URL that was actually sent over the network;
      - `net_url|to_ascii|sha256|take_prefix 4` will print the first 4 characters of the above;
      - `path_parts|take_prefix 3|pp_to_path` will print the first 3 path components of the URL, minimally quoted to be used as a path;
      - `query_ne_parts|take_prefix 3|qsl_to_path|abbrev 128` will print the first 3 non-empty query parameters of the URL, abbreviated to 128 characters or less, minimally quoted to be used as a path
- URL remapping; used by the `scrub` atom of `--expr`:
  - `--remap-id`: remap all URLs with an identity function; i.e. don't remap anything; default
  - `--remap-void`: remap all jump-link and action URLs to `javascript:void(0)` and all resource URLs into empty `data:` URLs; resulting web pages will be self-contained

- printing:
  - `--not-separated`: print values without separating them with anything, just concatenate them
  - `-l, --lf-separated`: print values separated with `\n` (LF) newline characters; default
  - `-z, --zero-separated`: print values separated with `\0` (NUL) bytes
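The `--expr` pipelines described above are state-transformers: evaluation starts from `None` and each `|`-separated atom maps the current value to the next one. A simplified Python sketch of that evaluation model (hypothetical; real `hoardy-web` atoms also take arguments, e.g. `take_prefix 4`, which are omitted here):

```python
import hashlib

# toy atoms mirroring a few of the documented ones; each is a
# state-transformer from the current value to the next
ATOMS = {
    "eb": lambda v: b"" if v is None else v,
    "to_ascii": lambda v: v.encode("ascii"),
    "sha256": lambda v: hashlib.sha256(v).hexdigest(),
}

def evaluate(expr: str, fields: dict):
    """Evaluate a `|`-separated pipeline starting from None.

    Atoms naming reqres fields load the field's value; other atoms
    transform the current value.
    """
    value = None
    for atom in expr.split("|"):
        atom = atom.strip()
        if atom in fields:          # reqres field lookup
            value = fields[atom]
        else:
            value = ATOMS[atom](value)
    return value
```

E.g. `evaluate("response.body|eb", {"response.body": None})` yields `b""`, matching the documented behaviour of the default `get` expression on a reqres without a response.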
hoardy-web run

Compute output values by evaluating expressions `EXPR`s for each of `NUM` reqres stored at `PATH`s, dump the results into newly generated temporary files terminating each value as specified, spawn a given `COMMAND` with given arguments `ARG`s and the resulting temporary file paths appended as the last `NUM` arguments, wait for it to finish, delete the temporary files, and exit with the return code of the spawned process.
- positional arguments:
  - `COMMAND`: command to spawn
  - `ARG`: additional arguments to give to the `COMMAND`
  - `PATH`: input WRR file paths to be mapped into new temporary files
- options:
  - `-n NUM, --num-args NUM`: number of `PATH`s; default: `1`
- expression evaluation:
  - `-e EXPR, --expr EXPR`: an expression to compute, same expression format and semantics as `hoardy-web get --expr` (which see); can be specified multiple times; default: `response.body|eb`, which will dump the HTTP response body
- URL remapping; used by the `scrub` atom of `--expr`:
  - `--remap-id`: remap all URLs with an identity function, i.e. don't remap anything; default
  - `--remap-void`: remap all jump-link and action URLs to `javascript:void(0)` and all resource URLs into empty `data:` URLs; resulting web pages will be self-contained
- printing:
  - `--not-separated`: print values without separating them with anything, just concatenate them
  - `-l, --lf-separated`: print values separated with `\n` (LF) newline characters; default
  - `-z, --zero-separated`: print values separated with `\0` (NUL) bytes
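Based on the synopsis above, usage looks roughly like the following (the paths are hypothetical); the spawned command receives the generated temporary file paths as its last arguments:

```sh
# page through the default expression's output (the response body)
# of a single dump
hoardy-web run less ~/hoardy-web/raw/dump.wrr

# compare the response bodies of two dumps
hoardy-web run -n 2 diff ~/hoardy-web/raw/a.wrr ~/hoardy-web/raw/b.wrr
```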
hoardy-web stream

Compute given expressions for each of given WRR files, encode them into a requested format, and print the result to stdout.

- positional arguments:
  - `PATH`: inputs, can be a mix of files and directories (which will be traversed recursively)
- options:
  - `-u, --unabridged`: print all data in full
  - `--abridged`: shorten long strings for brevity, useful when you want to visually scan through batch data dumps; default
  - `--format {py,cbor,json,raw}`: generate output in:
    - py: Pythonic Object Representation aka `repr`; default
    - cbor: CBOR (RFC8949)
    - json: JavaScript Object Notation aka JSON; binary data can't be represented, UNICODE replacement characters will be used
    - raw: concatenate raw values; termination is controlled by `*-terminated` options
  - `--stdin0`: read zero-terminated `PATH`s from stdin, these will be processed after `PATH`s specified as command-line arguments
- error handling:
  - `--errors {fail,skip,ignore}`: when an error occurs:
    - `fail`: report failure and stop the execution; default
    - `skip`: report failure but skip the reqres that produced it from the output and continue
    - `ignore`: `skip`, but don't report the failure
- filters; both can be specified at the same time, both can be specified multiple times, both use the same expression format as `hoardy-web get --expr` (which see); the resulting logical expression that will be checked is `(O1 or O2 or ... or (A1 and A2 and ...))`, where `O1`, `O2`, ... are the arguments to `--or`s and `A1`, `A2`, ... are the arguments to `--and`s:
  - `--or EXPR`: only print reqres which match any of these expressions
  - `--and EXPR`: only print reqres which match all of these expressions
- expression evaluation:
  - `-e EXPR, --expr EXPR`: an expression to compute, same expression format and semantics as `hoardy-web get --expr` (which see); can be specified multiple times; default: `.`, which will dump the whole reqres structure
- URL remapping; used by the `scrub` atom of `--expr`:
  - `--remap-id`: remap all URLs with an identity function, i.e. don't remap anything; default
  - `--remap-void`: remap all jump-link and action URLs to `javascript:void(0)` and all resource URLs into empty `data:` URLs; resulting web pages will be self-contained
- `--format=raw` output printing:
  - `--not-terminated`: print `--format=raw` output values without terminating them with anything, just concatenate them
  - `-l, --lf-terminated`: print `--format=raw` output values terminated with `\n` (LF) newline characters; default
  - `-z, --zero-terminated`: print `--format=raw` output values terminated with `\0` (NUL) bytes
- file system path ordering:
  - `--paths-given-order`: `argv` and `--stdin0` `PATH`s are processed in the order they are given; default
  - `--paths-sorted`: `argv` and `--stdin0` `PATH`s are processed in lexicographic order
  - `--paths-reversed`: `argv` and `--stdin0` `PATH`s are processed in reverse lexicographic order
  - `--walk-fs-order`: recursive file system walk is done in the order `readdir(2)` gives results
  - `--walk-sorted`: recursive file system walk is done in lexicographic order; default
  - `--walk-reversed`: recursive file system walk is done in reverse lexicographic order
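For example (the input directory below is hypothetical):

```sh
# visually scan abridged reqres structures of all dumps under a directory
hoardy-web stream --abridged ~/hoardy-web/raw

# dump whole reqres structures as JSON instead
hoardy-web stream --format=json ~/hoardy-web/raw
```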
hoardy-web find

Print paths of WRR files matching specified criteria.

- positional arguments:
  - `PATH`: inputs, can be a mix of files and directories (which will be traversed recursively)
- options:
  - `--stdin0`: read zero-terminated `PATH`s from stdin, these will be processed after `PATH`s specified as command-line arguments
- error handling:
  - `--errors {fail,skip,ignore}`: when an error occurs:
    - `fail`: report failure and stop the execution; default
    - `skip`: report failure but skip the reqres that produced it from the output and continue
    - `ignore`: `skip`, but don't report the failure
- filters; both can be specified at the same time, both can be specified multiple times, both use the same expression format as `hoardy-web get --expr` (which see); the resulting logical expression that will be checked is `(O1 or O2 or ... or (A1 and A2 and ...))`, where `O1`, `O2`, ... are the arguments to `--or`s and `A1`, `A2`, ... are the arguments to `--and`s:
  - `--or EXPR`: only print paths to reqres which match any of these expressions
  - `--and EXPR`: only print paths to reqres which match all of these expressions
- found files printing:
  - `-l, --lf-terminated`: print absolute paths of matching WRR files terminated with `\n` (LF) newline characters; default
  - `-z, --zero-terminated`: print absolute paths of matching WRR files terminated with `\0` (NUL) bytes
- file system path ordering:
  - `--paths-given-order`: `argv` and `--stdin0` `PATH`s are processed in the order they are given; default
  - `--paths-sorted`: `argv` and `--stdin0` `PATH`s are processed in lexicographic order
  - `--paths-reversed`: `argv` and `--stdin0` `PATH`s are processed in reverse lexicographic order
  - `--walk-fs-order`: recursive file system walk is done in the order `readdir(2)` gives results
  - `--walk-sorted`: recursive file system walk is done in lexicographic order; default
  - `--walk-reversed`: recursive file system walk is done in reverse lexicographic order
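Since `--zero-terminated` output pairs with the `--stdin0` option of the other subcommands, `find` composes with them; for example (the input directory is hypothetical):

```sh
# list matching WRR files NUL-terminated, then stream their abridged
# structures without re-walking the file system
hoardy-web find -z ~/hoardy-web/raw | hoardy-web stream --abridged --stdin0
```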
hoardy-web organize

Parse given WRR files into their respective reqres and then rename/move/hardlink/symlink each file to `DESTINATION` with the new path derived from each reqres' metadata.
Operations that could lead to accidental data loss are not permitted.
E.g. `hoardy-web organize --move` will not overwrite any files, which is why the default `--output` contains `%(num)d`.
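A typical invocation, mirroring the Quickstart above (the destination directory is hypothetical); since renames can be surprising, `--dry-run` previews the computed updates first:

```sh
# preview what would be moved where, without changing anything
hoardy-web organize --dry-run --move --to ~/hoardy-web/organized ~/hoardy-web/raw

# then perform the actual moves
hoardy-web organize --move --to ~/hoardy-web/organized ~/hoardy-web/raw
```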
- positional arguments:
  - `PATH`: inputs, can be a mix of files and directories (which will be traversed recursively)
- options:
  - `--dry-run`: perform a trial run without actually performing any changes
  - `-q, --quiet`: don't log computed updates to stderr
  - `--stdin0`: read zero-terminated `PATH`s from stdin, these will be processed after `PATH`s specified as command-line arguments
- error handling:
  - `--errors {fail,skip,ignore}`: when an error occurs:
    - `fail`: report failure and stop the execution; default
    - `skip`: report failure but skip the reqres that produced it from the output and continue
    - `ignore`: `skip`, but don't report the failure
- filters; both can be specified at the same time, both can be specified multiple times, both use the same expression format as `hoardy-web get --expr` (which see); the resulting logical expression that will be checked is `(O1 or O2 or ... or (A1 and A2 and ...))`, where `O1`, `O2`, ... are the arguments to `--or`s and `A1`, `A2`, ... are the arguments to `--and`s:
  - `--or EXPR`: only work on reqres which match any of these expressions
  - `--and EXPR`: only work on reqres which match all of these expressions
- action:
  - `--move`: move source files under `DESTINATION`; default
  - `--copy`: copy source files to files under `DESTINATION`
  - `--hardlink`: create hardlinks from source files to paths under `DESTINATION`
  - `--symlink`: create symlinks from source files to paths under `DESTINATION`
- file outputs:
  - `-t DESTINATION, --to DESTINATION`: destination directory; when unset, each source `PATH` must be a directory, which will be treated as its own `DESTINATION`
  - `-o FORMAT, --output FORMAT`
: format describing generated output paths, an alias name or "format:" followed by a custom pythonic %-substitution string:
    - available aliases and corresponding %-substitutions:
      - `default`: `%(syear)d/%(smonth)02d/%(sday)02d/%(shour)02d%(sminute)02d%(ssecond)02d%(stime_msq)03d_%(qtime_ms)s_%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s_%(hostname)s_%(num)d`; the default
        - `https://example.org` -> `1970/01/01/001640000_0_GET_50d7_C200C_example.org_0`
        - `https://example.org/` -> `1970/01/01/001640000_0_GET_8198_C200C_example.org_0`
        - `https://example.org/index.html` -> `1970/01/01/001640000_0_GET_f0dc_C200C_example.org_0`
        - `https://example.org/media` -> `1970/01/01/001640000_0_GET_086d_C200C_example.org_0`
        - `https://example.org/media/` -> `1970/01/01/001640000_0_GET_3fbb_C200C_example.org_0`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `1970/01/01/001640000_0_GET_5658_C200C_example.org_0`
        - `https://königsgäßchen.example.org/index.html` -> `1970/01/01/001640000_0_GET_4f11_C200C_königsgäßchen.example.org_0`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `1970/01/01/001640000_0_GET_c4ae_C200C_ジャジェメント.ですの.example.org_0`
      - `short`: `%(syear)d/%(smonth)02d/%(sday)02d/%(stime_ms)d_%(qtime_ms)s_%(num)d`
        - `https://example.org`, `https://example.org/`, `https://example.org/index.html`, `https://example.org/media`, `https://example.org/media/`, `https://example.org/view?one=1&two=2&three=&three=3#fragment`, `https://königsgäßchen.example.org/index.html`, `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `1970/01/01/1000000_0_0`
      - `surl`: `%(scheme)s/%(netloc)s/%(mq_path)s%(oqm)s%(mq_query)s`
        - `https://example.org`, `https://example.org/` -> `https/example.org/`
        - `https://example.org/index.html` -> `https/example.org/index.html`
        - `https://example.org/media`, `https://example.org/media/` -> `https/example.org/media`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `https/example.org/view?one=1&two=2&three&three=3`
        - `https://königsgäßchen.example.org/index.html` -> `https/königsgäßchen.example.org/index.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `https/ジャジェメント.ですの.example.org/испытание/is`
      - `surl_msn`: `%(scheme)s/%(netloc)s/%(mq_path)s%(oqm)s%(mq_query)s__%(method)s_%(status)s_%(num)d`
        - `https://example.org`, `https://example.org/` -> `https/example.org/__GET_C200C_0`
        - `https://example.org/index.html` -> `https/example.org/index.html__GET_C200C_0`
        - `https://example.org/media`, `https://example.org/media/` -> `https/example.org/media__GET_C200C_0`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `https/example.org/view?one=1&two=2&three&three=3__GET_C200C_0`
        - `https://königsgäßchen.example.org/index.html` -> `https/königsgäßchen.example.org/index.html__GET_C200C_0`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `https/ジャジェメント.ですの.example.org/испытание/is__GET_C200C_0`
      - `shupq`: `%(scheme)s/%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 120)s%(filepath_ext)s`
        - `https://example.org`, `https://example.org/` -> `https/example.org/index.htm`
        - `https://example.org/index.html` -> `https/example.org/index.html`
        - `https://example.org/media`, `https://example.org/media/` -> `https/example.org/media/index.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `https/example.org/view/index?one=1&two=2&three&three=3.htm`
        - `https://königsgäßchen.example.org/index.html` -> `https/königsgäßchen.example.org/index.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `https/ジャジェメント.ですの.example.org/испытание/is/index.htm`
      - `shupq_n`: `%(scheme)s/%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 120)s.%(num)d%(filepath_ext)s`
        - `https://example.org`, `https://example.org/` -> `https/example.org/index.0.htm`
        - `https://example.org/index.html` -> `https/example.org/index.0.html`
        - `https://example.org/media`, `https://example.org/media/` -> `https/example.org/media/index.0.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `https/example.org/view/index?one=1&two=2&three&three=3.0.htm`
        - `https://königsgäßchen.example.org/index.html` -> `https/königsgäßchen.example.org/index.0.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `https/ジャジェメント.ですの.example.org/испытание/is/index.0.htm`
      - `shupq_msn`: `%(scheme)s/%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 100)s.%(method)s_%(status)s_%(num)d%(filepath_ext)s`
        - `https://example.org`, `https://example.org/` -> `https/example.org/index.GET_C200C_0.htm`
        - `https://example.org/index.html` -> `https/example.org/index.GET_C200C_0.html`
        - `https://example.org/media`, `https://example.org/media/` -> `https/example.org/media/index.GET_C200C_0.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `https/example.org/view/index?one=1&two=2&three&three=3.GET_C200C_0.htm`
        - `https://königsgäßchen.example.org/index.html` -> `https/königsgäßchen.example.org/index.GET_C200C_0.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `https/ジャジェメント.ですの.example.org/испытание/is/index.GET_C200C_0.htm`
      - `shupnq`: `%(scheme)s/%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s%(filepath_ext)s`
        - `https://example.org`, `https://example.org/` -> `https/example.org/index.htm`
        - `https://example.org/index.html` -> `https/example.org/index.html`
        - `https://example.org/media`, `https://example.org/media/` -> `https/example.org/media/index.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `https/example.org/view/index?one=1&two=2&three=3.htm`
        - `https://königsgäßchen.example.org/index.html` -> `https/königsgäßchen.example.org/index.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `https/ジャジェメント.ですの.example.org/испытание/is/index.htm`
      - `shupnq_n`: `%(scheme)s/%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s.%(num)d%(filepath_ext)s`
        - `https://example.org`, `https://example.org/` -> `https/example.org/index.0.htm`
        - `https://example.org/index.html` -> `https/example.org/index.0.html`
        - `https://example.org/media`, `https://example.org/media/` -> `https/example.org/media/index.0.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `https/example.org/view/index?one=1&two=2&three=3.0.htm`
        - `https://königsgäßchen.example.org/index.html` -> `https/königsgäßchen.example.org/index.0.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `https/ジャジェメント.ですの.example.org/испытание/is/index.0.htm`
      - `shupnq_msn`: `%(scheme)s/%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 100)s.%(method)s_%(status)s_%(num)d%(filepath_ext)s`
        - `https://example.org`, `https://example.org/` -> `https/example.org/index.GET_C200C_0.htm`
        - `https://example.org/index.html` -> `https/example.org/index.GET_C200C_0.html`
        - `https://example.org/media`, `https://example.org/media/` -> `https/example.org/media/index.GET_C200C_0.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `https/example.org/view/index?one=1&two=2&three=3.GET_C200C_0.htm`
        - `https://königsgäßchen.example.org/index.html` -> `https/königsgäßchen.example.org/index.GET_C200C_0.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `https/ジャジェメント.ですの.example.org/испытание/is/index.GET_C200C_0.htm`
      - `shupnq_mhs`: `%(scheme)s/%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s.%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s%(filepath_ext)s`
        - `https://example.org` -> `https/example.org/index.GET_50d7_C200C.htm`
        - `https://example.org/` -> `https/example.org/index.GET_8198_C200C.htm`
        - `https://example.org/index.html` -> `https/example.org/index.GET_f0dc_C200C.html`
        - `https://example.org/media` -> `https/example.org/media/index.GET_086d_C200C.htm`
        - `https://example.org/media/` -> `https/example.org/media/index.GET_3fbb_C200C.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `https/example.org/view/index?one=1&two=2&three=3.GET_5658_C200C.htm`
        - `https://königsgäßchen.example.org/index.html` -> `https/königsgäßchen.example.org/index.GET_4f11_C200C.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `https/ジャジェメント.ですの.example.org/испытание/is/index.GET_c4ae_C200C.htm`
      - `shupnq_mhsn`: `%(scheme)s/%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 100)s.%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s_%(num)d%(filepath_ext)s`
        - `https://example.org` -> `https/example.org/index.GET_50d7_C200C_0.htm`
        - `https://example.org/` -> `https/example.org/index.GET_8198_C200C_0.htm`
        - `https://example.org/index.html` -> `https/example.org/index.GET_f0dc_C200C_0.html`
        - `https://example.org/media` -> `https/example.org/media/index.GET_086d_C200C_0.htm`
        - `https://example.org/media/` -> `https/example.org/media/index.GET_3fbb_C200C_0.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `https/example.org/view/index?one=1&two=2&three=3.GET_5658_C200C_0.htm`
        - `https://königsgäßchen.example.org/index.html` -> `https/königsgäßchen.example.org/index.GET_4f11_C200C_0.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `https/ジャジェメント.ですの.example.org/испытание/is/index.GET_c4ae_C200C_0.htm`
      - `srhupq`: `%(scheme)s/%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 120)s%(filepath_ext)s`
        - `https://example.org`, `https://example.org/` -> `https/org.example/index.htm`
        - `https://example.org/index.html` -> `https/org.example/index.html`
        - `https://example.org/media`, `https://example.org/media/` -> `https/org.example/media/index.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `https/org.example/view/index?one=1&two=2&three&three=3.htm`
        - `https://königsgäßchen.example.org/index.html` -> `https/org.example.königsgäßchen/index.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `https/org.example.ですの.ジャジェメント/испытание/is/index.htm`
      - `srhupq_n`: `%(scheme)s/%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 120)s.%(num)d%(filepath_ext)s`
        - `https://example.org`, `https://example.org/` -> `https/org.example/index.0.htm`
        - `https://example.org/index.html` -> `https/org.example/index.0.html`
        - `https://example.org/media`, `https://example.org/media/` -> `https/org.example/media/index.0.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `https/org.example/view/index?one=1&two=2&three&three=3.0.htm`
        - `https://königsgäßchen.example.org/index.html` -> `https/org.example.königsgäßchen/index.0.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `https/org.example.ですの.ジャジェメント/испытание/is/index.0.htm`
      - `srhupq_msn`: `%(scheme)s/%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 100)s.%(method)s_%(status)s_%(num)d%(filepath_ext)s`
        - `https://example.org`, `https://example.org/` -> `https/org.example/index.GET_C200C_0.htm`
        - `https://example.org/index.html` -> `https/org.example/index.GET_C200C_0.html`
        - `https://example.org/media`, `https://example.org/media/` -> `https/org.example/media/index.GET_C200C_0.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `https/org.example/view/index?one=1&two=2&three&three=3.GET_C200C_0.htm`
        - `https://königsgäßchen.example.org/index.html` -> `https/org.example.königsgäßchen/index.GET_C200C_0.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `https/org.example.ですの.ジャジェメント/испытание/is/index.GET_C200C_0.htm`
      - `srhupnq`: `%(scheme)s/%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s%(filepath_ext)s`
        - `https://example.org`, `https://example.org/` -> `https/org.example/index.htm`
        - `https://example.org/index.html` -> `https/org.example/index.html`
        - `https://example.org/media`, `https://example.org/media/` -> `https/org.example/media/index.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `https/org.example/view/index?one=1&two=2&three=3.htm`
        - `https://königsgäßchen.example.org/index.html` -> `https/org.example.königsgäßchen/index.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `https/org.example.ですの.ジャジェメント/испытание/is/index.htm`
      - `srhupnq_n`: `%(scheme)s/%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s.%(num)d%(filepath_ext)s`
        - `https://example.org`, `https://example.org/` -> `https/org.example/index.0.htm`
        - `https://example.org/index.html` -> `https/org.example/index.0.html`
        - `https://example.org/media`, `https://example.org/media/` -> `https/org.example/media/index.0.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `https/org.example/view/index?one=1&two=2&three=3.0.htm`
        - `https://königsgäßchen.example.org/index.html` -> `https/org.example.königsgäßchen/index.0.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `https/org.example.ですの.ジャジェメント/испытание/is/index.0.htm`
      - `srhupnq_msn`: `%(scheme)s/%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 100)s.%(method)s_%(status)s_%(num)d%(filepath_ext)s`
        - `https://example.org`, `https://example.org/` -> `https/org.example/index.GET_C200C_0.htm`
        - `https://example.org/index.html` -> `https/org.example/index.GET_C200C_0.html`
        - `https://example.org/media`, `https://example.org/media/` -> `https/org.example/media/index.GET_C200C_0.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `https/org.example/view/index?one=1&two=2&three=3.GET_C200C_0.htm`
        - `https://königsgäßchen.example.org/index.html` -> `https/org.example.königsgäßchen/index.GET_C200C_0.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `https/org.example.ですの.ジャジェメント/испытание/is/index.GET_C200C_0.htm`
      - `srhupnq_mhs`: `%(scheme)s/%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s.%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s%(filepath_ext)s`
        - `https://example.org` -> `https/org.example/index.GET_50d7_C200C.htm`
        - `https://example.org/` -> `https/org.example/index.GET_8198_C200C.htm`
        - `https://example.org/index.html` -> `https/org.example/index.GET_f0dc_C200C.html`
        - `https://example.org/media` -> `https/org.example/media/index.GET_086d_C200C.htm`
        - `https://example.org/media/` -> `https/org.example/media/index.GET_3fbb_C200C.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `https/org.example/view/index?one=1&two=2&three=3.GET_5658_C200C.htm`
        - `https://königsgäßchen.example.org/index.html` -> `https/org.example.königsgäßchen/index.GET_4f11_C200C.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `https/org.example.ですの.ジャジェメント/испытание/is/index.GET_c4ae_C200C.htm`
      - `srhupnq_mhsn`: `%(scheme)s/%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 100)s.%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s_%(num)d%(filepath_ext)s`
        - `https://example.org` -> `https/org.example/index.GET_50d7_C200C_0.htm`
        - `https://example.org/` -> `https/org.example/index.GET_8198_C200C_0.htm`
        - `https://example.org/index.html` -> `https/org.example/index.GET_f0dc_C200C_0.html`
        - `https://example.org/media` -> `https/org.example/media/index.GET_086d_C200C_0.htm`
        - `https://example.org/media/` -> `https/org.example/media/index.GET_3fbb_C200C_0.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `https/org.example/view/index?one=1&two=2&three=3.GET_5658_C200C_0.htm`
        - `https://königsgäßchen.example.org/index.html` -> `https/org.example.königsgäßchen/index.GET_4f11_C200C_0.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `https/org.example.ですの.ジャジェメント/испытание/is/index.GET_c4ae_C200C_0.htm`
      - `url`: `%(netloc)s/%(mq_path)s%(oqm)s%(mq_query)s`
        - `https://example.org`, `https://example.org/` -> `example.org/`
        - `https://example.org/index.html` -> `example.org/index.html`
        - `https://example.org/media`, `https://example.org/media/` -> `example.org/media`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `example.org/view?one=1&two=2&three&three=3`
        - `https://königsgäßchen.example.org/index.html` -> `königsgäßchen.example.org/index.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `ジャジェメント.ですの.example.org/испытание/is`
      - `url_msn`: `%(netloc)s/%(mq_path)s%(oqm)s%(mq_query)s__%(method)s_%(status)s_%(num)d`
        - `https://example.org`, `https://example.org/` -> `example.org/__GET_C200C_0`
        - `https://example.org/index.html` -> `example.org/index.html__GET_C200C_0`
        - `https://example.org/media`, `https://example.org/media/` -> `example.org/media__GET_C200C_0`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `example.org/view?one=1&two=2&three&three=3__GET_C200C_0`
        - `https://königsgäßchen.example.org/index.html` -> `königsgäßchen.example.org/index.html__GET_C200C_0`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `ジャジェメント.ですの.example.org/испытание/is__GET_C200C_0`
      - `hupq`: `%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 120)s%(filepath_ext)s`
        - `https://example.org`, `https://example.org/` -> `example.org/index.htm`
        - `https://example.org/index.html` -> `example.org/index.html`
        - `https://example.org/media`, `https://example.org/media/` -> `example.org/media/index.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `example.org/view/index?one=1&two=2&three&three=3.htm`
        - `https://königsgäßchen.example.org/index.html` -> `königsgäßchen.example.org/index.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `ジャジェメント.ですの.example.org/испытание/is/index.htm`
      - `hupq_n`: `%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 120)s.%(num)d%(filepath_ext)s`
        - `https://example.org`, `https://example.org/` -> `example.org/index.0.htm`
        - `https://example.org/index.html` -> `example.org/index.0.html`
        - `https://example.org/media`, `https://example.org/media/` -> `example.org/media/index.0.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `example.org/view/index?one=1&two=2&three&three=3.0.htm`
        - `https://königsgäßchen.example.org/index.html` -> `königsgäßchen.example.org/index.0.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `ジャジェメント.ですの.example.org/испытание/is/index.0.htm`
      - `hupq_msn`: `%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 100)s.%(method)s_%(status)s_%(num)d%(filepath_ext)s`
        - `https://example.org`, `https://example.org/` -> `example.org/index.GET_C200C_0.htm`
        - `https://example.org/index.html` -> `example.org/index.GET_C200C_0.html`
        - `https://example.org/media`, `https://example.org/media/` -> `example.org/media/index.GET_C200C_0.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `example.org/view/index?one=1&two=2&three&three=3.GET_C200C_0.htm`
        - `https://königsgäßchen.example.org/index.html` -> `königsgäßchen.example.org/index.GET_C200C_0.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `ジャジェメント.ですの.example.org/испытание/is/index.GET_C200C_0.htm`
      - `hupnq`: `%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s%(filepath_ext)s`
        - `https://example.org`, `https://example.org/` -> `example.org/index.htm`
        - `https://example.org/index.html` -> `example.org/index.html`
        - `https://example.org/media`, `https://example.org/media/` -> `example.org/media/index.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `example.org/view/index?one=1&two=2&three=3.htm`
        - `https://königsgäßchen.example.org/index.html` -> `königsgäßchen.example.org/index.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `ジャジェメント.ですの.example.org/испытание/is/index.htm`
      - `hupnq_n`: `%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s.%(num)d%(filepath_ext)s`
        - `https://example.org`, `https://example.org/` -> `example.org/index.0.htm`
        - `https://example.org/index.html` -> `example.org/index.0.html`
        - `https://example.org/media`, `https://example.org/media/` -> `example.org/media/index.0.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `example.org/view/index?one=1&two=2&three=3.0.htm`
        - `https://königsgäßchen.example.org/index.html` -> `königsgäßchen.example.org/index.0.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `ジャジェメント.ですの.example.org/испытание/is/index.0.htm`
      - `hupnq_msn`: `%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 100)s.%(method)s_%(status)s_%(num)d%(filepath_ext)s`
        - `https://example.org`, `https://example.org/` -> `example.org/index.GET_C200C_0.htm`
        - `https://example.org/index.html` -> `example.org/index.GET_C200C_0.html`
        - `https://example.org/media`, `https://example.org/media/` -> `example.org/media/index.GET_C200C_0.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `example.org/view/index?one=1&two=2&three=3.GET_C200C_0.htm`
        - `https://königsgäßchen.example.org/index.html` -> `königsgäßchen.example.org/index.GET_C200C_0.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `ジャジェメント.ですの.example.org/испытание/is/index.GET_C200C_0.htm`
      - `hupnq_mhs`: `%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s.%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s%(filepath_ext)s`
        - `https://example.org` -> `example.org/index.GET_50d7_C200C.htm`
        - `https://example.org/` -> `example.org/index.GET_8198_C200C.htm`
        - `https://example.org/index.html` -> `example.org/index.GET_f0dc_C200C.html`
        - `https://example.org/media` -> `example.org/media/index.GET_086d_C200C.htm`
        - `https://example.org/media/` -> `example.org/media/index.GET_3fbb_C200C.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `example.org/view/index?one=1&two=2&three=3.GET_5658_C200C.htm`
        - `https://königsgäßchen.example.org/index.html` -> `königsgäßchen.example.org/index.GET_4f11_C200C.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `ジャジェメント.ですの.example.org/испытание/is/index.GET_c4ae_C200C.htm`
      - `hupnq_mhsn`: `%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 100)s.%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s_%(num)d%(filepath_ext)s`
        - `https://example.org` -> `example.org/index.GET_50d7_C200C_0.htm`
        - `https://example.org/` -> `example.org/index.GET_8198_C200C_0.htm`
        - `https://example.org/index.html` -> `example.org/index.GET_f0dc_C200C_0.html`
        - `https://example.org/media` -> `example.org/media/index.GET_086d_C200C_0.htm`
        - `https://example.org/media/` -> `example.org/media/index.GET_3fbb_C200C_0.htm`
        - `https://example.org/view?one=1&two=2&three=&three=3#fragment` -> `example.org/view/index?one=1&two=2&three=3.GET_5658_C200C_0.htm`
        - `https://königsgäßchen.example.org/index.html` -> `königsgäßchen.example.org/index.GET_4f11_C200C_0.html`
        - `https://ジャジェメント.ですの.example.org/испытание/is/`, `https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/` -> `ジャジェメント.ですの.example.org/испытание/is/index.GET_c4ae_C200C_0.htm`
rhupq
:%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 120)s%(filepath_ext)s
-https://example.org
,https://example.org/
->org.example/index.htm
-https://example.org/index.html
->org.example/index.html
-https://example.org/media
,https://example.org/media/
->org.example/media/index.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->org.example/view/index?one=1&two=2&three&three=3.htm
-https://königsgäßchen.example.org/index.html
->org.example.königsgäßchen/index.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->org.example.ですの.ジャジェメント/испытание/is/index.htm
rhupq_n
:%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 120)s.%(num)d%(filepath_ext)s
-https://example.org
,https://example.org/
->org.example/index.0.htm
-https://example.org/index.html
->org.example/index.0.html
-https://example.org/media
,https://example.org/media/
->org.example/media/index.0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->org.example/view/index?one=1&two=2&three&three=3.0.htm
-https://königsgäßchen.example.org/index.html
->org.example.königsgäßchen/index.0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->org.example.ですの.ジャジェメント/испытание/is/index.0.htm
rhupq_msn
:%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_query|abbrev 100)s.%(method)s_%(status)s_%(num)d%(filepath_ext)s
-https://example.org
,https://example.org/
->org.example/index.GET_C200C_0.htm
-https://example.org/index.html
->org.example/index.GET_C200C_0.html
-https://example.org/media
,https://example.org/media/
->org.example/media/index.GET_C200C_0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->org.example/view/index?one=1&two=2&three&three=3.GET_C200C_0.htm
-https://königsgäßchen.example.org/index.html
->org.example.königsgäßchen/index.GET_C200C_0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->org.example.ですの.ジャジェメント/испытание/is/index.GET_C200C_0.htm
rhupnq
:%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s%(filepath_ext)s
-https://example.org
,https://example.org/
->org.example/index.htm
-https://example.org/index.html
->org.example/index.html
-https://example.org/media
,https://example.org/media/
->org.example/media/index.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->org.example/view/index?one=1&two=2&three=3.htm
-https://königsgäßchen.example.org/index.html
->org.example.königsgäßchen/index.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->org.example.ですの.ジャジェメント/испытание/is/index.htm
rhupnq_n
:%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s.%(num)d%(filepath_ext)s
-https://example.org
,https://example.org/
->org.example/index.0.htm
-https://example.org/index.html
->org.example/index.0.html
-https://example.org/media
,https://example.org/media/
->org.example/media/index.0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->org.example/view/index?one=1&two=2&three=3.0.htm
-https://königsgäßchen.example.org/index.html
->org.example.königsgäßchen/index.0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->org.example.ですの.ジャジェメント/испытание/is/index.0.htm
rhupnq_msn
:%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 100)s.%(method)s_%(status)s_%(num)d%(filepath_ext)s
-https://example.org
,https://example.org/
->org.example/index.GET_C200C_0.htm
-https://example.org/index.html
->org.example/index.GET_C200C_0.html
-https://example.org/media
,https://example.org/media/
->org.example/media/index.GET_C200C_0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->org.example/view/index?one=1&two=2&three=3.GET_C200C_0.htm
-https://königsgäßchen.example.org/index.html
->org.example.königsgäßchen/index.GET_C200C_0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->org.example.ですの.ジャジェメント/испытание/is/index.GET_C200C_0.htm
rhupnq_mhs
:%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 120)s.%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s%(filepath_ext)s
-https://example.org
->org.example/index.GET_50d7_C200C.htm
-https://example.org/
->org.example/index.GET_8198_C200C.htm
-https://example.org/index.html
->org.example/index.GET_f0dc_C200C.html
-https://example.org/media
->org.example/media/index.GET_086d_C200C.htm
-https://example.org/media/
->org.example/media/index.GET_3fbb_C200C.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->org.example/view/index?one=1&two=2&three=3.GET_5658_C200C.htm
-https://königsgäßchen.example.org/index.html
->org.example.königsgäßchen/index.GET_4f11_C200C.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->org.example.ですの.ジャジェメント/испытание/is/index.GET_c4ae_C200C.htm
rhupnq_mhsn
:%(rhostname)s/%(filepath_parts|abbrev_each 120|pp_to_path)s%(oqm)s%(mq_nquery|abbrev 100)s.%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s_%(num)d%(filepath_ext)s
-https://example.org
->org.example/index.GET_50d7_C200C_0.htm
-https://example.org/
->org.example/index.GET_8198_C200C_0.htm
-https://example.org/index.html
->org.example/index.GET_f0dc_C200C_0.html
-https://example.org/media
->org.example/media/index.GET_086d_C200C_0.htm
-https://example.org/media/
->org.example/media/index.GET_3fbb_C200C_0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->org.example/view/index?one=1&two=2&three=3.GET_5658_C200C_0.htm
-https://königsgäßchen.example.org/index.html
->org.example.königsgäßchen/index.GET_4f11_C200C_0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->org.example.ですの.ジャジェメント/испытание/is/index.GET_c4ae_C200C_0.htm
flat
:%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path|replace / __|abbrev 120)s%(oqm)s%(mq_nquery|abbrev 100)s%(filepath_ext)s
-https://example.org
,https://example.org/
->example.org/index.htm
-https://example.org/index.html
->example.org/index.html
-https://example.org/media
,https://example.org/media/
->example.org/media__index.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->example.org/view__index?one=1&two=2&three=3.htm
-https://königsgäßchen.example.org/index.html
->königsgäßchen.example.org/index.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->ジャジェメント.ですの.example.org/испытание__is__index.htm
flat_n
:%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path|replace / __|abbrev 120)s%(oqm)s%(mq_nquery|abbrev 100)s.%(num)d%(filepath_ext)s
-https://example.org
,https://example.org/
->example.org/index.0.htm
-https://example.org/index.html
->example.org/index.0.html
-https://example.org/media
,https://example.org/media/
->example.org/media__index.0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->example.org/view__index?one=1&two=2&three=3.0.htm
-https://königsgäßchen.example.org/index.html
->königsgäßchen.example.org/index.0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->ジャジェメント.ですの.example.org/испытание__is__index.0.htm
flat_ms
:%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path|replace / __|abbrev 120)s%(oqm)s%(mq_nquery|abbrev 100)s.%(method)s_%(status)s%(filepath_ext)s
-https://example.org
,https://example.org/
->example.org/index.GET_C200C.htm
-https://example.org/index.html
->example.org/index.GET_C200C.html
-https://example.org/media
,https://example.org/media/
->example.org/media__index.GET_C200C.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->example.org/view__index?one=1&two=2&three=3.GET_C200C.htm
-https://königsgäßchen.example.org/index.html
->königsgäßchen.example.org/index.GET_C200C.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->ジャジェメント.ですの.example.org/испытание__is__index.GET_C200C.htm
flat_msn
:%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path|replace / __|abbrev 120)s%(oqm)s%(mq_nquery|abbrev 100)s.%(method)s_%(status)s_%(num)d%(filepath_ext)s
-https://example.org
,https://example.org/
->example.org/index.GET_C200C_0.htm
-https://example.org/index.html
->example.org/index.GET_C200C_0.html
-https://example.org/media
,https://example.org/media/
->example.org/media__index.GET_C200C_0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->example.org/view__index?one=1&two=2&three=3.GET_C200C_0.htm
-https://königsgäßchen.example.org/index.html
->königsgäßchen.example.org/index.GET_C200C_0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->ジャジェメント.ですの.example.org/испытание__is__index.GET_C200C_0.htm
flat_mhs
:%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path|replace / __|abbrev 120)s%(oqm)s%(mq_nquery|abbrev 100)s.%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s%(filepath_ext)s
-https://example.org
->example.org/index.GET_50d7_C200C.htm
-https://example.org/
->example.org/index.GET_8198_C200C.htm
-https://example.org/index.html
->example.org/index.GET_f0dc_C200C.html
-https://example.org/media
->example.org/media__index.GET_086d_C200C.htm
-https://example.org/media/
->example.org/media__index.GET_3fbb_C200C.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->example.org/view__index?one=1&two=2&three=3.GET_5658_C200C.htm
-https://königsgäßchen.example.org/index.html
->königsgäßchen.example.org/index.GET_4f11_C200C.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->ジャジェメント.ですの.example.org/испытание__is__index.GET_c4ae_C200C.htm
flat_mhsn
:%(hostname)s/%(filepath_parts|abbrev_each 120|pp_to_path|replace / __|abbrev 120)s%(oqm)s%(mq_nquery|abbrev 100)s.%(method)s_%(net_url|to_ascii|sha256|take_prefix 4)s_%(status)s_%(num)d%(filepath_ext)s
-https://example.org
->example.org/index.GET_50d7_C200C_0.htm
-https://example.org/
->example.org/index.GET_8198_C200C_0.htm
-https://example.org/index.html
->example.org/index.GET_f0dc_C200C_0.html
-https://example.org/media
->example.org/media__index.GET_086d_C200C_0.htm
-https://example.org/media/
->example.org/media__index.GET_3fbb_C200C_0.htm
-https://example.org/view?one=1&two=2&three=&three=3#fragment
->example.org/view__index?one=1&two=2&three=3.GET_5658_C200C_0.htm
-https://königsgäßchen.example.org/index.html
->königsgäßchen.example.org/index.GET_4f11_C200C_0.html
-https://ジャジェメント.ですの.example.org/испытание/is/
,https://xn--hck7aa9d8fj9i.xn--88j1aw.example.org/%D0%B8%D1%81%D0%BF%D1%8B%D1%82%D0%B0%D0%BD%D0%B8%D0%B5/is/
->ジャジェメント.ですの.example.org/испытание__is__index.GET_c4ae_C200C_0.htm
- available substitutions:
  - all expressions of `hoardy-web get --expr` (which see);
  - `num`: number of times the resulting output path was encountered before; adding this parameter to your `--output` format will ensure all generated file names will be unique
- new `--output`s printing:
  - `--no-print`: don't print anything; default
  - `-l, --lf-terminated`: print absolute paths of newly produced or replaced files terminated with `\n` (LF) newline characters
  - `-z, --zero-terminated`: print absolute paths of newly produced or replaced files terminated with `\0` (NUL) bytes
- updates to `--output`s:
  - `--no-overwrites`: disallow overwrites and replacements of any existing `--output` files under `DESTINATION`, i.e. only ever create new files under `DESTINATION`, producing errors instead of attempting any other updates; default; `--output` targets that are broken symlinks will be considered to be non-existent and will be replaced; when the operation's source is binary-equivalent to the `--output` target, the operation will be permitted, but the disk write will be reduced to a noop, i.e. the results will be deduplicated; the `dirname` of a source file and the `--to` target directories can be the same, in which case the source file will be renamed to use the new `--output` name, though renames that attempt to swap source file names will still fail
  - `--latest`: replace files under `DESTINATION` with their latest version; this is only allowed in combination with `--symlink` at the moment; for each source `PATH` file, the destination `--output` file will be replaced with a symlink to the source if and only if `stime_ms` of the source reqres is newer than `stime_ms` of the reqres stored at the destination file
- caching, deferring, and batching:
  - `--seen-number INT`: track at most this many distinct generated `--output` values; default: `16384`; making this larger improves disk performance at the cost of increased memory consumption; setting it to zero will force `hoardy-web` to constantly re-check existence of `--output` files and to execute all IO actions immediately, disregarding the `--defer-number` setting
  - `--cache-number INT`: cache `stat(2)` information about this many files in memory; default: `8192`; making this larger improves performance at the cost of increased memory consumption; setting this to too small a number will likely force `hoardy-web` into repeatedly performing lots of `stat(2)` system calls on the same files; setting this to a value smaller than `--defer-number` will not improve memory consumption very much since deferred IO actions also cache information about their own files
  - `--defer-number INT`: defer at most this many IO actions; default: `1024`; making this larger improves performance at the cost of increased memory consumption; setting it to zero will force all IO actions to be applied immediately
  - `--batch-number INT`: queue at most this many deferred IO actions to be applied together in a batch; this queue will only be used if all other resource constraints are met; default: `128`
  - `--max-memory INT`: the caches, the deferred actions queue, and the batch queue, all taken together, must not take more than this much memory in MiB; default: `1024`; making this larger improves performance; the actual maximum whole-program memory consumption is `O(<size of the largest reqres> + <--seen-number> + <sum of lengths of the last --seen-number generated --output paths> + <--cache-number> + <--defer-number> + <--batch-number> + <--max-memory>)`
  - `--lazy`: sets all of the above options to positive infinity; most useful when doing `hoardy-web organize --symlink --latest --output flat` or similar, where the number of distinct generated `--output` values and the amount of other data `hoardy-web` needs to keep in memory is small, in which case it will force `hoardy-web` to compute the desired file system state first and then perform all disk writes in a single batch
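Per the `--lazy` note above, a typical invocation might look like this (the directory paths here are illustrative):

```shell
# compute the desired file system state in memory first,
# then perform all disk writes in a single batch
hoardy-web organize --symlink --latest --lazy --output flat \
  --to ~/hoardy-web/latest ~/hoardy-web/raw
```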
- file system path ordering:
  - `--paths-given-order`: `argv` and `--stdin0` `PATH`s are processed in the order they are given; default when `--keep`
  - `--paths-sorted`: `argv` and `--stdin0` `PATH`s are processed in lexicographic order
  - `--paths-reversed`: `argv` and `--stdin0` `PATH`s are processed in reverse lexicographic order; default when `--latest`
  - `--walk-fs-order`: recursive file system walk is done in the order `readdir(2)` gives results
  - `--walk-sorted`: recursive file system walk is done in lexicographic order; default when `--keep`
  - `--walk-reversed`: recursive file system walk is done in reverse lexicographic order; default when `--latest`
hoardy-web import

Use the specified parser to parse data in each `INPUT` `PATH` into (a sequence of) reqres, then generate and place their WRR-dumps into separate WRR files under `DESTINATION` with paths derived from their metadata.

In short, this is `hoardy-web organize --copy` for `INPUT` files that use different file formats.

- file formats: `{bundle,mitmproxy}`
  - `bundle`: convert WRR-bundles into separate WRR files
  - `mitmproxy`: convert `mitmproxy` stream dumps into WRR files
hoardy-web import bundle
Parse each INPUT
PATH
as a WRR-bundle (an optionally compressed sequence of WRR-dumps) and then generate and place their WRR-dumps into separate WRR files under DESTINATION
with paths derived from their metadata.
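For example, as in the quickstart above (the paths are illustrative):

```shell
# split exported WRR-bundles into separate WRR files
hoardy-web import bundle --to ~/hoardy-web/raw ~/Downloads/Hoardy-Web-export-*
```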
- positional arguments:
  - `PATH`: inputs, can be a mix of files and directories (which will be traversed recursively)
- options:
  - `--dry-run`: perform a trial run without actually performing any changes
  - `-q, --quiet`: don't log computed updates to stderr
  - `--stdin0`: read zero-terminated `PATH`s from stdin, these will be processed after `PATH`s specified as command-line arguments
- error handling:
  - `--errors {fail,skip,ignore}`: when an error occurs:
    - `fail`: report failure and stop the execution; default
    - `skip`: report failure but skip the reqres that produced it from the output and continue
    - `ignore`: `skip`, but don't report the failure
- filters; both can be specified at the same time, both can be specified multiple times, both use the same expression format as `hoardy-web get --expr` (which see); the resulting logical expression that will be checked is `(O1 or O2 or ... or (A1 and A2 and ...))`, where `O1`, `O2`, ... are the arguments to `--or`s and `A1`, `A2`, ... are the arguments to `--and`s:
  - `--or EXPR`: only import reqres which match any of these expressions
  - `--and EXPR`: only import reqres which match all of these expressions
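As a sketch, a filter might look like the following; the predicate here is hypothetical (consult `hoardy-web get --expr` for the actual expression syntax), and the `C200C` value follows the status notation seen in the `--output` examples above:

```shell
# hypothetical predicate syntax: import only complete 200 OK reqres
hoardy-web import bundle --and 'status|== "C200C"' \
  --to ~/hoardy-web/raw ~/Downloads/Hoardy-Web-export-*
```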
- file outputs:
  - `-t DESTINATION, --to DESTINATION`: destination directory
  - `-o FORMAT, --output FORMAT`: format describing generated output paths, an alias name or "format:" followed by a custom pythonic %-substitution string; same expression format as `hoardy-web organize --output` (which see); default: `default`
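A custom `format:` string can combine any of the substitutions shown above; for instance (the format and paths here are illustrative, not a recommendation):

```shell
# place dumps at <hostname>/<method>_<status>_<num>.wrr
hoardy-web import bundle \
  --output 'format:%(hostname)s/%(method)s_%(status)s_%(num)d.wrr' \
  --to ~/hoardy-web/raw ~/Downloads/Hoardy-Web-export-*
```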
- new `--output`s printing:
  - `--no-print`: don't print anything; default
  - `-l, --lf-terminated`: print absolute paths of newly produced or replaced files terminated with `\n` (LF) newline characters
  - `-z, --zero-terminated`: print absolute paths of newly produced or replaced files terminated with `\0` (NUL) bytes
- updates to `--output`s:
  - `--no-overwrites`: disallow overwrites and replacements of any existing `--output` files under `DESTINATION`, i.e. only ever create new files under `DESTINATION`, producing errors instead of attempting any other updates; default
  - `--overwrite-dangerously`: permit overwriting of old `--output` files under `DESTINATION`; DANGEROUS! not recommended, importing to a new `DESTINATION` with the default `--no-overwrites` and then `rsync`ing some of the files over to the old `DESTINATION` is a safer way to do this
- caching, deferring, and batching:
  - `--seen-number INT`: track at most this many distinct generated `--output` values; default: `16384`; making this larger improves disk performance at the cost of increased memory consumption; setting it to zero will force `hoardy-web` to constantly re-check existence of `--output` files and to execute all IO actions immediately, disregarding the `--defer-number` setting
  - `--cache-number INT`: cache `stat(2)` information about this many files in memory; default: `8192`; making this larger improves performance at the cost of increased memory consumption; setting this to too small a number will likely force `hoardy-web` into repeatedly performing lots of `stat(2)` system calls on the same files; setting this to a value smaller than `--defer-number` will not improve memory consumption very much since deferred IO actions also cache information about their own files
  - `--defer-number INT`: defer at most this many IO actions; default: `0`; making this larger improves performance at the cost of increased memory consumption; setting it to zero will force all IO actions to be applied immediately
  - `--batch-number INT`: queue at most this many deferred IO actions to be applied together in a batch; this queue will only be used if all other resource constraints are met; default: `1024`
  - `--max-memory INT`: the caches, the deferred actions queue, and the batch queue, all taken together, must not take more than this much memory in MiB; default: `1024`; making this larger improves performance; the actual maximum whole-program memory consumption is `O(<size of the largest reqres> + <--seen-number> + <sum of lengths of the last --seen-number generated --output paths> + <--cache-number> + <--defer-number> + <--batch-number> + <--max-memory>)`
  - `--lazy`: sets all of the above options to positive infinity; most useful when doing `hoardy-web organize --symlink --latest --output flat` or similar, where the number of distinct generated `--output` values and the amount of other data `hoardy-web` needs to keep in memory is small, in which case it will force `hoardy-web` to compute the desired file system state first and then perform all disk writes in a single batch
- file system path ordering:
  - `--paths-given-order`: `argv` and `--stdin0` `PATH`s are processed in the order they are given; default
  - `--paths-sorted`: `argv` and `--stdin0` `PATH`s are processed in lexicographic order
  - `--paths-reversed`: `argv` and `--stdin0` `PATH`s are processed in reverse lexicographic order
  - `--walk-fs-order`: recursive file system walk is done in the order `readdir(2)` gives results
  - `--walk-sorted`: recursive file system walk is done in lexicographic order; default
  - `--walk-reversed`: recursive file system walk is done in reverse lexicographic order
hoardy-web import mitmproxy

Parse each `INPUT` `PATH` as a `mitmproxy` stream dump (by using `mitmproxy`'s own parser) into a sequence of reqres, then generate and place their WRR-dumps into separate WRR files under `DESTINATION` with paths derived from their metadata.
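For example (the dump path here is illustrative):

```shell
# convert a mitmproxy stream dump into separate WRR files
hoardy-web import mitmproxy --to ~/hoardy-web/mitmproxy ~/mitmproxy/dumps.mitm
```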
- positional arguments:
  - `PATH`: inputs, can be a mix of files and directories (which will be traversed recursively)
- options:
  - `--dry-run`: perform a trial run without actually performing any changes
  - `-q, --quiet`: don't log computed updates to stderr
  - `--stdin0`: read zero-terminated `PATH`s from stdin, these will be processed after `PATH`s specified as command-line arguments
- error handling:
  - `--errors {fail,skip,ignore}`: when an error occurs:
    - `fail`: report failure and stop the execution; default
    - `skip`: report failure but skip the reqres that produced it from the output and continue
    - `ignore`: `skip`, but don't report the failure
- filters; both can be specified at the same time, both can be specified multiple times, both use the same expression format as `hoardy-web get --expr` (which see); the resulting logical expression that will be checked is `(O1 or O2 or ... or (A1 and A2 and ...))`, where `O1`, `O2`, ... are the arguments to `--or`s and `A1`, `A2`, ... are the arguments to `--and`s:
  - `--or EXPR`: only import reqres which match any of these expressions
  - `--and EXPR`: only import reqres which match all of these expressions
- file outputs:
  - `-t DESTINATION, --to DESTINATION`: destination directory
  - `-o FORMAT, --output FORMAT`: format describing generated output paths, an alias name or "format:" followed by a custom pythonic %-substitution string; same expression format as `hoardy-web organize --output` (which see); default: `default`
- new `--output`s printing:
  - `--no-print`: don't print anything; default
  - `-l, --lf-terminated`: print absolute paths of newly produced or replaced files terminated with `\n` (LF) newline characters
  - `-z, --zero-terminated`: print absolute paths of newly produced or replaced files terminated with `\0` (NUL) bytes
- updates to `--output`s:
  - `--no-overwrites`: disallow overwrites and replacements of any existing `--output` files under `DESTINATION`, i.e. only ever create new files under `DESTINATION`, producing errors instead of attempting any other updates; default
  - `--overwrite-dangerously`: permit overwriting of old `--output` files under `DESTINATION`; DANGEROUS! not recommended, importing to a new `DESTINATION` with the default `--no-overwrites` and then `rsync`ing some of the files over to the old `DESTINATION` is a safer way to do this
- caching, deferring, and batching:
  - `--seen-number INT`: track at most this many distinct generated `--output` values; default: `16384`; making this larger improves disk performance at the cost of increased memory consumption; setting it to zero will force `hoardy-web` to constantly re-check existence of `--output` files and to execute all IO actions immediately, disregarding the `--defer-number` setting
  - `--cache-number INT`: cache `stat(2)` information about this many files in memory; default: `8192`; making this larger improves performance at the cost of increased memory consumption; setting this to too small a number will likely force `hoardy-web` into repeatedly performing lots of `stat(2)` system calls on the same files; setting this to a value smaller than `--defer-number` will not improve memory consumption very much since deferred IO actions also cache information about their own files
  - `--defer-number INT`: defer at most this many IO actions; default: `0`; making this larger improves performance at the cost of increased memory consumption; setting it to zero will force all IO actions to be applied immediately
  - `--batch-number INT`: queue at most this many deferred IO actions to be applied together in a batch; this queue will only be used if all other resource constraints are met; default: `1024`
  - `--max-memory INT`: the caches, the deferred actions queue, and the batch queue, all taken together, must not take more than this much memory in MiB; default: `1024`; making this larger improves performance; the actual maximum whole-program memory consumption is `O(<size of the largest reqres> + <--seen-number> + <sum of lengths of the last --seen-number generated --output paths> + <--cache-number> + <--defer-number> + <--batch-number> + <--max-memory>)`
  - `--lazy`: sets all of the above options to positive infinity; most useful when doing `hoardy-web organize --symlink --latest --output flat` or similar, where the number of distinct generated `--output` values and the amount of other data `hoardy-web` needs to keep in memory is small, in which case it will force `hoardy-web` to compute the desired file system state first and then perform all disk writes in a single batch
- file system path ordering:
  - `--paths-given-order`: `argv` and `--stdin0` `PATH`s are processed in the order they are given; default
  - `--paths-sorted`: `argv` and `--stdin0` `PATH`s are processed in lexicographic order
  - `--paths-reversed`: `argv` and `--stdin0` `PATH`s are processed in reverse lexicographic order
  - `--walk-fs-order`: recursive file system walk is done in the order `readdir(2)` gives results
  - `--walk-sorted`: recursive file system walk is done in lexicographic order; default
  - `--walk-reversed`: recursive file system walk is done in reverse lexicographic order
hoardy-web export

Parse given WRR files into their respective reqres, convert them to another file format, and then dump the results under `DESTINATION` with the new paths derived from each reqres' metadata.

- file formats: `{mirror}`
  - `mirror`: convert given WRR files into a local website mirror stored in interlinked plain files

hoardy-web export mirror

Parse given WRR files, filter out those that have no responses, transform and then dump their response bodies into separate files under `DESTINATION` with the new paths derived from each reqres' metadata.

In short, this is a combination of `hoardy-web organize --copy` followed by an in-place `hoardy-web get`.

In other words, this generates static offline website mirrors, producing results similar to those of `wget -mpk`.
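A minimal invocation, assuming your WRR files live under `~/hoardy-web/raw`:

```shell
# generate a static website mirror from archived WRR files
hoardy-web export mirror --to ~/hoardy-web/mirror ~/hoardy-web/raw
```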
- positional arguments:
  - `PATH`: inputs, can be a mix of files and directories (which will be traversed recursively)
- options:
  - `--dry-run`: perform a trial run without actually performing any changes
  - `-q, --quiet`: don't log computed updates to stderr
  - `--stdin0`: read zero-terminated `PATH`s from stdin, these will be processed after `PATH`s specified as command-line arguments
- error handling:
  - `--errors {fail,skip,ignore}`: when an error occurs:
    - `fail`: report failure and stop the execution; default
    - `skip`: report failure but skip the reqres that produced it from the output and continue
    - `ignore`: `skip`, but don't report the failure
- filters; both can be specified at the same time, both can be specified multiple times, both use the same expression format as `hoardy-web get --expr` (which see); the resulting logical expression that will be checked is `(O1 or O2 or ... or (A1 and A2 and ...))`, where `O1`, `O2`, ... are the arguments to `--or`s and `A1`, `A2`, ... are the arguments to `--and`s:
  - `--or EXPR`: only export reqres which match any of these expressions
  - `--and EXPR`: only export reqres which match all of these expressions
- expression evaluation:
  - `-e EXPR, --expr EXPR`: an expression to compute, same expression format and semantics as `hoardy-web get --expr` (which see); can be specified multiple times; default: `response.body|eb|scrub response +all_refs,-actions`, which will export safe scrubbed versions of all files
- URL remapping; used by the `scrub` atom of `--expr`:
  - `--remap-id`: remap all URLs with an identity function, i.e. don't remap anything
  - `--remap-void`: remap all jump-link and action URLs to `javascript:void(0)` and all resource URLs into empty `data:` URLs; resulting web pages will be self-contained
  - `--remap-open, -k, --convert-links`: point all URLs present in input `PATH`s and reachable from `--root`s in no more than `--depth` steps to their corresponding output paths, remap all other URLs like `--remap-id` does; this is similar to `wget (-k|--convert-links)`
  - `--remap-closed`: remap all reachable URLs like `--remap-open` does, remap all other URLs like `--remap-void` does; `export`ed `mirror`s will be self-contained
  - `--remap-all`: remap all reachable URLs like `--remap-open` does, remap other URLs as if for each missing URL a trivial `GET <URL> -> 200 OK` reqres is present among input `PATH`s; this will produce broken links if the `--output` format depends on anything but the URL itself, but for a simple `--output` (like the default `hupq`) this will remap missing URLs to `--output` paths that they would occupy if they were present; this allows `hoardy-web export` to be used incrementally; `export`ed `mirror`s will be self-contained; default
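For example, to produce a self-contained mirror that voids links to pages you did not capture, instead of remapping them to not-yet-existing output paths as the default `--remap-all` would (the paths are illustrative):

```shell
# void unreachable URLs rather than pointing them at missing files
hoardy-web export mirror --remap-closed \
  --to ~/hoardy-web/mirror ~/hoardy-web/raw
```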
- exporting:
  - `--not-separated`: export values without separating them with anything, just concatenate them
  - `--lf-separated`: export values separated with `\n` (LF) newline characters; default
  - `--zero-separated`: export values separated with `\0` (NUL) bytes
- file outputs:
  - `-t DESTINATION, --to DESTINATION`: destination directory
  - `-o FORMAT, --output FORMAT`: format describing generated output paths, an alias name or "format:" followed by a custom pythonic %-substitution string; same expression format as `hoardy-web organize --output` (which see); default: `hupq`
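A pythonic %-substitution string expands like Python's own named `%`-formatting over reqres-derived fields; the field names below are illustrative stand-ins, not necessarily hoardy-web's actual substitution parameters:

```python
# Named %-substitution over a dict of fields derived from a reqres;
# "hostname" and "num" here are illustrative field names.
fields = {"hostname": "example.org", "num": 0}
fmt = "%(hostname)s_%(num)d.wrr"
path = fmt % fields
# path == "example.org_0.wrr"
```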
- new `--output`s printing:
  - `--no-print`: don't print anything; default
  - `-l, --lf-terminated`: print absolute paths of newly produced or replaced files terminated with `\n` (LF) newline characters
  - `-z, --zero-terminated`: print absolute paths of newly produced or replaced files terminated with `\0` (NUL) bytes
- updates to `--output`s:
  - `--no-overwrites`: disallow overwrites and replacements of any existing `--output` files under `DESTINATION`, i.e. only ever create new files under `DESTINATION`, producing errors instead of attempting any other updates; default; repeated exports of the same export targets with the same parameters (which, therefore, will produce the same `--output` data) are allowed and will be reduced to noops; however, trying to overwrite existing `--output` files under `DESTINATION` with any new data will produce errors; this allows reusing the `DESTINATION` between unrelated exports and between exports that produce the same data on disk in their common parts
  - `--skip-existing, --partial`: skip exporting of targets which have a corresponding `--output` file under `DESTINATION`; using this together with `--depth` is likely to produce a partially broken result, since skipping an export target will also skip all the documents it references; on the other hand, this is quite useful when growing a partial mirror generated with `--remap-all`
  - `--overwrite-dangerously`: export all targets while permitting overwriting of old `--output` files under `DESTINATION`; DANGEROUS! not recommended; exporting to a new `DESTINATION` with the default `--no-overwrites` and then `rsync`ing some of the files over to the old `DESTINATION` is a safer way to do this
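The default `--no-overwrites` policy, "create or noop, never clobber", can be sketched in Python (a simplified illustration; the real tool presumably writes more carefully, e.g. atomically):

```python
import os

def write_no_overwrite(path, data: bytes):
    """Sketch of the --no-overwrites policy: creating a new file is
    fine, re-writing byte-identical content reduces to a noop, and
    writing different data to an existing path is an error."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            if f.read() == data:
                return  # same data: noop
        raise FileExistsError(f"refusing to overwrite {path}")
    with open(path, "wb") as f:
        f.write(data)
```

This is what makes repeated exports of the same targets into the same `DESTINATION` safe: identical results are silently skipped, while any genuine conflict surfaces as an error.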
- export targets:
  - `-r URL, --root URL`: recursion root; a URL which will be used as a root for recursive export; can be specified multiple times; if none are specified, then all (`net_url`) URLs available from input `PATH`s will be treated as roots
  - `-d DEPTH, --depth DEPTH`: maximum recursion depth level; the default is `0`, which means "`--root` documents and their resources only"; setting this to `1` will also export one level of documents referenced via jump and action links, if those are being remapped to local files with `--remap-*`; higher values will mean even more recursion
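Depth-limited recursion over jump/action links can be sketched as a breadth-first walk; `links_of` is a hypothetical callback returning the URLs a document references (page resources, which don't cost a depth step, are omitted for brevity):

```python
from collections import deque

def collect_targets(roots, links_of, depth):
    """Sketch of --depth semantics: depth 0 keeps only the --root
    documents, depth 1 adds documents one jump away, and so on."""
    seen = set(roots)
    queue = deque((u, 0) for u in roots)
    while queue:
        url, d = queue.popleft()
        if d >= depth:
            continue  # don't follow links past the depth limit
        for link in links_of(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, d + 1))
    return seen
```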
- file system path ordering:
  - `--paths-given-order`: `argv` and `--stdin0` `PATH`s are processed in the order they are given; default
  - `--paths-sorted`: `argv` and `--stdin0` `PATH`s are processed in lexicographic order
  - `--paths-reversed`: `argv` and `--stdin0` `PATH`s are processed in reverse lexicographic order
  - `--walk-fs-order`: recursive file system walk is done in the order `readdir(2)` gives results
  - `--walk-sorted`: recursive file system walk is done in lexicographic order; default
  - `--walk-reversed`: recursive file system walk is done in reverse lexicographic order
Examples
- Pretty-print all reqres in `../simple_server/pwebarc-dump` using an abridged (for ease of reading and rendering) verbose textual representation:

  `hoardy-web pprint ../simple_server/pwebarc-dump`
- Pipe raw response body from a given WRR file to stdout:

  `hoardy-web get ../simple_server/pwebarc-dump/path/to/file.wrr`
- Pipe response body scrubbed of dynamic content from a given WRR file to stdout:

  `hoardy-web get -e "response.body|eb|scrub response defaults" ../simple_server/pwebarc-dump/path/to/file.wrr`
- Get the first 4 characters of a hex digest of the sha256 hash computed on the URL without the fragment/hash part:

  `hoardy-web get -e "net_url|to_ascii|sha256|take_prefix 4" ../simple_server/pwebarc-dump/path/to/file.wrr`
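  The `net_url|to_ascii|sha256|take_prefix 4` pipeline corresponds to a computation along these lines in plain Python (the example URL is made up):

  ```python
  import hashlib

  # net_url: the URL without its fragment/hash part, as ASCII bytes
  url = "https://example.org/page?q=1"
  digest = hashlib.sha256(url.encode("ascii")).hexdigest()
  prefix = digest[:4]  # take_prefix 4
  ```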
- Pipe response body from a given WRR file to stdout, but less efficiently, by generating a temporary file and giving it to `cat`:

  `hoardy-web run cat ../simple_server/pwebarc-dump/path/to/file.wrr`

  Thus `hoardy-web run` can be used to do almost anything you want, e.g.

  `hoardy-web run less ../simple_server/pwebarc-dump/path/to/file.wrr`

  `hoardy-web run -- sort -R ../simple_server/pwebarc-dump/path/to/file.wrr`

  `hoardy-web run -n 2 -- diff -u ../simple_server/pwebarc-dump/path/to/file-v1.wrr ../simple_server/pwebarc-dump/path/to/file-v2.wrr`
- List paths of all WRR files from `../simple_server/pwebarc-dump` that contain only complete `200 OK` responses with bodies larger than 1K:

  `hoardy-web find --and "status|~= .200C" --and "response.body|len|> 1024" ../simple_server/pwebarc-dump`
- Rename all WRR files in `../simple_server/pwebarc-dump/default` according to their metadata using `--output default` (see the `hoardy-web organize` section for its definition; the `default` format is designed to be human-readable while causing almost no collisions, thus making the `num` substitution parameter almost always stay equal to `0`, which keeps things nice and deterministic):

  `hoardy-web organize ../simple_server/pwebarc-dump/default`

  Alternatively, just show what would be done:

  `hoardy-web organize --dry-run ../simple_server/pwebarc-dump/default`
Advanced examples
- Pretty-print all reqres in `../simple_server/pwebarc-dump` by dumping their whole structure into an abridged Pythonic Object Representation (repr):

  `hoardy-web stream --expr . ../simple_server/pwebarc-dump`

  `hoardy-web stream -e . ../simple_server/pwebarc-dump`
- Pretty-print all reqres in `../simple_server/pwebarc-dump` using the unabridged verbose textual representation:

  `hoardy-web pprint --unabridged ../simple_server/pwebarc-dump`

  `hoardy-web pprint -u ../simple_server/pwebarc-dump`
- Pretty-print all reqres in `../simple_server/pwebarc-dump` by dumping their whole structure into the unabridged Pythonic Object Representation (repr) format:

  `hoardy-web stream --unabridged --expr . ../simple_server/pwebarc-dump`

  `hoardy-web stream -ue . ../simple_server/pwebarc-dump`
- Produce a JSON list of `[<file path>, <time it finished loading in seconds since UNIX epoch>, <URL>]` tuples (one per reqres) and pipe it into `jq` for indented and colored output:

  `hoardy-web stream --format=json -ue fs_path -e finished_at -e request.url ../simple_server/pwebarc-dump | jq .`
- Similarly, but produce CBOR output:

  `hoardy-web stream --format=cbor -ue fs_path -e finished_at -e request.url ../simple_server/pwebarc-dump | less`
- Concatenate all response bodies of all the requests in `../simple_server/pwebarc-dump`:

  `hoardy-web stream --format=raw --not-terminated -ue "response.body|es" ../simple_server/pwebarc-dump | less`
- Print all unique visited URLs, one per line:

  `hoardy-web stream --format=raw --lf-terminated -ue request.url ../simple_server/pwebarc-dump | sort | uniq`
- Same idea, but using NUL bytes while processing, and printing two URLs per line:

  `hoardy-web stream --format=raw --zero-terminated -ue request.url ../simple_server/pwebarc-dump | sort -z | uniq -z | xargs -0 -n2 echo`
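  A downstream consumer of the `--zero-terminated` stream can be sketched in Python; NUL is a safe separator here because it cannot occur inside a URL (the sample data is made up):

  ```python
  def parse_zero_terminated(data: bytes):
      """Split a NUL-terminated byte stream back into its values."""
      return [v.decode("utf-8") for v in data.split(b"\0") if v]

  # e.g. what `sort -z | uniq -z` sees on its stdin
  sample = b"https://a/\0https://b/\0"
  urls = parse_zero_terminated(sample)
  # urls == ["https://a/", "https://b/"]
  ```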
How to handle binary data
Trying to use response bodies produced by `hoardy-web stream --format=json` is likely to result in garbled data, as JSON can't represent raw sequences of bytes; thus binary data will have to be encoded into Unicode using replacement characters:

`hoardy-web stream --format=json -ue . ../simple_server/pwebarc-dump/path/to/file.wrr | jq .`
The most generic solution to this is to use `--format=cbor` instead, which would produce a verbose CBOR representation equivalent to the one used by `--format=json` but with binary data preserved as-is:

`hoardy-web stream --format=cbor -ue . ../simple_server/pwebarc-dump/path/to/file.wrr | less`
Or you could just dump raw response bodies separately:

`hoardy-web stream --format=raw -ue response.body ../simple_server/pwebarc-dump/path/to/file.wrr | less`

`hoardy-web get ../simple_server/pwebarc-dump/path/to/file.wrr | less`
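To see concretely why JSON output mangles binary bodies, here is a minimal Python demonstration of the lossy replacement-character encoding:

```python
# JSON strings must be valid Unicode, so bytes that are not valid
# UTF-8 get turned into U+FFFD replacement characters, from which
# the original data cannot be recovered.
body = b"\x89PNG\r\n\x1a\n"  # the first bytes of a PNG file
as_text = body.decode("utf-8", errors="replace")
lossy = "\ufffd" in as_text          # 0x89 is not valid UTF-8
round_trip = as_text.encode("utf-8")  # != body: the damage is permanent
```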
Hashes for `hoardy_web-0.14.1-py3-none-any.whl`

| Algorithm | Hash digest |
|---|---|
| SHA256 | `9c9b284e658d6eab71f15922cec8090e70a4dae53e900b625c7e6d1756a14dd1` |
| MD5 | `08341002370b23386c148f6badc7698c` |
| BLAKE2b-256 | `71691277b1c6df5f0f2c3286c6839f66b37817aeea43379213fffcc26896327f` |