Web gallery downloader
img-lurker is a gallery downloader.
img-lurker takes a URL of a (HTML) web page and downloads linked images on it. If the page contains only thumbnails, linking to a the full size version of the image, img-lurker will rather take the bigger one. If there are links to other HTML pages (themselves containing a the full size image), img-lurker will follow those links to find the bigger size.
img-lurker has a "minimum image size" for considering an image is worthy of being downloaded and isn't UI stuff like buttons/separators. img-lurker will not follow links if the link tag doesn't contain an image tag (assumed to be the thumbnail).
Consider a site with following HTML:
<a href="fullimage1.jpg"> <img src="thumbnail1.jpg" /> </a> <a href="fullimage2.jpg"> <img src="thumbnail2.jpg" /> </a>
img-lurker would download "fullimage1.jpg" and "fullimage2.jpg". If instead the links point to other HTML pages containing the full size version of the images (for example "fullimage1.html" containing "fullimage1.jpg"), img-lurker would still find fullimage1.jpg by following the page links.
Inject a specific cookie, which might be required to visit some restricted access pages. For example, some subreddits require you to pass the cookie "over18=1".
The option can be passed several times to inject multiple cookies.
img-lurker can handle pagination for sites where a gallery contains so many
images that the site is split in numbered pages.
HTML_XPATH should be an XPath expression locating the HTML link to the "next
If this argument is given, after downloading all images of a "page", img-lurker
will follow the link pointed to by
HTML_XPATH and repeat on the next page.
Warning: this can issue a lot of traffic for huge galleries. Be cautious or you might get blocked by the website.
Mark all downloaded images URLs in this file and avoid redownloading URLs
present in this file.
Useful when running img-lurker multiple times on the same gallery, typically if
the gallery has received fresh images. Also useful if you use
--next-page-xpath option and kill img-lurker to avoid flooding the site, make
a pause (minutes? hours? days?) then restart img-lurker: the history file will
help it resume where it was interrupted.
This makes the assumption that each image has a URL that is unique (the image URL always have the same URL, e.g. no varying tokens, etc.) and conversely (the URLs will not point to another image at some point, e.g. the images are NOT numbered in ascending order (else "1.jpg" would point to different images over time)).
--min-thumb-size WIDTHxHEIGHT --min-image-size WIDTHxHEIGHT
Minimum size for an image to be considered a thumbnail worth following or an
image worth downloading. Useful not to download navigation buttons, logos, etc.
Default values are
Maximum ratio between WIDTH and HEIGHT (or HEIGHT on WIDTH, img-lurker is smart enough to figure out) to consider an image is worth downloading.
For example, pass "16:9" and img-lurker will accept images with dimensions 1920x1080 or 1080x1920 as they are respectively 16:9 and 9:16 but also 1600x1200 or 1200x1600 because they are 4:3 (and 3:4) which is lower (more looking like a square) than the max "16:9". Ratios of portrait and landscape are considered equivalent. However, passing "16:9" would discard a banner with dimensions 1200x300 because its ratio is 4:1 which is way more distorted (very thin rectangle) than 16:9. It would also reject a banner with dimensions 300x1200 because it is 1:4, equivalent to 4:1.
A photo is rarely square but is almost never thin like 4:1, except panoramas, so
configure this option if you intend to download panoramas for example.
The default value is
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Hashes for img_lurker-1.0.1-py3-none-any.whl