Fetch public domain artwork from Artvee
artvee-scraper-cli
artvee-scraper-cli is an easy-to-use command line utility for fetching public domain artwork from Artvee (https://www.artvee.com).
Installation
Using PyPI
$ python -m pip install artvee-scraper-cli
Python 3.10+ is officially supported.
Synopsis
artvee-scraper-cli <command> [optional arguments] [positional arguments]
Examples
View help
$ artvee-scraper-cli -h
usage: artvee-scraper-cli [-h] {log-json,file-json,file-multi} ...

Scrape artwork from https://www.artvee.com

positional arguments:
  {log-json,file-json,file-multi}
    log-json            Artwork is output to the log as a JSON object
    file-json           Artwork is represented as a JSON object and written to a file
    file-multi          Artwork image and metadata are written as separate files

optional arguments:
  -h, --help            show this help message and exit
View help for the file-json command
$ artvee-scraper-cli file-json -h
usage: artvee-scraper-cli file-json [-h] [-t [1-10]] [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                                    [-c {abstract,figurative,landscape,religion,mythology,posters,animals,illustration,still-life,botanical,drawings,asian-art}]
                                    [--log-dir LOG_DIR] [--log-max-size [1-10240]] [--log-max-backups [0-100]]
                                    [--space-level [2-6]] [--sort-keys] [--overwrite-existing]
                                    dir_path

positional arguments:
  dir_path              JSON file output directory

optional arguments:
  -h, --help            show this help message and exit
  -t [1-10], --worker-threads [1-10]
                        Number of worker threads (1-10)
  -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Set the application log level
  -c {abstract,figurative,landscape,religion,mythology,posters,animals,illustration,still-life,botanical,drawings,asian-art}, --category {abstract,figurative,landscape,religion,mythology,posters,animals,illustration,still-life,botanical,drawings,asian-art}
                        Category of artwork to scrape
  --space-level [2-6]   Enable pretty-printing; number of spaces to indent (2-6)
  --sort-keys           Sort JSON keys in alphabetical order
  --overwrite-existing  Overwrite existing files

optional log file arguments:
  --log-dir LOG_DIR     Log file output directory
  --log-max-size [1-10240]
                        Maximum log file size in MB (1-10,240)
  --log-max-backups [0-100]
                        Maximum number of log files to keep (0-100)
Download artwork from artvee.com and save each item as an individual file (JSON format) in the directory ~/artvee/downloads
$ artvee-scraper-cli file-json ~/artvee/downloads
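When the scraper is driven from a Python script rather than an interactive shell, the command line can be assembled as an argument list. The sketch below is an illustration only: build_file_json_command and run are hypothetical helpers, not part of artvee-scraper-cli. They use only flags and value ranges documented on this page, and they assume the artvee-scraper-cli executable is on the PATH.

```python
import os
import subprocess

# Categories as listed in the `file-json -h` output.
CATEGORIES = {
    "abstract", "figurative", "landscape", "religion", "mythology", "posters",
    "animals", "illustration", "still-life", "botanical", "drawings", "asian-art",
}


def build_file_json_command(dir_path, categories=(), worker_threads=None):
    """Return the argv list for `artvee-scraper-cli file-json` (hypothetical helper)."""
    argv = ["artvee-scraper-cli", "file-json"]
    if worker_threads is not None:
        if not 1 <= worker_threads <= 10:
            raise ValueError("worker_threads must be in the range 1-10")
        argv += ["--worker-threads", str(worker_threads)]
    for category in categories:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        # --category may be repeated to select several categories.
        argv += ["--category", category]
    # A shell would expand "~"; do it explicitly because no shell is involved.
    argv.append(os.path.expanduser(dir_path))
    return argv


def run(argv):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(argv, check=True)
```

A call such as run(build_file_json_command("~/artvee/downloads", categories=["animals", "drawings"])) would then start the download. Passing a list instead of a single string avoids shell quoting problems with unusual directory names.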
Available Commands
log-json
Download artwork and output each item to the log as a JSON object. Note: this command is intended for development and testing; dumping the data to the log is usually not desirable.
$ artvee-scraper-cli log-json [optional arguments]
Optional arguments
-h | --help
(boolean) Display help message.
-t | --worker-threads
(integer) The number of worker threads used for processing. Range of values is [1-10]. The default value is 3.
-l | --log-level
(string) Application log level. One of: DEBUG, INFO, WARNING, ERROR, CRITICAL. The default value is INFO.
-c | --category
(string) Category of artwork to fetch. One of: abstract, figurative, landscape, religion, mythology, posters, animals, illustration, still-life, botanical, drawings, asian-art. May be used repeatedly to specify multiple categories (-c animals -c drawings). The default value is ALL categories.
Optional log file arguments
--log-dir
(string) Path to an existing directory used to store artvee_scraper.log log files. Disabled by default.
--log-max-size
(integer) Maximum size in MB the log file may reach before a rollover is triggered. Only applies if --log-dir has been specified. Range of values is [1-10240]. The default value is 1024 MB (1 GB).
--log-max-backups
(integer) Maximum number of log file archives to keep. Only applies if --log-dir has been specified. The actively written file is artvee_scraper.log. Backup files have an incrementing numerical suffix: artvee_scraper.log.1 ... artvee_scraper.log.N. If this value is zero, rollovers are disabled. Range of values is [0-100]. The default value is 10.
Optional writer arguments
--space-level
(integer) Pretty-print JSON; number of spaces to indent. Range of values is [2-6]. Disabled by default.
--sort-keys
(boolean) Sort JSON keys in alphabetical order. Disabled by default.
--include-image
(boolean) Include the image in the output. Warning: this makes the output very large. Disabled by default.
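The rollover rules described for --log-max-size and --log-max-backups resemble those of Python's standard logging.handlers.RotatingFileHandler. The sketch below assumes, without confirmation from this page, that the tool behaves like that handler. make_rotating_logger is a hypothetical name, and a limit of about 1 KB is used only so that a rollover can be observed quickly.

```python
import logging
import logging.handlers
import os
import tempfile


def make_rotating_logger(log_dir, max_size_mb, max_backups):
    """Build a logger writing to <log_dir>/artvee_scraper.log with size-based rollover."""
    handler = logging.handlers.RotatingFileHandler(
        os.path.join(log_dir, "artvee_scraper.log"),
        maxBytes=int(max_size_mb * 1024 * 1024),  # MB -> bytes
        backupCount=max_backups,  # per the logging docs, 0 means rollover never occurs
    )
    logger = logging.getLogger("artvee_rotation_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo output out of the root logger
    logger.addHandler(handler)
    return logger, handler


with tempfile.TemporaryDirectory() as log_dir:
    logger, handler = make_rotating_logger(log_dir, max_size_mb=0.001, max_backups=2)
    for i in range(100):
        logger.info("message %03d %s", i, "x" * 50)
    logger.removeHandler(handler)
    handler.close()
    log_files = sorted(os.listdir(log_dir))
    print(log_files)
```

With two backups allowed, the directory ends up holding artvee_scraper.log, artvee_scraper.log.1 and artvee_scraper.log.2; older archives are discarded as new ones are created.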
Basic Example
$ artvee-scraper-cli log-json
Output:
...
2038-01-19 18:34:38.941 INFO [ThreadPoolExecutor-0_0] runner.<lambda>(79) | Processing 'Komposition' by Otto Freundlich
2038-01-19 18:34:38.943 INFO [ThreadPoolExecutor-0_0] log_writer.write(45) | {"url": "https://artvee.com/dl/komposition-2/", "resource": "komposition-2", "title": "Komposition", "category": "Abstract", "artist": "Otto Freundlich", "date": "1938", "origin": "German, 1878-1943", "image": {"source_url": "https://mdl.artvee.com/sdl/102399absdl.jpg", "width": 1423, "height": 1800, "file_size": 1.1, "file_size_unit": "MB", "format_name": "jpg"}}
...
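If log-json output is consumed by another program, the JSON payload has to be separated from the log prefix. A minimal sketch, assuming the layout seen in the sample above (a prefix, then " | ", then a JSON object on a single line). extract_artwork is a hypothetical helper, the sample line is abbreviated from the output above, and pretty-printed output that spans several lines is not handled.

```python
import json


def extract_artwork(log_line):
    """Return the JSON payload of a log line as a dict, or None if there is none."""
    # Split on the first " | " only; the payload itself may contain that sequence.
    _prefix, separator, payload = log_line.partition(" | ")
    if not separator or not payload.lstrip().startswith("{"):
        return None
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None


sample = (
    "2038-01-19 18:34:38.943 INFO [ThreadPoolExecutor-0_0] log_writer.write(45) | "
    '{"url": "https://artvee.com/dl/komposition-2/", "title": "Komposition", '
    '"artist": "Otto Freundlich", "date": "1938"}'
)
artwork = extract_artwork(sample)
print(artwork["title"], "by", artwork["artist"])
```

Lines that carry no JSON object, such as the "Processing ..." messages, yield None and can be skipped by the caller.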
Advanced Example
$ artvee-scraper-cli log-json --worker-threads 2 --log-level DEBUG --category abstract --log-dir /var/log/artvee --log-max-size 2048 --log-max-backups 10 --space-level 2 --sort-keys --include-image
Output:
$ cat /var/log/artvee/artvee_scraper_cli.log
...
2038-01-19 18:40:11.772 DEBUG [ThreadPoolExecutor-0_0] artvee_client.get_image(132) | Retrieving image; url=https://mdl.artvee.com/sdl/105042absdl.jpg
2038-01-19 18:40:11.772 DEBUG [ThreadPoolExecutor-0_0] connectionpool._new_conn(1051) | Starting new HTTPS connection (1): mdl.artvee.com:443
2038-01-19 18:40:11.853 DEBUG [ThreadPoolExecutor-0_0] connectionpool._make_request(546) | https://mdl.artvee.com:443 "GET /sdl/105042absdl.jpg HTTP/11" 200 2011451
2038-01-19 18:40:11.941 INFO [ThreadPoolExecutor-0_0] runner.<lambda>(79) | Processing 'Gare' by Joaquín Torres-García
2038-01-19 18:40:11.967 INFO [ThreadPoolExecutor-0_0] log_writer.write(45) | {
  "artist": "Joaquín Torres-García",
  "category": "Abstract",
  "date": "1928",
  "image": {
    "file_size": 1.92,
    "file_size_unit": "MB",
    "format_name": "jpg",
    "height": 1259,
    "raw": "/9j/4AAQSkZJRgABA ... o4xSSSVkumh//9k=",
    "source_url": "https://mdl.artvee.com/sdl/105042absdl.jpg",
    "width": 1800
  },
  "origin": "Uruguayan, 1874-1949",
  "resource": "gare",
  "title": "Gare",
  "url": "https://artvee.com/dl/gare/"
}
...
file-json
Download artwork and write each to the filesystem. Each artwork is stored as a JSON object.
$ artvee-scraper-cli file-json [optional arguments] <dir_path>
Positional arguments
dir_path
(string) Position 0. Path to an existing directory used to store output files.
Optional arguments
-h | --help
(boolean) Display help message.
-t | --worker-threads
(integer) The number of worker threads used for processing. Range of values is [1-10]. The default value is 3.
-l | --log-level
(string) Application log level. One of: DEBUG, INFO, WARNING, ERROR, CRITICAL. The default value is INFO.
-c | --category
(string) Category of artwork to fetch. One of: abstract, figurative, landscape, religion, mythology, posters, animals, illustration, still-life, botanical, drawings, asian-art. May be used repeatedly to specify multiple categories (-c animals -c drawings). The default value is ALL categories.
Optional log file arguments
--log-dir
(string) Path to an existing directory used to store artvee_scraper.log log files. Disabled by default.
--log-max-size
(integer) Maximum size in MB the log file may reach before a rollover is triggered. Only enabled if --log-dir has been specified. Range of values is [1-10240]. The default value is 1024 MB (1 GB).
--log-max-backups
(integer) Maximum number of log file archives to keep. Only enabled if --log-dir has been specified. The actively written file is artvee_scraper.log. Backup files have an incrementing numerical suffix: artvee_scraper.log.1 ... artvee_scraper.log.N. If this value is zero, rollovers are disabled. Range of values is [0-100]. The default value is 10.
Optional writer arguments
--space-level
(integer) Pretty-print JSON; number of spaces to indent. Range of values is [2-6]. Disabled by default.
--sort-keys
(boolean) Sort JSON keys in alphabetical order. Disabled by default.
--overwrite-existing
(boolean) Allow existing duplicate files to be overwritten. Disabled by default.
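The effect of --space-level and --sort-keys can be reproduced with Python's standard json module. This sketch assumes the two options correspond to the indent and sort_keys parameters of json.dumps, which is consistent with the sample output on this page but not confirmed by it. Passing ensure_ascii=False is likewise an assumption, made so that names such as "Joaquín" stay readable. The artwork dictionary is a shortened copy of one sample record.

```python
import json

artwork = {
    "url": "https://artvee.com/dl/gare/",
    "resource": "gare",
    "title": "Gare",
    "artist": "Joaquín Torres-García",
    "image": {"width": 1800, "height": 1259},
}

# Default behaviour: compact, keys in insertion order.
compact = json.dumps(artwork, ensure_ascii=False)

# Counterpart of "--space-level 2 --sort-keys" under the assumption above.
pretty = json.dumps(artwork, indent=2, sort_keys=True, ensure_ascii=False)
print(pretty)
```

Sorting keys makes files easier to compare with text-based tools, because the same record always serializes to the same text.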
Basic Example
$ artvee-scraper-cli file-json ~/artvee/downloads
Output:
$ cat ~/artvee/downloads/woman-by-the-window.json
{"url": "https://artvee.com/dl/woman-by-the-window/", "resource": "woman-by-the-window", "title": "Woman by the window", "category": "Abstract", "artist": "Mikuláš Galanda", "date": "1928", "origin": "Slovak, 1895 – 1938", "image": {"source_url": "https://mdl.artvee.com/sdl/101518absdl.jpg", "width": 1317, "height": 1800, "file_size": 2.48, "file_size_unit": "MB", "raw": "/9j/4AAQSkZJRgAB ... aK1lZLTp7i/Vn//Z", "format_name": "jpg"}}
Advanced Example
$ artvee-scraper-cli file-json ~/artvee/downloads --worker-threads 1 --log-level INFO --category mythology --log-dir /var/log/artvee --log-max-size 512 --log-max-backups 10 --space-level 4 --sort-keys --overwrite-existing
Output:
$ cat ~/artvee/downloads/the-judgment-of-paris-3.json
{
    "artist": "Joachim Wtewael",
    "category": "Mythology",
    "date": "1602",
    "image": {
        "file_size": 7.42,
        "file_size_unit": "MB",
        "format_name": "jpg",
        "height": 2138,
        "raw": "/9j/4R8FRXhpZgAASUkq ... /pNfu/+89V/wB46//Z",
        "source_url": "https://mdl.artvee.com/sdl/400408mtsdl.jpg",
        "width": 2833
    },
    "origin": "Dutch, 1566 - 1638",
    "resource": "the-judgment-of-paris-3",
    "title": "The Judgment of Paris",
    "url": "https://artvee.com/dl/the-judgment-of-paris-3/"
}
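The image.raw value in the file-json output begins with /9j/, which is what the JPEG signature bytes ff d8 ff look like when Base64-encoded. Assuming the field holds the Base64-encoded image (this page does not say so explicitly), the picture could be recovered from a JSON file as sketched below. save_image is a hypothetical helper.

```python
import base64
import json
from pathlib import Path


def save_image(json_path, image_dir):
    """Decode image.raw from a file-json output file and write it into image_dir."""
    artwork = json.loads(Path(json_path).read_text(encoding="utf-8"))
    image = artwork["image"]
    # Assumption: "raw" is standard Base64 text of the image bytes.
    data = base64.b64decode(image["raw"])
    # For example "the-judgment-of-paris-3" + "." + "jpg".
    target = Path(image_dir) / f'{artwork["resource"]}.{image["format_name"]}'
    target.write_bytes(data)
    return target
```

For example, save_image("the-judgment-of-paris-3.json", "images") would write images/the-judgment-of-paris-3.jpg, provided the images directory exists.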
file-multi
Download artwork and write each to the filesystem. Each artwork is stored as two files: metadata (JSON) & image (JPG).
$ artvee-scraper-cli file-multi [optional arguments] <metadata_dir_path> <image_dir_path>
Positional arguments
metadata_dir_path
(string) Position 0. Path to an existing directory used to store output metadata files.
image_dir_path
(string) Position 1. Path to an existing directory used to store output image files.
Optional arguments
-h | --help
(boolean) Display help message.
-t | --worker-threads
(integer) The number of worker threads used for processing. Range of values is [1-10]. The default value is 3.
-l | --log-level
(string) Application log level. One of: DEBUG, INFO, WARNING, ERROR, CRITICAL. The default value is INFO.
-c | --category
(string) Category of artwork to fetch. One of: abstract, figurative, landscape, religion, mythology, posters, animals, illustration, still-life, botanical, drawings, asian-art. May be used repeatedly to specify multiple categories (-c animals -c drawings). The default value is ALL categories.
Optional log file arguments
--log-dir
(string) Path to an existing directory used to store artvee_scraper.log log files. Disabled by default.
--log-max-size
(integer) Maximum size in MB the log file may reach before a rollover is triggered. Only enabled if --log-dir has been specified. Range of values is [1-10240]. The default value is 1024 MB (1 GB).
--log-max-backups
(integer) Maximum number of log file archives to keep. Only enabled if --log-dir has been specified. The actively written file is artvee_scraper.log. Backup files have an incrementing numerical suffix: artvee_scraper.log.1 ... artvee_scraper.log.N. If this value is zero, rollovers are disabled. Range of values is [0-100]. The default value is 10.
Optional writer arguments
--space-level
(integer) Pretty-print JSON; number of spaces to indent. Range of values is [2-6]. Disabled by default.
--sort-keys
(boolean) Sort JSON keys in alphabetical order. Disabled by default.
--overwrite-existing
(boolean) Allow existing duplicate files to be overwritten. Disabled by default.
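The --overwrite-existing flag decides what happens when an output file is already present. How the tool reacts to an existing file when the flag is absent (skip, warn, or fail) is not stated on this page; the sketch below simply skips the file. write_metadata is a hypothetical helper that illustrates the idea with Python's exclusive-creation file mode.

```python
import json
from pathlib import Path


def write_metadata(artwork, metadata_dir, overwrite_existing=False):
    """Write artwork metadata as <resource>.json; return True if a file was written."""
    target = Path(metadata_dir) / f'{artwork["resource"]}.json'
    # Mode "x" fails if the file exists; mode "w" truncates and replaces it.
    mode = "w" if overwrite_existing else "x"
    try:
        with open(target, mode, encoding="utf-8") as fh:
            json.dump(artwork, fh)
    except FileExistsError:
        return False  # the existing file is left untouched
    return True
```

Using mode "x" rather than checking for the file first avoids a race between the check and the write when several worker threads produce output at once.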
Basic Example
$ artvee-scraper-cli file-multi ~/artvee/downloads/metadata ~/artvee/downloads/images
Output:
$ cat ~/artvee/downloads/metadata/the-pet-pig.json
{"url": "https://artvee.com/dl/the-pet-pig/", "resource": "the-pet-pig", "title": "The pet pig", "category": "Abstract", "artist": "Edvard Munch", "date": "1908-1910", "origin": "Norwegian, 1863 - 1944", "image": {"source_url": "https://mdl.artvee.com/sdl/103755absdl.jpg", "width": 1800, "height": 1320, "file_size": 1.67, "file_size_unit": "MB", "format_name": "jpg"}}
$ hexdump -C ~/artvee/downloads/images/the-pet-pig.jpg
00000000 ff d8 ff e0 00 10 4a 46 49 46 00 01 01 01 01 2c |......JFIF.....,|
...
001aa430 40 2b 9c 02 8a 2b 48 b6 d6 bd ff 00 c8 0f ff d9 |@+...+H.........|
001aa440
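The hexdump shows the two markers that frame a JPEG file: it starts with ff d8 (start of image) and ends with ff d9 (end of image). A quick plausibility check on downloaded images could look like the sketch below. looks_like_complete_jpeg is a hypothetical helper, and a file that passes the check may still be damaged elsewhere.

```python
from pathlib import Path

SOI = b"\xff\xd8"  # start-of-image marker
EOI = b"\xff\xd9"  # end-of-image marker


def looks_like_complete_jpeg(path):
    """Return True if the file starts with SOI and ends with EOI."""
    data = Path(path).read_bytes()
    return len(data) >= 4 and data.startswith(SOI) and data.endswith(EOI)
```

A truncated download typically lacks the trailing ff d9, so this check is a cheap way to spot images worth fetching again with --overwrite-existing.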
Advanced Example
$ artvee-scraper-cli file-multi --worker-threads 1 --log-level INFO --category asian-art --log-dir /var/log/artvee --log-max-size 512 --log-max-backups 10 --space-level 2 --sort-keys --overwrite-existing ~/artvee/downloads/metadata ~/artvee/downloads/images
Output:
$ cat ~/artvee/downloads/metadata/two-ronin-looking-into-yoshiwara.json
{
  "artist": "Andō Hiroshige",
  "category": "Asian-art",
  "date": "19th century",
  "image": {
    "file_size": 2.29,
    "file_size_unit": "MB",
    "format_name": "jpg",
    "height": 1179,
    "source_url": "https://mdl.artvee.com/sdl/52015jpsdl.jpg",
    "width": 1800
  },
  "origin": "Japanese, 1797 – 1858",
  "resource": "two-ronin-looking-into-yoshiwara",
  "title": "Two Ronin Looking into Yoshiwara",
  "url": "https://artvee.com/dl/two-ronin-looking-into-yoshiwara/"
}
$ hexdump -C ~/artvee/downloads/images/two-ronin-looking-into-yoshiwara.jpg
00000000 ff d8 ff e0 00 10 4a 46 49 46 00 01 01 01 01 2c |......JFIF.....,|
...
002499c0 a2 b4 fe bf ad cc 4f ff d9 |......O..|
002499c9
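In the file-multi examples above, a metadata file and its image share the same base name (the resource value) and differ only in directory and extension. Under that assumption, and assuming all images carry the .jpg extension, files that lack their counterpart (for instance after an interrupted run) could be found as sketched below. find_unpaired is a hypothetical helper.

```python
from pathlib import Path


def find_unpaired(metadata_dir, image_dir):
    """Return (metadata without image, images without metadata) as sorted name lists."""
    metadata = {p.stem for p in Path(metadata_dir).glob("*.json")}
    images = {p.stem for p in Path(image_dir).glob("*.jpg")}
    return sorted(metadata - images), sorted(images - metadata)
```

Calling find_unpaired("metadata", "images") returns two lists of base names; both are empty when every artwork has its metadata file and its image.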
File details
Details for the file artvee-scraper-cli-1.0.1.tar.gz.
File metadata
- Download URL: artvee-scraper-cli-1.0.1.tar.gz
- Upload date:
- Size: 16.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.20
File hashes
Algorithm | Hash digest
---|---
SHA256 | f8cee9ebdfc91843b24334365701dc713dcfbca0ba93f296cbed0658938d8e65
MD5 | fe3c91e5eee5651c6e847a85bf238917
BLAKE2b-256 | b2587bcfcc141ba8eecb22129e26bdf86bd02796d83c8c0a55821d279c9e9203
File details
Details for the file artvee_scraper_cli-1.0.1-py3-none-any.whl.
File metadata
- Download URL: artvee_scraper_cli-1.0.1-py3-none-any.whl
- Upload date:
- Size: 17.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.20
File hashes
Algorithm | Hash digest
---|---
SHA256 | 111d3a04cc70097e5b21c2fd85f8564cc5f97ab96b05fadf2b6126a0b9dbb7ce
MD5 | 22c649fc9c3003e96d9bd8d7d9d1137f
BLAKE2b-256 | 2cec3e6af44d51467a1fe5971aea72412ed56dca3b1495d10ab8b3b0211aa205