
Fetch public domain artwork from Artvee

Project description

artvee-scraper-cli

artvee-scraper-cli is an easy-to-use command line utility for fetching public domain artwork from Artvee (https://www.artvee.com).

Installation

Using PyPI

$ python -m pip install artvee-scraper-cli

Python 3.10+ is officially supported.

Synopsis

artvee-scraper-cli <command> [optional arguments] [positional arguments]

Examples

View help

$ artvee-scraper-cli -h
usage: artvee-scraper-cli [-h] {log-json,file-json,file-multi} ...

Scrape artwork from https://www.artvee.com

positional arguments:
  {log-json,file-json,file-multi}
    log-json            Artwork is output to the log as a JSON object
    file-json           Artwork is represented as a JSON object and written to a file
    file-multi          Artwork image and metadata are written as separate files

optional arguments:
  -h, --help            show this help message and exit

View help for the file-json command

$ artvee-scraper-cli file-json -h
usage: artvee-scraper-cli file-json [-h] [-t [1-10]] [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                    [-c {abstract,figurative,landscape,religion,mythology,posters,animals,illustration,still-life,botanical,drawings,asian-art}]
                    [--log-dir LOG_DIR] [--log-max-size [1-10240]] [--log-max-backups [0-100]]
                    [--space-level [2-6]] [--sort-keys] [--overwrite-existing]
                    dir_path

positional arguments:
  dir_path              JSON file output directory

optional arguments:
  -h, --help            show this help message and exit
  -t [1-10], --worker-threads [1-10]
                        Number of worker threads (1-10)
  -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Set the application log level
  -c {abstract,figurative,landscape,religion,mythology,posters,animals,illustration,still-life,botanical,drawings,asian-art}, --category {abstract,figurative,landscape,religion,mythology,posters,animals,illustration,still-life,botanical,drawings,asian-art}
                        Category of artwork to scrape
  --space-level [2-6]   Enable pretty-printing; number of spaces to indent (2-6)
  --sort-keys           Sort JSON keys in alphabetical order
  --overwrite-existing  Overwrite existing files

optional log file arguments:
  --log-dir LOG_DIR     Log file output directory
  --log-max-size [1-10240]
                        Maximum log file size in MB (1-10,240)
  --log-max-backups [0-100]
                        Maximum number of log files to keep (0-100)

Download artwork from artvee.com and save each piece as an individual JSON file in the directory ~/artvee/downloads

$ artvee-scraper-cli file-json ~/artvee/downloads
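To run the same command from a Python script, one option is to build the argument list yourself and hand it to `subprocess`. The helper below is a hypothetical sketch, not part of artvee-scraper-cli. It only emits flags documented on this page and checks the documented ranges before the CLI sees them.

```python
from __future__ import annotations

from pathlib import Path


def build_file_json_command(
    dir_path: str,
    categories: list[str] | None = None,
    worker_threads: int | None = None,
    space_level: int | None = None,
    sort_keys: bool = False,
    overwrite_existing: bool = False,
) -> list[str]:
    """Hypothetical helper: build an argv list for `artvee-scraper-cli file-json`."""
    argv = ["artvee-scraper-cli", "file-json"]
    if worker_threads is not None:
        # Documented range for --worker-threads is 1-10.
        if not 1 <= worker_threads <= 10:
            raise ValueError("worker_threads must be in the range 1-10")
        argv += ["--worker-threads", str(worker_threads)]
    for category in categories or []:
        # --category may be repeated, once per category.
        argv += ["--category", category]
    if space_level is not None:
        # Documented range for --space-level is 2-6.
        if not 2 <= space_level <= 6:
            raise ValueError("space_level must be in the range 2-6")
        argv += ["--space-level", str(space_level)]
    if sort_keys:
        argv.append("--sort-keys")
    if overwrite_existing:
        argv.append("--overwrite-existing")
    # No shell is involved, so expand ~ here; the positional dir_path goes last.
    argv.append(str(Path(dir_path).expanduser()))
    return argv
```

Pass the result to `subprocess.run(cmd, check=True)`. A list argument avoids shell quoting issues. The target directory must already exist.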

Available Commands

log-json

Download artwork and write each piece to the log as a JSON object. Note: this command is intended for development and testing; dumping artwork data to the log is usually undesirable.

$ artvee-scraper-cli log-json [optional arguments]
Optional arguments

-h | --help (boolean)

Display help message.

-t | --worker-threads (integer)

The number of worker threads used for processing. Range of values is [1-10]. The default value is 3.

-l | --log-level (string)

Application log level. One of: DEBUG, INFO, WARNING, ERROR, CRITICAL. The default value is INFO.

-c | --category (string)

Category of artwork to fetch. One of: abstract, figurative, landscape, religion, mythology, posters, animals, illustration, still-life, botanical, drawings, asian-art. May be repeated to specify multiple categories (-c animals -c drawings). If omitted, all categories are fetched.
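The help output above has the shape of Python's `argparse`. The sketch below shows how options like these can be declared with it: a bounded `--worker-threads`, a fixed set of `--log-level` choices, and a repeatable `--category` collected with `action="append"`. It assumes argparse only because of that output format. It is not the tool's actual parser, and the `prog` name is hypothetical.

```python
import argparse

CATEGORIES = [
    "abstract", "figurative", "landscape", "religion", "mythology", "posters",
    "animals", "illustration", "still-life", "botanical", "drawings", "asian-art",
]


def parse_options(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="log-json-sketch")  # hypothetical name
    # type=int converts the string first, then choices enforces the 1-10 range.
    parser.add_argument("-t", "--worker-threads", type=int, choices=range(1, 11),
                        default=3, metavar="[1-10]")
    parser.add_argument("-l", "--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"])
    # action="append" gathers every -c occurrence into a single list.
    parser.add_argument("-c", "--category", choices=CATEGORIES,
                        action="append", dest="categories")
    args = parser.parse_args(argv)
    # No -c given: fall back to all categories, as documented.
    if args.categories is None:
        args.categories = list(CATEGORIES)
    return args
```

For example, `parse_options(["-c", "animals", "-c", "drawings"]).categories` yields `["animals", "drawings"]`. An out-of-range value such as `-t 11` makes argparse print an error and exit.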

Optional log file arguments

--log-dir (string)

Path to an existing directory in which to store artvee_scraper.log log files. File logging is disabled by default.

--log-max-size (integer)

Maximum size in MB the log file should reach before triggering a rollover. Only applies if --log-dir has been specified. Range of values is [1-10240]. The default value is 1024MB (1GB).

--log-max-backups (integer)

Maximum number of log file archives to keep. Only applies if --log-dir has been specified. The active log file is artvee_scraper.log; backup files have an incrementing numerical suffix: artvee_scraper.log.1 ... artvee_scraper.log.N. If this value is zero, rollover is disabled. Range of values is [0-100]. The default value is 10.
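The rollover behaviour described above matches the standard library's `logging.handlers.RotatingFileHandler`, where a `backupCount` of zero means no rollover ever happens. The sketch below maps the three log-file options onto that handler. Two things here are assumptions: that MB means 1024 × 1024 bytes (suggested by "1024MB (1GB)"), and the format string, which only approximates the sample log lines. The tool's real implementation may differ.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def make_file_handler(log_dir: str, max_size_mb: int = 1024,
                      max_backups: int = 10) -> RotatingFileHandler:
    """Sketch: --log-dir / --log-max-size / --log-max-backups as a stdlib handler."""
    if not 1 <= max_size_mb <= 10240:
        raise ValueError("max_size_mb must be in the range 1-10240")
    if not 0 <= max_backups <= 100:
        raise ValueError("max_backups must be in the range 0-100")
    handler = RotatingFileHandler(
        Path(log_dir) / "artvee_scraper.log",
        maxBytes=max_size_mb * 1024 * 1024,  # assumption: binary megabytes
        backupCount=max_backups,             # 0 => the file never rolls over
        encoding="utf-8",
        delay=True,                          # open the file on the first record
    )
    # Approximates: "2038-01-19 18:34:38.941 INFO [Thread] module.func(line) | msg"
    handler.setFormatter(logging.Formatter(
        "%(asctime)s.%(msecs)03d %(levelname)s [%(threadName)s] "
        "%(module)s.%(funcName)s(%(lineno)d) | %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    ))
    return handler
```

When the active file would exceed `maxBytes`, the handler renames it to `artvee_scraper.log.1`, shifts older backups up by one, and deletes anything beyond `backupCount`.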

Optional writer arguments

--space-level (integer)

Pretty-print JSON; number of spaces to indent. Range of values is [2-6]. Disabled by default.

--sort-keys (boolean)

Sort JSON keys in alphabetical order. Disabled by default.

--include-image (boolean)

Include the image in the output as a base64-encoded string (the image.raw field). Warning: this makes each log entry very large. Disabled by default.
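In the sample output, `image.raw` starts with `/9j/`, which is the base64 encoding of the bytes `FF D8 FF` that open a JPEG file. The sketch below turns such a field back into image bytes. It assumes `raw` is standard base64, which is consistent with the samples, and that the object follows the layout shown in the examples. The function name is hypothetical.

```python
import base64
import json

# A JPEG starts with the SOI marker (FF D8), followed by the next marker's FF byte.
JPEG_PREFIX = b"\xff\xd8\xff"


def decode_image(artwork_json: str) -> bytes:
    """Hypothetical helper: return the image bytes from an artwork's image.raw field."""
    artwork = json.loads(artwork_json)
    raw = artwork["image"].get("raw")
    if raw is None:
        raise KeyError("image.raw is missing (was --include-image given?)")
    data = base64.b64decode(raw, validate=True)  # reject non-base64 characters
    if not data.startswith(JPEG_PREFIX):
        raise ValueError("decoded data does not look like a JPEG")
    return data
```

The returned bytes can be written straight to a `.jpg` file with `Path(...).write_bytes(data)`.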

Basic Example
$ artvee-scraper-cli log-json
Output:
  ...
2038-01-19 18:34:38.941 INFO [ThreadPoolExecutor-0_0] runner.<lambda>(79) | Processing 'Komposition' by Otto Freundlich
2038-01-19 18:34:38.943 INFO [ThreadPoolExecutor-0_0] log_writer.write(45) | {"url": "https://artvee.com/dl/komposition-2/", "resource": "komposition-2", "title": "Komposition", "category": "Abstract", "artist": "Otto Freundlich", "date": "1938", "origin": "German, 1878-1943", "image": {"source_url": "https://mdl.artvee.com/sdl/102399absdl.jpg", "width": 1423, "height": 1800, "file_size": 1.1, "file_size_unit": "MB", "format_name": "jpg"}}
  ...
Advanced Example
$ artvee-scraper-cli log-json --worker-threads 2 --log-level DEBUG --category abstract --log-dir /var/log/artvee --log-max-size 2048 --log-max-backups 10 --space-level 2 --sort-keys --include-image
Output:
$ cat /var/log/artvee/artvee_scraper_cli.log
  ...
2038-01-19 18:40:11.772 DEBUG [ThreadPoolExecutor-0_0] artvee_client.get_image(132) | Retrieving image; url=https://mdl.artvee.com/sdl/105042absdl.jpg
2038-01-19 18:40:11.772 DEBUG [ThreadPoolExecutor-0_0] connectionpool._new_conn(1051) | Starting new HTTPS connection (1): mdl.artvee.com:443
2038-01-19 18:40:11.853 DEBUG [ThreadPoolExecutor-0_0] connectionpool._make_request(546) | https://mdl.artvee.com:443 "GET /sdl/105042absdl.jpg HTTP/11" 200 2011451
2038-01-19 18:40:11.941 INFO [ThreadPoolExecutor-0_0] runner.<lambda>(79) | Processing 'Gare' by Joaquín Torres-García
2038-01-19 18:40:11.967 INFO [ThreadPoolExecutor-0_0] log_writer.write(45) | {
  "artist": "Joaquín Torres-García",
  "category": "Abstract",
  "date": "1928",
  "image": {
    "file_size": 1.92,
    "file_size_unit": "MB",
    "format_name": "jpg",
    "height": 1259,
    "raw": "/9j/4AAQSkZJRgABA ... o4xSSSVkumh//9k=",
    "source_url": "https://mdl.artvee.com/sdl/105042absdl.jpg",
    "width": 1800
  },
  "origin": "Uruguayan, 1874-1949",
  "resource": "gare",
  "title": "Gare",
  "url": "https://artvee.com/dl/gare/"
}
  ...
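To reuse log-json output in other tools, you can pull the JSON payloads out of the log. In the sample lines, the message follows the first ` | ` separator, and only the `log_writer` entries carry a JSON object. The sketch below relies on both observations, which are assumptions about the log format. It handles single-line entries only (no `--space-level`), because pretty-printed objects span several lines.

```python
import json
from typing import Iterable, Iterator


def iter_artworks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield artwork objects from single-line log-json entries."""
    for line in lines:
        # The log message follows the first " | " in each sample entry.
        _, sep, message = line.partition(" | ")
        # Keep only messages that are JSON objects; "Processing ..." lines are skipped.
        if sep and message.lstrip().startswith("{"):
            yield json.loads(message)
```

Passing an open log file works directly, since iterating over a file object yields its lines: `for art in iter_artworks(open(path, encoding="utf-8")): ...`.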

file-json

Download artwork and write each piece to the filesystem as its own JSON file. Each JSON object includes the base64-encoded image (the image.raw field).

$ artvee-scraper-cli file-json [optional arguments] <dir_path>
Positional arguments

dir_path (string) Position 0.

Path to an existing directory in which to store output files.

Optional arguments

-h | --help (boolean)

Display help message.

-t | --worker-threads (integer)

The number of worker threads used for processing. Range of values is [1-10]. The default value is 3.

-l | --log-level (string)

Application log level. One of: DEBUG, INFO, WARNING, ERROR, CRITICAL. The default value is INFO.

-c | --category (string)

Category of artwork to fetch. One of: abstract, figurative, landscape, religion, mythology, posters, animals, illustration, still-life, botanical, drawings, asian-art. May be repeated to specify multiple categories (-c animals -c drawings). If omitted, all categories are fetched.
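The thread names in the sample logs (`ThreadPoolExecutor-0_0`) are the default names Python's `concurrent.futures.ThreadPoolExecutor` gives its workers. A plausible reading is that `--worker-threads` sets the pool's `max_workers`. The sketch below shows that pattern. `fetch_artwork` is a hypothetical stand-in that makes no network requests. Its URL follows the `https://artvee.com/dl/<resource>/` shape seen in the samples.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_artwork(resource: str) -> dict:
    """Hypothetical stand-in for downloading one artwork (no network access)."""
    return {"resource": resource, "url": f"https://artvee.com/dl/{resource}/"}


def fetch_all(resources: list[str], worker_threads: int = 3) -> list[dict]:
    if not 1 <= worker_threads <= 10:
        raise ValueError("worker_threads must be in the range 1-10")
    results = []
    # At most `worker_threads` fetches run at the same time.
    with ThreadPoolExecutor(max_workers=worker_threads) as pool:
        futures = [pool.submit(fetch_artwork, r) for r in resources]
        for future in as_completed(futures):
            results.append(future.result())
    # as_completed yields in completion order; sort for a deterministic result.
    return sorted(results, key=lambda art: art["resource"])
```

Threads suit this workload because most of the time is spent waiting on HTTP responses, not on Python code.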

Optional log file arguments

--log-dir (string)

Path to an existing directory in which to store artvee_scraper.log log files. File logging is disabled by default.

--log-max-size (integer)

Maximum size in MB the log file should reach before triggering a rollover. Only enabled if --log-dir has been specified. Range of values is [1-10240]. The default value is 1024MB (1GB).

--log-max-backups (integer)

Maximum number of log file archives to keep. Only enabled if --log-dir has been specified. The active log file is artvee_scraper.log; backup files have an incrementing numerical suffix: artvee_scraper.log.1 ... artvee_scraper.log.N. If this value is zero, rollover is disabled. Range of values is [0-100]. The default value is 10.

Optional writer arguments

--space-level (integer)

Pretty-print JSON; number of spaces to indent. Range of values is [2-6]. Disabled by default.

--sort-keys (boolean)

Sort JSON keys in alphabetical order. Disabled by default.

--overwrite-existing (boolean)

Overwrite existing output files that have the same name. Disabled by default.
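The three writer options map naturally onto `json.dumps` and the file open mode. The sketch below is hypothetical and rests on assumptions taken from the samples, not from the tool's code:

- Files are named `<resource>.json`.
- Non-ASCII names appear unescaped, hence `ensure_ascii=False`.
- Without `--overwrite-existing`, an existing file is skipped rather than replaced.

```python
from __future__ import annotations

import json
from pathlib import Path


def write_artwork(artwork: dict, dir_path: str, space_level: int | None = None,
                  sort_keys: bool = False, overwrite_existing: bool = False) -> bool:
    """Hypothetical writer: save one artwork as <resource>.json; False if skipped."""
    if space_level is not None and not 2 <= space_level <= 6:
        raise ValueError("space_level must be in the range 2-6")
    # indent=None gives the single-line form shown in the basic examples.
    text = json.dumps(artwork, indent=space_level, sort_keys=sort_keys,
                      ensure_ascii=False)
    path = Path(dir_path) / f"{artwork['resource']}.json"
    # "x" raises FileExistsError if the file exists; "w" truncates and overwrites.
    mode = "w" if overwrite_existing else "x"
    try:
        with open(path, mode, encoding="utf-8") as fh:
            fh.write(text)
    except FileExistsError:
        return False
    return True
```

Opening with mode `"x"` makes the existence check and the file creation a single step. This avoids a race between threads that a separate `path.exists()` check would leave open.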

Basic Example
$ artvee-scraper-cli file-json ~/artvee/downloads
Output:
$ cat ~/artvee/downloads/woman-by-the-window.json
{"url": "https://artvee.com/dl/woman-by-the-window/", "resource": "woman-by-the-window", "title": "Woman by the window", "category": "Abstract", "artist": "Mikuláš Galanda", "date": "1928", "origin": "Slovak, 1895 – 1938", "image": {"source_url": "https://mdl.artvee.com/sdl/101518absdl.jpg", "width": 1317, "height": 1800, "file_size": 2.48, "file_size_unit": "MB", "raw": "/9j/4AAQSkZJRgAB ... aK1lZLTp7i/Vn//Z", "format_name": "jpg"}}
Advanced Example
$ artvee-scraper-cli file-json ~/artvee/downloads --worker-threads 1 --log-level INFO --category mythology --log-dir /var/log/artvee --log-max-size 512 --log-max-backups 10 --space-level 4 --sort-keys --overwrite-existing
Output:
$ cat ~/artvee/downloads/the-judgment-of-paris-3.json
{
    "artist": "Joachim Wtewael",
    "category": "Mythology",
    "date": "1602",
    "image": {
        "file_size": 7.42,
        "file_size_unit": "MB",
        "format_name": "jpg",
        "height": 2138,
        "raw": "/9j/4R8FRXhpZgAASUkq ... /pNfu/+89V/wB46//Z",
        "source_url": "https://mdl.artvee.com/sdl/400408mtsdl.jpg",
        "width": 2833
    },
    "origin": "Dutch, 1566 - 1638",
    "resource": "the-judgment-of-paris-3",
    "title": "The Judgment of Paris",
    "url": "https://artvee.com/dl/the-judgment-of-paris-3/"
}
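Because file-json embeds the image in each JSON object, you can convert its output into the two-file layout that file-multi produces: metadata without `raw`, plus a separate image. The sketch below is a hypothetical post-processing helper, not part of the CLI. It assumes the field layout shown above and names the image after `resource` and `format_name`, as in the file-multi samples.

```python
import base64
import json
from pathlib import Path


def split_artwork_file(json_path: str, metadata_dir: str,
                       image_dir: str) -> tuple[Path, Path]:
    """Hypothetical helper: split a file-json output file into metadata + image."""
    artwork = json.loads(Path(json_path).read_text(encoding="utf-8"))
    image = artwork["image"]
    # Remove the base64 payload from the metadata and decode it to raw bytes.
    image_bytes = base64.b64decode(image.pop("raw"))
    name = artwork["resource"]
    meta_path = Path(metadata_dir) / f"{name}.json"
    img_path = Path(image_dir) / f"{name}.{image.get('format_name', 'jpg')}"
    meta_path.write_text(json.dumps(artwork, ensure_ascii=False), encoding="utf-8")
    img_path.write_bytes(image_bytes)
    return meta_path, img_path
```

Run it over `Path("~/artvee/downloads").expanduser().glob("*.json")`. Use metadata and image directories that differ from the source directory, so the source files are not overwritten.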

file-multi

Download artwork and write each piece to the filesystem as two files: metadata (JSON) and image (JPG).

$ artvee-scraper-cli file-multi [optional arguments] <metadata_dir_path> <image_dir_path>
Positional arguments

metadata_dir_path (string) Position 0.

Path to an existing directory in which to store output metadata files.

image_dir_path (string) Position 1.

Path to an existing directory in which to store output image files.
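In the examples, each metadata file and its image share a name stem (`the-pet-pig.json` and `the-pet-pig.jpg`). Assuming that pairing always holds, a quick consistency check is to compare the stems in both directories, for example to spot files left over from a run that stopped partway. The function name is hypothetical.

```python
from pathlib import Path


def find_unpaired(metadata_dir: str, image_dir: str) -> tuple[set[str], set[str]]:
    """Return (metadata stems without an image, image stems without metadata)."""
    meta = {p.stem for p in Path(metadata_dir).glob("*.json")}
    images = {p.stem for p in Path(image_dir).glob("*.jpg")}
    # Set differences in both directions reveal orphans on either side.
    return meta - images, images - meta
```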

Optional arguments

-h | --help (boolean)

Display help message.

-t | --worker-threads (integer)

The number of worker threads used for processing. Range of values is [1-10]. The default value is 3.

-l | --log-level (string)

Application log level. One of: DEBUG, INFO, WARNING, ERROR, CRITICAL. The default value is INFO.

-c | --category (string)

Category of artwork to fetch. One of: abstract, figurative, landscape, religion, mythology, posters, animals, illustration, still-life, botanical, drawings, asian-art. May be repeated to specify multiple categories (-c animals -c drawings). If omitted, all categories are fetched.

Optional log file arguments

--log-dir (string)

Path to an existing directory in which to store artvee_scraper.log log files. File logging is disabled by default.

--log-max-size (integer)

Maximum size in MB the log file should reach before triggering a rollover. Only enabled if --log-dir has been specified. Range of values is [1-10240]. The default value is 1024MB (1GB).

--log-max-backups (integer)

Maximum number of log file archives to keep. Only enabled if --log-dir has been specified. The active log file is artvee_scraper.log; backup files have an incrementing numerical suffix: artvee_scraper.log.1 ... artvee_scraper.log.N. If this value is zero, rollover is disabled. Range of values is [0-100]. The default value is 10.

Optional writer arguments

--space-level (integer)

Pretty-print JSON; number of spaces to indent. Range of values is [2-6]. Disabled by default.

--sort-keys (boolean)

Sort JSON keys in alphabetical order. Disabled by default.

--overwrite-existing (boolean)

Overwrite existing output files that have the same name. Disabled by default.

Basic Example
$ artvee-scraper-cli file-multi ~/artvee/downloads/metadata ~/artvee/downloads/images
Output:
$ cat ~/artvee/downloads/metadata/the-pet-pig.json
{"url": "https://artvee.com/dl/the-pet-pig/", "resource": "the-pet-pig", "title": "The pet pig", "category": "Abstract", "artist": "Edvard Munch", "date": "1908-1910", "origin": "Norwegian, 1863 - 1944", "image": {"source_url": "https://mdl.artvee.com/sdl/103755absdl.jpg", "width": 1800, "height": 1320, "file_size": 1.67, "file_size_unit": "MB", "format_name": "jpg"}}
$ hexdump -C ~/artvee/downloads/images/the-pet-pig.jpg
00000000  ff d8 ff e0 00 10 4a 46  49 46 00 01 01 01 01 2c  |......JFIF.....,|
  ...
001aa430  40 2b 9c 02 8a 2b 48 b6  d6 bd ff 00 c8 0f ff d9  |@+...+H.........|
001aa440
Advanced Example
$ artvee-scraper-cli file-multi --worker-threads 1 --log-level INFO --category asian-art --log-dir /var/log/artvee --log-max-size 512 --log-max-backups 10 --space-level 2 --sort-keys --overwrite-existing ~/artvee/downloads/metadata ~/artvee/downloads/images
Output:
$ cat ~/artvee/downloads/metadata/two-ronin-looking-into-yoshiwara.json
{
  "artist": "Andō Hiroshige",
  "category": "Asian-art",
  "date": "19th century",
  "image": {
    "file_size": 2.29,
    "file_size_unit": "MB",
    "format_name": "jpg",
    "height": 1179,
    "source_url": "https://mdl.artvee.com/sdl/52015jpsdl.jpg",
    "width": 1800
  },
  "origin": "Japanese, 1797 – 1858",
  "resource": "two-ronin-looking-into-yoshiwara",
  "title": "Two Ronin Looking into Yoshiwara",
  "url": "https://artvee.com/dl/two-ronin-looking-into-yoshiwara/"
}
$ hexdump -C ~/artvee/downloads/images/two-ronin-looking-into-yoshiwara.jpg
00000000  ff d8 ff e0 00 10 4a 46  49 46 00 01 01 01 01 2c  |......JFIF.....,|
  ...
002499c0  a2 b4 fe bf ad cc 4f ff  d9                       |......O..|
002499c9
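The hexdumps show each image beginning with the JPEG start-of-image marker (`ff d8`) and ending with the end-of-image marker (`ff d9`). The sketch below applies that as a cheap check that a downloaded image is complete. It is a heuristic of our own, not a feature of the CLI: it does not validate the compressed data in between.

```python
import os

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def looks_like_complete_jpeg(path: str) -> bool:
    """Heuristic: SOI in the first two bytes and EOI in the last two."""
    with open(path, "rb") as fh:
        head = fh.read(2)
        # seek() returns the new offset, so this yields the file size
        # without reading the whole image.
        size = fh.seek(0, os.SEEK_END)
        if size < 4:
            return False
        fh.seek(-2, os.SEEK_END)
        tail = fh.read(2)
    return head == SOI and tail == EOI
```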



Download files


Source Distribution

artvee-scraper-cli-1.0.1.tar.gz (16.7 kB)

Uploaded Source

Built Distribution

artvee_scraper_cli-1.0.1-py3-none-any.whl (17.7 kB)

Uploaded Python 3

File details

Details for the file artvee-scraper-cli-1.0.1.tar.gz.

File metadata

  • Download URL: artvee-scraper-cli-1.0.1.tar.gz
  • Size: 16.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for artvee-scraper-cli-1.0.1.tar.gz
Algorithm     Hash digest
SHA256        f8cee9ebdfc91843b24334365701dc713dcfbca0ba93f296cbed0658938d8e65
MD5           fe3c91e5eee5651c6e847a85bf238917
BLAKE2b-256   b2587bcfcc141ba8eecb22129e26bdf86bd02796d83c8c0a55821d279c9e9203
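To confirm that a manually downloaded source distribution matches the published SHA256 digest, hash the file in chunks with `hashlib` and compare. The helper name and constant below are ours. The digest is copied from the table above.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops at EOF, when read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED_SDIST_SHA256 = "f8cee9ebdfc91843b24334365701dc713dcfbca0ba93f296cbed0658938d8e65"
```

For example: `assert sha256_of("artvee-scraper-cli-1.0.1.tar.gz") == EXPECTED_SDIST_SHA256`. pip can also enforce this itself. Add a requirements line such as `artvee-scraper-cli==1.0.1 --hash=sha256:<digest>` and install with `--require-hashes`. In that mode every dependency must be pinned and hashed as well.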


File details

Details for the file artvee_scraper_cli-1.0.1-py3-none-any.whl.


File hashes

Hashes for artvee_scraper_cli-1.0.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        111d3a04cc70097e5b21c2fd85f8564cc5f97ab96b05fadf2b6126a0b9dbb7ce
MD5           22c649fc9c3003e96d9bd8d7d9d1137f
BLAKE2b-256   2cec3e6af44d51467a1fe5971aea72412ed56dca3b1495d10ab8b3b0211aa205

