Project description

Image Crawler Utils

A Customizable Multi-station Image Crawler Structure

English | 简体中文


About

Click Here for Documentation

A rather customizable image crawler structure, designed to download images together with their information using multithreading. A sample run is shown below:

[GIF: a sample run of the crawler]

In addition, several classes and functions are provided to help you build a custom image crawler of your own.

Please follow the rules of robots.txt, and use a low thread count together with a long delay time when crawling images. Frequent requests and heavy download traffic may get your IP address banned or your account suspended.
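As a rough sketch of such a "polite" configuration (the thread and delay parameter names below are illustrative assumptions, not confirmed CrawlerSettings options; check the documentation for the actual names):

from image_crawler_utils import CrawlerSettings

# A politeness-oriented configuration sketch. thread_num and thread_delay
# are ASSUMED parameter names used for illustration only -- look up the
# real option names in the CrawlerSettings documentation.
polite_settings = CrawlerSettings(
    image_num=20,      # limit the total number of images to fetch
    thread_num=2,      # assumption: keep concurrency low
    thread_delay=5.0,  # assumption: wait several seconds between requests
)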

Installing

It is recommended to install it with:

pip install image-crawler-utils
  • Requires Python >= 3.9.

Attention!

  • nodriver is used to parse information from certain websites. It is suggested to install the latest version of Google Chrome first to ensure that the crawler runs correctly.

Features

  • Currently supported websites:
    • Danbooru - features supported:
      • Downloading images searched by tags
    • yande.re / konachan.com / konachan.net - features supported:
      • Downloading images searched by tags
    • Gelbooru - features supported:
      • Downloading images searched by tags
    • Safebooru - features supported:
      • Downloading images searched by tags
    • Pixiv - features supported:
      • Downloading images searched by tags
      • Downloading images uploaded by a certain member
    • Twitter / X - features supported:
      • Downloading images from search results
      • Downloading images uploaded by a certain user
  • Logging the crawling process to the console and, optionally, to a file.
  • Using rich progress bars and logging messages to show the progress of the crawler (Jupyter Notebook support included).
  • Saving and loading the settings and configs of a crawler.
  • Saving and loading the information of images for future downloading (see the sketch after this list).
  • Acquiring and managing cookies of some websites, including saving and loading them.
  • Several classes and functions for designing custom image crawlers.
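For instance, the save / load workflow for image information might look like the following sketch. save_image_infos appears in the example below; load_image_infos is an assumed counterpart, so verify its actual name in the documentation before relying on it.

from image_crawler_utils import CrawlerSettings, Downloader, save_image_infos
from image_crawler_utils.stations.booru import DanbooruKeywordParser
# ASSUMPTION: load_image_infos is presumed to mirror save_image_infos;
# check the documentation for the real function name.
from image_crawler_utils import load_image_infos

settings = CrawlerSettings(image_num=20)

# Session 1: crawl once and persist the collected image information.
image_info_list = DanbooruKeywordParser(
    crawler_settings=settings,
    standard_keyword_string="kuon_(utawarerumono)",
).run()
save_image_infos(image_info_list, "image_info_list")  # -> image_info_list.json

# Session 2 (later): reload the information and download without re-crawling.
image_info_list = load_image_infos("image_info_list")
Downloader(
    store_path="Danbooru",
    image_info_list=image_info_list,
    crawler_settings=settings,
).run()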

Example

Running this example will download the first 20 images from Danbooru matching the keyword / tag kuon_(utawarerumono) and rating:general into the "Danbooru" folder. The information of the images will be stored in image_info_list.json at the same path as your program. Note that the proxies may need to be changed manually.

from image_crawler_utils import CrawlerSettings, Downloader, save_image_infos
from image_crawler_utils.stations.booru import DanbooruKeywordParser

#======================================================================#
# This part prepares the settings for crawling and downloading images. #
#======================================================================#

crawler_settings = CrawlerSettings(
    image_num=20,
    # If you do not use system proxies, remove '#' and set the proxies manually.
    # proxies={"https": "socks5://127.0.0.1:7890"},
)

#==================================================================#
# This part gets the URLs and information of images from Danbooru. #
#==================================================================#

parser = DanbooruKeywordParser(
    crawler_settings=crawler_settings,
    standard_keyword_string="kuon_(utawarerumono) AND rating:general",
)
image_info_list = parser.run()
# The information will be saved at image_info_list.json
save_image_infos(image_info_list, "image_info_list")

#===================================================================#
# This part downloads the images according to the image information #
# just collected in the image_info_list.                            #
#===================================================================#

downloader = Downloader(
    store_path='Danbooru',
    image_info_list=image_info_list,
    crawler_settings=crawler_settings,
)
downloader.run()
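Parsing and downloading are separate steps: parser.run() only collects image URLs and information, so the resulting image_info_list can be saved, filtered, or reloaded in a later session before being handed to Downloader.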


Download files

Download the file for your platform.

Source Distribution

image_crawler_utils-0.4.3.tar.gz (85.6 kB)


Built Distribution


image_crawler_utils-0.4.3-py3-none-any.whl (115.2 kB)


File details

Details for the file image_crawler_utils-0.4.3.tar.gz.

File metadata

  • Download URL: image_crawler_utils-0.4.3.tar.gz
  • Size: 85.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for image_crawler_utils-0.4.3.tar.gz:
  • SHA256: 5787846e633c4f5a2cfd4a0d7d27440e953f260a8b0d082ffc847bcb0e67401b
  • MD5: 01f0a74651e44069c478abff53db622e
  • BLAKE2b-256: 13ce2ce464601aa63228fb200b22419ef42ae86ff8d2903c10fcab507bd693fe


File details

Details for the file image_crawler_utils-0.4.3-py3-none-any.whl.


File hashes

Hashes for image_crawler_utils-0.4.3-py3-none-any.whl:
  • SHA256: 9430c961e47de9fcef5c65c54a7925f76d2bc2cab2874f0754cc6876036044a8
  • MD5: 58d49cc12a7c71ca7f2f96d43438d3ab
  • BLAKE2b-256: c6c56a96840558b9e18d267d5cd8a103e9e78ce5cbfa56f85ee658f9aea76ccb

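If you want pip to verify these hashes at install time, one standard approach is pip's hash-checking mode. This relies on pip's own --require-hashes flag and is not specific to this package; the hashes below are the SHA256 digests listed above.

# requirements.txt -- pin the release to the published hashes
image_crawler_utils==0.4.3 \
    --hash=sha256:5787846e633c4f5a2cfd4a0d7d27440e953f260a8b0d082ffc847bcb0e67401b \
    --hash=sha256:9430c961e47de9fcef5c65c54a7925f76d2bc2cab2874f0754cc6876036044a8

pip install --require-hashes -r requirements.txt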
