Scraper for the site danbooru

Project description

danbooru-scraper

Yet another Danbooru scraper, this one built for distributed use on Amazon SageMaker.

Installation:

pip install danbooru-scraper

Usage

CLI:

# danbooru-scraper --help
usage: danbooru-scraper [-h] --from-id FROM_ID --to-id TO_ID
                        --local-dir LOCAL_DIR --upload-dir UPLOAD_DIR

Example invocation:

danbooru-scraper --from-id 8627380 --to-id 8627391 --local-dir danbooru_downloads --upload-dir s3://dataset-ingested/danbooru
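Since the CLI takes a single ID range, one way to spread work across several machines is to cut a large range into contiguous sub-ranges and start one invocation per worker. The following is only a sketch of that idea, not how the package itself distributes work. The helper names `split_id_range` and `build_commands` are hypothetical, and it assumes `--to-id` is inclusive, which the help text does not state.

```python
from typing import List, Tuple


def split_id_range(from_id: int, to_id: int, num_workers: int) -> List[Tuple[int, int]]:
    """Split [from_id, to_id] (assumed inclusive) into contiguous sub-ranges.

    Returns at most num_workers ranges; fewer if there are fewer IDs than workers.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if to_id < from_id:
        raise ValueError("to_id must not be smaller than from_id")

    total = to_id - from_id + 1
    base, extra = divmod(total, num_workers)

    ranges = []
    start = from_id
    for worker in range(num_workers):
        # The first `extra` workers take one additional ID each.
        size = base + (1 if worker < extra else 0)
        if size == 0:
            break
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def build_commands(ranges: List[Tuple[int, int]], local_dir: str, upload_dir: str) -> List[List[str]]:
    """Build one danbooru-scraper argument list per sub-range."""
    return [
        [
            "danbooru-scraper",
            "--from-id", str(lo),
            "--to-id", str(hi),
            "--local-dir", local_dir,
            "--upload-dir", upload_dir,
        ]
        for lo, hi in ranges
    ]


if __name__ == "__main__":
    parts = split_id_range(8627380, 8627391, 3)
    for cmd in build_commands(parts, "danbooru_downloads", "s3://dataset-ingested/danbooru"):
        print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run` or used as the command of a separate job. If `--to-id` turns out to be exclusive, the end of each sub-range would need to be shifted by one.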

Python:

from danbooru_scraper import DanbooruScraper

scraper = DanbooruScraper(root_dir='../data/')
post_ids = list(range(1000, 10000))  # post IDs 1000 through 9999
scraper.scrape_posts(post_ids)
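For long ID lists it can help to pass the IDs in smaller batches, so that a failure affects one batch rather than the whole run. The sketch below is an assumption-laden illustration: `batched_ids` and `scrape_in_batches` are hypothetical helpers, not part of danbooru-scraper, and whether `scrape_posts` raises on failure is not documented here, so the error handling is generic.

```python
from typing import Callable, Iterator, List, Sequence


def batched_ids(post_ids: Sequence[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of post_ids with at most batch_size items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(post_ids), batch_size):
        yield list(post_ids[start:start + batch_size])


def scrape_in_batches(
    scrape: Callable[[List[int]], object],
    post_ids: Sequence[int],
    batch_size: int = 500,
) -> List[List[int]]:
    """Call scrape(batch) for every batch and return the batches that raised."""
    failed = []
    for batch in batched_ids(post_ids, batch_size):
        try:
            scrape(batch)
        except Exception:
            # Remember the batch so it can be retried later.
            failed.append(batch)
    return failed
```

With the scraper from the example above, this would be called as `failed = scrape_in_batches(scraper.scrape_posts, list(range(1000, 10000)))`, and any batches in `failed` could be retried afterwards.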

SageMaker:

(See notebooks/laucch_sagemaker.ipynb for a complete example of distributed scraping on SageMaker.)

Build:

python -m pip install build twine
python -m build
twine check dist/*
twine upload dist/*

Download files

Source Distribution

danbooru_scraper-0.1.1.tar.gz (3.3 kB)

Uploaded Source

Built Distribution

danbooru_scraper-0.1.1-py3-none-any.whl (4.6 kB)

Uploaded Python 3

File details

Details for the file danbooru_scraper-0.1.1.tar.gz.

File metadata

  • Download URL: danbooru_scraper-0.1.1.tar.gz
  • Size: 3.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.12

File hashes

Hashes for danbooru_scraper-0.1.1.tar.gz
Algorithm Hash digest
SHA256 be1f22e65f2eed0c076cbdf1a25f5d3a509d9bef7676022167272adc251f480f
MD5 d95d684ff40dc5f6b64cb2a13544569c
BLAKE2b-256 55fb4df4152293749d64fcc90787165a0dfbd1914f9202819256c2e786e01d3a

File details

Details for the file danbooru_scraper-0.1.1-py3-none-any.whl.

File hashes

Hashes for danbooru_scraper-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ec13b21a749bdd81fac57f8e8bbb9206ec815e684708fc5bab513a238d1b7c4d
MD5 118e98727dfa1703a6f6f94d7535faff
BLAKE2b-256 5ddbf381cd7c319dbbdcf8fc373a54659d57bd4d942af36933abf259702b98d5
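The SHA256 digests listed above can be used to check a downloaded file before installing it. A minimal sketch using only the standard library follows. The function names `sha256_of` and `matches_published_hash` are illustrative, and the digests in `PUBLISHED_SHA256` are copied from the listings on this page.

```python
import hashlib

# SHA256 digests as listed on this page for release 0.1.1.
PUBLISHED_SHA256 = {
    "danbooru_scraper-0.1.1.tar.gz":
        "be1f22e65f2eed0c076cbdf1a25f5d3a509d9bef7676022167272adc251f480f",
    "danbooru_scraper-0.1.1-py3-none-any.whl":
        "ec13b21a749bdd81fac57f8e8bbb9206ec815e684708fc5bab513a238d1b7c4d",
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare the file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For a file saved under its original name in the current directory, the check would be `matches_published_hash(name, PUBLISHED_SHA256[name])`. pip can perform the same verification itself when a requirements file pins hashes and `--require-hashes` is used.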
