Image Scraper for Google Drive, Imgur, AsiaChan, and more.
ImageURLScraper
ImageURLScraper is a multi-site image scraper. It automatically detects which site each link points to and scrapes it. Only relevant images are scraped from the site, and shortened links are automatically expanded. If you have many links to process, they can be distinguished by IDs when requesting the image links.
Currently Supported Sites:
Asiachan - Checks all previous and next pages from its current location.
Google Drive - Checks all folders and grabs the first 1000 images in each folder.
Imgur - Grabs all images in a gallery.
Installation
In a terminal, type pip install imageurlscraper.
In order to scrape images from Google Drive, Google API credentials are needed.
Steps to add Google Drive credentials:
1. Go to https://console.developers.google.com/apis/dashboard and at the top click + ENABLE APIS AND SERVICES.
2. Search for Google Drive API, click it, and then click Enable.
3. Select a project; you will then be taken to that project's page.
4. You will see a notice: "To use this API, you may need credentials. Click 'Create credentials' to get started." Go ahead and click Create credentials.
5. You will be asked what type of credentials you need. For the API, select Google Drive API, select Other UI for where the API will be called from, and for the data you will be accessing, select Application data.
6. After that, create a service account in the second field. Set the role to Project Owner and make sure the key type is JSON.
7. Download your credentials and rename the JSON file to credentials.json.
8. Go to the project source (if you installed via pip, open a terminal and type pip show imageurlscraper to find it).
9. Put credentials.json in the same folder as main.py.
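Before running the scraper, you can optionally confirm that the service account key works. The sketch below is only a sanity check and assumes the google-auth and google-api-python-client packages are installed; the scraper itself reads credentials.json on its own, so this step is not required.

# Optional sanity check: confirm credentials.json is a valid service-account key
# and that the Drive API is enabled for the project. Assumes google-auth and
# google-api-python-client are installed; the scraper loads the file itself.
from google.oauth2 import service_account
from googleapiclient.discovery import build

creds = service_account.Credentials.from_service_account_file(
    "credentials.json",
    scopes=["https://www.googleapis.com/auth/drive.readonly"],
)
drive = build("drive", "v3", credentials=creds)

# List a single file to confirm that an authenticated API call succeeds.
result = drive.files().list(pageSize=1, fields="files(id, name)").execute()
print(result.get("files", []))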
Sample Code
"""
This sample code links directly to the main function that automatically processes the links
and returns back a dict with IDs and their image links. The original link will not be shown,
which is why IDs are useful.
IDs are REQUIRED input alongside their links, although they are only for classifying links.
Links can have several IDs if necessary to group them together.
"""
import imageurlscraper
import pprint
pp = pprint.PrettyPrinter(indent=4)
list_of_links = [
    # Each entry must contain an ID along with a link.
    # The ID helps distinguish certain objects or people when the dict is returned.
    [0, "https://kpop.asiachan.com/222040"],
    [1, 'https://imgur.com/a/mEUURoG'],
    [2, 'https://bit.ly/36GWd2A'],
    [3, 'http://imgur.com/a/jRcrF'],
    # [999, 'https://drive.google.com/drive/folders/1uWIObdgq65-TmBcA8oJIWOnbuuR_H5PB']
    # This Google Drive folder contains a lot of media and is skipped for testing purposes,
    # but Google Drive links like this are supported and every folder inside will be traversed.
]
scraper = imageurlscraper.main.Scraper()
all_images = scraper.run(list_of_links) # a dict with all the links of the images.
pp.pprint(all_images)
Expected Output (dict)
{ 1: [ 'https://i.imgur.com/RUb6Xwl.jpg',
...],
3: [ 'https://i.imgur.com/ILixI73.jpg',
...],
4: [ 'https://i.imgur.com/X8jZOc7.jpg',
...],
5: [ 'https://i.imgur.com/L4SFme0.jpg',
...],
6: [ 'https://i.imgur.com/G2ltCDf.jpg',
...],
204: [ 'https://static.asiachan.com/Lee.Jueun.full.222040.jpg',
...]
}
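As the docstring above mentions, several links can share the same ID to group them. Assuming the scraper collects all links that share an ID under one key in the returned dict (the grouping behaviour described above), a short sketch might look like this:

import imageurlscraper

scraper = imageurlscraper.main.Scraper()
# Two Imgur albums grouped under the same ID; the assumption here is that
# the scraper collects the image links from both albums under the single key 7.
grouped_links = [
    [7, 'https://imgur.com/a/mEUURoG'],
    [7, 'http://imgur.com/a/jRcrF'],
]
grouped_images = scraper.run(grouped_links)
print(grouped_images.keys())  # expected: dict_keys([7])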
More Samples
import imageurlscraper
scraper = imageurlscraper.main.Scraper()
shortened_link = "https://bit.ly/311n6vP"
unshortened_link = scraper.get_main_link(shortened_link) # Expected Output -> http://google.com/
# Want to process links one by one or do not want to use IDs?
link = "https://imgur.com/a/mEUURoG"
image_links = scraper.process_source(link) # Expected Output -> A LIST of image links.
# Want to run from the sources directly?
images = imageurlscraper.asiachan.AsiaChan().get_all_image_links(link) # Asiachan, expected output -> A LIST of image links.
images = imageurlscraper.googledrive.DriveScraper().get_links(link) # Google Drive, expected output -> A LIST of image links.
images = imageurlscraper.imgur.MediaScraper().start(link) # Imgur, expected output -> A LIST of image links.
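The scraper returns only URLs; downloading the files is up to you. Below is a minimal sketch that fetches the returned links with the third-party requests package (not part of this project); the folder name and filename handling are purely illustrative.

import os
import requests
import imageurlscraper

scraper = imageurlscraper.main.Scraper()
image_links = scraper.process_source("https://imgur.com/a/mEUURoG")  # a list of image URLs

# Save each image into a local "downloads" folder, naming files after the last
# segment of the URL. Adjust paths and error handling for real use.
os.makedirs("downloads", exist_ok=True)
for url in image_links:
    filename = os.path.join("downloads", url.rsplit("/", 1)[-1])
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    with open(filename, "wb") as f:
        f.write(response.content)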