WebScraping_Instagram

WebScraping_Instagram_igpicker

Date for record: 6th October 2019 by Kenneth Hau

Renamed the package from igenemy to igpicker

Updated: Hashtag combination, Starting post

About

This package scrapes images and videos from Instagram using chrome driver.

By setting the parameters below, you can scrape images, videos, or both, and choose the path where they are saved.

It automatically creates folders in the location you give in 'save_to_path'. The folders are named after each username and hashtag.
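
The folder behaviour described above can be pictured with a small sketch. The helper name `target_folder` and the `user_` / `hashtag_` prefixes are assumptions made for illustration; the package's actual naming scheme may differ.

```python
import os
import re


def target_folder(save_to_path: str, target: str, is_hashtag: bool) -> str:
    """Create (if needed) and return the folder for one target.

    The naming scheme is an assumption for illustration; the package
    may name its folders differently.
    """
    # Keep only characters that are safe in a directory name.
    safe_name = re.sub(r"[^\w.-]", "_", target.lstrip("#@"))
    prefix = "hashtag_" if is_hashtag else "user_"
    folder = os.path.join(save_to_path, prefix + safe_name)
    os.makedirs(folder, exist_ok=True)  # no error if it already exists
    return folder
```

Calling it twice with the same arguments returns the same path, so repeated scraping runs would reuse the folder rather than fail.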

Before scraping, you will be asked to log in to your IG account so that the scraping runs smoothly. Don't worry, your username and password are not stored.
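
As a rough idea of what a login prompt that stores nothing can look like, here is a minimal sketch built on `getpass` (which appears in the library list below). The function `prompt_credentials` is hypothetical and is not part of igpicker; how the package really prompts is not shown in this document.

```python
import getpass


def prompt_credentials(ask=input, ask_secret=getpass.getpass):
    """Ask for a username and password and hand them straight back.

    Nothing is written to disk or kept on an object here.
    """
    username = ask("Instagram username: ").strip()
    password = ask_secret("Instagram password: ")  # not echoed to the screen
    return username, password
```

The two prompt functions are passed in as arguments so that the helper can be exercised without a real terminal.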

To install chrome driver on Colab, run the commands below:

!apt-get update
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin

Then set chromedriver_path = 'chromedriver' and chrome_headless = True when creating IGpicker.
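
After installing, it can help to confirm that the driver is actually reachable before starting a scrape. The sketch below uses only the standard library; `find_chromedriver` is a hypothetical helper name, not something igpicker provides.

```python
import shutil


def find_chromedriver(name: str = "chromedriver"):
    """Return the full path of the driver if it is on PATH, else None."""
    return shutil.which(name)


path = find_chromedriver()
print(path or "chromedriver not found on PATH")
```

If a path is printed, passing chromedriver_path = 'chromedriver' should let the driver be found by name.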

Install

pip install igpicker

Upgrade

pip install igpicker --upgrade

Library used

selenium, bs4, time, getpass, IPython, urllib, os, re, tqdm, wget, ssl

Reminder

The scraper may stop working properly after intensive scraping. If that happens, wait for a while and then start scraping again.
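
One way to automate the "wait and try again" advice is a retry loop with growing pauses. This is a generic sketch, not igpicker behaviour: `retry_with_backoff`, the default of three attempts, and the 60-second first wait are all assumptions.

```python
import time


def retry_with_backoff(action, attempts=3, first_wait=60.0, sleep=time.sleep):
    """Call action(); on failure wait, double the wait, and try again.

    Re-raises the last exception once all attempts are used up.
    """
    wait = first_wait
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            sleep(wait)  # pause before the next attempt
            wait *= 2
```

For example, `retry_with_backoff(lambda: igpicker.scraper(chrome_driver=chrome_driver, num_post=10))` would retry a failed scrape after 60 and then 120 seconds.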

Limitations

Scrapes by either 'username' or 'hashtag' in one run, not both (but you can change the 'target_is_hashtag' parameter after your first scrape finishes and run again)
Only supports chromedriver
Only lets you set the total number of posts you want
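
The first limitation can be worked around by running two passes on the same object. The sketch below relies only on the `target` and `target_is_hashtag` attributes and the `scraper` method documented in this README; the wrapper `scrape_users_then_hashtags` itself is hypothetical, and it assumes the picker was created with chromedriver_autoquit = False so the driver survives the first pass.

```python
def scrape_users_then_hashtags(picker, driver, users, hashtags, num_post=10):
    """Run one pass per target type on the same picker and driver."""
    results = {}
    for is_hashtag, targets in ((False, users), (True, hashtags)):
        if not targets:
            continue  # nothing to scrape for this target type
        picker.target = list(targets)
        picker.target_is_hashtag = is_hashtag
        key = "hashtags" if is_hashtag else "users"
        results[key] = picker.scraper(chrome_driver=driver, num_post=num_post)
    return results
```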

Possible function that can be created in the future

Store each post's information (e.g. likes, post time, post location, post description, the user's number of followers, etc.) in dataframes, or consolidate it into a database. The data could then be used for descriptive analysis, to train machine learning models, or to build a recommendation system.
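
To make the idea concrete, here is one possible shape for such a store, using a dataclass and the standard-library `sqlite3` module. Every name here (`PostRecord`, its fields, `save_posts`, the `posts` table) is hypothetical; igpicker does not currently collect this information.

```python
import sqlite3
from dataclasses import dataclass, astuple
from typing import Optional


@dataclass
class PostRecord:
    # All field names are hypothetical; igpicker does not collect these.
    url: str
    likes: int
    posted_at: str  # ISO 8601 timestamp
    location: Optional[str]
    description: str


def save_posts(conn: sqlite3.Connection, posts) -> int:
    """Insert records into a 'posts' table and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "url TEXT PRIMARY KEY, likes INTEGER, posted_at TEXT, "
        "location TEXT, description TEXT)"
    )
    # The URL is the key, so saving the same post twice updates it in place.
    conn.executemany(
        "INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?, ?)",
        [astuple(p) for p in posts],
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
```

A table like this could later be read into a dataframe for the analysis use cases mentioned above.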

Parameters & Attributes

(1) target : A list of string(s), default: []

either target username(s) or hashtag(s); if they are hashtags, 'target_is_hashtag' must be set to True

(2) target_is_hashtag : Boolean, default: False

True: you want to scrape by using hashtags

(3) chromedriver_path : String, default: './chromedriver'

the path to your chrome driver; the driver file should be named 'chromedriver'

(4) save_to_path : String, default: '.'

a path where the image(s) / video(s) will be saved into

(5) chromedriver_autoquit : Boolean, default: True

True: automatically quit the driver after the scraping finishes
if you set it to False, you can quit the driver manually with the built-in method 'close_driver'

(6) chrome_headless : Boolean, default: True

True: run chrome driver in the background, without a visible browser window
if you want to watch the chrome driver at work, set it to False

(7) save_img : Boolean, default: True

True: save images

(8) save_video : Boolean, default: False

True: save videos

(9) enable_gpu : Boolean, default: False

True: enable gpu in chrome driver

(10) ipython_display_image : Boolean, default: False

True: display images; this only works in a notebook, not in a terminal
if you set it to True when running in a terminal, the image scraping will fail
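
Because 'ipython_display_image' only works in a notebook, it can be convenient to set it from a runtime check. The helper below is a common heuristic and not part of igpicker; the name `running_in_notebook` is hypothetical, and the check may not recognise every notebook front end.

```python
def running_in_notebook() -> bool:
    """Best-effort check for a Jupyter-style kernel (heuristic only)."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False  # IPython is not installed, so this is not a notebook
    shell = get_ipython()
    # In a plain Python process get_ipython() returns None.
    return shell is not None and "IPKernelApp" in shell.config
```

The result could then be passed as `ipython_display_image = running_in_notebook()` when creating the IGpicker object.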

Methods

(1) login : no parameter is required, return chrome_driver

used to access Instagram

(2) scraper : four parameters (chrome_driver, num_post, start_from, hashtag_combination), returns a list of all targeted URLs

(a) chrome_driver : Selenium Webdriver
   - used for web scraping

(b) num_post : int, default: 10
   - the total number of posts you want to scrape
   - if this number is beyond the actual number of posts, it will stop scraping automatically

(c) start_from : unsigned int, default: 1
   - the post to start scraping from

(d) hashtag_combination : list, default: []
   - only scrape posts that match all of the designated hashtags
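
How the three selection parameters interact can be illustrated on plain data. This sketch is an interpretation of the descriptions above, not the package's code: it assumes 'start_from' is 1-based and applied before the hashtag filter, and that hashtags are matched case-insensitively in the caption. The function `select_posts` and the (url, caption) input format are hypothetical.

```python
import re


def select_posts(posts, num_post=10, start_from=1, hashtag_combination=()):
    """Pick posts the way the scraper parameters describe.

    posts is a list of (url, caption) pairs in page order.
    """
    if start_from < 1:
        raise ValueError("start_from is 1-based and must be >= 1")
    required = {tag.lower().lstrip("#") for tag in hashtag_combination}
    selected = []
    for url, caption in posts[start_from - 1:]:
        if len(selected) >= num_post:  # stop once enough posts are found
            break
        tags = {t.lower() for t in re.findall(r"#(\w+)", caption)}
        if required <= tags:  # every required hashtag is present
            selected.append(url)
    return selected
```

When 'num_post' is larger than the number of available posts, the loop simply runs out of input, which mirrors the automatic stop described for 'num_post'.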

(3) close_driver : one parameter (chrome_driver)

manually close the web driver

Import Library

from igpicker import IGpicker

Example 1 (Normal flow):

igpicker = IGpicker(target = ['hkfoodtalk', 'sportscenter'], target_is_hashtag = False,
                    chromedriver_path = './chromedriver', save_to_path = './',
                    chromedriver_autoquit = False, chrome_headless = True,
                    save_img = True, save_video = False, enable_gpu = False,
                    ipython_display_image = True)

chrome_driver = igpicker.login()

all_target = igpicker.scraper(chrome_driver = chrome_driver, num_post = 10, start_from = 1)

igpicker.close_driver(chrome_driver) #manually close if 'chromedriver_autoquit' is False
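
When 'chromedriver_autoquit' is False, a failed scrape can leave a browser process running. A small wrapper with try/finally avoids that. It uses only the `login`, `scraper` and `close_driver` methods documented above; the wrapper `scrape_and_close` itself is a hypothetical convenience and assumes the picker was created with chromedriver_autoquit = False.

```python
def scrape_and_close(picker, num_post=10, start_from=1):
    """Log in, scrape, and always close the driver afterwards."""
    driver = picker.login()
    try:
        return picker.scraper(
            chrome_driver=driver, num_post=num_post, start_from=start_from
        )
    finally:
        # Runs on success and on error, so no browser process is left behind.
        picker.close_driver(driver)
```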

Example 2 (Hashtag Combination):

igpicker = IGpicker(target = ['pizza'], target_is_hashtag = True,
                    chromedriver_path = 'chromedriver', save_to_path = './',
                    chromedriver_autoquit = False, chrome_headless = True,
                    save_img = True, save_video = True, enable_gpu = False,
                    ipython_display_image = False)

chrome_driver = igpicker.login()

all_target = igpicker.scraper(chrome_driver = chrome_driver, num_post = 10, start_from = 10, hashtag_combination= ['cheesy', 'cheese'])

Example 3 (Change attributes):

igpicker.save_to_path = '../' #change path

igpicker.target = ['burger', 'hkmusic','pasta'] #change target

igpicker.target_is_hashtag = True #is it "hashtag" page?

igpicker.save_video = True

igpicker.save_img = True

all_target = igpicker.scraper(chrome_driver = chrome_driver, num_post = 10)

You can run 'scraper' again after you change the parameters if you haven't closed the chrome driver.

Release history

0.4.4: Oct 11, 2019 (this version)
0.4.3: Oct 7, 2019
0.4.2: Oct 6, 2019
0.4.1: Oct 6, 2019
0.4.0: Oct 6, 2019
