Super lightweight Instagram web scraper for data analysis
instascrape: Instagram scraping for humans
What is it?
instascrape is a powerful, lightweight library for scraping Instagram data without using Instagram's official API. It is designed with flexibility and developer productivity in mind, so you can stop wasting valuable time collecting data and start analyzing it!
Key features
- :walking: Static HTML scrapers
  - Profile: scrapes 50 data points from a profile
    - follower count
    - recent posts
    - verification status
    - etc.
  - Post: scrapes almost 50 data points from a post
    - likes
    - number of comments
    - hashtags
    - etc.
  - Hashtag: scrapes over a dozen data points from a hashtag
    - number of posts
    - recent posts
    - featured picture URL
    - etc.
- :floppy_disk: Download post media locally as png, jpg, mp4, and mp3
- :musical_score: Expressive and consistent API for concise and elegant code
- :bar_chart: Designed to integrate seamlessly with industry-standard libraries for powerful data analysis
- :hammer: Lightweight: you don't have to build a hammer factory when all you need is a hammer
- :spider_web: The only hard dependencies are Requests and BeautifulSoup
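Hashtags are one of the data points a Post scraper returns. As a rough illustration of how such a field can be derived from a caption, here is a minimal sketch using a regular expression. `extract_hashtags` is a hypothetical helper, not part of instascrape, and it assumes a hashtag is `#` followed by word characters.

```python
import re
from typing import List

# Assumption: a hashtag is '#' followed by one or more word characters.
# Hypothetical helper for illustration; not instascrape's implementation.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_hashtags(caption: str) -> List[str]:
    """Return the hashtags in a caption, in order, without the leading '#'."""
    return HASHTAG_PATTERN.findall(caption)


print(extract_hashtags("Learn new skills with #GrowWithGoogle and #AI"))
# ['GrowWithGoogle', 'AI']
```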
Table of Contents
- :computer: Installation
- :mag_right: Sample Usage
- :books: Documentation
- :newspaper: Blog Posts
- :pray: Contributing
- :spider_web: Dependencies
- :credit_card: License
- :grey_question: Support
:computer: Installation
Minimum Python version
This library currently requires Python 3.7 or higher.
pip
Install from PyPI using:
```shell
$ pip3 install insta-scrape
```
WARNING: make sure you install insta-scrape and not a package with a similar name!
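Because the library requires Python 3.7 or higher, a script that depends on it can fail fast on older interpreters. A minimal sketch; `python_is_supported` is a hypothetical helper, not part of instascrape:

```python
import sys

# Minimum interpreter version stated in the Installation section.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) part of version_info meets MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if not python_is_supported():
    # Exit with a readable message instead of failing later on a syntax or import error.
    sys.exit(f"instascrape needs Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+, "
             f"found {sys.version.split()[0]}")
```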
:mag_right: Sample Usage
All top-level, ready-to-use features can be imported using:
```python
from instascrape import *
```
instascrape uses clean, consistent, and expressive syntax to make the developer experience as painless as possible.
```python
# Instantiate the scraper objects
google = Profile('https://www.instagram.com/google/')
google_post = Post('https://www.instagram.com/p/CG0UU3ylXnv/')
google_hashtag = Hashtag('https://www.instagram.com/explore/tags/google/')

# Load their respective data
google.load()
google_post.load()
google_hashtag.load()
```
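The constructors above are given full Instagram URLs. If you start from a username, post shortcode, or tag instead, small helpers like the following can build URLs in the same formats as the example. These functions are hypothetical and not part of instascrape; they only reproduce the URL patterns shown above.

```python
# Hypothetical helpers reproducing the URL formats used in the example above.
BASE_URL = "https://www.instagram.com"


def profile_url(username: str) -> str:
    """Build a profile URL such as https://www.instagram.com/google/."""
    return f"{BASE_URL}/{username}/"


def post_url(shortcode: str) -> str:
    """Build a post URL from its shortcode, e.g. CG0UU3ylXnv."""
    return f"{BASE_URL}/p/{shortcode}/"


def hashtag_url(tag: str) -> str:
    """Build a hashtag URL, accepting the tag with or without a leading '#'."""
    return f"{BASE_URL}/explore/tags/{tag.lstrip('#')}/"


print(profile_url("google"))      # https://www.instagram.com/google/
print(post_url("CG0UU3ylXnv"))    # https://www.instagram.com/p/CG0UU3ylXnv/
print(hashtag_url("#google"))     # https://www.instagram.com/explore/tags/google/
```

For example, `Profile(profile_url('google'))` would receive the same URL as the first line of the example.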
Once loaded, the scraped data points can be accessed with either dot (`.`) or bracket (`[]`) notation:
```python
print(google.followers)                # 12262794
print(google_post['hashtags'])         # ['growwithgoogle']
print(google_hashtag.amount_of_posts)  # 9053408
```
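Supporting both `google.followers` and `google_post['hashtags']` is a common Python pattern: store values as attributes and have `__getitem__` delegate to `getattr`. The sketch below shows one way to do it; `DataPointContainer` is hypothetical and is not taken from instascrape's source.

```python
from typing import Any


class DataPointContainer:
    """Hypothetical result object supporting both obj.key and obj['key'] access.

    Illustrates the access pattern only; this is not instascrape's source.
    """

    def __init__(self, **data: Any) -> None:
        # Store each scraped data point as a regular instance attribute.
        for name, value in data.items():
            setattr(self, name, value)

    def __getitem__(self, key: str) -> Any:
        # Delegate bracket lookups to attribute lookups, translating the
        # AttributeError into the KeyError callers expect from [] access.
        try:
            return getattr(self, key)
        except AttributeError:
            raise KeyError(key) from None


post = DataPointContainer(hashtags=["growwithgoogle"])
print(post.hashtags)     # ['growwithgoogle']
print(post["hashtags"])  # ['growwithgoogle']
```

Converting `AttributeError` into `KeyError` keeps bracket access consistent with how dictionaries behave when a key is missing.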
:books: Documentation
The official documentation can be found on Read the Docs.
:newspaper: Blog Posts
Check out blog posts on DEV for ideas and tutorials!
- Scrape data from Instagram with instascrape
- Visualizing Instagram engagement with instascrape
- Exploratory data analysis of Instagram using instascrape and Python
- Creating a scatter matrix of Instagram data using Python
- Downloading an Instagram profile's recent photos using Python
:pray: Contributing
All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome!
Feel free to open an Issue or look through existing Issues to start a dialogue about what you want to see added, changed, or fixed.
Beginners to open source are highly encouraged to participate and ask questions :heart:
:spider_web: Dependencies
Instascrape primarily relies on two third-party libraries for requesting and scraping Instagram HTML content:
- Requests: sending HTTP requests
- BeautifulSoup: scraping and parsing HTML data

Everything else comes from Python 3's standard library, which keeps the code under the hood unobtrusive and adds little to no overhead.
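To illustrate the static-HTML approach these dependencies enable, here is a minimal sketch of pulling a JSON payload out of a page's `<script>` tag. It assumes the page assigns its data as `window._sharedData = {...};` inside a script; that prefix and the sample markup are assumptions for illustration, not a description of Instagram's current pages or of instascrape's internals. To stay dependency-free it uses the standard library's `html.parser` in place of BeautifulSoup, and it skips the fetch step (which you would do with something like `requests.get(url).text`).

```python
import json
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional

# Assumed prefix of the script that carries the page data (hypothetical).
DATA_PREFIX = "window._sharedData = "


class EmbeddedJSONParser(HTMLParser):
    """Decode the JSON from the first <script> whose text starts with DATA_PREFIX."""

    def __init__(self) -> None:
        super().__init__()
        self._in_script = False
        self._chunks: List[str] = []
        self.payload: Optional[Dict[str, Any]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self._chunks = []

    def handle_data(self, data):
        # Script text may arrive in several chunks, so collect it all.
        if self._in_script:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag != "script" or not self._in_script:
            return
        self._in_script = False
        text = "".join(self._chunks).strip()
        if self.payload is None and text.startswith(DATA_PREFIX):
            # Drop the assignment prefix and trailing semicolon, then decode.
            json_text = text[len(DATA_PREFIX):].rstrip().rstrip(";")
            self.payload = json.loads(json_text)


def extract_embedded_json(html: str) -> Optional[Dict[str, Any]]:
    """Return the embedded payload as a dict, or None if no matching script exists."""
    parser = EmbeddedJSONParser()
    parser.feed(html)
    parser.close()
    return parser.payload


# Hypothetical sample markup standing in for a fetched page.
SAMPLE_HTML = """<html><head>
<script type="text/javascript">window._sharedData = {"user": {"username": "google", "followers": 12262794}};</script>
</head><body></body></html>"""

print(extract_embedded_json(SAMPLE_HTML)["user"]["followers"])  # 12262794
```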
:credit_card: License
:grey_question: Support
Reach out to me if you have questions or ideas!
- Email:
- Instagram:
- Twitter: