Super lightweight Instagram web scraper for data analysis
instascrape: powerful Instagram data scraping toolkit
What is it?
instascrape is a lightweight Python package that provides expressive and flexible tools for scraping Instagram data. It is meant to be a high-level building block in the data scientist's toolchain and can be integrated and extended with industry-standard tools for web scraping, data science, and analysis.
Key features
Here are a few of the things that instascrape does well:
- Powerful, object-oriented scraping tools for profiles, posts, hashtags, reels, and IGTV
- Scrapes HTML, BeautifulSoup, and JSON
- Download content to your computer as png, jpg, mp4, and mp3
- Dynamically retrieve HTML embed code for posts
- Expressive and consistent API for concise and elegant code
- Designed for seamless integration with Selenium, Pandas, and other industry-standard tools for data collection and analysis
- Lightweight; no boilerplate or configurations necessary
- The only hard dependencies are Requests and Beautiful Soup
- Proven to work as of December 2020
:computer: Installation
Minimum Python version
This library currently requires Python 3.7 or higher.
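If you want a script to fail early on an unsupported interpreter, a check along these lines can help. This is a minimal sketch using only the standard library; the names `MIN_PYTHON` and `meets_minimum` are illustrative and are not part of instascrape.

```python
import sys

# Minimum version stated in this README; the constant name is illustrative.
MIN_PYTHON = (3, 7)


def meets_minimum(version_info=None, minimum=MIN_PYTHON):
    """Return True if the given (or running) interpreter is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not meets_minimum():
        raise SystemExit("Python 3.7 or higher is required.")
    print("Python version OK")
```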
pip
Install from PyPI using:
$ pip3 install insta-scrape
WARNING: make sure you install insta-scrape and not a package with a similar name!
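One way to confirm which distribution ended up in your environment is to query the installed package metadata by its PyPI name. The sketch below uses only the standard library `importlib.metadata` module, which is available from Python 3.8; on Python 3.7 it simply reports that the version is unknown. The helper name `installed_version` is hypothetical.

```python
from typing import Optional

try:
    # importlib.metadata is in the standard library from Python 3.8 onward.
    from importlib import metadata
except ImportError:  # Python 3.7 has no stdlib metadata module
    metadata = None


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent or unknown."""
    if metadata is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "insta-scrape" is the PyPI name given in the install command above.
    version = installed_version("insta-scrape")
    print("insta-scrape:", version or "not found")
```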
:mag_right: Sample Usage
All top-level, ready-to-use features can be imported using:
from instascrape import *
instascrape uses clean, consistent, and expressive syntax to make the developer experience as painless as possible.
# Instantiate the scraper objects
google = Profile('https://www.instagram.com/google/')
google_post = Post('https://www.instagram.com/p/CG0UU3ylXnv/')
google_hashtag = Hashtag('https://www.instagram.com/explore/tags/google/')
# Scrape their respective data
google.scrape()
google_post.scrape()
google_hashtag.scrape()
After being scraped, relevant attributes can be accessed with dot or bracket notation:
print(google.followers)
print(google_post['hashtags'])
print(google_hashtag.amount_of_posts)
>>> 12262794
>>> ['growwithgoogle']
>>> 9053408
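To show how one object can offer both access styles, here is a self-contained sketch that runs without network access. `ScrapedRecord` is a hypothetical stand-in written for this example; it is not instascrape's own class and does not claim to mirror its internals. The field values are the ones printed above.

```python
class ScrapedRecord:
    """Hypothetical stand-in for a scraped object (not an instascrape class)."""

    def __init__(self, **fields):
        # Store each scraped field as a plain instance attribute.
        for name, value in fields.items():
            setattr(self, name, value)

    def __getitem__(self, key):
        # Bracket access delegates to attribute lookup, so both notations agree.
        try:
            return getattr(self, key)
        except AttributeError:
            raise KeyError(key) from None

    def to_dict(self):
        # Copy the instance attributes into a plain dictionary.
        return dict(vars(self))


record = ScrapedRecord(followers=12262794, hashtags=["growwithgoogle"])
print(record.followers)    # dot notation
print(record["hashtags"])  # bracket notation
```

A plain dictionary such as the one returned by `to_dict()` in this sketch is a convenient shape to hand to tabular tools for analysis.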
:books: Documentation
The official documentation can be found on Read the Docs.
:newspaper: Blog Posts
Check out blog posts on the official site or DEV for ideas and tutorials!
- Scrape data from Instagram with instascrape
- Visualizing Instagram engagement with instascrape
- Exploratory data analysis of Instagram using instascrape and Python
- Creating a scatter matrix of Instagram data using Python
- Downloading an Instagram profile's recent photos using Python
- Scraping 25,000 data points from Joe Biden's Instagram using instascrape
- Compare major tech Instagram pages with instascrape
- Tracking an Instagram post's engagement in real time with instascrape
- Dynamically generate embeddable Instagram HTML with instascrape
- Scraping an Instagram location tag with instascrape
- Scraping Instagram reels with instascrape
- Scraping IGTV data with instascrape
- Scraping 10,000 data points from Donald Trump's Instagram with Python
:pray: Contributing
All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome!
Feel free to open an Issue, check out existing Issues, or start a discussion.
Beginners to open source are highly encouraged to participate. Ask questions if you're unsure what to do or where to start :heart:
:spider_web: Dependencies
:credit_card: License
This library operates under the MIT license.
:grey_question: Support
Check out the FAQ
Reach out to me if you want to connect or have any questions!
- Email:
- Twitter: