
InstaHarvest 🌾


Professional Instagram Data Collection Toolkit: a powerful library for Instagram automation, data collection, and analytics, built on Playwright.

📖 Documentation | 🐛 Report Bug | 💡 Request Feature | 📋 Changelog


✨ Features

| Category | Capabilities |
| --- | --- |
| 📊 Profile | Stats, verified badge, category, full bio, external links, Threads |
| 🔌 Web API | 16+ JSON endpoints: profiles, followers, feed, comments, reels, stories, hashtags |
| 📸 Content | Posts, Reels, Stories, Highlights, Tagged Posts, with a JSON-first architecture |
| 💬 Engagement | Comments (with replies), likes, media download (images/videos via yt-dlp) |
| 👥 Social | Followers/Following lists, Follow/Unfollow, Direct Messaging |
| 🔍 Discovery | Search, Hashtag feeds, Location feeds, Explore, Notifications |
| ⚡ Performance | Parallel processing, SharedBrowser (one browser for all), Excel export |
| 🛡️ Reliability | Rate limiting, graceful shutdown (Ctrl+C), auto-save, retry logic |

🚀 Installation & Setup

# Install from PyPI
pip install instaharvest
playwright install chrome

# OR install from GitHub (latest dev version)
git clone https://github.com/mpython77/insta-harvester.git
cd insta-harvester
pip install -r requirements.txt
playwright install chrome

Create Instagram session (required, one-time):

from instaharvest import save_session
save_session()
# Browser opens → Log in manually → Press ENTER → Session saved ✅

โš ๏ธ Without instagram_session.json, the library won't work.


📖 Quick Start: SharedBrowser (Recommended)

One browser for ALL operations: the fastest and most efficient way to use InstaHarvest.

from instaharvest import SharedBrowser
from instaharvest.config import ScraperConfig

config = ScraperConfig()

with SharedBrowser(config=config) as browser:
    # ── Profile ──
    profile = browser.scrape_profile("username")
    print(f"{profile.full_name}: {profile.followers} followers")

    # ── Social Actions ──
    browser.follow("user1")
    browser.send_message("user1", "Hello!")
    followers = browser.get_followers("user2", limit=100)

    # ── Content Scraping ──
    post = browser.scrape_post("https://www.instagram.com/p/ABC/")
    reel = browser.scrape_reel("https://www.instagram.com/reel/XYZ/")
    stories = browser.scrape_stories("username")
    comments = browser.scrape_comments("https://www.instagram.com/p/ABC/")

    # ── Discovery ──
    results = browser.search("fashion brands")
    hashtag = browser.scrape_hashtag("streetwear")
    notifs = browser.read_notifications()

    # ── Batch Operations ──
    posts = browser.scrape_posts(["url1", "url2", "url3"])
    files = browser.download_post("https://www.instagram.com/p/ABC/")

    # ── Web API (Direct JSON, exact data) ──
    profile_json = browser.get_profile_json("username")
    print(f"Exact followers: {profile_json.follower_count:,}")

    feed = browser.get_user_feed_api(profile_json.user_id, count=5)
    reels = browser.get_reels_api(profile_json.user_id)
    highlights = browser.get_highlights_api(profile_json.user_id)
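The reason the Web API calls above are labelled "exact data": the rendered profile page shows abbreviated counts like "1.2M", while the JSON endpoints return full integers. If you ever need to compare the two, a helper to expand abbreviated counts (purely illustrative, not part of InstaHarvest):

```python
def expand_count(text: str) -> int:
    """Convert an abbreviated count like '1.2M' or '43.5K' to an integer."""
    text = text.strip().replace(",", "")
    suffixes = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    if text and text[-1].upper() in suffixes:
        return int(float(text[:-1]) * suffixes[text[-1].upper()])
    return int(text)

print(expand_count("1.2M"))   # 1200000
print(expand_count("43.5K"))  # 43500
print(expand_count("987"))    # 987
```

Note the expansion is lossy in reverse: "1.2M" could stand for anything from 1,150,000 to 1,249,999, which is exactly why the JSON endpoints are preferable for analytics.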

📚 API Reference

1. Profile Scraping

from instaharvest import ProfileScraper
from instaharvest.config import ScraperConfig

config = ScraperConfig()
scraper = ProfileScraper(config=config)
session_data = scraper.load_session()
scraper.setup_browser(session_data)

profile = scraper.scrape('username')
print(f"Posts: {profile.posts}, Followers: {profile.followers}")
print(f"Verified: {profile.is_verified}, Category: {profile.category}")
print(f"Bio: {profile.bio}, Links: {profile.external_links}")

scraper.close()

2. Followers / Following

from instaharvest import FollowersCollector
from instaharvest.config import ScraperConfig

config = ScraperConfig()
collector = FollowersCollector(config=config)
session_data = collector.load_session()
collector.setup_browser(session_data)

followers = collector.get_followers('username', limit=100, print_realtime=True)
following = collector.get_following('username', limit=50)

collector.close()

3. Follow / Unfollow & Direct Messaging

from instaharvest import FollowManager, MessageManager
from instaharvest.config import ScraperConfig

config = ScraperConfig()

# Follow
manager = FollowManager(config=config)
session_data = manager.load_session()
manager.setup_browser(session_data)
manager.follow('username')
manager.batch_follow(['user1', 'user2', 'user3'])
manager.close()

# DM
messenger = MessageManager(config=config)
session_data = messenger.load_session()
messenger.setup_browser(session_data)
messenger.send_message('username', 'Hello!')
messenger.batch_send(['user1', 'user2'], 'Hi there!')
messenger.close()

4. Post & Reel Data (JSON-First)

from instaharvest import PostDataScraper
from instaharvest.config import ScraperConfig

config = ScraperConfig()
scraper = PostDataScraper(config=config)
session_data = scraper.load_session()
scraper.setup_browser(session_data)

post = scraper.scrape('https://www.instagram.com/p/DVs7LK-iO0C/')

# 30+ fields extracted automatically from JSON
print(post.like_count, post.comment_count)     # Engagement
print(post.caption, post.tagged_accounts)       # Content
print(post.location.name if post.location else 'N/A')  # Location
print(post.owner.username if post.owner else 'N/A')     # Owner

for slide in post.carousel_slides:              # Carousel
    print(f"  Slide {slide.slide_index}: {slide.media_type}")

scraper.close()

5. Comment Scraping

from instaharvest import CommentScraper
from instaharvest.exporters import export_comments_to_json, export_comments_to_excel
from instaharvest.config import ScraperConfig

config = ScraperConfig()
scraper = CommentScraper(config=config)
session_data = scraper.load_session()
scraper.setup_browser(session_data)

result = scraper.scrape(
    'https://www.instagram.com/p/POST_ID/',
    max_comments=100,
    include_replies=True
)

for comment in result.comments:
    print(f"@{comment.author.username}: {comment.text}")
    for reply in comment.replies:
        print(f"  ↳ @{reply.author.username}: {reply.text}")

# Export
export_comments_to_json(result, 'comments.json')
export_comments_to_excel(result, 'comments.xlsx')

scraper.close()

6. Stories & Highlights

from instaharvest import StoryScraper, HighlightsScraper
from instaharvest.config import ScraperConfig

config = ScraperConfig()

# Stories: JSON-first, per-slide tag mapping
scraper = StoryScraper(config=config)
session_data = scraper.load_session()
scraper.setup_browser(session_data)

result = scraper.scrape('username', extract_tags=True)
print(f"Stories: {result.story_count}, Tags: {result.all_tagged_accounts}")
for slide in result.slides:
    print(f"  Slide {slide.slide_index}: [{slide.media_type}] {slide.timestamp} → {slide.tagged_accounts}")

scraper.close()

# Highlights: mentions, links, music, locations
hl_scraper = HighlightsScraper(config=config)
session = hl_scraper.load_session()
hl_scraper.setup_browser(session)

full = hl_scraper.scrape_all('mondayswimwear', max_slides_per=100)
print(f"{full.total_highlights} highlights, {full.total_slides} total slides")

hl_scraper.close()

7. Parallel Processing & Orchestrator

from instaharvest import SharedBrowser, InstagramOrchestrator
from instaharvest.config import ScraperConfig

config = ScraperConfig(headless=True)

with SharedBrowser(config=config) as browser:
    orch = InstagramOrchestrator(config, shared_browser=browser)

    results = orch.scrape_complete_profile_advanced(
        'username',
        parallel=3,
        save_excel=True,
        scrape_comments=True,
        scrape_stories=True
    )
    print(f"Scraped {len(results['posts_data'])} posts")

8. Tagged Posts

from instaharvest import TaggedPostsScraper
from instaharvest.config import ScraperConfig

config = ScraperConfig()
scraper = TaggedPostsScraper(config=config)
session = scraper.load_session()
scraper.setup_browser(session)

result = scraper.scrape('mondayswimwear', max_posts=100)
print(f"Total: {result.total_found} tagged posts, Unique taggers: {result.unique_taggers}")
for post in result.tagged_posts:
    print(f"  @{post.owner} → {post.url} ({post.media_type})")

scraper.close()

9. Notifications

from instaharvest import SharedBrowser
from instaharvest.config import ScraperConfig

config = ScraperConfig()
with SharedBrowser(config=config) as browser:
    notifs = browser.read_notifications()
    print(f"Total: {len(notifs)} notifications")

Notification types: follow, post_like, comment_like, comment, mention, follow_request, follow_accepted, thread, story, system
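With ten notification types, a common first step is tallying what came in. A sketch over plain dicts shaped like those types (assuming each notification exposes a type field; the real objects returned by the library may differ):

```python
from collections import Counter

# Illustrative payload using the notification types listed above
notifications = [
    {"type": "follow", "user": "alice"},
    {"type": "post_like", "user": "bob"},
    {"type": "comment", "user": "carol", "text": "Nice shot!"},
    {"type": "post_like", "user": "dave"},
]

by_type = Counter(n["type"] for n in notifications)
for ntype, count in by_type.most_common():
    print(f"{ntype}: {count}")
# post_like: 2 / follow: 1 / comment: 1
```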

10. Media Download

from instaharvest import SharedBrowser
from instaharvest.config import ScraperConfig

config = ScraperConfig()
with SharedBrowser(config=config) as browser:
    # Handles images, videos, reels, carousels automatically
    files = browser.download_post("https://www.instagram.com/reel/C-example...")
    print(f"Downloaded {len(files)} files")

🔧 Video support requires Google Chrome (browser_channel='chrome', the default); Chromium lacks the required video codecs.


🔌 Web API: Direct JSON Data Extraction

Access Instagram's internal API endpoints directly through Playwright. Returns exact, structured data with no DOM scraping.

16+ endpoints | 15 data models | Auto-pagination | Rate limiting | POST + GET support

from instaharvest import SharedBrowser
from instaharvest.config import ScraperConfig

config = ScraperConfig(headless=True)

with SharedBrowser(config=config) as browser:
    # ── Profile (exact stats) ──
    profile = browser.get_profile_json('mondayswimwear')
    print(f"{profile.full_name}: {profile.follower_count:,} followers")
    user_id = profile.user_id

    # ── Followers / Following ──
    followers = browser.get_followers_api(user_id, count=50)
    following = browser.get_following_api(user_id, count=50)

    # ── Feed, Comments, Likers ──
    feed = browser.get_user_feed_api(user_id, count=12)
    comments = browser.get_media_comments_api(feed.posts[0].media_id)
    likers = browser.get_media_likers_api(feed.posts[0].media_id)

    # ── Stories, Highlights, Reels ──
    stories = browser.get_stories_api(user_id)
    highlights = browser.get_highlights_api(user_id)
    reels = browser.get_reels_api(user_id)

    # ── Hashtag & Location ──
    hashtag = browser.get_hashtag_feed_api('swimwear')
    location = browser.get_location_feed_api('213385402')

    # ── Raw API (any endpoint, GET or POST) ──
    raw = browser.fetch_raw_api('/api/v1/users/1059031072/info/')

Available Endpoints:

| Method | Description | Returns |
| --- | --- | --- |
| `get_profile_json(username)` | Profile with exact stats | `WebProfileData` |
| `get_user_info(user_id)` | Profile by ID | `WebProfileData` |
| `get_followers_api(id, count)` | Followers list (paginated) | `FollowListResult` |
| `get_following_api(id, count)` | Following list (paginated) | `FollowListResult` |
| `get_friendship_status(id)` | Follow relationship | `FriendshipStatus` |
| `get_user_feed_api(id, count)` | User's posts | `UserFeedResult` |
| `get_media_info_api(media_id)` | Detailed post info | `MediaInfo` |
| `get_media_comments_api(id)` | Post comments | `CommentsResult` |
| `get_media_likers_api(id)` | Post likers | `LikersResult` |
| `get_stories_api(id)` | Active stories | `List[StoryMediaItem]` |
| `get_highlights_api(id)` | Highlights list | `HighlightsResult` |
| `get_reels_api(id)` | Reels with play counts | `ReelsResult` |
| `get_hashtag_feed_api(tag)` | Hashtag posts | `HashtagSection` |
| `get_location_feed_api(id)` | Location posts | `LocationSection` |
| `get_tagged_posts_api(id)` | Tagged posts | `UserFeedResult` |
| `fetch_raw_api(endpoint)` | Any endpoint (GET/POST) | `Dict` |
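Several endpoints above are marked as paginated, and the section header promises auto-pagination. Cursor pagination itself follows a standard loop, shown here against a mock fetch function (the field names items and max_id are assumptions for illustration, not the library's wire format):

```python
def fetch_page(cursor=None):
    """Mock of a paginated endpoint: returns items plus a cursor for the next page."""
    pages = {
        None: {"items": ["a", "b"], "max_id": "p2"},
        "p2": {"items": ["c", "d"], "max_id": "p3"},
        "p3": {"items": ["e"], "max_id": None},  # last page: no cursor
    }
    return pages[cursor]

def fetch_all(limit=None):
    """Follow the cursor until it is exhausted or `limit` items are collected."""
    items, cursor = [], None
    while True:
        page = fetch_page(cursor)
        items.extend(page["items"])
        cursor = page["max_id"]
        if cursor is None or (limit and len(items) >= limit):
            return items[:limit] if limit else items

print(fetch_all())         # ['a', 'b', 'c', 'd', 'e']
print(fetch_all(limit=3))  # ['a', 'b', 'c']
```

A `count` argument like the one on `get_followers_api(id, count)` plays the role of `limit` here: stop as soon as enough items are in hand rather than draining every page.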

Direct API access (without SharedBrowser):

from instaharvest import InstagramWebAPI
from playwright.sync_api import sync_playwright

with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=True)
    context = browser.new_context(storage_state='instagram_session.json')
    page = context.new_page()
    page.goto('https://www.instagram.com/')

    api = InstagramWebAPI(page=page)
    profile = api.get_profile('mondayswimwear')
    print(f"{profile.follower_count:,} followers")

    browser.close()

🎯 Complete Workflow Example

from instaharvest import SharedBrowser, InstagramOrchestrator
from instaharvest.config import ScraperConfig

config = ScraperConfig()

with SharedBrowser(config=config) as browser:
    # 1. Profile analysis
    profile = browser.scrape_profile('target_user')
    print(f"📊 {profile.full_name}: {profile.followers} followers")

    # 2. Collect & follow
    followers = browser.get_followers('target_user', limit=50)
    for f in followers[:10]:
        browser.follow(f)

    # 3. Scrape posts
    post_links = browser.scrape_post_links('target_user')
    posts = browser.scrape_posts([l['url'] for l in post_links[:5]])

    # 4. Stories + Web API
    stories = browser.scrape_stories('target_user')
    profile_json = browser.get_profile_json('target_user')
    reels = browser.get_reels_api(profile_json.user_id)

    # 5. Full orchestrated scrape
    orch = InstagramOrchestrator(config, shared_browser=browser)
    results = orch.scrape_complete_profile_advanced(
        'target_user', parallel=3,
        save_excel=True, scrape_stories=True
    )
    print(f"✅ {len(results['posts_data'])} posts scraped")

๐Ÿ“ Project Structure

๐Ÿ—‚๏ธ Package Structure
insta-harvester/
โ”œโ”€โ”€ instaharvest/              # Main package
โ”‚   โ”œโ”€โ”€ __init__.py            # Package entry point
โ”‚   โ”œโ”€โ”€ base.py                # Base scraper class
โ”‚   โ”œโ”€โ”€ config.py              # Configuration
โ”‚   โ”œโ”€โ”€ profile.py             # Profile scraping
โ”‚   โ”œโ”€โ”€ followers.py           # Followers collection
โ”‚   โ”œโ”€โ”€ follow.py              # Follow/unfollow
โ”‚   โ”œโ”€โ”€ message.py             # Direct messaging
โ”‚   โ”œโ”€โ”€ post_data.py           # Post data (JSON-first)
โ”‚   โ”œโ”€โ”€ reel_data.py           # Reel data extraction
โ”‚   โ”œโ”€โ”€ comment_scraper.py     # Comments with replies
โ”‚   โ”œโ”€โ”€ story_scraper.py       # Story scraping
โ”‚   โ”œโ”€โ”€ highlight_scraper.py   # Highlights extraction
โ”‚   โ”œโ”€โ”€ tagged_posts.py        # Tagged posts
โ”‚   โ”œโ”€โ”€ notifications.py       # Notification reader
โ”‚   โ”œโ”€โ”€ web_api.py             # ๐Ÿ”Œ Web API (16+ endpoints)
โ”‚   โ”œโ”€โ”€ shared_browser.py      # SharedBrowser
โ”‚   โ”œโ”€โ”€ orchestrator.py        # Workflow orchestrator
โ”‚   โ”œโ”€โ”€ parallel_scraper.py    # Parallel processing
โ”‚   โ”œโ”€โ”€ downloader.py          # Media download
โ”‚   โ””โ”€โ”€ ...                    # More modules
โ”œโ”€โ”€ examples/
โ”‚   โ”œโ”€โ”€ save_session.py        # Session setup
โ”‚   โ”œโ”€โ”€ all_in_one.py          # Interactive demo
โ”‚   โ”œโ”€โ”€ main_advanced.py       # Production scraping
โ”‚   โ”œโ”€โ”€ example_web_api.py     # ๐Ÿ”Œ Web API demo
โ”‚   โ””โ”€โ”€ example_custom_config.py
โ”œโ”€โ”€ tests/                     # 130+ unit tests
โ””โ”€โ”€ LICENSE                    # MIT License

โš™๏ธ Configuration

from instaharvest import ScraperConfig

config = ScraperConfig(
    headless=True,              # Run without browser UI
    viewport_width=1920,
    viewport_height=1080,
    default_timeout=30000,      # 30 seconds
    max_scroll_attempts=50,
    log_level='INFO',
    # Rate limiting
    follow_delay_min=10.0,
    follow_delay_max=15.0,
    message_delay_min=15.0,
    message_delay_max=20.0,
)

See Configuration Guide for all options.
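The paired `*_delay_min` / `*_delay_max` options suggest that each action waits a random amount of time inside that window, which is a standard way to make automation look less mechanical. The idea in isolation (illustrative, not the library's actual implementation):

```python
import random
import time

def humanized_delay(min_s: float, max_s: float) -> float:
    """Sleep for a random duration in [min_s, max_s] and return it."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay

# e.g. between follow actions, mirroring follow_delay_min/max above
waited = humanized_delay(0.01, 0.02)  # tiny values for demonstration
print(f"waited {waited:.3f}s")
```

A fixed delay is trivially fingerprintable; drawing from a range each time is cheap insurance against that.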


🔧 Troubleshooting

🔍 Common Issues & Solutions

| Problem | Solution |
| --- | --- |
| `playwright` command not found | `pip install playwright && playwright install chrome` |
| No module named 'instaharvest' | `pip install instaharvest` or `pip install -e .` |
| Session file not found | Run `save_session()` first |
| Login required / session expired | Re-run `save_session()` |
| Instagram says "Try again later" | Increase the rate-limiting delays in config |
| Could not follow/unfollow | Increase `popup_open_delay` and `action_delay_*` |
| Slow-connection errors | Increase `page_load_delay` and `scroll_delay_*` |
| Posts: 0 but content exists | Update to the latest version (v2.7.1+) |

Getting Help: use the Report Bug and Request Feature links above, or open an issue on GitHub.


โš ๏ธ Disclaimer

This tool is for educational purposes only. Follow Instagram's Terms of Service, respect rate limits, and use responsibly.


📜 License

MIT License; see LICENSE for details.


๐Ÿค Contributing

Contributions welcome! Submit a Pull Request.


Made with ❤️ by Muydinov Doston

Happy Harvesting! 🌾
