
InstaHarvest 🌾


Professional Instagram Data Collection Toolkit — a powerful library for Instagram automation, data collection, and analytics, built on Playwright.

📖 Documentation | 🐛 Report Bug | 💡 Request Feature | 📋 Changelog


✨ Features

| Category | Capabilities |
| --- | --- |
| 📊 Profile | Stats, verified badge, category, full bio, external links, Threads |
| 🔌 Web API | 16+ JSON endpoints — profiles, followers, feed, comments, reels, stories, hashtags |
| 📸 Content | Posts, Reels, Stories, Highlights, Tagged Posts — with JSON-first architecture |
| 💬 Engagement | Comments (with replies), likes, media download (images/videos via yt-dlp) |
| 👥 Social | Followers/Following lists, Follow/Unfollow, Direct Messaging |
| 🔍 Discovery | Search, Hashtag feeds, Location feeds, Explore, Notifications |
| ⚡ Performance | Parallel processing, SharedBrowser (one browser for everything), Excel export |
| 🛡️ Reliability | Rate limiting, graceful shutdown (Ctrl+C), auto-save, retry logic |

🚀 Installation & Setup

# Install from PyPI
pip install instaharvest
playwright install chrome

# OR install from GitHub (latest dev version)
git clone https://github.com/mpython77/insta-harvester.git
cd insta-harvester
pip install -r requirements.txt
playwright install chrome

Create Instagram session (required, one-time):

from instaharvest import save_session
save_session()
# Browser opens → Log in manually → Press ENTER → Session saved ✅

โš ๏ธ Without instagram_session.json, the library won't work.


📖 Quick Start — SharedBrowser (Recommended)

One browser for ALL operations — the fastest and most efficient way to use InstaHarvest.

from instaharvest import SharedBrowser
from instaharvest.config import ScraperConfig

config = ScraperConfig()

with SharedBrowser(config=config) as browser:
    # ── Profile ──
    profile = browser.scrape_profile("username")
    print(f"{profile.full_name}: {profile.followers} followers")

    # ── Social Actions ──
    browser.follow("user1")
    browser.send_message("user1", "Hello!")
    followers = browser.get_followers("user2", limit=100)

    # ── Content Scraping ──
    post = browser.scrape_post("https://www.instagram.com/p/ABC/")
    reel = browser.scrape_reel("https://www.instagram.com/reel/XYZ/")
    stories = browser.scrape_stories("username")
    comments = browser.scrape_comments("https://www.instagram.com/p/ABC/")

    # ── Discovery ──
    results = browser.search("fashion brands")
    hashtag = browser.scrape_hashtag("streetwear")
    notifs = browser.read_notifications()

    # ── Batch Operations ──
    posts = browser.scrape_posts(["url1", "url2", "url3"])
    files = browser.download_post("https://www.instagram.com/p/ABC/")

    # ── Web API (Direct JSON — exact data) ──
    profile_json = browser.get_profile_json("username")
    print(f"Exact followers: {profile_json.follower_count:,}")

    feed = browser.get_user_feed_api(profile_json.user_id, count=5)
    reels = browser.get_reels_api(profile_json.user_id)
    highlights = browser.get_highlights_api(profile_json.user_id)

📚 API Reference

1. Profile Scraping

from instaharvest import ProfileScraper
from instaharvest.config import ScraperConfig

config = ScraperConfig()
scraper = ProfileScraper(config=config)
session_data = scraper.load_session()
scraper.setup_browser(session_data)

profile = scraper.scrape('username')
print(f"Posts: {profile.posts}, Followers: {profile.followers}")
print(f"Verified: {profile.is_verified}, Category: {profile.category}")
print(f"Bio: {profile.bio}, Links: {profile.external_links}")

scraper.close()

2. Followers / Following

from instaharvest import FollowersCollector
from instaharvest.config import ScraperConfig

config = ScraperConfig()
collector = FollowersCollector(config=config)
session_data = collector.load_session()
collector.setup_browser(session_data)

followers = collector.get_followers('username', limit=100, print_realtime=True)
following = collector.get_following('username', limit=50)

collector.close()

3. Follow / Unfollow & Direct Messaging

from instaharvest import FollowManager, MessageManager
from instaharvest.config import ScraperConfig

config = ScraperConfig()

# Follow
manager = FollowManager(config=config)
session_data = manager.load_session()
manager.setup_browser(session_data)
manager.follow('username')
manager.batch_follow(['user1', 'user2', 'user3'])
manager.close()

# DM
messenger = MessageManager(config=config)
session_data = messenger.load_session()
messenger.setup_browser(session_data)
messenger.send_message('username', 'Hello!')
messenger.batch_send(['user1', 'user2'], 'Hi there!')
messenger.close()

4. Post & Reel Data (JSON-First)

from instaharvest import PostDataScraper
from instaharvest.config import ScraperConfig

config = ScraperConfig()
scraper = PostDataScraper(config=config)
session_data = scraper.load_session()
scraper.setup_browser(session_data)

post = scraper.scrape('https://www.instagram.com/p/DVs7LK-iO0C/')

# 30+ fields extracted automatically from JSON
print(post.like_count, post.comment_count)     # Engagement
print(post.caption, post.tagged_accounts)       # Content
print(post.location.name if post.location else 'N/A')  # Location
print(post.owner.username if post.owner else 'N/A')     # Owner

for slide in post.carousel_slides:              # Carousel
    print(f"  Slide {slide.slide_index}: {slide.media_type}")

scraper.close()

5. Comment Scraping

from instaharvest import CommentScraper
from instaharvest.exporters import export_comments_to_json, export_comments_to_excel
from instaharvest.config import ScraperConfig

config = ScraperConfig()
scraper = CommentScraper(config=config)
session_data = scraper.load_session()
scraper.setup_browser(session_data)

result = scraper.scrape(
    'https://www.instagram.com/p/POST_ID/',
    max_comments=100,
    include_replies=True
)

for comment in result.comments:
    print(f"@{comment.author.username}: {comment.text}")
    for reply in comment.replies:
        print(f"  ↳ @{reply.author.username}: {reply.text}")

# Export
export_comments_to_json(result, 'comments.json')
export_comments_to_excel(result, 'comments.xlsx')

scraper.close()

6. Stories & Highlights

from instaharvest import StoryScraper, HighlightsScraper
from instaharvest.config import ScraperConfig

config = ScraperConfig()

# Stories — JSON-first, per-slide tag mapping
scraper = StoryScraper(config=config)
session_data = scraper.load_session()
scraper.setup_browser(session_data)

result = scraper.scrape('username', extract_tags=True)
print(f"Stories: {result.story_count}, Tags: {result.all_tagged_accounts}")
for slide in result.slides:
    print(f"  Slide {slide.slide_index}: [{slide.media_type}] {slide.timestamp} → {slide.tagged_accounts}")

scraper.close()

# Highlights — mentions, links, music, locations
hl_scraper = HighlightsScraper(config=config)
session = hl_scraper.load_session()
hl_scraper.setup_browser(session)

full = hl_scraper.scrape_all('mondayswimwear', max_slides_per=100)
print(f"{full.total_highlights} highlights, {full.total_slides} total slides")

hl_scraper.close()

7. Parallel Processing & Orchestrator

from instaharvest import SharedBrowser, InstagramOrchestrator
from instaharvest.config import ScraperConfig

config = ScraperConfig(headless=True)

with SharedBrowser(config=config) as browser:
    orch = InstagramOrchestrator(config, shared_browser=browser)

    results = orch.scrape_complete_profile_advanced(
        'username',
        parallel=3,
        save_excel=True,
        scrape_comments=True,
        scrape_stories=True
    )
    print(f"Scraped {len(results['posts_data'])} posts")

8. Tagged Posts

from instaharvest import TaggedPostsScraper
from instaharvest.config import ScraperConfig

config = ScraperConfig()
scraper = TaggedPostsScraper(config=config)
session = scraper.load_session()
scraper.setup_browser(session)

result = scraper.scrape('mondayswimwear', max_posts=100)
print(f"Total: {result.total_found} tagged posts, Unique taggers: {result.unique_taggers}")
for post in result.tagged_posts:
    print(f"  @{post.owner} → {post.url} ({post.media_type})")

scraper.close()

9. Notifications

from instaharvest import SharedBrowser
from instaharvest.config import ScraperConfig

config = ScraperConfig()
with SharedBrowser(config=config) as browser:
    notifs = browser.read_notifications()
    print(f"Total: {len(notifs)} notifications")

Notification types: follow, post_like, comment_like, comment, mention, follow_request, follow_accepted, thread, story, system
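If the items returned by read_notifications() each carry one of the types above, grouping them is a one-liner with collections.Counter. The dict shape below is an illustrative assumption; check the actual return type of read_notifications() in your installed version.

```python
from collections import Counter

# Illustrative payload; real items would come from browser.read_notifications()
notifications = [
    {"type": "follow", "username": "user1"},
    {"type": "post_like", "username": "user2"},
    {"type": "follow", "username": "user3"},
    {"type": "comment", "username": "user4"},
]

# Tally notifications per type
by_type = Counter(n["type"] for n in notifications)
print(by_type.most_common())  # e.g. [('follow', 2), ('post_like', 1), ('comment', 1)]

# Keep only new followers
new_followers = [n["username"] for n in notifications if n["type"] == "follow"]
```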

10. Media Download

from instaharvest import SharedBrowser
from instaharvest.config import ScraperConfig

config = ScraperConfig()
with SharedBrowser(config=config) as browser:
    # Handles images, videos, reels, carousels automatically
    files = browser.download_post("https://www.instagram.com/reel/C-example...")
    print(f"Downloaded {len(files)} files")

🔧 Video support requires Google Chrome (browser_channel='chrome', the default). Plain Chromium builds lack the proprietary video codecs.


🔌 Web API — Direct JSON Data Extraction

Access Instagram's internal API endpoints directly through Playwright. Returns exact, structured data — no DOM scraping.

16+ endpoints | 15 data models | Auto-pagination | Rate limiting | POST + GET support

from instaharvest import SharedBrowser
from instaharvest.config import ScraperConfig

config = ScraperConfig(headless=True)

with SharedBrowser(config=config) as browser:
    # ── Profile (exact stats) ──
    profile = browser.get_profile_json('mondayswimwear')
    print(f"{profile.full_name}: {profile.follower_count:,} followers")
    user_id = profile.user_id

    # ── Followers / Following ──
    followers = browser.get_followers_api(user_id, count=50)
    following = browser.get_following_api(user_id, count=50)

    # ── Feed, Comments, Likers ──
    feed = browser.get_user_feed_api(user_id, count=12)
    comments = browser.get_media_comments_api(feed.posts[0].media_id)
    likers = browser.get_media_likers_api(feed.posts[0].media_id)

    # ── Stories, Highlights, Reels ──
    stories = browser.get_stories_api(user_id)
    highlights = browser.get_highlights_api(user_id)
    reels = browser.get_reels_api(user_id)

    # ── Hashtag & Location ──
    hashtag = browser.get_hashtag_feed_api('swimwear')
    location = browser.get_location_feed_api('213385402')

    # ── Raw API (any endpoint, GET or POST) ──
    raw = browser.fetch_raw_api('/api/v1/users/1059031072/info/')

Available Endpoints:

| Method | Description | Returns |
| --- | --- | --- |
| get_profile_json(username) | Profile with exact stats | WebProfileData |
| get_user_info(user_id) | Profile by ID | WebProfileData |
| get_followers_api(id, count) | Followers list (paginated) | FollowListResult |
| get_following_api(id, count) | Following list (paginated) | FollowListResult |
| get_friendship_status(id) | Follow relationship | FriendshipStatus |
| get_user_feed_api(id, count) | User's posts | UserFeedResult |
| get_media_info_api(media_id) | Detailed post info | MediaInfo |
| get_media_comments_api(id) | Post comments | CommentsResult |
| get_media_likers_api(id) | Post likers | LikersResult |
| get_stories_api(id) | Active stories | List[StoryMediaItem] |
| get_highlights_api(id) | Highlights list | HighlightsResult |
| get_reels_api(id) | Reels with play counts | ReelsResult |
| get_hashtag_feed_api(tag) | Hashtag posts | HashtagSection |
| get_location_feed_api(id) | Location posts | LocationSection |
| get_tagged_posts_api(id) | Tagged posts | UserFeedResult |
| fetch_raw_api(endpoint) | Any endpoint (GET/POST) | Dict |
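The paginated endpoints (get_followers_api, get_user_feed_api, and friends) advertise auto-pagination; the cursor-driven loop underneath is still worth understanding when you collect more than one page. The sketch below simulates the pattern against a fake paged source: fetch_page, the items list, and the next_cursor field are assumptions for illustration, not the library's actual API.

```python
from typing import Callable, Optional

def fetch_page(cursor: Optional[str]) -> dict:
    """Stand-in for a single paged API call (e.g. one followers request)."""
    pages = {
        None: {"items": ["a", "b"], "next_cursor": "p2"},
        "p2": {"items": ["c", "d"], "next_cursor": "p3"},
        "p3": {"items": ["e"], "next_cursor": None},
    }
    return pages[cursor]

def collect_all(fetch: Callable[[Optional[str]], dict], max_items: int = 100) -> list:
    """Follow next_cursor until the feed is exhausted or max_items is reached."""
    items, cursor = [], None
    while len(items) < max_items:
        page = fetch(cursor)
        items.extend(page["items"])
        cursor = page["next_cursor"]
        if cursor is None:  # no more pages
            break
    return items[:max_items]

print(collect_all(fetch_page))  # ['a', 'b', 'c', 'd', 'e']
```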

Direct API access (without SharedBrowser):

from instaharvest import InstagramWebAPI
from playwright.sync_api import sync_playwright

with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=True)
    context = browser.new_context(storage_state='instagram_session.json')
    page = context.new_page()
    page.goto('https://www.instagram.com/')

    api = InstagramWebAPI(page=page)
    profile = api.get_profile('mondayswimwear')
    print(f"{profile.follower_count:,} followers")

    browser.close()

🎯 Complete Workflow Example

from instaharvest import SharedBrowser, InstagramOrchestrator
from instaharvest.config import ScraperConfig

config = ScraperConfig()

with SharedBrowser(config=config) as browser:
    # 1. Profile analysis
    profile = browser.scrape_profile('target_user')
    print(f"📊 {profile.full_name}: {profile.followers} followers")

    # 2. Collect & follow
    followers = browser.get_followers('target_user', limit=50)
    for f in followers[:10]:
        browser.follow(f)

    # 3. Scrape posts
    post_links = browser.scrape_post_links('target_user')
    posts = browser.scrape_posts([l['url'] for l in post_links[:5]])

    # 4. Stories + Web API
    stories = browser.scrape_stories('target_user')
    profile_json = browser.get_profile_json('target_user')
    reels = browser.get_reels_api(profile_json.user_id)

    # 5. Full orchestrated scrape
    orch = InstagramOrchestrator(config, shared_browser=browser)
    results = orch.scrape_complete_profile_advanced(
        'target_user', parallel=3,
        save_excel=True, scrape_stories=True
    )
    print(f"✅ {len(results['posts_data'])} posts scraped")

๐Ÿ“ Project Structure

๐Ÿ—‚๏ธ Package Structure
insta-harvester/
โ”œโ”€โ”€ instaharvest/              # Main package
โ”‚   โ”œโ”€โ”€ __init__.py            # Package entry point
โ”‚   โ”œโ”€โ”€ base.py                # Base scraper class
โ”‚   โ”œโ”€โ”€ config.py              # Configuration
โ”‚   โ”œโ”€โ”€ profile.py             # Profile scraping
โ”‚   โ”œโ”€โ”€ followers.py           # Followers collection
โ”‚   โ”œโ”€โ”€ follow.py              # Follow/unfollow
โ”‚   โ”œโ”€โ”€ message.py             # Direct messaging
โ”‚   โ”œโ”€โ”€ post_data.py           # Post data (JSON-first)
โ”‚   โ”œโ”€โ”€ reel_data.py           # Reel data extraction
โ”‚   โ”œโ”€โ”€ comment_scraper.py     # Comments with replies
โ”‚   โ”œโ”€โ”€ story_scraper.py       # Story scraping
โ”‚   โ”œโ”€โ”€ highlight_scraper.py   # Highlights extraction
โ”‚   โ”œโ”€โ”€ tagged_posts.py        # Tagged posts
โ”‚   โ”œโ”€โ”€ notifications.py       # Notification reader
โ”‚   โ”œโ”€โ”€ web_api.py             # ๐Ÿ”Œ Web API (16+ endpoints)
โ”‚   โ”œโ”€โ”€ shared_browser.py      # SharedBrowser
โ”‚   โ”œโ”€โ”€ orchestrator.py        # Workflow orchestrator
โ”‚   โ”œโ”€โ”€ parallel_scraper.py    # Parallel processing
โ”‚   โ”œโ”€โ”€ downloader.py          # Media download
โ”‚   โ””โ”€โ”€ ...                    # More modules
โ”œโ”€โ”€ examples/
โ”‚   โ”œโ”€โ”€ save_session.py        # Session setup
โ”‚   โ”œโ”€โ”€ all_in_one.py          # Interactive demo
โ”‚   โ”œโ”€โ”€ main_advanced.py       # Production scraping
โ”‚   โ”œโ”€โ”€ example_web_api.py     # ๐Ÿ”Œ Web API demo
โ”‚   โ””โ”€โ”€ example_custom_config.py
โ”œโ”€โ”€ tests/                     # 130+ unit tests
โ””โ”€โ”€ LICENSE                    # MIT License

โš™๏ธ Configuration

from instaharvest import ScraperConfig

config = ScraperConfig(
    headless=True,              # Run without browser UI
    viewport_width=1920,
    viewport_height=1080,
    default_timeout=30000,      # 30 seconds
    max_scroll_attempts=50,
    log_level='INFO',
    # Rate limiting
    follow_delay_min=10.0,
    follow_delay_max=15.0,
    message_delay_min=15.0,
    message_delay_max=20.0,
)

See Configuration Guide for all options.
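The *_delay_min / *_delay_max pairs suggest the library sleeps for a random duration between consecutive actions, which is the standard way to avoid looking like a bot. A plausible sketch of that pattern follows; the throttled helper is illustrative, not the library's implementation.

```python
import random
import time

def throttled(action, delay_min: float, delay_max: float):
    """Run an action, then sleep a uniformly random delay to mimic human pacing."""
    result = action()
    pause = random.uniform(delay_min, delay_max)
    time.sleep(pause)
    return result, pause

# In real use the bounds would match the config above (e.g. 10-15 s per follow);
# tiny values here just keep the demo fast.
result, waited = throttled(lambda: "followed", delay_min=0.01, delay_max=0.02)
print(result, round(waited, 3))
```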


🔧 Troubleshooting

🔍 Common Issues & Solutions
| Problem | Solution |
| --- | --- |
| playwright command not found | pip install playwright && playwright install chrome |
| No module named 'instaharvest' | pip install instaharvest or pip install -e . |
| Session file not found | Run save_session() first |
| Login required / Session expired | Re-run save_session() |
| Instagram says 'Try again later' | Increase the rate-limiting delays in config |
| Could not follow/unfollow | Increase popup_open_delay and action_delay_* |
| Slow-connection errors | Increase page_load_delay and scroll_delay_* |
| Posts: 0 but content exists | Update to the latest version (v2.7.1+) |
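For transient 'Try again later' responses, retrying with exponential backoff on top of longer configured delays is the usual remedy. Below is a generic sketch of that pattern, not a feature of the library; RuntimeError stands in for whatever exception your code raises on a rate-limit response.

```python
import time

def retry_with_backoff(func, attempts: int = 4, base_delay: float = 1.0):
    """Retry func(); double the wait after each failure, re-raise on exhaustion."""
    for attempt in range(attempts):
        try:
            return func()
        except RuntimeError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Demo with a flaky callable that fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Try again later")
    return "ok"

print(retry_with_backoff(flaky, base_delay=0.01))  # ok
```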

Getting help: open an issue on the GitHub repository.


โš ๏ธ Disclaimer

This tool is for educational purposes only. Follow Instagram's Terms of Service, respect rate limits, and use responsibly.


📜 License

MIT License — see LICENSE for details.


๐Ÿค Contributing

Contributions welcome! Submit a Pull Request.


Made with โค๏ธ by Muydinov Doston

Happy Harvesting! 🌾
