
Professional Instagram data collection toolkit with automation features


InstaHarvest 🌾


Professional Instagram Data Collection Toolkit - A powerful and efficient library for Instagram automation, data collection, and analytics.

📖 Documentation | 🐛 Report Bug | 💡 Request Feature | 🤝 Contributing | 📋 Changelog


✨ Features

  • 📊 Profile Statistics - Collect followers, following, posts count
  • 🔗 Post & Reel Links - Intelligent scrolling and link collection
  • 🏷️ Tagged Accounts - Extract tags from posts and reels
  • 👥 Followers/Following - Collect lists with real-time output
  • 💬 Direct Messaging - Send DMs with smart rate limiting
  • 🤝 Follow/Unfollow - Manage following with rate limiting
  • ⚡ Parallel Processing - Scrape multiple posts simultaneously
  • 📑 Excel Export - Real-time data export to Excel
  • 🌐 Shared Browser - Single browser for all operations
  • 🔍 HTML Detection - Automatic structure change detection
  • 📝 Professional Logging - Comprehensive logging system

🚀 Installation

Method 1: Install from PyPI (Recommended)

# Install the package
pip install instaharvest

# Install Playwright browser
playwright install chrome

Method 2: Install from GitHub (Latest Development Version)

Step 1: Clone the Repository

git clone https://github.com/mpython77/insta-harvester.git
cd insta-harvester

Step 2: Install Dependencies

# Install Python dependencies
pip install -r requirements.txt

# Install Playwright browser
playwright install chrome

Step 3: Install Package in Development Mode (Optional)

# Install as editable package
pip install -e .

OR simply use it without installation:

# Just make sure you're in the project directory
cd /path/to/insta-harvester

# Then run examples
python examples/save_session.py

🔧 Complete Setup Guide

Step 1: Verify Python Installation

# Check Python version (requires 3.8+)
python --version

# Should show: Python 3.8.0 or higher
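The same check can be made from inside a script, which helps when several interpreters are installed. This is a minimal sketch using only the standard library; the helper name `python_is_supported` is illustrative and not part of InstaHarvest.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in this guide


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True when the interpreter meets the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if python_is_supported():
        print("Python version OK:", sys.version.split()[0])
    else:
        print("Python 3.8 or higher is required")
```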

Step 2: Install InstaHarvest

From GitHub:

git clone https://github.com/mpython77/insta-harvester.git
cd insta-harvester
pip install -r requirements.txt
playwright install chrome

From PyPI:

pip install instaharvest
playwright install chrome

Step 3: Create Instagram Session (REQUIRED!)

# Navigate to examples directory
cd examples

# Run session setup script
python save_session.py

This will:

  1. Open Chrome browser
  2. Navigate to Instagram
  3. Let you log in manually
  4. Save your session to instagram_session.json
  5. All future scripts will use this session (no re-login needed!)

Important: Without this session file, the library won't work!
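Because every later script depends on the session file, it can help to check for it before starting a long run. The sketch below assumes the file sits in the current working directory under the name given above and only verifies that it exists and parses as JSON; the internal layout of the file is not described in this guide, so nothing about its contents is checked. `session_file_ready` is a hypothetical helper, not a library function.

```python
import json
from pathlib import Path


def session_file_ready(path="instagram_session.json"):
    """Return True if the session file exists and parses as JSON."""
    session_path = Path(path)
    if not session_path.is_file():
        return False
    try:
        with session_path.open("r", encoding="utf-8") as handle:
            json.load(handle)
    except (OSError, ValueError):
        # ValueError covers both invalid JSON and undecodable bytes.
        return False
    return True


if __name__ == "__main__":
    if session_file_ready():
        print("Session file found")
    else:
        print("No usable session file; run examples/save_session.py first")
```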

Step 4: Test Your Setup

# First, create your Instagram session (required!)
python examples/save_session.py

# Try the all-in-one interactive demo (recommended for learning)
python examples/all_in_one.py

# Or try production scraping
python examples/main_advanced.py

⚠️ IMPORTANT: Always use ScraperConfig. All examples below create a ScraperConfig() explicitly, even when the default settings are used. Passing the config explicitly keeps the timing settings in one visible place and helps avoid timing issues with popups, buttons, and rate limits. See the Configuration Guide for customization options.

📖 Quick Start Examples

Example 1: Follow a User

from instaharvest import FollowManager
from instaharvest.config import ScraperConfig

# Create config (customize if needed)
config = ScraperConfig()

# Create manager with config
manager = FollowManager(config=config)

# Load session
session_data = manager.load_session()
manager.setup_browser(session_data)

# Follow someone
result = manager.follow("instagram")
print(result)  # {'success': True, 'status': 'followed', ...}

# Clean up
manager.close()
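In the example above, `manager.close()` is skipped if `follow()` raises an exception, which can leave a browser process running. One way to guard against that is a small context manager. This is a sketch, not a library feature: `open_manager` is a hypothetical helper, it relies only on the three methods shown above (`load_session`, `setup_browser`, `close`), and `DemoManager` is a stand-in so the snippet runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def open_manager(manager):
    """Load the session, start the browser, and always close it afterwards."""
    session_data = manager.load_session()
    manager.setup_browser(session_data)
    try:
        yield manager
    finally:
        # Runs whether the body finished normally or raised.
        manager.close()


class DemoManager:
    """Stand-in with the same three methods; not part of InstaHarvest."""

    def __init__(self):
        self.calls = []

    def load_session(self):
        self.calls.append("load_session")
        return {}

    def setup_browser(self, session_data):
        self.calls.append("setup_browser")

    def close(self):
        self.calls.append("close")


if __name__ == "__main__":
    demo = DemoManager()
    try:
        with open_manager(demo):
            raise RuntimeError("simulated failure during an operation")
    except RuntimeError:
        pass
    print(demo.calls)
```

With a real manager the call would presumably read `with open_manager(FollowManager(config=config)) as manager:`, followed by the operations inside the block.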

Example 2: Send Direct Message

from instaharvest import MessageManager
from instaharvest.config import ScraperConfig

# Create config
config = ScraperConfig()
manager = MessageManager(config=config)
session_data = manager.load_session()
manager.setup_browser(session_data)

# Send message
result = manager.send_message("username", "Hello from Python!")
print(result)

manager.close()

Example 3: Collect Followers

from instaharvest import FollowersCollector
from instaharvest.config import ScraperConfig

# Create config
config = ScraperConfig()
collector = FollowersCollector(config=config)
session_data = collector.load_session()
collector.setup_browser(session_data)

# Collect first 100 followers
followers = collector.get_followers("username", limit=100, print_realtime=True)
print(f"Collected {len(followers)} followers")

collector.close()

Example 4: All Operations in One Browser

from instaharvest import SharedBrowser
from instaharvest.config import ScraperConfig

# Create config for better reliability
config = ScraperConfig()

# One browser for everything!
with SharedBrowser(config=config) as browser:
    # Follow users
    browser.follow("user1")
    browser.follow("user2")

    # Send messages
    browser.send_message("user1", "Thanks for the follow!")

    # Collect followers
    followers = browser.get_followers("my_account", limit=50)
    print(f"Followers: {len(followers)}")

📁 Example Scripts

The examples/ directory contains ready-to-use scripts:

🔑 Session Setup (Required First)

python examples/save_session.py

Creates Instagram session (one-time setup, then reused automatically).

🎮 Interactive Demo

python examples/all_in_one.py

Interactive menu with ALL features:

  • Follow/Unfollow users
  • Send messages
  • Collect followers/following
  • Batch operations
  • Profile scraping

🚀 Production Scraping

python examples/main_advanced.py

Full automatic profile scraping:

  • Collects all post/reel links
  • Extracts data with parallel processing
  • Exports to Excel + JSON
  • Advanced diagnostics & error recovery

⚙️ Configuration Examples

python examples/example_custom_config.py

Shows how to customize configuration (delays, viewport, etc.).

📖 Documentation

1. Profile Scraping

from instaharvest import ProfileScraper
from instaharvest.config import ScraperConfig

config = ScraperConfig()
scraper = ProfileScraper(config=config)
session_data = scraper.load_session()
scraper.setup_browser(session_data)

profile = scraper.scrape('username')
print(f"Posts: {profile.posts}")
print(f"Followers: {profile.followers}")
print(f"Following: {profile.following}")

scraper.close()

2. Collect Followers/Following

from instaharvest import FollowersCollector
from instaharvest.config import ScraperConfig

# Create config
config = ScraperConfig()
collector = FollowersCollector(config=config)
session_data = collector.load_session()
collector.setup_browser(session_data)

# Collect first 100 followers
followers = collector.get_followers('username', limit=100, print_realtime=True)
print(f"Collected {len(followers)} followers")

# Collect following
following = collector.get_following('username', limit=50)

collector.close()
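Collected lists are usually saved for later use. The sketch below writes usernames to CSV and JSON with the standard library only. It assumes `get_followers` and `get_following` return a list of username strings, which the examples in this guide suggest but do not state; `save_usernames` is a hypothetical helper and is separate from the library's own Excel export.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_usernames(usernames, csv_path, json_path):
    """Write usernames to a one-column CSV and a JSON array, dropping repeats."""
    unique = list(dict.fromkeys(usernames))  # keeps first-seen order
    with Path(csv_path).open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["username"])
        writer.writerows([name] for name in unique)
    with Path(json_path).open("w", encoding="utf-8") as handle:
        json.dump(unique, handle, indent=2)
    return len(unique)


if __name__ == "__main__":
    sample = ["user1", "user2", "user1"]  # placeholder data, not real output
    with tempfile.TemporaryDirectory() as folder:
        count = save_usernames(
            sample,
            Path(folder) / "followers.csv",
            Path(folder) / "followers.json",
        )
        print(f"Saved {count} unique usernames")
```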

3. Follow/Unfollow Management

from instaharvest import FollowManager
from instaharvest.config import ScraperConfig

config = ScraperConfig()
manager = FollowManager(config=config)
session_data = manager.load_session()
manager.setup_browser(session_data)

# Follow a user
result = manager.follow('username')
print(result)  # {'status': 'success', 'action': 'followed', ...}

# Unfollow
result = manager.unfollow('username')

# Batch follow
usernames = ['user1', 'user2', 'user3']
results = manager.batch_follow(usernames)

manager.close()
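The result comments in this guide show two dictionary shapes, `{'success': True, ...}` and `{'status': 'success', ...}`. Until the actual shape is confirmed against the installed version, a tolerant check avoids silent miscounts. The sketch below assumes `batch_follow` returns one such dictionary per username, which this guide does not confirm; `is_success` and `summarize` are hypothetical helpers.

```python
def is_success(result):
    """Return True for either result shape shown in this guide."""
    if not isinstance(result, dict):
        return False
    return result.get("success") is True or result.get("status") == "success"


def summarize(results):
    """Count successful and failed results in an iterable of dicts."""
    results = list(results)
    succeeded = sum(1 for item in results if is_success(item))
    return {"succeeded": succeeded, "failed": len(results) - succeeded}


if __name__ == "__main__":
    sample = [  # hand-written placeholders, not real library output
        {"success": True, "status": "followed"},
        {"status": "success", "action": "followed"},
        {"status": "error"},
    ]
    print(summarize(sample))
```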

4. Direct Messaging

from instaharvest import MessageManager
from instaharvest.config import ScraperConfig

config = ScraperConfig()
messenger = MessageManager(config=config)
session_data = messenger.load_session()
messenger.setup_browser(session_data)

# Send single message
result = messenger.send_message('username', 'Hello!')

# Batch send
usernames = ['user1', 'user2']
results = messenger.batch_send(usernames, 'Hi there!')

messenger.close()

5. Shared Browser (Recommended!)

Use one browser for all operations - Much faster!

from instaharvest import SharedBrowser
from instaharvest.config import ScraperConfig

# Create config
config = ScraperConfig()

with SharedBrowser(config=config) as browser:
    # All operations use the same browser instance
    browser.follow('user1')
    browser.send_message('user1', 'Hello!')
    followers = browser.get_followers('user2', limit=100)
    profile = browser.scrape_profile('user3')

    # No browser reopening! Fast and efficient!

6. Advanced: Parallel Processing

from instaharvest import InstagramOrchestrator, ScraperConfig

config = ScraperConfig(headless=True)
orchestrator = InstagramOrchestrator(config)

# Scrape with 3 parallel workers + Excel export
results = orchestrator.scrape_complete_profile_advanced(
    'username',
    parallel=3,        # 3 parallel browser tabs
    save_excel=True,   # Real-time Excel export
    max_posts=100
)

print(f"Scraped {len(results['posts_data'])} posts")
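To make the idea behind `parallel=3` concrete, here is a generic fixed-size worker pool from the standard library. It is only an illustration of bounded parallelism: the library is described as using parallel browser tabs, and its internal implementation is not shown in this guide. `fetch_post` is a placeholder that does no network access.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_post(url):
    """Placeholder for per-post work; a real worker would load the page."""
    return {"url": url, "ok": True}


def scrape_in_parallel(urls, workers=3):
    """Process URLs with a fixed-size worker pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the order of the input URLs.
        return list(pool.map(fetch_post, urls))


if __name__ == "__main__":
    links = [f"https://www.instagram.com/p/POST_{i}/" for i in range(5)]
    for row in scrape_in_parallel(links):
        print(row)
```

At most `workers` items are processed at a time, so a larger value trades speed for a heavier load on the machine and on the site.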

7. Post Data Extraction

from instaharvest import PostDataScraper
from instaharvest.config import ScraperConfig

config = ScraperConfig()
scraper = PostDataScraper(config=config)
session_data = scraper.load_session()
scraper.setup_browser(session_data)

# Scrape single post
post = scraper.scrape('https://www.instagram.com/p/POST_ID/')
print(f"Tagged: {post.tagged_accounts}")
print(f"Likes: {post.likes}")
print(f"Date: {post.timestamp}")

scraper.close()
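Before passing a link to the scraper, it can be worth checking that it looks like a post or reel URL. The sketch below uses the `/p/POST_ID/` form shown above and assumes reels use the analogous `/reel/POST_ID/` path; other URL forms are treated as non-matching. `post_shortcode` is a hypothetical helper, not part of the library.

```python
from urllib.parse import urlparse

POST_KINDS = {"p", "reel"}  # assumed path prefixes for posts and reels


def post_shortcode(url):
    """Return the ID segment of a post or reel URL, or None if it does not match."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host not in {"instagram.com", "www.instagram.com"}:
        return None
    # Split the path and drop empty segments from leading/trailing slashes.
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) == 2 and parts[0] in POST_KINDS:
        return parts[1]
    return None


if __name__ == "__main__":
    print(post_shortcode("https://www.instagram.com/p/POST_ID/"))
    print(post_shortcode("https://www.instagram.com/username/"))
```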

🎯 Complete Workflow Example

from instaharvest import SharedBrowser
from instaharvest.config import ScraperConfig

# Create config
config = ScraperConfig()

with SharedBrowser(config=config) as browser:
    # 1. Get profile stats
    profile = browser.scrape_profile('target_user')
    print(f"Target has {profile['followers']} followers")

    # 2. Collect their followers
    followers = browser.get_followers('target_user', limit=50)
    print(f"Collected {len(followers)} followers")

    # 3. Follow them
    for follower in followers[:10]:  # Follow first 10
        result = browser.follow(follower)
        if result['status'] == 'success':
            print(f"✓ Followed {follower}")

    # 4. Send welcome message
    for follower in followers[:5]:
        browser.send_message(follower, "Thanks for following!")

📋 Requirements

  • Python 3.8+
  • Playwright (with Chrome browser)
  • pandas
  • openpyxl
  • beautifulsoup4
  • lxml

🔧 Session Setup

First-time setup - Save your Instagram session:

python examples/save_session.py

This will:

  1. Open Chrome browser
  2. Let you log in to Instagram manually
  3. Save session to instagram_session.json
  4. All future scripts will use this session (no re-login needed!)

📁 Project Structure

instaharvest/
├── instaharvest/          # Main package
│   ├── __init__.py        # Package entry point
│   ├── base.py            # Base scraper class
│   ├── config.py          # Configuration
│   ├── profile.py         # Profile scraping
│   ├── followers.py       # Followers collection
│   ├── follow.py          # Follow/unfollow
│   ├── message.py         # Direct messaging
│   ├── post_data.py       # Post data extraction
│   ├── shared_browser.py  # Shared browser manager
│   └── ...                # More modules
├── examples/              # Example scripts
├── README.md              # This file
├── setup.py               # Package setup
└── LICENSE                # MIT License

⚙️ Configuration

from instaharvest import ScraperConfig

config = ScraperConfig(
    headless=True,              # Run in headless mode
    viewport_width=1920,
    viewport_height=1080,
    default_timeout=30000,      # 30 seconds
    max_scroll_attempts=50,
    log_level='INFO'
)
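When the same script runs on a desktop and on a server, it can be convenient to take a few of these settings from the environment. The sketch below builds a keyword dictionary using three of the parameter names shown above (`headless`, `default_timeout`, `log_level`). The environment variable names (`IH_HEADLESS`, `IH_TIMEOUT_MS`, `IH_LOG_LEVEL`) are invented for this illustration and are not read by the library itself.

```python
import os


def config_kwargs_from_env(env=None):
    """Build keyword arguments for ScraperConfig from environment variables."""
    if env is None:
        env = os.environ
    kwargs = {}
    # Variable names below are hypothetical; only set keys that are present.
    if "IH_HEADLESS" in env:
        kwargs["headless"] = env["IH_HEADLESS"].strip().lower() in {"1", "true", "yes"}
    if "IH_TIMEOUT_MS" in env:
        kwargs["default_timeout"] = int(env["IH_TIMEOUT_MS"])
    if "IH_LOG_LEVEL" in env:
        kwargs["log_level"] = env["IH_LOG_LEVEL"].strip().upper()
    return kwargs


if __name__ == "__main__":
    print(config_kwargs_from_env({"IH_HEADLESS": "true", "IH_TIMEOUT_MS": "45000"}))
```

The result would then be passed as `ScraperConfig(**config_kwargs_from_env())`, so unset variables fall back to the library defaults.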

🛡️ Best Practices

  1. Use SharedBrowser - It reuses one browser instance across operations, which is much faster than reopening a browser each time
  2. Rate Limiting - Keep the built-in delays in place to reduce the risk of Instagram bans
  3. Session Management - The session is refreshed automatically to prevent expiration
  4. Error Handling - Operations use comprehensive exception handling
  5. Logging - Use the built-in logging when debugging

🔧 Troubleshooting

Installation Issues

Error: "playwright command not found"

# Solution: Install Playwright first
pip install playwright
playwright install chrome

Error: "No module named 'instaharvest'"

# Solution 1: If installed from PyPI
pip install instaharvest

# Solution 2: If using GitHub clone
cd /path/to/insta-harvester
pip install -e .

# Solution 3: Run from project directory
cd /path/to/insta-harvester
python examples/save_session.py  # Works without installation

Error: "Could not find Chrome browser"

# Solution: Install Playwright browsers
playwright install chrome

Session Issues

Error: "Session file not found"

# Solution: Create session first (REQUIRED!)
cd examples
python save_session.py

# Then run your script
python all_in_one.py  # or any other script

Error: "Login required" or "Session expired"

# Solution: Re-create session
cd examples
python save_session.py

# Log in again when browser opens

Operation Errors

Error: "Could not unfollow @username"

Cause: The unfollow confirmation popup appears more slowly than the program expects

Solution: Increase popup delays in configuration

from instaharvest import FollowManager
from instaharvest.config import ScraperConfig

config = ScraperConfig(
    popup_open_delay=4.0,       # Wait longer for popup
    action_delay_min=3.0,
    action_delay_max=4.5,
)

manager = FollowManager(config=config)

See CONFIGURATION_GUIDE.md for detailed configuration options.

Error: "Could not follow @username"

Solution:

from instaharvest.config import ScraperConfig

config = ScraperConfig(
    button_click_delay=3.0,
    action_delay_min=2.5,
    action_delay_max=4.0,
)

Error: "Instagram says 'Try again later'"

Cause: Instagram rate limiting - you're doing too much too quickly

Solution: Increase rate limiting delays

from instaharvest.config import ScraperConfig

config = ScraperConfig(
    follow_delay_min=10.0,      # Wait 10-15 seconds between follows
    follow_delay_max=15.0,
    message_delay_min=15.0,     # Wait 15-20 seconds between messages
    message_delay_max=20.0,
)
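To illustrate what a min/max pair such as `follow_delay_min` and `follow_delay_max` can mean in practice, the sketch below draws a wait time uniformly from the range. How the library actually picks its delay is not documented in this guide, so the uniform draw is an assumption, and `pick_delay` is a hypothetical helper.

```python
import random


def pick_delay(delay_min, delay_max, rng=random):
    """Return a wait time in seconds drawn uniformly from [delay_min, delay_max]."""
    if delay_min < 0 or delay_max < delay_min:
        raise ValueError("expected 0 <= delay_min <= delay_max")
    return rng.uniform(delay_min, delay_max)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs print the same values
    for _ in range(3):
        # Only prints the value; a real loop would pass it to time.sleep().
        print(f"would wait {pick_delay(10.0, 15.0, rng):.1f} s")
```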

Slow Internet Issues

Problem: On a slow internet connection, pages load slowly and operations fail with errors

Solution:

from instaharvest.config import ScraperConfig

config = ScraperConfig(
    page_load_delay=5.0,        # Wait longer for pages
    popup_open_delay=4.0,       # Wait longer for popups
    scroll_delay_min=3.0,       # Slower scrolling
    scroll_delay_max=5.0,
)

# Use with any manager
from instaharvest import FollowManager
manager = FollowManager(config=config)
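Rather than tuning each delay by hand, all of them can be scaled by one factor. The sketch below multiplies a dictionary of delay settings, using the parameter names from the example above. The baseline numbers are placeholders chosen for illustration and are not the library defaults; `scale_delays` is a hypothetical helper.

```python
def scale_delays(base, factor):
    """Return a copy of `base` with every delay multiplied by `factor`."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return {name: round(value * factor, 2) for name, value in base.items()}


if __name__ == "__main__":
    # Baseline numbers are placeholders, not the library defaults.
    baseline = {
        "page_load_delay": 2.0,
        "popup_open_delay": 2.0,
        "scroll_delay_min": 1.5,
        "scroll_delay_max": 2.5,
    }
    print(scale_delays(baseline, 2.0))
```

The scaled dictionary could then be passed as `ScraperConfig(**scale_delays(baseline, 2.0))`.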

Getting Help

  1. Check documentation:

    • README.md - Main guide (this file)
    • CONFIGURATION_GUIDE.md - Complete configuration reference
    • examples/README.md - Example scripts guide
    • CHANGELOG.md - Version history and changes
    • CONTRIBUTING.md - How to contribute
  2. Common issues:

    • Unfollow errors → Increase popup_open_delay
    • Slow internet → Increase all delays
    • Rate limiting → Increase follow_delay_* and message_delay_*
  3. Report bugs:

  4. Email support:


⚠️ Disclaimer

This tool is for educational purposes only. When using it:

  • Follow Instagram's Terms of Service
  • Respect rate limits
  • Do not spam or harass users
  • Use it responsibly

The authors are not responsible for any misuse of this library.


📜 License

MIT License - see LICENSE file for details


🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


📞 Support


🎉 Acknowledgments

Built with:


Made with ❤️ by Doston

Happy Harvesting! 🌾
