Skip to main content

Professional Instagram data collection toolkit with automation features

Project description

InstaHarvest ๐ŸŒพ

Python Version PyPI version License Code Style Downloads GitHub issues GitHub stars

Professional Instagram Data Collection Toolkit - A powerful and efficient library for Instagram automation, data collection, and analytics.

๐Ÿ“– Documentation | ๐Ÿ› Report Bug | ๐Ÿ’ก Request Feature | ๐Ÿค Contributing | ๐Ÿ“‹ Changelog


โœจ Features

  • ๐Ÿ“Š Profile Statistics - Collect followers, following, posts count
  • ๐Ÿ”— Post & Reel Links - Intelligent scrolling and link collection
  • ๐Ÿท๏ธ Tagged Accounts - Extract tags from posts and reels
  • ๐Ÿ‘ฅ Followers/Following - Collect lists with real-time output
  • ๐Ÿ’ฌ Direct Messaging - Send DMs with smart rate limiting
  • ๐Ÿค Follow/Unfollow - Manage following with rate limiting
  • โšก Parallel Processing - Scrape multiple posts simultaneously
  • ๐Ÿ“‘ Excel Export - Real-time data export to Excel
  • ๐ŸŒ Shared Browser - Single browser for all operations
  • ๐Ÿ” HTML Detection - Automatic structure change detection
  • ๐Ÿ“ Professional Logging - Comprehensive logging system

๐Ÿš€ Installation

๐Ÿ“ฆ Method 1: Install from PyPI (Recommended) - Click to expand
# Install the package
pip install instaharvest

# Install Playwright browser
playwright install chrome
๐Ÿ”ง Method 2: Install from GitHub (Latest Development Version) - Click to expand

Step 1: Clone the Repository

git clone https://github.com/mpython77/insta-harvester.git
cd insta-harvester

Step 2: Install Dependencies

# Install Python dependencies
pip install -r requirements.txt

# Install Playwright browser
playwright install chrome

Step 3: Install Package in Development Mode (Optional)

# Install as editable package
pip install -e .

OR simply use it without installation:

# Just make sure you're in the project directory
cd /path/to/insta-harvester

# Then run examples
python examples/save_session.py

๐Ÿ”ง Complete Setup Guide

๐Ÿ“‹ Step-by-Step Setup Instructions - Click to expand

Step 1: Verify Python Installation

# Check Python version (requires 3.8+)
python --version

# Should show: Python 3.8.0 or higher

Step 2: Install InstaHarvest

From GitHub:

git clone https://github.com/mpython77/insta-harvester.git
cd insta-harvester
pip install -r requirements.txt
playwright install chrome

From PyPI:

pip install instaharvest
playwright install chrome

Step 3: Create Instagram Session (REQUIRED!)

Option A: Using Python (Recommended) โญ

from instaharvest import save_session
save_session()

Option B: Using Example Script

# Navigate to examples directory
cd examples

# Run session setup script
python save_session.py

This will:

  1. Open Chrome browser
  2. Navigate to Instagram
  3. Let you log in manually
  4. Save your session to instagram_session.json
  5. All future scripts will use this session (no re-login needed!)

Important: Without this session file, the library won't work!

Step 4: Test Your Setup

# First, create your Instagram session (required!)
python examples/save_session.py

# Try the all-in-one interactive demo (recommended for learning)
python examples/all_in_one.py

# Or try production scraping
python examples/main_advanced.py

โš ๏ธ IMPORTANT: Always Use ScraperConfig! All examples below use ScraperConfig() for proper timing and reliability. Even when using default settings, explicitly creating config is best practice. This prevents timing issues with popups, buttons, and rate limits. See Configuration Guide for customization options.

๐Ÿš€ First-Time Setup

Before using any features, create an Instagram session (one-time setup):

from instaharvest import save_session

# Create session - this will open a browser
save_session()

# Follow the prompts:
# 1. Browser will open automatically
# 2. Login to Instagram manually
# 3. Press ENTER in terminal when done
# 4. Session saved to instagram_session.json โœ…

That's it! Now you can use all library features. The session will be reused automatically.


๐Ÿ“– Quick Start Examples

Example 1: Follow a User - Click to expand
from instaharvest import FollowManager
from instaharvest.config import ScraperConfig

# Create config (customize if needed)
config = ScraperConfig()

# Create manager with config
manager = FollowManager(config=config)

# Load session
session_data = manager.load_session()
manager.setup_browser(session_data)

# Follow someone
result = manager.follow("instagram")
print(result)  # {'success': True, 'status': 'followed', ...}

# Clean up
manager.close()
Example 2: Send Direct Message - Click to expand
from instaharvest import MessageManager
from instaharvest.config import ScraperConfig

# Create config
config = ScraperConfig()
manager = MessageManager(config=config)
session_data = manager.load_session()
manager.setup_browser(session_data)

# Send message
result = manager.send_message("username", "Hello from Python!")
print(result)

manager.close()
Example 3: Collect Followers - Click to expand
from instaharvest import FollowersCollector
from instaharvest.config import ScraperConfig

# Create config
config = ScraperConfig()
collector = FollowersCollector(config=config)
session_data = collector.load_session()
collector.setup_browser(session_data)

# Collect first 100 followers
followers = collector.get_followers("username", limit=100, print_realtime=True)
print(f"Collected {len(followers)} followers")

collector.close()
Example 4: All Operations in One Browser (SharedBrowser) - Click to expand
from instaharvest import SharedBrowser
from instaharvest.config import ScraperConfig

# Create config for better reliability
config = ScraperConfig()

# One browser for everything!
with SharedBrowser(config=config) as browser:
    # Follow users
    browser.follow("user1")
    browser.follow("user2")

    # Send messages
    browser.send_message("user1", "Thanks for the follow!")

    # Collect followers
    followers = browser.get_followers("my_account", limit=50)
    print(f"Followers: {len(followers)}")

๐Ÿ“ Example Scripts

๐Ÿ“‚ Ready-to-Use Scripts - Click to expand

The examples/ directory contains ready-to-use scripts:

๐Ÿ”‘ Session Setup (Required First)

python examples/save_session.py

Creates Instagram session (one-time setup, then reused automatically).

๐ŸŽฎ Interactive Demo

python examples/all_in_one.py

Interactive menu with ALL features:

  • Follow/Unfollow users
  • Send messages
  • Collect followers/following
  • Batch operations
  • Profile scraping

๐Ÿš€ Production Scraping

python examples/main_advanced.py

Full automatic profile scraping:

  • Collects all post/reel links
  • Extracts data with parallel processing
  • Exports to Excel + JSON
  • Advanced diagnostics & error recovery

โš™๏ธ Configuration Examples

python examples/example_custom_config.py

Shows how to customize configuration (delays, viewport, etc.).

๐Ÿ“– Documentation

๐Ÿ“š Full API Documentation - Click to expand

1. Profile Scraping

from instaharvest import ProfileScraper
from instaharvest.config import ScraperConfig

config = ScraperConfig()
scraper = ProfileScraper(config=config)
session_data = scraper.load_session()
scraper.setup_browser(session_data)

profile = scraper.scrape('username')
print(f"Posts: {profile.posts}")
print(f"Followers: {profile.followers}")
print(f"Following: {profile.following}")

scraper.close()

2. Collect Followers/Following

from instaharvest import FollowersCollector
from instaharvest.config import ScraperConfig

# Create config
config = ScraperConfig()
collector = FollowersCollector(config=config)
session_data = collector.load_session()
collector.setup_browser(session_data)

# Collect first 100 followers
followers = collector.get_followers('username', limit=100, print_realtime=True)
print(f"Collected {len(followers)} followers")

# Collect following
following = collector.get_following('username', limit=50)

collector.close()

3. Follow/Unfollow Management

from instaharvest import FollowManager
from instaharvest.config import ScraperConfig

config = ScraperConfig()
manager = FollowManager(config=config)
session_data = manager.load_session()
manager.setup_browser(session_data)

# Follow a user
result = manager.follow('username')
print(result)  # {'status': 'success', 'action': 'followed', ...}

# Unfollow
result = manager.unfollow('username')

# Batch follow
usernames = ['user1', 'user2', 'user3']
results = manager.batch_follow(usernames)

manager.close()

4. Direct Messaging

from instaharvest import MessageManager
from instaharvest.config import ScraperConfig

config = ScraperConfig()
messenger = MessageManager(config=config)
session_data = messenger.load_session()
messenger.setup_browser(session_data)

# Send single message
result = messenger.send_message('username', 'Hello!')

# Batch send
usernames = ['user1', 'user2']
results = messenger.batch_send(usernames, 'Hi there!')

messenger.close()

5. Shared Browser (Recommended!)

Use one browser for all operations - Much faster!

from instaharvest import SharedBrowser
from instaharvest.config import ScraperConfig

# Create config
config = ScraperConfig()

with SharedBrowser(config=config) as browser:
    # All operations use the same browser instance
    browser.follow('user1')
    browser.send_message('user1', 'Hello!')
    followers = browser.get_followers('user2', limit=100)
    profile = browser.scrape_profile('user3')

    # No browser reopening! Fast and efficient!

6. Advanced: Parallel Processing

from instaharvest import InstagramOrchestrator, ScraperConfig

config = ScraperConfig(headless=True)
orchestrator = InstagramOrchestrator(config)

# Scrape with 3 parallel workers + Excel export
results = orchestrator.scrape_complete_profile_advanced(
    'username',
    parallel=3,        # 3 parallel browser tabs
    save_excel=True,   # Real-time Excel export
    max_posts=100
)

print(f"Scraped {len(results['posts_data'])} posts")

7. Post Data Extraction

from instaharvest import PostDataScraper
from instaharvest.config import ScraperConfig

config = ScraperConfig()
scraper = PostDataScraper(config=config)
session_data = scraper.load_session()
scraper.setup_browser(session_data)

# Scrape single post
post = scraper.scrape('https://www.instagram.com/p/POST_ID/')
print(f"Tagged: {post.tagged_accounts}")
print(f"Likes: {post.likes}")
print(f"Date: {post.timestamp}")

scraper.close()

๐ŸŽฏ Complete Workflow Example

๐Ÿ”„ Full Automation Workflow - Click to expand
from instaharvest import SharedBrowser
from instaharvest.config import ScraperConfig

# Create config
config = ScraperConfig()

with SharedBrowser(config=config) as browser:
    # 1. Get profile stats
    profile = browser.scrape_profile('target_user')
    print(f"Target has {profile['followers']} followers")

    # 2. Collect their followers
    followers = browser.get_followers('target_user', limit=50)
    print(f"Collected {len(followers)} followers")

    # 3. Follow them
    for follower in followers[:10]:  # Follow first 10
        result = browser.follow(follower)
        if result['status'] == 'success':
            print(f"โœ“ Followed {follower}")

    # 4. Send welcome message
    for follower in followers[:5]:
        browser.send_message(follower, "Thanks for following!")

๐Ÿ“‹ Requirements

  • Python 3.8+
  • Playwright (with Chrome browser)
  • pandas
  • openpyxl
  • beautifulsoup4
  • lxml

๐Ÿ”ง Session Setup

First-time setup - Save your Instagram session:

Method 1: Using Library Function (Recommended) โญ

from instaharvest import save_session

# Create session - opens browser for manual login
save_session()

Method 2: Using Example Script

python examples/save_session.py

Both methods will:

  1. Open Chrome browser
  2. Let you log in to Instagram manually
  3. Save session to instagram_session.json
  4. All future scripts will use this session (no re-login needed!)

๐Ÿ“ Project Structure

๐Ÿ—‚๏ธ Package Structure - Click to expand
instaharvest/
โ”œโ”€โ”€ instaharvest/          # Main package
โ”‚   โ”œโ”€โ”€ __init__.py        # Package entry point
โ”‚   โ”œโ”€โ”€ base.py            # Base scraper class
โ”‚   โ”œโ”€โ”€ config.py          # Configuration
โ”‚   โ”œโ”€โ”€ profile.py         # Profile scraping
โ”‚   โ”œโ”€โ”€ followers.py       # Followers collection
โ”‚   โ”œโ”€โ”€ follow.py          # Follow/unfollow
โ”‚   โ”œโ”€โ”€ message.py         # Direct messaging
โ”‚   โ”œโ”€โ”€ post_data.py       # Post data extraction
โ”‚   โ”œโ”€โ”€ shared_browser.py  # Shared browser manager
โ”‚   โ””โ”€โ”€ ...                # More modules
โ”œโ”€โ”€ examples/              # Example scripts
โ”œโ”€โ”€ README.md              # This file
โ”œโ”€โ”€ setup.py               # Package setup
โ””โ”€โ”€ LICENSE                # MIT License

โš™๏ธ Configuration

๐Ÿ› ๏ธ Configuration Options - Click to expand
from instaharvest import ScraperConfig

config = ScraperConfig(
    headless=True,              # Run in headless mode
    viewport_width=1920,
    viewport_height=1080,
    default_timeout=30000,      # 30 seconds
    max_scroll_attempts=50,
    log_level='INFO'
)

๐Ÿ›ก๏ธ Best Practices

โœ… Recommended Practices - Click to expand
  1. Use SharedBrowser - Reuses browser instance, much faster
  2. Rate Limiting - Built-in delays to avoid Instagram bans
  3. Session Management - Auto-refreshes session to prevent expiration
  4. Error Handling - Comprehensive exception handling
  5. Logging - Professional logging for debugging

๐Ÿ”ง Troubleshooting

๐Ÿ” Common Issues & Solutions - Click to expand

Installation Issues

Error: "playwright command not found"

# Solution: Install Playwright first
pip install playwright
playwright install chrome

Error: "No module named 'instaharvest'"

# Solution 1: If installed from PyPI
pip install instaharvest

# Solution 2: If using GitHub clone
cd /path/to/insta-harvester
pip install -e .

# Solution 3: Run from project directory
cd /path/to/insta-harvester
python examples/save_session.py  # Works without installation

Error: "Could not find Chrome browser"

# Solution: Install Playwright browsers
playwright install chrome

Session Issues

Error: "Session file not found"

# Solution: Create session first (REQUIRED!)
cd examples
python save_session.py

# Then run your script
python all_in_one.py  # or any other script

Error: "Login required" or "Session expired"

# Solution: Re-create session
cd examples
python save_session.py

# Log in again when browser opens

Operation Errors

Error: "Could not unfollow @username"

Cause: Unfollow popup appears too slowly for the program

Solution: Increase popup delays in configuration

from instaharvest import FollowManager
from instaharvest.config import ScraperConfig

config = ScraperConfig(
    popup_open_delay=4.0,       # Wait longer for popup
    action_delay_min=3.0,
    action_delay_max=4.5,
)

manager = FollowManager(config=config)

See Configuration Guide for detailed configuration options.

Error: "Could not follow @username"

Solution:

config = ScraperConfig(
    button_click_delay=3.0,
    action_delay_min=2.5,
    action_delay_max=4.0,
)

Error: "Instagram says 'Try again later'"

Cause: Instagram rate limiting - you're doing too much too quickly

Solution: Increase rate limiting delays

config = ScraperConfig(
    follow_delay_min=10.0,      # Wait 10-15 seconds between follows
    follow_delay_max=15.0,
    message_delay_min=15.0,     # Wait 15-20 seconds between messages
    message_delay_max=20.0,
)

Slow Internet Issues

Problem: You have slow internet, pages load slowly, getting errors

Solution:

from instaharvest.config import ScraperConfig

config = ScraperConfig(
    page_load_delay=5.0,        # Wait longer for pages
    popup_open_delay=4.0,       # Wait longer for popups
    scroll_delay_min=3.0,       # Slower scrolling
    scroll_delay_max=5.0,
)

# Use with any manager
from instaharvest import FollowManager
manager = FollowManager(config=config)

Getting Help

  1. Check documentation:

  2. Common issues:

    • Unfollow errors โ†’ Increase popup_open_delay
    • Slow internet โ†’ Increase all delays
    • Rate limiting โ†’ Increase follow_delay_* and message_delay_*
  3. Report bugs:

  4. Email support:


โš ๏ธ Disclaimer

This tool is for educational purposes only. Make sure to:

  • Follow Instagram's Terms of Service
  • Respect rate limits
  • Don't spam or harass users
  • Use responsibly

The authors are not responsible for any misuse of this library.


๐Ÿ“œ License

MIT License - see LICENSE file for details


๐Ÿค Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


๐Ÿ“ž Support


๐ŸŽ‰ Acknowledgments

Built with:


Made with โค๏ธ by Doston

Happy Harvesting! ๐ŸŒพ

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

instaharvest-2.5.3.tar.gz (72.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

instaharvest-2.5.3-py3-none-any.whl (73.2 kB view details)

Uploaded Python 3

File details

Details for the file instaharvest-2.5.3.tar.gz.

File metadata

  • Download URL: instaharvest-2.5.3.tar.gz
  • Upload date:
  • Size: 72.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for instaharvest-2.5.3.tar.gz
Algorithm Hash digest
SHA256 308082e592042e1dbd3253a4fec63062c06b7e5975ac11a8bf26724e5f3e43f2
MD5 7e2b1c9982499b57cdd0bf48c5ce9a22
BLAKE2b-256 cab3b12806d8e2c7029bcf5e1a7d7165ed7c6ef9bc92be557797ea4e4001ef09

See more details on using hashes here.

File details

Details for the file instaharvest-2.5.3-py3-none-any.whl.

File metadata

  • Download URL: instaharvest-2.5.3-py3-none-any.whl
  • Upload date:
  • Size: 73.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for instaharvest-2.5.3-py3-none-any.whl
Algorithm Hash digest
SHA256 fa28ecda847f881552387a2b44f383fa802e5ab66073b0df5f5d75955f078d50
MD5 be70da5a1a96c084f09691da29db8db3
BLAKE2b-256 5227f71b6fc863158be78439d61f535adaefbc9bf1f30205deac8ff20c7795ee

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page