Professional Instagram data collection toolkit with automation features
InstaHarvest 🌾
Professional Instagram Data Collection Toolkit - A powerful and efficient library for Instagram automation, data collection, and analytics.
Documentation | Report Bug | Request Feature | Contributing | Changelog
Features
- Profile Statistics - Collect follower, following, and post counts
- Post & Reel Links - Intelligent scrolling and link collection
- Tagged Accounts - Extract tags from posts and reels
- Followers/Following - Collect lists with real-time output
- Direct Messaging - Send DMs with smart rate limiting
- Follow/Unfollow - Manage following with rate limiting
- Parallel Processing - Scrape multiple posts simultaneously
- Excel Export - Real-time data export to Excel
- Shared Browser - Single browser for all operations
- HTML Detection - Automatic structure change detection
- Professional Logging - Comprehensive logging system
Installation
Method 1: Install from PyPI (Recommended)
# Install the package
pip install instaharvest
# Install Playwright browser
playwright install chrome
Method 2: Install from GitHub (Latest Development Version)
Step 1: Clone the Repository
git clone https://github.com/mpython77/insta-harvester.git
cd insta-harvester
Step 2: Install Dependencies
# Install Python dependencies
pip install -r requirements.txt
# Install Playwright browser
playwright install chrome
Step 3: Install Package in Development Mode (Optional)
# Install as editable package
pip install -e .
OR simply use it without installation:
# Just make sure you're in the project directory
cd /path/to/insta-harvester
# Then run examples
python examples/save_session.py
Complete Setup Guide
Step-by-Step Setup Instructions
Step 1: Verify Python Installation
# Check Python version (requires 3.8+)
python --version
# Should show: Python 3.8.0 or higher
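The same check can be made from inside a script, which helps when several interpreters are installed. This is a minimal sketch; the helper name `meets_minimum` is illustrative and is not part of InstaHarvest.

```python
import sys

MIN_VERSION = (3, 8)  # minimum version stated in this README


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Report on the interpreter that is running this script
print("OK" if meets_minimum() else "Python 3.8+ is required")
```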
Step 2: Install InstaHarvest
From GitHub:
git clone https://github.com/mpython77/insta-harvester.git
cd insta-harvester
pip install -r requirements.txt
playwright install chrome
From PyPI:
pip install instaharvest
playwright install chrome
Step 3: Create Instagram Session (REQUIRED!)
Option A: Using Python (Recommended) ⭐
from instaharvest import save_session
save_session()
Option B: Using Example Script
# Navigate to examples directory
cd examples
# Run session setup script
python save_session.py
This will:
- Open the Chrome browser
- Navigate to Instagram
- Let you log in manually
- Save your session to instagram_session.json
- All future scripts will use this session (no re-login needed!)
Important: Without this session file, the library won't work!
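Because every script depends on the session file, it can be worth checking for it before starting any work. The sketch below assumes the file is written to the current working directory, which this README does not state explicitly; `session_exists` is a hypothetical helper, not a library function.

```python
from pathlib import Path

SESSION_FILE = "instagram_session.json"  # file name given in this README


def session_exists(directory="."):
    """Return True if a non-empty session file is present in `directory`."""
    path = Path(directory) / SESSION_FILE
    return path.is_file() and path.stat().st_size > 0


if not session_exists():
    print(f"{SESSION_FILE} not found; run save_session() first")
```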
Step 4: Test Your Setup
# First, create your Instagram session (required!)
python examples/save_session.py
# Try the all-in-one interactive demo (recommended for learning)
python examples/all_in_one.py
# Or try production scraping
python examples/main_advanced.py
⚠️ IMPORTANT: Always use ScraperConfig! All examples below create a ScraperConfig() explicitly for proper timing and reliability. Creating the config explicitly is best practice even when you keep the default settings, because it helps avoid timing issues with popups, buttons, and rate limits. See the Configuration Guide for customization options.
First-Time Setup
Before using any features, create an Instagram session (one-time setup):
from instaharvest import save_session
# Create session - this will open a browser
save_session()
# Follow the prompts:
# 1. Browser will open automatically
# 2. Log in to Instagram manually
# 3. Press ENTER in terminal when done
# 4. Session saved to instagram_session.json
That's it! Now you can use all library features. The session will be reused automatically.
Quick Start Examples
Example 1: Follow a User
from instaharvest import FollowManager
from instaharvest.config import ScraperConfig
# Create config (customize if needed)
config = ScraperConfig()
# Create manager with config
manager = FollowManager(config=config)
# Load session
session_data = manager.load_session()
manager.setup_browser(session_data)
# Follow someone
result = manager.follow("instagram")
print(result) # {'success': True, 'status': 'followed', ...}
# Clean up
manager.close()
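If `follow()` raises, the final `manager.close()` line above is never reached and the browser may stay open. One way to guard against that is `contextlib.closing`, which relies only on the `close()` method shown in the example. In this sketch `DummyManager` is a stand-in so the code runs without a browser; it is not an InstaHarvest class.

```python
from contextlib import closing


class DummyManager:
    """Stand-in for FollowManager; only close() matters for this sketch."""

    def __init__(self):
        self.closed = False

    def follow(self, username):
        raise RuntimeError(f"simulated failure while following {username}")

    def close(self):
        self.closed = True


manager = DummyManager()
try:
    # closing() calls manager.close() on exit, even when the body raises
    with closing(manager):
        manager.follow("instagram")
except RuntimeError as exc:
    print(f"Operation failed: {exc}")

print(manager.closed)
```

The same pattern should apply to any object that exposes `close()`, so a real manager could be wrapped the same way after `setup_browser()` has been called.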
Example 2: Send Direct Message
from instaharvest import MessageManager
from instaharvest.config import ScraperConfig
# Create config
config = ScraperConfig()
manager = MessageManager(config=config)
session_data = manager.load_session()
manager.setup_browser(session_data)
# Send message
result = manager.send_message("username", "Hello from Python!")
print(result)
manager.close()
Example 3: Collect Followers
from instaharvest import FollowersCollector
from instaharvest.config import ScraperConfig
# Create config
config = ScraperConfig()
collector = FollowersCollector(config=config)
session_data = collector.load_session()
collector.setup_browser(session_data)
# Collect first 100 followers
followers = collector.get_followers("username", limit=100, print_realtime=True)
print(f"Collected {len(followers)} followers")
collector.close()
Example 4: All Operations in One Browser (SharedBrowser)
from instaharvest import SharedBrowser
from instaharvest.config import ScraperConfig
# Create config for better reliability
config = ScraperConfig()
# One browser for everything!
with SharedBrowser(config=config) as browser:
    # Follow users
    browser.follow("user1")
    browser.follow("user2")
    # Send messages
    browser.send_message("user1", "Thanks for the follow!")
    # Collect followers
    followers = browser.get_followers("my_account", limit=50)
    print(f"Followers: {len(followers)}")
Example Scripts
Ready-to-Use Scripts
The examples/ directory contains ready-to-use scripts:
Session Setup (Required First)
python examples/save_session.py
Creates an Instagram session (one-time setup, then reused automatically).
Interactive Demo
python examples/all_in_one.py
Interactive menu with all features:
- Follow/Unfollow users
- Send messages
- Collect followers/following
- Batch operations
- Profile scraping
Production Scraping
python examples/main_advanced.py
Full automatic profile scraping:
- Collects all post/reel links
- Extracts data with parallel processing
- Exports to Excel + JSON
- Advanced diagnostics & error recovery
Configuration Examples
python examples/example_custom_config.py
Shows how to customize configuration (delays, viewport, etc.).
Documentation
Full API Documentation
1. Profile Scraping
from instaharvest import ProfileScraper
from instaharvest.config import ScraperConfig
config = ScraperConfig()
scraper = ProfileScraper(config=config)
session_data = scraper.load_session()
scraper.setup_browser(session_data)
profile = scraper.scrape('username')
print(f"Posts: {profile.posts}")
print(f"Followers: {profile.followers}")
print(f"Following: {profile.following}")
scraper.close()
2. Collect Followers/Following
from instaharvest import FollowersCollector
from instaharvest.config import ScraperConfig
# Create config
config = ScraperConfig()
collector = FollowersCollector(config=config)
session_data = collector.load_session()
collector.setup_browser(session_data)
# Collect first 100 followers
followers = collector.get_followers('username', limit=100, print_realtime=True)
print(f"Collected {len(followers)} followers")
# Collect following
following = collector.get_following('username', limit=50)
collector.close()
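Once both lists are collected, they can be compared with ordinary set operations. The sketch below assumes `get_followers` and `get_following` return lists of username strings; the examples in this README take `len()` of the result and pass items to `follow()`, but the exact return type is not confirmed here. `not_following_back` is a hypothetical helper.

```python
def not_following_back(followers, following):
    """Return sorted usernames found in `following` but not in `followers`."""
    return sorted(set(following) - set(followers))


# Sample data standing in for collector results
followers = ["alice", "bob", "carol"]
following = ["bob", "dave", "alice", "erin"]
print(not_following_back(followers, following))
```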
3. Follow/Unfollow Management
from instaharvest import FollowManager
from instaharvest.config import ScraperConfig
config = ScraperConfig()
manager = FollowManager(config=config)
session_data = manager.load_session()
manager.setup_browser(session_data)
# Follow a user
result = manager.follow('username')
print(result) # {'status': 'success', 'action': 'followed', ...}
# Unfollow
result = manager.unfollow('username')
# Batch follow
usernames = ['user1', 'user2', 'user3']
results = manager.batch_follow(usernames)
manager.close()
4. Direct Messaging
from instaharvest import MessageManager
from instaharvest.config import ScraperConfig
config = ScraperConfig()
messenger = MessageManager(config=config)
session_data = messenger.load_session()
messenger.setup_browser(session_data)
# Send single message
result = messenger.send_message('username', 'Hello!')
# Batch send
usernames = ['user1', 'user2']
results = messenger.batch_send(usernames, 'Hi there!')
messenger.close()
5. Shared Browser (Recommended!)
Use one browser for all operations. This avoids reopening the browser for each task, so it is much faster.
from instaharvest import SharedBrowser
from instaharvest.config import ScraperConfig
# Create config
config = ScraperConfig()
with SharedBrowser(config=config) as browser:
    # All operations use the same browser instance
    browser.follow('user1')
    browser.send_message('user1', 'Hello!')
    followers = browser.get_followers('user2', limit=100)
    profile = browser.scrape_profile('user3')
    # No browser reopening between operations
6. Advanced: Parallel Processing
from instaharvest import InstagramOrchestrator, ScraperConfig
config = ScraperConfig(headless=True)
orchestrator = InstagramOrchestrator(config)
# Scrape with 3 parallel workers + Excel export
results = orchestrator.scrape_complete_profile_advanced(
    'username',
    parallel=3,       # 3 parallel browser tabs
    save_excel=True,  # Real-time Excel export
    max_posts=100
)
print(f"Scraped {len(results['posts_data'])} posts")
7. Post Data Extraction
from instaharvest import PostDataScraper
from instaharvest.config import ScraperConfig
config = ScraperConfig()
scraper = PostDataScraper(config=config)
session_data = scraper.load_session()
scraper.setup_browser(session_data)
# Scrape single post
post = scraper.scrape('https://www.instagram.com/p/POST_ID/')
print(f"Tagged: {post.tagged_accounts}")
print(f"Likes: {post.likes}")
print(f"Date: {post.timestamp}")
scraper.close()
Complete Workflow Example
Full Automation Workflow
from instaharvest import SharedBrowser
from instaharvest.config import ScraperConfig
# Create config
config = ScraperConfig()
with SharedBrowser(config=config) as browser:
    # 1. Get profile stats
    profile = browser.scrape_profile('target_user')
    print(f"Target has {profile['followers']} followers")
    # 2. Collect their followers
    followers = browser.get_followers('target_user', limit=50)
    print(f"Collected {len(followers)} followers")
    # 3. Follow them
    for follower in followers[:10]:  # Follow first 10
        result = browser.follow(follower)
        if result['status'] == 'success':
            print(f"Followed {follower}")
    # 4. Send welcome message
    for follower in followers[:5]:
        browser.send_message(follower, "Thanks for following!")
Requirements
- Python 3.8+
- Playwright (with Chrome browser)
- pandas
- openpyxl
- beautifulsoup4
- lxml
Session Setup
First-time setup: save your Instagram session.
Method 1: Using Library Function (Recommended) ⭐
from instaharvest import save_session
# Create session - opens browser for manual login
save_session()
Method 2: Using Example Script
python examples/save_session.py
Both methods will:
- Open the Chrome browser
- Let you log in to Instagram manually
- Save the session to instagram_session.json
- All future scripts will use this session (no re-login needed!)
Project Structure
Package Structure
instaharvest/
├── instaharvest/           # Main package
│   ├── __init__.py         # Package entry point
│   ├── base.py             # Base scraper class
│   ├── config.py           # Configuration
│   ├── profile.py          # Profile scraping
│   ├── followers.py        # Followers collection
│   ├── follow.py           # Follow/unfollow
│   ├── message.py          # Direct messaging
│   ├── post_data.py        # Post data extraction
│   ├── shared_browser.py   # Shared browser manager
│   └── ...                 # More modules
├── examples/               # Example scripts
├── README.md               # This file
├── setup.py                # Package setup
└── LICENSE                 # MIT License
Configuration
Configuration Options
from instaharvest import ScraperConfig
config = ScraperConfig(
    headless=True,          # Run in headless mode
    viewport_width=1920,
    viewport_height=1080,
    default_timeout=30000,  # 30 seconds (value in milliseconds)
    max_scroll_attempts=50,
    log_level='INFO'
)
Best Practices
Recommended Practices
- Use SharedBrowser - Reuses one browser instance, which is much faster
- Rate Limiting - Built-in delays reduce the risk of Instagram bans
- Session Management - Auto-refreshes the session to prevent expiration
- Error Handling - Comprehensive exception handling
- Logging - Professional logging for debugging
Troubleshooting
Common Issues & Solutions
Installation Issues
Error: "playwright command not found"
# Solution: Install Playwright first
pip install playwright
playwright install chrome
Error: "No module named 'instaharvest'"
# Solution 1: If installed from PyPI
pip install instaharvest
# Solution 2: If using GitHub clone
cd /path/to/insta-harvester
pip install -e .
# Solution 3: Run from project directory
cd /path/to/insta-harvester
python examples/save_session.py # Works without installation
Error: "Could not find Chrome browser"
# Solution: Install Playwright browsers
playwright install chrome
Session Issues
Error: "Session file not found"
# Solution: Create session first (REQUIRED!)
cd examples
python save_session.py
# Then run your script
python all_in_one.py # or any other script
Error: "Login required" or "Session expired"
# Solution: Re-create session
cd examples
python save_session.py
# Log in again when browser opens
Operation Errors
Error: "Could not unfollow @username"
Cause: Unfollow popup appears too slowly for the program
Solution: Increase popup delays in configuration
from instaharvest import FollowManager
from instaharvest.config import ScraperConfig
config = ScraperConfig(
    popup_open_delay=4.0,  # Wait longer for popup
    action_delay_min=3.0,
    action_delay_max=4.5,
)
manager = FollowManager(config=config)
See Configuration Guide for detailed configuration options.
Error: "Could not follow @username"
Solution: Increase the button click and action delays
from instaharvest.config import ScraperConfig
config = ScraperConfig(
    button_click_delay=3.0,
    action_delay_min=2.5,
    action_delay_max=4.0,
)
Error: "Instagram says 'Try again later'"
Cause: Instagram rate limiting - you're doing too much too quickly
Solution: Increase rate limiting delays
config = ScraperConfig(
    follow_delay_min=10.0,   # Wait 10-15 seconds between follows
    follow_delay_max=15.0,
    message_delay_min=15.0,  # Wait 15-20 seconds between messages
    message_delay_max=20.0,
)
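The `*_delay_min` / `*_delay_max` pairs suggest that each wait is drawn from a range rather than fixed. How the library actually picks the delay is not documented in this README, so the following is only a sketch assuming uniform sampling; `pick_delay` and `paced` are hypothetical helpers for pacing your own loops.

```python
import random
import time


def pick_delay(delay_min, delay_max, rng=random):
    """Draw a wait time in seconds uniformly from [delay_min, delay_max]."""
    if delay_min > delay_max:
        raise ValueError("delay_min must not exceed delay_max")
    return rng.uniform(delay_min, delay_max)


def paced(items, delay_min, delay_max, sleep=time.sleep):
    """Yield items one by one, sleeping a random delay between them."""
    for index, item in enumerate(items):
        if index > 0:  # no wait before the first item
            sleep(pick_delay(delay_min, delay_max))
        yield item
```

A loop such as `for user in paced(usernames, 10.0, 15.0): ...` would then wait 10 to 15 seconds between iterations. The `sleep` parameter exists so the pacing can be tested without real waiting.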
Slow Internet Issues
Problem: You have slow internet, pages load slowly, getting errors
Solution:
from instaharvest.config import ScraperConfig
config = ScraperConfig(
    page_load_delay=5.0,   # Wait longer for pages
    popup_open_delay=4.0,  # Wait longer for popups
    scroll_delay_min=3.0,  # Slower scrolling
    scroll_delay_max=5.0,
)
# Use with any manager
from instaharvest import FollowManager
manager = FollowManager(config=config)
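When several of these adjustments apply at once, it may help to keep them as plain dictionaries and combine them before building the config. The parameter names below are the ones used in this troubleshooting section; `merge_settings` and the preset names are hypothetical and not part of InstaHarvest.

```python
SLOW_NETWORK = {
    "page_load_delay": 5.0,
    "popup_open_delay": 4.0,
    "scroll_delay_min": 3.0,
    "scroll_delay_max": 5.0,
}

RATE_LIMITED = {
    "follow_delay_min": 10.0,
    "follow_delay_max": 15.0,
    "message_delay_min": 15.0,
    "message_delay_max": 20.0,
}


def merge_settings(*presets, **overrides):
    """Combine presets left to right; later values and overrides win."""
    merged = {}
    for preset in presets:
        merged.update(preset)
    merged.update(overrides)
    return merged


settings = merge_settings(SLOW_NETWORK, RATE_LIMITED, popup_open_delay=6.0)
print(settings["popup_open_delay"])
# With InstaHarvest installed, this would be passed as:
#   config = ScraperConfig(**settings)
```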
Getting Help
- Check documentation:
  - README.md - Main guide
  - Configuration Guide - Complete configuration reference
  - Examples Guide - Example scripts guide
  - Changelog - Version history and changes
  - Contributing - How to contribute
- Common issues:
  - Unfollow errors → Increase popup_open_delay
  - Slow internet → Increase all delays
  - Rate limiting → Increase follow_delay_* and message_delay_*
- Report bugs:
  - GitHub Issues: https://github.com/mpython77/insta-harvester/issues
  - See CONTRIBUTING.md for bug report guidelines
- Email support: kelajak054@gmail.com
Disclaimer
This tool is for educational purposes only. Make sure to:
- Follow Instagram's Terms of Service
- Respect rate limits
- Avoid spamming or harassing users
- Use the tool responsibly
The authors are not responsible for any misuse of this library.
License
MIT License - see the LICENSE file for details
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Support
- GitHub Issues: Report a bug
- Documentation: Read the docs
- Email: kelajak054@gmail.com
Acknowledgments
Built with:
- Playwright - Browser automation
- Pandas - Data processing
- BeautifulSoup - HTML parsing
Made with ❤️ by Doston
Happy Harvesting! 🌾
File details
Details for the file instaharvest-2.5.3.tar.gz.
File metadata
- Download URL: instaharvest-2.5.3.tar.gz
- Upload date:
- Size: 72.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 308082e592042e1dbd3253a4fec63062c06b7e5975ac11a8bf26724e5f3e43f2 |
| MD5 | 7e2b1c9982499b57cdd0bf48c5ce9a22 |
| BLAKE2b-256 | cab3b12806d8e2c7029bcf5e1a7d7165ed7c6ef9bc92be557797ea4e4001ef09 |
File details
Details for the file instaharvest-2.5.3-py3-none-any.whl.
File metadata
- Download URL: instaharvest-2.5.3-py3-none-any.whl
- Upload date:
- Size: 73.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fa28ecda847f881552387a2b44f383fa802e5ab66073b0df5f5d75955f078d50 |
| MD5 | be70da5a1a96c084f09691da29db8db3 |
| BLAKE2b-256 | 5227f71b6fc863158be78439d61f535adaefbc9bf1f30205deac8ff20c7795ee |
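A downloaded file can be checked against the SHA256 digests listed above using only the standard library. This is a minimal sketch; `sha256_of` and `matches` are illustrative helper names, and the local file path is whatever location you saved the download to.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True if the file's SHA256 equals the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, the call would look like `matches("instaharvest-2.5.3.tar.gz", "308082e592042e1dbd3253a4fec63062c06b7e5975ac11a8bf26724e5f3e43f2")`, assuming the file sits in the current directory.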