Forage
CLI tool to scrape posts, comments, and reactions from private Facebook groups using browser automation.
Table of Contents
- Installation
- Quick Start
- Usage
- Output Formats
- Data Analysis
- Development
- Roadmap
- Security
- Support
- License
Installation
From PyPI
# Install with pip
pip install ForageFacebook
# Or with uv
uv pip install ForageFacebook
# Install Playwright browsers
playwright install chromium
From Source
# Clone and install with uv
git clone https://github.com/jwmoss/forage.git
cd forage
uv sync
# Install Playwright browsers
uv run playwright install chromium
Quick Start
# Step 1: Log into Facebook (opens browser window)
uv run forage login
# Step 2: Scrape a group (last 7 days by default)
uv run forage scrape https://www.facebook.com/groups/your-group-id -o data.json
Usage
Login
# Default: opens Chromium browser
uv run forage login
# Use Firefox instead
uv run forage login --browser firefox
Scrape Posts
# Basic scrape (last 7 days, with comments)
uv run forage scrape your-group-slug
# Last 14 days, save to file
uv run forage scrape your-group-slug --days 14 -o posts.json
# Specific date range
uv run forage scrape your-group-slug --since 2024-01-01 --until 2024-01-15
# Skip comments (faster)
uv run forage scrape your-group-slug --skip-comments
# Only popular comments (5+ reactions)
uv run forage scrape your-group-slug --min-reactions 5
# Top 10 comments per post
uv run forage scrape your-group-slug --top-comments 10
# Watch the browser (debugging)
uv run forage scrape your-group-slug --no-headless -v
# Slower scraping to avoid rate limits
uv run forage scrape your-group-slug --delay 5.0
# Read group from stdin (for scripting)
echo "your-group-slug" | uv run forage scrape -
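When driving Forage from a Python script, the flags shown above can be assembled into an argument list and passed to `subprocess`. This is a minimal sketch, assuming the tool is invoked through `uv run forage` as in the examples. `build_scrape_command` and `run_scrape` are hypothetical helper names, not part of Forage.

```python
import subprocess
from typing import List, Optional


def build_scrape_command(
    group: str,
    days: Optional[int] = None,
    output: Optional[str] = None,
    skip_comments: bool = False,
    delay: Optional[float] = None,
) -> List[str]:
    """Assemble a `forage scrape` argument list from flags documented above."""
    cmd = ["uv", "run", "forage", "scrape", group]
    if days is not None:
        cmd += ["--days", str(days)]
    if skip_comments:
        cmd.append("--skip-comments")
    if delay is not None:
        cmd += ["--delay", str(delay)]
    if output is not None:
        cmd += ["-o", output]
    return cmd


def run_scrape(cmd: List[str]) -> str:
    """Run the command and return its stdout.

    check=True makes subprocess raise CalledProcessError when the
    process exits with a non-zero status.
    """
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when a group slug or output path contains spaces.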
CLI Reference
forage [global flags] <command> [args]
Global Flags:
-v, --verbose Show progress and debug info
-q, --quiet Suppress non-error output
--no-color Disable colored output
--version Show version
--help Show help
Commands:
login Open browser for interactive Facebook login
scrape Scrape posts from a Facebook group
scrape flags
| Flag | Default | Description |
|---|---|---|
| --days | 7 | Posts from last N days |
| --since | - | Start date (ISO 8601: YYYY-MM-DD) |
| --until | - | End date (ISO 8601: YYYY-MM-DD) |
| --limit | 0 | Max posts (0 = unlimited) |
| --delay | 2.0 | Seconds between page loads |
| --min-reactions | 0 | Min reactions for comments |
| --top-comments | 0 | Top N comments per post |
| --skip-comments | false | Skip comment fetching |
| --skip-reactions | false | Skip reaction counts |
| -o, --output | - | Output file (default: stdout) |
| -f, --format | json | Output format: json, sqlite, csv |
| --no-headless | false | Show browser window |
| --browser | chromium | Browser: chromium, firefox, webkit |
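The date flags interact: `--days` describes a rolling window, while `--since` and `--until` pin explicit dates. The sketch below shows one plausible way such a window could be resolved. It is not Forage's actual implementation; `resolve_window` is a hypothetical name, and the precedence rule (an explicit `--since` overrides `--days`) is an assumption.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def resolve_window(
    today: date,
    days: int = 7,
    since: Optional[str] = None,
    until: Optional[str] = None,
) -> Tuple[date, date]:
    """Return the (start, end) dates of a scrape window."""
    # Dates are ISO 8601 (YYYY-MM-DD), as in the flag table.
    end = date.fromisoformat(until) if until else today
    # Assumption: an explicit start date takes precedence over --days.
    start = date.fromisoformat(since) if since else end - timedelta(days=days)
    if start > end:
        raise ValueError("since must not be later than until")
    return start, end
```

With the defaults and a run date of 2024-01-20, this yields 2024-01-13 through 2024-01-20.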
SQLite Export
Export directly to SQLite for easier analysis:
# Export to SQLite database
uv run forage scrape your-group-slug -f sqlite -o data.db
# Query with sqlite3
sqlite3 data.db "SELECT content, reactions_total FROM posts ORDER BY reactions_total DESC LIMIT 10"
# Join posts with comments
sqlite3 data.db "SELECT p.content, c.content FROM posts p JOIN comments c ON c.post_id = p.id"
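The same queries can be run from Python with the standard-library `sqlite3` module. This sketch assumes only the tables and columns that appear in the queries above (`posts.id`, `posts.content`, `posts.reactions_total`, `comments.post_id`, `comments.content`); the rest of the exported schema is not documented here. The function names are illustrative.

```python
import sqlite3
from typing import List, Tuple


def top_posts(conn: sqlite3.Connection, limit: int = 10) -> List[Tuple[str, int]]:
    """Return (content, reactions_total) for the most-reacted posts."""
    cursor = conn.execute(
        "SELECT content, reactions_total FROM posts "
        "ORDER BY reactions_total DESC LIMIT ?",
        (limit,),  # bound parameter, never string formatting
    )
    return cursor.fetchall()


def comment_counts(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Return (post content, number of comments), most-commented first.

    LEFT JOIN keeps posts that have no comments, with a count of 0.
    """
    cursor = conn.execute(
        "SELECT p.content, COUNT(c.post_id) AS n "
        "FROM posts p LEFT JOIN comments c ON c.post_id = p.id "
        "GROUP BY p.id ORDER BY n DESC, p.id"
    )
    return cursor.fetchall()
```

To use it against an export, open the file with `sqlite3.connect("data.db")` and pass the connection to either function.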
CSV Export
Export to CSV for spreadsheet analysis:
# Export to CSV (creates posts.csv and posts.comments.csv)
uv run forage scrape your-group-slug -f csv -o posts.csv
# Open in Excel/Numbers/Sheets or analyze with csvkit
csvstat posts.csv
csvcut -c author_name,content,reactions_total posts.csv | head -20
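If csvkit is not available, the standard-library `csv` module covers the same ground. The sketch below assumes the column names used in the `csvcut` example (`author_name`, `content`, `reactions_total`); `load_posts` and `top_rows` are hypothetical helpers.

```python
import csv
from typing import Dict, Iterable, List


def load_posts(path: str) -> List[Dict[str, str]]:
    """Read a posts CSV into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def top_rows(rows: Iterable[Dict[str, str]], n: int = 20) -> List[Dict[str, str]]:
    """Return the n rows with the highest reactions_total."""

    def reactions(row: Dict[str, str]) -> int:
        # CSV values are strings; a blank or missing cell counts as 0.
        return int(row.get("reactions_total") or 0)

    return sorted(rows, key=reactions, reverse=True)[:n]
```

Converting `reactions_total` to an integer matters: sorted as text, "5" would rank above "12".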
Output Format
{
"group": {
"id": "123456",
"name": "My Group",
"url": "https://www.facebook.com/groups/123456"
},
"scraped_at": "2024-01-20T15:30:00Z",
"date_range": {
"since": "2024-01-13",
"until": "2024-01-20"
},
"posts": [
{
"id": "pfbid...",
"author": {
"name": "Jane Doe",
"profile_url": "https://facebook.com/jane.doe"
},
"content": "Post text here...",
"timestamp": "2024-01-19T12:00:00Z",
"reactions": {
"total": 42,
"like": 0,
"love": 0,
"haha": 0,
"wow": 0,
"sad": 0,
"angry": 0
},
"comments_count": 15,
"comments": [
{
"id": "comment_...",
"author": {"name": "John Smith", "profile_url": "..."},
"content": "Comment text...",
"timestamp": null,
"reactions": {"total": 5},
"replies": []
}
]
}
]
}
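Comments carry a `replies` list, so a full walk of a post's discussion needs to descend into nested replies. This is a minimal sketch over the JSON structure shown above, assuming replies have the same shape as top-level comments (the sample only shows an empty `replies` list). The function names are illustrative.

```python
import json
from typing import Any, Dict, Iterator, List


def load_export(path: str) -> Dict[str, Any]:
    """Load a Forage JSON export from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def iter_comments(post: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every comment on a post, each followed by its replies."""
    stack = list(reversed(post.get("comments", [])))
    while stack:
        comment = stack.pop()
        yield comment
        # Assumption: each reply is itself a comment-shaped object.
        stack.extend(reversed(comment.get("replies", [])))


def summarize(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build one summary row per post."""
    return [
        {
            "id": post["id"],
            "author": post["author"]["name"],
            "reactions": post["reactions"]["total"],
            "comments_seen": sum(1 for _ in iter_comments(post)),
        }
        for post in data["posts"]
    ]
```

`comments_seen` counts what was actually scraped, which can be lower than `comments_count` when flags such as `--top-comments` or `--min-reactions` filter the results.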
Data Analysis Examples
# Top 10 posts by reactions
uv run forage scrape mygroup --skip-comments | \
jq '.posts | sort_by(.reactions.total) | reverse | .[0:10]'
# All post content
uv run forage scrape mygroup --skip-comments | \
jq '.posts[].content'
# Posts with 50+ reactions
uv run forage scrape mygroup | \
jq '.posts | map(select(.reactions.total >= 50))'
# Count posts per author
uv run forage scrape mygroup | \
jq '.posts | group_by(.author.name) | map({author: .[0].author.name, count: length}) | sort_by(.count) | reverse'
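For analysis that outgrows one-liners, the jq pipelines above translate directly to Python. The sketch operates on the `posts` list from a JSON export and uses only the fields shown in the output format; the function names are illustrative.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

Post = Dict[str, Any]


def top_by_reactions(posts: List[Post], n: int = 10) -> List[Post]:
    """Return the n posts with the most reactions, highest first."""
    return sorted(posts, key=lambda p: p["reactions"]["total"], reverse=True)[:n]


def with_min_reactions(posts: List[Post], threshold: int = 50) -> List[Post]:
    """Keep posts whose total reactions meet the threshold."""
    return [p for p in posts if p["reactions"]["total"] >= threshold]


def posts_per_author(posts: List[Post]) -> List[Tuple[str, int]]:
    """Count posts per author name, most prolific first."""
    return Counter(p["author"]["name"] for p in posts).most_common()
```

Posts with equal reaction totals may come out in a different relative order than with jq's `sort_by | reverse`, since Python's sort keeps tied items in their original order.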
Development
# Install dev dependencies
uv sync --extra dev
# Run type checker
uv run ty check src/
# Run tests
uv run pytest
Architecture
src/forage/
├── cli.py # Click CLI commands
├── auth.py # Session management (login, cookies)
├── scraper.py # Core scraping logic
├── parser.py # HTML parsing for posts/comments
└── models.py # Pydantic data models
Limitations
- Requires manual login (no automated auth)
- Facebook's HTML structure changes frequently
- Rate limiting may require slower scraping
- Individual reaction types not broken out (only total)
- Session cookies expire after ~30 days
Roadmap
Planned features and improvements:
High Priority
- Cookie import - Import cookies from browser extensions (EditThisCookie, Netscape format)
- Incremental scraping - Only fetch posts newer than last scrape
- Progress persistence - Resume interrupted scrapes
Medium Priority
- Multiple groups - Scrape multiple groups in one command
- Media extraction - Download images/videos from posts
- Reaction breakdown - Extract individual reaction types (like, love, etc.)
- Author statistics - Aggregate stats per author
- Scheduled scraping - Cron-friendly mode with locking
Nice to Have
- Web UI - Local web interface for browsing scraped data
- Webhook notifications - Notify on new posts matching criteria
- Public group support - Scrape without login for public groups
- Parallel scraping - Speed up multi-group scrapes
Contributing
See CONTRIBUTING.md for development setup and guidelines.
Security
See SECURITY.md for security considerations and best practices.
Support
If you find this tool useful, consider sponsoring development.
License
MPL-2.0 (Mozilla Public License 2.0)