CLI tool to scrape posts, comments, and reactions from private Facebook groups

Forage

CLI tool to scrape posts, comments, and reactions from private Facebook groups using browser automation.

Installation

From PyPI

# Install with pip
pip install ForageFacebook

# Or with uv
uv pip install ForageFacebook

# Install Playwright browsers (also install firefox or webkit if you plan to use --browser with them)
playwright install chromium

From Source

# Clone and install with uv
git clone https://github.com/jwmoss/forage.git
cd forage
uv sync

# Install Playwright browsers (also install firefox or webkit if you plan to use --browser with them)
uv run playwright install chromium

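Quick Start

The examples in this section and below use uv run from a source checkout. If you installed from PyPI, drop the uv run prefix and call forage directly.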

# Step 1: Log into Facebook (opens browser window)
uv run forage login

# Step 2: Scrape a group (last 7 days by default)
uv run forage scrape https://www.facebook.com/groups/your-group-id -o data.json

Usage

Login

# Default: opens Chromium browser
uv run forage login

# Use Firefox instead
uv run forage login --browser firefox

Scrape Posts

# Basic scrape (last 7 days, with comments)
uv run forage scrape your-group-slug

# Last 14 days, save to file
uv run forage scrape your-group-slug --days 14 -o posts.json

# Specific date range
uv run forage scrape your-group-slug --since 2024-01-01 --until 2024-01-15

# Skip comments (faster)
uv run forage scrape your-group-slug --skip-comments

# Only popular comments (5+ reactions)
uv run forage scrape your-group-slug --min-reactions 5

# Top 10 comments per post
uv run forage scrape your-group-slug --top-comments 10

# Watch the browser (debugging)
uv run forage scrape your-group-slug --no-headless -v

# Slower scraping to avoid rate limits
uv run forage scrape your-group-slug --delay 5.0

# Read group from stdin (for scripting)
echo "your-group-slug" | uv run forage scrape -
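For scripting from Python instead of the shell, a minimal sketch is to call the CLI with subprocess and parse its JSON from stdout (the default when -o is omitted, as the jq examples further down rely on). The helper names build_scrape_command and scrape_to_dict are hypothetical, and the sketch assumes forage is on PATH:

```python
import json
import subprocess


def build_scrape_command(group: str, days: int = 7, skip_comments: bool = False) -> list[str]:
    """Assemble a forage scrape invocation from the documented flags (hypothetical helper)."""
    # Assumes forage is on PATH (PyPI install); prefix ["uv", "run"] from a source checkout
    cmd = ["forage", "scrape", group, "--days", str(days)]
    if skip_comments:
        cmd.append("--skip-comments")
    return cmd


def scrape_to_dict(group: str, days: int = 7, skip_comments: bool = False) -> dict:
    """Run forage without -o so the JSON is written to stdout, then parse it."""
    result = subprocess.run(
        build_scrape_command(group, days, skip_comments),
        capture_output=True,  # capture stdout (the JSON) and stderr separately
        text=True,
        check=True,  # raise CalledProcessError if forage exits non-zero
    )
    return json.loads(result.stdout)
```

For example, scrape_to_dict("your-group-slug", days=14, skip_comments=True)["posts"] would hold the scraped posts.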

CLI Reference

forage [global flags] <command> [args]

Global Flags:
  -v, --verbose   Show progress and debug info
  -q, --quiet     Suppress non-error output
  --no-color      Disable colored output
  --version       Show version
  --help          Show help

Commands:
  login           Open browser for interactive Facebook login
  scrape          Scrape posts from a Facebook group

scrape flags

  Flag                Default    Description
  --days              7          Include posts from the last N days
  --since             -          Start date (ISO 8601: YYYY-MM-DD)
  --until             -          End date (ISO 8601: YYYY-MM-DD)
  --limit             0          Maximum number of posts (0 = unlimited)
  --delay             2.0        Seconds to wait between page loads
  --min-reactions     0          Minimum reactions a comment needs to be included
  --top-comments      0          Keep only the top N comments per post
  --skip-comments     false      Skip fetching comments
  --skip-reactions    false      Skip fetching reaction counts
  -o, --output        -          Output file (default: stdout)
  -f, --format        json       Output format: json, sqlite, or csv
  --no-headless       false      Show the browser window
  --browser           chromium   Browser engine: chromium, firefox, or webkit
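The sample output under Output Format records a date_range of 2024-01-13 to 2024-01-20 for a scrape run on 2024-01-20, which is consistent with --days 7 meaning "today minus seven days through today". Assuming that interpretation (not confirmed against the source), this sketch computes equivalent --since/--until values, for example to pin a window in a script; days_to_range is a hypothetical helper:

```python
from datetime import date, timedelta
from typing import Optional


def days_to_range(days: int, today: Optional[date] = None) -> tuple[str, str]:
    """Approximate the window --days N covers as ISO 8601 (since, until) dates."""
    until = today if today is not None else date.today()
    since = until - timedelta(days=days)
    return since.isoformat(), until.isoformat()
```

The results can then be passed explicitly, e.g. --since 2024-01-13 --until 2024-01-20.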

SQLite Export

Export directly to SQLite for easier analysis:

# Export to SQLite database
uv run forage scrape your-group-slug -f sqlite -o data.db

# Query with sqlite3
sqlite3 data.db "SELECT content, reactions_total FROM posts ORDER BY reactions_total DESC LIMIT 10"

# Join posts with comments
sqlite3 data.db "SELECT p.content, c.content FROM posts p JOIN comments c ON c.post_id = p.id"
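The inner JOIN above drops posts that have no comments. As a sketch, assuming the posts.id and comments.post_id columns used in that query (the full schema is not documented here), Python's built-in sqlite3 module can count comments per post with a LEFT JOIN so that zero-comment posts are kept; comment_counts is a hypothetical helper:

```python
import sqlite3
from contextlib import closing


def comment_counts(db_path: str) -> dict[str, int]:
    """Map every post id to its number of comment rows, including posts with none."""
    query = (
        "SELECT p.id, COUNT(c.post_id) "
        "FROM posts p LEFT JOIN comments c ON c.post_id = p.id "
        "GROUP BY p.id"
    )
    # closing() guarantees the connection is closed; sqlite3's own context
    # manager only wraps a transaction and leaves the connection open
    with closing(sqlite3.connect(db_path)) as conn:
        return dict(conn.execute(query).fetchall())
```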

CSV Export

Export to CSV for spreadsheet analysis:

# Export to CSV (creates posts.csv and posts.comments.csv)
uv run forage scrape your-group-slug -f csv -o posts.csv

# Open in Excel/Numbers/Sheets or analyze with csvkit
csvstat posts.csv
csvcut -c author_name,content,reactions_total posts.csv | head -20
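Without csvkit, the standard csv module is enough. This sketch sums reactions per author, assuming the author_name and reactions_total columns used in the csvcut example and UTF-8 encoding; reactions_by_author is a hypothetical helper:

```python
import csv
from collections import defaultdict


def reactions_by_author(path: str) -> dict[str, int]:
    """Sum reactions_total per author_name in posts.csv, highest total first."""
    totals: dict[str, int] = defaultdict(int)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # csv yields strings; treat an empty cell as zero
            totals[row["author_name"]] += int(row["reactions_total"] or 0)
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```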

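Output Format

The default JSON output has the shape below. Per-type reaction fields (like, love, haha, wow, sad, angry) are present but currently reported as 0, since only the total is extracted (see Limitations).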

{
  "group": {
    "id": "123456",
    "name": "My Group",
    "url": "https://www.facebook.com/groups/123456"
  },
  "scraped_at": "2024-01-20T15:30:00Z",
  "date_range": {
    "since": "2024-01-13",
    "until": "2024-01-20"
  },
  "posts": [
    {
      "id": "pfbid...",
      "author": {
        "name": "Jane Doe",
        "profile_url": "https://facebook.com/jane.doe"
      },
      "content": "Post text here...",
      "timestamp": "2024-01-19T12:00:00Z",
      "reactions": {
        "total": 42,
        "like": 0,
        "love": 0,
        "haha": 0,
        "wow": 0,
        "sad": 0,
        "angry": 0
      },
      "comments_count": 15,
      "comments": [
        {
          "id": "comment_...",
          "author": {"name": "John Smith", "profile_url": "..."},
          "content": "Comment text...",
          "timestamp": null,
          "reactions": {"total": 5},
          "replies": []
        }
      ]
    }
  ]
}
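Comments carry a nested replies list. Assuming replies share the comment shape shown above (the sample only shows an empty list), a recursive walk can count every comment and reply in an export. Note that a post's comments_count may differ from the number of scraped comments, for instance when --top-comments or --min-reactions filter them. iter_comments and summarize are hypothetical helpers:

```python
from typing import Any, Iterator


def iter_comments(comments: list[dict[str, Any]], depth: int = 0) -> Iterator[tuple[int, dict[str, Any]]]:
    """Yield (depth, comment) for each comment and, recursively, its replies."""
    for comment in comments:
        yield depth, comment
        yield from iter_comments(comment.get("replies", []), depth + 1)


def summarize(data: dict[str, Any]) -> dict[str, int]:
    """Count posts, all scraped comments, and the subset that are replies."""
    total = replies = 0
    for post in data["posts"]:
        for depth, _ in iter_comments(post.get("comments", [])):
            total += 1
            if depth > 0:
                replies += 1
    return {"posts": len(data["posts"]), "comments": total, "replies": replies}
```

Load the export with json.load and pass the resulting dict to summarize.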

Data Analysis Examples

# Top 10 posts by reactions
uv run forage scrape mygroup --skip-comments | \
  jq '.posts | sort_by(.reactions.total) | reverse | .[0:10]'

# All post content (-r prints raw text instead of JSON strings)
uv run forage scrape mygroup --skip-comments | \
  jq -r '.posts[].content'

# Posts with 50+ reactions
uv run forage scrape mygroup | \
  jq '.posts | map(select(.reactions.total >= 50))'

# Count posts per author
uv run forage scrape mygroup | \
  jq '.posts | group_by(.author.name) | map({author: .[0].author.name, count: length}) | sort_by(.count) | reverse'
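Each example above re-runs the scrape. Saving once with -o data.json and analyzing the file avoids repeated scraping. A Python sketch of two of the jq queries over a saved export follows; the function names are hypothetical, and tie ordering may differ from jq's:

```python
import json
from collections import Counter


def load_export(path: str) -> dict:
    """Read a JSON file written by forage scrape -o."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def posts_per_author(data: dict) -> list[tuple[str, int]]:
    """Count posts per author name, most prolific first."""
    return Counter(post["author"]["name"] for post in data["posts"]).most_common()


def popular_posts(data: dict, threshold: int = 50) -> list[dict]:
    """Keep posts whose total reactions meet the threshold."""
    return [post for post in data["posts"] if post["reactions"]["total"] >= threshold]
```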

Development

# Install dev dependencies
uv sync --extra dev

# Run type checker
uv run ty check src/

# Run tests
uv run pytest

Architecture

src/forage/
├── cli.py       # Click CLI commands
├── auth.py      # Session management (login, cookies)
├── scraper.py   # Core scraping logic
├── parser.py    # HTML parsing for posts/comments
└── models.py    # Pydantic data models
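models.py defines the project's Pydantic models, which are not reproduced here. For downstream code that wants typed access to exported JSON without depending on Pydantic, a sketch using stdlib dataclasses might look like this. The field names follow the sample output above; these classes are illustrative, not the project's actual models:

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Author:
    name: str
    profile_url: Optional[str] = None


def _author(d: dict[str, Any]) -> Author:
    # Only the two fields shown in the sample output are read; extra keys are ignored
    return Author(name=d.get("name", ""), profile_url=d.get("profile_url"))


@dataclass
class Comment:
    id: str
    author: Author
    content: str
    timestamp: Optional[str]  # the sample output shows null here
    reactions_total: int
    replies: list["Comment"] = field(default_factory=list)

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "Comment":
        return cls(
            id=d["id"],
            author=_author(d.get("author", {})),
            content=d.get("content", ""),
            timestamp=d.get("timestamp"),
            reactions_total=d.get("reactions", {}).get("total", 0),
            replies=[cls.from_dict(r) for r in d.get("replies", [])],
        )


@dataclass
class Post:
    id: str
    author: Author
    content: str
    timestamp: Optional[str]
    reactions_total: int
    comments_count: int
    comments: list[Comment] = field(default_factory=list)

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "Post":
        return cls(
            id=d["id"],
            author=_author(d.get("author", {})),
            content=d.get("content", ""),
            timestamp=d.get("timestamp"),
            reactions_total=d.get("reactions", {}).get("total", 0),
            comments_count=d.get("comments_count", 0),
            comments=[Comment.from_dict(c) for c in d.get("comments", [])],
        )
```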

Limitations

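  • Requires an interactive, manual login (no automated authentication)
  • Facebook's HTML structure changes frequently, which can break parsing
  • Facebook may rate-limit requests; increase --delay if you hit limits
  • Individual reaction types are not broken out (only the total is reported)
  • Session cookies expire after roughly 30 days, after which you need to run forage login again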

Roadmap

Planned features and improvements:

High Priority

  • Cookie import - Import cookies from browser extensions (EditThisCookie, Netscape format)
  • Incremental scraping - Only fetch posts newer than last scrape
  • Progress persistence - Resume interrupted scrapes
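Until incremental scraping exists, one rough user-side approximation is to derive --since from a previous JSON export and merge by post id. This sketch is not part of forage; the helpers are hypothetical, and it steps back one extra day because --since takes a date while timestamps are UTC, accepting some overlap that the id-based merge then removes:

```python
from datetime import datetime, timedelta
from typing import Optional


def next_since(previous: dict) -> Optional[str]:
    """Suggest a --since date from the newest post timestamp in a previous export."""
    stamps = [p["timestamp"] for p in previous.get("posts", []) if p.get("timestamp")]
    if not stamps:
        return None
    # Replace the trailing Z so fromisoformat also works before Python 3.11
    newest = max(datetime.fromisoformat(s.replace("Z", "+00:00")) for s in stamps)
    # Step back a day to absorb timezone and date-boundary ambiguity
    return (newest.date() - timedelta(days=1)).isoformat()


def merge_new_posts(previous: dict, latest: dict) -> list[dict]:
    """Append posts from a newer export whose ids were not already present."""
    seen = {p["id"] for p in previous.get("posts", [])}
    new = [p for p in latest.get("posts", []) if p["id"] not in seen]
    return previous.get("posts", []) + new
```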

Medium Priority

  • Multiple groups - Scrape multiple groups in one command
  • Media extraction - Download images/videos from posts
  • Reaction breakdown - Extract individual reaction types (like, love, etc.)
  • Author statistics - Aggregate stats per author
  • Scheduled scraping - Cron-friendly mode with locking

Nice to Have

  • Web UI - Local web interface for browsing scraped data
  • Webhook notifications - Notify on new posts matching criteria
  • Public group support - Scrape without login for public groups
  • Parallel scraping - Speed up multi-group scrapes

Contributing

See CONTRIBUTING.md for development setup and guidelines.

Security

See SECURITY.md for security considerations and best practices.

Support

If you find this tool useful, consider sponsoring development.

License

MPL-2.0 (Mozilla Public License 2.0)

Download files

Source Distribution

foragefacebook-1.0.8.tar.gz (78.4 kB)

Built Distribution

foragefacebook-1.0.8-py3-none-any.whl (30.8 kB)

File details

Details for the file foragefacebook-1.0.8.tar.gz.

File metadata

  • Download URL: foragefacebook-1.0.8.tar.gz
  • Size: 78.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for foragefacebook-1.0.8.tar.gz
Algorithm Hash digest
SHA256 d2def29392d2b34da36b75a58935c24f19eb1abf657822a92e1804546dcfb6c9
MD5 7d85b7cb0090ed392fe393e5bbffcd2a
BLAKE2b-256 f9776fb2b626d5bf2f99198d0ef54f15b32d4307fd3ab41ff3d3229c8ca07d32

Provenance

The following attestation bundles were made for foragefacebook-1.0.8.tar.gz:

Publisher: publish.yml on jwmoss/forage

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file foragefacebook-1.0.8-py3-none-any.whl.

File metadata

  • Download URL: foragefacebook-1.0.8-py3-none-any.whl
  • Size: 30.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for foragefacebook-1.0.8-py3-none-any.whl
Algorithm Hash digest
SHA256 d3110d84369a847c781396336b2eb1aa5e453e59ba4fff234c20a0e437d16713
MD5 6a491fe26a2512c2e0c4eb42af27b8e9
BLAKE2b-256 0ffd3047d9d7e817d4baab16f20f9b20f55df4999455c9c1d37f996200232ed6

Provenance

The following attestation bundles were made for foragefacebook-1.0.8-py3-none-any.whl:

Publisher: publish.yml on jwmoss/forage
