Flexible multimodal scraper for social media and the open web.

scrapeMM: Multimodal Web Retrieval

A simple web scraper that asynchronously retrieves webpages and social media content, fetching text together with media (images and videos).

This library aims to help developers and researchers easily access multimodal data from the web and use it for LLM processing.

Setup

If you want to download videos, installing ffmpeg is highly recommended. In Conda, you can install it with conda install -c conda-forge ffmpeg.
If you want to scrape Perma.cc archive records or Facebook photos, you'll need to install Playwright with pip install playwright and then run playwright install.
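
Before scraping, it can be handy to check whether the two optional dependencies are present. The following is a minimal sketch using only the Python standard library; the function name check_optional_dependencies is hypothetical and not part of scrapeMM.

```python
import importlib.util
import shutil


def check_optional_dependencies() -> dict[str, bool]:
    """Report which optional tools appear to be installed (hypothetical helper)."""
    return {
        # ffmpeg is a command-line program, so look for it on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # playwright is a Python package, so look for an importable module.
        "playwright": importlib.util.find_spec("playwright") is not None,
    }


if __name__ == "__main__":
    for name, available in check_optional_dependencies().items():
        print(f"{name}: {'found' if available else 'missing'}")
```

Note that this only detects the playwright package itself. It cannot tell whether the browsers were downloaded with playwright install.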

Usage

from scrapemm import retrieve
import asyncio

if __name__ == "__main__":
    url = "https://www.snopes.com/fact-check/gauze-originate-from-gaza/"
    result = asyncio.run(retrieve(url))
    if result.errors:
        print(result.errors)
    else:
        print(result.content)
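
Because retrieve is a coroutine, several URLs can be fetched concurrently. The sketch below shows one possible way to do that with asyncio.gather and a semaphore that caps the number of simultaneous requests. The names retrieve_many, retrieve_fn, max_concurrency, and fake_retrieve are assumptions made for illustration; fake_retrieve is a stand-in so the example runs offline without scrapeMM installed.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def retrieve_many(
    urls: list[str],
    retrieve_fn: Callable[[str], Awaitable[Any]],
    max_concurrency: int = 3,
) -> dict[str, Any]:
    """Fetch several URLs concurrently, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url: str) -> Any:
        # The semaphore keeps the number of in-flight requests bounded.
        async with semaphore:
            return await retrieve_fn(url)

    # gather preserves input order, so results line up with urls.
    results = await asyncio.gather(*(fetch_one(url) for url in urls))
    return dict(zip(urls, results))


async def fake_retrieve(url: str) -> str:
    """Stand-in for scrapemm.retrieve (hypothetical, for offline runs only)."""
    await asyncio.sleep(0)
    return f"content of {url}"


if __name__ == "__main__":
    demo_urls = ["https://example.com/a", "https://example.com/b"]
    pages = asyncio.run(retrieve_many(demo_urls, fake_retrieve))
    for url, result in pages.items():
        print(url, "->", result)
```

To use this with scrapeMM, pass retrieve imported from scrapemm as retrieve_fn, assuming it accepts a single URL argument as in the usage example above. Duplicate URLs collapse into one dictionary entry.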

scrapeMM will ask you for the API secrets needed for the integrations. You may skip them if you don't need them.

You will also be prompted to choose a password that is used to secure the secrets in an encrypted file.

How it works

Input:                                  Output:
URL (string)   -->   retrieve()   -->   MultimodalSequence

The MultimodalSequence is a sequence of Markdown-formatted text and media provided by the ezMM library.

Web scraping is done with Firecrawl and Decodo.

Supported Platforms

Social Media

✅ X/Twitter
✅ Telegram
✅ Bluesky
✅ TikTok
✅ YouTube
(✅️) Instagram: works for most content
✅️ Facebook
❌ Threads: TBD
❌ Reddit: TBD

Archiving Services

✅ Perma.cc
(✅) Archive.today: generally slow and sometimes fails with a TimeoutError
✅ MediaVault (mvau.lt)
❌ Wayback Machine, Internet Archive (web.archive.org)
❌ AwesomeScreenshot.com
❌ Ghost Archive (ghostarchive.org)
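
The support lists above can be turned into a quick pre-check for a given URL. This is an illustrative sketch only, not scrapeMM's actual routing logic: the hostnames in the table, the status labels, and the function name support_status are all assumptions, and only the supported/partial/unsupported statuses are taken from the lists.

```python
from urllib.parse import urlparse

# Hostnames are assumptions; statuses mirror the lists above.
SUPPORT_BY_DOMAIN = {
    "x.com": "supported",
    "twitter.com": "supported",
    "t.me": "supported",
    "bsky.app": "supported",
    "tiktok.com": "supported",
    "youtube.com": "supported",
    "youtu.be": "supported",
    "instagram.com": "partial",
    "facebook.com": "supported",
    "threads.net": "unsupported",
    "reddit.com": "unsupported",
    "perma.cc": "supported",
    "archive.today": "partial",
    "mvau.lt": "supported",
    "web.archive.org": "unsupported",
    "awesomescreenshot.com": "unsupported",
    "ghostarchive.org": "unsupported",
}


def support_status(url: str) -> str:
    """Return the listed status for a URL's platform, or "open web" if unlisted."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Try the full host first, then each parent domain
    # (e.g. "www.youtube.com" falls back to "youtube.com").
    for i in range(len(parts)):
        candidate = ".".join(parts[i:])
        if candidate in SUPPORT_BY_DOMAIN:
            return SUPPORT_BY_DOMAIN[candidate]
    return "open web"


if __name__ == "__main__":
    print(support_status("https://www.youtube.com/watch?v=abc"))
    print(support_status("https://www.snopes.com/fact-check/gauze-originate-from-gaza/"))
```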

Project details

Release history

Version	Release date
0.6.2	Apr 14, 2026
0.6.1	Apr 13, 2026
0.6.0	Apr 13, 2026 (this version)
0.5.5	Dec 19, 2025
0.5.4	Dec 16, 2025
0.5.3	Dec 14, 2025
0.5.2	Dec 13, 2025
0.5.1	Dec 12, 2025
0.5.0	Dec 12, 2025
0.4.5	Dec 9, 2025
0.4.4	Dec 9, 2025
0.4.3	Dec 8, 2025
0.4.2	Dec 8, 2025
0.4.1	Dec 6, 2025
0.4.0	Dec 3, 2025
0.3.6	Dec 2, 2025
0.3.5	Dec 2, 2025
0.3.4	Nov 21, 2025
0.3.3	Nov 13, 2025
0.3.2	Nov 12, 2025
0.3.1	Nov 11, 2025
0.3.0	Nov 10, 2025
0.2.2	Jul 29, 2025
0.2.1	Jul 11, 2025
0.1.3	Jul 7, 2025
0.1.2	Jul 2, 2025
0.1.1	Jul 2, 2025
0.1.0	Jul 2, 2025

Download files

Source Distribution

scrapemm-0.6.0.tar.gz (53.1 kB)

Uploaded Apr 13, 2026 Source

Built Distribution

scrapemm-0.6.0-py3-none-any.whl (64.7 kB)

Uploaded Apr 13, 2026 Python 3

File details

Details for the file scrapemm-0.6.0.tar.gz.

File metadata

Download URL: scrapemm-0.6.0.tar.gz
Upload date: Apr 13, 2026
Size: 53.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for scrapemm-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`aa710cdc560ad82961634ef3b9e61482275d0b5677d20155cd60157564fc1d18`
MD5	`5caf5df1c14702ce76e336c4fd42f317`
BLAKE2b-256	`d93ec81b484ea66c35a54937949a928016fff2b4e1e5c99da5e6c266535077a2`
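
A downloaded file can be checked against the published SHA256 digest with Python's hashlib. The sketch below is one way to do it; the helper names sha256_of_file and matches_published_hash are hypothetical, and the local path in the usage block assumes the archive was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks until read() returns empty bytes at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed download location; the digest is the SHA256 value listed above.
    archive = Path("scrapemm-0.6.0.tar.gz")
    expected = "aa710cdc560ad82961634ef3b9e61482275d0b5677d20155cd60157564fc1d18"
    if archive.exists():
        print("hash OK" if matches_published_hash(archive, expected) else "hash MISMATCH")
    else:
        print(f"{archive} not found")
```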

File details

Details for the file scrapemm-0.6.0-py3-none-any.whl.

File metadata

Download URL: scrapemm-0.6.0-py3-none-any.whl
Upload date: Apr 13, 2026
Size: 64.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for scrapemm-0.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4cef7ba59028bc431cd8908da87db8b2ac88aabdcd9a44519a814f28fdb0ff7b`
MD5	`14e4d5bba197714064095e7bf3ef6ba7`
BLAKE2b-256	`d8643c390dfb3607df16387890943739bd9257af9fa29d3fdfb3a739cd09852b`
