
Python package to scrape Facebook pages' front end with no limitations

Project description

Facebook Page Scraper


No API key needed, no limit on the number of requests. Import the library and just do it!

Table of Contents

  1. Getting Started
  2. Usage
  3. Tech
  4. License

Prerequisites

  • Internet Connection
  • Python 3.6+
  • Chrome or Firefox browser installed on your machine


Installation:

Installing from source:

git clone https://github.com/shaikhsajid1111/facebook_page_scraper

Inside the project's directory:

python3 setup.py install

Installing from PyPI:

pip3 install facebook-page-scraper
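
A quick, hedged way to confirm the installation worked (not part of the package's documented workflow) is to import the main class:

#quick install check (assumes the package was installed as above)
from facebook_page_scraper import Facebook_scraper
print("facebook-page-scraper imported successfully")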


How to use?

#import the Facebook_scraper class from facebook_page_scraper
from facebook_page_scraper import Facebook_scraper

#instantiate the Facebook_scraper class

page_name = "metaai"
posts_count = 10
browser = "firefox"
proxy = "IP:PORT" #if proxy requires authentication then user:password@IP:PORT
timeout = 600 #600 seconds
headless = True
meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout, headless=headless)

Parameters for the Facebook_scraper(page_name, posts_count, browser, proxy, timeout, headless, browser_profile) class

Parameter Name Parameter Type Description
page_name String Name of the Facebook page
posts_count Integer Number of posts to scrape; if not passed, the default is 10
browser String Which browser to use, either chrome or firefox; if not passed, the default is chrome
proxy (optional) String Optional argument to set a proxy; if the proxy requires authentication, the format is user:password@IP:PORT
timeout Integer The maximum amount of time the bot should run for; if not passed, the default timeout is 10 minutes (600 seconds)
headless Boolean Whether to run the browser in headless mode. Default is True
browser_profile String Path to the browser profile where cookies are stored, which can be used to scrape data in an authenticated way
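
The example above does not set browser_profile; below is a minimal sketch of an authenticated run, assuming browser_profile is accepted as a keyword argument (as the parameter list above suggests) and using a hypothetical Firefox profile path:

#sketch only: scrape with an existing browser profile (path below is hypothetical, replace with your own)
from facebook_page_scraper import Facebook_scraper

page_name = "metaai"
posts_count = 10
browser = "firefox"
browser_profile = "/home/user/.mozilla/firefox/abcd1234.default-release"
meta_ai = Facebook_scraper(page_name, posts_count, browser, browser_profile=browser_profile)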



Done with instantiation? Let the scraping begin!


For posts' data in JSON format:

#call the scrap_to_json() method

json_data = meta_ai.scrap_to_json()
print(json_data)

Output:

{
  "2024182624425347": {
    "name": "Meta AI",
    "shares": 0,
    "reactions": {
      "likes": 154,
      "loves": 19,
      "wow": 0,
      "cares": 0,
      "sad": 0,
      "angry": 0,
      "haha": 0
    },
    "reaction_count": 173,
    "comments": 2,
    "content": "We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code:  https://ai.facebook.com/…/the-first-high-performance-self-s…",
    "posted_on": "2022-01-20T22:43:35",
    "video": [],
    "image": [
      "https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71"
    ],
    "post_url": "https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARBoSaQ-pAC_ApucZNHZ6R-BI3YUSjH4sXsfdZRQ2zZFOwgWGhjt6dmg0VOcmGCLhSFyXpecOY9g1A94vrzU_T-GtYFagqDkJjHuhoyPW2vnkn7fvfzx-ql7fsBYxL5DgQVSsiC1cPoycdCvHmi6BV5Sc4fKADdgDhdFvVvr-ttzXG1ng2DbLzU-XfSes7SAnrPs-gxjODPKJ7AdqkqkSQJ4HrsLgxMgcLFdCsE6feWL7rXjptVWegMVMthhJNVqO0JHu986XBfKKqB60aBFvyAzTSEwJD6o72GtnyzQ-BcH7JxmLtb2_A&__tn__=-R"
  }, ...

}
Output Structure for JSON format:
{
    "id": {
        "name": string,
        "shares": integer,
        "reactions": {
            "likes": integer,
            "loves": integer,
            "wow": integer,
            "cares": integer,
            "sad": integer,
            "angry": integer,
            "haha": integer
        },
        "reaction_count": integer,
        "comments": integer,
        "content": string,
        "video" : list,
        "image" : list,
        "posted_on": datetime,  //string containing datetime in ISO 8601
        "post_url": string
    }
}
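
Since the printed output above suggests scrap_to_json() returns the data as a JSON-formatted string, a small post-processing sketch (an assumption, not part of the library; skip json.loads if the method already returns a dict) can parse it and list each post's reaction count:

import json

posts = json.loads(json_data)  #json_data returned by scrap_to_json() above
for post_id, post in posts.items():
    print(post_id, post["name"], post["reaction_count"], post["post_url"])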



For saving posts' data directly to a CSV file:

#call the scrap_to_csv(filename, directory) method

filename = "data_file"  #file name without the .csv extension, where data will be saved
directory = r"E:\data"  #directory where the CSV file will be saved (raw string keeps the backslash literal)
meta_ai.scrap_to_csv(filename, directory)

Content of data_file.csv:

id,name,shares,likes,loves,wow,cares,sad,angry,haha,reactions_count,comments,content,posted_on,video,image,post_url
2024182624425347,Meta AI,0,154,19,0,0,0,0,0,173,2,"We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code:  https://ai.facebook.com/…/the-first-high-performance-self-s…",2022-01-20T22:43:35,,https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71,https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARAse4eiZmZQDOZumNZEDR0tQkE5B6g50K6S66JJPccb-KaWJWg6Yz4v19BQFSZRMd04MeBmV24VqvqMB3oyjAwMDJUtpmgkMiITtSP8HOgy8QEx_vFlq1j-UEImZkzeEgSAJYINndnR5aSQn0GUwL54L3x2BsxEqL1lElL7SnHfTVvIFUDyNfAqUWIsXrkI8X5KjoDchUj7aHRga1HB5EE0x60dZcHogUMb1sJDRmKCcx8xisRgk5XzdZKCQDDdEkUqN-Ch9_NYTMtxlchz1KfR0w9wRt8y9l7E7BNhfLrmm4qyxo-ZpA&__tn__=-R
...



Parameters for the scrap_to_csv(filename, directory) method

Parameter Name Parameter Type Description
filename String Name of the CSV file where the posts' data will be saved
directory String Directory where the CSV file has to be stored
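
The resulting file is a plain CSV with the header shown above, so it can be read back with standard tooling. A hedged sketch using Python's csv module, assuming the file was written exactly as in the example:

import csv
import os

csv_path = os.path.join(r"E:\data", "data_file.csv")  #mirrors the filename/directory used above
with open(csv_path, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        print(row["id"], row["name"], row["reactions_count"], row["comments"])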



Keys of the output:

Key Type Description
id String Post identifier (integer cast to a string)
name String Name of the page
shares Integer Share count of the post
reactions Dictionary Dictionary containing each reaction as a key and its count as the value. Keys => ["likes", "loves", "wow", "cares", "sad", "angry", "haha"]
reaction_count Integer Total reaction count of the post
comments Integer Comment count of the post
content String Content of the post as text
video List URLs of the videos present in the post
image List URLs of all the images present in the post
posted_on Datetime Time at which the post was posted (ISO 8601 format string)
post_url String URL of the post
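
As a worked illustration of these keys (not part of the library itself), the per-reaction breakdown can be aggregated across all scraped posts, assuming the JSON output was parsed as shown earlier:

import json
from collections import Counter

posts = json.loads(json_data)        #json_data from scrap_to_json() above
totals = Counter()
for post in posts.values():
    totals.update(post["reactions"]) #keys: likes, loves, wow, cares, sad, angry, haha
print(dict(totals))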


Tech

This project relies on a number of open-source libraries to work properly.



If you encounter anything unusual, please feel free to create an issue on the project's GitHub repository (https://github.com/shaikhsajid1111/facebook_page_scraper).

LICENSE

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

facebook_page_scraper-4.0.3.tar.gz (21.4 kB)

Uploaded Source

Built Distribution

facebook_page_scraper-4.0.3-py3-none-any.whl (19.6 kB)

Uploaded Python 3

File details

Details for the file facebook_page_scraper-4.0.3.tar.gz.

File metadata

  • Download URL: facebook_page_scraper-4.0.3.tar.gz
  • Upload date:
  • Size: 21.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.1

File hashes

Hashes for facebook_page_scraper-4.0.3.tar.gz
Algorithm Hash digest
SHA256 d1058b57be0ee1a2ca580e85ed016bbbec7dbfc199b026ddf72e154883f14d09
MD5 39c7718408f22fe1b5d714081c0a49e6
BLAKE2b-256 510c85298759e93b961b630620a27548056138c6f3c7f788a146db4768c7c13d


File details

Details for the file facebook_page_scraper-4.0.3-py3-none-any.whl.

File metadata

File hashes

Hashes for facebook_page_scraper-4.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 862f6e2e1bdde743ad679cefbe2321acc8e9225274ac239b5a2357fa154e9f11
MD5 57d46585db013c5cbb60459c25e7d762
BLAKE2b-256 a83153676781f0230a882da4307eb34e0a90647c5f3325fd07c1f19e6ca91148

