Python package to scrape Facebook pages' front end with no limitations
Project description
Facebook Page Scraper
No registration, no API key, and no limit on the number of requests. Import the library and just do it!
Prerequisites
- Internet Connection
- Python 3.6+
- Chrome or Firefox browser installed on your machine
Installation:
Installing from source:
git clone https://github.com/shaikhsajid1111/facebook_page_scraper
Then, inside the project's directory:
python3 setup.py install
Installing with PyPI:
pip3 install facebook-page-scraper
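Before running the scraper, you can check the prerequisites listed above from Python. This is a minimal sketch, not part of the package: the helper name `check_prerequisites` is hypothetical, and the browser executable names it searches for (`firefox`, `google-chrome`, `chrome`, `chromium`, `chromium-browser`) are assumptions that vary by platform. A browser that is installed but not on `PATH` will be reported as missing.

```python
import importlib.util
import shutil
import sys

# Hypothetical helper: not part of facebook_page_scraper.
# Executable names are assumptions; adjust them for your platform.
BROWSER_EXECUTABLES = {
    "firefox": ["firefox"],
    "chrome": ["google-chrome", "chrome", "chromium", "chromium-browser"],
}


def check_prerequisites():
    """Return a dict mapping each prerequisite to True (found) or False (missing)."""
    results = {
        # The package requires Python 3.6 or newer.
        "python_3_6_plus": sys.version_info >= (3, 6),
        # find_spec returns None when the package cannot be imported.
        "package_installed": importlib.util.find_spec("facebook_page_scraper") is not None,
    }
    for browser, names in BROWSER_EXECUTABLES.items():
        # shutil.which returns the executable's path if it is on PATH, else None.
        results[browser + "_found"] = any(shutil.which(name) for name in names)
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(name, "OK" if ok else "missing")
```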
How to use?
# import the Facebook_scraper class from facebook_page_scraper
from facebook_page_scraper import Facebook_scraper
# instantiate the Facebook_scraper class
page_name = "facebookai"
posts_count = 10
browser = "firefox"
proxy = "IP:PORT"  # if the proxy requires authentication, use user:password@IP:PORT
timeout = 600  # 600 seconds
meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout)
Parameters for the Facebook_scraper(page_name, posts_count, browser, proxy, timeout) class:
Parameter Name | Parameter Type | Description |
---|---|---|
page_name | string | name of the Facebook page |
posts_count | integer | number of posts to scrape; if not passed, the default is 10 |
browser | string | which browser to use, either chrome or firefox; if not passed, the default is chrome |
proxy (optional) | string | proxy to use; if the proxy requires authentication, the format is user:password@IP:PORT |
timeout | integer | maximum amount of time, in seconds, the bot should run for; if not passed, the default timeout is 10 minutes |
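The `proxy` argument is a plain string in either `IP:PORT` or `user:password@IP:PORT` form. The sketch below builds that string from its parts; `build_proxy` is a hypothetical helper, not part of the package, and the README does not say whether special characters in the password need escaping.

```python
from typing import Optional


def build_proxy(ip: str, port: int, user: Optional[str] = None,
                password: Optional[str] = None) -> str:
    """Return "IP:PORT", or "user:password@IP:PORT" when credentials are given.

    Hypothetical helper: not part of facebook_page_scraper.
    """
    address = "{}:{}".format(ip, port)
    if user is None and password is None:
        return address
    if user is None or password is None:
        # A half-specified credential pair is almost certainly a mistake.
        raise ValueError("provide both user and password, or neither")
    return "{}:{}@{}".format(user, password, address)


# Usage (203.0.113.10 is a documentation-only example address):
proxy = build_proxy("203.0.113.10", 8080, "alice", "secret")
print(proxy)  # → alice:secret@203.0.113.10:8080
# meta_ai = Facebook_scraper(page_name, posts_count, browser, proxy=proxy, timeout=timeout)
```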
Done with instantiation? Let the scraping begin!
For posts' data in JSON format:
#call the scrap_to_json() method
json_data = meta_ai.scrap_to_json()
print(json_data)
Output:
{
"2024182624425347": {
"name": "Meta AI",
"shares": 0,
"reactions": {
"likes": 154,
"loves": 19,
"wow": 0,
"cares": 0,
"sad": 0,
"angry": 0,
"haha": 0
},
"reaction_count": 173,
"comments": 2,
"content": "We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code: https://ai.facebook.com/…/the-first-high-performance-self-s…",
"posted_on": "2022-01-20T22:43:35",
"video": "",
"image": [
"https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71"
],
"post_url": "https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARBoSaQ-pAC_ApucZNHZ6R-BI3YUSjH4sXsfdZRQ2zZFOwgWGhjt6dmg0VOcmGCLhSFyXpecOY9g1A94vrzU_T-GtYFagqDkJjHuhoyPW2vnkn7fvfzx-ql7fsBYxL5DgQVSsiC1cPoycdCvHmi6BV5Sc4fKADdgDhdFvVvr-ttzXG1ng2DbLzU-XfSes7SAnrPs-gxjODPKJ7AdqkqkSQJ4HrsLgxMgcLFdCsE6feWL7rXjptVWegMVMthhJNVqO0JHu986XBfKKqB60aBFvyAzTSEwJD6o72GtnyzQ-BcH7JxmLtb2_A&__tn__=-R"
}, ...
}
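The example prints `json_data` without stating whether `scrap_to_json()` returns a JSON string or an already-parsed dictionary. The sketch below accepts either (an assumption that covers both cases) and ranks posts by `reaction_count`; the helper name `top_posts` is hypothetical.

```python
import json
from typing import Dict, List, Tuple, Union


def top_posts(json_data: Union[str, Dict[str, dict]], n: int = 3) -> List[Tuple[str, int]]:
    """Return up to n (post_id, reaction_count) pairs, highest count first."""
    # Parse only if we were handed a JSON string; otherwise assume a dict.
    posts = json.loads(json_data) if isinstance(json_data, str) else json_data
    ranked = sorted(
        ((post_id, post["reaction_count"]) for post_id, post in posts.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:n]


# Trimmed-down version of the output shown above (most keys omitted).
sample = '{"2024182624425347": {"name": "Meta AI", "reaction_count": 173}}'
print(top_posts(sample))  # → [('2024182624425347', 173)]
```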
Output Structure for JSON format:
{
"id": {
"name": string,
"shares": integer,
"reactions": {
"likes": integer,
"loves": integer,
"wow": integer,
"cares": integer,
"sad": integer,
"angry": integer,
"haha": integer
},
"reaction_count": integer,
"comments": integer,
"content": string,
"video" : string,
"image" : list,
"posted_on": datetime, //string containing datetime in ISO 8601
"post_url": string
}
}
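The structure above suggests two consistency checks: `reaction_count` would be expected to equal the sum of the seven `reactions` values (in the sample, 154 likes + 19 loves = 173), and `posted_on` is an ISO 8601 string rather than a Python `datetime` object. The sketch below reports problems instead of raising, since the README does not guarantee the first property. `validate_post` is hypothetical, and the timestamp pattern is assumed from the example output (no timezone, no fractional seconds).

```python
from datetime import datetime

REACTION_KEYS = ("likes", "loves", "wow", "cares", "sad", "angry", "haha")
# Assumed from the sample output, e.g. "2022-01-20T22:43:35".
POSTED_ON_FORMAT = "%Y-%m-%dT%H:%M:%S"


def validate_post(post: dict) -> list:
    """Return a list of human-readable problems in one post record (empty if none)."""
    problems = []
    reactions = post.get("reactions", {})
    missing = [key for key in REACTION_KEYS if key not in reactions]
    if missing:
        problems.append("missing reaction keys: " + ", ".join(missing))
    total = sum(reactions.get(key, 0) for key in REACTION_KEYS)
    if total != post.get("reaction_count"):
        problems.append("reaction_count {} != sum of reactions {}".format(
            post.get("reaction_count"), total))
    try:
        # strptime raises ValueError on a bad string and TypeError on a non-string.
        datetime.strptime(post.get("posted_on", ""), POSTED_ON_FORMAT)
    except (TypeError, ValueError):
        problems.append("posted_on is not in the expected ISO 8601 form")
    if not isinstance(post.get("image", []), list):
        problems.append("image should be a list of URLs")
    return problems


sample_post = {
    "reactions": {"likes": 154, "loves": 19, "wow": 0, "cares": 0,
                  "sad": 0, "angry": 0, "haha": 0},
    "reaction_count": 173,
    "posted_on": "2022-01-20T22:43:35",
    "image": [],
}
print(validate_post(sample_post))  # → []
```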
To save posts' data directly to a CSV file:
# call the scrap_to_csv(filename, directory) method
filename = "data_file"  # file name (without the .csv extension) where data will be saved
directory = r"E:\data"  # directory where the CSV file will be saved; the raw string keeps the backslash literal
meta_ai.scrap_to_csv(filename, directory)
Content of data_file.csv:
id,name,shares,likes,loves,wow,cares,sad,angry,haha,reactions_count,comments,content,posted_on,video,image,post_url
2024182624425347,Meta AI,0,154,19,0,0,0,0,0,173,2,"We’ve built data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. We applied it to different modalities and found it matches or outperforms the best self-supervised algorithms. We hope this brings us closer to a world where computers can learn to solve many different tasks without supervision. Learn more and get the code: https://ai.facebook.com/…/the-first-high-performance-self-s…",2022-01-20T22:43:35,,https://scontent-bom1-2.xx.fbcdn.net/v/t39.30808-6/s480x480/272147088_2024182621092014_6532581039236849529_n.jpg?_nc_cat=100&ccb=1-5&_nc_sid=8024bb&_nc_ohc=j4_1PAndJTIAX82OLNq&_nc_ht=scontent-bom1-2.xx&oh=00_AT9us__TvC9eYBqRyQEwEtYSit9r2UKYg0gFoRK7Efrhyw&oe=61F17B71,https://www.facebook.com/MetaAI/photos/a.360372474139712/2024182624425347/?type=3&__xts__%5B0%5D=68.ARAse4eiZmZQDOZumNZEDR0tQkE5B6g50K6S66JJPccb-KaWJWg6Yz4v19BQFSZRMd04MeBmV24VqvqMB3oyjAwMDJUtpmgkMiITtSP8HOgy8QEx_vFlq1j-UEImZkzeEgSAJYINndnR5aSQn0GUwL54L3x2BsxEqL1lElL7SnHfTVvIFUDyNfAqUWIsXrkI8X5KjoDchUj7aHRga1HB5EE0x60dZcHogUMb1sJDRmKCcx8xisRgk5XzdZKCQDDdEkUqN-Ch9_NYTMtxlchz1KfR0w9wRt8y9l7E7BNhfLrmm4qyxo-ZpA&__tn__=-R
...
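To load a saved file back, Python's standard `csv` module is enough. This sketch assumes the header shown above (the integer columns are taken from it) and a UTF-8 encoded file; `load_posts` is a hypothetical helper. The sample row is shortened: the content is truncated and the URLs are `example.com` placeholders rather than the real ones printed above.

```python
import csv
import io
from typing import List

# Columns holding counts in the header shown above.
INT_COLUMNS = ("shares", "likes", "loves", "wow", "cares", "sad", "angry", "haha",
               "reactions_count", "comments")


def load_posts(csv_file) -> List[dict]:
    """Read rows from an open CSV file object, converting count columns to int."""
    rows = []
    for row in csv.DictReader(csv_file):
        for column in INT_COLUMNS:
            row[column] = int(row[column])
        rows.append(row)
    return rows


# With a real file (assuming UTF-8):
# with open("data_file.csv", newline="", encoding="utf-8") as f:
#     posts = load_posts(f)
sample_csv = (
    "id,name,shares,likes,loves,wow,cares,sad,angry,haha,reactions_count,comments,"
    "content,posted_on,video,image,post_url\n"
    '2024182624425347,Meta AI,0,154,19,0,0,0,0,0,173,2,"We’ve built data2vec, the first '
    'general high-performance self-supervised algorithm for speech, vision, and text.",'
    "2022-01-20T22:43:35,,https://example.com/image.jpg,https://example.com/post\n"
)
posts = load_posts(io.StringIO(sample_csv))
print(posts[0]["reactions_count"])  # → 173
```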
Parameters for the scrap_to_csv(filename, directory) method:
Parameter Name | Parameter Type | Description |
---|---|---|
filename | string | name of the CSV file (without the .csv extension) where posts' data will be saved |
directory | string | directory where the CSV file will be stored |
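On Windows, a path such as `E:\data` is safest written as a raw string (`r"E:\data"`) or built with `pathlib`, because backslashes in ordinary string literals start escape sequences. The sketch below assumes the scraper writes to `<directory>/<filename>.csv` (inferred from the example, where `data_file` produced `data_file.csv`) and creates the directory beforehand in case the scraper does not; `prepare_csv_path` is a hypothetical helper.

```python
from pathlib import Path


def prepare_csv_path(directory, filename):
    """Create `directory` if needed and return where the CSV is expected to appear.

    Hypothetical helper; assumes the scraper writes <directory>/<filename>.csv.
    """
    folder = Path(directory)
    # parents=True creates intermediate folders; exist_ok=True tolerates reruns.
    folder.mkdir(parents=True, exist_ok=True)
    return folder / (filename + ".csv")


# Usage sketch (meta_ai is the Facebook_scraper instance created earlier):
# expected = prepare_csv_path(r"E:\data", "data_file")
# meta_ai.scrap_to_csv("data_file", r"E:\data")
# print(expected.exists())
```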
Keys of the outputs:
Key | Type | Description |
---|---|---|
id | string | post identifier (an integer stored as a string) |
name | string | name of the page |
shares | integer | share count of the post |
reactions | dictionary | dictionary with reaction types as keys and their counts as values; keys => ["likes","loves","wow","cares","sad","angry","haha"] |
reaction_count | integer | total reaction count of the post |
comments | integer | comment count of the post |
content | string | content of the post as text |
video | string | URL of the video in the post |
image | list | Python list containing the URLs of all images in the post |
posted_on | datetime (ISO 8601 string) | time at which the post was published |
post_url | string | URL of the post |
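The JSON keys and the CSV columns differ slightly: in the CSV output, `reactions` is spread across seven columns and the total appears as `reactions_count` rather than `reaction_count`. The sketch below converts JSON records into rows with the CSV header's column order, which may help when combining output from both methods. `flatten_post` and `write_rows` are hypothetical helpers; how the library places several image URLs in one CSV cell is not documented here, so joining them with a space is an assumption. Sample values use `example.com` placeholder URLs.

```python
import csv
import io

# Column order copied from the CSV header shown earlier.
CSV_COLUMNS = ["id", "name", "shares", "likes", "loves", "wow", "cares", "sad",
               "angry", "haha", "reactions_count", "comments", "content",
               "posted_on", "video", "image", "post_url"]
REACTION_KEYS = ["likes", "loves", "wow", "cares", "sad", "angry", "haha"]


def flatten_post(post_id, post):
    """Turn one JSON record into a dict keyed by CSV_COLUMNS."""
    row = {
        "id": post_id,
        "name": post["name"],
        "shares": post["shares"],
        "reactions_count": post["reaction_count"],  # renamed in the CSV header
        "comments": post["comments"],
        "content": post["content"],
        "posted_on": post["posted_on"],
        "video": post["video"],
        # Assumption: several image URLs joined with a space.
        "image": " ".join(post["image"]),
        "post_url": post["post_url"],
    }
    for key in REACTION_KEYS:
        row[key] = post["reactions"][key]
    return row


def write_rows(posts, out):
    """Write a {post_id: post} mapping to a file-like object as CSV."""
    writer = csv.DictWriter(out, fieldnames=CSV_COLUMNS)
    writer.writeheader()
    for post_id, post in posts.items():
        writer.writerow(flatten_post(post_id, post))


sample = {
    "2024182624425347": {
        "name": "Meta AI", "shares": 0,
        "reactions": {"likes": 154, "loves": 19, "wow": 0, "cares": 0,
                      "sad": 0, "angry": 0, "haha": 0},
        "reaction_count": 173, "comments": 2,
        "content": "We’ve built data2vec, the first general high-performance "
                   "self-supervised algorithm for speech, vision, and text.",
        "posted_on": "2022-01-20T22:43:35", "video": "",
        "image": ["https://example.com/image.jpg"],
        "post_url": "https://example.com/post",
    }
}
buffer = io.StringIO()
write_rows(sample, buffer)
print(buffer.getvalue())
```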
Privacy
This scraper only scrapes public data available to an unauthenticated user and does not have the capability to scrape anything private.
Tech
This project uses different libraries to work properly.
If you encounter anything unusual, please feel free to create an issue here.
LICENSE
MIT
Source Distribution
File details
Details for the file facebook_page_scraper-2.0.0.tar.gz.
File metadata
- Download URL: facebook_page_scraper-2.0.0.tar.gz
- Upload date:
- Size: 19.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.1
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | d4acc735f02f6aeec6b448e1c627f1ba715b744ff16fcbb6170c92e89ad7b921 |
MD5 | d9bdc4dd233a3dd3fbac393d4baac976 |
BLAKE2b-256 | aa15d7e6bce8f711ccba542b20c5d98466d4141bb12d309dc365342f5e0e7c00 |