
Facebook Scrapper


Facebook Post Scraper App

This app uses Streamlit and Selenium to scrape public Facebook posts from designated accounts. Multiprocessing lets it scrape several accounts concurrently.

Key Features

Concurrent Scraping: Supports concurrent scraping from multiple Facebook accounts using multiprocessing, significantly speeding up the data collection process.

facebook_scrapping

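Important Note: Ensure the number of threads (each one runs as a separate worker process) does not exceed the number of CPU cores; otherwise the workers compete for the same cores and performance degrades.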

Configurable Scraping: Allows users to define the number of posts to scrape and the number of concurrent scraping threads through the app’s interface.

Data Processing Pipeline: Processes collected data through a pipeline to prepare it for analysis.

Random Forest Model: Utilizes the processed data to create a Random Forest model for predictions.
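The note above about keeping the worker count within the number of CPU cores can be sketched as follows. This is a minimal illustration, not the app's actual code: `effective_workers` and `scrape_account` are hypothetical names, and `scrape_account` is only a stand-in that returns a string instead of scraping anything.

```python
import os
from multiprocessing import Pool


def effective_workers(requested, n_accounts, cpu_count=None):
    """Cap the requested worker count by CPU cores and by the number of accounts."""
    cores = cpu_count or os.cpu_count() or 1
    return max(1, min(requested, cores, n_accounts))


def scrape_account(account):
    """Hypothetical stand-in for the real per-account scraping routine."""
    return f"done: {account}"


if __name__ == "__main__":
    accounts = ["account_a", "account_b", "account_c"]  # placeholder names
    workers = effective_workers(requested=8, n_accounts=len(accounts))
    # Each worker is a separate process; Pool.map distributes the accounts.
    with Pool(processes=workers) as pool:
        results = pool.map(scrape_account, accounts)
    print(workers, results)
```

Capping by the number of accounts as well avoids starting processes that would have nothing to do.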

Setup and Configuration:

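Prerequisites

  • Python 3.9 or higher
  • streamlit
  • selenium
  • pandas
  • numpy
  • scikit-learn (imported as sklearn)
  • scipy
  • webdriver-manager
  • matplotlib

multiprocessing is part of the Python standard library and needs no separate installation.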

Installation

1- Clone the repository:

git clone https://github.com/yourusername/facebook-post-scraper.git
cd facebook-post-scraper

2- Install the required packages:

pip install -r requirements.txt

Fake Accounts Setup

To use the app, you'll need to create at least two fake Facebook accounts. Configure these accounts in the config.py file as follows:

email_account1 = "facebook@email.com"
password_account1 = "password"

email_account2 = "facebook@email.com"
password_account2 = "password"

email_account3 = "facebook@email.com"
password_account3 = "password"

Running the App

To run the app, execute the following command:

streamlit run .\Scrap_and_predict_accounts.py

User Interface

The app features an intuitive interface with two sidebars:

  • Scrap and predict accounts: Select and configure accounts for prediction.
  • Training Model: Select and configure accounts for training.

front_page

Visualization

  • The app generates visualizations to compare data from fake and true accounts.
  • This includes plots that help in understanding the distribution and characteristics of the scraped data.

data_plot

Usage Instructions

Define Scraping Parameters:

  • Set the number of posts to scrape.
  • Set the number of concurrent threads in the app’s sidebar.

Run the Scraper:

  • Initiate the scraping process by clicking the appropriate button.

Capabilities of the App for Training Classification Models and Generating Metrics

The app can train several classification models, including:

  • Logistic Regression
  • K-Nearest Neighbors
  • Random Forest
  • Decision Tree
  • Gradient Boosting

It also creates metrics such as:

  • Accuracy
  • Precision
  • Recall
  • F1 score
  • ROC AUC
  • Mean_metrics

metrics
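To make the metric names above concrete, here is a small sketch that computes accuracy, precision, recall, and F1 from true and predicted labels in plain Python. The function name `classification_metrics` and the label convention (1 for a fake account, 0 for a true account) are assumptions made for illustration; scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` compute the same quantities for binary labels.

```python
def classification_metrics(y_true, y_pred):
    """Compute binary classification metrics, treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Assumed convention for this sketch: 1 = fake account, 0 = true account.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
    print(classification_metrics(y_true, y_pred))
```

ROC AUC is left out because it needs predicted scores or probabilities rather than hard labels.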

View Results:

  • Once the scraping is complete, view the processed data and generated plots.

Model Training:

  • The app will process the data through a pipeline and create the following models for prediction purposes:
    • Logistic Regression
    • K-Nearest Neighbors
    • Random Forest
    • Decision Tree
    • Gradient Boosting

Using FastAPI for GET and POST requests

Additionally, you can use FastAPI to retrieve the true or fake accounts that you want to scrape, and to post the processed DataFrame or the metrics.

In this screenshot, true accounts have been fetched from FastAPI.

fastapi_get_accounts

In this screenshot, the DataFrame is sent to FastAPI as JSON data.

fastapi_post_df

In this screenshot, the metrics are sent to FastAPI.

fastapi_post_metrics
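As a rough sketch of what sending the DataFrame as JSON data could look like on the client side, the snippet below builds a POST request with the standard library only. The URL `http://127.0.0.1:8000/dataframe`, the function name `build_json_post`, and the record fields are hypothetical; the project's actual routes and payload shape are not documented here. With pandas, `df.to_dict(orient="records")` produces a list of row dictionaries like the one used below.

```python
import json
from urllib.request import Request


def build_json_post(url, records):
    """Build (but do not send) a POST request carrying records as a JSON body."""
    body = json.dumps(records).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical rows standing in for the processed DataFrame.
    rows = [
        {"account": "account_a", "n_posts": 12},
        {"account": "account_b", "n_posts": 7},
    ]
    # Hypothetical endpoint; replace with the route your FastAPI server exposes.
    req = build_json_post("http://127.0.0.1:8000/dataframe", rows)
    print(req.get_method(), req.full_url, len(req.data), "bytes")
    # Passing req to urllib.request.urlopen would send it to a running server.
```

Building the request separately from sending it makes the payload easy to inspect before anything goes over the network.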

Install facebook-scrapper package

package-facebook-scrapper is a Python package for scraping data from Facebook.

Installation

You can install the package using pip:

pip install package-facebook-scrapper

Final Notes

  • Ensure you follow all guidelines and ethical considerations while scraping data from Facebook.
  • Use this tool responsibly and only for purposes that comply with Facebook's policies and legal requirements.


This version

0.1
