A Python package to scrape YouTube comments using Selenium and BeautifulSoup

License: MIT License (OSI Approved)
Operating System: OS Independent
Programming Language: Python 3

YouTube Comment Scraper

YouTubeCommentScraper is a Python package that scrapes comments from YouTube videos using Selenium and BeautifulSoup. The scraper is configurable: you can run the browser in headless mode, set the timeout for waiting on elements, and set the pause between scroll actions. You can also choose whether to log actions and whether to return the page source along with the comments.

Features

Headless Mode: Run the browser in headless mode (optional).
Customizable Timeouts: Set the timeout for waiting for elements to load.
Automatic Scrolling: Scrolls the page until all comments are loaded.
Logging Support: Enable logging to a file for tracking activities.
Return Page Source: Optionally return the page source along with the comments.
BeautifulSoup Integration: Extract comments using BeautifulSoup for robust parsing.

Installation

To install the package, use the following command:

pip install youtube-comments-scrapper

Dependencies

This package requires the following dependencies:

selenium
webdriver-manager
beautifulsoup4
lxml (optional but recommended for faster HTML parsing)

If you need to install these dependencies manually, use the following command:

pip install selenium webdriver-manager beautifulsoup4 lxml

Usage

1. Basic Usage: Scraping Comments

Here's a simple example to scrape comments from a YouTube video:

from youtube_comments_scraper import YouTubeCommentScraper

scraper = YouTubeCommentScraper(headless=True, timeout=10, scroll_pause_time=1.5, enable_logging=False, return_page_source=False)
video_url = "https://www.youtube.com/watch?v=Ycg48pVp3SU"
comments = scraper.scrape_comments(video_url)

print("Comments:", comments)
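
Printing is enough for a quick check, but scraped comments are usually saved for later analysis. The following is a minimal sketch of one way to do that with the standard library. The helper name save_comments is hypothetical (it is not part of this package), and the sketch assumes scrape_comments returns a list of JSON-serializable items such as strings, which this page does not confirm.

```python
import json
from pathlib import Path


def save_comments(comments, path):
    """Write comments to a UTF-8 JSON file and return how many were saved.

    Hypothetical helper, not part of the package. Assumes `comments` is an
    iterable of JSON-serializable items (for example, plain strings).
    """
    items = list(comments)
    # ensure_ascii=False keeps emoji and non-Latin text readable in the file.
    text = json.dumps(items, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")
    return len(items)
```

With the example above, this could be called as save_comments(comments, "comments.json").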

2. Scraping Comments with Logging Enabled

Enable logging to track the actions performed during scraping:

from youtube_comments_scraper import YouTubeCommentScraper

scraper = YouTubeCommentScraper(headless=True, timeout=10, scroll_pause_time=1.5, enable_logging=True, return_page_source=False)
video_url = "https://www.youtube.com/watch?v=Ycg48pVp3SU"
comments = scraper.scrape_comments(video_url)

print("Comments:", comments)

This will generate a log file (youtube_scraper.log) in the current directory.
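
To check what the scraper recorded, you can read the end of that file. The sketch below uses only the standard library; tail_log is a hypothetical helper, and nothing is assumed about the log line format beyond it being a text file.

```python
from collections import deque
from pathlib import Path


def tail_log(path="youtube_scraper.log", n=10):
    """Return the last `n` lines of a log file, or [] if the file is missing.

    Hypothetical helper, not part of the package.
    """
    log_path = Path(path)
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the most recent n lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]
```

For example, print("\n".join(tail_log())) shows the ten most recent entries.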

3. Returning Page Source Along with Comments

To get the page's HTML source along with the comments, set return_page_source=True. scrape_comments then returns both values:

from youtube_comments_scraper import YouTubeCommentScraper

scraper = YouTubeCommentScraper(headless=True, timeout=10, scroll_pause_time=1.5, enable_logging=False, return_page_source=True)
video_url = "https://www.youtube.com/watch?v=Ycg48pVp3SU"
comments, page_source = scraper.scrape_comments(video_url)

print("Comments:", comments)
print("Page Source:", page_source)
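
The page source is a large HTML string, so in practice you will usually extract something from it rather than print it in full. As an illustration, the sketch below pulls the page title out of an HTML string. It deliberately uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra dependencies; TitleParser and extract_title are hypothetical names, not part of this package.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first-level <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_title(page_source):
    """Return the stripped <title> text of an HTML string ('' if absent)."""
    parser = TitleParser()
    parser.feed(page_source)
    parser.close()  # flush any buffered text
    return parser.title.strip()
```

With the example above, extract_title(page_source) would return whatever title the loaded page carries.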

4. Custom Scroll Pause Time

You can control how long the scraper pauses between scroll actions using the scroll_pause_time parameter:

from youtube_comments_scraper import YouTubeCommentScraper

scraper = YouTubeCommentScraper(headless=True, timeout=10, scroll_pause_time=2.0, enable_logging=False, return_page_source=False)
video_url = "https://www.youtube.com/watch?v=Ycg48pVp3SU"
comments = scraper.scrape_comments(video_url)

print("Comments:", comments)
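
To see why the pause matters, it helps to picture a typical scroll-until-stable loop. The sketch below is an assumption about how such scrolling is commonly implemented, not this package's actual code: it scrolls, waits scroll_pause_time seconds, and stops once the page height no longer grows. The function name and the get_height, scroll_to_bottom, and max_scrolls parameters are hypothetical; the browser calls are passed in as callables so the loop runs without Selenium.

```python
import time


def scroll_until_stable(get_height, scroll_to_bottom, scroll_pause_time=1.5,
                        max_scrolls=1000, sleep=time.sleep):
    """Scroll until the page height stops growing; return the scroll count.

    Illustrative sketch only. `get_height` returns the current page height
    and `scroll_to_bottom` performs one scroll; both are supplied by the caller.
    """
    last_height = get_height()
    scrolls = 0
    while scrolls < max_scrolls:
        scroll_to_bottom()
        scrolls += 1
        # Give newly requested comments time to load before measuring again.
        sleep(scroll_pause_time)
        new_height = get_height()
        if new_height == last_height:
            break  # nothing new appeared, so assume the end was reached
        last_height = new_height
    return scrolls
```

If the pause is too short, the height check can run before new comments arrive and the loop stops early, which is why slower connections may need a larger scroll_pause_time. With Selenium, get_height could be something like lambda: driver.execute_script("return document.documentElement.scrollHeight").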

5. Scraping Comments Without Scrolling

To scrape only the comments that load without scrolling, pass scroll=False to scrape_comments:

from youtube_comments_scraper import YouTubeCommentScraper

scraper = YouTubeCommentScraper(headless=True, timeout=10, scroll_pause_time=1.5, enable_logging=False, return_page_source=False)
video_url = "https://www.youtube.com/watch?v=Ycg48pVp3SU"
comments = scraper.scrape_comments(video_url, scroll=False)

print("Comments:", comments)

6. Logging Custom Messages

You can log custom messages using the built-in log_info, log_warning, and log_error methods:

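from youtube_comments_scraper import YouTubeCommentScraper

scraper = YouTubeCommentScraper(headless=True, timeout=10, scroll_pause_time=1.5, enable_logging=True, return_page_source=False)
scraper.log_info("This is an info log message.")
scraper.log_warning("This is a warning message.")
scraper.log_error("This is an error message.")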
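
These methods are handy when scraping several videos in one run. The sketch below shows one possible pattern: log each URL, continue past failures, and warn when a video yields no comments. scrape_many is a hypothetical helper, not part of this package; it relies only on the scrape_comments, log_info, log_warning, and log_error methods shown on this page, and assumes the scraper was created with return_page_source=False so that each result is the comments alone.

```python
def scrape_many(scraper, video_urls):
    """Scrape several videos, logging progress and skipping any that fail.

    Hypothetical helper. Returns a dict mapping each successfully scraped
    URL to its comments.
    """
    results = {}
    for url in video_urls:
        scraper.log_info(f"Scraping {url}")
        try:
            comments = scraper.scrape_comments(url)
        except Exception as exc:
            # Broad on purpose: one failing video should not stop the batch.
            scraper.log_error(f"Failed to scrape {url}: {exc}")
            continue
        if not comments:
            scraper.log_warning(f"No comments found for {url}")
        results[url] = comments
    return results
```

Any object with those four methods works, which also makes the helper easy to test with a stand-in scraper.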

Class Reference