
Scraping high intensity content sites

Project description

SiteScraper

This package provides the following scraper classes:

'yt_vedio()'

This class uses Selenium with the Firefox WebDriver to scrape the title, view count, and upload time of YouTube videos from a given URL. Its yt_vedios_data() method returns a dictionary with the keys 'title', 'views', and 'when' and their corresponding values.
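The README does not say what format the 'views' values have. If they are YouTube's English display strings such as "1.2K views" (an assumption), a small helper like the hypothetical parse_view_count below can turn them into integers before analysis. This is a sketch, not part of SiteScraper.

```python
import re
from typing import Optional

# Multipliers for the abbreviated suffixes shown on English-language YouTube pages.
# Assumption: the scraped 'views' values look like "1.2K views" or "1,234 views".
_SUFFIXES = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_view_count(text: str) -> Optional[int]:
    """Convert a display string such as '1.2K views' or '1,234 views' to an int.

    Returns None when the string does not start with a number (e.g. 'No views').
    """
    match = re.match(r"\s*(\d[\d.,]*)\s*([KMB]?)", text, flags=re.IGNORECASE)
    if match is None:
        return None
    # Drop thousands separators, then scale by the suffix (empty suffix means x1).
    number = float(match.group(1).replace(",", ""))
    return round(number * _SUFFIXES[match.group(2).upper()])
```

For example, parse_view_count("1.2K views") gives 1200, so the column can be sorted numerically instead of as text.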

'yt_vedio_comment()'

This class uses Selenium with the Firefox WebDriver to scrape the text, like count, and posting time of the comments on a YouTube video from a given URL. Its yt_vedio_comment() method returns a pandas DataFrame with the columns 'comment_text', 'likes', and 'comment_time' and their corresponding values.
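Both 'when' and 'comment_time' describe when something was posted. If these come back as relative strings such as "3 days ago" (an assumption; the README does not specify the format), the hypothetical helper approx_posted_at below estimates an absolute timestamp relative to a reference time you pass in. Months and years are averaged, so the result is approximate.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Approximate unit lengths in seconds; months and years are averaged,
# so the result is an estimate, not the exact posting time.
_UNIT_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 3_600,
    "day": 86_400,
    "week": 7 * 86_400,
    "month": 30 * 86_400,
    "year": 365 * 86_400,
}

# Assumption: relative times look like "<number> <unit>[s] ago", optionally
# followed by extra text such as "(edited)".
_RELATIVE = re.compile(r"(\d+)\s+(second|minute|hour|day|week|month|year)s?\s+ago")


def approx_posted_at(text: str, now: datetime) -> Optional[datetime]:
    """Estimate an absolute timestamp from a relative string such as '3 days ago'."""
    match = _RELATIVE.search(text)
    if match is None:
        return None
    seconds = int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return now - timedelta(seconds=seconds)
```

Passing `now` explicitly (for instance, the time the scrape ran) keeps the results reproducible instead of depending on when the helper happens to be called.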

To use these classes, you need Python 3, the Firefox browser, and the following libraries: pandas, selenium, and geckodriver-autoinstaller.

To install the required libraries, you can use pip:

pip install pandas selenium geckodriver-autoinstaller
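After installing, a quick check can confirm that the three libraries are importable. The sketch below uses the standard library's importlib.util.find_spec, which looks a module up without importing it. Note that the pip package geckodriver-autoinstaller is imported as geckodriver_autoinstaller. The helper name missing_modules is hypothetical.

```python
import importlib.util

# Import names of the required libraries; geckodriver-autoinstaller (the pip
# name) is imported as geckodriver_autoinstaller.
REQUIRED_MODULES = ("pandas", "selenium", "geckodriver_autoinstaller")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names in `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing: " + ", ".join(missing) if missing else "All dependencies found.")
```

This only checks the Python packages; Firefox itself still has to be installed separately.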

To use them, import the SiteScraper module and create an instance of the yt_vedio or yt_vedio_comment class:

import SiteScraper as ss

Create an instance of the yt_vedio class and save the scraped video data to CSV:

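import SiteScraper as ss
import pandas as pd
# Scrape the videos listed on the channel's Videos tab
scraper = ss.yt_vedio()
new_data = scraper.yt_vedios_data('https://www.youtube.com/@campusx-official/videos')
# Convert the returned dictionary to a DataFrame and save it as CSV
dataframe = pd.DataFrame(new_data)
dataframe.to_csv('campusx.csv', index=False)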

Create an instance of the yt_vedio_comment class and save the scraped comments to CSV:

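import SiteScraper as ss
import pandas as pd
# Scrape the comments of one video (replace xxxxxxxx with a real video ID)
scraper = ss.yt_vedio_comment()
new_data = scraper.yt_vedio_comment('https://www.youtube.com/watch?v=xxxxxxxx')
# Wrap the result in a DataFrame and save it as CSV
dataframe = pd.DataFrame(new_data)
dataframe.to_csv('youtube_comments.csv', index=False)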

Replace the URL with the YouTube URL you want to scrape. Methods for Twitter and Reddit will be added in a future release.
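The two examples use different URL shapes: a channel's Videos tab (https://www.youtube.com/@campusx-official/videos) for yt_vedio and a watch URL (https://www.youtube.com/watch?v=xxxxxxxx) for yt_vedio_comment. A hypothetical pre-flight check like youtube_url_kind below can catch a URL of the wrong shape before a browser session is started. The README does not document which URL forms each class accepts, so this sketch only recognizes the two forms shown in the examples.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hosts treated as YouTube; an assumption, extend as needed.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def youtube_url_kind(url: str) -> Optional[str]:
    """Return 'video' for a watch URL, 'channel_videos' for a channel's Videos tab, else None."""
    parsed = urlparse(url)
    if parsed.netloc not in YOUTUBE_HOSTS:
        return None
    # Watch pages carry the video ID in the "v" query parameter.
    if parsed.path == "/watch" and parse_qs(parsed.query).get("v"):
        return "video"
    # Handle-based channel pages look like /@handle/videos.
    if parsed.path.startswith("/@") and parsed.path.rstrip("/").endswith("/videos"):
        return "channel_videos"
    return None
```

For instance, a result of 'video' suggests the URL suits yt_vedio_comment, while 'channel_videos' matches the yt_vedio example.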

Download files


Source Distribution

SiteScraper-0.2.2.tar.gz (2.9 kB)

Built Distribution

SiteScraper-0.2.2-py3-none-any.whl (3.2 kB)
