
A simple scraper for YouTube

Project description

Youtube Simple Scraper

This is a simple YouTube scraper that uses the YouTube API to fetch a channel's video metadata and comments.

You don't need an API key to use this scraper, so there are no usage limits or associated costs.

Note that although there is no usage limit, YouTube may block your IP address if it detects abusive use of the API.

Features

Scrape the following information from a channel:

  • Channel metadata
  • Videos metadata and comments
  • Shorts metadata and comments

Installation

pip install youtube_simple_scraper

Usage

from youtube_simple_scraper.entities import GetChannelOptions
from youtube_simple_scraper.list_video_comments import ApiVideoCommentRepository, ApiShortVideoCommentRepository
from youtube_simple_scraper.list_videos import ApiChannelRepository
from youtube_simple_scraper.logger import build_default_logger
from youtube_simple_scraper.network import Requester
from youtube_simple_scraper.stop_conditions import ListCommentMaxPagesStopCondition, \
    ListVideoMaxPagesStopCondition

if __name__ == '__main__':
    ##############################
    # To avoid IP blocking:
    # Limit the request rate to 0.5 requests per second
    Requester.request_rate_per_second = 0.5

    # Sleep between 1 and 5 seconds between consecutive requests
    Requester.min_sleep_time_sec = 1
    Requester.max_sleep_time_sec = 5

    # After every 100 requests, sleep for 30 seconds
    Requester.long_sleep_time_sec = 30
    Requester.long_sleep_after_requests = 100
    ##############################
    
    
    logger = build_default_logger()
    video_comment_repo = ApiVideoCommentRepository()
    short_comment_repo = ApiShortVideoCommentRepository()
    repo = ApiChannelRepository(
        video_comment_repo=video_comment_repo,
        shorts_comment_repo=short_comment_repo,
        logger=logger,
    )
    opts = GetChannelOptions(
        list_video_stop_conditions=[
          ListVideoMaxPagesStopCondition(2) # Stop after 2 pages of videos
        ],
        list_video_comment_stop_conditions=[
          ListCommentMaxPagesStopCondition(3) # Stop after 3 pages of comments
        ],
        list_short_stop_conditions=[
          ListVideoMaxPagesStopCondition(1) # Stop after 1 page of shorts
        ],
        list_short_comment_stop_conditions=[
          ListCommentMaxPagesStopCondition(4) # Stop after 4 pages of comments
        ]
    )
    channel_ = repo.get_channel("BancoFalabellaChile", opts)
    print(channel_.model_dump_json(indent=2))
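Since the returned channel is a Pydantic model, you can also persist the result instead of only printing it. A minimal sketch, using the same model_dump_json call as above (the channel.json file name is just an example):

# Write the scraped channel to disk as JSON (file name is arbitrary).
with open("channel.json", "w", encoding="utf-8") as f:
    f.write(channel_.model_dump_json(indent=2))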

Example of the returned channel object serialized to JSON:

{
  "id": "UCaY_-ksFSQtTGk0y1HA_3YQ",
  "name": "IbaiLlanos",
  "target_id": "668be16f-0000-20de-b6a2-582429cfbdec",
  "title": "Ibai",
  "description": "contenido premium ▶️\n",
  "subscriber_count": 11600000,
  "video_count": 1400,
  "videos": [
    {
      "id": "VFXu8gzcpNc",
      "title": "EL RESTAURANTE MÁS ÚNICO AL QUE HE IDO NUNCA",
      "description": "MI CANAL DE DIRECTOS: https://www.youtube.com/@Ibai_TV\nExtraído de mi canal de TWITCH: https://www.twitch.tv/ibai/\nMI PODCAST: \nhttps://www.youtube.com/channel/UC6jNDNkoOKQfB5djK2IBDoA\nTWITTER:...",
      "date": "2024-06-02T19:18:27.647137",
      "view_count": 1455817,
      "like_count": 0,
      "dislike_count": 0,
      "comment_count": 0,
      "thumbnail_url": "https://i.ytimg.com/vi/VFXu8gzcpNc/hqdefault.jpg?sqp=-oaymwEbCKgBEF5IVfKriqkDDggBFQAAiEIYAXABwAEG&rs=AOn4CLCEmoQtslruHk-droajdw0KJUI_KA",
      "comments": [
        {
          "id": "UgzV8lY8eJ4dyHjl9Bp4AaABAg",
          "text": "Todo muy rico pero....Y la cuenta?",
          "user": "@eliasabregu2813",
          "date": "2024-06-03T19:11:28.109467",
          "likes": 0
        },
        {
          "id": "UgwHtPZb8jprbCH-ysp4AaABAg",
          "text": "Que humilde Ibai, comiendo todo para generar ingresos a los nuevos negocios",
          "user": "@user-ui2sk7sr5i",
          "date": "2024-06-03T19:04:28.112228",
          "likes": 0
        }
      ]
    },
    // More videos ...
  ],
  "shorts": [
    // Shorts and their comments ...
  ]
}
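Rather than working with the raw JSON, you can walk the returned model directly. The sketch below assumes the model attributes mirror the JSON keys shown above (videos, shorts, comments, title, view_count, user, text):

# Print a short summary of every scraped video and its comments.
for video in channel_.videos:
    print(f"{video.title} ({video.view_count} views)")
    for comment in video.comments:
        print(f"  {comment.user}: {comment.text}")

# Shorts follow the same structure.
for short in channel_.shorts:
    print(f"[short] {short.title}")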

Stop conditions

Stop conditions control when the scraping process ends. The following stop conditions are available (a sketch combining the never-stop conditions follows the lists):

Videos stop conditions

  • ListVideoMaxPagesStopCondition: Stops the scraping process once the number of pages scraped exceeds the specified value.
  • ListVideoNeverStopCondition: The scraping process stops only when all the videos of the channel have been scraped.

Comments stop conditions

  • ListCommentMaxPagesStopCondition: Stops the scraping process once the number of pages scraped exceeds the specified value.
  • ListCommentNeverStopCondition: The scraping process stops only when all the comments of the video have been scraped.
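For example, to scrape a channel exhaustively you could combine the never-stop conditions. This is only a sketch: it assumes ListVideoNeverStopCondition and ListCommentNeverStopCondition live in the same youtube_simple_scraper.stop_conditions module used in the Usage example and take no constructor arguments.

from youtube_simple_scraper.entities import GetChannelOptions
from youtube_simple_scraper.stop_conditions import ListCommentNeverStopCondition, \
    ListVideoNeverStopCondition

# No page limits: scrape every video, short and comment of the channel.
# Keep the Requester throttling from the Usage example to reduce the
# risk of IP blocking on large channels.
full_opts = GetChannelOptions(
    list_video_stop_conditions=[ListVideoNeverStopCondition()],
    list_video_comment_stop_conditions=[ListCommentNeverStopCondition()],
    list_short_stop_conditions=[ListVideoNeverStopCondition()],
    list_short_comment_stop_conditions=[ListCommentNeverStopCondition()],
)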

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

youtube_simple_scraper-0.0.8.tar.gz (18.2 kB)


Built Distribution

youtube_simple_scraper-0.0.8-py3-none-any.whl (18.8 kB)


File details

Details for the file youtube_simple_scraper-0.0.8.tar.gz.


File hashes

Hashes for youtube_simple_scraper-0.0.8.tar.gz:

  • SHA256: 84b8e530d9b8df2c4f2fcd142fdd0902ba29b22fb7d59560128b3df493327afe
  • MD5: 6fb6cca682021897a41587ee171b1352
  • BLAKE2b-256: 41aa3383d35f8bbfe5a08daac46140c9febf4b6a195d6303410a689db22e64df
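If you want to check a downloaded archive against the digest above, a small Python snippet is enough (this assumes the file sits in the current directory):

import hashlib

# Compare the SHA256 digest of the local sdist with the published value.
with open("youtube_simple_scraper-0.0.8.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
print(digest == "84b8e530d9b8df2c4f2fcd142fdd0902ba29b22fb7d59560128b3df493327afe")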


File details

Details for the file youtube_simple_scraper-0.0.8-py3-none-any.whl.


File hashes

Hashes for youtube_simple_scraper-0.0.8-py3-none-any.whl:

  • SHA256: 0b1f78caca409c4176cdc5eaab5b9ccceb25b49af41d6066b11d8fafb71f0ea8
  • MD5: da185a3b2bf74f434c5d14c27bb7829b
  • BLAKE2b-256: ab1efca64e3f3e2b876a43b6a0a6dab2e8b4c96f610f9011947a702656336d3c

