Clark University package for crawling YouTube and cleaning the collected data

Project description

clarku-youtube-crawler

Clark University YouTube crawler and JSON decoder for YouTube JSON. Please read the documentation in DOCS.

Installing

To install,

pip install clarku-youtube-crawler

The crawler depends on several other packages. All dependencies are already declared in the package, so pip should install them automatically. If a requirement is still missing, download requirements.txt, navigate to the folder that contains it, and run

pip install -r requirements.txt
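
To confirm that the install succeeded, a short check like the one below can help. It is a minimal sketch that uses only the standard library (importlib.metadata, Python 3.8+); the helper name installed_version is made up for this illustration and is not part of clarku-youtube-crawler.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The distribution name is the one passed to pip above.
    version = installed_version("clarku-youtube-crawler")
    if version is None:
        print("clarku-youtube-crawler is not installed in this environment")
    else:
        print(f"clarku-youtube-crawler {version} is installed")
```

If the check reports that the package is missing, the most likely cause is that pip installed into a different Python environment than the one running the script.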

Example usage

To initialize,

# your_script.py
import clarku_youtube_crawler as cu

test = cu.RawCrawler()
test.__build__()
test.crawl("searchkey", start_date=14, start_month=12, start_year=2020, day_count=2)
test.crawl_videos_in_list(comment_page_count=1)
test.merge_all()

channel = cu.ChannelCrawler()
channel.__build__()
channel.setup_channel(subscriber_cutoff=1000, keyword="")
channel.crawl()
channel.crawl_videos_in_list(comment_page_count=1)
channel.merge_all()

jsonn = cu.JSONDecoder()
jsonn.load_json("YouTube_RAW_20201221/FINAL_raw_merged.json")
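
Before handing a merged file to the decoder, it can be useful to look at its overall shape. The sketch below uses only the standard json module and does not call the package. It assumes the merged file is a single valid JSON document, which this page does not confirm, and summarize_json is a hypothetical helper name.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and describe its top-level shape without assuming a schema."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    summary = {
        "type": type(data).__name__,
        # Only containers have a meaningful size.
        "size": len(data) if isinstance(data, (list, dict)) else None,
    }
    if isinstance(data, dict):
        # Show a few keys so the structure can be inspected by eye.
        summary["sample_keys"] = sorted(data)[:5]
    return summary


if __name__ == "__main__":
    # Path taken from the example above; adjust it to your own crawl folder.
    merged = Path("YouTube_RAW_20201221/FINAL_raw_merged.json")
    if merged.exists():
        print(summarize_json(merged))
    else:
        print(f"{merged} not found; run the crawler first")
```

If json.load raises an error here, the assumption about the file format does not hold and the file should be inspected manually.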

Changelog

Version 0.0.1->0.0.3

These are untested beta versions, since Python packaging is a pain. Please don't install them.

Version 0.0.5

Finally figured out testing. It works okay. More documentation to come.

Version 0.0.6

Stable release for the RawCrawler feature only.

Version 1.0.0 Version 1.0.1

I think this might be our first full stable release.

Version 1.0.1.dev Pre-release

Added different file types for ChannelCrawler. Added documentation.

Download files

Source Distribution

clarku_youtube_crawler-1.1.13.tar.gz (13.5 kB)

Uploaded Source

Built Distribution

clarku_youtube_crawler-1.1.13-py3-none-any.whl (18.0 kB)

Uploaded Python 3

File details

Details for the file clarku_youtube_crawler-1.1.13.tar.gz.

File metadata

  • Download URL: clarku_youtube_crawler-1.1.13.tar.gz
  • Upload date:
  • Size: 13.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.8.1

File hashes

Hashes for clarku_youtube_crawler-1.1.13.tar.gz
Algorithm Hash digest
SHA256 34044fc8c5c6b1fc2a27ad50b034c05c411f59fed89465713109fb49a64026c8
MD5 dda6552c0f5c7a577cf56e74aad83c66
BLAKE2b-256 38315861ff23213e1fdd61304b4ebddffd7dac98189d1912a0e88b1f10eb2550
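
The digests above can be used to check that a downloaded archive is intact. The following is a minimal sketch using the standard hashlib module; the helper names sha256_of_file and matches_digest and the local file location are assumptions made for illustration, while the expected SHA256 value is the one listed above.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for the source distribution.
EXPECTED_SHA256 = "34044fc8c5c6b1fc2a27ad50b034c05c411f59fed89465713109fb49a64026c8"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True when the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed location: the archive sits in the current working directory.
    archive = Path("clarku_youtube_crawler-1.1.13.tar.gz")
    if archive.exists():
        print("digest OK" if matches_digest(archive, EXPECTED_SHA256) else "digest MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same functions work for the wheel file when given its own SHA256 digest.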

File details

Details for the file clarku_youtube_crawler-1.1.13-py3-none-any.whl.

File metadata

  • Download URL: clarku_youtube_crawler-1.1.13-py3-none-any.whl
  • Upload date:
  • Size: 18.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.8.1

File hashes

Hashes for clarku_youtube_crawler-1.1.13-py3-none-any.whl
Algorithm Hash digest
SHA256 8639d76deb9628fc6bc742b4adc67f313ad17093edffabe161ab25d4d7b3ff49
MD5 c2fe0fb3ca9f8ad73d9c7c3eefc6c872
BLAKE2b-256 9e1739446c3307ec15f8f10d44aea7559ef03cdae1ee533a19f2882622df2c14
