
Clark University package for crawling YouTube and cleaning the data

Project description

clarku-youtube-crawler

Clark University YouTube crawler and JSON decoder for YouTube JSON data. Please read the documentation in DOCS.

Installing

To install:

pip install clarku-youtube-crawler

If any requirements are missing (all dependencies are already included, so this shouldn't happen), download requirements.txt from this repo and run:

pip install -r requirements.txt
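To confirm the install worked, you can ask the standard library which version is installed. This is a generic sketch using `importlib.metadata` (Python 3.8+), not part of clarku-youtube-crawler itself; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Use the distribution name exactly as passed to pip install
    found = installed_version("clarku-youtube-crawler")
    print(found if found else "clarku-youtube-crawler is not installed")
```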

Example usage

To initialize and run the crawlers:

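# your_script.py
import clarku_youtube_crawler as cu

# Keyword search crawl: "searchkey", 2 days starting 14 December 2020
test = cu.RawCrawler()
test.__build__()
test.crawl("searchkey", start_date=14, start_month=12, start_year=2020, day_count=2)
test.crawl_videos_in_list(comment_page_count=1)  # comment_page_count: pages of comments per video
test.merge_all()

# Channel crawl with a subscriber cutoff of 1000 and no keyword filter
channel = cu.ChannelCrawler()
channel.__build__()
channel.setup_channel(subscriber_cutoff=1000, keyword="")
channel.crawl()
channel.crawl_videos_in_list(comment_page_count=1)
channel.merge_all()

# Decode a merged raw file (adjust the path to your own crawl output)
jsonn = cu.JSONDecoder()
jsonn.load_json("YouTube_RAW_20201221/FINAL_raw_merged.json")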
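The `crawl` call above takes its search window as separate day, month, and year values plus a day count. A minimal sketch of how those arguments could map to a date range, assuming the window starts at 00:00 UTC on the start date and covers `day_count` whole days (check DOCS for the crawler's exact behavior). `search_window` is a hypothetical helper; the RFC 3339 format it produces is what the YouTube Data API's `publishedAfter` and `publishedBefore` search parameters accept.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def search_window(start_date: int, start_month: int, start_year: int,
                  day_count: int) -> Tuple[str, str]:
    """Return (published_after, published_before) as RFC 3339 UTC strings.

    Hypothetical helper: assumes the window begins at 00:00 UTC on the
    start date and spans day_count whole days.
    """
    start = datetime(start_year, start_month, start_date, tzinfo=timezone.utc)
    end = start + timedelta(days=day_count)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)


# Same arguments as the crawl("searchkey", ...) example above
after, before = search_window(start_date=14, start_month=12, start_year=2020, day_count=2)
print(after, before)  # 2020-12-14T00:00:00Z 2020-12-16T00:00:00Z
```

Using `timedelta` lets month and year boundaries roll over correctly, so a window starting on 31 December ends in January of the next year.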

Changelog

Version 0.0.1->0.0.3

These are untested beta releases, since Python packaging is a pain. Please don't install these versions.

Version 0.0.5

Finally figured out testing. It works okay. More documentation to come.

Version 0.0.6

Stable release for the RawCrawler feature only.

Version 1.0.0, Version 1.0.1

I think this might be our first full stable release.

Version 1.0.1.dev Pre-release

Added different file types for ChannelCrawler. Added documentation.


Download files


Source Distribution

clarku_youtube_crawler-1.0.5.tar.gz (11.2 kB)

Uploaded Source

Built Distribution


clarku_youtube_crawler-1.0.5-py3-none-any.whl (16.4 kB)

Uploaded Python 3
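The wheel's file name encodes its compatibility tags using the wheel naming convention `{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl`; here `py3-none-any` means a pure-Python wheel for Python 3 with no ABI requirement that installs on any platform. A small sketch that splits such a name; `parse_wheel_name` is a hypothetical helper, and for real use the third-party `packaging` library provides `packaging.utils.parse_wheel_filename`.

```python
from typing import Dict


def parse_wheel_name(filename: str) -> Dict[str, str]:
    """Split a wheel file name into its fields.

    Handles the optional build tag; does not validate the tag values.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, ver, py, abi, plat = parts
        build = ""
    elif len(parts) == 6:
        name, ver, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename}")
    return {"name": name, "version": ver, "build": build,
            "python": py, "abi": abi, "platform": plat}


info = parse_wheel_name("clarku_youtube_crawler-1.0.5-py3-none-any.whl")
print(info["python"], info["abi"], info["platform"])  # py3 none any
```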

File details

Details for the file clarku_youtube_crawler-1.0.5.tar.gz.

File metadata

  • Download URL: clarku_youtube_crawler-1.0.5.tar.gz
  • Upload date:
  • Size: 11.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.9.1

File hashes

Hashes for clarku_youtube_crawler-1.0.5.tar.gz
Algorithm Hash digest
SHA256 7adf8799c03f013e986ba3d217d2f7853bc3c7ef108aa5f8d3b10ff21b71050d
MD5 f2972cbdd7e87dd79b483d73ed8d5e81
BLAKE2b-256 69ca043828e672fbf125bc1e9a855e04ee821ee9d0ed4c7e1392548f3b5633bf

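To check a downloaded file against a published digest, compute its SHA256 locally and compare the two. A minimal sketch using Python's `hashlib`; `sha256_of` and `matches` are hypothetical helper names, and `EXPECTED` is the SHA256 listed above for the source distribution.

```python
import hashlib
import os


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """True if the file's SHA256 equals the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()


EXPECTED = "7adf8799c03f013e986ba3d217d2f7853bc3c7ef108aa5f8d3b10ff21b71050d"

if __name__ == "__main__":
    path = "clarku_youtube_crawler-1.0.5.tar.gz"  # assumed to be in the current directory
    if os.path.exists(path):
        print("OK" if matches(path, EXPECTED) else "MISMATCH")
    else:
        print(f"{path} not found")
```

Reading in chunks keeps memory use constant regardless of file size; the same check works for the wheel with its own SHA256 value.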

File details

Details for the file clarku_youtube_crawler-1.0.5-py3-none-any.whl.

File metadata

  • Download URL: clarku_youtube_crawler-1.0.5-py3-none-any.whl
  • Upload date:
  • Size: 16.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.9.1

File hashes

Hashes for clarku_youtube_crawler-1.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 b16db31b448b66c0d5ee771c195942b6dc1b34bf4bcf772fb3cb0c211b36b795
MD5 d5a31c5cbfd0b99378d4871613764ec6
BLAKE2b-256 0b342788d655e0eb4ebfdd5c1b54782089268c9fcf7d2d4836abaf35160640c2

