Netflix parser

Project description

NetflixParser


Crawls Netflix's daily TOP 10 TV programs in Korea.


1. Data Crawling Info

#### Today's TOP 10 TV Programs in Korea

#### Columns
   - rank : position in the TOP 10 (used as the DataFrame index)
   - title : program title
   - Date : crawl date, as passed to the constructor
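
To make the table layout concrete, here is a minimal sketch that applies the same DataFrame steps as `NetflixParser.__init__` in section 4 to a handful of rows. The rows, the date string, and the helper name `build_table` are assumptions for illustration only; they are not real crawl results and are not part of the package.

```python
import pandas as pd

# Hypothetical scraped rows: placeholder titles, not real crawl results.
items_list = [
    {"title": "Program B", "rank": 2},
    {"title": "Program A", "rank": 1},
    {"title": "Program B", "rank": 2},  # exact duplicate of the first row
    {"title": "Program C", "rank": 3},
]


def build_table(items, crawl_date):
    """Hypothetical helper mirroring the DataFrame steps in the constructor."""
    df = pd.DataFrame(items)
    # Drop exact duplicate rows, then use the rank as the index.
    df = df.drop_duplicates(keep="last").set_index("rank")
    df["Date"] = crawl_date
    return df.sort_index()


# The date value is an arbitrary example.
table = build_table(items_list, "2020-03-09")
print(table)
```

The result is indexed by `rank` and has two columns, `title` and `Date`, with one row per distinct program.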

2. Package File

import requests
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.remote.webelement import WebElement

3. Installation

pip install NetflixParser

4. NetflixParser

import time

import requests
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.remote.webelement import WebElement


class NetflixParser:

    def __init__(self, datetime):
        # Log in, scrape the TOP 10 row, and always close the browser afterwards.
        self.driver = self.login()
        try:
            self.scan(self.driver)
        finally:
            self.driver.quit()

        # Build the result table: one row per program, indexed by rank.
        df = pd.DataFrame(self.items_list)
        df = df.drop_duplicates(keep='last').set_index('rank')
        df['Date'] = datetime
        self.df = df.sort_index()

    def login(self):
        driver = webdriver.Chrome()
        driver.set_window_size(1080, 800)
        url = 'https://www.netflix.com/kr/login?nextpage=https%3A%2F%2Fwww.netflix.com%2Fbrowse%2Fgenre%2F83'
        driver.get(url)
        driver.implicitly_wait(1)
        # Log in. login_id() and login_pw() are not defined in this listing;
        # they must be supplied separately and return the account ID and password.
        driver.find_element(By.CSS_SELECTOR, '#id_userLoginId').send_keys(login_id())
        driver.find_element(By.CSS_SELECTOR, '#id_password').send_keys(login_pw())
        driver.find_element(By.CSS_SELECTOR, '.btn').click()
        driver.implicitly_wait(3)
        # Select the second profile on the profile gate (the first one is commented out).
        # driver.find_element(By.CSS_SELECTOR, '#appMountPoint > div > div > div:nth-child(1) > div.bd.dark-background > div.profiles-gate-container > div > div > ul > li:nth-child(1) > div > a > div > div').click()
        driver.find_element(By.CSS_SELECTOR, '#appMountPoint > div > div > div:nth-child(1) > div.bd.dark-background > div.profiles-gate-container > div > div > ul > li:nth-child(2) > div > a > div > div').click()

        return driver

    def scan(self, driver):
        self.items_list = []
        # find_element raises NoSuchElementException if the "mostWatched" row is missing.
        row = driver.find_element(By.XPATH, '//div[@data-list-context="mostWatched"]')

        # Make sure the slider content is present before interacting with it.
        row.find_element(By.CSS_SELECTOR, '.rowContent .slider .sliderContent')
        # Click the slider handle twice, pausing in between, before collecting the links.
        row.find_element(By.CSS_SELECTOR, '.handle').click()
        time.sleep(2)
        row.find_element(By.CSS_SELECTOR, '.handle').click()
        items = row.find_elements(By.CSS_SELECTOR, 'div.ptrack-content a')

        for item in items:
            title = item.get_attribute("aria-label")
            # The rank is the number after the first hyphen in the SVG's xlink:href value.
            rank = item.find_elements(By.CSS_SELECTOR, "div > svg > use")[0].get_attribute("xlink:href")
            rank = int(rank.split('-')[1])
            self.items_list.append({"title": title, "rank": rank})

        return self.items_list
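
The listing calls `login_id()` and `login_pw()` without defining them. One possible way to supply them, sketched here as an assumption rather than the package's actual approach, is to read the credentials from environment variables so they stay out of the source. The variable names `NETFLIX_ID` and `NETFLIX_PW` and the helper `_read_env` are hypothetical.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
ID_VAR = "NETFLIX_ID"
PW_VAR = "NETFLIX_PW"


def _read_env(name):
    """Return the value of an environment variable, failing loudly if unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def login_id():
    # Account ID typed into the '#id_userLoginId' field.
    return _read_env(ID_VAR)


def login_pw():
    # Password typed into the '#id_password' field.
    return _read_env(PW_VAR)
```

With both variables exported in the shell before the script starts, the class can call these helpers unchanged, provided they are defined in (or imported into) the module that contains `NetflixParser`.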


Source Distribution

NetflixParser-0.1.1.tar.gz (2.6 kB)


File metadata

  • Size: 2.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.0.0.post20200309 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.6

File hashes

Hashes for NetflixParser-0.1.1.tar.gz

Algorithm    Hash digest
SHA256       93ac359b7b70a610f2bb537db0b6259996ee02e85bdad3b5cedd0b7c90ea040a
MD5          f26995b3807ef70dfcede19bf2ef3362
BLAKE2b-256  0c51bbc6bfb3e959500b0eb16f696b3b895754d7b3a44b26a084d25afbacbbdd
