Get the HTML of a page as a string
Generate an HTML string from a URL
# Single-page websites also work
# Get the HTML from a page
from scraping.scraper import PageSources
page = PageSources('https://...')
print(page.get_current_html())
# Save the data in a directory called web_data
from scraping.scraper import PageSources
page = PageSources('https://...')
page.get_current_html()
page.save()
# page.save(directory='web_page') default
# Multiple links
from scraping.scraper import PageSources
lista = ['https://...','https://...']
page = PageSources()
page.get_multiple_html(lista)
When a file is created, it is named after the host page followed by the number of files in your directory, for example:
-> web_data
-hostPage_1.html
-hostPage_2.html
-hostPage_3.html
...
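The naming rule above could be sketched roughly as follows. This is only an illustration: `next_html_filename` is a hypothetical helper, not part of the package, and both the way the host label is derived and the "count plus one" rule are assumptions.

```python
import os
from urllib.parse import urlparse


def next_html_filename(url, directory="web_data"):
    """Return a name such as 'example_3.html' for the next saved page.

    Hypothetical helper: the host label and counting rule are assumptions.
    """
    host = urlparse(url).hostname or "page"
    # Drop a leading "www." and keep the first label,
    # e.g. "example" from "www.example.com".
    if host.startswith("www."):
        host = host[4:]
    label = host.split(".")[0]
    # Make sure the target directory exists before counting its files.
    os.makedirs(directory, exist_ok=True)
    existing = [name for name in os.listdir(directory) if name.endswith(".html")]
    return f"{label}_{len(existing) + 1}.html"
```

Calling it on an empty directory would give a name ending in `_1.html`, and each saved file raises the number by one.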
# Save a CSV file
from scraping.scraper import PageSources
page = PageSources()
dict_data = [
{'name':'a','page':'b'},
{'name':'a','page':'b'},
{'name':'a','page':'b'}
]
page.save_csv(dict_data)
# Signature: def save_csv(self, dict_data, outfile='output.csv', open_file='w')
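For readers curious what a function with that signature might do, here is a minimal standalone sketch built on the standard `csv` module. It is not the package's implementation: taking the header from the first dict and skipping the header when appending are both assumptions.

```python
import csv


def save_csv(dict_data, outfile="output.csv", open_file="w"):
    """Write a list of dicts to a CSV file, one row per dict.

    Standalone sketch; the real method's behavior may differ.
    """
    if not dict_data:
        return
    # Assumption: every dict has the same keys as the first one.
    fieldnames = list(dict_data[0].keys())
    with open(outfile, open_file, newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        # Write the header only when starting a fresh file, not when appending.
        if open_file == "w":
            writer.writeheader()
        writer.writerows(dict_data)
```

Passing `open_file='a'` appends rows to an existing file, which is presumably why the mode is exposed as a parameter.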
# Using a proxy
from scraping.scraper import PageSources
# Proxy format: 'IP:PORT' or 'HOST:PORT', for example '158.69.25.178:32769'
page = PageSources('https://andycode.ga', headless=False, proxy='158.69.25.178:32769')
page.get_current_html()
page.save()
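The `headless` and `proxy` options most likely end up as Chrome command-line switches. The sketch below shows one way that mapping could look. `build_chrome_arguments` is a hypothetical helper, and whether the package builds its options this way is an assumption; `--headless` and `--proxy-server` themselves are real Chrome switches.

```python
def build_chrome_arguments(headless=True, proxy=None):
    """Return Chrome command-line switches for the given settings.

    Hypothetical helper, shown only to illustrate the idea.
    """
    arguments = []
    if headless:
        # Run Chrome without opening a visible window.
        arguments.append("--headless")
    if proxy:
        # Chrome accepts HOST:PORT (optionally with a scheme) for this switch.
        arguments.append(f"--proxy-server={proxy}")
    return arguments


print(build_chrome_arguments(headless=False, proxy="158.69.25.178:32769"))
# prints ['--proxy-server=158.69.25.178:32769']
```

With Selenium, each of these strings would typically be passed to `ChromeOptions.add_argument` before the driver is started.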
It needs a Google Chrome driver (ChromeDriver)
To check which version of Google Chrome you have, look at the browser information in the "Help" section:
- Open a window in the browser.
- Click the three dots in the upper right.
- Choose the "Help" option from the drop-down menu.
- Click "About Google Chrome".
Go to https://chromedriver.chromium.org/downloads, select the build that matches your Chrome version and operating system, and download it.
The download is an archive containing the chromedriver executable.
Copy it into the root of your project.
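Before running the scraper, it can help to confirm that the driver is where it is expected. The following is a small sketch using only the standard library. `find_chromedriver` is a hypothetical helper, and the fallback to `PATH` is an assumption about how the driver might also be located.

```python
import shutil
import sys
from pathlib import Path


def find_chromedriver(project_root="."):
    """Return the path to a chromedriver executable, or None if not found."""
    # The executable carries an .exe suffix on Windows.
    name = "chromedriver.exe" if sys.platform.startswith("win") else "chromedriver"
    local = Path(project_root) / name
    if local.is_file():
        return str(local)
    # Fall back to a driver installed somewhere on PATH.
    return shutil.which(name)


if __name__ == "__main__":
    print(find_chromedriver() or "chromedriver not found")
```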
Source Distribution
scrape-html-0.0.24.tar.gz (4.4 kB)
Built Distribution
Hashes for scrape_html-0.0.24-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ae47d278f4240d77dddd8244f53a1f9f1858b4dcd7cd9908b5cbeb3ae8de3f4f
MD5 | c72dace6550c6c85e610f6d5367f590f
BLAKE2b-256 | 297a3b12a3baff018f4a699a5854086c45c9425a28351c5324e80ec03a53959f