Get the HTML of a page as a string

Project description

Fetch a page's HTML from its URL and return it as a string.

# Single-page websites also work
# Get the HTML of a page
from scraping.scraper import PageSources

page = PageSources('https://...')

print(page.get_current_html())
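
Since the result is an ordinary Python string, it can be handed to any HTML parser. A minimal sketch using only the standard library; the `TitleParser` class, the `extract_title` helper, and the sample `html` string are illustrative and not part of this package:

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is recorded.
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_data(self, data):
        if self._in_title:
            self.title += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False


def extract_title(html):
    """Return the stripped page title, or None if there is no <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip() if parser.title is not None else None


# Stand-in for the string returned by page.get_current_html()
html = "<html><head><title> Example page </title></head><body></body></html>"
print(extract_title(html))
```

In real use, the `html` variable would hold the value returned by `page.get_current_html()`.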

# Save the HTML in a directory called web_data
from scraping.scraper import PageSources

page = PageSources('https://...')
page.get_current_html()
page.save()
# page.save(directory='web_page') default

# Multiple links
from scraping.scraper import PageSources

lista = ['https://...','https://...']

page = PageSources()
page.get_multiple_html(lista)

Each saved file is named after the page's host, followed by the number of files in the directory, for example:
-> web_data
    -hostPage_1.html
    -hostPage_2.html
    -hostPage_3.html
    ...
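
The naming scheme above could be reproduced roughly as follows. This is a sketch, not the package's actual code: the `next_filename` helper is hypothetical, and it assumes the counter is the number of entries already in the directory plus one.

```python
import os
import tempfile
from urllib.parse import urlparse


def next_filename(url, directory="web_data"):
    """Hypothetical helper: build '<host>_<n>.html' for the next saved page."""
    host = urlparse(url).hostname or "page"
    os.makedirs(directory, exist_ok=True)
    # Assumption: n is one more than the number of entries already present.
    count = len(os.listdir(directory)) + 1
    return os.path.join(directory, f"{host}_{count}.html")


# Demonstrate in a throwaway directory so nothing is written to the project.
with tempfile.TemporaryDirectory() as tmp:
    first = next_filename("https://example.com/a", directory=tmp)
    open(first, "w").close()  # create the file so the next count increases
    second = next_filename("https://example.com/b", directory=tmp)
    print(os.path.basename(first), os.path.basename(second))
```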

# Saving data to a CSV file
from scraping.scraper import PageSources

page = PageSources()

dict_data = [
    {'name':'a','page':'b'},
    {'name':'a','page':'b'},
    {'name':'a','page':'b'}
]

page.save_csv(dict_data)
# Signature: def save_csv(self, dict_data, outfile='output.csv', open_file='w')
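
For readers curious what a method with that signature might do, here is a rough equivalent built on the standard `csv` module. It is an assumption about the behavior, not the package's source; in particular, taking the column names from the first row's keys and writing the header only in `'w'` mode are guesses.

```python
import csv
import os
import tempfile


def save_csv(dict_data, outfile="output.csv", open_file="w"):
    """Write a list of dicts to a CSV file (illustrative stand-in)."""
    if not dict_data:
        return
    fieldnames = list(dict_data[0].keys())  # assumed: header from first row
    with open(outfile, open_file, newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if open_file == "w":
            # Skip the header when appending to an existing file.
            writer.writeheader()
        writer.writerows(dict_data)


rows = [{"name": "a", "page": "b"}, {"name": "c", "page": "d"}]
path = os.path.join(tempfile.mkdtemp(), "output.csv")
save_csv(rows, outfile=path)

# Read the file back to confirm the round trip.
with open(path, newline="", encoding="utf-8") as f:
    written = list(csv.DictReader(f))
print(written)
```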

# Using a proxy
from scraping.scraper import PageSources

# Proxy format: IP:PORT or HOST:PORT, e.g. "158.69.25.178:32769"
page = PageSources('https://andycode.ga', headless=False, proxy='158.69.25.178:32769')
page.get_current_html()
page.save()
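
Because the package drives Chrome, the `headless` and `proxy` arguments plausibly end up as Chrome command-line switches. The sketch below shows that mapping with a hypothetical `build_chrome_args` helper; `--headless` and `--proxy-server` are real Chrome switches, but how this package wires them internally is an assumption.

```python
def build_chrome_args(headless=True, proxy=None):
    """Hypothetical helper: translate options into Chrome command-line switches."""
    args = []
    if headless:
        args.append("--headless")
    if proxy:
        # Expect IP:PORT or HOST:PORT, as described above.
        host, sep, port = proxy.rpartition(":")
        if not (host and sep and port.isdigit()):
            raise ValueError(f"proxy must look like HOST:PORT, got {proxy!r}")
        args.append(f"--proxy-server={proxy}")
    return args


print(build_chrome_args(headless=False, proxy="158.69.25.178:32769"))
```

With Selenium, each resulting switch would typically be passed to `Options.add_argument` before the driver is created.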

This package requires ChromeDriver, the driver for Google Chrome.

To find out which version of Google Chrome you have, use the browser's "Help" menu:

  • Open a browser window.
  • Click the three dots in the upper-right corner.
  • Choose "Help" from the drop-down menu.
  • Click "About Google Chrome".

Go to https://chromedriver.chromium.org/downloads, select the release that matches your Chrome version and operating system, and download it.

The download contains the chromedriver executable. Copy it into the root directory of your project.
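
ChromeDriver has to match the major version of the installed browser, which is what the version check above is for. A small sketch of that comparison follows; the `major_version` and `driver_matches` helpers and the sample version strings are illustrative only.

```python
import re


def major_version(version_text):
    """Extract the major number from text such as 'Google Chrome 91.0.4472.124'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def driver_matches(chrome_version, driver_version):
    """Return True when browser and driver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


# Sample strings in the style printed by the `--version` flag of each program.
print(driver_matches("Google Chrome 91.0.4472.124", "ChromeDriver 91.0.4472.101"))
print(driver_matches("Google Chrome 91.0.4472.124", "ChromeDriver 90.0.4430.24"))
```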

Download files

Source distribution: scrape-html-0.0.24.tar.gz (4.4 kB)
Built distribution: scrape_html-0.0.24-py3-none-any.whl (4.9 kB, Python 3)
