
Project description

anywebscraper

Have you ever tried to access data from a website only to find yourself blocked? The usual workaround is to use proxies, which may or may not work. Don't worry: anywebscraper is the solution to these problems.

anywebscraper is your ultimate web data extraction companion. We understand the challenges of web scraping, including dealing with anti-bot measures and the need for reliable proxies. Our library is designed to make web scraping easy and effective, even in the most challenging environments. Whether you're a data scientist, business analyst, or developer looking to harness the power of web data, anywebscraper has you covered.

We benchmarked our solution against websites such as https://www.g2.com and https://www.investing.com. The results were satisfactory: we successfully scraped all the data without any problems.

Key Features

  • No Proxy Integration: Say goodbye to IP bans and blocked access. anywebscraper needs no proxy integration, yet still ensures uninterrupted data retrieval.

  • Universal Compatibility: Our library can scrape data from virtually any type of website, including e-commerce platforms, social media networks, news sites, and more.

  • Customizable and Scalable: If you're familiar with BeautifulSoup, bingo! anywebscraper returns a BeautifulSoup-like object and is highly customizable to meet your specific scraping needs.

  • Reliable Support: Count on our dedicated support team to assist you with any questions or issues you encounter.

Usage

pip install anywebscraper

from anywebscraper import AnyWebScraper

# Initialization
webscraper = AnyWebScraper()

# Define the URL of the website you want to scrape
url = "your-website URL"  # e.g. url = "https://google.com"
soup = webscraper.webScrape(url=url)  # Returns a 'BeautifulSoup' object
soup.find("your-html-element", class_="your-class")

Note that all BeautifulSoup methods can be applied to the soup object that our library returns: .prettify(), .find(), .find_all(), .get_text(), etc.
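To illustrate what those methods look like in practice, here is a minimal sketch using BeautifulSoup directly on a small inline HTML snippet. The snippet is made up for demonstration; with anywebscraper you would call these same methods on the object returned by webScrape.

```python
from bs4 import BeautifulSoup

# A small inline HTML snippet standing in for a scraped page.
html = """
<html><body>
  <h1 class="title">Bitcoin</h1>
  <ul>
    <li class="price">42000</li>
    <li class="price">43000</li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# .find() returns the first matching element; .get_text() extracts its text.
print(soup.find("h1", class_="title").get_text())  # Bitcoin

# .find_all() returns every matching element.
print([li.get_text() for li in soup.find_all("li", class_="price")])  # ['42000', '43000']
```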

Example

Let's try to collect historical Bitcoin data from https://www.investing.com using the classic method, as shown in this image:

[Image: Bitcoin Historical Data]

To do that, we can use the Python code below:

from urllib.request import urlopen

url = "https://www.investing.com/crypto/bitcoin"
page = urlopen(url)  # fails for this site with HTTP Error 403: Forbidden
html_bytes = page.read()
html = html_bytes.decode("utf-8")
print(html)

An HTTP 403 error is raised, stating that it is forbidden to request this link, as shown in this image:

[Image: Error from Classical Method]
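If you do stick with plain urllib, that failure can at least be caught explicitly instead of crashing the script. A minimal sketch (the fetch helper below is our own illustration, not part of anywebscraper):

```python
from urllib.request import urlopen
from urllib.error import HTTPError

def fetch(url):
    """Return the decoded page body, or None if the server refuses the request."""
    try:
        with urlopen(url) as page:
            return page.read().decode("utf-8")
    except HTTPError as err:
        print(f"HTTP error {err.code}: {err.reason}")
        return None

# Per the error above, fetch("https://www.investing.com/crypto/bitcoin")
# would print an HTTP 403 message and return None.
```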

Now let's use anywebscraper to see the difference.

from anywebscraper import AnyWebScraper

url = "https://www.investing.com/crypto/bitcoin"
web_scraper = AnyWebScraper()

html = web_scraper.webScrape(url=url)
print(html.prettify())

[Image: AnyWebScraper Method]

We can see that there is no Forbidden error and that our library returns the data from this link. Now all we need to do is extract the relevant data using the appropriate BeautifulSoup methods.
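As a sketch of that extraction step, suppose the returned page contained a simple historical-data table. The HTML below is a made-up stand-in; the real class names and structure on investing.com will differ.

```python
from bs4 import BeautifulSoup

# Stand-in HTML mimicking a historical-data table.
html = """
<table class="history">
  <tr><th>Date</th><th>Price</th></tr>
  <tr><td>2024-01-02</td><td>45000</td></tr>
  <tr><td>2024-01-01</td><td>44000</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Skip the header row, then pull (date, price) pairs from each data row.
rows = []
for tr in soup.find("table", class_="history").find_all("tr")[1:]:
    date, price = [td.get_text() for td in tr.find_all("td")]
    rows.append((date, float(price)))

print(rows)  # [('2024-01-02', 45000.0), ('2024-01-01', 44000.0)]
```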

Contribution

Pull requests are welcome. For significant changes, please open an issue first to discuss what you want to change.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

anywebscraper-0.1.3.tar.gz (422.9 kB)

Uploaded Source

Built Distribution

anywebscraper-0.1.3-py3-none-any.whl (4.6 kB)

Uploaded Python 3

File details

Details for the file anywebscraper-0.1.3.tar.gz.

File metadata

  • Download URL: anywebscraper-0.1.3.tar.gz
  • Upload date:
  • Size: 422.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.15

File hashes

Hashes for anywebscraper-0.1.3.tar.gz
  • SHA256: f67f7cbb19e491f703b8cd63af7547a1ed8b718dd71e8381462df701dcc307b6
  • MD5: 9df9932b1ce419c47525c2ea06897432
  • BLAKE2b-256: 5678998b59d633907813b63dc8a7860788630deefce65c6534685ba3f803bc68
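To check a downloaded file against the published digests above, the standard-library hashlib module can be used. A minimal sketch:

```python
import hashlib

def sha256_of_file(path):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the result against the SHA256 value listed above, e.g.:
# sha256_of_file("anywebscraper-0.1.3.tar.gz") == "f67f7cbb19e4..."
```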


File details

Details for the file anywebscraper-0.1.3-py3-none-any.whl.

File metadata

File hashes

Hashes for anywebscraper-0.1.3-py3-none-any.whl
  • SHA256: 8224d39976a9b9ee24416a3e3fc06b3f8b7386b44f08114be4c2b51cc1af7db8
  • MD5: 3690bbefb08ba5cf058d2a343d74308b
  • BLAKE2b-256: 6eaf78de6b13ac6113491d395b2e7267eed154cb5ca5ce8ee929d30b41e81e98

