anywebscraper
Have you ever tried to access data from a number of sites, only to be blocked? The usual workaround is to route requests through proxies, which may or may not work. Don't worry: anywebscraper is designed to solve exactly this problem.
anywebscraper is your ultimate web data extraction companion. We understand the challenges of web scraping, including dealing with anti-bot measures and the need for reliable proxies. Our library is designed to make web scraping easy and effective, even in the most challenging environments. Whether you're a data scientist, business analyst, or developer looking to harness the power of web data, anywebscraper has you covered.
We benchmarked our solution against websites such as https://www.g2.com and https://www.investing.com. The results were satisfactory: we were able to scrape all the data from these sites without any problems.
Key Features
- No Proxy Integration: Say goodbye to IP bans and blocked access. anywebscraper works without proxies while still ensuring uninterrupted data retrieval.
- Universal Compatibility: Our library can scrape data from virtually any type of website, including e-commerce platforms, social media networks, news sites, and more.
- Customizable and Scalable: If you're familiar with BeautifulSoup, bingo! anywebscraper returns a BeautifulSoup object and is highly customizable to meet your specific scraping needs.
- Reliable Support: Count on our dedicated support team to assist you with any questions or issues you encounter.
Usage
```bash
pip install anywebscraper
```
```python
from anywebscraper import AnyWebScraper

# Initialization
webscraper = AnyWebScraper()

# Define the URL of the website you want to scrape (replace with your own)
url = "https://google.com"
soup = webscraper.webScrape(url=url)  # Returns a BeautifulSoup object

# Replace the placeholders with the tag name and CSS class you are looking for
soup.find("your-html-element", class_="your-class")
```
Note that all BeautifulSoup methods can be applied to the soup object returned by the library, such as .prettify(), .find(), .find_all() and .get_text().
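Since the returned object is a BeautifulSoup object, the standard bs4 methods listed above should behave as usual. The sketch below runs them on a small hard-coded HTML string so it works offline, without anywebscraper or a network connection. It assumes the `beautifulsoup4` package is installed and uses Python's built-in `html.parser`; the markup, class names and values are made up for illustration and do not reflect any real site's page structure.

```python
from bs4 import BeautifulSoup

# Made-up markup and values, standing in for a scraped page (illustration only)
HTML = """
<html>
  <body>
    <h1 class="title">Bitcoin</h1>
    <span class="price">12,345.67</span>
    <table id="history">
      <tr><td>2024-01-01</td><td>100.00</td></tr>
      <tr><td>2024-01-02</td><td>101.00</td></tr>
    </table>
  </body>
</html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# .find() returns the first matching tag, or None if nothing matches
title = soup.find("h1", class_="title").get_text(strip=True)

# .get_text(strip=True) returns the tag's text without surrounding whitespace
price = soup.find("span", class_="price").get_text(strip=True)

# .find_all() returns every matching tag; here, each table row and its cells
rows = [
    [cell.get_text(strip=True) for cell in row.find_all("td")]
    for row in soup.find("table", id="history").find_all("tr")
]

# .prettify() returns the parsed document as indented HTML
print(soup.prettify())
```

With a soup returned by `webScrape()`, the same calls apply; only the tag names and classes change to match the target page.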
Example
Let's try to collect historical Bitcoin data from https://www.investing.com using the classic approach (Python's built-in urllib), as shown in this image:
To do that, we can use the Python code below:
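```python
from urllib.request import urlopen

url = "https://www.investing.com/crypto/bitcoin"
# Request the page with urllib's default settings (no custom headers)
page = urlopen(url)
# Read the raw bytes and decode them into a string
html_bytes = page.read()
html = html_bytes.decode("utf-8")
print(html)
```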
Instead of the page content, the request fails with an HTTP 403 Forbidden error, as shown in this image.
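A 403 means the server refused the request itself. One common cause, offered here as a hedged explanation rather than a diagnosis of investing.com specifically, is that urllib identifies itself with a `Python-urllib/3.x` User-Agent header, which many sites block outright. The standard-library sketch below shows how to inspect that default and send a browser-like header instead. The header string and the helper names (`default_user_agent`, `build_request`, `fetch_html`) are illustrative assumptions; this is not meant to describe how anywebscraper works internally, and sites with stronger anti-bot protection may still refuse the request.

```python
import urllib.request
from typing import Optional
from urllib.error import HTTPError

# Illustrative browser-like User-Agent (an assumption; not taken from anywebscraper)
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def default_user_agent() -> str:
    """Return the User-Agent that urllib sends when no header is set."""
    opener = urllib.request.build_opener()
    return dict(opener.addheaders)["User-agent"]


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Build a GET request whose User-Agent overrides urllib's default."""
    # urllib normalizes header names with str.capitalize(), so this is stored as "User-agent"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str) -> Optional[str]:
    """Fetch a page as text; report HTTP errors such as 403 instead of raising."""
    try:
        with urllib.request.urlopen(build_request(url)) as page:
            return page.read().decode("utf-8")
    except HTTPError as err:
        print(f"Request failed: HTTP {err.code} {err.reason}")
        return None
```

For example, `print(default_user_agent())` shows the header urllib sends by default, and `fetch_html("https://www.investing.com/crypto/bitcoin")` returns `None` (after printing the status) if the site still rejects the request.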
Now let's use anywebscraper to see the difference.
```python
from anywebscraper import AnyWebScraper

url = "https://www.investing.com/crypto/bitcoin"
web_scraper = AnyWebScraper()
# webScrape() returns a BeautifulSoup object rather than raw HTML
soup = web_scraper.webScrape(url=url)
print(soup.prettify())
```
This time there is no Forbidden error, and the library returns the content of the page. All that remains is to extract the relevant data with the usual BeautifulSoup methods.
Contribution
Pull requests are welcome. For significant changes, please open an issue first to discuss what you want to change.
Download files
File details
Details for the file anywebscraper-0.1.3.tar.gz.
File metadata
- Download URL: anywebscraper-0.1.3.tar.gz
- Upload date:
- Size: 422.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.7.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f67f7cbb19e491f703b8cd63af7547a1ed8b718dd71e8381462df701dcc307b6 |
| MD5 | 9df9932b1ce419c47525c2ea06897432 |
| BLAKE2b-256 | 5678998b59d633907813b63dc8a7860788630deefce65c6534685ba3f803bc68 |
File details
Details for the file anywebscraper-0.1.3-py3-none-any.whl.
File metadata
- Download URL: anywebscraper-0.1.3-py3-none-any.whl
- Upload date:
- Size: 4.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.7.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8224d39976a9b9ee24416a3e3fc06b3f8b7386b44f08114be4c2b51cc1af7db8 |
| MD5 | 3690bbefb08ba5cf058d2a343d74308b |
| BLAKE2b-256 | 6eaf78de6b13ac6113491d395b2e7267eed154cb5ca5ce8ee929d30b41e81e98 |
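The published digests let you check that a downloaded file is intact before installing it. A minimal sketch using Python's `hashlib`, assuming the archive has already been downloaded to a local path; the helper names `sha256_of` and `verify_download` are illustrative and not part of anywebscraper. The file is hashed in chunks so large archives do not need to fit in memory.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file-hash tables on this page
PUBLISHED_SHA256 = {
    "anywebscraper-0.1.3.tar.gz": "f67f7cbb19e491f703b8cd63af7547a1ed8b718dd71e8381462df701dcc307b6",
    "anywebscraper-0.1.3-py3-none-any.whl": "8224d39976a9b9ee24416a3e3fc06b3f8b7386b44f08114be4c2b51cc1af7db8",
}


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its filename."""
    name = Path(path).name
    if name not in PUBLISHED_SHA256:
        raise KeyError(f"no published SHA256 digest for {name!r}")
    return sha256_of(path) == PUBLISHED_SHA256[name]
```

For example, `verify_download("anywebscraper-0.1.3.tar.gz")` returns `True` only if the local file matches the SHA256 listed above. pip's hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file) performs the same check at install time.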