
Using the ProxyCrawl API to scrape SimilarWeb data


# similarweb_scraper

similarweb_scraper is a Python library for scraping SimilarWeb through the ProxyCrawl API; so far it has been able to get past SimilarWeb's Distil bot protection. It also provides functions for transforming the scraped data into pandas DataFrames.
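To make the scraping route concrete, here is a minimal sketch of how a request for a SimilarWeb page can be wrapped in a ProxyCrawl API call. It assumes ProxyCrawl's Crawling API URL form `https://api.proxycrawl.com/?token=...&url=...` and SimilarWeb's `https://www.similarweb.com/website/<domain>/` page layout. Neither URL, nor the hypothetical helpers `similarweb_page_url` and `proxycrawl_request_url`, is taken from this package, whose internals may differ.

```python
from urllib.parse import urlencode

# Assumed ProxyCrawl Crawling API endpoint (not read from this package's source).
PROXYCRAWL_ENDPOINT = "https://api.proxycrawl.com/"


def similarweb_page_url(domain: str) -> str:
    """Hypothetical helper: SimilarWeb overview page for a domain (assumed URL layout)."""
    # Drop any scheme and surrounding slashes so "https://hk.yahoo.com/" and "hk.yahoo.com" match.
    domain = domain.split("://", 1)[-1].strip("/")
    return f"https://www.similarweb.com/website/{domain}/"


def proxycrawl_request_url(token: str, target_url: str) -> str:
    """Hypothetical helper: wrap target_url in a ProxyCrawl request URL."""
    # urlencode percent-encodes the target URL (":" -> %3A, "/" -> %2F) so it fits in a query string.
    return PROXYCRAWL_ENDPOINT + "?" + urlencode({"token": token, "url": target_url})


if __name__ == "__main__":
    page = similarweb_page_url("hk.yahoo.com")
    print(proxycrawl_request_url("YOUR_PROXYCRAWL_API_KEY", page))
```

Fetching the printed URL (for example with `urllib.request.urlopen`) and a real token would return the proxied page HTML; the sketch stays offline and only builds the URL.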

## Installation

Use the package manager pip to install similarweb-scraper:

```bash
pip install similarweb-scraper
```
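Note that the pip distribution name is `similarweb-scraper` while the importable module is `similarweb_scraper`. A minimal sketch for confirming the install, using a hypothetical helper `is_installed` built on the standard library's `importlib.util.find_spec`:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Hypothetical helper: True if module_name can be found in the current environment."""
    # find_spec locates a top-level module without executing its code.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Check the module name, not the pip distribution name.
    print(is_installed("similarweb_scraper"))
```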

## Usage

```python
from similarweb_scraper import scraper

# Fetch the SimilarWeb page for a website
web_scrape = scraper()
web_scrape.login("YOUR_PROXYCRAWL_API_KEY")  # API key from proxycrawl.com
web_scrape.webpage_scrape("hk.yahoo.com")  # website to look up, e.g. hk.yahoo.com

# The HTML of the scraped page
soup = web_scrape.og_soup
# The HTML content in JSON format
web_json = web_scrape.json_storage

# Load one metric into a pandas DataFrame
df = web_scrape.metrics_to_df("country_share")
# Supported metrics_type values:
#   'country_share'
#   'traffic_share'
#   'engagement'
#   'monthly_traffic_data'
# More functions will be available soon.
```
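Because `metrics_to_df` takes the metric name as a plain string, a typo only surfaces once the call runs. The sketch below, with hypothetical helpers `check_metric_type` and `share_to_records`, validates the name against the four types listed above and shows one way a share-style metric could be flattened into row records for pandas. The `{label: share}` input shape is an assumption for illustration, not the format this library actually returns.

```python
from typing import Dict, List

# Metric names listed in this README for metrics_to_df.
METRIC_TYPES = ("country_share", "traffic_share", "engagement", "monthly_traffic_data")


def check_metric_type(metrics_type: str) -> str:
    """Hypothetical helper: raise ValueError for a metric name the README does not list."""
    if metrics_type not in METRIC_TYPES:
        raise ValueError(f"unknown metrics_type {metrics_type!r}; expected one of {METRIC_TYPES}")
    return metrics_type


def share_to_records(shares: Dict[str, float], key_name: str) -> List[dict]:
    """Hypothetical helper: turn an assumed {label: share} mapping into row dicts."""
    rows = [{key_name: label, "share": value} for label, value in shares.items()]
    # Largest share first, so the top country or traffic source leads the table.
    rows.sort(key=lambda row: row["share"], reverse=True)
    return rows


if __name__ == "__main__":
    check_metric_type("country_share")
    print(share_to_records({"HK": 0.8, "US": 0.05, "TW": 0.1}, "country"))
```

Passing the records to `pandas.DataFrame(...)` gives a two-column table (`country`, `share`); building plain records first keeps the sketch free of a pandas dependency.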

## Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

## License

MIT


## Download files

Files for similarweb-scraper, version 0.0.3:

| Filename | Size | File type | Python version |
|---|---|---|---|
| similarweb_scraper-0.0.3-py3-none-any.whl | 5.5 kB | Wheel | py3 |
| similarweb_scraper-0.0.3.tar.gz | 3.9 kB | Source | None |
