
This package lets your scripts scrape websites through a jQuery-like API.

Project description

Solidscraper is an easy-to-use, jQuery-like API for web scraping and crawling. It also supports cookies and custom user agents, and it is compatible with Python 2 and 3.
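For readers unfamiliar with what cookie and user-agent support means at the HTTP level, here is a minimal sketch using only the Python 3 standard library. It does not show solidscraper's own API for these features, which this description does not document; the user-agent string is a hypothetical placeholder, and no request is actually sent.

```python
import http.cookiejar
import urllib.request

# Hypothetical user-agent string, chosen only for illustration.
USER_AGENT = "my-scraper/0.1"

# A cookie jar remembers cookies set by the server between requests.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# A custom User-Agent is just a request header.
request = urllib.request.Request(
    "https://www.example.com/the/path",
    headers={"User-Agent": USER_AGENT},
)

# opener.open(request) would send the request; it is left out here so the
# sketch runs without network access.
print(request.get_header("User-agent"))
```

A scraping library with cookie and user-agent support handles this bookkeeping for you, so that a session survives across several page loads.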

2. “Hello World” Examples

Getting the URLs of all links:

import solidscraper as ss

doc = ss.load("https://www.example.com/the/path")

# print the list of urls from all <a> elements
print(doc.select("a").getAttribute("href"))
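To make the expected result concrete without a network call, the following sketch reproduces the same idea (collect the href of every <a> element) with the Python 3 standard library. It is not how solidscraper is implemented; the LinkCollector class, the collect_hrefs helper, and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def collect_hrefs(html):
    """Return the href values of all <a> elements in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


sample = '<p><a href="/one">One</a> and <a href="https://www.example.com/two">Two</a></p>'
print(collect_hrefs(sample))  # ['/one', 'https://www.example.com/two']
```

Note that relative URLs such as /one come back exactly as written in the page.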

Getting the URLs of all links inside the <div> whose id is ‘links’:

import solidscraper as ss

doc = ss.load("https://www.example.com/the/path")

# print the list of urls from all <a> elements inside <div id="links">
print(doc.select("div #links").then("a").getAttribute("href"))

Getting the text of all <span> elements inside <p> elements whose class is ‘info’:

import solidscraper as ss

doc = ss.load("https://www.example.com/the/path")

# print the text of all <span> elements inside <p class="info">
print(doc.select("p .info").then("span").text())
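The nested selection above (first narrow to <p class="info">, then to the <span> elements inside it) can be sketched with the Python 3 standard library as follows. This only illustrates the filtering logic under simplified assumptions; it is not solidscraper's implementation, and the SpanTextCollector class and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Collect the text of each <span> found inside a <p class="info">."""

    def __init__(self):
        super().__init__()
        self.inside_info = False  # True while inside a <p> with class "info"
        self.span_depth = 0       # > 0 while inside a matching <span>
        self.buffer = []          # text pieces of the current outermost <span>
        self.texts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p":
            # A new <p> implicitly closes any open one.
            self.inside_info = "info" in classes
        elif tag == "span" and self.inside_info:
            if self.span_depth == 0:
                self.buffer = []
            self.span_depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.span_depth > 0:
            self.span_depth -= 1
            if self.span_depth == 0:
                self.texts.append("".join(self.buffer))
        elif tag == "p":
            self.inside_info = False

    def handle_data(self, data):
        if self.span_depth > 0:
            self.buffer.append(data)


sample = (
    '<p class="info"><span>alpha</span> <span>beta</span></p>'
    '<p><span>skipped</span></p>'
)
collector = SpanTextCollector()
collector.feed(sample)
collector.close()
print(collector.texts)  # ['alpha', 'beta']
```

The <span> in the second paragraph is ignored because its parent <p> does not carry the info class.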

Note: these examples use the Python 3 print function. To run them with Python 2, either replace print() with the Python 2 print statement or add the following import as the first statement of your code: from __future__ import print_function.


3. “Real World” Examples

The examples folder in the repository contains two fully functional examples: one that downloads tweets by hashtag and another that downloads a user's complete timeline (tweets and images). Both scripts were built entirely with solidscraper.

Download files

Download the file for your platform.

Source Distribution

solidscraper-0.7.7.tar.gz (17.3 kB)

Uploaded Source

Built Distribution

solidscraper-0.7.7-py3-none-any.whl (9.9 kB)

Uploaded Python 3

File details

Details for the file solidscraper-0.7.7.tar.gz.

File metadata

  • Download URL: solidscraper-0.7.7.tar.gz
  • Upload date:
  • Size: 17.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.4.2 requests/2.20.0 setuptools/39.1.0 requests-toolbelt/0.9.1 tqdm/4.24.0 CPython/3.6.5

File hashes

Hashes for solidscraper-0.7.7.tar.gz
Algorithm Hash digest
SHA256 587ec89bf1691e5660321be257523e06a4f4456da94cb8ef4bdf37784e7ee00b
MD5 15a6a2b5158343ea785608f210e4fed2
BLAKE2b-256 8e3c701cf973376e885eab3c7762979f686dd78830a4a10835e242a7cb92ac6f


File details

Details for the file solidscraper-0.7.7-py3-none-any.whl.

File metadata

  • Download URL: solidscraper-0.7.7-py3-none-any.whl
  • Upload date:
  • Size: 9.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.4.2 requests/2.20.0 setuptools/39.1.0 requests-toolbelt/0.9.1 tqdm/4.24.0 CPython/3.6.5

File hashes

Hashes for solidscraper-0.7.7-py3-none-any.whl
Algorithm Hash digest
SHA256 259bc104918d1b992920458e6b52dbaf19586bd506bff16ef3788cb92c91541a
MD5 feeec441943ba81c8d1317a466c017ad
BLAKE2b-256 d1aa82f2c915a1339a1d3a18ef1f8ac2514608efda5ed507dda9c5506f8ce51f

