This package lets your script scrape websites through a jQuery-like API.
Solid Scraper
=============
Easy-to-use jQuery-like API for web scraping and crawling. It also supports
cookies and custom user agents. Solidscraper is compatible with **Python
2 and 3**.


1. Installation
---------------

``pip install solidscraper``

2. "Hello World" Example
------------------------
Getting the URLs of all links:

.. code:: python

    import solidscraper as ss

    doc = ss.load("https://www.example.com/the/path")
    # print the list of urls from all <a> elements
    print(doc.select("a").getAttribute("href"))

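
To make the shape of that result concrete without a network connection, here is a minimal offline sketch of the same idea (collect the ``href`` of every ``<a>`` element) written against Python 3's standard-library ``html.parser``. It is not how Solidscraper is implemented and does not use its API. ``LinkCollector``, ``collect_links`` and the ``sample`` markup are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: gather the href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def collect_links(html):
    """Return the href values of all <a> elements, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Made-up markup; the last <a> has no href and is skipped.
sample = (
    '<div id="links"><a href="/one">1</a><a href="/two">2</a></div>'
    '<a name="x">no href</a>'
)
print(collect_links(sample))
```

The sketch returns a plain list of strings, one per link that carries an ``href`` attribute.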
Getting the URLs of all links inside ``<div>`` elements whose id is 'links':

.. code:: python

    import solidscraper as ss

    doc = ss.load("https://www.example.com/the/path")
    # print the list of urls from all <a> elements inside <div id="links">
    print(doc.select("div #links").then("a").getAttribute("href"))

Getting the text of all ``<span>`` elements inside ``<p>`` elements whose
class is 'info':

.. code:: python

    import solidscraper as ss

    doc = ss.load("https://www.example.com/the/path")
    # print the text of all <span> elements inside <p class="info">
    print(doc.select("p .info").then("span").text())

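
The "select a parent, then narrow to a child" step can also be illustrated offline. The sketch below is only an approximation under stated assumptions: it uses Python 3's standard-library ``html.parser`` rather than Solidscraper, it assumes ``<p>`` elements are not nested, and ``SpanTextCollector`` and the ``sample`` markup are hypothetical names invented for this example.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Hypothetical helper: text of <span> elements inside <p class="info">."""

    def __init__(self):
        super().__init__()
        self.in_info_p = False  # True while inside a <p class="info">
        self.span_depth = 0     # nesting depth of <span> inside that <p>
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            classes = (dict(attrs).get("class") or "").split()
            self.in_info_p = "info" in classes
        elif tag == "span" and self.in_info_p:
            self.span_depth += 1
            if self.span_depth == 1:
                self.texts.append("")  # start a new outermost span

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_info_p = False
            self.span_depth = 0
        elif tag == "span" and self.span_depth > 0:
            self.span_depth -= 1

    def handle_data(self, data):
        # Only text that sits inside a matching <span> is kept.
        if self.span_depth > 0:
            self.texts[-1] += data


# Made-up markup; the second <p> has no "info" class and is ignored.
sample = (
    '<p class="info"><span>one</span> x <span>two</span></p>'
    '<p><span>skip</span></p>'
)
collector = SpanTextCollector()
collector.feed(sample)
collector.close()
print(collector.texts)
```

Text outside a ``<span>``, or inside a ``<span>`` whose parent paragraph lacks the ``info`` class, is left out of the result.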

**Note:** these examples use the Python 3 ``print()`` function. To run
them with Python 2, either replace each ``print()`` call with the
Python 2 ``print`` statement or add the following import as the first
statement of your code:
``from __future__ import print_function``.