RobinWould is a framework for fast and easy development of web scraping tools.
RobinWould
Spend time thinking, not coding. Scrape data with RobinWould
Introduction
RobinWould is a framework for fast and easy development of web scraping tools. With fewer than 10 lines of code, you have a script ready to fish for data on the web.
Requirements
- Python 3.8+
- aiohttp
- Scrapy
Installing
pip install robinwould
Example
Create it
Create a main.py file with:
from robinwould import Crawler, fields, interfaces

# Model describing the data to scrape and the type of each field.
class DataToScrape(interfaces.Model):
    foo = fields.StringField()
    bar = fields.IntegerField()

crawler = Crawler()

# Register a spider for the given URL; it yields one model per scraped item.
@crawler.spider(url="https://www.example.com/")
def mrs_spider(response):
    # Each field is given as an XPath expression locating the text to extract.
    yield DataToScrape(
        foo='//div[@class="foobar-wrapper" and position()=1]/p[@class="foo"]/text()',
        bar='//div[@class="foobar-wrapper" and position()=1]/p[@class="bar"]/text()'
    )

if __name__ == '__main__':
    crawler.run()
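The example relies on two patterns: the `@crawler.spider(url=...)` decorator registers a function for a URL, and that function is a generator that `yield`s the scraped items. The sketch below is a hypothetical, simplified illustration of how a crawler built on those two patterns could work. `ToyCrawler`, its `fetch` parameter, and passing the page body to the spider as a plain string are all assumptions; this is not RobinWould's actual implementation.

```python
from typing import Callable, Iterator, List

# Hypothetical spider signature: takes the downloaded page body (a string
# here, by assumption) and yields one dict per scraped item.
SpiderFunc = Callable[[str], Iterator[dict]]


class ToyCrawler:
    """Illustrative decorator-based registry; NOT RobinWould's real code."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        # `fetch` maps a URL to its page body; it is injected so that the
        # sketch runs without network access.
        self._fetch = fetch
        self._spiders: List[tuple] = []

    def spider(self, url: str) -> Callable[[SpiderFunc], SpiderFunc]:
        # Decorator factory: remembers (url, function) and returns the
        # function unchanged so it can still be called directly.
        def register(func: SpiderFunc) -> SpiderFunc:
            self._spiders.append((url, func))
            return func
        return register

    def run(self) -> List[dict]:
        # Download each registered URL, drain the spider's generator,
        # print each item, and return every yielded item in one list.
        results: List[dict] = []
        for url, func in self._spiders:
            body = self._fetch(url)
            for item in func(body):
                print("Data scraped:", item)
                results.append(item)
        return results


# Usage with a fake fetcher standing in for a real HTTP download.
crawler = ToyCrawler(fetch=lambda url: f"<p>page at {url}</p>")


@crawler.spider(url="https://www.example.com/")
def demo_spider(response: str) -> Iterator[dict]:
    # A real spider would extract these values from `response`.
    yield {"foo": "Foo data", "bar": 2}


if __name__ == "__main__":
    crawler.run()
```

Collecting every yielded item into a list is one simple way a `run()` method could return all scraped data at the end, as this README describes for `crawler.run()`.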
Run it
Run the script with:
- On Windows:
python main.py
- On Linux or macOS:
python3 main.py
Check it
If the spider worked, it should print the scraped data as follows:
Data scraped: {'foo': 'Foo data', 'bar': 2}
You just created a script that:
- Downloads the source file from https://www.example.com/;
- Scrapes the foo and bar data.
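To make the extraction step concrete, here is a self-contained sketch that pulls `foo` and `bar` out of a small sample page and converts each value to its field type, which is why `bar` can come back as the integer `2`. It uses only the standard library's `xml.etree.ElementTree` rather than RobinWould or Scrapy. ElementTree supports only a subset of XPath (no `and`, no `text()`), so `find()`, which returns the first match, stands in for `position()=1`. `SAMPLE_PAGE`, `MODEL`, `scrape`, and the two field classes are hypothetical names for illustration only.

```python
import xml.etree.ElementTree as ET


# Hypothetical field types: each converts the raw text node to a Python value.
class StringField:
    def convert(self, raw: str) -> str:
        return raw.strip()


class IntegerField:
    def convert(self, raw: str) -> int:
        return int(raw.strip())


# Hypothetical model: field name -> (field type, ElementTree path expression).
MODEL = {
    "foo": (StringField(), ".//div[@class='foobar-wrapper']/p[@class='foo']"),
    "bar": (IntegerField(), ".//div[@class='foobar-wrapper']/p[@class='bar']"),
}

# Well-formed sample markup, since ElementTree parses XML, not arbitrary HTML.
SAMPLE_PAGE = """<html><body>
  <div class="foobar-wrapper">
    <p class="foo">Foo data</p>
    <p class="bar">2</p>
  </div>
</body></html>"""


def scrape(page: str) -> dict:
    root = ET.fromstring(page)
    item = {}
    for name, (field, path) in MODEL.items():
        node = root.find(path)  # first matching element, or None
        # A field whose element is missing becomes None instead of raising.
        item[name] = field.convert(node.text or "") if node is not None else None
    return item


print("Data scraped:", scrape(SAMPLE_PAGE))
```

Running it prints `Data scraped: {'foo': 'Foo data', 'bar': 2}`. For the full XPath syntax used in the example above (including `and`, `position()`, and `text()`), a library such as Scrapy's selectors is needed.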
The crawler.run() method returns all the scraped data, so if you want to write the data to a file, assign its return value to a variable and process it.
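As one way to follow that advice, the sketch below writes the scraped items to a JSON file with the standard `json` module. It assumes, without confirmation from this README, that `crawler.run()` returns a list of dicts shaped like the printed output; the literal list and the `save_results` helper and `scraped.json` filename are hypothetical.

```python
import json
from typing import List


def save_results(results: List[dict], path: str) -> None:
    # Write the scraped items as a pretty-printed UTF-8 JSON array.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(results, fh, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # In main.py this would be: data = crawler.run()
    # A literal list is used here because the exact return type of
    # crawler.run() is an assumption.
    data = [{"foo": "Foo data", "bar": 2}]
    save_results(data, "scraped.json")
```

JSON keeps the integer type of `bar`; if you prefer a spreadsheet-friendly format, the standard `csv.DictWriter` works the same way with the dict keys as column names.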
More information
- To learn more about XPath expressions, see the Scrapy documentation.
I'm sorry I can't yet provide all the information you may need; I'll be working on more complete documentation for future versions.
Licence
This project is licensed under the terms of the MIT license.
Hashes for robinwould-0.1.1-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | bc947fc7d1cec643bc9b6f6b11440338e1973ddf1643eba4a4367e512619edd9
MD5 | 0f5667004c318d8e5f297300e572e2a9
BLAKE2b-256 | 2395371035eae0a281afa11be6489d71fc68318aa8beee6022f34516da4c15c3