RobinWould is a framework for fast and easy development of web scraping tools.

Project description

RobinWould

Spend time thinking, not coding. Scrape data with RobinWould

Introduction

RobinWould is a framework for fast and easy development of web scraping tools. With fewer than 10 lines of code, you have a script ready to fish for data on the web.

Requirements

  • Python 3.8+
  • aiohttp
  • Scrapy

Installing

pip install robinwould

Example

Create it

Create a main.py with:

from robinwould import Crawler, fields, interfaces

class DataToScrape(interfaces.Model):
    foo = fields.StringField()
    bar = fields.IntegerField()

crawler = Crawler()

@crawler.spider(url="https://www.example.com/")
def mrs_spider(response):
    yield DataToScrape(
        foo='//div[@class="foobar-wrapper" and position()=1]/p[@class="foo"]/text()',
        bar='//div[@class="foobar-wrapper" and position()=1]/p[@class="bar"]/text()'
    )
    
if __name__ == '__main__':
    crawler.run()

Run it

Run the script with:

  • On Windows:
python main.py
  • On Linux or macOS:
python3 main.py

Check it

If the spider worked, it should print the scraped data as follows:

Data scraped: {'foo': 'Foo data', 'bar': 2}

You just created a script that:

  • Downloads the page source from https://www.example.com/;
  • Scrapes the foo and bar data.
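
To make the two steps above concrete, here is a minimal sketch of the extraction done by hand, using only the standard library. It is not RobinWould's implementation, which is not shown here. The page markup is hypothetical, the XPath expressions are simplified because xml.etree.ElementTree supports only a subset of XPath, and the int() conversion is an assumption about what IntegerField does, inferred from the printed output.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the downloaded page.
PAGE = """
<html><body>
  <div class="foobar-wrapper">
    <p class="foo">Foo data</p>
    <p class="bar">2</p>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE)

# Simplified equivalent of //div[@class="foobar-wrapper" and position()=1]:
# find() returns the first matching element in document order.
wrapper = root.find(".//div[@class='foobar-wrapper']")

# Read the text of each paragraph inside the wrapper.
foo_text = wrapper.find("p[@class='foo']").text
bar_text = wrapper.find("p[@class='bar']").text

# Assumption: an integer field converts the extracted text to int.
scraped = {"foo": foo_text, "bar": int(bar_text)}
print("Data scraped:", scraped)
```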

The crawler.run() method returns all the scraped data, so if you want to write the data into a file, just assign it to a variable and process it.
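
As a rough sketch of that last step, the helper below writes scraped items to a JSON file. It assumes that crawler.run() returns an iterable of dict-like items shaped like the printed output above, which this page does not confirm. save_results and the output path are hypothetical names, and a stand-in list replaces the real crawler.run() call so the snippet runs on its own.

```python
import json
import tempfile
from pathlib import Path


def save_results(results, path):
    """Write scraped items to a JSON file and return how many were written."""
    # Assumption: each item can be converted to a plain dict.
    items = [dict(item) for item in results]
    Path(path).write_text(json.dumps(items, indent=2), encoding="utf-8")
    return len(items)


# Stand-in for `results = crawler.run()` (assumed return shape).
results = [{"foo": "Foo data", "bar": 2}]

# Hypothetical output location; any writable path works.
output_path = Path(tempfile.gettempdir()) / "scraped.json"
written = save_results(results, output_path)
print(f"Wrote {written} item(s) to {output_path}")
```

In main.py, the stand-in list would be replaced by the value returned from crawler.run() inside the `if __name__ == '__main__':` block.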

More information

I'm sorry that I can't yet deliver all the information you may need. I'll be working on more complete documentation for future versions.

Licence

This project is licensed under the terms of the MIT license.
