Python framework for fast development of scraping tools

RobinWould

Spend time thinking, not coding. Scrape data with RobinWould

Introduction

RobinWould is a framework for fast and easy development of web scraping tools. With fewer than 10 lines of code, you can have a script ready to fish for data on the web.

Requirements

  • Python 3.8+
  • aiohttp
  • Scrapy

Installing

pip install robinwould

Example

Create it

Create a main.py with:

from robinwould import Crawler, fields, interfaces

# Model describing the data to extract from the page
class DataToScrape(interfaces.Model):
    foo = fields.StringField()
    bar = fields.IntegerField()

crawler = Crawler()

# Register a spider for the target URL
@crawler.spider(url="https://www.example.com/")
def mrs_spider(response):
    # Each field receives the XPath expression that locates its value
    yield DataToScrape(
        foo='//div[@class="foobar-wrapper" and position()=1]/p[@class="foo"]/text()',
        bar='//div[@class="foobar-wrapper" and position()=1]/p[@class="bar"]/text()'
    )

if __name__ == '__main__':
    crawler.run()
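
To get a feel for what the two XPath expressions in the spider select, here is a minimal sketch that runs similar queries against a hypothetical HTML fragment. It uses the standard library's xml.etree.ElementTree instead of RobinWould, and ElementTree only supports a subset of XPath, so the expressions are simplified. The fragment, the variable names and the int() conversion are assumptions made for illustration, not part of the RobinWould API.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment; the real structure of the target page is not given
HTML = """
<html><body>
  <div class="foobar-wrapper">
    <p class="foo">Foo data</p>
    <p class="bar">2</p>
  </div>
  <div class="foobar-wrapper">
    <p class="foo">Other data</p>
    <p class="bar">3</p>
  </div>
</body></html>
"""

root = ET.fromstring(HTML)

# find() returns the first match in document order, which approximates
# the position()=1 condition for this particular fragment
foo = root.find(".//div[@class='foobar-wrapper']/p[@class='foo']").text
# Assumption: an integer field converts the extracted text to int
bar = int(root.find(".//div[@class='foobar-wrapper']/p[@class='bar']").text)

print({"foo": foo, "bar": bar})
```

Only the first wrapper contributes values here; the second one is ignored because find() stops at the first match.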

Run it

Run the script with:

  • On Windows:
python main.py
  • On Linux or macOS:
python3 main.py

Check it

If the spider worked, it should print the scraped data as follows:

Data scraped: {'foo': 'Foo data', 'bar': 2}

You just created a script that:

  • Downloads the page source from https://www.example.com/;
  • Scrapes the foo and bar data.

The crawler.run() method returns all the scraped data, so if you want to write the data to a file, assign the return value to a variable and process it.
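
As a rough sketch of that idea, the helper below writes scraped items to a JSON file. The name save_scraped and the default file name are hypothetical, and the exact return type of crawler.run() is not documented here, so the code assumes an iterable of dict-like items shaped like the printed output above.

```python
import json
from pathlib import Path

def save_scraped(items, path="scraped.json"):
    """Write scraped items to a JSON file and return how many were written."""
    # Assumption: each item is dict-like, e.g. {'foo': 'Foo data', 'bar': 2}
    rows = [dict(item) for item in items]
    Path(path).write_text(json.dumps(rows, indent=2), encoding="utf-8")
    return len(rows)

# Hypothetical usage in main.py (return shape of crawler.run() is assumed):
#     data = crawler.run()
#     save_scraped(data)
```

If the items turn out not to be dict-like, the dict(item) conversion is the line that would need adapting.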

More information

I'm sorry for not being able to deliver all the information you may need. I'll be working on more complete documentation for future versions.

License

This project is licensed under the terms of the MIT license.

Download files

Source Distribution

robinwould-0.1.0.tar.gz (11.2 kB)

Uploaded Source

Built Distribution

robinwould-0.1.0-py2.py3-none-any.whl (8.1 kB)

Uploaded Python 2 Python 3
