Ramby

Ramby is a simple way to set up a web scraper: you describe which page elements to extract in a YAML configuration file.

Installation

pip install ramby

Examples

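from ramby import Ramby

# Load the scraping rules from a YAML configuration file
scraper = Ramby('./examples/hackernews.yaml')

# Scrape a page whose host matches the host in the configuration
data = scraper.scrape("https://news.ycombinator.com/item?id=32237445")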

Configuration

A configuration file needs two top-level fields: host and rules.
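A minimal sketch of that requirement, assuming the YAML file has already been parsed into a Python dict whose top-level keys are the lowercase host and rules used in the examples. validate_config is a hypothetical helper for illustration; Ramby's own validation and the error type it raises are not documented here.

```python
from typing import Any


def validate_config(config: dict[str, Any]) -> None:
    """Raise ValueError if a parsed config lacks `host` or `rules` (hypothetical helper)."""
    # The README names exactly two required top-level fields
    missing = [field for field in ("host", "rules") if field not in config]
    if missing:
        raise ValueError(f"config is missing required field(s): {', '.join(missing)}")


# A parsed minimal configuration: one host and an empty rule set
validate_config({"host": "news.ycombinator.com", "rules": {}})
```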

HOST

The host is the base domain of the site you wish to scrape. Keep in mind that an error is raised if you try to scrape a URL with a different host.

In practice, the host is added to the configuration like so:

host: example.com
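One way such a host check could work is to compare the configured host with the hostname parsed from the URL. This is only a sketch: check_host and its ValueError are hypothetical, and it assumes an exact, case-insensitive match (so www.example.com would not match example.com), which may differ from the rule Ramby actually applies.

```python
from urllib.parse import urlparse


def check_host(configured_host: str, url: str) -> None:
    """Raise ValueError if `url` is not on `configured_host` (hypothetical helper)."""
    # urlparse(...).hostname is lowercased and excludes any port number
    actual_host = urlparse(url).hostname
    if actual_host != configured_host.lower():
        raise ValueError(
            f"URL host {actual_host!r} does not match configured host {configured_host!r}"
        )


# Passes silently: the URL is on the configured host
check_host("news.ycombinator.com", "https://news.ycombinator.com/item?id=32237445")
```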

RULES

A rule targets specific elements in a web page. For example, to select the titles of the top posts on Hacker News, you would write:

host: news.ycombinator.com

rules:
    homepage:
        pattern: '/'   # Use the `homepage` rule for the `/` path
        topics:        # A section of the page, so other objects can be added as needed, e.g. all_authors
            title:     # An object property
                selector: '.athing .title > a' # CSS selector for the title link
                text: true                     # Return the text inside the target element
                # html: true is optional
                count: 2                       # Number of elements to return
                attrs:                         # HTML attributes to return
                    - href                     # The link to the post
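Judging from the sample output in this README, each rule's fields shape the result: count limits how many matches are returned, text adds a 'text' key, attrs adds an 'attrs' dict, and matches are keyed 0, 1, 2, and so on. The sketch below reproduces that shape from elements that have already been selected. shape_matches and its input format are hypothetical, count is assumed to be an upper limit, and HTML parsing and CSS selection are left out entirely.

```python
from typing import Any


def shape_matches(
    matches: list[dict[str, Any]],
    count: int,
    text: bool = False,
    attrs: tuple[str, ...] = (),
) -> dict[int, dict[str, Any]]:
    """Mimic the shape of Ramby's sample output (hypothetical helper).

    Each item in `matches` is assumed to be an element already selected with
    the rule's CSS selector, reduced to {'text': ..., 'attrs': {...}}.
    """
    result: dict[int, dict[str, Any]] = {}
    # `count` caps the number of matches; each one is keyed by its position
    for index, element in enumerate(matches[:count]):
        entry: dict[str, Any] = {}
        if attrs:
            # Keep only the attributes the rule lists under `attrs`
            entry["attrs"] = {
                name: element["attrs"][name]
                for name in attrs
                if name in element["attrs"]
            }
        if text:
            # `text: true` adds the element's text content
            entry["text"] = element["text"]
        result[index] = entry
    return result


# Placeholder elements standing in for three matched title links
elements = [
    {"text": "First post", "attrs": {"href": "https://example.com/1", "rel": "nofollow"}},
    {"text": "Second post", "attrs": {"href": "https://example.com/2"}},
    {"text": "Third post", "attrs": {"href": "https://example.com/3"}},
]
print(shape_matches(elements, count=2, text=True, attrs=("href",)))
```

A rule with only text: true, like the comments rule in the next example, yields entries with a 'text' key and no 'attrs' key, which matches the second sample object.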

Sample object returned for the rules above:

{'topics': {'title': {0: {'attrs': {'href': 'https://paulbutler.org/2022/why-is-it-so-hard-to-give-google-money/'},
                          'text': 'Why is it so hard to give Google money?'},
                      1: {'attrs': {'href': 'https://mullvad.net/en/blog/2022/7/26/mullvad-is-now-available-on-amazon-us-se/'},
                          'text': 'Mullvad is now available on Amazon'}}}}
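Because each match is keyed by its position instead of being stored in a list, reading the results back means indexing through the section and property names. A short sketch using the sample object above as a literal; title_links is a hypothetical name, not part of Ramby.

```python
# The sample object above: section -> property -> index -> fields
data = {
    "topics": {
        "title": {
            0: {"attrs": {"href": "https://paulbutler.org/2022/why-is-it-so-hard-to-give-google-money/"},
                "text": "Why is it so hard to give Google money?"},
            1: {"attrs": {"href": "https://mullvad.net/en/blog/2022/7/26/mullvad-is-now-available-on-amazon-us-se/"},
                "text": "Mullvad is now available on Amazon"},
        }
    }
}


def title_links(data: dict) -> list[tuple[str, str]]:
    """Return (title, link) pairs from the homepage result (hypothetical helper)."""
    titles = data["topics"]["title"]
    # Matches are keyed by integer position, so sort the keys to keep that order
    return [(titles[i]["text"], titles[i]["attrs"]["href"]) for i in sorted(titles)]


for title, link in title_links(data):
    print(f"{title}: {link}")
```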

To scrape a post and its comments as well, add a posts rule alongside the homepage rule:

host: news.ycombinator.com

rules:
    homepage:
        pattern: '/'   # Use the `homepage` rule for the `/` path
        topics:        # A section of the page, so other objects can be added as needed, e.g. all_authors
            title:     # An object property
                selector: '.athing .title > a' # CSS selector for the title link
                text: true                     # Return the text inside the target element
                # html: true is optional
                count: 2                       # Number of elements to return
                attrs:                         # HTML attributes to return
                    - href                     # The link to the post

    posts:
        pattern: /item/   # URL pattern for the `posts` rule
        post:             # Section holding the post itself
            title:
                selector: '.fatitem:first-child .title > a'
                count: 1
                text: true
                attrs:
                    - href

        comments:         # Section holding the comments
            texts:
                selector: '.comment .commtext'
                count: 2                       # Return two comments
                text: true

Sample object returned for the rules above:

{'comments': {'texts': {0: {'text': 'Wonder how much money & resources Shopify '
                                    'spent on all of their NFT features & '
                                    'integrations over the last months, how '
                                    'many people worked on it and how many of '
                                    "those are part of the lay-off now. I'd "
                                    "guess the support you'd need to provide "
                                    'for it and their tokengated commerce '
                                    "isn't little either.Tobi removed all the "
                                    'NFT stuff from his Twitter profile and '
                                    "didn't tweet much about it for months "
                                    'now, after being pretty vocal about it '
                                    'until earlier this year.Would love to '
                                    'hear his real thoughts on it and why '
                                    'he/they even (seemingly) invested so much '
                                    'into it. One of the few things I never '
                                    'got about Tobi / Shopify. Just seemed so '
                                    'late and weird to be so bullish there. '
                                    "Don't think he's the kind of person to "
                                    'push it just for personal gain, nor that '
                                    "he'd have to, but ..."},
                        1: {'text': 'I’m honestly still in disbelief at how '
                                    'many very smart people fell for the NFT '
                                    'trap. If you’ve spent even a single bull '
                                    'cycle in the crypto community you could '
                                    'tell right away NFTs we’re ICO level '
                                    'scams. The mental gymnastics very smart '
                                    'and technical people performed to '
                                    'rationalize paying for a jpeg still makes '
                                    'me question reality. I participate in '
                                    'crypto because I take a calculated risk, '
                                    'and I’m comfortable gambling. People who '
                                    'actually think something like an NFT has '
                                    'any real value still messes with my head. '
                                    'I really can’t grasp how they actually '
                                    'believe this. And yes, I understand '
                                    'technically how NFTs work.'}}},
 'post': {'title': {0: {'attrs': {'href': 'https://www.wsj.com/articles/shopify-to-lay-off-10-of-workers-in-broad-shake-up-11658839047'},
                        'text': 'Shopify to lay off 10% of workers in broad '
                                'shake-up'}}}}
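A post page returns two sections side by side, so a caller will often flatten them into one record. The sketch below does that for the sample object above, with the comment texts shortened; summarize_post is a hypothetical helper, and it assumes the post title was found (otherwise index 0 would be missing and a KeyError would be raised).

```python
# The sample object above, with the comment texts shortened for brevity
data = {
    "comments": {
        "texts": {
            0: {"text": "Wonder how much money & resources Shopify spent on all of their NFT features ..."},
            1: {"text": "I’m honestly still in disbelief at how many very smart people fell for the NFT trap. ..."},
        }
    },
    "post": {
        "title": {
            0: {"attrs": {"href": "https://www.wsj.com/articles/shopify-to-lay-off-10-of-workers-in-broad-shake-up-11658839047"},
                "text": "Shopify to lay off 10% of workers in broad shake-up"},
        }
    },
}


def summarize_post(data: dict) -> dict:
    """Flatten the post-page result into title, URL and comment texts (hypothetical helper)."""
    # count: 1 in the posts rule, so the title is the single match at index 0
    title = data["post"]["title"][0]
    texts = data["comments"]["texts"]
    return {
        "title": title["text"],
        "url": title["attrs"]["href"],
        # Sort the integer keys so comments stay in index order
        "comments": [texts[index]["text"] for index in sorted(texts)],
    }


summary = summarize_post(data)
print(summary["title"])  # Shopify to lay off 10% of workers in broad shake-up
```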

See more examples here

Project details

Release history

0.0.5 (this version): Jul 26, 2022
0.0.3: Jul 26, 2022
0.0.2: Jul 26, 2022
0.0.1: Jul 26, 2022

Download files

Source Distribution

ramby-0.0.5.tar.gz (6.1 kB)

Uploaded Jul 26, 2022 Source

Built Distribution

ramby-0.0.5-py3-none-any.whl (6.1 kB)

Uploaded Jul 26, 2022 Python 3

File details

Details for the file ramby-0.0.5.tar.gz.

File metadata

Download URL: ramby-0.0.5.tar.gz
Upload date: Jul 26, 2022
Size: 6.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.9.11

File hashes

Hashes for ramby-0.0.5.tar.gz
Algorithm	Hash digest
SHA256	`ef477c7bb6b9af1899c462153431d730d006df5c4c0056a171e4c1de83cc0ee3`
MD5	`00ce76dc138f984a6b5f665d549b3aaa`
BLAKE2b-256	`e7d907bb1b093821657a015b8a9bbe9b244676c2b5834e64a7542e3abfcf3469`

File details

Details for the file ramby-0.0.5-py3-none-any.whl.

File metadata

Download URL: ramby-0.0.5-py3-none-any.whl
Upload date: Jul 26, 2022
Size: 6.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.9.11

File hashes

Hashes for ramby-0.0.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c505bd8f1a4ec4dd2a600b7b137152ef44f9fc888260eb46d28ce09cac76b1bf`
MD5	`84655762a0affa1a42c61e0b5d63cecf`
BLAKE2b-256	`45a32a88b0abb812e7eacc381110b2f1725084a4cbc4ceadbdab890afb45ca0e`
