alertscraper

Flexible tool for scraping a web page for certain DOM elements and sending an email when new ones are added.

A general-purpose, flexible tool for scraping a given URL for a certain type of item and sending an email when new items are added. It is useful for monitoring ad or auction websites, and could also be used to set up email alerts on your own site.

WARNING

Check the site's Terms of Service before you use this tool! On some sites, using it may violate the terms of service, in which case it should not be used.

Limitations

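This code ONLY scrapes the HTML returned by the initial HTTP request; it does not run JavaScript. Websites that function as single-page apps, which build their content in the browser, will therefore not work. This could be supported in the future by fetching the site's JSON data directly, or by integrating with something heavier-weight like Selenium.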

Usage

Installation

Assuming Python’s pip is installed (for Debian-based systems, this can be installed with sudo apt-get install python-pip), alertscraper can be installed directly from PyPI:

pip install alertscraper

Python 3.3+ and 2.6+ are supported and tested.

Quick start

alertscraper is based on URLs, and maintains a history file for each URL that you scrape so it knows when something is new.

Start by navigating in your web browser to the website you want to scrape, then copy its URL. Next, inspect the page source and figure out the DOM path to the relevant element. In this example, each item was an li element with the class name result, so the combined selector becomes li.result.

alertscraper 'https://some-site.org/?query=guitar&maxprice=550' li.result

This downloads the given URL and lists the text content of each matching item, which lets you confirm that your query is correct.
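alertscraper's own parsing code isn't shown in this README. As a rough illustration of what a selector like li.result matches, the sketch below uses only the standard library's html.parser to collect the text content of every element with a given tag and class. It handles only the simple tag.class form and assumes well-formed markup with explicit closing tags; SelectorTextCollector, select_text, and SAMPLE_PAGE are hypothetical names, not part of alertscraper.

```python
from html.parser import HTMLParser


class SelectorTextCollector(HTMLParser):
    """Collect the text content of elements matching a simple 'tag.class' selector."""

    def __init__(self, tag, cls=""):
        super().__init__()
        self.tag = tag
        self.cls = cls
        self.depth = 0   # > 0 while inside a matching element
        self.parts = []  # text fragments of the element currently being read
        self.items = []  # whitespace-normalised text of each finished match

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Count nested elements with the same tag so we stop at the right end tag.
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and (not self.cls or self.cls in classes):
            self.depth = 1
            self.parts = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                # Join fragments and collapse runs of whitespace into single spaces.
                self.items.append(" ".join(" ".join(self.parts).split()))

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def select_text(html, selector):
    """Return the text of every element matching a selector such as 'li.result'."""
    tag, _, cls = selector.partition(".")
    parser = SelectorTextCollector(tag, cls)
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page fragment standing in for a downloaded search-results page.
SAMPLE_PAGE = """
<ul>
  <li class="result"><a href="/ad/1">Fender Stratocaster</a> $500</li>
  <li class="result featured"><a href="/ad/2">Gibson SG</a> $450</li>
  <li class="sponsored">Guitar lessons</li>
</ul>
"""

if __name__ == "__main__":
    for text in select_text(SAMPLE_PAGE, "li.result"):
        print(text)
```

Note that an element with several classes (class="result featured") still matches li.result, while li.sponsored does not.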

Next, save the results to a database file. This amounts to saying “I’ve seen everything currently posted and am only interested in new stuff from now on.”

alertscraper 'https://some-site.org/?query=guitar&maxprice=550' li.result --file=guitars.txt

Notice that it again prints all the links it found. If you run the command a second time, it will not print them, because it has stored them as “already seen”.
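The README doesn't describe the format of the --file database, so the following is only a sketch of the “already seen” idea, assuming one item (for example a link) per line in a plain-text file. The names load_seen and record_new are hypothetical: record_new returns only the items not yet in the file and appends them, so a second run with the same input returns an empty list.

```python
import os
import tempfile


def load_seen(path):
    """Return the set of items already recorded in the history file (empty if it doesn't exist)."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f if line.strip()}


def record_new(path, items):
    """Append unseen items to the history file and return them in page order.

    Assumes each item is a single line of text (e.g. a link URL).
    """
    seen = load_seen(path)
    new_items = []
    for item in items:
        if item not in seen:
            seen.add(item)  # also drops duplicates within this run
            new_items.append(item)
    if new_items:
        with open(path, "a", encoding="utf-8") as f:
            for item in new_items:
                f.write(item + "\n")
    return new_items


if __name__ == "__main__":
    history = os.path.join(tempfile.mkdtemp(), "guitars.txt")
    links = ["https://some-site.org/ad/1", "https://some-site.org/ad/2"]
    print(record_new(history, links))  # first run: both links are new
    print(record_new(history, links))  # second run: [] (already seen)
```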

Finally, let’s run the command so that it emails us everything that has not yet been seen.

alertscraper 'https://some-site.org/?query=guitar&maxprice=550' li.result --file=guitars.txt --email=myemail@gmail.com
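The README doesn't say how alertscraper delivers mail. As a hedged sketch, assuming a local SMTP relay on localhost port 25, a notification listing the new items could be built with the standard library's email.message.EmailMessage and sent with smtplib. build_alert, send_alert, and the default sender address are hypothetical names chosen for illustration.

```python
import smtplib
from email.message import EmailMessage


def build_alert(url, new_items, to_addr, from_addr="alertscraper@localhost"):
    """Build a plain-text email listing the newly seen items for one URL."""
    msg = EmailMessage()
    msg["Subject"] = "{} new item(s) at {}".format(len(new_items), url)
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg.set_content("\n".join(new_items) + "\n")
    return msg


def send_alert(msg, host="localhost", port=25):
    """Send the message through an SMTP server (assumed: a local relay)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


if __name__ == "__main__":
    alert = build_alert(
        "https://some-site.org/?query=guitar&maxprice=550",
        ["https://some-site.org/ad/3"],
        "myemail@gmail.com",
    )
    print(alert)  # inspect the message instead of sending it
```

Building the message separately from sending it makes it easy to inspect or test the email without an SMTP server; in practice you would only call send_alert when the list of new items is non-empty.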

This runs only once. If you want it to run continually, I’d recommend putting it in a cron job. I may eventually add a daemon mode, but this is good for now.

Happy scraping!

Contributing

CONDUCT.md

New features, tests, and bug fixes are welcome!

Release history

0.1.7 (this version): Nov 7, 2017
0.1.6: Nov 7, 2017
0.1.5: Nov 6, 2017
0.1.4: Nov 6, 2017
0.1.3: Nov 6, 2017
0.1.2: Nov 6, 2017

Download files

Download the file for your platform.

Source Distribution

alertscraper-0.1.7.tar.gz (20.1 kB)

Uploaded Nov 7, 2017 Source

Built Distribution

alertscraper-0.1.7-py2.py3-none-any.whl (8.3 kB)

Uploaded Nov 7, 2017 Python 2, Python 3

File details

Details for the file alertscraper-0.1.7.tar.gz.

File metadata

Download URL: alertscraper-0.1.7.tar.gz
Upload date: Nov 7, 2017
Size: 20.1 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for alertscraper-0.1.7.tar.gz
Algorithm	Hash digest
SHA256	`704855d44a14b9833c275bb503e85d7e3fd2eeaaaa8edab67d5b6c46dea0599a`
MD5	`92c32f4e4a77f4aa57d3c2bd97d06ca6`
BLAKE2b-256	`fe671b56a087ae93225db1c849d7ecd8dbdcbe11ac471cd2531fe353217f052e`

File details

Details for the file alertscraper-0.1.7-py2.py3-none-any.whl.

File metadata

Download URL: alertscraper-0.1.7-py2.py3-none-any.whl
Upload date: Nov 7, 2017
Size: 8.3 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No

File hashes

Hashes for alertscraper-0.1.7-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`b6d1f21d7e1a6ac906f64ce98b8ea729bc822cbc07d76d1cfc8f43d3d6684c6c`
MD5	`72a34198a74e1a6304fb64119b9b0695`
BLAKE2b-256	`57c7e9f44b84d727cb05e8f8d51d79da6b67810da32fdf8fcf882170e915e647`
