Skip to main content

Dango app to scrape web page

Project description

Django Easy Scraper

An standalone django app that can be used/intigrated with both django and no-django application easily. the scraping mechanism is fully based on Regular Expression but you can extends based on what you are familiar with very easily.

It requires to install python requests modules

Install

pip install django-easy-scraper

Basic Uses

from django_easy_scraper import scraper

class ScrapeExampleDotCom(scraper.Scraper):

    regex_fields = {
        'price': "Write Your Regex pattern for price here",
        'title': "Write your regex pattern for title here",
        # Like above way you can add as much fields/keys as you want
    }

Scrape now

url = 'www.example.com/bla-bla-details-page/
data = ScrapeExampleDotCom.regex_url_scraper(url)

print(data)

and the response should look like this if your regex pattern are correct:

{
    'price': 4,
    'title': 'an scraped title',
}

regex_url_scraper method always gives you json response,

So if you add many regex pattern in regex_fields, it will give you response that number of dictionary key with result that you added in that dictionary.

Multiple Sites Scraping together

You don't need to call different method for different site all the time !! Just call once and scrape all, something fun, right?

Like you are gonna scrape three sites:

www.example.com

www.exampletwo.com

www.examplethree.com

But how will those site product scrape automatically, it scares you ?

  • Wirte Regex pattern for all above site with the fields that you want to scrape:
from django_easy_scraper import scraper

class ScrapeExampleDotCom(scraper.Scraper):
    regex_fields = {
        'price': "Write Your Regex pattern for price here",
        'title': "Write your regex pattern for title here",
        # Like above way you can add as much fields/keys as you want
    }

class ScrapeExampleTwo(scraper.Scraper):
    regex_fields = {
        'price': "Write Your Regex pattern for price here",
        'title': "Write your regex pattern for title here",
        # Like above way you can add as much fields/keys as you want
    }

class ScrapeExampleThree(scraper.Scraper):
    regex_fields = {
        'price': "Write Your Regex pattern for price here",
        'title': "Write your regex pattern for title here",
        # Like above way you can add as much fields/keys as you want
    }


You have written regular expression for you all the site you are gonna scrape,

Now it's time use our Switch class that will route your script/class based on the site you are gonna scrape? Cool, right ? !!

It's where the magic really begins:

Just place all your class in the dictionary switcher.

Important Note:

key name should be domain name, pure domain name, no www or http or slash, dont add anything prefix/suffix value should be the class of that domain you have written for and place it's method `regex_url_scraper

from django_easy_scraper import switch

class Switch(switch.BaseSwitch):
    switcher = {
        'example.com': ScrapeExampleDotCom.regex_url_scraper,
        'exampletwo.com': ScrapeExampleTwo.regex_url_scraper,
        'examplethree.com': ScrapeExampleThree.regex_url_scraper,
    }

So you have done routing your script/class based on the url it gets.

Get response data as python dictionary like above site:

url = 'Any of site you have written class for the site and added in switch class'

response = Switch.get_data(url=url, raise_exception=False)

print(response) # Will give you an object of data that you trying to scrape

Switch class is giving you facilities to route your scraping class automacally based whatever site link pass to it's get_data method.

get_data method's raise_exception is it handle if you want to raise excepiton when your expected fields not found

Got an issue ?

Please open an issue on our github repo: https://github.com/dearopen/django-easy-scraper

Don't forget to star to this project if you like this.

Happy Scraping !!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

django-easy-scraper-1.0.1.tar.gz (4.6 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page