Parselab helper module

parselab

This package contains classes that help to write parsers in Python.

Usage

To use parselab, create a class derived from BasicParser.

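import os

from parselab.cache import FileCache
from parselab.network import NetworkManager
from parselab.parsing import BasicParser

class MyParser(BasicParser):

    def __init__(self):
        # Cache downloaded pages under $CACHE_PATH, in the 'my-parser' namespace.
        self.cache = FileCache(namespace='my-parser', path=os.environ.get('CACHE_PATH'))
        self.net = NetworkManager()
        # NOTE: the original example does not show where `db` comes from;
        # import it before use.
        db.connect(os.environ['PARSINGDB'])
        db.setup_project('my-project')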
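
The constructor above reads two environment variables in different ways: os.environ.get('CACHE_PATH') yields None when the variable is unset, while os.environ['PARSINGDB'] raises KeyError. Whether FileCache accepts path=None is not stated here, so it may be worth checking both values before building the parser. The sketch below is one way to do that; read_parser_settings and the returned key names are hypothetical and are not part of parselab.

```python
import os


def read_parser_settings(environ=None):
    """Return the settings the example reads, failing early if one is missing.

    `environ` defaults to os.environ; passing a plain dict makes testing easy.
    This helper is hypothetical and not provided by parselab.
    """
    if environ is None:
        environ = os.environ
    required = ('CACHE_PATH', 'PARSINGDB')
    # Treat unset and empty values the same way.
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError('missing environment variables: ' + ', '.join(missing))
    return {
        'cache_path': environ['CACHE_PATH'],
        'parsing_db': environ['PARSINGDB'],
    }
```

Calling read_parser_settings() at the top of __init__ turns a late KeyError, or a cache silently created with no path, into a single error message that names every missing variable.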

After that, you can download pages using the BasicParser.get_page() method:

class MyParser(BasicParser):
    ...

    def run(self):
        page = self.get_page('https://google.com')

BasicParser uses the network manager created in the __init__ method and saves every downloaded page into the directory specified by the $CACHE_PATH environment variable. The next time you call get_page() for the same page, it is returned from the cache if available.
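
The cache-first behaviour described above can be illustrated with a small standalone sketch. It is not parselab's implementation: SimpleFileCache, the free function get_page, and fake_download are hypothetical stand-ins that only mimic the documented behaviour (look in the cache, download on a miss, store the result), and the download is faked so the example runs offline.

```python
import hashlib
import os
import tempfile


class SimpleFileCache:
    """Hypothetical stand-in for a file cache; not parselab's FileCache."""

    def __init__(self, namespace, path):
        self.root = os.path.join(path, namespace)
        os.makedirs(self.root, exist_ok=True)

    def _file_for(self, url):
        # Hash the URL so that any URL maps to a safe file name.
        name = hashlib.sha256(url.encode('utf-8')).hexdigest()
        return os.path.join(self.root, name)

    def get(self, url):
        try:
            with open(self._file_for(url), encoding='utf-8') as f:
                return f.read()
        except FileNotFoundError:
            return None  # cache miss

    def put(self, url, content):
        with open(self._file_for(url), 'w', encoding='utf-8') as f:
            f.write(content)


def get_page(url, cache, download):
    """Return the page from the cache if present, else download and store it."""
    page = cache.get(url)
    if page is None:
        page = download(url)
        cache.put(url, page)
    return page


calls = []


def fake_download(url):
    # Stand-in for a real network request; records each call.
    calls.append(url)
    return '<html>%s</html>' % url


cache = SimpleFileCache('my-parser', tempfile.mkdtemp())
first = get_page('https://google.com', cache, fake_download)
second = get_page('https://google.com', cache, fake_download)
```

Here the second call is served from disk, so fake_download runs only once. Keying the cache by a hash of the URL is a design choice of this sketch; how parselab names its cache files is not stated in this description.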

Download files

Source Distribution

parselab-0.1.8.tar.gz (8.2 kB)
