
Parselab helper module

Project description

parselab

This package contains classes that help you write parsers in Python.

Usage

To use parselab, create a class derived from BasicParser.

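import os

from parselab.cache import FileCache
from parselab.network import NetworkManager
from parselab.parsing import BasicParser

class MyParser(BasicParser):

    def __init__(self):
        # Cache downloaded pages under $CACHE_PATH (None if the variable is unset)
        self.cache = FileCache(namespace='my-parser', path=os.environ.get('CACHE_PATH'))
        self.net = NetworkManager()
        # NOTE: the original example does not show where `db` is imported from
        db.connect(os.environ['PARSINGDB'])  # raises KeyError if $PARSINGDB is unset
        db.setup_project('my-project')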

After that, you can download pages using the BasicParser.get_page() method:

class MyParser(BasicParser):
    ...

    def run(self):
        page = self.get_page('https://google.com')

BasicParser uses the network manager set in the __init__ method and saves every downloaded page into the directory given by your $CACHE_PATH environment variable. The next time you call get_page() for the same page, it is served from the cache if available.
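The cache-then-download behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use parselab and is not its implementation: SimpleFileCache, get_page_cached, fake_fetch and demo are hypothetical names introduced only for this example, and hashing the URL to get a file name is an assumption about how such a cache might store pages.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Optional


class SimpleFileCache:
    """Hypothetical stand-in for a file cache; NOT parselab's FileCache."""

    def __init__(self, path: str, namespace: str) -> None:
        # Each namespace gets its own subdirectory under the cache path.
        self.root = Path(path) / namespace
        self.root.mkdir(parents=True, exist_ok=True)

    def _file_for(self, url: str) -> Path:
        # Hash the URL so that any URL maps to a safe file name (an assumption).
        return self.root / hashlib.sha256(url.encode("utf-8")).hexdigest()

    def get(self, url: str) -> Optional[str]:
        target = self._file_for(url)
        return target.read_text(encoding="utf-8") if target.exists() else None

    def put(self, url: str, content: str) -> None:
        self._file_for(url).write_text(content, encoding="utf-8")


def get_page_cached(url: str, cache: SimpleFileCache,
                    fetch: Callable[[str], str]) -> str:
    """Return the cached page if present; otherwise fetch it and store it."""
    cached = cache.get(url)
    if cached is not None:
        return cached
    content = fetch(url)
    cache.put(url, content)
    return content


def demo() -> tuple:
    """Request the same URL twice and count how often the network is hit."""
    calls = []

    def fake_fetch(url: str) -> str:
        # Stands in for a real network request so the sketch runs offline.
        calls.append(url)
        return "<html>example</html>"

    with tempfile.TemporaryDirectory() as tmp:
        cache = SimpleFileCache(tmp, "my-parser")
        first = get_page_cached("https://example.com", cache, fake_fetch)
        second = get_page_cached("https://example.com", cache, fake_fetch)
    return first, second, len(calls)
```

Calling demo() requests the same URL twice; the second request is answered from the cache directory, so the fetch function runs only once.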


Files for parselab, version 0.1.8

Filename: parselab-0.1.8.tar.gz (8.2 kB)
File type: Source
Python version: None
