Parselab helper module
parselab
This package contains classes that help to write parsers in Python.
Usage
To use parselab, just create a class derived from BasicParser:
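    import os

    from parselab.cache import FileCache
    from parselab.network import NetworkManager
    from parselab.parsing import BasicParser


    class MyParser(BasicParser):
        def __init__(self):
            # Cache downloaded pages under $CACHE_PATH, in the 'my-parser' namespace.
            self.cache = FileCache(namespace='my-parser', path=os.environ.get('CACHE_PATH'))
            # Network manager that get_page() uses for downloads.
            self.net = NetworkManager()
            # NOTE: the import for `db` is not shown in the original example.
            db.connect(os.environ['PARSINGDB'])
            db.setup_project('my-project')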
After that, you will be able to download pages using the BasicParser.get_page() method:
    class MyParser(BasicParser):
        ...

        def run(self):
            page = self.get_page('https://google.com')
BasicParser will use the network manager specified in the __init__ method and will save all downloaded pages into the directory specified by your $CACHE_PATH environment variable. The next time you invoke the get_page() method, it will return the requested page from the cache if available.
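The cache-then-download behaviour described above can be illustrated with a minimal, standard-library-only sketch. It is not parselab's implementation: SketchFileCache, SketchParser, the fetch callable, and the choice to name cache files by a SHA-256 of the URL are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile
from typing import Callable, Optional


class SketchFileCache:
    """Hypothetical stand-in for a file cache (not parselab's FileCache)."""

    def __init__(self, namespace: str, path: str) -> None:
        # Each namespace gets its own subdirectory under the cache path.
        self.root = os.path.join(path, namespace)
        os.makedirs(self.root, exist_ok=True)

    def _file_for(self, url: str) -> str:
        # Assumption: cache files are named by a hash of the URL.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return os.path.join(self.root, digest)

    def get(self, url: str) -> Optional[str]:
        try:
            with open(self._file_for(url), encoding="utf-8") as fh:
                return fh.read()
        except FileNotFoundError:
            return None

    def put(self, url: str, content: str) -> None:
        with open(self._file_for(url), "w", encoding="utf-8") as fh:
            fh.write(content)


class SketchParser:
    """Hypothetical parser base showing the get_page() lookup order."""

    def __init__(self, cache: SketchFileCache, fetch: Callable[[str], str]) -> None:
        self.cache = cache
        self.fetch = fetch  # stands in for the network manager

    def get_page(self, url: str) -> str:
        cached = self.cache.get(url)
        if cached is not None:
            return cached  # cache hit: no download
        content = self.fetch(url)
        self.cache.put(url, content)
        return content


def demo():
    """Request the same URL twice and count how often the fetcher runs."""
    calls = []

    def fake_fetch(url: str) -> str:
        calls.append(url)
        return "<html>hello</html>"

    with tempfile.TemporaryDirectory() as tmp:
        parser = SketchParser(SketchFileCache("my-parser", tmp), fake_fetch)
        first = parser.get_page("https://example.com")
        second = parser.get_page("https://example.com")
    return first, second, len(calls)
```

In this sketch the second get_page() call is served from disk, so the fetcher runs only once for a given URL.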
Download files
Source Distribution
parselab-0.1.8.tar.gz (8.2 kB)
File details
Details for the file parselab-0.1.8.tar.gz.
File metadata
- Download URL: parselab-0.1.8.tar.gz
- Upload date:
- Size: 8.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.39.0 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8bd73adca48d99bc7ff898b06197dcb1c1658c198680c79c1bd69812602a5a57
MD5 | 6904709ac795beed462d2f05f5ef592b
BLAKE2b-256 | 1ff99f75464511315da69dac064458353e5a13f588f153b5e30b756eec79cdc9
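A downloaded archive can be checked against the digests listed above. The following is a minimal sketch using Python's standard hashlib module; the helper names file_digest and matches are hypothetical, and it assumes BLAKE2b-256 means BLAKE2b with a 32-byte digest.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Return the hex digest of a file, reading it in chunks."""
    if algorithm == "blake2b-256":
        # BLAKE2b truncated to 32 bytes (256 bits).
        hasher = hashlib.blake2b(digest_size=32)
    else:
        hasher = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, matches('parselab-0.1.8.tar.gz', expected) with the SHA256 value from the table as expected should return True for an intact download.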