Parselab helper module
parselab
This package contains classes that help you write parsers in Python.
Usage
To use parselab, create a class derived from BasicParser:
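    import os

    from parselab.cache import FileCache
    from parselab.network import NetworkManager
    from parselab.parsing import BasicParser

    class MyParser(BasicParser):
        def __init__(self):
            # Cache for downloaded pages; CACHE_PATH may be unset (None).
            self.cache = FileCache(namespace='my-parser', path=os.environ.get('CACHE_PATH'))
            # Network manager used by get_page() to download pages.
            self.net = NetworkManager()
            # NOTE: the import that provides `db` is not shown in the original example.
            db.connect(os.environ['PARSINGDB'])
            db.setup_project('my-project')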
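The constructor above reads its two environment variables in different ways. As a small standard-library illustration (not part of parselab; the helper name `read_settings` is hypothetical), `os.environ.get` returns `None` when a variable is unset, whereas indexing `os.environ[...]` raises `KeyError`. In other words, the example treats `PARSINGDB` as required and `CACHE_PATH` as optional. How `FileCache` behaves when `path` is `None` is not stated here.

```python
import os


def read_settings(env=os.environ):
    """Hypothetical helper mirroring how the constructor reads its settings."""
    return {
        "cache_path": env.get("CACHE_PATH"),  # None when the variable is unset
        "parsing_db": env["PARSINGDB"],       # raises KeyError when unset
    }
```

Passing a plain dictionary instead of `os.environ` makes the difference easy to check without touching the real environment.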
Once the parser is set up, you can download pages using the BasicParser.get_page() method:
    class MyParser(BasicParser):
        ...

        def run(self):
            page = self.get_page('https://google.com')
BasicParser will use the network manager specified in the __init__ method and will save all
downloaded pages into the directory specified by your $CACHE_PATH environment variable.
The next time you invoke get_page() for the same page, it will return the page from the cache
if available.
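The cache-first behaviour described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not parselab's implementation: `SketchFileCache`, its `get`/`put` methods, the standalone `get_page` function, and `fake_fetch` are all hypothetical names, and the real `FileCache` and `BasicParser.get_page()` may store and look up pages differently.

```python
import hashlib
import tempfile
from pathlib import Path


class SketchFileCache:
    """Hypothetical stand-in for a file cache; not parselab's FileCache."""

    def __init__(self, path, namespace):
        # Keep each parser's pages in its own sub-directory.
        self.root = Path(path) / namespace
        self.root.mkdir(parents=True, exist_ok=True)

    def _file_for(self, url):
        # Hash the URL so that any URL maps to a safe file name.
        return self.root / hashlib.sha256(url.encode("utf-8")).hexdigest()

    def get(self, url):
        target = self._file_for(url)
        return target.read_text(encoding="utf-8") if target.exists() else None

    def put(self, url, body):
        self._file_for(url).write_text(body, encoding="utf-8")


def get_page(url, cache, fetch):
    """Return the cached page if present; otherwise fetch and store it."""
    cached = cache.get(url)
    if cached is not None:
        return cached
    body = fetch(url)
    cache.put(url, body)
    return body


calls = []


def fake_fetch(url):
    # Stand-in for a real network request, so the sketch runs offline.
    calls.append(url)
    return "<html>example</html>"


with tempfile.TemporaryDirectory() as tmp:
    cache = SketchFileCache(tmp, namespace="my-parser")
    first = get_page("https://google.com", cache, fake_fetch)
    second = get_page("https://google.com", cache, fake_fetch)
```

In this sketch the second call is served from disk, so `fake_fetch` runs only once. Passing the fetch function in as a parameter keeps the caching logic testable without network access.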
Download files
Source Distribution
parselab-0.1.8.tar.gz (8.2 kB)
File details
Details for the file parselab-0.1.8.tar.gz.
File metadata
- Download URL: parselab-0.1.8.tar.gz
- Upload date:
- Size: 8.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.39.0 CPython/3.7.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8bd73adca48d99bc7ff898b06197dcb1c1658c198680c79c1bd69812602a5a57 |
| MD5 | 6904709ac795beed462d2f05f5ef592b |
| BLAKE2b-256 | 1ff99f75464511315da69dac064458353e5a13f588f153b5e30b756eec79cdc9 |