Dark Keeper is a simple open-source web parser for podcast sites
Project description
Dark Keeper
Dark Keeper is a simple open-source web parser for podcast sites. It can also be used for other sites.
Goal: extract full information for each podcast episode, such as its number, description, and download link.
Features
[x] simple web spider that walks a site
[x] cache for all downloaded pages
[x] parse any information from pages
[x] export parsed data to MongoDB
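The first two features (a spider that walks a site and a cache for downloaded pages) can be illustrated with a minimal standard-library sketch. This is not Dark Keeper's implementation: `cached_fetch`, `crawl`, the injected `fetch` and `extract_urls` callables, and the on-disk cache layout are all hypothetical names and choices made for illustration only.

```python
import hashlib
import time
from collections import deque
from pathlib import Path
from typing import Callable, Dict, Iterable


def cached_fetch(url: str, fetch: Callable[[str], str], cache_dir: Path) -> str:
    """Return the page body for url, reading it from cache_dir when present."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that any URL maps to a safe file name (assumed layout).
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    path = cache_dir / name
    if path.exists():
        return path.read_text(encoding="utf-8")
    body = fetch(url)
    path.write_text(body, encoding="utf-8")
    return body


def crawl(
    start_url: str,
    fetch: Callable[[str], str],
    extract_urls: Callable[[str], Iterable[str]],
    cache_dir: Path,
    delay: float = 0.0,
) -> Dict[str, str]:
    """Walk pages breadth-first from start_url and return {url: body}."""
    seen = {start_url}
    queue = deque([start_url])
    pages: Dict[str, str] = {}
    while queue:
        url = queue.popleft()
        pages[url] = cached_fetch(url, fetch, cache_dir)
        # Queue every link that has not been seen yet.
        for link in extract_urls(pages[url]):
            if link not in seen:
                seen.add(link)
                queue.append(link)
        if delay:
            time.sleep(delay)  # pause between pages
    return pages
```

Passing `fetch` and `extract_urls` as arguments keeps the sketch independent of any HTTP or HTML library. A second run over the same `cache_dir` reads every page from disk instead of calling `fetch` again.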
Quick start
$ mkvirtualenv keeper
(keeper)$ pip install dark-keeper
(keeper)$ cat app.py
from dark_keeper import BaseParser, DarkKeeper, HttpClient, UrlsStorage, DataStorage, ExportMongo


class PodcastParser(BaseParser):
    def parse_urls(self, content):
        urls = content.parse_urls('.posts-list > .container-fluid .text-left a')

        return urls

    def parse_data(self, content):
        data = []
        for post_item in content.get_block_items('.posts-list .posts-list-item'):
            post_data = dict(
                title=post_item.parse_text('.number-title'),
                desc=post_item.parse_text('.post-podcast-content'),
                mp3=post_item.parse_attr('.post-podcast-content audio', 'src'),
            )

            if post_data['title'] and post_data['mp3']:
                data.append(post_data)

        return data


if __name__ == '__main__':
    pk = DarkKeeper(
        http_client=HttpClient(
            delay=2,
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                       'AppleWebKit/537.36 (KHTML, like Gecko) '
                       'Chrome/81.0.4044.138 Safari/537.36 OPR/68.0.3618.125',
        ),
        parser=PodcastParser(),
        urls_storage=UrlsStorage(base_url='https://radio-t.com/'),
        data_storage=DataStorage(),
        export_mongo=ExportMongo(mongo_uri='mongodb://localhost/podcasts.radio-t.com'),
    )
    pk.run()
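In the example above, parse_data keeps a post only when both its title and its mp3 link are non-empty. The standalone sketch below applies the same rule to plain dictionaries. `keep_complete` and the sample records are hypothetical and are not part of the dark_keeper package.

```python
from typing import Dict, Iterable, List, Optional, Sequence

Record = Dict[str, Optional[str]]


def keep_complete(
    items: Iterable[Record],
    required: Sequence[str] = ("title", "mp3"),
) -> List[Record]:
    """Keep only records whose required fields are all present and non-empty."""
    return [item for item in items if all(item.get(key) for key in required)]


if __name__ == "__main__":
    # Made-up sample records, shaped like the dicts built in parse_data.
    sample = [
        {"title": "Episode 1", "desc": "Intro", "mp3": "https://example.com/1.mp3"},
        {"title": "Episode 2", "desc": "No audio yet", "mp3": None},
        {"title": "", "desc": "Untitled", "mp3": "https://example.com/3.mp3"},
    ]
    for record in keep_complete(sample):
        print(record["title"], record["mp3"])
```

The description field is optional here, as in the original parser, so a post without a description is still kept.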
Project details
Download files
Source Distribution
dark-keeper-0.3.0.tar.gz (5.5 kB)
Built Distribution
dark_keeper-0.3.0-py3-none-any.whl (8.4 kB)
File details
Details for the file dark-keeper-0.3.0.tar.gz.
File metadata
- Download URL: dark-keeper-0.3.0.tar.gz
- Upload date:
- Size: 5.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.7.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | b9fe5a42c8d0ced7731ca4ee3ba9ecbcc56a2aa2d0b7dd8a05dfa9971ef1349f
MD5 | 05f846fb6aeeef2b86917284a2dfa52c
BLAKE2b-256 | 2e517d1199aa522abd0dc2b2dce9755664952d68c72b136616e207f0f3872afc
File details
Details for the file dark_keeper-0.3.0-py3-none-any.whl.
File metadata
- Download URL: dark_keeper-0.3.0-py3-none-any.whl
- Upload date:
- Size: 8.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.7.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9c904aa0bbc632e33f56d6871b5916109beb044577b9c9ff33d376c2b6d751d8
MD5 | 5a87bbf32ef2cec37137956cb4e7bf34
BLAKE2b-256 | 20786e3b264eaeed2b5e67d07d2d3a48d388b3d285c3ba528c8a90207eeb0a66