Dark Keeper is a simple, open-source web parser for podcast sites
Project description
Dark Keeper
Dark Keeper is a simple, open-source web parser for podcast sites. You can also use it for any other site.
Goal: parse the full information for each podcast episode, such as its number, description, and download link.
Features
- a simple web spider that walks the site
- a cache for all downloaded pages
- parsing of any information from pages
- export of parsed data to MongoDB
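The spider, cache, and parse steps listed above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not Dark Keeper's actual implementation: the `SITE` dictionary, the `LinkAndTitleParser` class, and the `crawl` function are hypothetical names, the pages are served from memory instead of over HTTP, and the MongoDB export step is left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "site" standing in for real HTTP responses.
SITE = {
    "https://example.com/": '<a href="/page/2">next</a><h1>Episode 1</h1>',
    "https://example.com/page/2": '<a href="/">back</a><h1>Episode 2</h1>',
}


class LinkAndTitleParser(HTMLParser):
    """Collects href values from <a> tags and text inside <h1> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.titles = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.titles.append(data.strip())


def crawl(base_url, fetch, cache=None):
    """Visit every reachable page once; return (titles, page cache)."""
    cache = {} if cache is None else cache
    queue = deque([base_url])
    seen = {base_url}
    titles = []
    while queue:
        url = queue.popleft()
        # Download only pages that are not in the cache yet.
        if url not in cache:
            cache[url] = fetch(url)
        parser = LinkAndTitleParser()
        parser.feed(cache[url])
        titles.extend(parser.titles)
        # Queue each newly discovered link exactly once.
        for href in parser.links:
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return titles, cache


titles, cache = crawl("https://example.com/", SITE.__getitem__)
print(titles)
```

Passing the returned `cache` into a second `crawl` call reuses the stored pages, so `fetch` is not called again for them.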
Quick start
$ mkvirtualenv keeper
(keeper)$ pip install dark-keeper
(keeper)$ cat app.py
from dark_keeper import BaseParser, DarkKeeper
from dark_keeper.exports import ExportMongo
from dark_keeper.http import HttpClient
from dark_keeper.storages import UrlsStorage, DataStorage


class PodcastParser(BaseParser):
    def parse_urls(self, content):
        urls = content.parse_urls('.posts-list > .container-fluid .text-left a')

        return urls

    def parse_data(self, content):
        data = []
        for post_item in content.get_block_items('.posts-list .posts-list-item'):
            post_data = dict(
                title=post_item.parse_text('.number-title'),
                desc=post_item.parse_text('.post-podcast-content'),
                mp3=post_item.parse_attr('.post-podcast-content audio', 'src'),
            )

            if post_data['title'] and post_data['mp3']:
                data.append(post_data)

        return data


if __name__ == '__main__':
    pk = DarkKeeper(
        http_client=HttpClient(
            delay=2,
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                       'AppleWebKit/537.36 (KHTML, like Gecko) '
                       'Chrome/81.0.4044.138 Safari/537.36 OPR/68.0.3618.125',
        ),
        parser=PodcastParser(),
        urls_storage=UrlsStorage(base_url='https://radio-t.com/'),
        data_storage=DataStorage(),
        export_mongo=ExportMongo(mongo_uri='mongodb://localhost/podcasts.radio-t.com'),
    )
    pk.run()
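The `title` and `mp3` check in `parse_data` keeps only posts that have both fields. A standalone sketch of that filtering step follows. The sample records and the `keep_complete` helper are made up for illustration, and it is an assumption that a selector with no match yields an empty or `None` value.

```python
# Made-up records shaped like the dicts built in parse_data.
posts = [
    {"title": "Episode 1", "desc": "Intro", "mp3": "https://example.com/1.mp3"},
    {"title": "Episode 2", "desc": "No audio yet", "mp3": None},
    {"title": "", "desc": "Untitled", "mp3": "https://example.com/3.mp3"},
]


def keep_complete(records):
    """Keep only records with both a non-empty title and a non-empty mp3 link."""
    return [r for r in records if r["title"] and r["mp3"]]


complete = keep_complete(posts)
print([r["title"] for r in complete])  # → ['Episode 1']
```

Records missing either field are dropped, so pages that list no episodes (index or navigation pages, for example) contribute nothing to the exported data.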
Download files
Source Distribution
dark-keeper-0.3.1.tar.gz (6.9 kB)
Built Distribution
dark_keeper-0.3.1-py3-none-any.whl (8.3 kB)
File details
Details for the file dark-keeper-0.3.1.tar.gz.
File metadata
- Download URL: dark-keeper-0.3.1.tar.gz
- Upload date:
- Size: 6.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | c887a886609b99e7ea86bd86b4d2f9d6fbeb7b9c49c01d8fdf83aad8fea34281
MD5 | bb54014cce566e195a05ff993cf881b5
BLAKE2b-256 | 7d7e760b6dc7d2e0796b4a78287f7eb3428aac20f11edce8fe20665217141955
File details
Details for the file dark_keeper-0.3.1-py3-none-any.whl.
File metadata
- Download URL: dark_keeper-0.3.1-py3-none-any.whl
- Upload date:
- Size: 8.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | ccd1d1ab03af8cca90b79be748528b333c53007f55380300559ebb7e3cd076b6
MD5 | 9dcb9683b5942b7d6749e636f5538b91
BLAKE2b-256 | 6a3f89405f4a1a15da1ed6b72f0e4a5be544f46cee4fbb51536eee332d005cc6