Dark Keeper is a simple open-source web parser for podcast sites
Project description
Dark Keeper
Dark Keeper is a simple open-source web parser for podcast sites. You can also use it for any other site.
Goal: parse the full information for each podcast episode, such as its number, description, and download link.
Features
- a simple web spider that walks the site
- a cache for all downloaded pages
- parsing of any information from the pages
- export parsed data to MongoDB
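The first two features (walking the site and caching downloaded pages) can be illustrated with a minimal, standard-library sketch. This is not Dark Keeper's implementation: `crawl`, `fake_fetch`, `split_links`, and the in-memory `SITE` are hypothetical names invented for the illustration, and a real spider would fetch pages over HTTP and extract links with CSS selectors.

```python
from collections import deque

def crawl(start_url, fetch, extract_links):
    """Breadth-first walk that fetches every URL at most once.

    `fetch` and `extract_links` are caller-supplied callables
    (hypothetical here, not part of the dark-keeper API).
    """
    cache = {}                    # url -> page body; doubles as the "seen" set
    queue = deque([start_url])    # URLs waiting to be visited
    order = []                    # URLs in the order they were fetched
    while queue:
        url = queue.popleft()
        if url in cache:          # already downloaded, reuse the cached page
            continue
        cache[url] = fetch(url)
        order.append(url)
        for link in extract_links(cache[url]):
            if link not in cache:
                queue.append(link)
    return order, cache

# A tiny in-memory "site": each body is a space-separated list of links.
SITE = {
    "/": "/page/2 /page/3",
    "/page/2": "/ /page/3",
    "/page/3": "",
}
fetch_calls = []

def fake_fetch(url):
    """Stand-in for an HTTP client; records each download."""
    fetch_calls.append(url)
    return SITE[url]

def split_links(body):
    """Stand-in for a CSS-selector based link extractor."""
    return body.split()

visited, pages = crawl("/", fake_fetch, split_links)
print(visited)
```

Because the cache is checked before every download, pages that link to each other are fetched only once, even though `/page/3` is queued twice in this example.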
Quick start
$ mkvirtualenv keeper
(keeper)$ pip install dark-keeper
(keeper)$ cat app.py
from dark_keeper import BaseParser, DarkKeeper
from dark_keeper.exports import ExportMongo
from dark_keeper.http import HttpClient
from dark_keeper.storages import UrlsStorage, DataStorage


class PodcastParser(BaseParser):
    def parse_urls(self, content):
        urls = content.parse_urls('.posts-list > .container-fluid .text-left a')

        return urls

    def parse_data(self, content):
        data = []
        for post_item in content.get_block_items('.posts-list .posts-list-item'):
            post_data = dict(
                title=post_item.parse_text('.number-title'),
                desc=post_item.parse_text('.post-podcast-content'),
                mp3=post_item.parse_attr('.post-podcast-content audio', 'src'),
            )

            if post_data['title'] and post_data['mp3']:
                data.append(post_data)

        return data


if __name__ == '__main__':
    pk = DarkKeeper(
        http_client=HttpClient(
            delay=2,
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                       'AppleWebKit/537.36 (KHTML, like Gecko) '
                       'Chrome/81.0.4044.138 Safari/537.36 OPR/68.0.3618.125',
        ),
        parser=PodcastParser(),
        urls_storage=UrlsStorage(base_url='https://radio-t.com/'),
        data_storage=DataStorage(),
        export_mongo=ExportMongo(mongo_uri='mongodb://localhost/podcasts.radio-t.com'),
    )
    pk.run()
Download files
File details
Details for the file dark-keeper-0.3.1.tar.gz.
File metadata
- Download URL: dark-keeper-0.3.1.tar.gz
- Upload date:
- Size: 6.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c887a886609b99e7ea86bd86b4d2f9d6fbeb7b9c49c01d8fdf83aad8fea34281 |
| MD5 | bb54014cce566e195a05ff993cf881b5 |
| BLAKE2b-256 | 7d7e760b6dc7d2e0796b4a78287f7eb3428aac20f11edce8fe20665217141955 |
File details
Details for the file dark_keeper-0.3.1-py3-none-any.whl.
File metadata
- Download URL: dark_keeper-0.3.1-py3-none-any.whl
- Upload date:
- Size: 8.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ccd1d1ab03af8cca90b79be748528b333c53007f55380300559ebb7e3cd076b6 |
| MD5 | 9dcb9683b5942b7d6749e636f5538b91 |
| BLAKE2b-256 | 6a3f89405f4a1a15da1ed6b72f0e4a5be544f46cee4fbb51536eee332d005cc6 |
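The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using only Python's standard `hashlib`; the helper names `sha256_of`, `matches`, and `verify_downloads` are hypothetical, and it assumes the files were saved under the names shown in the file details.

```python
import hashlib
from pathlib import Path

# Published SHA256 digests, copied from the file details above.
EXPECTED = {
    "dark-keeper-0.3.1.tar.gz":
        "c887a886609b99e7ea86bd86b4d2f9d6fbeb7b9c49c01d8fdf83aad8fea34281",
    "dark_keeper-0.3.1-py3-none-any.whl":
        "ccd1d1ab03af8cca90b79be748528b333c53007f55380300559ebb7e3cd076b6",
}

def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches(path, expected_hex):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()

def verify_downloads(directory="."):
    """Check each known file found in `directory`; missing files are skipped."""
    results = {}
    for name, expected in EXPECTED.items():
        path = Path(directory) / name
        if path.exists():
            results[name] = matches(path, expected)
    return results

if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(name, "OK" if ok else "MISMATCH")
```

pip can perform a similar check itself when a requirements file pins hashes and is installed with `--require-hashes`.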