Dark Keeper is a simple open-source web parser for podcast sites

Project description

Dark Keeper

Dark Keeper is a simple open-source web parser for podcast sites. You can also use it for any other site.
Goal: parse complete information for each podcast episode, such as its number, description, and download link.

Features

  • simple web spider that walks the site
  • cache for all downloaded pages
  • parsing of any information from pages
  • export of parsed data to MongoDB
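
The first two features (a spider that walks the site and a cache of downloaded pages) can be illustrated with a minimal standard-library sketch. This is not Dark Keeper's implementation: `crawl`, `fake_fetch`, `fake_links`, and the `SITE` mapping are hypothetical names made up for illustration, and the "site" is an in-memory dictionary rather than real HTTP.

```python
from collections import deque


def crawl(start_url, fetch, extract_links):
    """Walk a site breadth-first, fetching every URL at most once."""
    cache = {}                    # url -> downloaded page body
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in cache:          # already downloaded, reuse the cached copy
            continue
        cache[url] = fetch(url)
        for link in extract_links(cache[url]):
            if link not in cache:
                queue.append(link)
    return cache


# Hypothetical in-memory "site": each page body is just its list of links.
SITE = {
    "/": ["/page/2", "/episode/1"],
    "/page/2": ["/", "/episode/2"],
    "/episode/1": [],
    "/episode/2": [],
}
fetch_calls = []


def fake_fetch(url):
    fetch_calls.append(url)       # record each simulated download
    return SITE[url]


def fake_links(body):
    return body


pages = crawl("/", fake_fetch, fake_links)
```

Even though `/page/2` links back to `/`, the cache check keeps each page from being downloaded more than once.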

Quick start

$ mkvirtualenv keeper
(keeper)$ pip install dark-keeper
(keeper)$ cat app.py

from dark_keeper import BaseParser, DarkKeeper
from dark_keeper.exports import ExportMongo
from dark_keeper.http import HttpClient
from dark_keeper.storages import UrlsStorage, DataStorage


class PodcastParser(BaseParser):
    def parse_urls(self, content):
        # Collect the links that match the CSS selector.
        urls = content.parse_urls('.posts-list > .container-fluid .text-left a')

        return urls

    def parse_data(self, content):
        # Build one record per episode block found on the page.
        data = []
        for post_item in content.get_block_items('.posts-list .posts-list-item'):
            post_data = dict(
                title=post_item.parse_text('.number-title'),
                desc=post_item.parse_text('.post-podcast-content'),
                mp3=post_item.parse_attr('.post-podcast-content audio', 'src'),
            )

            # Keep only episodes that have both a title and an audio link.
            if post_data['title'] and post_data['mp3']:
                data.append(post_data)

        return data


if __name__ == '__main__':
    pk = DarkKeeper(
        http_client=HttpClient(
            delay=2,
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                       'AppleWebKit/537.36 (KHTML, like Gecko) '
                       'Chrome/81.0.4044.138 Safari/537.36 OPR/68.0.3618.125',
        ),
        parser=PodcastParser(),
        urls_storage=UrlsStorage(base_url='https://radio-t.com/'),
        data_storage=DataStorage(),
        export_mongo=ExportMongo(mongo_uri='mongodb://localhost/podcasts.radio-t.com'),
    )
    pk.run()
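
To make the shape of the parsed data concrete, here is a small plain-Python sketch of the filtering rule used in `parse_data` above: a record is kept only when both `title` and `mp3` are non-empty. The helper `keep_complete` and the sample records (including the example.com URLs) are hypothetical and do not come from Dark Keeper or from any real site.

```python
def keep_complete(records):
    """Return only the records that have a non-empty title and mp3 link."""
    return [r for r in records if r.get("title") and r.get("mp3")]


# Hypothetical sample records in the same shape as parse_data builds.
sample = [
    {"title": "Episode 1", "desc": "First one", "mp3": "https://example.com/1.mp3"},
    {"title": "Episode 2", "desc": "No audio yet", "mp3": None},
    {"title": "", "desc": "Untitled", "mp3": "https://example.com/3.mp3"},
]

complete = keep_complete(sample)
```

Records with a missing title or audio link are dropped, so only complete episodes would reach the export step.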

Download files

Source Distribution

dark-keeper-0.3.1.tar.gz (6.9 kB)

Uploaded Source

Built Distribution

dark_keeper-0.3.1-py3-none-any.whl (8.3 kB)

Uploaded Python 3

File details

Details for the file dark-keeper-0.3.1.tar.gz.

File metadata

  • Download URL: dark-keeper-0.3.1.tar.gz
  • Upload date:
  • Size: 6.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.2

File hashes

Hashes for dark-keeper-0.3.1.tar.gz

Algorithm    Hash digest
SHA256       c887a886609b99e7ea86bd86b4d2f9d6fbeb7b9c49c01d8fdf83aad8fea34281
MD5          bb54014cce566e195a05ff993cf881b5
BLAKE2b-256  7d7e760b6dc7d2e0796b4a78287f7eb3428aac20f11edce8fe20665217141955
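
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library `hashlib`; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the sketch assumes the archive has already been saved in the current directory under its original file name.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True when the file's digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed local file name; the digest is the SHA256 value listed above.
    archive = "dark-keeper-0.3.1.tar.gz"
    expected = "c887a886609b99e7ea86bd86b4d2f9d6fbeb7b9c49c01d8fdf83aad8fea34281"
    if os.path.exists(archive):
        print("OK" if matches_expected(archive, expected) else "MISMATCH")
```

Alternatively, pip can enforce digests at install time when a requirements file pins them with `--hash` and is installed with `--require-hashes`.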

File details

Details for the file dark_keeper-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: dark_keeper-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 8.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.7.3 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.2

File hashes

Hashes for dark_keeper-0.3.1-py3-none-any.whl

Algorithm    Hash digest
SHA256       ccd1d1ab03af8cca90b79be748528b333c53007f55380300559ebb7e3cd076b6
MD5          9dcb9683b5942b7d6749e636f5538b91
BLAKE2b-256  6a3f89405f4a1a15da1ed6b72f0e4a5be544f46cee4fbb51536eee332d005cc6
