Dark Keeper

Dark Keeper is a simple open-source web parser for podcast sites. It can also be used for any other site.

Goal: parse the full information for each podcast episode, such as its number, description, and download link.

Features

  • [x] simple web spider that walks the site

  • [x] cache for all downloaded pages

  • [x] parse any information from pages

  • [x] export parsed data to MongoDB
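
The first two features (walking a site and caching downloaded pages) can be illustrated with a small standalone sketch. This is not Dark Keeper's implementation: `crawl`, `fake_fetch`, `SITE`, and the dict-based cache are hypothetical names invented for illustration, and the "site" is an in-memory dict so the example runs without network access.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def crawl(start_url: str,
          fetch: Callable[[str], str],
          extract_links: Callable[[str], Iterable[str]],
          cache: Dict[str, str]) -> List[str]:
    """Visit every reachable URL once, reusing cached pages when available."""
    queue = deque([start_url])
    seen = {start_url}          # URLs already queued, so none is visited twice
    visited = []
    while queue:
        url = queue.popleft()
        if url not in cache:    # download only pages that are not cached yet
            cache[url] = fetch(url)
        page = cache[url]
        visited.append(url)
        for link in extract_links(page):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# A toy "site": each page body is a space-separated list of links (assumption).
SITE = {
    "/": "/page/2 /page/3",
    "/page/2": "/ /page/3",
    "/page/3": "",
}
fetch_calls = []


def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP request; records which URLs were downloaded."""
    fetch_calls.append(url)
    return SITE[url]


page_cache: Dict[str, str] = {}
order = crawl("/", fake_fetch, str.split, page_cache)
print(order)
```

Running `crawl` a second time with the same `page_cache` should not trigger any further calls to `fake_fetch`, which is the point of caching downloaded pages.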

Quick start

$ mkvirtualenv keeper

(keeper)$ pip install dark-keeper

(keeper)$ cat app.py

from dark_keeper import BaseParser, DarkKeeper, HttpClient, UrlsStorage, DataStorage, ExportMongo


class PodcastParser(BaseParser):
    def parse_urls(self, content):
        # Links matched by this CSS selector are the next URLs to visit.
        urls = content.parse_urls('.posts-list > .container-fluid .text-left a')

        return urls

    def parse_data(self, content):
        data = []
        # Each matched block is one podcast episode on the page.
        for post_item in content.get_block_items('.posts-list .posts-list-item'):
            post_data = dict(
                title=post_item.parse_text('.number-title'),
                desc=post_item.parse_text('.post-podcast-content'),
                mp3=post_item.parse_attr('.post-podcast-content audio', 'src'),
            )

            # Keep only episodes that have both a title and a download link.
            if post_data['title'] and post_data['mp3']:
                data.append(post_data)

        return data


if __name__ == '__main__':
    pk = DarkKeeper(
        http_client=HttpClient(
            delay=2,
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                       'AppleWebKit/537.36 (KHTML, like Gecko) '
                       'Chrome/81.0.4044.138 Safari/537.36 OPR/68.0.3618.125',
        ),
        parser=PodcastParser(),
        urls_storage=UrlsStorage(base_url='https://radio-t.com/'),
        data_storage=DataStorage(),
        export_mongo=ExportMongo(mongo_uri='mongodb://localhost/podcasts.radio-t.com'),
    )
    pk.run()
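
The `mongo_uri` above carries a dotted path, `podcasts.radio-t.com`. A common convention, recognized by pymongo's URI parser, is to read such a path as `database.collection`, splitting on the first dot. Whether Dark Keeper reads it exactly this way is an assumption here. The sketch below uses only the standard library, and `split_mongo_uri` is a hypothetical helper, not part of the package.

```python
from urllib.parse import urlsplit


def split_mongo_uri(mongo_uri):
    """Split a MongoDB URI into (host, database, collection).

    Assumes the path has the form `database.collection`, split on the
    first dot; the collection is None when the path has no dot.
    """
    parts = urlsplit(mongo_uri)
    path = parts.path.lstrip("/")
    database, _, collection = path.partition(".")
    return parts.hostname, database, collection or None


host, database, collection = split_mongo_uri(
    "mongodb://localhost/podcasts.radio-t.com"
)
print(host, database, collection)
```

Under that assumption, the episodes parsed from radio-t.com would end up in a collection named `radio-t.com` inside a database named `podcasts`.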
