
A spider library of several data sources

Project description

DataSpider


A spider framework with several built-in spiders.

Thanks

Thanks to JetBrains for providing a free PyCharm Professional license for this project.

Install

pip install --upgrade tsingspider

Features

  • Lightweight: no browser emulator has to be started, so it does not consume many resources
    • However, not every website can be downloaded this way
  • Lazy: nothing is downloaded until you actually use the data
  • Useful utilities
    • HLS download support
    • Cookies loaded from Firefox
    • Proxy support
    • Magnet link generation from torrent data
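The magnet link utility rests on a standard BitTorrent fact: a v1 magnet link is built from the SHA-1 hash of the bencoded info dictionary exactly as it appears in the .torrent file. The following is a minimal standard-library sketch of that derivation, not tsingspider's own implementation; magnet_from_torrent and its helper functions are hypothetical names.

```python
import hashlib
from typing import Dict, Tuple
from urllib.parse import quote


def _read_str(data: bytes, i: int) -> Tuple[bytes, int]:
    """Read a bencoded byte string '<len>:<bytes>' at data[i]; return (value, next index)."""
    colon = data.index(b":", i)
    start = colon + 1
    length = int(data[i:colon])
    return data[start:start + length], start + length


def _skip(data: bytes, i: int) -> int:
    """Return the index just past the bencoded value that starts at data[i]."""
    kind = data[i:i + 1]
    if kind == b"i":  # integer: i<digits>e
        return data.index(b"e", i) + 1
    if kind in (b"l", b"d"):  # list or dict: nested values until the closing 'e'
        i += 1
        while data[i:i + 1] != b"e":
            i = _skip(data, i)
        return i + 1
    return _read_str(data, i)[1]  # byte string


def _dict_spans(data: bytes, i: int) -> Dict[bytes, Tuple[int, int]]:
    """Map each key of the bencoded dict at data[i] to the (start, end) span of its value."""
    spans = {}
    i += 1  # skip the leading 'd'
    while data[i:i + 1] != b"e":
        key, i = _read_str(data, i)
        end = _skip(data, i)
        spans[key] = (i, end)
        i = end
    return spans


def magnet_from_torrent(data: bytes) -> str:
    """Build a BitTorrent v1 magnet link from raw .torrent bytes (hypothetical helper)."""
    if data[:1] != b"d":
        raise ValueError("torrent data must be a bencoded dictionary")
    top = _dict_spans(data, 0)
    if b"info" not in top:
        raise ValueError("torrent data has no 'info' dictionary")
    start, end = top[b"info"]
    # The info-hash is the SHA-1 of the info dict's raw bytes, not of a re-encoding.
    info_hash = hashlib.sha1(data[start:end]).hexdigest()
    magnet = "magnet:?xt=urn:btih:" + info_hash
    # Optionally add the display name (dn) taken from info["name"].
    name_span = _dict_spans(data, start).get(b"name")
    if name_span is not None:
        name, _ = _read_str(data, name_span[0])
        magnet += "&dn=" + quote(name.decode("utf-8", errors="replace"))
    return magnet
```

Hashing the original byte span, rather than decoding and re-encoding the dictionary, guarantees the hash matches what other BitTorrent clients compute even if the file uses unusual key ordering.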

Write Your Own Spider

To define a resource, subclass LazySoup or LazyContent. LazyContent is for binary data; since essentially any kind of data can be treated as binary, it works for any resource. LazySoup is for XML/HTML resources and is typically used to download web pages.
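The lazy behaviour described above can be sketched with functools.cached_property: the constructor only stores the URL, and the first attribute access triggers the download. This is an illustrative stand-in, not the package's LazyContent or LazySoup code; LazyResource, LazyText and the injected fetch function are hypothetical names.

```python
from functools import cached_property
from typing import Callable


class LazyResource:
    """Hypothetical lazy binary resource (a stand-in, not tsing_spider's class).

    Nothing is fetched in __init__; the first access to .content calls
    the fetch function, and later accesses reuse the cached bytes.
    """

    def __init__(self, url: str, fetch: Callable[[str], bytes]):
        self.url = url
        self._fetch = fetch  # injected downloader, e.g. a wrapper around urllib

    @cached_property
    def content(self) -> bytes:
        # Runs once per instance; the result is stored on the instance.
        return self._fetch(self.url)


class LazyText(LazyResource):
    """Text view over the same lazily downloaded bytes."""

    @cached_property
    def text(self) -> str:
        return self.content.decode("utf-8", errors="replace")
```

For example, LazyText("https://example.com", fetch) makes no request until .text or .content is first read, and fetch is called at most once per instance.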

For example:

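from tsing_spider.util import LazySoup, LazyContent

class YourOwnSpider(LazySoup):
    def __init__(self, url: str):
        LazySoup.__init__(self, url)

    @property
    def some_info(self) -> str:
        """
        Extract information from self.soup.
        The page is downloaded the first time this data is used.
        """
        # Placeholder logic; assumes self.soup behaves like a BeautifulSoup object
        title = self.soup.find("title")
        return title.get_text(strip=True) if title is not None else ""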
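To try the same extract-on-first-use pattern without the library or network access, the sketch below collects the links on a page using only the standard library (html.parser instead of BeautifulSoup). LinkSpider, its fetch parameter and the links property are hypothetical; with tsingspider itself you would subclass LazySoup as in the example above.

```python
from functools import cached_property
from html.parser import HTMLParser
from typing import Callable, List
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


class LinkSpider:
    """Hypothetical stdlib-only spider; the page is fetched on first use."""

    def __init__(self, url: str, fetch: Callable[[str], str]):
        self.url = url
        self._fetch = fetch  # any callable returning the page HTML for a URL

    @cached_property
    def html(self) -> str:
        # Downloaded once, the first time any extractor needs it.
        return self._fetch(self.url)

    @property
    def links(self) -> List[str]:
        parser = _LinkCollector()
        parser.feed(self.html)
        parser.close()
        # Resolve relative hrefs against the page URL.
        return [urljoin(self.url, href) for href in parser.hrefs]
```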



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


tsingspider-1.5.0-py3-none-any.whl (37.5 kB)

Uploaded Python 3

File details

Details for the file tsingspider-1.5.0-py3-none-any.whl.

File metadata

  • Download URL: tsingspider-1.5.0-py3-none-any.whl
  • Upload date:
  • Size: 37.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.23

File hashes

Hashes for tsingspider-1.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 76a0b2db75173319254f95b2f66546013cb4e297c793a8a3c815f4372225dd91
MD5 c85f85e3b36377c55b03ba022a77bd59
BLAKE2b-256 021e7b1120009cac41ef03a72b318ef3f437b57f16423aff1d3d98018ec26b19

