A spider library of several data sources

DataSpider

A spider framework with several built-in spiders.

Thanks

Thanks to JetBrains for providing a free PyCharm Professional license for this project.

Install

pip install --upgrade tsingspider

Features

  • Lightweight: no browser simulator is needed, so it uses few resources
    • However, not every website can be downloaded this way
  • Lazy: nothing is downloaded until you actually use the data
  • Useful utilities
    • HLS download support
    • Cookies from Firefox
    • Proxy support
    • Magnet link generation from torrent data

Write Your Own Spider

To define a resource, use LazySoup or LazyContent. LazyContent is for binary data, which covers essentially any kind of resource. LazySoup is for XML-format resources and is mostly used for downloading web pages.

For example:

from tsing_spider.util import LazySoup, LazyContent


class YourOwnSpider(LazySoup):
    def __init__(self, url: str):
        super().__init__(url)

    @property
    def some_info(self) -> str:
        """
        Extract information from self.soup.
        The page is downloaded the first time self.soup is used.
        """
        pass  # replace with your own extraction logic
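
The lazy behaviour described above can be pictured with a small standalone sketch. This is not tsingspider's code: `LazyResource`, `TitlePage` and `fake_fetch` are hypothetical names, and the fetch function is injected so the example runs without network access. It only shows the general idea of deferring the download until a property is first read, then reusing the cached result.

```python
from functools import cached_property
from typing import Callable


class LazyResource:
    """Hypothetical stand-in for a lazy resource (not the library's class)."""

    def __init__(self, url: str, fetch: Callable[[str], bytes]):
        self.url = url
        self._fetch = fetch  # injected so the sketch needs no network

    @cached_property
    def content(self) -> bytes:
        # Runs on first access only; the result is cached on the instance.
        return self._fetch(self.url)


class TitlePage(LazyResource):
    @property
    def title(self) -> str:
        # Reading self.content triggers the download on first use.
        text = self.content.decode("utf-8")
        start = text.index("<title>") + len("<title>")
        return text[start:text.index("</title>", start)]


calls = []  # records every simulated download


def fake_fetch(url: str) -> bytes:
    calls.append(url)
    return b"<html><head><title>Hello</title></head></html>"


page = TitlePage("https://example.com/", fake_fetch)
print(len(calls))   # 0: nothing is downloaded at construction time
print(page.title)   # Hello
print(page.title)   # Hello (served from the cached content)
print(len(calls))   # 1: the download happened exactly once
```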

Download files

Source Distribution

tsingspider-1.4.9.tar.gz (31.8 kB)

Uploaded Source

Built Distribution

tsingspider-1.4.9-py3-none-any.whl (37.4 kB)

Uploaded Python 3

File details

Details for the file tsingspider-1.4.9.tar.gz.

File metadata

  • Download URL: tsingspider-1.4.9.tar.gz
  • Upload date:
  • Size: 31.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.0

File hashes

Hashes for tsingspider-1.4.9.tar.gz
Algorithm Hash digest
SHA256 fe7964aeff171c431c53068c1da38a440917ef8b7435ef797b586500a635bd9d
MD5 d600a214fdc35823ae7c7f61d57beb3e
BLAKE2b-256 a8d590b9e4e2d1d5887b39b5c72d1b1052c9446d9b7357c9a3d7f34eac98faab

File details

Details for the file tsingspider-1.4.9-py3-none-any.whl.

File metadata

  • Download URL: tsingspider-1.4.9-py3-none-any.whl
  • Upload date:
  • Size: 37.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.0

File hashes

Hashes for tsingspider-1.4.9-py3-none-any.whl
Algorithm Hash digest
SHA256 273edfe9eca4644b60c1eae9e2587036ac819931dc8a3007215cd540731c70d8
MD5 f8a2ec55f4ecf5475f53e588f9e82d4a
BLAKE2b-256 70f474199daf82185aa19dda12225be12b3469e792ba27a378e800050f839ecc
