A spider library for several data sources

Project description

DataSpider

A spider framework that ships with several built-in spiders.

Install

pip install --upgrade tsingspider

Features

  • Lightweight: no browser simulator has to be started, so it uses few resources
    • However, not every website can be downloaded this way
  • Lazy: nothing is downloaded until you actually use the data
  • Useful utilities
    • HLS download support
    • Cookies loaded from Firefox
    • Proxy support
    • Magnet link generation from torrent data

Write Your Own Spider

To define a resource, subclass LazySoup or LazyContent. LazyContent handles binary data, so it can represent any kind of resource. LazySoup handles XML-style markup such as HTML and is the usual choice for downloading web pages.

For example:

from tsing_spider.util import LazySoup, LazyContent


class YourOwnSpider(LazySoup):
    def __init__(self, url: str):
        super().__init__(url)

    @property
    def some_info(self) -> str:
        """
        Extract information from self.soup.
        The data is downloaded the first time it is used.
        """
        # Placeholder: replace with your own extraction logic.
        pass
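
The lazy behaviour described in the docstring can be pictured with a small stand-in class. This is only a sketch of the general pattern, not tsingspider's actual implementation: `LazyResource`, its `content` attribute, and the injected `fetch` callable are all hypothetical, and the fake fetcher exists so the example runs without network access.

```python
from functools import cached_property
from typing import Callable, List


class LazyResource:
    """Hypothetical stand-in for a lazy resource; not tsingspider's real class."""

    def __init__(self, url: str, fetch: Callable[[str], bytes]):
        self.url = url
        self._fetch = fetch  # injected so the sketch needs no network access

    @cached_property
    def content(self) -> bytes:
        # Runs only on first access; the result is then cached on the instance.
        return self._fetch(self.url)


calls: List[str] = []


def fake_fetch(url: str) -> bytes:
    """Record each download request and return canned bytes."""
    calls.append(url)
    return b"<html><title>demo</title></html>"


page = LazyResource("https://example.com/", fake_fetch)
downloads_before = len(calls)   # constructing the object fetches nothing
first = page.content            # first access triggers the download
second = page.content           # second access reuses the cached bytes
downloads_after = len(calls)
```

Creating many such objects is cheap because no request is made until an attribute that needs the data is read, and repeated reads do not trigger repeated downloads.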

Download files

Source Distribution

tsingspider-1.4.3.tar.gz (19.1 kB)

Uploaded Source

Built Distribution

tsingspider-1.4.3-py3-none-any.whl (37.0 kB)

Uploaded Python 3

File details

Details for the file tsingspider-1.4.3.tar.gz.

File metadata

  • Download URL: tsingspider-1.4.3.tar.gz
  • Upload date:
  • Size: 19.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.1

File hashes

Hashes for tsingspider-1.4.3.tar.gz
Algorithm    Hash digest
SHA256       9fad16cbd37bbf11fbe5f1ff3b0cf9e069a8dbcd63309c72d8ca3721f65a16be
MD5          4231e882e5910be6d399493b69a4335c
BLAKE2b-256  fa061d01d4571d68546b7653a4779d225314e59b0aeafa5dee22dde2a6d908c8

File details

Details for the file tsingspider-1.4.3-py3-none-any.whl.

File metadata

  • Download URL: tsingspider-1.4.3-py3-none-any.whl
  • Upload date:
  • Size: 37.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.1

File hashes

Hashes for tsingspider-1.4.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       3e1af5335410f476c5acd41be3ca74d614ac6871d65fc48af552914b17571ed9
MD5          f3516f33e218bf828124696e5d08acab
BLAKE2b-256  48a4f73b52d44d3564f30525ae53b52b8db432e81615f2d0c05c1710412f8ca9
