A spider library for several data sources

Project description

DataSpider

A spider framework with several built-in spiders.

Install

pip install --upgrade tsingspider

Features

  • Lightweight: no browser emulator has to be started, so resource usage stays low
    • However, not every website can be downloaded this way
  • Lazy: nothing is downloaded until you actually use the data
  • Useful utilities
    • HLS (HTTP Live Streaming) download
    • Cookies loaded from Firefox
    • Proxy support
    • Magnet link generation from torrent data
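Generating a magnet link from torrent data comes down to hashing the torrent's `info` dictionary. The sketch below is not tsingspider's implementation; it is a standard-library illustration of the BitTorrent v1 convention, where the info-hash is the SHA-1 of the exact bencoded bytes of `info` and the link has the form `magnet:?xt=urn:btih:<hex digest>`. The helper names are hypothetical, and optional link parameters such as `dn` (display name) and `tr` (tracker) are left out.

```python
import hashlib


def _read_bytes(data: bytes, i: int) -> tuple[bytes, int]:
    """Read a bencoded byte string '<len>:<bytes>' at data[i]; return (value, next index)."""
    colon = data.index(b":", i)
    start = colon + 1
    length = int(data[i:colon])
    return data[start:start + length], start + length


def _skip_bencoded(data: bytes, i: int) -> int:
    """Return the index just past the bencoded value that starts at data[i]."""
    tag = data[i:i + 1]
    if tag == b"i":  # integer: i<digits>e
        return data.index(b"e", i) + 1
    if tag in (b"l", b"d"):  # list or dict: nested values until the closing 'e'
        i += 1
        while data[i:i + 1] != b"e":
            i = _skip_bencoded(data, i)
        return i + 1
    return _read_bytes(data, i)[1]  # otherwise a byte string


def magnet_from_torrent(torrent: bytes) -> str:
    """Build a v1 magnet link (xt only) from raw .torrent file bytes."""
    if torrent[:1] != b"d":
        raise ValueError("a .torrent file must be a bencoded dictionary")
    i = 1
    while torrent[i:i + 1] != b"e":
        key, i = _read_bytes(torrent, i)
        end = _skip_bencoded(torrent, i)
        if key == b"info":
            # Hash the raw slice rather than a re-encoding, so the digest
            # matches what BitTorrent clients compute.
            digest = hashlib.sha1(torrent[i:end]).hexdigest()
            return f"magnet:?xt=urn:btih:{digest}"
        i = end
    raise ValueError("no 'info' dictionary found")
```

Hashing the original byte slice, instead of decoding and re-encoding the dictionary, avoids subtle mismatches if a torrent's keys are not perfectly canonical.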

Write Your Own Spider

To define a resource, use LazySoup or LazyContent. LazyContent is for binary data, and since virtually any resource can be handled as raw bytes, it works for all kinds of data. LazySoup is for XML-like resources such as HTML and is the usual choice for downloading web pages.

For example:

from tsing_spider.util import LazySoup, LazyContent

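class YourOwnSpider(LazySoup):
    def __init__(self, url: str):
        LazySoup.__init__(self, url)

    @property
    def some_info(self) -> str:
        """
        Extract information from self.soup.
        The page is downloaded the first time self.soup is accessed.
        """
        # Example only: assumes self.soup is a BeautifulSoup object
        return self.soup.title.get_text() if self.soup.title else ""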
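The lazy behaviour described in this section (nothing is fetched until a property that needs the data is read) can be illustrated without the library. In this sketch, `LazyResource`, `TitleSpider` and the injected `fetch` callable are hypothetical names, not tsingspider's API, and `functools.cached_property` is one possible way, not necessarily the library's, to download once on first access and reuse the result afterwards.

```python
import re
from functools import cached_property
from typing import Callable


class LazyResource:
    """Hypothetical stand-in for a lazily downloaded resource (not tsingspider's class)."""

    def __init__(self, url: str, fetch: Callable[[str], bytes]):
        self.url = url
        self._fetch = fetch  # injected downloader, so the sketch runs offline

    @cached_property
    def content(self) -> bytes:
        # Executed only on first access; the bytes are then cached on the instance.
        return self._fetch(self.url)


class TitleSpider(LazyResource):
    """Hypothetical spider exposing one derived field, similar to some_info above."""

    @property
    def title(self) -> str:
        # Reading self.content triggers the download the first time only.
        text = self.content.decode("utf-8", errors="replace")
        match = re.search(r"<title>(.*?)</title>", text, re.IGNORECASE | re.DOTALL)
        return match.group(1).strip() if match else ""


if __name__ == "__main__":
    def fake_fetch(url: str) -> bytes:
        print(f"downloading {url}")
        return b"<html><head><title>Hello</title></head></html>"

    spider = TitleSpider("https://example.com", fake_fetch)  # no download yet
    print(spider.title)  # first access downloads, then parses
    print(spider.title)  # served from the cached content
```

Creating a spider object is therefore cheap; the network cost is paid only by code paths that actually read the data.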


Download files


Source Distribution

tsingspider-1.4.1.tar.gz (18.8 kB)

Uploaded Source

Built Distribution

tsingspider-1.4.1-py3-none-any.whl (36.7 kB)

Uploaded Python 3

File details

Details for the file tsingspider-1.4.1.tar.gz.

File metadata

  • Download URL: tsingspider-1.4.1.tar.gz
  • Upload date:
  • Size: 18.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.52.0 CPython/3.9.0

File hashes

Hashes for tsingspider-1.4.1.tar.gz
Algorithm Hash digest
SHA256 9555eb1903b5eb76fa05431575a90045d212e69966d37e7afb48c06233177de4
MD5 2f86defc423e6d5ac844d885bd4c3ad9
BLAKE2b-256 8b3a03aada4055c5fc47663800cadd0d5181708af76b72fb8b4ebc44c7883639


File details

Details for the file tsingspider-1.4.1-py3-none-any.whl.

File metadata

  • Download URL: tsingspider-1.4.1-py3-none-any.whl
  • Upload date:
  • Size: 36.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.52.0 CPython/3.9.0

File hashes

Hashes for tsingspider-1.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f72dec3b8e21208b5b8affc3dc584c71db1b2c1df0f48e9e1106dd3ae9c604e7
MD5 eb0f48cb432674db6215189eb597c311
BLAKE2b-256 9b7090e9766b6bf1193d42c955589a5570905a1da5b928bf9df0f5ec357688e9
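To check a downloaded file against the SHA256 digests listed above, hash it locally and compare. This is a minimal standard-library sketch; `sha256_of` and `verify` are hypothetical helper names, and the local filename is an assumption about where the archive was saved. pip can enforce the same check itself when a requirements file pins `--hash=sha256:...` and installation runs with `--require-hashes`.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream a file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare against a published hex digest, ignoring case and surrounding spaces."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # SHA256 digest listed above for the source distribution.
    expected = "9555eb1903b5eb76fa05431575a90045d212e69966d37e7afb48c06233177de4"
    archive = Path("tsingspider-1.4.1.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify(archive, expected) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```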
