A spider library for several data sources
DataSpider
A spider framework that ships with several built-in spiders.
Thanks
Thanks to JetBrains for providing a free PyCharm Professional license for this project.
Install
pip install --upgrade tsingspider
Features
- Lightweight: no browser simulator has to be started, so it does not consume many resources
  - However, not every website can be downloaded this way
- Lazy: nothing is downloaded until you actually use the data
- Useful utilities
  - Supports HLS download
  - Supports cookies from Firefox
  - Supports proxies
  - Generates magnet links from torrent data
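
To make the last utility concrete, here is a minimal standard-library sketch of how a magnet link can be derived from torrent data: the info hash is the SHA-1 of the bencoded `info` dictionary. The function names (`bdecode`, `bencode`, `magnet_from_torrent`) are hypothetical and are not tsingspider's API; the library's own implementation may differ.

```python
import hashlib
from urllib.parse import quote


def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":  # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":  # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":  # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    # byte string: <length>:<bytes>
    colon = data.index(b":", pos)
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length


def bencode(value) -> bytes:
    """Encode ints, bytes, lists and dicts; dict keys are written in sorted order."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        pairs = sorted(value.items(), key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def magnet_from_torrent(torrent: bytes) -> str:
    """Build a magnet link from the raw bytes of a .torrent file."""
    meta, _ = bdecode(torrent)
    info = meta[b"info"]
    # The info hash is the SHA-1 digest of the bencoded info dictionary.
    info_hash = hashlib.sha1(bencode(info)).hexdigest()
    link = "magnet:?xt=urn:btih:" + info_hash
    name = info.get(b"name")
    if name:
        # Add the display name as the optional dn parameter.
        link += "&dn=" + quote(name.decode("utf-8", errors="replace"))
    return link
```

Re-encoding the decoded `info` dictionary reproduces the original bytes only when the torrent was written in canonical form (sorted keys, no leading zeros), which is the usual case. A stricter variant would hash the raw byte slice of the `info` value instead.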
Write Your Own Spider
To define a resource, use LazySoup or LazyContent as the base class.
LazyContent is for binary data; since every downloaded resource is ultimately binary, it works for any kind of data.
LazySoup is for XML-format resources and is widely used for downloading web pages.
For example:
from tsing_spider.util import LazySoup, LazyContent


class YourOwnSpider(LazySoup):
    def __init__(self, url: str):
        LazySoup.__init__(self, url)

    @property
    def some_info(self) -> str:
        """
        Extract information from self.soup.
        The data is downloaded the first time it is used.
        """
        pass
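
The lazy behaviour described above can be illustrated with a small self-contained sketch. It is not tsing_spider's implementation: `LazyResource`, `fake_fetch`, and the injected `fetch` callable are hypothetical names used only to show the pattern of deferring a download until the first attribute access and caching the result afterwards.

```python
from functools import cached_property
from typing import Callable, List


class LazyResource:
    """Hypothetical stand-in for a lazy resource (not the library's class)."""

    def __init__(self, url: str, fetch: Callable[[str], bytes]):
        self.url = url
        self._fetch = fetch  # injected downloader, so the sketch needs no network

    @cached_property
    def content(self) -> bytes:
        # Runs on the first access only; the result is then cached on the instance.
        return self._fetch(self.url)


calls: List[str] = []


def fake_fetch(url: str) -> bytes:
    """Record each download request and return canned bytes."""
    calls.append(url)
    return b"<html><title>demo</title></html>"


page = LazyResource("https://example.invalid/page", fake_fetch)
print(len(calls))  # 0: constructing the object downloads nothing
page.content
page.content
print(len(calls))  # 1: the second access reuses the cached bytes
```

Passing the downloader in as a parameter keeps the sketch testable offline; a real spider would call an HTTP client there instead.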
File details
Details for the file tsingspider-1.5.0-py3-none-any.whl.
File metadata
- Download URL: tsingspider-1.5.0-py3-none-any.whl
- Upload date:
- Size: 37.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 76a0b2db75173319254f95b2f66546013cb4e297c793a8a3c815f4372225dd91 |
| MD5 | c85f85e3b36377c55b03ba022a77bd59 |
| BLAKE2b-256 | 021e7b1120009cac41ef03a72b318ef3f437b57f16423aff1d3d98018ec26b19 |