A spider library covering several data sources
DataSpider
A spider framework with several internal spiders.
Thanks
Thanks to JetBrains for providing a free PyCharm Professional license for this project.
Install
pip install --upgrade tsingspider
Features
- Lightweight: no browser simulator has to be started, so it does not consume a lot of resources
  - However, not every website can be downloaded this way
- Lazy: nothing is downloaded until you actually use the data
- Useful utilities
  - HLS download support
  - Support for cookies from Firefox
  - Proxy support
  - Magnet link generation from torrent data
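To make the last utility concrete, here is a minimal sketch of how a magnet link is generally derived from torrent data: the SHA-1 digest of the bencoded info dictionary becomes the btih hash. It uses only the Python standard library. The names bdecode and torrent_to_magnet are hypothetical and are not tsingspider's API, and the sample torrent is made up for illustration.

```python
import hashlib
from urllib.parse import quote


def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":  # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":  # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":  # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    # byte string: <length>:<bytes>
    colon = data.index(b":", pos)
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length


def torrent_to_magnet(torrent: bytes) -> str:
    """Build a magnet link from raw .torrent bytes (hypothetical helper)."""
    if torrent[:1] != b"d":
        raise ValueError("a torrent file must be a bencoded dictionary")
    pos = 1
    while torrent[pos:pos + 1] != b"e":
        key, pos = bdecode(torrent, pos)
        value_start = pos
        value, pos = bdecode(torrent, pos)
        if key == b"info":
            # Hash the exact bytes of the info dictionary as stored in the file.
            info_hash = hashlib.sha1(torrent[value_start:pos]).hexdigest()
            link = "magnet:?xt=urn:btih:" + info_hash
            name = value.get(b"name") if isinstance(value, dict) else None
            if name:
                link += "&dn=" + quote(name.decode("utf-8", errors="replace"))
            return link
    raise ValueError("no 'info' dictionary found")


# Made-up single-file torrent with one 20-byte piece hash.
sample = (
    b"d8:announce14:http://t.test/4:info"
    b"d6:lengthi3e4:name5:a.txt12:piece lengthi16384e6:pieces20:"
    + b"\x00" * 20
    + b"ee"
)
print(torrent_to_magnet(sample))
```

The hash is taken over the original byte slice rather than a re-encoded dictionary, so the result does not depend on how an encoder would order or format the keys.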
Write Your Own Spider
To define a resource, you can use LazySoup or LazyContent. LazyContent is for binary data, and essentially any kind of data can be treated as binary. LazySoup is for XML-format resources and is widely used for downloading web pages.
For example:
from tsing_spider.util import LazySoup, LazyContent


class YourOwnSpider(LazySoup):
    def __init__(self, url: str):
        LazySoup.__init__(self, url)

    @property
    def some_info(self) -> str:
        """
        Extract information from self.soup.
        The data is downloaded the first time it is used.
        """
        pass
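The lazy behaviour described above can be sketched without the library or a network connection. The example below is only an illustration of the pattern and does not show how tsing_spider implements it: LazyResource, TitlePage and fake_fetch are hypothetical names, and the fetch function is injected so the sketch runs offline.

```python
from functools import cached_property
from typing import Callable, List


class LazyResource:
    """Hypothetical stand-in for a lazy resource; not tsing_spider's class."""

    def __init__(self, url: str, fetch: Callable[[str], bytes]):
        self.url = url
        self._fetch = fetch  # stored only; nothing is downloaded here

    @cached_property
    def content(self) -> bytes:
        # Runs on first access only; the result is cached on the instance.
        return self._fetch(self.url)


class TitlePage(LazyResource):
    @property
    def title(self) -> str:
        # Touching self.content triggers the download if it has not happened.
        text = self.content.decode("utf-8")
        start = text.index("<title>") + len("<title>")
        return text[start:text.index("</title>", start)]


calls: List[str] = []


def fake_fetch(url: str) -> bytes:
    """Pretend downloader that records every call (assumption for the demo)."""
    calls.append(url)
    return b"<html><title>Hello</title></html>"


page = TitlePage("https://example.com/", fake_fetch)
print(len(calls))  # no download has happened yet
print(page.title)  # first access triggers the single download
print(page.title)  # served from the cached content
print(len(calls))
```

Caching the downloaded bytes on the instance means derived properties such as title can be read repeatedly without further requests.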