A collection of Crawlers
crawlers
Introduction
A collection of crawlers.
What it can fetch:
- model files hosted on Hugging Face
Examples
1. Hugging Face
from pycrawlers import huggingface

urls = ['https://huggingface.co/albert-base-v2/tree/main',
        'https://huggingface.co/dmis-lab/biosyn-sapbert-bc5cdr-disease/tree/main']
paths = ['./model_1/albert-base-v2', './model_2/']

# Instantiate the class
# with the default base_url (https://huggingface.co)
hg = huggingface()
# or with a custom base_url:
# hg = huggingface('https://huggingface.co')

# 1. Fetch a single repository
# 1.1 Use the default save location ('./')
hg.get_data(urls[0])
# 1.2 Specify a save path
# hg.get_data(urls[0], paths[0])

# 2. Fetch in batch
# 2.1 Use the default save location ('./')
hg.get_batch_data(urls)
# 2.2 Specify save paths
# hg.get_batch_data(urls, paths)
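In the batch call, each URL is paired with the save path at the same position in `paths`. As a minimal sketch of that convention, the helper below (hypothetical, not part of pycrawlers) derives a default save directory from a repository URL by taking the repo id before the `/tree/<branch>` suffix:

```python
from urllib.parse import urlparse

def default_save_dir(repo_url: str, root: str = '.') -> str:
    """Derive a local directory from a Hugging Face repo URL.

    Hypothetical helper for illustration only: it keeps the repo id
    (the URL path before '/tree/...') and joins it under `root`.
    """
    path = urlparse(repo_url).path.strip('/')  # e.g. 'albert-base-v2/tree/main'
    repo_id = path.split('/tree/')[0]          # drop the branch suffix
    return f"{root}/{repo_id}"

print(default_save_dir('https://huggingface.co/albert-base-v2/tree/main'))
# → ./albert-base-v2
```

A mapping like this keeps batch downloads from different owners (e.g. `dmis-lab/...`) in separate directories.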
2. Generic website crawling
Works on sites that do not employ aggressive anti-crawling measures.
from pycrawlers import website

# MongoDB settings; fill in mongo_host with your server address.
# Judging by the names, one collection stores crawled pages and the
# other stores their ids.
mongo_host = ''
mongo_port = '27017'
db_name = 'huxiu'
id_collection_name = 'huxiu_id'
collection_name = 'huxiu'
base_url = 'https://www.huxiu.com'

website(mongo_host, mongo_port, db_name, id_collection_name, collection_name, base_url)
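The host/port/database settings above correspond to a standard MongoDB connection string. As a small sketch (the helper below is hypothetical, not part of pycrawlers, which takes host and port directly), they could be assembled like this:

```python
def mongo_uri(host: str, port: str, db_name: str) -> str:
    """Build a standard MongoDB connection URI from the settings above.

    Hypothetical helper for illustration only.
    """
    return f"mongodb://{host}:{port}/{db_name}"

print(mongo_uri('localhost', '27017', 'huxiu'))
# → mongodb://localhost:27017/huxiu
```

This is useful if you want to inspect the crawled data with another client (e.g. `pymongo.MongoClient`) pointed at the same database.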
Source distribution: pycrawlers-0.1.1.tar.gz (12.0 kB)