
A collection of Crawlers


crawlers

Introduction

A collection of crawlers.

Supported targets

  1. Model files on Hugging Face

Examples

1. Hugging Face

from pycrawlers import huggingface

urls = ['https://huggingface.co/albert-base-v2/tree/main',
        'https://huggingface.co/dmis-lab/biosyn-sapbert-bc5cdr-disease/tree/main']

paths = ['./model_1/albert-base-v2', './model_2/']
# Instantiate the class
# Use the default base_url (https://huggingface.co)
hg = huggingface()
# Custom base_url
# hg = huggingface('https://huggingface.co')

# 1. Fetch a single repo
# 1.1 Use the default save location ('./')
hg.get_data(urls[0])

# 1.2 Custom save path
# hg.get_data(urls[0], paths[0])

# 2. Batch fetch
# 2.1 Use the default save location ('./')
hg.get_batch_data(urls)

# 2.2 Custom save paths
# hg.get_batch_data(urls, paths)
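A crawler like this works by parsing the repo's tree page and collecting the download links for each file. As a rough, hypothetical sketch of that core step (not the library's actual implementation), here is how file links could be extracted from a tree page's HTML with only the standard library; the sample HTML and the `/resolve/` path convention are illustrative assumptions:

```python
from html.parser import HTMLParser

class FileLinkParser(HTMLParser):
    """Collect hrefs that look like file download links (paths containing '/resolve/')."""
    def __init__(self):
        super().__init__()
        self.file_links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if "/resolve/" in href:
                self.file_links.append(href)

# A trimmed, hypothetical snippet of a repo tree page.
sample_html = """
<ul>
  <li><a href="/albert-base-v2/resolve/main/config.json">config.json</a></li>
  <li><a href="/albert-base-v2/resolve/main/pytorch_model.bin">pytorch_model.bin</a></li>
  <li><a href="/albert-base-v2/tree/main/subdir">subdir</a></li>
</ul>
"""

parser = FileLinkParser()
parser.feed(sample_html)
# Only the two '/resolve/' file links are kept; the subdirectory link is skipped.
print(parser.file_links)
```

The real crawler would then download each collected link and recurse into subdirectory links, but the parsing step above is the heart of it.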

2. General-purpose web crawling

It can crawl sites that do not employ strong anti-crawling measures.

from pycrawlers import website

mongo_host = ''
mongo_port = '27017'
db_name = 'huxiu'
id_collection_name = 'huxiu_id'
collection_name = 'huxiu'
base_url = 'https://www.huxiu.com'


# Instantiate the crawler
website(mongo_host, mongo_port, db_name, id_collection_name, collection_name, base_url)
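The `base_url` passed above is what a crawler typically uses to turn relative links found on crawled pages into absolute URLs. A minimal sketch of that step with the standard library (an assumption about how such a crawler normalizes links, not the library's own code):

```python
from urllib.parse import urljoin

base_url = "https://www.huxiu.com"

# Relative and absolute links as they might appear in a crawled page.
links = ["/article/123.html", "article/456.html", "https://example.com/other"]

# urljoin resolves relative hrefs against the base; absolute URLs pass through.
absolute = [urljoin(base_url + "/", link) for link in links]
print(absolute)
# → ['https://www.huxiu.com/article/123.html',
#    'https://www.huxiu.com/article/456.html',
#    'https://example.com/other']
```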



Download files


Source Distribution

pycrawlers-0.1.1.tar.gz (12.0 kB)

