A small paper crawler
Project description
This project is a utility package for building paper crawlers. It contains many helper functions, each covering one part of the crawling workflow. Here is an example:
from PaperCrawlerUtil import util as u
import os
import time

for times in ["2019", "2020", "2021"]:
    # Create a local directory for each CVPR year.
    if os.path.exists("CVPR_{}".format(times)):
        print("directory already exists")
    else:
        os.makedirs("CVPR_{}".format(times))
    # Fetch the index page for this year (no proxy required here).
    html = u.random_proxy_header_access("https://openaccess.thecvf.com/CVPR{}".format(times), require_proxy=False)
    # Collect anchor elements whose attributes match all the given filters.
    attr_list = u.get_attribute_of_html(html, {'href': "in", 'CVPR': "in", "py": "in", "day": "in"})
    for ele in attr_list:
        # Extract the href value from the anchor tag.
        path = ele.split("<a href=\"")[1].split("\">")[0]
        path = "https://openaccess.thecvf.com/" + path
        html = u.random_proxy_header_access(path)
        pdf_list = u.get_attribute_of_html(html,
                                           {'href': "in", 'CVPR': "in", "content": "in", "papers": "in"})
        for eles in pdf_list:
            pdf_path = eles.split("<a href=\"")[1].split("\">")[0]
            # Save each PDF under CVPR_<year>, named by the current time.
            save_dir = os.path.abspath("CVPR_{}".format(times))
            work_path = os.path.join(save_dir, '{}.pdf'.format(time.strftime("%H_%M_%S", time.localtime())))
            u.retrieve_file("https://openaccess.thecvf.com/" + pdf_path, work_path)
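The example above extracts href values by splitting raw anchor strings, which is brittle. As a minimal sketch of the same extraction step using only the Python standard library (the `HrefCollector` class is a hypothetical helper for illustration, not part of PaperCrawlerUtil):

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collect href attribute values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

collector = HrefCollector()
collector.feed('<a href="content/CVPR2021/papers/example_paper.pdf">PDF</a>')
print(collector.hrefs)  # ['content/CVPR2021/papers/example_paper.pdf']
```

Parsing the HTML rather than splitting strings avoids `IndexError` when an element does not contain the expected `<a href="...">` pattern.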
Hashes for PaperCrawlerUtil-0.0.12-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 54d6d999e8f6d4564d10e52da00048aad6b0d3c9a8e3322bdc814d814ba9fc5f
MD5 | 34bbed21c8d8032c87301c5a81e3ec0f
BLAKE2b-256 | 2c5543eafe48d9d65a2c4c600ccb5942b8018869b3e49702323025afdeda943d