Quick Crawler
A toolkit for quickly performing crawler functions
Installation
pip install quick-crawler
Functions
- get an HTML page and optionally save it to a file if a file path is given
- get a parsed HTML object from an HTML string
- get or download a series of URLs that share a similar format, such as a page list
- remove non-ASCII characters from a string
- fetch a JSON object from a URL
- read a series of objects from an online JSON list
- quickly save a list of JSON objects to a CSV file
- quickly read selected fields from a CSV file into a list
- quickly download a file
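The page-range helper presumably expands a URL template over a page index before fetching each page. A minimal sketch of that expansion step alone (the `{pi}` placeholder name is taken from the example below; the actual fetching logic inside `quick_html_page_range` is not shown here):

```python
# Hypothetical sketch: expanding a "{pi}" URL template into a page list.
# quick_html_page_range itself likely fetches each URL; here we only build them.
url_range = "https://learnersdictionary.com/3000-words/alpha/a/{pi}"
urls = [url_range.format(pi=i) for i in range(1, 11)]  # pages 1..10
print(urls[0])    # URL of the first page
print(len(urls))  # 10 URLs in total
```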
Let the Code Speak
Example 1:
from quick_crawler.page import *

if __name__ == "__main__":
    # get an HTML page (and save it to a file if a file path is given)
    url = "https://learnersdictionary.com/3000-words/alpha/a"
    html_str = quick_html_page(url)
    print(html_str)

    # get a parsed HTML object from the HTML string
    html_obj = quick_html_object(html_str)
    word_list = html_obj.find("ul", {"class": "a_words"}).findAll("li")
    print("word list: ")
    for word in word_list:
        print(word.find("a").text.replace(" ", "").strip())

    # get or download a series of URLs with a similar format, like a page list
    url_range = "https://learnersdictionary.com/3000-words/alpha/a/{pi}"
    list_html_str = quick_html_page_range(url_range, min_page=1, max_page=10)
    for idx, html in enumerate(list_html_str):
        html_obj = quick_html_object(html)
        word_list = html_obj.find("ul", {"class": "a_words"}).findAll("li")
        list_w = []
        for word in word_list:
            list_w.append(word.find("a").text.replace(" ", "").strip())
        print(f"Page {idx+1}: ", ','.join(list_w))
Example 2:
from quick_crawler.page import *

if __name__ == "__main__":
    # remove non-ASCII characters from a string
    u_str = 'aà\xb9'
    u_str_removed = quick_remove_unicode(u_str)
    print("Removed str: ", u_str_removed)

    # get a JSON object online
    json_url = "http://soundcloud.com/oembed?url=http%3A//soundcloud.com/forss/flickermood&format=json"
    json_obj = quick_json_obj(json_url)
    print(json_obj)
    for k in json_obj:
        print(k, json_obj[k])

    # read a series of objects from an online JSON list
    json_list_url = "https://jsonplaceholder.typicode.com/posts"
    json_list = quick_json_obj(json_list_url)
    print(json_list)
    for obj in json_list:
        userId = obj["userId"]
        title = obj["title"]
        body = obj["body"]
        print(obj)

    # quickly save a list of JSON objects to a CSV file
    quick_save_csv("news_list.csv", ['userId', 'id', 'title', 'body'], json_list)

    # quickly read selected fields from the CSV file into a list
    list_result = quick_read_csv("news_list.csv", fields=['userId', 'title'])
    print(list_result)

    # quickly download a file
    quick_download_file("https://www.englishclub.com/images/english-club-C90.png", save_file_path="logo.png")
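The CSV helpers map naturally onto Python's standard `csv` module. A hedged sketch of what `quick_save_csv` and `quick_read_csv` plausibly do (field names taken from the example above; an in-memory buffer stands in for `news_list.csv`, and this is an assumption about the implementation, not the library's actual code):

```python
import csv
import io

# Sample rows shaped like the JSONPlaceholder posts used above.
rows = [
    {"userId": 1, "id": 1, "title": "first post", "body": "hello"},
    {"userId": 1, "id": 2, "title": "second post", "body": "world"},
]

# Save: write a header plus one row per dict, as quick_save_csv likely does.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["userId", "id", "title", "body"])
writer.writeheader()
writer.writerows(rows)

# Read back: keep only the requested fields, mirroring fields=['userId', 'title'].
buf.seek(0)
reader = csv.DictReader(buf)
list_result = [[row["userId"], row["title"]] for row in reader]
print(list_result)  # [['1', 'first post'], ['1', 'second post']]
```

Note that `csv.DictReader` returns every value as a string, so `userId` comes back as `'1'`, not `1`.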
License
The quick-crawler project is provided by Donghua Chen.