Web scraper for IT magazines.
Project description
itmagazines-webscraper
This library scrapes the web pages of the following IT magazines.
Supported magazines
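- 技術評論社 (Gijutsu-Hyohron)
  - Software Design
  - WEB+DB PRESS
- CQ出版 (CQ Publishing)
  - Interface
  - トランジスタ技術 (Transistor Gijutsu)
- 日経BP (Nikkei BP)
  - 日経ソフトウエア (Nikkei Software)
  - 日経Linux (Nikkei Linux)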
Installation
$ python -m pip install itmagazines-webscraper
Usage
Scrape a single magazine
from pprint import pprint
from itmagazines_webscraper import ItMagazineType, scrape_magazine
magazines = scrape_magazine(ItMagazineType.SOFTWARE_DESIGN)
for magazine in magazines:
    pprint(magazine.get_dict())
    print(magazine.get_json())
Scrape all supported magazines
from pprint import pprint
from itmagazines_webscraper import scrape_magazines
magazines = scrape_magazines()
for magazine in magazines:
    pprint(magazine.get_dict())
    print(magazine.get_json())
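To keep the scraped results, the dictionaries can be written to a file. The following is a minimal sketch, assuming `get_dict()` returns a JSON-serializable dictionary shaped like the example in the next section. The helper `save_records` and the `sample` list are hypothetical and not part of this library.

```python
import json
from pathlib import Path
from typing import Any


def save_records(records: list[dict[str, Any]], path: str) -> int:
    """Write magazine dicts to a UTF-8 JSON file and return how many were written.

    Hypothetical helper, not part of itmagazines-webscraper.
    """
    # ensure_ascii=False keeps Japanese titles readable in the output file.
    text = json.dumps(records, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")
    return len(records)


# Stand-in data; real input would come from the scraper's get_dict() results.
sample = [{"name": "日経Linux", "price": "XXXX円"}]
```

With the library installed, the call might look like `save_records([m.get_dict() for m in scrape_magazines()], "magazines.json")`.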
Example: Returned JSON data
{
  "name": "日経Linux",
  "number": "日経Linux20XX年X月号",
  "price": "XXXX円",
  "release_date": "20XX年X月X日",
  "url": "https://info.nikkeibp.co.jp/media/LIN/",
  "top_outlines": [
    "【特集1】Linux学び直し",
    "【特集2】Linux導入・活用法まで徹底紹介!"
  ],
  "store_links": [
    {
      "name": "Amazon",
      "link": "https://www.amazon.co.jp/dp/xxxxx"
    },
    {
      "name": "Rakutenブックス",
      "link": "https://books.rakuten.co.jp/rb/yyyyy/"
    }
  ]
}
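As a rough illustration of consuming this output, the sketch below parses a JSON string of the shape shown above and builds a store-name-to-URL mapping. The function `store_link_map` and the constant `RAW` are hypothetical names introduced here; the sketch assumes each store entry carries `name` and `link` keys, as in the example.

```python
import json

# Trimmed copy of the example record above (placeholder values kept as-is).
RAW = """
{
  "name": "日経Linux",
  "store_links": [
    {"name": "Amazon", "link": "https://www.amazon.co.jp/dp/xxxxx"},
    {"name": "Rakutenブックス", "link": "https://books.rakuten.co.jp/rb/yyyyy/"}
  ]
}
"""


def store_link_map(record_json: str) -> dict[str, str]:
    """Return {store name: link} for one magazine record given as JSON text."""
    record = json.loads(record_json)
    # A record without store_links yields an empty mapping.
    return {s["name"]: s["link"] for s in record.get("store_links", [])}


links = store_link_map(RAW)
for store, link in links.items():
    print(f"{store}: {link}")
```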
Data structure
Detail | Summary |
---|---|
name | Magazine name. |
number | Magazine name and volume number. |
price | Price. |
release_date | Release date. |
url | URL of web page. |
top_outlines | Magazine outline list. |
store_links | Store link list. |
store_links
Detail | Summary |
---|---|
name | Store name. |
link | URL of store web page. |
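For callers who prefer typed objects over raw dictionaries, the fields above could be mirrored in dataclasses. This is only a sketch: `MagazineRecord` and `StoreLink` are hypothetical classes, not types exported by this library, and the store address is read from either a `link` or a `url` key as a precaution.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class StoreLink:
    name: str  # store name
    link: str  # URL of the store web page


@dataclass(frozen=True)
class MagazineRecord:
    name: str
    number: str
    price: str
    release_date: str
    url: str
    top_outlines: tuple[str, ...] = ()
    store_links: tuple[StoreLink, ...] = ()

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "MagazineRecord":
        """Build a record from a dictionary shaped like the documented output."""
        stores = tuple(
            # Accept either key name for the store address (assumption).
            StoreLink(s["name"], s.get("link") or s.get("url", ""))
            for s in data.get("store_links", [])
        )
        return cls(
            name=data["name"],
            number=data["number"],
            price=data["price"],
            release_date=data["release_date"],
            url=data["url"],
            top_outlines=tuple(data.get("top_outlines", [])),
            store_links=stores,
        )
```

Frozen dataclasses with tuples keep each record immutable and hashable, which is convenient when deduplicating results across runs.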
Hashes for itmagazines-webscraper-0.1.1.tar.gz
Algorithm | Hash digest |
---|---|
SHA256 | 88d398cb7cee823d5b15944e492622dc89d4c53634920bef2c1582cd2258d426 |
MD5 | 21f6c6c21922b375a8e28971b64c4dff |
BLAKE2b-256 | 1f156666aa00c35871f8e96d01432bdd0da97f59e008df0038f6a2121d79ba9e |