A library for parsing Jingdong (JD.com) pages.
Currently only book pages are supported. All fields use the same encoding, 'utf-8'.
Category fields:
* name - Category name
* url - Category URL
* children - Subcategories
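Because each category carries its own subcategories, the parsed result can be read as a tree. The following is a minimal sketch of walking such a tree, assuming `children` is a list of dictionaries with the same three keys. The sample data and the `iter_categories` helper are hypothetical and are not part of this library.

```python
# Hypothetical sample shaped like the category fields listed above.
# The names and URLs are made up for illustration only.
sample_category = {
    "name": "Books",
    "url": "https://example.com/books",
    "children": [
        {"name": "Fiction", "url": "https://example.com/books/fiction", "children": []},
        {
            "name": "Science",
            "url": "https://example.com/books/science",
            "children": [
                {
                    "name": "Physics",
                    "url": "https://example.com/books/science/physics",
                    "children": [],
                },
            ],
        },
    ],
}


def iter_categories(category, depth=0):
    """Yield (depth, name, url) for a category and all of its subcategories."""
    yield depth, category["name"], category["url"]
    # Assumption: a missing 'children' key is treated as "no subcategories".
    for child in category.get("children", []):
        yield from iter_categories(child, depth + 1)


flat = list(iter_categories(sample_category))
for depth, name, url in flat:
    # Indent each category by its depth in the tree.
    print("  " * depth + name, url)
```

A generator keeps the traversal lazy, which may be useful if the real category tree is large.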
Book list fields:
* links - A list of links in {'title': '', 'url': ''} format
* next_page_uri - Next page URI
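The `next_page_uri` field suggests that a full listing is collected by following pages one after another. Here is a rough sketch of that loop, assuming `next_page_uri` is empty or `None` on the last page. The `collect_links` helper and the `fetch_parsed_page` callable are hypothetical: this README does not show how the parser is invoked, so downloading and parsing are stubbed with a dictionary lookup.

```python
# Hypothetical parsed pages keyed by URI, shaped like the book list fields above.
pages = {
    "/list?page=1": {
        "links": [{"title": "Book A", "url": "https://example.com/a"}],
        "next_page_uri": "/list?page=2",
    },
    "/list?page=2": {
        "links": [{"title": "Book B", "url": "https://example.com/b"}],
        "next_page_uri": None,  # Assumption: no next page is signalled by None.
    },
}


def collect_links(first_uri, fetch_parsed_page):
    """Follow next_page_uri from first_uri and gather every link entry.

    fetch_parsed_page is a hypothetical callable that takes a URI and returns
    a dict with 'links' and 'next_page_uri' keys.
    """
    links = []
    seen = set()
    uri = first_uri
    # Stop on an empty URI, and guard against pages that link back to themselves.
    while uri and uri not in seen:
        seen.add(uri)
        page = fetch_parsed_page(uri)
        links.extend(page["links"])
        uri = page.get("next_page_uri")
    return links


all_links = collect_links("/list?page=1", pages.__getitem__)
for link in all_links:
    print(link["title"], link["url"])
```

In real use, `fetch_parsed_page` would download the page and pass it through the book list parser.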
Book detail fields:
* title - Book title
* author - Book authors, delimited by comma
* images - Image URL list
* detail - Detail key-value pairs
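Since `author` is a single comma-delimited string, callers will probably want to split it into individual names. A small sketch follows, assuming an ASCII comma is the delimiter and that `detail` is a plain dictionary. The sample values and the `split_authors` helper are hypothetical.

```python
# Hypothetical sample shaped like the book detail fields listed above.
sample_detail = {
    "title": "Example Book",
    "author": "Author One, Author Two",
    "images": ["https://example.com/cover.jpg"],
    "detail": {"Publisher": "Example Press", "Pages": "320"},
}


def split_authors(author_field):
    """Split a comma-delimited author string into a list of trimmed names."""
    # Assumption: names are separated by ASCII commas; empty pieces are dropped.
    return [name.strip() for name in author_field.split(",") if name.strip()]


authors = split_authors(sample_detail["author"])
print(authors)

# The detail pairs can be iterated like any other dictionary.
for key, value in sample_detail["detail"].items():
    print(key + ":", value)
```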
Content
jd_page_parser.category_parsers.BookCategoryParser
jd_page_parser.product_list_parsers.BookListParser
jd_page_parser.detail_parsers.BookDetailParser
Installation
The simplest way is to install it via pip:
pip install jd-page-parser
Run Tests
pip install -r requirements-dev.txt
tox
Hashes for jd_page_parser-0.0.1-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 75d5f0fb6e37a3681c9320e19056992189ba0844816f0a3defe46e222c6cd198
MD5 | 37b5e37605d84cb2dbea899837a5b550
BLAKE2b-256 | 6594a7822ac63ead685f43d28d10f711aa5f5597bb39c8b9797d7a5825e942f1