Combine XPath, CSS Selectors and JSONPath for web data extraction.
Quickstarts
Installation
Install the stable version from PyPI.
pip install "data-extractor[jsonpath-extractor]" # for extracting JSON data
pip install "data-extractor[lxml]" # for extracting HTML data
Or install the latest version from GitHub.
pip install "data-extractor[jsonpath-extractor] @ git+https://github.com/linw1995/data_extractor.git@master"
Extract JSON data
JSON extraction currently depends on an optional dependency; install one of the supported ones (for example, the jsonpath-extractor extra shown under Installation) to extract JSON data.
Extract HTML(XML) data
HTML (XML) extraction currently depends on the following optional dependencies:
cssselect, for using CSS selectors
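Since every backend is optional, it can help to check which ones are importable before building extractors. The following is a minimal sketch using only the standard library. The helper name available_backends is hypothetical, and the import names are assumptions (in particular, that the jsonpath-extractor package is imported as jsonpath); adjust them to match what is actually installed.

```python
from importlib.util import find_spec
from typing import Dict

# Import names are assumptions, not taken from the project documentation:
# "jsonpath" is assumed to be the module installed by jsonpath-extractor.
OPTIONAL_MODULES = {
    "jsonpath": "JSON data (JSONPath)",
    "lxml": "HTML/XML data (XPath)",
    "cssselect": "HTML/XML data (CSS selectors)",
}


def available_backends() -> Dict[str, bool]:
    """Report which optional modules can be imported, without importing them."""
    return {name: find_spec(name) is not None for name in OPTIONAL_MODULES}


if __name__ == "__main__":
    for name, installed in available_backends().items():
        status = "installed" if installed else "missing"
        print(f"{name:10} {status:10} needed for {OPTIONAL_MODULES[name]}")
```

A missing entry only means the corresponding kind of extractor is unavailable; the package itself still installs without it.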
Usage
    from data_extractor import Field, Item, JSONExtractor


    class Count(Item):
        followings = Field(JSONExtractor("countFollowings"))
        fans = Field(JSONExtractor("countFans"))


    class User(Item):
        # The attribute is name_, but the result key is "name".
        name_ = Field(JSONExtractor("name"), name="name")
        # Fall back to 17 when "age" is absent from the input.
        age = Field(JSONExtractor("age"), default=17)
        # Nested item; its fields are read from the same user object.
        count = Count()


    assert User(JSONExtractor("data.users[*]"), is_many=True).extract(
        {
            "data": {
                "users": [
                    {
                        "name": "john",
                        "age": 19,
                        "countFollowings": 14,
                        "countFans": 212,
                    },
                    {
                        "name": "jack",
                        "description": "",
                        "countFollowings": 54,
                        "countFans": 312,
                    },
                ]
            }
        }
    ) == [
        {"name": "john", "age": 19, "count": {"followings": 14, "fans": 212}},
        {"name": "jack", "age": 17, "count": {"followings": 54, "fans": 312}},
    ]
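To make explicit what the declarative classes in this example compute, here is a hand-written plain-Python equivalent for this specific input shape. It is only an illustration of the mapping (renamed key, default value, nested item) and not how the library is implemented; the function name extract_users and the constant DEFAULT_AGE are hypothetical.

```python
from typing import Any, Dict, List

DEFAULT_AGE = 17  # mirrors default=17 on the age field


def extract_users(document: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Plain-Python counterpart of the User item applied to data.users[*]."""
    results = []
    for user in document["data"]["users"]:
        results.append(
            {
                # name_ = Field(JSONExtractor("name"), name="name")
                "name": user["name"],
                # age = Field(JSONExtractor("age"), default=17)
                "age": user.get("age", DEFAULT_AGE),
                # count = Count(): nested fields read from the same object
                "count": {
                    "followings": user["countFollowings"],
                    "fans": user["countFans"],
                },
            }
        )
    return results


sample = {
    "data": {
        "users": [
            {"name": "john", "age": 19, "countFollowings": 14, "countFans": 212},
            {"name": "jack", "description": "", "countFollowings": 54, "countFans": 312},
        ]
    }
}

for row in extract_users(sample):
    print(row)
```

The declarative version keeps the paths and defaults next to the field names, which is easier to maintain than a loop like this once the number of fields grows.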
Changelog
v0.9.0
Fix
type annotations #63 #64
Refactor
.utils.Property with “Customized names” support #64
rename .abc to .core and mark the older module as deprecated #65