Combine XPath, CSS selectors, and JSONPath for web data extraction.
Quickstarts
Installation
Install the stable version from PyPI.
pip install data-extractor
Or install the latest version from GitHub.
pip install git+https://github.com/linw1995/data_extractor.git@master
Usage
from data_extractor import Field, Item, JSONExtractor


class Count(Item):
    followings = Field(JSONExtractor("countFollowings"))
    fans = Field(JSONExtractor("countFans"))


class User(Item):
    name_ = Field(JSONExtractor("name"), name="name")
    age = Field(JSONExtractor("age"), default=17)
    count = Count()


assert User(JSONExtractor("data.users[*]"), is_many=True).extract(
    {
        "data": {
            "users": [
                {
                    "name": "john",
                    "age": 19,
                    "countFollowings": 14,
                    "countFans": 212,
                },
                {
                    "name": "jack",
                    "description": "",
                    "countFollowings": 54,
                    "countFans": 312,
                },
            ]
        }
    }
) == [
    {"name": "john", "age": 19, "count": {"followings": 14, "fans": 212}},
    {"name": "jack", "age": 17, "count": {"followings": 54, "fans": 312}},
]
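For comparison, here is a rough plain-Python sketch of what the declarative items above compute for this input. It uses only the standard library and does not reflect how data_extractor works internally; the helper name `extract_users` is hypothetical and exists only for illustration.

```python
def extract_users(payload):
    """Hand-written counterpart of the User/Count items (hypothetical helper)."""
    results = []
    # Iterating the list mirrors the JSONPath expression data.users[*].
    for user in payload["data"]["users"]:
        results.append(
            {
                # The field is declared as name_ but emitted under the key "name".
                "name": user["name"],
                # Mirrors default=17: used when the "age" key is missing.
                "age": user.get("age", 17),
                # The nested Count item reads from the same user object.
                "count": {
                    "followings": user["countFollowings"],
                    "fans": user["countFans"],
                },
            }
        )
    return results


payload = {
    "data": {
        "users": [
            {"name": "john", "age": 19, "countFollowings": 14, "countFans": 212},
            {"name": "jack", "description": "", "countFollowings": 54, "countFans": 312},
        ]
    }
}

users = extract_users(payload)
print(users)
```

The declarative version keeps the field-to-path mapping in one place, so the same item classes can be reused wherever a user object of this shape appears.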
Changelog
v0.5.0
0056f37 Split AbstractExtractor into AbstractSimpleExtractor and AbstractComplexExtractor
c42aeb5 Feature/more friendly development setup (#34)
2f9a71c New:Support testing in 3.8
c8bd593 New:Stash unstaged code before testing
d2a18a8 New:Best way to raise new exc
90fa9c8 New:ExprError __str__ implementation
d961768 Fix:Update mypy pre-commit config
e5d59c3 New:Raise SyntaxError when field overwrites method (#38)
7720fb9 Feature/avoid field overwriting (#39)
b722717 Dev,Fix:Black configure not working
f8f0df8 New:Implement extractors’ build method
98ada74 Chg:Update docs
Hashes for data_extractor-0.5.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ad81c615d065f218aa21e6e332b5befcf301d50f6a259acf2b20f32624b1b671
MD5 | cd241fa11a309e7ed1af193091f2f17c
BLAKE2b-256 | c1e3a012e74d9fe6403383ff1ccc9fddc3a8a09258c852f8e4caee1c46be1b43