Combine XPath, CSS selectors and JSONPath for web data extraction.
Quickstart
Installation
Install the stable version from PyPI:
pip install data-extractor
Or install the latest development version from the master branch on GitHub:
pip install git+https://github.com/linw1995/data_extractor.git@master
Usage
```python
from data_extractor import Field, Item, JSONExtractor


class Count(Item):
    followings = Field(JSONExtractor("countFollowings"))
    fans = Field(JSONExtractor("countFans"))


class User(Item):
    # name="name" sets the key used for this field in the extracted output
    name_ = Field(JSONExtractor("name"), name="name")
    # default=17 is used when a user has no "age" key
    age = Field(JSONExtractor("age"), default=17)
    # a nested Item without its own extractor reads from the same user object
    count = Count()


# "data.users[*]" selects every user; is_many=True returns a list of results
assert User(JSONExtractor("data.users[*]"), is_many=True).extract(
    {
        "data": {
            "users": [
                {
                    "name": "john",
                    "age": 19,
                    "countFollowings": 14,
                    "countFans": 212,
                },
                {
                    "name": "jack",
                    "description": "",
                    "countFollowings": 54,
                    "countFans": 312,
                },
            ]
        }
    }
) == [
    {"name": "john", "age": 19, "count": {"followings": 14, "fans": 212}},
    {"name": "jack", "age": 17, "count": {"followings": 54, "fans": 312}},
]
```
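To make the declarative definitions above concrete, here is a hand-written, plain-Python function that produces the same output for the example payload. It is an illustrative sketch only, not how data_extractor is implemented: the function name `extract_users` is hypothetical, and unlike the `User` item it hard-codes this one JSON shape.

```python
from typing import Any, Dict, List


def extract_users(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical hand-written equivalent of User(JSONExtractor("data.users[*]"), is_many=True)."""
    results = []
    for user in payload["data"]["users"]:  # mirrors the "data.users[*]" path
        results.append(
            {
                "name": user["name"],  # mirrors Field(..., name="name")
                "age": user.get("age", 17),  # mirrors default=17
                "count": {  # mirrors the nested Count item on the same object
                    "followings": user["countFollowings"],
                    "fans": user["countFans"],
                },
            }
        )
    return results
```

Keys not declared on the item, such as `"description"`, are simply ignored. The library's version keeps the paths declarative, so the same `User` class can be reused with any extractor that selects user objects.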
Changelog
v0.6.0.dev1
2459f7d Dev,New:Add Github Actions for CI
a151a91 Dev,New:Add scripts/export_requirements_txt.sh
f7cdaa3 Dev,Chg:Remove travis-ci
f1d21fe New:Make different implementations of JSONExtractor optional
9f74619 Fix:Use __getattr__ on the module in the wrong way
25a8bf8 Dev,Fix:Cannot use pytest.mark.usefixtures() in pytest.param
8f51603 Dev,Chg:Upgrade poetry version in Makefile
21aa08e Dev,Chg:Test in two ways
4cb4678 Chg:Upgrade dependencies
4177b98 Dev,Fix:remove the venv before pretest installation
0175cde New:Add jsonpath-extractor as optional json extractor backend
Hashes for data_extractor-0.6.0.dev1.tar.gz (source distribution)

Algorithm | Hash digest
---|---
SHA256 | acc1a8695482b853e41fea75eb1b14c31a3284d02f3b8e244e0cdf92e8b87d58
MD5 | f47d2fce1cf9de8e40cf4f23f9051bf5
BLAKE2b-256 | b80be25d3bc80b9889e96a37b260fe979dae6c677a2d59caea84c06d499838f9

Hashes for data_extractor-0.6.0.dev1-py3-none-any.whl (built distribution)

Algorithm | Hash digest
---|---
SHA256 | a47583226045dc6bf10d9efc8ab722fd353c1c36fdfedd2af6308f0db124059f
MD5 | 9744b63227c310df7713065d7122d4a5
BLAKE2b-256 | 26e313ccb6968732c4790f6f3ead139e3e2f45e6b5318ed98d2325b87001af09
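The published digests let you confirm that a manually downloaded archive is intact. Below is a minimal sketch using only Python's standard `hashlib` module; the helper names `file_digests` and `verify_sha256` are hypothetical, and only the SHA256 values are wired into the check, since MD5 is no longer considered collision-resistant.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the hash tables for the 0.6.0.dev1 release files.
EXPECTED_SHA256 = {
    "data_extractor-0.6.0.dev1.tar.gz": "acc1a8695482b853e41fea75eb1b14c31a3284d02f3b8e244e0cdf92e8b87d58",
    "data_extractor-0.6.0.dev1-py3-none-any.whl": "a47583226045dc6bf10d9efc8ab722fd353c1c36fdfedd2af6308f0db124059f",
}


def file_digests(path: Path, chunk_size: int = 1 << 16) -> dict:
    """Compute SHA256, MD5 and BLAKE2b-256 hex digests of a file, reading it in chunks."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 32 bytes = 256 bits
    }
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_sha256(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its filename."""
    name = Path(path).name
    expected = EXPECTED_SHA256.get(name)
    if expected is None:
        raise KeyError(f"no published digest for {name}")
    return file_digests(path)["SHA256"] == expected
```

For example, `verify_sha256(Path("data_extractor-0.6.0.dev1-py3-none-any.whl"))` should return `True` for an unmodified wheel; any other result means the file differs from the one these digests were published for.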